Monday, July 17, 2006

Column Dependencies and Consequences

There's one task that every database administrator and every database designer hates. Database schema modifications. Executing appropriate DDL scripts usually does the trick, but is the ALTER script for the modified object enough? What about dependencies? In a relational database pretty much every object is in some way related to one or more other objects - either depending on other objects or having other objects depend on it.

The issue of SQL object dependencies is a rather large one, so in this post let's focus on one particular area of object dependencies - the column - i.e. modifying a column with dependent objects.

There are several SQL objects that may be dependent on a column:

  • key constraints (i.e. primary key and unique constraints);
  • indexes;
  • foreign key constraints
  • defaults; and
  • check constraints.

Whenever a column with dependent objects needs to be altered the above need to be handled appropriately. The column may of course be also referenced by procedures, functions and other dynamic objects. These dependencies are, however, not covered by this post - in other words: they demand a higher degree of (manual) labor.

Anyway, the usual practice when altering columns is to drop dependent objects before the modification, and recreating them afterwards. Sounds fairly simple, especially if all scripts are under source control, and if all objects have user-defined names, and if the schema is properly documented. Quite a few if's.

Sometimes the reality is oblivious to these conditions.

Sadly, some companies either tend not to keep scripts under source control or not all of the scripts have been placed under source control. The use of simplified DDL scripts is also very popular - not to mention the use of various visual editors. This results in system-generated object names, forcing the DBA to deal with meaningless designators rather than meaningful object names.

create table dbo.DummyTable
 (
 DummyTableId
  int
  identity (1, 1)
  not null
  primary key
 ,DummyCreated
  datetime
  not null
  default(getdate())
 ,DummyValue
  decimal(8, 4)
  not null
  check(DummyValue between -9999.9999 and 9999.9999)
 )
go

If you execute the above script you'll have generated three dependent objects with system-defined names.

They need to be and they most probably will be documented in one way or another, however, the names in the documentation might not match the actual object names, if the system is allowed to generate names. Especially after the objects have been subsequently altered or dropped and recreated, and let's not forget their migration into production. The imminent result is: more work hours for whoever is 'blessed' with having to implement database schema modifications.

Most companies with such less than appropriate database administration habits eventually end up hiring an expert to clear the mess. God bless these companies. ;)

This procedure provides basic assistance to the poor soul left with the task of implementing modifications of columns. It uses this function, the purpose of which is to return a string of columns that a particular constraint is composed of.

To see the procedure in action create a test database and add the objects using this script.

The definition of the procedure is a bit too long to post here, but has been designed by utilising the following four queries:

  • The list of default constraints:
    select schema_name(sys.objects.schema_id) as SchemaName
     ,sys.objects.[name] as TableName
     ,sys.columns.[name] as ColumnName
     ,sys.default_constraints.[name] as DefaultConstraintName
     from sys.objects
      inner join sys.columns
        on (sys.columns.[object_id]
          = sys.objects.[object_id])
      inner join sys.default_constraints
        on (sys.default_constraints.parent_object_id
          = sys.objects.[object_id])
         and (sys.default_constraints.parent_column_id
          = sys.columns.column_id)
     where (sys.objects.[type] = 'U')
      and (sys.objects.[name] not in ('sysdiagrams'))
  • The list of check constraints:
    select schema_name(sys.objects.schema_id) as SchemaName
     ,sys.objects.[name] as TableName
     ,sys.columns.[name] as ColumnName
     ,sys.check_constraints.[name] as CheckConstraintName
     from sys.objects
      inner join sys.columns
        on (sys.columns.[object_id] = sys.objects.[object_id])
      inner join sys.check_constraints
        on (sys.check_constraints.parent_object_id
          = sys.objects.[object_id])
         and (sys.check_constraints.parent_column_id
          = sys.columns.column_id)
     where (sys.objects.[type] = 'U')
      and (sys.objects.[name] not in ('sysdiagrams'))
  • The list of foreign relationships:
    select schema_name(sys.objects.schema_id) as SchemaName
     ,sys.objects.[name] as TableName
     ,sys.columns.[name] as ColumnName
     ,ForeignKeys.ForeignKeyName as ForeignKeyName
     ,ForeignKeys.ForeignTableSchema as ForeignTableSchema
     ,ForeignKeys.ForeignTableName as ForeignTableName
     ,Composite
      = case ForeignKeys.IsComposite
       when 0
        then 'NO'
       when 1
        then 'YES'
       else 'unknown'
       end
     from sys.objects
      inner join sys.columns
        on (sys.columns.[object_id] = sys.objects.[object_id])
      inner join (
        select sys.foreign_keys.[name] as ForeignKeyName
         ,schema_name(sys.objects.schema_id) as ForeignTableSchema
         ,sys.objects.[name] as ForeignTableName
         ,IsComposite
          = case (
           select count(*)
            from sys.foreign_keys CtrlFk
             inner join sys.foreign_key_columns CtrlFkCol
               on (CtrlFkCol.constraint_object_id
                 = CtrlFk.[object_id])
            where (CtrlFk.[object_id]
               = sys.foreign_keys.[object_id])
           )
           when 1
            then 0
           else 1
           end
         ,sys.foreign_keys.referenced_object_id as referenced_object_id
         ,sys.foreign_key_columns.referenced_column_id as referenced_column_id
         from sys.foreign_keys
          inner join sys.foreign_key_columns
            on (sys.foreign_key_columns.constraint_object_id
              = sys.foreign_keys.[object_id])
          inner join sys.objects
            on (sys.objects.[object_id]
              = sys.foreign_keys.parent_object_id)
            inner join sys.columns
              on (sys.columns.[object_id]
                = sys.objects.[object_id])
               and (sys.columns.column_id
                = sys.foreign_key_columns.parent_column_id)
        ) ForeignKeys
        on (ForeignKeys.referenced_object_id = sys.objects.[object_id])
         and (ForeignKeys.referenced_column_id = sys.columns.column_id)
     where (sys.objects.[type] = 'U')
      and (sys.objects.[name] not in ('sysdiagrams'))
  • The list of indexes:
    select schema_name(sys.objects.schema_id) as SchemaName
     ,sys.objects.[name] as TableName
     ,sys.columns.[name] as ColumnName
     ,Indexes.IndexName as IndexName
     ,Composite
      = case Indexes.IsComposite
       when 0
        then 'NO'
       when 1
        then 'YES'
       else 'unknown'
       end
     from sys.objects
      inner join sys.columns
        on (sys.columns.[object_id] = sys.objects.[object_id])
      inner join (
        select sys.indexes.[name] as IndexName
         ,IsComposite
          = case (
           select count(*)
            from sys.indexes CtrlNdx
             inner join sys.index_columns CtrlNdxCol
               on (CtrlNdxCol.[object_id]
                 = CtrlNdx.[object_id])
                and (CtrlNdxCol.index_id
                 = CtrlNdx.index_id)
            where (CtrlNdx.index_id
               = sys.indexes.index_id)
             and (CtrlNdx.[object_id]
              = sys.objects.[object_id])
           )
           when 1
            then 0
           else 1
           end
         ,sys.indexes.[object_id] as [object_id]
         ,sys.index_columns.column_id as column_id
         from sys.indexes
          inner join sys.index_columns
            on (sys.index_columns.[object_id]
              = sys.indexes.[object_id])
             and (sys.index_columns.index_id
              = sys.indexes.index_id)
          inner join sys.objects
            on (sys.objects.[object_id]
              = sys.indexes.[object_id])
            inner join sys.columns
              on (sys.columns.[object_id]
                = sys.objects.[object_id])
               and (sys.columns.column_id
                = sys.index_columns.column_id)
        ) Indexes
        on (Indexes.[object_id] = sys.objects.[object_id])
         and (Indexes.column_id = sys.columns.column_id)
     where (sys.objects.[type] = 'U')
      and (sys.objects.[name] not in ('sysdiagrams'))

The above queries are based directly on the SQL Server 2005 catalog views and would require some modifications to be compatible with SQL Server 2000. If I have the time, I'll add a 2000 version of the procedure (and the function).

The following operations are supported by the Action parameter (acts as a bitmap):

Value Meaning
0 An overview of dependencies
1 DROP scripts
2 CREATE scripts
3 All scripts

The procedure also replaces existing object names with more descriptive names - following the »object-type/table-name/column-name(s)« formula. Feel free to adapt to your standards.

A few examples of use:

  • To see an overview of column-dependencies in the current database:
    exec dbo.ColumnDependencies
  • To build the statemets used when dropping the dependent objects of the table named 'TableWithNaughtyColumn' (use the Results to text option in SQL Server Management Studio):
    exec dbo.ColumnDependencies
      @TableName = 'TableWithNaughtyColumn'
      ,@Action = 1
  • To build the DROP and CREATE statements for the dependendent objects of a particular column (use the Results to text option in SQL Server Management Studio):
    exec dbo.ColumnDependencies
      @TableName = 'TableWithNaughtyColumn'
      ,@ColumnName = 'NaughtyColumn'
      ,@Action = 3
    The resulting script includes a designation of where the column DDL should be placed to complete the modification script.

Quite simple really.

Oh, and a word of warning: read the resulting script before attempting to execute it. And never execute the script on a live production system, unless you've thoroughly tested the effects in a controlled environment!

The purpose of this procedure is to assist you in preparing the full modification script, not to replace you!


ML

P.S. The fnColumnList function used in this post is based on the aggregate concatenation function presented here.

Wednesday, July 12, 2006

Aggregate concatenation in SQL 2005

In a previous post I've discussed aggregate concatenation and the issues related to that particular solution.

SQL Server 2005 provides new possibilities of implementing safer and more efficient methods. The most obvious one would seem to be in designing a user-defined CLR function, since .Net 2.0 seems like a pretty efficient place to manipulate strings (in respect to T-SQL). There is, however, another way.

As Tony Rogerson writes in his post (and in a few posts to the newsgroups) the FOR XML clause and the resulting XML can be used in SQL Server 2005 to provide a safer T-SQL alternative to the function I've posted some time ago.

Here is an example based on the AdventureWorks database:

create function dbo.fnOrderDetail_Products_asArray
 (
 @PurchaseOrderID int
 ,@separator  varchar(8) = null
 )
returns varchar(max)
as
begin
 declare @resultString varchar(max)

 if (@separator is null)
  begin
   set @separator = ', '
  end

 select @resultString
   = (
   select [data()]
     = '<>'
     + Production.Product.[Name]
    from Purchasing.PurchaseOrderDetail
     inner join Production.Product
       on Production.Product.ProductID
         = Purchasing.PurchaseOrderDetail.ProductID
    where (Purchasing.PurchaseOrderDetail.PurchaseOrderID
       = @PurchaseOrderID)
    order by Production.Product.[Name]
    for xml path(''), type
   ).value('.', 'varchar(max)')

 set @resultString = replace(@resultString, ' <>', @separator)
 set @resultString = replace(@resultString, '<>', @separator)
 set @resultString
   = substring(@resultString, charindex(@separator,
   @resultString) + len(@separator), len(@resultString))
 set @resultString = ltrim(rtrim(@resultString))

 return @resultString
end
go

The function returns the names of the product(s) in a given Purchase Order as a delimited string.

An example of use:

select Purchasing.PurchaseOrderDetail.PurchaseOrderId as PurchaseOrderId
 ,Purchasing.PurchaseOrderDetail.DueDate as DueDate
 ,dbo.fnOrderDetail_Products_asArray
  (Purchasing.PurchaseOrderDetail.PurchaseOrderId, ' - ') as Products
 from Purchasing.PurchaseOrderDetail

ML

Monday, July 10, 2006

Unidirectional synchronisation

In the newsgroups a frequent question is on how to synchronise data between two locations without the use of replication.

Once again, the set-based aproach proves to be the most transparent, the most efficient and the most easy to uderstand.

In this example we'll look at unidirectional synchronisation, which as a matter of facts isn't as much about synchronisation as much as it is about the propagation of data changes from the source to the destination.

In brief, these are the rules involved:

  • Insert the rows from the Source table that do not exist in the Destination table;
  • Update the rows in the Destination table that are different from the rows in the Source table; and
  • Delete rows from the Destination table that do not exist in the Source table.

Elementary.

First of all, here are the tables. We'll be propagating data from the Remote table to the Local table. A primary key, or at least a candidate key, is - of course - a must. In this example the Product name is the primary key.

-- Remote Table
create table dbo.RemoteProductStatus
 (
 ProductStatusId int  identity (1, 1) not null
 ,DateChanged datetime   not null
 ,ProductName varchar(512)   not null
 ,ProductStatus varchar(10)   not null
 )
go

alter table dbo.RemoteProductStatus
 add constraint pk_RemoteProductStatus_ProductName
   primary key nonclustered
   (
   ProductName
   )
go

create unique clustered index ux_RemoteProductStatus_ProductStatusId
 on dbo.RemoteProductStatus
  (
  ProductStatusId
  )
go

alter table dbo.RemoteProductStatus
 add constraint chk_RemoteProductStatus_ProductStatus
   check (ProductStatus in ('out', 'returned', 'destroyed'))
go

-- Local Table
create table dbo.LocalProductStatus
 (
 ProductStatusId int  identity (1, 1) not null
 ,DateChanged datetime   not null
 ,ProductName varchar(512)   not null
 ,ProductStatus varchar(10)   not null
 )
go

alter table dbo.LocalProductStatus
 add constraint pk_LocalProductStatus_ProductName
   primary key nonclustered
   (
   ProductName
   )
go

create unique clustered index ux_LocalProductStatus_ProductStatusId
 on dbo.LocalProductStatus
  (
  ProductStatusId
  )
go

alter table dbo.LocalProductStatus
 add constraint chk_LocalProductStatus_ProductStatus
   check (ProductStatus in ('out', 'returned', 'destroyed'))
go

Now here comes the data:

insert dbo.LocalProductStatus
 (
 DateChanged
 ,ProductName
 ,ProductStatus
 )
 select dateadd(m, -6, getdate()) as DateChanged
  ,'Winter coat' as ProductName
  ,'out' as ProductStatus
 union all
 select dateadd(m, -6, getdate())
  ,'Skis'
  ,'out'
 union all
 select dateadd(m, -6, getdate())
  ,'Helmet'
  ,'out'
 union all
 select dateadd(m, -6, getdate())
  ,'Bathing suit'
  ,'destroyed'
go

insert dbo.RemoteProductStatus
 (
 DateChanged
 ,ProductName
 ,ProductStatus
 )
 select getdate() as DateChanged
  ,'Sun glasses' as ProductName
  ,'out' as ProductStatus
 union all
 select getdate()
  ,'Beach towel'
  ,'out'
 union all
 select getdate()
  ,'Winter coat'
  ,'returned'
 union all
 select getdate()
  ,'Skis'
  ,'returned'
 union all
 select getdate()
  ,'Helmet'
  ,'destroyed'
go

The three tasks described above are extremely simple with the set-based logic:

  1. Insert - rows from the Remote table that do not exist in the Local table:
    insert dbo.LocalProductStatus
     (
     DateChanged
     ,ProductName
     ,ProductStatus
     )
     select dbo.RemoteProductStatus.DateChanged as DateChanged
      ,dbo.RemoteProductStatus.ProductName as ProductName
      ,dbo.RemoteProductStatus.ProductStatus as ProductStatus
      from dbo.RemoteProductStatus
      where (not exists (
        select *
         from dbo.LocalProductStatus
         where (dbo.LocalProductStatus.ProductName
            = dbo.RemoteProductStatus.ProductName)
        ))
  2. Update - change the rows in the Local table to correspond to the rows in the Remote table:
    update dbo.LocalProductStatus
     set dbo.LocalProductStatus.DateChanged
       = dbo.RemoteProductStatus.DateChanged
      ,dbo.LocalProductStatus.ProductStatus
       = dbo.RemoteProductStatus.ProductStatus
     from dbo.LocalProductStatus
      inner join dbo.RemoteProductStatus
        on dbo.RemoteProductStatus.ProductName
          = dbo.LocalProductStatus.ProductName
     where (dbo.LocalProductStatus.DateChanged
        != dbo.RemoteProductStatus.DateChanged)
      or (dbo.LocalProductStatus.ProductStatus
        != dbo.RemoteProductStatus.ProductStatus)
  3. Delete - simply remove rows from the Local table that no longer exist in the Remote table:
    delete dbo.LocalProductStatus
     where (not exists (
       select *
        from dbo.RemoteProductStatus
        where (dbo.RemoteProductStatus.ProductName
           = dbo.LocalProductStatus.ProductName)
       ))

No fuss.


ML