Thursday, December 01, 2005

IFilter error - byDesign or byBug?

Full-text indexing of binary columns in MS SQL 2005 is – to say the least – somewhat overly vulnerable.

The root of possible problems is the TYPE COLUMN, specifying the type of the binary value (file) that one tries to build a full-text index (FTI) for. The TYPE column is used by the full-text indexing service (FTS) to select an appropriate IFilter to parse the binary value and create a proper FTI.

E.g.: a value of ".doc" in the TYPE column would invoke the IFilter for Microsoft Word documents, and a value of ".pdf" would instruct the FTS to use the Adobe PDF IFilter (if available). No problems here.

Oh, really?

Actually, all is well as long as the value of the TYPE column corresponds to the actual binary value – if the TYPE column says "doc", there better be a MS Word document stored in the binary column, or else the entire row is not processed.

That's right – not just the 'offending' column as is the practice in cases where an appropriate IFilter is not available. The entire row is not processed. Imagine a document management system, where one document is stored per row – this document will never be found through the use of full-text search, not even by title, if the title is stored in another column in the same table, and the latter is also full-text indexed.

The error is, however, reported in the SQL Server Error Log, and is represented with the following text:

Errors were encountered during full-text index population
for table or indexed view '%table/view name%', database
'%database name%' (table or indexed view ID '%object id%',
database ID '%database id%').
Please see full-text crawl logs for details.

Details are available in the full-text crawl log, located (by default) in the following folder:

%ProgramFiles%\Microsoft SQL Server\MSSQL.1\MSSQL\LOG

The name of the crawl log is defined by the following naming scheme.

Any volunteers to design a CLR function to help programatically track possible mishaps?

So far I've experienced the following error (copied from the crawl log):

Error '0x80030050' occurred during full-text index population
for table or indexed view '%table/view name%' (table or
indexed view ID '%object id%', database ID '%database id%'),
full-text key value %key value (binary)%.
Failed to index the row.

...paired up with this one:

The component 'offfilt.dll' reported error while indexing.
Component path 'C:\WINDOWS\system32\offfilt.dll'.

In fact the same goes for the rest of the supported office documents – if any of the IFilters in offfilt.dll (version: 2003.5.28.0) fail, the row is not full-text indexed.

In conclusion: use the TYPE COLUMN carefully – i.e. do not let the user choose the document type, read it from the document header. Incidentaly – why does the SQL Server not detect the actual content type this way? Is this by design or is it a bug?

Any volunteers to design a CLR function to help programatically detect *proper* content type?


ML

Wednesday, September 14, 2005

Index properties with a kick

The following function is an expansion to the Microsoft SQL Server's built-in function INDEXPROPERTY.

It returns a table containing table names, index names, property names and property values, so it can be used for analysis and documentation purposes.

create function dbo.fnList_IndexProperty
 (
 @tableName sysname  = null
 ,@indexName nvarchar(128) = null
 ,@propertyName varchar(128) = null
 )
returns @Result table
  (
  ResultID bigint identity (1, 1) primary key
  ,TableName sysname
  ,IndexName nvarchar(128)
  ,PropertyName varchar(128)
  ,PropertyValue int
  )
as
begin
 declare @objectList table
    (
    ListID  bigint identity (1, 1) primary key
    ,TableID int
    ,TableName sysname
    ,IndexName nvarchar(128)
    )
 declare @startID bigint
 declare @endID  bigint
 declare @curID  bigint
 declare @tableId int
 declare @curTableName sysname
 declare @curIndexName nvarchar(128)

 set @tableId = object_id(@tableName)

 if (@tableName is null)
  and (@indexName is null)
  begin
   insert @objectList
    (
    TableID
    ,TableName
    ,IndexName
    )
    select dbo.sysobjects.[id] as TableID
     ,INFORMATION_SCHEMA.TABLES.TABLE_NAME as TableName
     ,dbo.sysindexes.[name] as IndexName
     from INFORMATION_SCHEMA.TABLES
      inner join dbo.sysobjects
        on dbo.sysobjects.[name] = INFORMATION_SCHEMA.TABLES.TABLE_NAME
      inner join dbo.sysindexes
        on dbo.sysindexes.[id] = dbo.sysobjects.[id]
  end
 else
  begin
   if (@tableName is not null)
    and (@indexName is null)
    begin
     insert @objectList
      (
      TableID
      ,TableName
      ,IndexName
      )
      select dbo.sysobjects.[id] as TableID
       ,INFORMATION_SCHEMA.TABLES.TABLE_NAME as TableName
       ,dbo.sysindexes.[name] as IndexName
       from INFORMATION_SCHEMA.TABLES
        inner join dbo.sysobjects
          on dbo.sysobjects.[name] = INFORMATION_SCHEMA.TABLES.TABLE_NAME
          inner join dbo.sysindexes
            on dbo.sysindexes.[id] = dbo.sysobjects.[id]
       where (dbo.sysobjects.[id] = @tableId)
    end

   if (@tableName is null)
    and (@indexName is not null)
    begin
     insert @objectList
      (
      TableID
      ,TableName
      ,IndexName
      )
      select dbo.sysobjects.[id] as TableID
       ,INFORMATION_SCHEMA.TABLES.TABLE_NAME as TableName
       ,dbo.sysindexes.[name] as IndexName
       from INFORMATION_SCHEMA.TABLES
        inner join dbo.sysobjects
          on dbo.sysobjects.[name] = INFORMATION_SCHEMA.TABLES.TABLE_NAME
          inner join dbo.sysindexes
            on dbo.sysindexes.[id] = dbo.sysobjects.[id]
       where (dbo.sysindexes.[name] = @indexName)
    end

   if (@tableName is not null)
    and (@indexName is not null)
    begin
     insert @objectList
      (
      TableID
      ,TableName
      ,IndexName
      )
      select dbo.sysobjects.[id] as TableID
       ,INFORMATION_SCHEMA.TABLES.TABLE_NAME as TableName
       ,dbo.sysindexes.[name] as IndexName
       from INFORMATION_SCHEMA.TABLES
        inner join dbo.sysobjects
          on dbo.sysobjects.[name] = INFORMATION_SCHEMA.TABLES.TABLE_NAME
          inner join dbo.sysindexes
            on dbo.sysindexes.[id] = dbo.sysobjects.[id]
       where (dbo.sysobjects.[id] = @tableId)
        and (dbo.sysindexes.[name] = @indexName)
    end
  end

 select @startID
   = min(objectList.ListID)
  ,@endID
   = max(objectList.ListID)
  from @objectList objectList

 set @curID = @startID

 while (@curID <= @endID)
  begin
   select @curTableName
     = objectList.TableName
    ,@tableId
     = objectList.TableID
    ,@curIndexName
     = objectList.IndexName
    from @objectList objectList
    where (objectList.ListID = @curID)

   if (@propertyName is null)
    begin
     insert @Result
      (
      TableName
      ,IndexName
      ,PropertyName
      ,PropertyValue
      )
      select @curTableName as TableName
       ,@curIndexName as IndexName
       ,'IndexDepth' as PropertyName
       ,indexproperty(@tableId, @curIndexName, 'IndexDepth') as PropertyValue
      union
      select @curTableName
       ,@curIndexName
       ,'IndexFillFactor'
       ,indexproperty(@tableId, @curIndexName, 'IndexFillFactor')
      union
      select @curTableName
       ,@curIndexName
       ,'IndexID'
       ,indexproperty(@tableId, @curIndexName, 'IndexID')
      union
      select @curTableName
       ,@curIndexName
       ,'IsAutoStatistics'
       ,indexproperty(@tableId, @curIndexName, 'IsAutoStatistics')
      union
      select @curTableName
       ,@curIndexName
       ,'IsClustered'
       ,indexproperty(@tableId, @curIndexName, 'IsClustered')
      union
      select @curTableName
       ,@curIndexName
       ,'IsFulltextKey'
       ,indexproperty(@tableId, @curIndexName, 'IsFulltextKey')
      union
      select @curTableName
       ,@curIndexName
       ,'IsHypothetical'
       ,indexproperty(@tableId, @curIndexName, 'IsHypothetical')
      union
      select @curTableName
       ,@curIndexName
       ,'IsPadIndex'
       ,indexproperty(@tableId, @curIndexName, 'IsPadIndex')
      union
      select @curTableName
       ,@curIndexName
       ,'IsPageLockDisallowed'
       ,indexproperty(@tableId, @curIndexName, 'IsPageLockDisallowed')
      union
      select @curTableName
       ,@curIndexName
       ,'IsRowLockDisallowed'
       ,indexproperty(@tableId, @curIndexName, 'IsRowLockDisallowed')
      union
      select @curTableName
       ,@curIndexName
       ,'IsStatistics'
       ,indexproperty(@tableId, @curIndexName, 'IsStatistics')
      union
      select @curTableName
       ,@curIndexName
       ,'IsUnique'
       ,indexproperty(@tableId, @curIndexName, 'IsUnique')
    end
   else
    begin
     insert @Result
      (
      TableName
      ,IndexName
      ,PropertyName
      ,PropertyValue
      )
      select @curTableName as TableName
       ,@curIndexName as IndexName
       ,@propertyName as PropertyName
       ,indexproperty(@tableId, @curIndexName, @propertyName) as PropertyValue
    end

   set @curID = @curID + 1
  end

 return
end
go

A few examples of use:

  • to show all properties for all indexes on all tables in the current database:
    select IP.TableName as TableName
     ,IP.IndexName as IndexName
     ,IP.PropertyName as PropertyName
     ,IP.PropertyValue as PropertyValue
     from dbo.fnList_IndexProperty
            (null, null, null) IP
  • to show the IsClustered property for all indexes on all tables in the current database:
    select IP.TableName as TableName
     ,IP.IndexName as IndexName
     ,IP.PropertyName as PropertyName
     ,IP.PropertyValue as PropertyValue
     from dbo.fnList_IndexProperty
            (null, null, 'IsClustered') IP
  • to show the IsUnique property for all indexes named "taspnet_Profile" on all tables in the current database:
    select IP.TableName as TableName
     ,IP.IndexName as IndexName
     ,IP.PropertyName as PropertyName
     ,IP.PropertyValue as PropertyValue
     from dbo.fnList_IndexProperty
            (null, 'taspnet_Profile', 'IsUnique') IP

ML

Tuesday, August 02, 2005

When binary values come as characters

When importing data into SQL through ODBC you might stumble upon a slight hiccup - binary data type is not one of the Microsoft Jet or ODBC data types.

The following procedure treats your binary values as character data, but returns them in the correct data type.

create proc dbo.CharToBin
 (
 @CharacterValue varchar(8000)
 ,@BinaryValue varbinary(8000)  = null  output
 )
as
declare @sql  nvarchar(4000)
declare @params  nvarchar(4000)

set @sql = N'select @BinaryValue = '
         + isnull(@CharacterValue, 0)

set @params = N'@BinaryValue varbinary(8000) output'

exec dbo.sp_executesql
  @stmt = @sql
  ,@parameters = @params
  ,@BinaryValue = @BinaryValue output
go

An example of use:

declare @BinaryValue varbinary(8000)

exec dbo.CharToBin
  @CharacterValue = '0x24806ff96b02a3a1f16cb2840598b236'
  ,@BinaryValue = @BinaryValue output

select @BinaryValue as BinaryValue

ML

Thursday, July 21, 2005

Defragment all indexes in the current database

Use this script to defragment every index on every table in the current database.

declare @workFlow table (
    WorkID  int identity (1, 1)
    ,TableName sysname
    ,IndexName sysname
    ,unique(TableName, IndexName)
    )
declare @startID int
declare @endID  int
declare @curID  int
declare @curTableName sysname
declare @curIndexName sysname

set nocount on

insert @workFlow
 (
 TableName
 ,IndexName
 )
 select dbo.sysobjects.[name] as TableName
  ,dbo.sysindexes.[name] as IndexName
  from dbo.sysobjects
   inner join dbo.sysindexes
     on dbo.sysindexes.[id] = dbo.sysobjects.[id]
  where (dbo.sysobjects.xtype = 'U')
   and (dbo.sysindexes.impid = 0)
  order by dbo.sysobjects.[name]

select @startID
  = min(WorkID)
 ,@endID
  = max(WorkID)
 from @workFlow workFlow

set @curID = @startID

while (@curID <= @endID)
 begin
  select @curTableName
    = workFlow.TableName
   ,@curIndexName
    = workFlow.IndexName
   from @workFlow workFlow
   where (workFlow.WorkID = @curID)

  dbcc indexdefrag (0, @curTableName, @curIndexName)
   with no_infomsgs

  set @curID = @curID + 1
 end

If you need to rebuild the indexes, replace the 'dbcc indexdefrag ...' statement with this one:

dbcc dbreindex (@curTableName, @curIndexName)
 with no_infomsgs

Also make sure you read this must-read white paper on index defragmentation by Mike Ruthruff: Microsoft SQL Server 2000 Index Defragmentation Best Practices.


ML

Tuesday, July 19, 2005

OPENXML custom UDF (inline table function)

Ever had the need to manipulate XML programmatically in SQL? How about this generic function? Of course it needs some alterations to suit your specific needs more appropriately... It recieves one parameter - the document handle of an internal representation of an XML document (see example below).
create function dbo.fnReturn_XML_asUniversalTable
 (
 @xmlObjectHandler int
 )
returns table
as
return (
 select OX.[id] as [id]
  ,OX.parentid as parentid
  ,OX.nodetype as nodetype
  ,OX.localname as localname
  ,OX.prefix as prefix
  ,OX.namespaceuri as namespaceuri
  ,OX.datatype as datatype
  ,OX.prev as prev
  ,OX.[text] as [text]
  from openxml(@xmlObjectHandler, '/') OX
 )
go
An example of use:
declare @xml   nvarchar(4000)
declare @xmlObjectHandler int

set @xml = 'FirstNameMatijaLastNameLah'

-- Prepare the internal representation of your XML
exec dbo.sp_xml_preparedocument
  @xmlObjectHandler output
  ,@xml

select *
 from dbo.fnReturn_XML_asUniversalTable(@xmlObjectHandler)

-- Never forget to remove the document and release the handle!
exec dbo.sp_xml_removedocument
  @xmlObjectHandler
Now go and customize! ML

Wednesday, July 13, 2005

Climbing trees is for monkeys

So, here's a monkey. A 'monkey' function, that is. The tree we'll be climbing is a self-referenced table:
create table dbo.TreeOfValues
 (
 NodeID   int  primary key
 ,Value   nvarchar(64)
 ,ChildOfNodeID  int 
  foreign key references dbo.TreeOfValues (NodeID)
 )
go

insert dbo.TreeOfValues
 (
 NodeID
 ,Value
 ,ChildOfNodeID
 )
 select 1 as NodeID
  ,'father1' as Value
  ,null as ChildOfNodeID
 union all
 select 2
  ,'son11'
  ,1
 union all
 select 3
  ,'son12'
  ,1
 union all
 select 4
  ,'son13'
  ,1
 union all
 select 5
  ,'father2'
  ,null
 union all
 select 6
  ,'son21'
  ,5
 union all
 select 7
  ,'son22'
  ,5
 union all
 select 8
  ,'grandson211'
  ,6
 union all
 select 9
  ,'grandson212'
  ,6
go
And this is the function:
create function dbo.Monkey
 (
 @NodeID  int
 ,@upToNodeID int
 )
returns int
as
begin
 declare @loopDone bit
 declare @currentNodeID int

 set @loopDone = 0

 while (@loopDone = 0)
  begin
   select @NodeID
     = dbo.TreeOfValues.ChildOfNodeID
    from dbo.TreeOfValues
    where (dbo.TreeOfValues.NodeID = @NodeID)

   select @currentNodeID
     = dbo.TreeOfValues.NodeID
    from dbo.TreeOfValues
    where (dbo.TreeOfValues.NodeID = @NodeID)

   if (@NodeID = @upToNodeID
       or @NodeID is null
       or @currentNodeID is null)
    begin
     set @loopDone = 1
    end
  end

 return @NodeID
end
go
And here is an example to find all the descendants of 'father2':
select *
 from dbo.TreeOfValues
 where (dbo.Monkey(dbo.TreeOfValues.NodeID , 5) = 5)
Beware! This function can get really slow on large tables. Indexes on NodeID and ChildOfNodeID are a necessity. ML

Monday, July 11, 2005

Return related values as array

Silly it may be but sometimes de-normalized relationships make more sense to the end-user. The following function presents a way to list related records (in the Northwind database) in a flat table, where in this case the names of products in an order can be listed in a single column as comma-delimited values. Also known as: aggregate concatenation.

create function dbo.fnGet_ProductName_asArray
 (
 @OrderID int
 ,@Separator nvarchar(16) = null
 )
returns nvarchar(4000)
as
begin
 declare @productNameArray nvarchar(4000)

 if (@Separator is null)
  begin
   set @Separator = N', '
  end

 set @productNameArray = N''

 select @productNameArray
   = @productNameArray
   + @Separator
   + convert(nvarchar(4000), isnull(dbo.Products.ProductName, N''))
  from dbo.[Order Details]
   inner join dbo.Products
     on dbo.Products.ProductID = dbo.[Order Details].ProductID
  where (dbo.[Order Details].OrderID = @OrderID)
  order by dbo.Products.ProductName

 set @productNameArray = ltrim(rtrim(substring(@productNameArray,
                       len(@Separator) + 1, len(@productNameArray))))

 return @productNameArray
end
go

Please, use this function for presentation purposes only. Better still - use appropriate presentation components to do this on the client-side.

WARNING!
There are known issues with aggregate concatenation in T-SQL:

Thanks to Erland Sommarskog for pointing them out to me.

Still, here's an example of use:

select dbo.[Order Details].OrderID
 ,dbo.fnGet_ProductName_asArray
  (dbo.[Order Details].OrderID, N' | ') as ProductNames
 from dbo.[Order Details]
  inner join dbo.Products
    on dbo.Products.ProductID = dbo.[Order Details].ProductID
 group by dbo.[Order Details].OrderID

ML


p.s. In SQL Server 2005 there are alternatives...

Wednesday, May 04, 2005

Test for existence of databases on the local and/or a linked server

Working with many databases on many servers can be a drag sometimes. In special situations - especially when using dynamic queries - it is imperative to test for existence of linked db's programmatically. I use this function:
create function dbo.fnGet_dbExists
 (
 @dbName varchar(8000)
 )
returns bit
as
begin
 declare @result  bit
 declare @serverName varchar(128)
 declare @serverExists bit

 if (@dbName like '%.%')
  begin
   set @serverName = substring(@dbName, 1, charindex('.', @dbName) - 1)
   set @dbName = substring(@dbName, charindex('.', @dbName) + 1, len(@dbName))

   if (@serverName = serverproperty('ServerName'))
    begin
     set @serverName = null
    end
  end
 else
  begin
   set @serverName = null
  end

 if (@serverName is not null)
  begin
   if (exists (
     select master.dbo.sysservers.srvid
      from master.dbo.sysservers
      where (master.dbo.sysservers.srvname = @serverName)
     ))
    begin
     set @serverExists = 1
    end
   else
    begin
     set @serverExists = 0
    end
  end
 else
  begin
   set @serverExists = 1
  end

 if (@serverExists = 1)
  begin
   if (@serverName is not null)
    begin
     if (exists (
       select RemoteC.catalog_name
        from master.dbo.sysremote_catalogs(@serverName) RemoteC
        where (RemoteC.catalog_name = @dbName)
       ))
      begin
       set @Result = 1
      end
     else
      begin
       set @Result = 0
      end
    end
   else
    begin
     if (db_id(@dbName) is not null)
      begin
       set @Result = 1
      end
     else
      begin
       set @Result = 0
      end
    end
  end
 else
  begin
   begin
    set @result = 0
   end
  end

 return @result
end
go
The name of the database passed as @dbName must be qualified with the name of the linked server. For databases on the local server such qualifications can be omitted. Examples:
  1. To test for existence of a database named "Product" on the linked server named "Vienna" dbName should contain the value "Vienna.Product".
  2. If the local server is named "Berlin" both "Berlin.Product" and "Product" are valid values.
I hope you find good use of this function. ML

Thursday, March 03, 2005

Fail-safe conversion to integer

The following function provides a simple solution to string-to-integer conversion.

It converts strings of data type nvarchar(4000) to numbers of data type integer.

By utilizing pattern-matching numeric characters are parsed from the input string from left to right until the pattern is matched. For instance: string "338/I" is converted to number 338. Before actual conversion, the part of the string to be converted is tested for compatibility using the isnumeric() system function.


NB (October 25th 2007):

Razvan Socol has spotted a flaw in the original function, so thanks to him here is a safer version:

create function dbo.fnGet_asInteger
 (
 @String  nvarchar(4000)
 ,@Pattern nvarchar(4000)  = null
 )
returns int
as
begin
 declare @Result   int
 declare @intermediateResult nvarchar(4000)

 if (@Pattern is null)
  begin
   set @Pattern = '%[^0-9]%'
  end

 set @intermediateResult
   = case
    when @String like @Pattern
     then substring(@String, 1, patindex(@Pattern, @String) - 1)
    else @String
    end

 if (isnumeric(@intermediateResult) = 1)
  begin
   set @Result = cast(@intermediateResult as int)
  end
 else
  begin
   set @Result = null
  end

 return @Result
end
go

However, if you're looking for a much more resilient solution you might find this article very useful:

In fact, the ASPFAQ site is practically overflowing with good advice on ASP, SQL Server, and much more. So, go ahead and learn!


ML

Thursday, February 03, 2005

Functions Overview

I use this query in MS Excel to view the parameters in each of the user-defined functions in a specific database. It also provides me with an overview into the design of return tables for table functions.
select  sysobjects.[name] as FuncName
        ,ParamName
                = case
                        when    syscolumns.[name] = ''
                                then    'RETURNS'
                        else    syscolumns.[name]
                        end
        ,systypes.[name] as Datatype
        ,syscolumns.length as Length
        ,'' as ReturnTable
        from    sysobjects
                left join       syscolumns
                on      syscolumns.[id] = sysobjects.[id]
                inner join      systypes
                on      systypes.xusertype = syscolumns.xusertype
        where   (sysobjects.xtype in ('FN', 'IF', 'TF'))
                and (left(sysobjects.[name], 3) != 'dt_')
                and (left(syscolumns.[name], 1) = '@'
                        or syscolumns.[name] is null
                        or syscolumns.[name] = '')
union
select  sysobjects.[name] as FuncName
        ,'RETURNS TABLE' as ParamName
        ,'' as Datatype
        ,-1 as Length
        ,ReturnTable
                = (
                select  substring(substring(syscomments.[text], 
                        patindex('%returns%', syscomments.[text]), 
                        patindex('%as' + char(13) + '%', 
                        syscomments.[text]) - patindex('%returns%', 
                        syscomments.[text])), patindex('%table%', 
                        substring(syscomments.[text], 
                        patindex('%returns%', syscomments.[text]), 
                        patindex('%as' + char(13) + '%', 
                        syscomments.[text]) - patindex('%returns%', 
                        syscomments.[text]))), 
                        len(substring(syscomments.[text], 
                        patindex('%returns%', syscomments.[text]), 
                        patindex('%as' + char(13) + '%', 
                        syscomments.[text]) - patindex('%returns%', 
                        syscomments.[text]))))
                        from    syscomments
                        where   (syscomments.[id] = sysobjects.[id])
                )
        from    sysobjects
                left join       syscolumns
                                on      syscolumns.[id] is null
        where   (sysobjects.xtype in ('TF'))
                and (left(sysobjects.[name], 3) != 'dt_')
                and (left(syscolumns.[name], 1) = '@'
                        or syscolumns.[name] is null
                        or syscolumns.[name] = '')
        order by        FuncName
The result-set is a flat table. ML

Procedures Overview

I use this query in MS Excel to view the parameters in each of the procedures in a specific database.
select  sysobjects.[name] as ProcName
        ,syscolumns.[name] as ParamName
        ,systypes.[name] as Datatype
        ,syscolumns.length as Length
        ,Special
                = case  syscolumns.isoutparam
                        when    1
                                then    'output'
                        else    ''
                        end
        from    sysobjects
                left join       syscolumns
                on      syscolumns.[id] = sysobjects.[id]
                inner join      systypes
                on      systypes.xusertype = syscolumns.xusertype
        where   (sysobjects.xtype = 'P')
        order by        ProcName
                        ,syscolumns.colorder
The result-set is a flat table. ML

The First One

I started this blog to store and to share little bits and pieces of T-SQL code. They are but snippets and scripts which have brought efficiency into my work with SQL.

I sincerely hope they do the same for you.


ML