Thursday, October 26, 2006

The expense of down-converting unicode values

Using functions on columns in query conditions is bad practice and should be avoided – I've mentioned that fact before. As it turns out, sometimes the database engine itself chooses to transform column data in order to test a condition, causing a negative performance impact. A serious one as well.

As Marco Russo describes in his post on a problem one of his users has encountered, the database engine apparently sometimes resorts to implicit data conversions when evaluating query conditions, resulting in poor performance as the optimizer fails to utilize Index Seeks and uses Index Scans instead. See this Microsoft Connect Feedback article for the repro and the official explanation from the SQL Server development team.

I've also posted a comment on this subject (twice, as it turns out ;) after I've discovered that the problem only ever occurs with SQL Server collations. Can anyone out there confirm this? Please, respond to the Microsoft Connect website if you can.


Is this yet another reason why Windows collations should be prefered over SQL collations?

After successfully validating the issue I have tried to see what could be done to overcome this apparent anomally and have narrowed the options down to two:

  • To explicitly cast the unicode value to the appropriate datatype and collation in the query – while this does indeed take care of the problem and gives us the ability to influence the query execution, it seems cumbersome and unintuitive; or
  • To use Windows collations instead of SQL collations – this would seem to be the simplest solution, but it would require modifications of existing database schemas, and I guess we all know *that* particular snowball game.

However, at the moment I would still prefer a confirmation of whether this issue truly only occurs with SQL Server collations.


ML

No comments: