Sunday, December 21, 2008

Sparse Columns and Filtered Indexes

At the December meeting of SLODUG, the Slovenian Developers User Group, I've presented two new functionalities introduced with SQL Server 2008:

  • Sparse Columns – a new (or, at least, an improved) way of storing data in nillable columns so that non-existent data takes up virtually no space in the database.

    Well, to be honest, storage usage *is* affected (as the maximum size of a row in a table is decreased from 8,060 Bytes to 8,018 Bytes) but with this change the maximum number of columns in a table is increased to 30,000 (including a former maximum of 1,024 non-sparse columns).

    With the introduction of sparse columns SQL Server 2008 recognizes two types of tables: wide tables (containing at least one sparse column) and non-wide tables (containing no sparse columns, only "regular" ones – nillable or not);
  • Filtered Indexes – by introducing the WHERE clause to the definition of indexes (and statistics) on base tables restrictions can be applied to data sets being indexed, thus introducing several new possibilities, such as:

    – sparse keys (nillable/sparse columns can now be used as keys);

    – conditional constraints (unique constraints can be created on subsets of data); and

    – this also extends the usability of covering indexes (by indexing one or more subsets of data one could effectively partition data without having to use Table Partitioning).

More on these two subjects (and why they actually represent the same one) is available in the presentation. Clue: Rational Flexibility.

(Or rather the files *will be* available as soon as they've been uploaded to the SLODUG site.)

Stay tuned...


ML

Some Old News

This may be old news, but I haven't said anything about this so far, so this may not be old news to everyone. ;)


SQL Server 2005 Service Pack 3

SQL Server 2005 SP3 has been released on December 15th 2008. The files (elevating the full version number to 9.00.4035) are available for download:

Before applying SP3 you should be aware of the fact that not all Cumulative Updates (CU) for SQL Server 2005 are included in SP3. More information is available in MSDN Knowledge Base article #959195: Cumulative update package 1 for SQL Server 2005 Service Pack 3.

The feature pack for SQL Server 2005 has also been updated:


SQL Server 2005 Books Online Update (December 2008)

The local install version of SQL Server 2005 Books Online has also been updated on December 16th 2008. The files are available for download:


ML

Wednesday, October 01, 2008

Tuesday, August 26, 2008

SQL Server 2008 Express with Advanced Services

The advanced edition of SQL Server 2008 Express is now available for download. As was the case with version 2005 when this edition was first introduced, the Advanced Services include (in addition to the Database Engine, of course):

  • SQL Server Management Studio Basic (formerly: Express);
  • Full-text Search; and
  • SQL Server Reporting Services (limited to local reporting).

You can find the installation files and instructions right here:

You can find more information at the SQL Server 2008 Express main page:


ML

Friday, August 08, 2008

SQL Server 2008 RTM

I'm sure you know by now that SQL Server 2008 has RTMed about two days ago and is already available for download to MSDN/TechNet subscribers.

There are additional free downloads that you might also be interested in:

As always, sample databases are available at CodePlex:

Before attempting to install SQL 2008, though, you really should read the release notes which might make it easier for you to avoid any unpleasant surprises. Knowing is half the battle.*


ML


Friday, June 13, 2008

SQL Server 2008 (RC0), FILESTREAM changes

At this year's Microsoft NT Conference in Portorož fellow MVP Andrej Tozon and I have presented a workshop on designing solutions with Visual Studio 2008 and SQL Server 2008 using the Windows Presentation Foundation and a few new SQL Server 2008 features. Andrej has blogged about this last month and has made all the workshop files available on his site.

You might also already know that SQL Server RC0 has been released on June 10th, which has been announced in the SQL Server documentation team blog – amongst several other places. Of course, a new version of Books Online is also available.


Why all these old news now?

There have been a few changes in RC0 regarding the FILESTREAM feature that I feel I should mention here – given the fact that as a consequence I'll also have to amend the workshop files. However, I think I'd better delay any changes until RTM has been released. Who knows what else I'll have to change... ;)

Long story short, the changes refer to the way FILESTREAM is enabled, and the new procedure is described in the current version of Books Online:

In addition, there's currently one more problem regarding the use of SQL Server Configuration Manager when enabling FILESTREAM. It's been reported in the SQL Server Storage Engine blog:

Well, that's about it. For now. But be sure to go through all the Books Online articles regarding FILESTREAM before attempting to use it in RC0 (in case I missed something).


ML

Wednesday, May 07, 2008

SQL Server 2008 Scoped Search

This may be old news, but it's also the kind of "old news" that's actually becoming more and more relevant these days. ;)

More than a month ago the Microsoft SQL Server documentation team have announced a new Live Search Macro:

Those of you who've found the 'old' one useful, must have been anticipating the 'new' one.

Personally, sometimes I really don't know what I'd do without them – providing a friend in need with a 'live' link to a specific Books Online article has never been easier. :)


ML

Tuesday, April 01, 2008

SQL Server: XQuery/XPath, Predicate Functions

Previously we have looked at the supported XQuery/XPath retrieval functions implemented in SQL Server 2005 with a few examples of use, and we have mentioned the fact that some functions are restricted to use in XPath predicates. Today we will see which XQuery/XPath functions are subject to this restriction.

Any attempt at using these restricted functions to retrieve data from the XML would result in the following exception:

Msg 2371, Level 16, State 1, Line 302
XQuery [query()]: '%function%' can only be used within a predicate or XPath 
selector

Where do we begin?

First of all, you'll need the XML sample (right-click + Save target) and the XML Schema that we've used in the previous post. Review each script and execute against your favorite testing database.

Of course, the XML Schema script must be executed first.


Predicate predicaments

The purpose of predicates is to restrict the result set returned by the XQuery or the XPath expression. Think of the XPath predicate as being to XQuery as the WHERE clause is to the T-SQL query.

The XPath predicate is an optional part of the XPath expression. It must follow the XPath reference to an XML node and must be enclosed in square brackets. For instance, if we wanted to retrieve the values of the Price element of all the books in our XML sample, we would use the following XPath expression:

/Books/Book/Price

However, to retrieve only the price for the third book in the sample we would also need the XPath predicate:

/Books/Book[3]/Price

...which is short for:

/Books/Book[position() = 3]/Price

Which, incidentally, brings us to our first predicate function.

NB: *any* supported XQuery/XPath function can be used inside an XPath predicate in SQL Server 2005 XML queries.


Only two?

position()

To retrieve the title of the third book in our XML we'd use either of these queries:

select Books.Book.query(
  'Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as TitleOfThirdBook
 from @Books.nodes('/Books/Book[position() = 3]') Books (Book)

select Books.Book.query(
  'Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as TitleOfThirdBook
 from @Books.nodes(
  '/Books/Book[3]'
  ) Books (Book)

Of course it would also be quite usefull to actually retrieve the position of a given node, wouldn't it? In SQL Server 2005 there's little we can do with the position() function directly, but by using a table of numbers we can circumvent this limitation:

declare @NumbersOnTheFly table
 (
 Number int primary key
 )

-- Use the spt_values table in the master database to create a table of numbers
insert @NumbersOnTheFly
 (
 Number
 )
 select Number
   = row_number()
    over
    (
    order by [name]
    )
  from master.dbo.spt_values

select NumbersOnTheFly.Number as Position
 ,Books.Book.query(
  'Book[position() = sql:column("NumbersOnTheFly.Number")]/Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as Title
 from @Books.nodes('/Books') Books (Book)
  cross join @NumbersOnTheFly NumbersOnTheFly
 where (NumbersOnTheFly.Number <= Books.Book.query(
   'count(Book)'
   ).value(
    '.'
    ,'int'
    ))

Using the sql:column function (look it up in Books Online) we can correlate the results of the XQuery with the results of the T-SQL query. The initial correlation is achieved by issuing another XQuery in the WHERE clause (i.e. using count(Book) to get the total number of Book elements), thus restricting the rows of the numbers table to the actual number of nodes targeted by the main XQuery expression passed to the nodes XML method in the FROM clause, followed by the "data-driven" query XML method in the SELECT statement.

If the position() XQuery/XPath function was not restricted to predicate use, we could have used something much simpler:

select Books.Book.query(
  'position()'
  ).value(
   '.'
   ,'int'
   ) as Position
 ,Books.Book.query(
  'Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as Title
 from @Books.nodes(
  '/Books/Book'
  ) Books (Book)

...not in SQL Server 2005, though.

Take a look at the execution plan for the workaround, and imagine the execution plan if the workaround were not needed. ;)


last()

The last() XQuery/XPath function is closely connected with the position() XQuery/XPath function; it returns the index of the last node in the node set returned by a given XPath expression. To retrieve the title of the last book in the XML we can use either of these queries:

select Books.Book.query(
  'Title').value(
   '.'
   ,'varchar(80)'
   ) as TitleOfLastBook
 from @Books.nodes(
  '/Books/Book[last()]'
  ) Books (Book)

select Books.Book.query(
  'Title'
  ).value(
   '.'
   ,'varchar(80)'
   ) as TitleOfLastBook
 from @Books.nodes(
  '/Books/Book[position() = last()]'
  ) Books (Book)

Why is there no first() XQuery/XPath function? Think about it: the first index can always be known in advance (= 1). :) On the other hand, knowing the last index would require counting the nodes (e.g. using the count() function) and additional processing that can be avoided thanks to the last() function.


Mental aggregation

So, here they are – the two XQuery/XPath functions the use of which is restricted to XPath predicates. Both of them are essentially in conflict with one of the principal concepts of the set theory, namely that sets are not ordered. In XML, on the other hand, order is preserved (though not guaranteed), so functions like these make sense.

And we've also seen one more use for the table of numbers (for the time being).

Anyway, there are still a few XQuery/XPath functions in SQL Server 2005 that we haven't discussed yet – so stay tuned.


ML

Monday, March 10, 2008

SQL Server: XQuery/XPath, Retrieval Functions

Even after all this time (since the release of SQL Server 2005 which introduced the XML data type) I still notice users having trouble with XQuery/XPath functions when processing (or rather trying to process) XML data in SQL Server, that's why I've decided to dedicate a few more posts to the subject. The ulterior motive here is me learning and testing what (I think) I know. ;)

I've introduced the SQL Server 2005 implementation of XQuery/XPath functions in an earlier post.

In this post we'll be looking at "pure" retrieval functions – in contrast to predicate-only functions. The difference between the former and the latter comes from the limitations emerging from the way the XQuery/XPath standard(s) have been implemented in SQL Server 2005. As most users have already found out (the hard way) the use of some functions is limited to XPath predicates. This time we'll be focusing on those that do not suffer from this limitation. You can read more about the supported XQuery/XPath functions in Books Online. Also note that all retrieval functions can also be used inside XPath predicates.

Have you ever encountered the following exception when using XQuery/XPath functions in SQL Server 2005?

XQuery [query()]: The XQuery syntax '/function()' is not supported.

Take a good look at the examples below – you'll notice that no XPath expression uses the XPath/function() syntax. Although this is valid syntax for XML queries according to the W3C XQuery/XPath Recommendations it is not currently allowed in SQL Server unless used inside XPath predicates (e.g. this is allowed: XPath[function() = 'value']). But as mentioned before – we'll discuss predicates later.


Are we ready?

Almost. We need an XML sample (right-click + Save target) (built from the pubs sample database), and – since I assume most of us have left kindergarten a long time ago – an XML Schema that will provide us with what every data type needs: domain integrity.

Remember: untyped XML is synonymous with unnecessary problems.

Oh, and make sure the XML Schema Collection has been created in your favorite testing database *before* attempting to do the rest of this exercise.


I. Numeric functions

number(), ceiling(), floor(), round()

These four functions provide the basic functionality related to numerical data. If an XPath expression is passed as the sole argument to one of these functions then this XPath expression must evaluate to a singleton.

select Books.BooksCollection.query(
  'Price'
  ).value(
   'number(.)'
   ,'money'
   ) as Price
 ,Books.BooksCollection.query(
  'Price'
  ).value(
   'ceiling(.)'
   ,'money'
   ) as CeilingPrice
 ,Books.BooksCollection.query(
  'Price'
  ).value(
   'floor(.)'
   ,'money'
   ) as FloorPrice
 ,Books.BooksCollection.query(
  'Price'
  ).value(
   'round(.)'
   ,'money'
   ) as RoundPrice
 from @Books.nodes(
    '/Books/Book'
    ) Books (BooksCollection)

A word of caution: absent and/or nilled nodes return an empty sequence that cannot be implicitly converted to numerical data types – hence the use of the number() function for the Price column. Try removing the function and observe the consequences.

The number() function converts the data returned by the XPath expression to a value of type xs:double (equivalent to float in SQL Server), and in contrast with the data() function (discussed further down) properly returns null to the caller when the XPath expression evaluates to an empty sequence.


II. Aggregate functions

Contrary to the W3C XPath Recommendation aggregate XPath functions implemented in SQL Server 2005 do not accept multiple arguments. Only a constant or an XPath expression can thus be passed as the sole argument to any of the aggregate functions, and the XPath expression must evaluate to a singleton or a node set.

sum(), avg(), count()

select Books.BooksCollection.query(
  'sum(Book/Price)'
  ).value(
   '.'
   ,'money'
   ) as PriceTotal
 ,Books.BooksCollection.query(
  'avg(Book/Price)'
  ).value(
   '.'
   ,'money'
   ) as AveragePrice
 ,Books.BooksCollection.query(
  'count(Book/Price)'
  ).value(
   '.'
   ,'int'
   ) as BookCount
 from @Books.nodes(
   '/Books'
   ) Books (BooksCollection)

Consider the XPath expression used for the BookCount column. Does it return the actual number of books in the XML sample?


max()

select Books.BooksCollection.query(
  'max(Book/Royalty)'
  ).value(
   '.'
   ,'int'
   ) as LargestRoyalty
 from @Books.nodes('/Books') Books (BooksCollection)

min()

select Books.BooksCollection.query(
  'min(Book/YtdSales)'
  ).value(
   '.'
   ,'int'
   ) as SmallestYtdSales
 from @Books.nodes(
   '/Books'
   ) Books (BooksCollection)

III. String functions

What can I say about these five functions? Their names pretty much speak for themselves, don't they?

string()

select Books.BooksCollection.query(
  'string(Price)'
  ).value(
   '.'
   ,'nvarchar(20)'
   ) as LocalName
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

Be careful when using the string() function: when the XPath expression passed as the argument evaluates to a node set, the data from *all* elements of the node set is concatenated into a single string.


string-length()

select Books.BooksCollection.query(
  'Title'
  ).value(
   '.'
   ,'nvarchar(80)'
   ) as Title
 ,Books.BooksCollection.query(
  'string-length(Title)'
  ).value(
   '.'
   ,'int'
   ) as TitleLength
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

substring()

select Books.BooksCollection.query(
  'substring(Title, 4, 12)'
  ).value(
   '.'
   ,'nvarchar(12)'
   ) as TitleSubstring
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

concat()

The concat() function is used to concatenate values into a single string. Note that at least two arguments are mandatory, otherwise an exception is raised. Empty sequences (e.g. returned by XPath expressions when no node is matched) are treated as zero-length strings.

select Books.BooksCollection.query(
  'concat(Title, " (", @TitleId, ")")'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as TitleAndTitleId
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

contains()

I would expect the contains() function to be restricted to predicate use, but apparently it can also be used to retrieve data. Of course, it either returns true if the string passed as the second argument is found inside the string passed as the first argument, or false if it isn't.

select Books.BooksCollection.query(
  'Title'
  ).value(
   '.'
   ,'nvarchar(80)'
   ) as Title
 ,Books.BooksCollection.query(
  'contains(Title, "Computer")'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as Contains_Computer_InTitle
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

Warning: if the string passed as the second argument is longer than 4000 characters, contains() returns an empty sequence. This is yet another limitation of the SQL Server 2005 XQuery/XPath implementation.


Update (2009-03-07)

SQL Server 2008 introduces two additional XPath functions that are not supported by SQL Server 2005.

lower-case() and upper-case()

These two functions can make it just a little easier to circumvent live with the fact that XML is case-sensitive.

select Books.Book.query
  (
  'lower-case(Title)'
  ).value
   (
   '.'
   ,'nvarchar(80)'
   ) as LowerCaseTitle
 ,Books.Book.query
  (
  'upper-case(Title)'
  ).value
   (
   '.'
   ,'nvarchar(80)'
   ) as UpperCaseTitle
 from @books.nodes
   ('
   /Books/Book
   ') Books (Book)

IV. Miscellaneous functions

data()

We've mentioned the data() function before. It returns the data from the node(s) specified in the XPath expression in the appropriate type (if the node is typed). For details see the Remarks section in this Books Online article.

select Books.BooksCollection.query(
  'data(@TitleId)'
  ).value(
   '.'
   ,'varchar(6)'
   ) as TitleId
 ,Books.BooksCollection.query(
  'data(@PubDate)'
  ).value(
   '.'
   ,'datetime'
   ) as PubDate
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

I've also mentioned the data() function when discussing the retrieval of XML attributes.


empty()

One might think this function would also be restricted to predicates (testing whether a node is empty or not), but how would we then make a node empty when modifying XML instances?

select Books.BooksCollection.query(
  'empty(Price)'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as EmptyPrice
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

Books Online also suggests using the empty() function together with not() to compensate for the lack of the exists XPath function. I'll give an example when we discuss predicates.


true(), false()

Once again, not only are we allowed to test the state of a bolean type node in XML instances, but we can also modify its value using these two functions.

Both are mentioned here only as a reference.


not()

Essential to logical operations, wouldn't you agree?

select Books.BooksCollection.query(
  'not(Price)'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as NotPrice
 from @Books.nodes(
   '/Books/Book'
   ) Books (BooksCollection)

The query above returns true for books with no Price element.


local-name()

Now this is a function that causes many users migraines. But there's nothing actually wrong with it – provided the syntax limitations (mentioned earlier) are taken under consideration.

select Books.BooksCollection.query(
  'local-name(.)'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as LocalName
 from @Books.nodes(
   '/Books/Book[1]/*'
   ) Books (BooksCollection)

Look closely at the XQuery expression passed to the nodes function and the ones passed to the query and value functions. This is how the node name(s) can be retrieved in SQL Server 2005.


distinct-values()

This can be a very useful function, as demonstrated in another post.

select Books.BooksCollection.query(
  'distinct-values(Book/PubId)'
  ).value(
   '.'
   ,'nvarchar(max)'
   ) as PubId
 from @Books.nodes(
   '/Books'
   ) Books (BooksCollection)

Well, there are five more XQuery/XPath retrieval functions implemented in SQL Server 2005, but they deserve special attention, and we'll discuss them later. For instance, the predicate-only functions.


Mental aggregation

So, here they are, (most of) the XQuery/XPath functions implemented in SQL Server 2005 with examples of use.

Ergo:

  • Not all functions have been implemented;
  • Some functions are restricted to predicate use;
  • The XPath/function() syntax is not currently supported in SQL Server 2005;
  • Additional limitations may apply – Books Online is your friend.

ML


p.s. Regarding the BookCount column in the query demonstrating aggregate functions: count(Book/Price) only counts Book elements where the Price element is present, and does not return the number of *all* Book elements in the XML sample.

Sunday, February 24, 2008

SQL Server XQuery, Accessing Attributes

While browsing this blog's usage logs recently, I've noticed quite a few referrals from various search engines of users trying to troubleshoot problems they may have encountered when accessing XML attributes in a SQL Server XQuery.

Although the problem has been discussed in the past, and has even been documented in Books Online, some users out there still aren't familiar with a SQL Server 2005 limitation regarding the querying of XML attributes – and the error message itself isn't of much help either. That's why I've decided to point this problem out in my blog as well to help improve the visibility of this issue.

For example, when querying this XML:

declare @books xml

set @books = N'<xml version="1.0">
<Books xsi:noNamespaceSchemaLocation="PubsTitles.xsd">
 <Book TitleId="BU1032"PubDate="1991-06-12T00:00:00.000Z">
  <Title>The Busy Executive''s Database Guide</Title>
  <Type>business</Type>
  <PubId>1389</PubId>
  <Price>19.9900</Price>
  <Royalty>10</Royalty>
  <YtdSales>4095</YtdSales>
  <Notes>An overview of available database systems with emphasis 
on common business applications. Illustrated.</Notes>
 </Book>
 <Book TitleId="BU1111"PubDate="1991-06-09T00:00:00.000Z">
  <Title>Cooking with Computers: Surreptitious Balance Sheets</Title>
  <Type>business</Type>
  <PubId>1389</PubId>
  <Price>11.9500</Price>
  <Royalty>10</Royalty>
  <YtdSales>3876</YtdSales>
  <Notes>Helpful hints on how to use your electronic resources to 
the best advantage.</Notes>
 </Book>
</Books>'

...one would instinctively formulate the query like this:

select Books.Book.query('Title').value('.', 'varchar(max)') as Title
 ,Books.Book.query('@TitleId').value('.', 'varchar(6)') as TitleId
 ,Books.Book.query('@PubDate').value('.', 'datetime') as TitleId
 from @Books.nodes('/Books/Book') Books (Book)

However, referencing XML attributes this way is not allowed in SQL Server 2005, and this is the error message we would have encountered if we were to follow our instincts:

Msg 2396, Level 16, State 1, Line 26
XQuery [query()]: Attribute may not appear outside of an element

Not quite as eloquent as one would expect, wouldn't you agree?


fn:data() to the rescue!

By using the data() XQuery function like so:

select Books.Book.query('Title').value('.', 'varchar(80)') as Title
 ,Books.Book.query('data(@TitleId)').value('.', 'varchar(6)') as TitleId
 ,Books.Book.query('data(@PubDate)').value('.', 'datetime') as TitleId
 from @Books.nodes('/Books/Book') Books (Book)

...the query succeeds as expected:

Title                                                TitleId TitleId
---------------------------------------------------- ------- -----------------------
The Busy Executive's Database Guide                  BU1032  1991-06-12 00:00:00.000
Cooking with Computers: Surreptitious Balance Sheets BU1111  1991-06-09 00:00:00.000

This is true both for typed as well as untyped XML instances, and the function is, of course, documented in Books Online.


ML


p.s. I've decided on posting this example while preparing a more detailed overview of XPath/XQuery functions in SQL Server, which should follow soon.

p.p.s. The overview is divided into two parts:

Friday, January 11, 2008

What's wrong with COUNT(*)?

Counting – what's it all about? I'm sure you've heard about a trick that one should use when aggregating data using the COUNT system function in Microsoft SQL Server; specifically, how the performance of the function can be improved using a neat trick. But can we go beyond neat...?

The tests have been done on a SQL Server 2005 (SP2) instance; you're all kindly invited to also run them on an instance/version of your own choice.


Send in the rows

Let's start with some test data. I've provided a script that will do that for you, provided you have a testing database – which I'm sure you do (yes, tempdb is just fine).

The script creates a five column table and copies data into it from the AdventureWorks database, which I assume is available on the same server instance. If any of the above is not true in your particular case, you can still adapt the script as needed.

We end up with a brand new table containing 19,972 rows of data residing in 5,197 data pages.


The plan

We'll be counting rows (what a shock):

  • Query #1 – using no restrictions (i.e. counting all rows);
    select count(*)
     from dbo.Contact
    and
  • Query #2 – using a restriction (i.e. counting rows corresponding to specific criteria).
    select count(*)
     from dbo.Contact
     where (dbo.Contact.FirstName = N'Karl')

Get on with it!

If we execute the two queries above on our heap (remember, this new table has no keys and no indexes), we get these results:

Query Performance
(on a heap)
Result
(row count)
CPU Reads
#1 76 5203 19972
#2 62 5203 22

Also take a look at the execution plans (Query #1 execution plan, Query #2 execution plan). Nothing too impressive, but understandable: without any indexes, the optimizer needs to access all the pages in order to get the actual row count. With Query #1 the optimizer has no options but to do a full table scan, and the restriction in Query #2 can't really improve the performance as it requires data to be tested in order to restrict the result set.

Now, let's add the primary key. We'll use the ContactID column and also make the primary key clustered (which is the default anyway, but I prefer explicit declarations):

alter table dbo.Contact
 add constraint pk_Contact
   primary key clustered
   (
   ContactID
   )
Query Performance
(clustered primary key)
Result
(row count)
CPU Reads
#1 34 5208 19972
#2 68 5208 22

Ha! The number of reads has actually increased. Note that the clustered primary key actually takes up as many pages as the data itself. On the other hand, for Query #1 the stress on the CPU has decreased, while for Query #2 it seems that the clustered primary key can't really do much – data must be read and tested, and the referenced column is not part of the primary key.

Do you think making it part of the clustered primary key might make a difference? Do you think it would make a favorable difference? I've decided on not including the latter in this test as it affects the data model, and modifying the data model (especially, changing a primary key) just to improve performance is IMO far from being good practice. If you're curious enough you can do it yourself.1

Anyway, as expected (and quite imminent as well), the execution plans are different now (Query #1 execution plan, Query #2 execution plan) compared to before.

Let's see what happens if we add an index. Let's index the column used in the restriction of Query #2 – the FirstName column:

create nonclustered index x_Contact_FirstName
 on dbo.Contact
  (
  FirstName
  )
Query Performance
(cl. PK + noncl. FirstName)
Result
(row count)
CPU Reads
#1 4 59 19972
#2 0 2 22

Well, well, who'd have thought...? The optimizer once again stands up to its name. Let's examine the execution plans (Query #1 execution plan, Query #2 execution plan). No longer is the use of the clustered primary key the optimal choice – the nonclustered index requires far less reading. After all, it resides on only 57 pages, and the optimizer takes full advantage of this fact – on both occasions.


Some Romans fear Asterix...

..others fear the asterisk like the plague. However, if we consider the fact that

select count(*)

simply is not in any way functionally equivalent to

select *

then this fear loses most of its substance. I will not go into any details this time of when and why not declaring columns in the SELECT clause is poor practice.

For comparison let's execute these two queries:

  • Query #4 – retrieving all rows (and all columns);
    select *
     from dbo.Contact
    and
  • Query #5 – retrieving a subset of rows.
    select *
     from dbo.Contact
     where (dbo.Contact.FirstName = N'Karl')

The execution plans:

If we now compare the execution plans of these last two queries with the execution plans above, we can observe one significant difference. When viewing the execution plan in SSMS park the mouse over the arrows leading from the very first operator (Table Scan or Clustered/Nonclustered Index Scan/Seek). What can we observe?

Simply counting rows does not require any data to be read from the data pages (except when values need to be tested in order to apply query restrictions).


What about that "neat trick"?

A proposed method of speeding up row counts that has been suggested by some members of the community from time to time, is to replace the asterisk as the argument of the COUNT function with a constant. This – believe it or not – is supposed to lower the I/O for the query. Frankly, I've found that suggestion somewhat amusing if not illogical, but was curious enough to put it to a test.

The two queries above using the "trick" look like this:

  • Query #6 – using no restrictions (i.e. counting all rows);
    select count(1)
     from dbo.Contact
    and
  • Query #7 – using a restriction (i.e. counting rows corresponding to specific criteria).
    select count(1)
     from dbo.Contact
     where (dbo.Contact.FirstName = N'Karl')
Query Performance
(on a heap)
Result
(row count)
CPU Reads
#6 68 5203 19972
#7 76 5203 22

An equal number of reads, obviously, and a slight difference between CPU times compared to previous results for the heap.

Query Performance
(clustered primary key)
Result
(row count)
CPU Reads
#6 64 5208 19972
#7 84 5208 22

Again, an equal number of reads, but this time it seems more processing is required when using the clustered primary key. Is the optimizer translating the constant?

Query Performance
(cl. PK + noncl. FirstName)
Result
(row count)
CPU Reads
#6 2 59 19972
#7 0 2 22

Once more, an equal number of reads, but now it seems less processing is required when the nonclustered index is used.

The results are pretty inconclusive if you ask me, and the execution plans are the same as before:

One thing is certain, though, select count(1) does not perform any better than select count(*). And I've tried using other constants as well – no change. Is that another myth busted? Can anyone confirm this or prove otherwise?

As for me, I'm staying with Asterix. ;) And next time we'll look at another important fact regarding aggregate functions.


ML


*1 FYI: Destroying the primary key in this particular case results in an increased number of reads (5218).