What do you think?
ML
At the December meeting of SLODUG, the Slovenian Developers User Group, I've presented two new functionalities introduced with SQL Server 2008:
WHERE
clause to the definition of indexes (and statistics) on base tables restrictions can be applied to data sets being indexed, thus introducing several new possibilities, such as:
More on these two subjects (and why they actually represent the same one) is available in the presentation. Clue: Rational Flexibility.
(Or rather the files *will be* available as soon as they've been uploaded to the SLODUG site.)
Stay tuned...
ML
This may be old news, but I haven't said anything about this so far, so this may not be old news to everyone. ;)
SQL Server 2005 SP3 has been released on December 15th 2008. The files (elevating the full version number to 9.00.4035) are available for download:
Before applying SP3 you should be aware of the fact that not all Cumulative Updates (CU) for SQL Server 2005 are included in SP3. More information is available in MSDN Knowledge Base article #959195: Cumulative update package 1 for SQL Server 2005 Service Pack 3.
The feature pack for SQL Server 2005 has also been updated:
The local install version of SQL Server 2005 Books Online has also been updated on December 16th 2008. The files are available for download:
ML
An update to SQL Server 2008 Books Online has just been published yesterday:
ML
The advanced edition of SQL Server 2008 Express is now available for download. As was the case with version 2005 when this edition was first introduced, the Advanced Services include (in addition to the Database Engine, of course):
You can find the installation files and instructions right here:
You can find more information at the SQL Server 2008 Express main page:
ML
I'm sure you know by now that SQL Server 2008 has RTMed about two days ago and is already available for download to MSDN/TechNet subscribers.
There are additional free downloads that you might also be interested in:
As always, sample databases are available at CodePlex:
Before attempting to install SQL 2008, though, you really should read the release notes which might make it easier for you to avoid any unpleasant surprises. Knowing is half the battle.*
ML
* G.I. Joe.
At this year's Microsoft NT Conference in Portorož fellow MVP Andrej Tozon and I have presented a workshop on designing solutions with Visual Studio 2008 and SQL Server 2008 using the Windows Presentation Foundation and a few new SQL Server 2008 features. Andrej has blogged about this last month and has made all the workshop files available on his site.
You might also already know that SQL Server RC0 has been released on June 10th, which has been announced in the SQL Server documentation team blog – amongst several other places. Of course, a new version of Books Online is also available.
There have been a few changes in RC0 regarding the FILESTREAM
feature that I feel I should mention here – given the fact that as a consequence I'll also have to amend the workshop files. However, I think I'd better delay any changes until RTM has been released. Who knows what else I'll have to change... ;)
Long story short, the changes refer to the way FILESTREAM
is enabled, and the new procedure is described in the current version of Books Online:
In addition, there's currently one more problem regarding the use of SQL Server Configuration Manager when enabling FILESTREAM
. It's been reported in the SQL Server Storage Engine blog:
Well, that's about it. For now. But be sure to go through all the Books Online articles regarding FILESTREAM
before attempting to use it in RC0 (in case I missed something).
ML
This may be old news, but it's also the kind of "old news" that's actually becoming more and more relevant these days. ;)
More than a month ago the Microsoft SQL Server documentation team have announced a new Live Search Macro:
Those of you who've found the 'old' one useful, must have been anticipating the 'new' one.
Personally, sometimes I really don't know what I'd do without them – providing a friend in need with a 'live' link to a specific Books Online article has never been easier. :)
ML
Previously we have looked at the supported XQuery/XPath retrieval functions implemented in SQL Server 2005 with a few examples of use, and we have mentioned the fact that some functions are restricted to use in XPath predicates. Today we will see which XQuery/XPath functions are subject to this restriction.
Any attempt at using these restricted functions to retrieve data from the XML would result in the following exception:
Msg 2371, Level 16, State 1, Line 302 XQuery [query()]: '%function%' can only be used within a predicate or XPath selector
First of all, you'll need the XML sample (right-click + Save target) and the XML Schema that we've used in the previous post. Review each script and execute against your favorite testing database.
Of course, the XML Schema script must be executed first.
The purpose of predicates is to restrict the result set returned by the XQuery or the XPath expression. Think of the XPath predicate as being to XQuery as the WHERE clause is to the T-SQL query.
The XPath predicate is an optional part of the XPath expression. It must follow the XPath reference to an XML node and must be enclosed in square brackets. For instance, if we wanted to retrieve the values of the Price
element of all the books in our XML sample, we would use the following XPath expression:
/Books/Book/Price
However, to retrieve only the price for the third book in the sample we would also need the XPath predicate:
/Books/Book[3]/Price
...which is short for:
/Books/Book[position() = 3]/Price
Which, incidentally, brings us to our first predicate function.
NB: *any* supported XQuery/XPath function can be used inside an XPath predicate in SQL Server 2005 XML queries.
position()
To retrieve the title of the third book in our XML we'd use either of these queries:
select Books.Book.query( 'Title' ).value( '.' ,'varchar(80)' ) as TitleOfThirdBook from @Books.nodes('/Books/Book[position() = 3]') Books (Book) select Books.Book.query( 'Title' ).value( '.' ,'varchar(80)' ) as TitleOfThirdBook from @Books.nodes( '/Books/Book[3]' ) Books (Book)
Of course it would also be quite usefull to actually retrieve the position of a given node, wouldn't it? In SQL Server 2005 there's little we can do with the position()
function directly, but by using a table of numbers we can circumvent this limitation:
declare @NumbersOnTheFly table ( Number int primary key ) -- Use the spt_values table in the master database to create a table of numbers insert @NumbersOnTheFly ( Number ) select Number = row_number() over ( order by [name] ) from master.dbo.spt_values select NumbersOnTheFly.Number as Position ,Books.Book.query( 'Book[position() = sql:column("NumbersOnTheFly.Number")]/Title' ).value( '.' ,'varchar(80)' ) as Title from @Books.nodes('/Books') Books (Book) cross join @NumbersOnTheFly NumbersOnTheFly where (NumbersOnTheFly.Number <= Books.Book.query( 'count(Book)' ).value( '.' ,'int' ))
Using the sql:column
function (look it up in Books Online) we can correlate the results of the XQuery with the results of the T-SQL query. The initial correlation is achieved by issuing another XQuery in the WHERE
clause (i.e. using count(Book)
to get the total number of Book
elements), thus restricting the rows of the numbers table to the actual number of nodes targeted by the main XQuery expression passed to the nodes
XML method in the FROM
clause, followed by the "data-driven" query
XML method in the SELECT
statement.
If the position()
XQuery/XPath function was not restricted to predicate use, we could have used something much simpler:
select Books.Book.query( 'position()' ).value( '.' ,'int' ) as Position ,Books.Book.query( 'Title' ).value( '.' ,'varchar(80)' ) as Title from @Books.nodes( '/Books/Book' ) Books (Book)
...not in SQL Server 2005, though.
Take a look at the execution plan for the workaround, and imagine the execution plan if the workaround were not needed. ;)
last()
The last()
XQuery/XPath function is closely connected with the position()
XQuery/XPath function; it returns the index of the last node in the node set returned by a given XPath expression. To retrieve the title of the last book in the XML we can use either of these queries:
select Books.Book.query( 'Title').value( '.' ,'varchar(80)' ) as TitleOfLastBook from @Books.nodes( '/Books/Book[last()]' ) Books (Book) select Books.Book.query( 'Title' ).value( '.' ,'varchar(80)' ) as TitleOfLastBook from @Books.nodes( '/Books/Book[position() = last()]' ) Books (Book)
Why is there no first()
XQuery/XPath function? Think about it: the first index can always be known in advance (= 1). :) On the other hand, knowing the last index would require counting the nodes (e.g. using the count()
function) and additional processing that can be avoided thanks to the last()
function.
So, here they are – the two XQuery/XPath functions the use of which is restricted to XPath predicates. Both of them are essentially in conflict with one of the principal concepts of the set theory, namely that sets are not ordered. In XML, on the other hand, order is preserved (though not guaranteed), so functions like these make sense.
And we've also seen one more use for the table of numbers (for the time being).
Anyway, there are still a few XQuery/XPath functions in SQL Server 2005 that we haven't discussed yet – so stay tuned.
ML
Even after all this time (since the release of SQL Server 2005 which introduced the XML data type) I still notice users having trouble with XQuery/XPath functions when processing (or rather trying to process) XML data in SQL Server, that's why I've decided to dedicate a few more posts to the subject. The ulterior motive here is me learning and testing what (I think) I know. ;)
I've introduced the SQL Server 2005 implementation of XQuery/XPath functions in an earlier post.
In this post we'll be looking at "pure" retrieval functions – in contrast to predicate-only functions. The difference between the former and the latter comes from the limitations emerging from the way the XQuery/XPath standard(s) have been implemented in SQL Server 2005. As most users have already found out (the hard way) the use of some functions is limited to XPath predicates. This time we'll be focusing on those that do not suffer from this limitation. You can read more about the supported XQuery/XPath functions in Books Online. Also note that all retrieval functions can also be used inside XPath predicates.
Have you ever encountered the following exception when using XQuery/XPath functions in SQL Server 2005?
XQuery [query()]: The XQuery syntax '/function()' is not supported.
Take a good look at the examples below – you'll notice that no XPath expression uses the XPath/function()
syntax. Although this is valid syntax for XML queries according to the W3C XQuery/XPath Recommendations it is not currently allowed in SQL Server unless used inside XPath predicates (e.g. this is allowed: XPath[function() = 'value']
). But as mentioned before – we'll discuss predicates later.
Almost. We need an XML sample (right-click + Save target) (built from the pubs
sample database), and – since I assume most of us have left kindergarten a long time ago – an XML Schema that will provide us with what every data type needs: domain integrity.
Remember: untyped XML is synonymous with unnecessary problems.
Oh, and make sure the XML Schema Collection has been created in your favorite testing database *before* attempting to do the rest of this exercise.
number()
, ceiling()
, floor()
, round()
These four functions provide the basic functionality related to numerical data. If an XPath expression is passed as the sole argument to one of these functions then this XPath expression must evaluate to a singleton.
select Books.BooksCollection.query( 'Price' ).value( 'number(.)' ,'money' ) as Price ,Books.BooksCollection.query( 'Price' ).value( 'ceiling(.)' ,'money' ) as CeilingPrice ,Books.BooksCollection.query( 'Price' ).value( 'floor(.)' ,'money' ) as FloorPrice ,Books.BooksCollection.query( 'Price' ).value( 'round(.)' ,'money' ) as RoundPrice from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
A word of caution: absent and/or nilled nodes return an empty sequence that cannot be implicitly converted to numerical data types – hence the use of the number()
function for the Price
column. Try removing the function and observe the consequences.
The number()
function converts the data returned by the XPath expression to a value of type xs:double
(equivalent to float
in SQL Server), and in contrast with the data()
function (discussed further down) properly returns null
to the caller when the XPath expression evaluates to an empty sequence.
Contrary to the W3C XPath Recommendation aggregate XPath functions implemented in SQL Server 2005 do not accept multiple arguments. Only a constant or an XPath expression can thus be passed as the sole argument to any of the aggregate functions, and the XPath expression must evaluate to a singleton or a node set.
sum()
, avg()
, count()
select Books.BooksCollection.query( 'sum(Book/Price)' ).value( '.' ,'money' ) as PriceTotal ,Books.BooksCollection.query( 'avg(Book/Price)' ).value( '.' ,'money' ) as AveragePrice ,Books.BooksCollection.query( 'count(Book/Price)' ).value( '.' ,'int' ) as BookCount from @Books.nodes( '/Books' ) Books (BooksCollection)
Consider the XPath expression used for the BookCount
column. Does it return the actual number of books in the XML sample?
max()
select Books.BooksCollection.query( 'max(Book/Royalty)' ).value( '.' ,'int' ) as LargestRoyalty from @Books.nodes('/Books') Books (BooksCollection)
min()
select Books.BooksCollection.query( 'min(Book/YtdSales)' ).value( '.' ,'int' ) as SmallestYtdSales from @Books.nodes( '/Books' ) Books (BooksCollection)
What can I say about these five functions? Their names pretty much speak for themselves, don't they?
string()
select Books.BooksCollection.query( 'string(Price)' ).value( '.' ,'nvarchar(20)' ) as LocalName from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
Be careful when using the string()
function: when the XPath expression passed as the argument evaluates to a node set, the data from *all* elements of the node set is concatenated into a single string.
string-length()
select Books.BooksCollection.query( 'Title' ).value( '.' ,'nvarchar(80)' ) as Title ,Books.BooksCollection.query( 'string-length(Title)' ).value( '.' ,'int' ) as TitleLength from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
substring()
select Books.BooksCollection.query( 'substring(Title, 4, 12)' ).value( '.' ,'nvarchar(12)' ) as TitleSubstring from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
concat()
The concat()
function is used to concatenate values into a single string. Note that at least two arguments are mandatory, otherwise an exception is raised. Empty sequences (e.g. returned by XPath expressions when no node is matched) are treated as zero-length strings.
select Books.BooksCollection.query( 'concat(Title, " (", @TitleId, ")")' ).value( '.' ,'nvarchar(max)' ) as TitleAndTitleId from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
contains()
I would expect the contains()
function to be restricted to predicate use, but apparently it can also be used to retrieve data. Of course, it either returns true
if the string passed as the second argument is found inside the string passed as the first argument, or false
if it isn't.
select Books.BooksCollection.query( 'Title' ).value( '.' ,'nvarchar(80)' ) as Title ,Books.BooksCollection.query( 'contains(Title, "Computer")' ).value( '.' ,'nvarchar(max)' ) as Contains_Computer_InTitle from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
Warning: if the string passed as the second argument is longer than 4000 characters, contains()
returns an empty sequence. This is yet another limitation of the SQL Server 2005 XQuery/XPath implementation.
SQL Server 2008 introduces two additional XPath functions that are not supported by SQL Server 2005.
lower-case()
and upper-case()
These two functions can make it just a little easier to circumvent live with the fact that XML is case-sensitive.
select Books.Book.query ( 'lower-case(Title)' ).value ( '.' ,'nvarchar(80)' ) as LowerCaseTitle ,Books.Book.query ( 'upper-case(Title)' ).value ( '.' ,'nvarchar(80)' ) as UpperCaseTitle from @books.nodes (' /Books/Book ') Books (Book)
data()
We've mentioned the data()
function before. It returns the data from the node(s) specified in the XPath expression in the appropriate type (if the node is typed). For details see the Remarks section in this Books Online article.
select Books.BooksCollection.query( 'data(@TitleId)' ).value( '.' ,'varchar(6)' ) as TitleId ,Books.BooksCollection.query( 'data(@PubDate)' ).value( '.' ,'datetime' ) as PubDate from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
I've also mentioned the data()
function when discussing the retrieval of XML attributes.
empty()
One might think this function would also be restricted to predicates (testing whether a node is empty or not), but how would we then make a node empty when modifying XML instances?
select Books.BooksCollection.query( 'empty(Price)' ).value( '.' ,'nvarchar(max)' ) as EmptyPrice from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
Books Online also suggests using the empty()
function together with not()
to compensate for the lack of the exists
XPath function. I'll give an example when we discuss predicates.
true()
, false()
Once again, not only are we allowed to test the state of a bolean type node in XML instances, but we can also modify its value using these two functions.
Both are mentioned here only as a reference.
not()
Essential to logical operations, wouldn't you agree?
select Books.BooksCollection.query( 'not(Price)' ).value( '.' ,'nvarchar(max)' ) as NotPrice from @Books.nodes( '/Books/Book' ) Books (BooksCollection)
The query above returns true
for books with no Price
element.
local-name()
Now this is a function that causes many users migraines. But there's nothing actually wrong with it – provided the syntax limitations (mentioned earlier) are taken under consideration.
select Books.BooksCollection.query( 'local-name(.)' ).value( '.' ,'nvarchar(max)' ) as LocalName from @Books.nodes( '/Books/Book[1]/*' ) Books (BooksCollection)
Look closely at the XQuery expression passed to the nodes
function and the ones passed to the query
and value
functions. This is how the node name(s) can be retrieved in SQL Server 2005.
distinct-values()
This can be a very useful function, as demonstrated in another post.
select Books.BooksCollection.query( 'distinct-values(Book/PubId)' ).value( '.' ,'nvarchar(max)' ) as PubId from @Books.nodes( '/Books' ) Books (BooksCollection)
Well, there are five more XQuery/XPath retrieval functions implemented in SQL Server 2005, but they deserve special attention, and we'll discuss them later. For instance, the predicate-only functions.
So, here they are, (most of) the XQuery/XPath functions implemented in SQL Server 2005 with examples of use.
Ergo:
XPath/function()
syntax is not currently supported in SQL Server 2005;ML
p.s. Regarding the BookCount
column in the query demonstrating aggregate functions: count(Book/Price)
only counts Book
elements where the Price
element is present, and does not return the number of *all* Book
elements in the XML sample.
While browsing this blog's usage logs recently, I've noticed quite a few referrals from various search engines of users trying to troubleshoot problems they may have encountered when accessing XML attributes in a SQL Server XQuery.
Although the problem has been discussed in the past, and has even been documented in Books Online, some users out there still aren't familiar with a SQL Server 2005 limitation regarding the querying of XML attributes – and the error message itself isn't of much help either. That's why I've decided to point this problem out in my blog as well to help improve the visibility of this issue.
For example, when querying this XML:
declare @books xml set @books = N'<xml version="1.0"> <Books xsi:noNamespaceSchemaLocation="PubsTitles.xsd"> <Book TitleId="BU1032"PubDate="1991-06-12T00:00:00.000Z"> <Title>The Busy Executive''s Database Guide</Title> <Type>business</Type> <PubId>1389</PubId> <Price>19.9900</Price> <Royalty>10</Royalty> <YtdSales>4095</YtdSales> <Notes>An overview of available database systems with emphasis on common business applications. Illustrated.</Notes> </Book> <Book TitleId="BU1111"PubDate="1991-06-09T00:00:00.000Z"> <Title>Cooking with Computers: Surreptitious Balance Sheets</Title> <Type>business</Type> <PubId>1389</PubId> <Price>11.9500</Price> <Royalty>10</Royalty> <YtdSales>3876</YtdSales> <Notes>Helpful hints on how to use your electronic resources to the best advantage.</Notes> </Book> </Books>'
...one would instinctively formulate the query like this:
select Books.Book.query('Title').value('.', 'varchar(max)') as Title ,Books.Book.query('@TitleId').value('.', 'varchar(6)') as TitleId ,Books.Book.query('@PubDate').value('.', 'datetime') as TitleId from @Books.nodes('/Books/Book') Books (Book)
However, referencing XML attributes this way is not allowed in SQL Server 2005, and this is the error message we would have encountered if we were to follow our instincts:
Msg 2396, Level 16, State 1, Line 26 XQuery [query()]: Attribute may not appear outside of an element
Not quite as eloquent as one would expect, wouldn't you agree?
By using the data()
XQuery function like so:
select Books.Book.query('Title').value('.', 'varchar(80)') as Title ,Books.Book.query('data(@TitleId)').value('.', 'varchar(6)') as TitleId ,Books.Book.query('data(@PubDate)').value('.', 'datetime') as TitleId from @Books.nodes('/Books/Book') Books (Book)
...the query succeeds as expected:
Title TitleId TitleId ---------------------------------------------------- ------- ----------------------- The Busy Executive's Database Guide BU1032 1991-06-12 00:00:00.000 Cooking with Computers: Surreptitious Balance Sheets BU1111 1991-06-09 00:00:00.000
This is true both for typed as well as untyped XML instances, and the function is, of course, documented in Books Online.
ML
p.s. I've decided on posting this example while preparing a more detailed overview of XPath/XQuery functions in SQL Server, which should follow soon.
p.p.s. The overview is divided into two parts:
Counting – what's it all about? I'm sure you've heard about a trick that one should use when aggregating data using the COUNT
system function in Microsoft SQL Server; specifically, how the performance of the function can be improved using a neat trick. But can we go beyond neat...?
The tests have been done on a SQL Server 2005 (SP2) instance; you're all kindly invited to also run them on an instance/version of your own choice.
Let's start with some test data. I've provided a script that will do that for you, provided you have a testing database – which I'm sure you do (yes, tempdb is just fine).
The script creates a five column table and copies data into it from the AdventureWorks database, which I assume is available on the same server instance. If any of the above is not true in your particular case, you can still adapt the script as needed.
We end up with a brand new table containing 19,972 rows of data residing in 5,197 data pages.
We'll be counting rows (what a shock):
select count(*) from dbo.Contactand
select count(*) from dbo.Contact where (dbo.Contact.FirstName = N'Karl')
If we execute the two queries above on our heap (remember, this new table has no keys and no indexes), we get these results:
Query | Performance (on a heap) |
Result (row count) |
|
---|---|---|---|
CPU | Reads | ||
#1 | 76 | 5203 | 19972 |
#1 | 60 | 5203 | 19972 |
#1 | 50 | 5203 | 19972 |
#1 | 90 | 5203 | 19972 |
#1 | 70 | 5203 | 19972 |
#1 | 110 | 5203 | 19972 |
#2 | 62 | 5203 | 22 |
#2 | 50 | 5203 | 22 |
#2 | 50 | 5203 | 22 |
#2 | 81 | 5203 | 22 |
#2 | 60 | 5203 | 22 |
#2 | 70 | 5203 | 22 |
Also take a look at the execution plans (Query #1 execution plan, Query #2 execution plan). Nothing too impressive, but understandable: without any indexes, the optimizer needs to access all the pages in order to get the actual row count. With Query #1 the optimizer has no options but to do a full table scan, and the restriction in Query #2 can't really improve the performance as it requires data to be tested in order to restrict the result set.
Now, let's add the primary key. We'll use the ContactID
column and also make the primary key clustered (which is the default anyway, but I prefer explicit declarations):
alter table dbo.Contact add constraint pk_Contact primary key clustered ( ContactID )
Query | Performance (clustered primary key) |
Result (row count) |
|
---|---|---|---|
CPU | Reads | ||
#1 | 34 | 5208 | 19972 |
#1 | 40 | 5208 | 19972 |
#1 | 70 | 5208 | 19972 |
#1 | 10 | 5208 | 19972 |
#1 | 20 | 5208 | 19972 |
#1 | 30 | 5208 | 19972 |
#2 | 68 | 5208 | 22 |
#2 | 60 | 5208 | 22 |
#2 | 71 | 5208 | 22 |
#2 | 60 | 5208 | 22 |
#2 | 90 | 5208 | 22 |
#2 | 60 | 5208 | 22 |
Ha! The number of reads has actually increased. Note that the clustered primary key actually takes up as many pages as the data itself. On the other hand, for Query #1 the stress on the CPU has decreased, while for Query #2 it seems that the clustered primary key can't really do much – data must be read and tested, and the referenced column is not part of the primary key.
Do you think making it part of the clustered primary key might make a difference? Do you think it would make a favorable difference? I've decided on not including the latter in this test as it affects the data model, and modifying the data model (especially, changing a primary key) just to improve performance is IMO far from being good practice. If you're curious enough you can do it yourself.1
Anyway, as expected (and quite imminent as well), the execution plans are different now (Query #1 execution plan, Query #2 execution plan) compared to before.
Let's see what happens if we add an index. Let's index the column used in the restriction of Query #2 – the FirstName
column:
create nonclustered index x_Contact_FirstName on dbo.Contact ( FirstName )
Query | Performance (cl. PK + noncl. FirstName) |
Result (row count) |
|
---|---|---|---|
CPU | Reads | ||
#1 | 4 | 59 | 19972 |
#1 | 10 | 59 | 19972 |
#1 | 0 | 59 | 19972 |
#1 | 10 | 59 | 19972 |
#1 | 0 | 59 | 19972 |
#1 | 0 | 59 | 19972 |
#2 | 0 | 2 | 22 |
#2 | 0 | 2 | 22 |
#2 | 0 | 2 | 22 |
#2 | 0 | 2 | 22 |
#2 | 0 | 2 | 22 |
#2 | 0 | 2 | 22 |
Well, well, who'd have thought...? The optimizer once again stands up to its name. Let's examine the execution plans (Query #1 execution plan, Query #2 execution plan). No longer is the use of the clustered primary key the optimal choice – the nonclustered index requires far less reading. After all, it resides on only 57 pages, and the optimizer takes full advantage of this fact – on both occasions.
..others fear the asterisk like the plague. However, if we consider the fact that
select count(*)
simply is not in any way functionally equivalent to
select *
then this fear loses most of its substance. I will not go into any details this time of when and why not declaring columns in the SELECT
clause is poor practice.
For comparison let's execute these two queries:
select * from dbo.Contactand
select * from dbo.Contact where (dbo.Contact.FirstName = N'Karl')
The execution plans:
If we now compare the execution plans of these last two queries with the execution plans above, we can observe one significant difference. When viewing the execution plan in SSMS park the mouse over the arrows leading from the very first operator (Table Scan or Clustered/Nonclustered Index Scan/Seek). What can we observe?
Simply counting rows does not require any data to be read from the data pages (except when values need to be tested in order to apply query restrictions).
A proposed method of speeding up row counts that has been suggested by some members of the community from time to time, is to replace the asterisk as the argument of the COUNT
function with a constant. This – believe it or not – is supposed to lower the I/O for the query. Frankly, I've found that suggestion somewhat amusing if not illogical, but was curious enough to put it to a test.
The two queries above using the "trick" look like this:
select count(1) from dbo.Contactand
select count(1) from dbo.Contact where (dbo.Contact.FirstName = N'Karl')
Query | Performance (on a heap) |
Result (row count) |
|
---|---|---|---|
CPU | Reads | ||
#6 | 68 | 5203 | 19972 |
#6 | 81 | 5203 | 19972 |
#6 | 70 | 5203 | 19972 |
#6 | 70 | 5203 | 19972 |
#6 | 50 | 5203 | 19972 |
#6 | 70 | 5203 | 19972 |
#7 | 76 | 5203 | 22 |
#7 | 60 | 5203 | 22 |
#7 | 60 | 5203 | 22 |
#7 | 80 | 5203 | 22 |
#7 | 90 | 5203 | 22 |
#7 | 90 | 5203 | 22 |
An equal number of reads, obviously, and a slight difference between CPU times compared to previous results for the heap.
Query | Performance (clustered primary key) |
Result (row count) |
|
---|---|---|---|
CPU | Reads | ||
#6 | 64 | 5208 | 19972 |
#6 | 60 | 5208 | 19972 |
#6 | 70 | 5208 | 19972 |
#6 | 70 | 5208 | 19972 |
#6 | 80 | 5208 | 19972 |
#6 | 40 | 5208 | 19972 |
#7 | 84 | 5208 | 22 |
#7 | 90 | 5208 | 22 |
#7 | 70 | 5208 | 22 |
#7 | 80 | 5208 | 22 |
#7 | 90 | 5208 | 22 |
#7 | 90 | 5208 | 22 |
Again, an equal number of reads, but this time it seems more processing is required when using the clustered primary key. Is the optimizer translating the constant?
Query | Performance (cl. PK + noncl. FirstName) |
Result (row count) |
|
---|---|---|---|
CPU | Reads | ||
#6 | 2 | 59 | 19972 |
#6 | 10 | 59 | 19972 |
#6 | 0 | 59 | 19972 |
#6 | 0 | 59 | 19972 |
#6 | 0 | 59 | 19972 |
#6 | 0 | 59 | 19972 |
#7 | 0 | 2 | 22 |
#7 | 0 | 2 | 22 |
#7 | 0 | 2 | 22 |
#7 | 0 | 2 | 22 |
#7 | 0 | 2 | 22 |
#7 | 0 | 2 | 22 |
Once more, an equal number of reads, but now it seems less processing is required when the nonclustered index is used.
The results are pretty inconclusive if you ask me, and the execution plans are the same as before:
One thing is certain, though, select count(1)
does not perform any better than select count(*)
. And I've tried using other constants as well – no change. Is that another myth busted? Can anyone confirm this or prove otherwise?
As for me, I'm staying with Asterix. ;) And next time we'll look at another important fact regarding aggregate functions.
ML
*1 FYI: Destroying the primary key in this particular case results in an increased number of reads (5218).