Friday, September 01, 2006

Optimistic concurrency (with essential corrections)

Optimistic concurrency rests on somewhat wobbly foundations – the presumption that no user will ever be blocked by another user. Which is usually true for single-user solutions while multi-user solutions require a slight correction to this logic. Put another way: with optimistic concurrency we assume that a particular row will only be modified by one user at any given time, and being fully optimistic we do nothing to prevent different users from trying to modify the same row.

(I must admit that the subject of this post has been discussed pretty much everywhere up to the point of almost being disgusting, but I've had this sample laying around for a while and now I've decided on moving it somewhere else rather than permanently removing it from my disk.)

Of course not doing anything to prevent what seems quite probable is not optimism – it's stupidity. On the other hand doing too much (like introducing a sophisticated row-locking mechanism) is also hardly a good idea, especially since elementary cuncurrency management mechanisms already are part of the platform. Basically, what the client application really needs to know before attempting a data modification is:

  • Whether a row already exists (when inserting data); and
  • Whether the data has been modified between the time the local set of data has been retrieved and the time when an attempt is made to commit the changes (when updating and/or deleting data).

Inserting duplicate data is prevented through the implementation of the natural key – a combination of one or more attributes that uniquely identify each entity. If the client application also implements the natural key, then an insert will not even be attempted.

The update is where pure optimism can lead to problems. And the same goes for delete operations. I believe an illustration might help here:

Jim and Bob both work for a car manufacturer. Their job is to manage the bill of materials – keeping a current inventory list of all parts for a particular car model this manufacturer produces.

Today their task is simple: the car's breaking system has been changed slightly by the engineers and as a consequence the bolt with the ID number of B12/06 has been replaced, so now the inventory needs to be modified to reflect the change.

Jim and Bob work well together – they are a true team. However, they have a lot of work to do and sometimes don't have time to coordinate their activities perfectly, and pretty much the same is true for Tom, their boss. Which is not critical as long as they are able to identify any flaws in the organisation of their work and correct them accordingly. (The fact of the matter is that most human activities in general do not have to run perfectly smoothly all the time for the job to get done correctly and in time. Long story short – look up these words in your favorite dictionary: "politics", "strategy" and "tactics".)

Back to our example. Bob has been in the company longer than Jim and he's got a bit more experience, further more – his cousin Hal is an engineer. Bob decides on going straight to the source to find out exactly what changes have been made to the breaking system so that he can modify the inventory list appropriately.

Roughly at the same time Jim passes Tom in the hallway and Tom reminds him of the changes to the car's breaking system and Jim decides to go straight up to the chief engineer's office to get the details. He knows how important breaks are for the safety of the car and – more importantly – the passengers.

Anyway, both Jim and Bob get the information they need, and without missing a beat proceed to fill in the changes. Jim goes back to his office, while Bob simply takes out his PDA and logs on. Jim had the chance to look over the blue prints and he knows the new bolt is a 5x20, Bob talked to Hal and Hal showed him the new bolts so Bob knows the new bolt is a 5x25.

At this very moment both Jim and Bob see the details of the old breaking system – the old bolt is a 3x20, obviously completely wrong for the new breaks, and the last time this value has been changed corresponds to the time before the breaking system has been changed.

Now, in a true contemporary-Hollywood-sci-fi manner time slows down to a halt. This gives us time for a few questions:

  • Should the longer bolt be used or the shorter one?
  • Should Jim wait for Bob?
  • Should Bob go back to his office and talk to Jim?
  • Should Tom be more strict and possibly supervise his subordinates constantly?
  • Should Hal be forbidden to talk to the members of the inventory department?
  • Should the chief engineer never leave his office?

All valid questions. And all completely irrelevant.

The real question is:

Who should be the one to notice the conflicting values – Jim, Bob, Tom, Hal or the owner of a brand new car driving down a steep, slippery hill desperately trying to keep the car on the road?

With optimistic concurrency both Jim and Bob are free to modify the data. The platform itself will keep one of the conflicting transactions on hold (locked out) until the ongoing transaction is completed (either committed or rolled back) if by chance both Jim and Bob attempt to modify the row at the same time. Either way – only one value will eventually exist in the database, and both Jim and Bob will be confident in thinking the correct value was saved and the inventory list is in perfect order. That is, if we blindly follow the wobbly principle of optimistic concurrency mentioned above.

Considering that time stands still we even have time for a rhetorical question: could such obvious conflicts be prevented by an application? Maybe even a philosophical one: can computers think?

Let's focus on the pure mechanics of the situation:

  1. Jim wants to replace the value of 3x20 with the new value of 5x20.
  2. Bob wants to replace the value of 3x20 with the new value of 5x25.

This means that whoever of them commits the modification first will cause the other to be unable to modify the value in accordance with the logical rule above – changing 5x20 to 5x25 or 5x25 to 5x20 was never intended and should not occur. Can it be prevented?

Jim and Bob are intelligent beings: they've both identified the value of 3x20 to be the old one that needs to be replaced; even if all they've actually done is look at the data available in the inventory. They both have the *before* picture, and can imagine the correct outcome of the modification (based on the information they have). The trouble is that – since the server cannot consume both modifications at the same time – one of the guys inevitably ends up looking at the wrong *before* picture: the first successfully committed modification will have changed it. In addition – neither Jim nor Bob are even aware of the other's attempt to modify data. Should they be? Not with optimistic concurrency – attempts are irrelevant.

One possible solution springs to mind – the application should detect the change! But could it? It could detect the change after it has been committed and only after an appropriate mechanism has subsequently notified the application of this change (e.g. by utilizing the SqlDependency Class introduced in .Net 2.0), the attempt itself could not be detected. This, of course, may just as well be too late, because the conflict of two (or more) users does not actually occur at the application tier – it occurs deeper down in this architectural wonder we call a multi-tiered system. Can the discrepancy be detected at the data layer? Well, yes, it can. With a little help from an internal attribute that designates a point in time (I'm using the term "time" here as a reference to an imminent natural occurrence we can all observe but not influence).

A very popular method is to use a timestamp column.

Even after all these years there still seems to be quite a lot of confusion among users regarding the timestamp data type and its implementation in SQL Server.

First of all, the timestamp data type in SQL Server has nothing to do with storing date or time (contrary to the SQL Standard specification of timestamp):

timestamp <> datetime

Second, the behaviour of timestamp is not guaranteed in future releases and using the rowversion data type synonym in data definition language (DDL) statements is suggested. Personally, I even prefer the name – it suits the actual functionality.

How does rowversion work and how does it apply to our example?

The value of the rowversion column changes automatically every time one or more columns are updated. Of course the user does not have to be aware of this column, it's best never to propagate the rowversion values to the user interface at all – they mean nothing to the user. Keeping track of the user who last changed the row and of the time the row was last changed is usually added to provide more user-friendly information on data changes. In the data layer these two attributes could either be managed via a trigger or in the data modification procedures. They serve the purpose of making data modifications more transparent to the user.

In our example the implementation of the rowversion column could be used to prevent concurrency issues and to warn the user of the fact that the value has been changed in the time between the moment of data retrieval and the moment of the attempted modification.

Let's now in true sci-fi fashion unpause the universe and let Bob change the dimensions of bolt number B12/06. Just as he clicks Save on his PDA and successfully commits the transaction, back in the office Jim decides to do the same. But to his surprise the server rejects Jim's update and the application responds with a "concurrency violation" warning. He decides on refreshing the data and immediately notices that the value has been changed by Bob quite recently. He also notices a discrepancy between the new value and the value he'd expected to see, which – if nothing else – serves as a pretty solid foundation for what we humans do best – solve problems.

How was this achieved? Well, let's repeat the two transactions.

After creating our playground using this script open a new connection in your favourite SQL querying tool.

/* Connection 1

Bob's task:

Change B12/06 from 3x20 to 5x25

*/

-- Let's get the current data for Bob:

select dbo.Bolts.BoltNumber
 ,dbo.Bolts.BoltType
 ,dbo.Bolts.LastChangedBy
 ,dbo.Bolts.LastChangedOn
 ,dbo.Bolts.RV
 from dbo.Bolts
 where (dbo.Bolts.BoltNumber = 'B12/06')

-- Copy the RV value. We'll need it later.

In a new connection run the following:

/* Connection 2

Jim's task:

Change B12/06 from 3x20 to 5x20

*/

-- Let's get the current data for Jim:
-- Yes, Jim sees what Bob sees.

select dbo.Bolts.BoltNumber
 ,dbo.Bolts.BoltType
 ,dbo.Bolts.LastChangedBy
 ,dbo.Bolts.LastChangedOn
 ,dbo.Bolts.RV
 from dbo.Bolts
 where (dbo.Bolts.BoltNumber = 'B12/06')

-- You've remembered the RV value, right?
-- We don't need it yet. ;)

Return to connection 1 and let Bob update the row. Do not commit the transaction yet – we'll take the opportunity to check something later on.

/* Connection 1 */
declare @rv binary(8)

set @rv = --paste the RV value here

begin tran

update dbo.Bolts
 set dbo.Bolts.BoltType = '5x25'
  ,dbo.Bolts.LastChangedBy = 'Bob'
  ,dbo.Bolts.LastChangedOn = getdate()
 where (dbo.Bolts.BoltNumber = 'B12/06')
  and (dbo.Bolts.RV = @rv)

if (@@rowcount > 0)
 begin
  print 'No failure'
 end
else
 begin
  print 'Concurrency violation'
 end
go

As expected Bob successfully changes the column.

(1 row(s) affected)
No failure

Let's open a third connection and check a few things.

/* Connection 3*/

There's no point in running this query as it will run indefinitely, because Bob's transaction is still pending:

select *
 from dbo.Bolts

Of course in a real-life situation the transaction would have been committed by now.

Note that the query is not blocked in a SQL 2005 database with the READ_COMMITTED_SHAPSHOT setting turned ON. But that's another story.

The next query is popular with many users that are new to SQL, however, it's only ever useful for reporting purposes – when the report needs to be done quickly rather than reflect actual values:

select *
 from dbo.Bolts with(nolock)

The next query can be used to retrieve rows that are not currently locked by a transaction, which is useful for retrieval operations of individual (unlocked) rows while its use for data aggregation will most probably still yield useless results. But sometimes promptness supersedes accuracy:

select *
 from dbo.Bolts with(readpast)

All table hints are explained here.

Now this next query would certainly satisfy several newsgroup posters asking for a way to navigate the restless sea of locks. It offers a view into the logic behind locks, but that's pretty much all it's good for:

select NoLockBolts.*
 from dbo.Bolts NoLockBolts with(nolock)
 where (not exists (
   select *
    from dbo.Bolts ReadPastBolts with(readpast)
    where (NoLockBolts.BoltNumber = ReadPastBolts.BoltNumber)
   ))

The query returns a list of locked rows. Looks useful, but isn't really.

In SQL Server 2005 (if the database allows snapshot isolation) Jim could be allowed to access data in a snapshot – uninhibitted by Bob's transactions:

-- Set snapshot isolation level
set transaction isolation level snapshot

select *
 from dbo.Bolts

-- Set default isolation level
set transaction isolation level read committed

But that's not what this post is about. We'll have to discuss the implications of snapshot transaction isolation another time.

Let's allow Bob to end his transaction.

/* Connection 1 */
commit tran

If you like you can return to connection 3 and try the queries again.

Now let's give Jim at connection 2 a chance to do his job (or at least try).

/* Connection 2 */
declare @rv binary(8)

set @rv = --paste the RV value here (not the new one - the old one!)

begin tran

update dbo.Bolts
 set dbo.Bolts.BoltType = '5x20'
  ,dbo.Bolts.LastChangedBy = 'Jim'
  ,dbo.Bolts.LastChangedOn = getdate()
 where (dbo.Bolts.BoltNumber = 'B12/06')
  and (dbo.Bolts.RV = @rv)

if (@@rowcount > 0)
 begin
  print 'No failure'
 end
else
 begin
  print 'Concurrency violation'
  -- Now Jim knows that the data has been changed.
 end

What happens at the car manufacturer next would probably be quite interesting to see, but I'll leave that to your imagination. As far as the information management part is concerned by utilising the rowversion column we have not allowed optimism obscure data discrepancies and potentially lead to undesired consequences, but rather to guide data modifications by keeping a logical reference to the state a particular row of data was in prior to the modification attempt.

The rowversion value in this example helped us enforce the two rules mentioned above; based on the information available to both users before attempting modifications the logic was simple: the initial state of the row was fixed in time (logically, not physically) and even though there was a discrepancy between the expected results, both users had to have been treated equally in their potential to actually modify the value (which is the central principle of optimistic concurrency). The server had to follow a very elementary rule: "change A to B", and after A no longer existed the server had to prevent the violation of the rule. Neither the application nor the server "knew" the logical rule in detail (e.g. the initial column values were never compared at the server, only the rowversion value was used as a reference to a point in time in which the user's decision to change the value originated); the computer didn't think – the thinking was left to humans.

What was at work here?

Let's take another look at the principal feature of optimistic concurrency: any user can change any row, since we presume that only a single user at any given time will ever attempt to modify a row.

The above principle needs to be corrected if we consider the probability that the above presumption is violated. And we *should* consider it, especially in a multi-user environment. The correction is: any user can change any row as long as the timeline has not been broken. In other words: any user can modify a row of data if no other user has changed it between the time the first user has retrieved data and the time he attempted the modification.

Since this no longer constitutes *pure* optimism could we then call it *optimal* optimism from now on?

I've mentioned SqlDependency (a .Net 2.0 class of the System.Data.SqlClient namespace) as a way for the application to track changes to the data layer. This functionality may reduce the occurence of concurrency issues even further, but it cannot prevent two or more users attempting to update the same row of data at the same time – conflicts like these must be resolved at the data layer. Why? The situation itself is at its most critical stage at the very moment cuncurrent user actions collide, and this happens on the server, not in the application, so why would the application have to handle them? Nevertheless, if the application has the possibility to reduce the number of (potential) collisions then it should do so, by all means. Well, as long as the corrected principle of optimistic concurrency remains in effect.


ML

No comments: