In contrast to Unidirectional synchronisation (discussed in a previous post) also better described as source-to-target data propagation, the purpose of Full synchronisation is to consolidate data at both locations – following the meaning of the word 'synchronisation' more closely: resulting in current historical events (e.g. business transactions) at both locations.
From Greek: syn- (together [with]), chronos (time) – synchronos (at the same time, in the same period).
The rules sound fairly simple:
- Insert all rows from one location to the other if they don't already exist at the destination;
- Update the rows at one location with the values from the other location that were changed at a later time.
What about deleted rows? Deleting rows from one location that do not also exist at the other might sound right but would quite obviously be in conflict with the insert rule. Each synchronisation would simply result in rows being un-deleted. One quite simple alternative is to use a Deleted flag – e.g. a bit column for which the value 1 designates a 'deleted' row and the default value of 0 a 'regular' row. In addition the delete operation could be prevented altogether through the use of an INSTEAD OF DELETE trigger.
As far as the synchronisation is concerned the Deleted
flag allows the update operation to handle 'deleted' rows: rows flagged as deleted at one location are in turn flagged as deleted at the other.
These two tables represent the two locations:
create table dbo.RemoteSales
(
SaleId int identity (1, 1) not null
,DateChanged datetime not null
,ProductName varchar(128) not null
,CustomerName varchar(128) not null
,SaleDate datetime not null
,Quantity decimal(18,2) not null
,Deleted bit not null
)
go
alter table dbo.RemoteSales
add constraint pk_RemoteSales_SaleDate_ProductName_CustomerName
primary key nonclustered
(
SaleDate
,ProductName
,CustomerName
)
go
create unique clustered index ux_RemoteSales_SaleId
on dbo.RemoteSales
(
SaleId
)
go
alter table dbo.RemoteSales
add constraint df_RemoteSales_SaleDate
default (getdate())
for SaleDate
with values
go
alter table dbo.RemoteSales
add constraint df_RemoteSales_DateChanged
default (getdate())
for DateChanged
with values
go
alter table dbo.RemoteSales
add constraint df_RemoteSales_Deleted
default (0)
for Deleted
with values
go
create table dbo.LocalSales
(
SaleId int identity (1, 1) not null
,DateChanged datetime not null
,ProductName varchar(128) not null
,CustomerName varchar(128) not null
,SaleDate datetime not null
,Quantity decimal(18,2) not null
,Deleted bit not null
)
go
alter table dbo.LocalSales
add constraint pk_LocalSales_SaleDate_ProductName_CustomerName
primary key nonclustered
(
SaleDate
,ProductName
,CustomerName
)
go
create unique clustered index ux_LocalSales_SaleId
on dbo.LocalSales
(
SaleId
)
go
alter table dbo.LocalSales
add constraint df_LocalSales_SaleDate
default (getdate())
for SaleDate
with values
go
alter table dbo.LocalSales
add constraint df_LocalSales_DateChanged
default (getdate())
for DateChanged
with values
go
alter table dbo.LocalSales
add constraint df_LocalSales_Deleted
default (0)
for Deleted
with values
go
Notice the primary key, it's vital to the operation. It's a composite key, spanning three columns (SaleDate
, ProductName
and CustomerName
) that uniquely identify each sale transaction. Of course in a real-life situation the customer name and the product name would not be used here, but rather be represented by foreign keys referencing a table of customers and a table of products respectively.
Now the values:
-- Some initial transactions
insert dbo.RemoteSales
(
DateChanged
,ProductName
,CustomerName
,Quantity
)
select dateadd(w, -5, getdate()) as DateChanged
,'Granny Smith' as ProductName
,'Jim' as CustomerName
,27 as Quantity
union all
select dateadd(w, -4, getdate())
,'Granny Smith'
,'Bob'
,62
union all
select dateadd(w, -4, getdate())
,'Golden delicious'
,'Bob'
,12
go
insert dbo.LocalSales
(
DateChanged
,ProductName
,CustomerName
,Quantity
)
select dateadd(w, -7, getdate()) as DateChanged
,'Granny Smith' as ProductName
,'Sam' as CustomerName
,28 as Quantity
union all
select dateadd(w, -12, getdate())
,'Granny Smith'
,'Lana'
,6
union all
select dateadd(w, -14, getdate())
,'Golden delicious'
,'Rob'
,72
go
-- A couple of conflicting rows
insert dbo.LocalSales
(
DateChanged
,ProductName
,CustomerName
,SaleDate
,Quantity
)
select top 1
getdate() as DateChanged
,dbo.RemoteSales.ProductName as ProductName
,dbo.RemoteSales.CustomerName as CustomerName
,dbo.RemoteSales.SaleDate as SaleDate
,120 as Quantity
from dbo.RemoteSales
order by DateChanged
go
insert dbo.RemoteSales
(
DateChanged
,ProductName
,CustomerName
,SaleDate
,Quantity
,Deleted
)
select getdate() as DateChanged
,dbo.LocalSales.ProductName as ProductName
,dbo.LocalSales.CustomerName as CustomerName
,dbo.LocalSales.SaleDate as SaleDate
,dbo.LocalSales.Quantity as Quantity
,1
from dbo.LocalSales
where (dbo.LocalSales.CustomerName = 'Lana')
go
Before designing the synchronisation script let's take a closer look at the rules:
The insert rule is as simple as possible: all transactions must be stored at both locations.
The update would be difficult without the DateChanged
column. With it the job is also quite simple – 'the latest state prevails'. All latest changes to the rows must exist at both locations – including the Deleted
column values to accurately represent cancelled or annulled transactions.
By using a FULL OUTER JOIN query we can observe the differences between the remote and the local table:
select *
from dbo.RemoteSales
full join dbo.LocalSales
on (dbo.LocalSales.ProductName = dbo.RemoteSales.ProductName)
and (dbo.LocalSales.CustomerName = dbo.RemoteSales.CustomerName)
and (dbo.LocalSales.SaleDate = dbo.RemoteSales.SaleDate)
order by dbo.RemoteSales.DateChanged desc
Some rows exist at one location but not at the other while some rows have been changed at one location but the values not yet propagated to the other.
For the purposes of this rather simplified example we'll leave out any concurrency issues, and perhaps deal with them in another post.
As mentioned the insert part is simple:
- The propagation of values from the Local table to the Remote:
insert dbo.LocalSales
(
DateChanged
,ProductName
,CustomerName
,SaleDate
,Quantity
,Deleted
)
select dbo.RemoteSales.DateChanged as DateChanged
,dbo.RemoteSales.ProductName as ProductName
,dbo.RemoteSales.CustomerName as CustomerName
,dbo.RemoteSales.SaleDate as SaleDate
,dbo.RemoteSales.Quantity as Quantity
,dbo.RemoteSales.Deleted as Deleted
from dbo.RemoteSales
where (not exists(
select *
from dbo.LocalSales
where (dbo.LocalSales.ProductName = dbo.RemoteSales.ProductName)
and (dbo.LocalSales.CustomerName = dbo.RemoteSales.CustomerName)
and (dbo.LocalSales.SaleDate = dbo.RemoteSales.SaleDate)
))
- And the opposite direction:
insert dbo.RemoteSales
(
DateChanged
,ProductName
,CustomerName
,SaleDate
,Quantity
,Deleted
)
select dbo.LocalSales.DateChanged as DateChanged
,dbo.LocalSales.ProductName as ProductName
,dbo.LocalSales.CustomerName as CustomerName
,dbo.LocalSales.SaleDate as SaleDate
,dbo.LocalSales.Quantity as Quantity
,dbo.LocalSales.Deleted as Deleted
from dbo.LocalSales
where (not exists(
select *
from dbo.RemoteSales
where (dbo.RemoteSales.ProductName = dbo.LocalSales.ProductName)
and (dbo.RemoteSales.CustomerName = dbo.LocalSales.CustomerName)
and (dbo.RemoteSales.SaleDate = dbo.LocalSales.SaleDate)
))
By storing the time of data modifications in the DateChanged
column it's also really easy to know which rows need to be updated. Of course the default constraint will only work when inserting data as long as the value is not set by the insert statement, and the value would have to be set explicitly when updating data.
- The propagation of changes from the Local table to the Remote:
update dbo.LocalSales
set dbo.LocalSales.DateChanged = dbo.RemoteSales.DateChanged
,dbo.LocalSales.ProductName = dbo.RemoteSales.ProductName
,dbo.LocalSales.CustomerName = dbo.RemoteSales.CustomerName
,dbo.LocalSales.SaleDate = dbo.RemoteSales.SaleDate
,dbo.LocalSales.Quantity = dbo.RemoteSales.Quantity
,dbo.LocalSales.Deleted = dbo.RemoteSales.Deleted
from dbo.RemoteSales
inner join dbo.LocalSales
on (dbo.LocalSales.ProductName = dbo.RemoteSales.ProductName)
and (dbo.LocalSales.CustomerName = dbo.RemoteSales.CustomerName)
and (dbo.LocalSales.SaleDate = dbo.RemoteSales.SaleDate)
where (dbo.RemoteSales.DateChanged > dbo.LocalSales.DateChanged)
- And the opposite direction:
update dbo.RemoteSales
set dbo.RemoteSales.DateChanged = dbo.LocalSales.DateChanged
,dbo.RemoteSales.ProductName = dbo.LocalSales.ProductName
,dbo.RemoteSales.CustomerName = dbo.LocalSales.CustomerName
,dbo.RemoteSales.SaleDate = dbo.LocalSales.SaleDate
,dbo.RemoteSales.Quantity = dbo.LocalSales.Quantity
,dbo.RemoteSales.Deleted = dbo.LocalSales.Deleted
from dbo.LocalSales
inner join dbo.RemoteSales
on (dbo.RemoteSales.ProductName = dbo.LocalSales.ProductName)
and (dbo.RemoteSales.CustomerName = dbo.LocalSales.CustomerName)
and (dbo.RemoteSales.SaleDate = dbo.LocalSales.SaleDate)
where (dbo.LocalSales.DateChanged > dbo.RemoteSales.DateChanged)
If we now execute the FULL OUTER JOIN query mentioned above we can now see that the data has been synchronised appropriately between the two locations.
What are the typical use cases that could benefit from this example? As mentioned before most real-life scenarios can be quite easily handled by Replication – which, on the other hand, is not available in all editions of SQL Server. Wherever replication cannot be used to synchronise datasources other methods will have to be used, and once again a set-based solution is the simplest to design and will generally outperform any other method.
ML