Thursday 26 May 2011

Temporal table (101)

I'm always forgetting all of the nuances of creating a temporal table (my preferred way) in SQL, so I figured I'd put this post here.

We're going to take an existing table, and produce the same table in a temporal version. That is, we're going to produce a version that allows us to determine that state of the data at any time in the past.

There are some nuances here, which I'll hopefully address in later posts: we're only converting a single table. This is only tested on SQL Server 2008 or later. And the history isn't 100% - once a row has been created, then modified, then any enquiry for before the first modification will always return the initial row. I think this can be fixed, but I'll mull on it further

First, we present the table we want to produce a temporal version of:

create table dbo.T1 (
 ID int not null,
 Val1 varchar(10) not null,
 Val2 varchar(10) null,
 constraint PK_T1 PRIMARY KEY (ID)
)

Now, we introduce the equivalent Backing table for the temporal equivalent:

create table dbo._T2 (
 ID int not null,
 Val1 varchar(10) null,
 Val2 varchar(10) null,
 
 Deleted bit not null,
 ValidFrom datetime2 null,
 ValidTo datetime2 null,
 constraint UQ_T2_ValidFrom UNIQUE CLUSTERED (ID,ValidFrom),
 constraint UQ_T2_ValidTo UNIQUE (ID,ValidTo),
 constraint FK_T2_Prev FOREIGN KEY (ID,ValidFrom) references dbo._T2 (ID,ValidTo),
 constraint FK_T2_Next FOREIGN KEY (ID,ValidTo) references dbo._T2 (ID,ValidFrom),
 constraint CK_T2_NoTimeTravel CHECK (ValidFrom < ValidTo),
 constraint CK_T2_Val1_NotNull CHECK (
  Deleted = 1 or Val1 is not null),
 constraint CK_T2_Deletion CHECK (
  Deleted = 0 or Val1 is null)
)

There's a lot more going on here - we can no longer use "not null" properties on non-key columns. The unique constraints ensure that, for any ID value, only a single row can have a null ValidFrom or null ValidTo (or possibly both). The foreign key constraints ensure that there is an unbroken chain from each row for an ID value to its immediate predecessor and successor.

These constraints, together, ensure that the history for each ID value is unambiguous. You cannot insert values for a particular ID that have overlapping datetime periods.

We have to replace the "not null" properties with check constraints that apply the same constraints - providing the current value isn't considered to be a deleted row

But what we really want to do is to have a "T2" table that represents the original "T1" table. And we achieve that by implementing a view:

create view dbo.T2
with schemabinding
as
 select
  ID,
  ISNULL(Val1,'') as Val1,
  Val2
 from
  dbo._T2
 where
  ValidTo is null and
  Deleted = 0
go
create unique clustered index PK_T2 on dbo.T2 (ID)

I can't think of the last time I defined a view and didn't choose schemabinding. But it's also required in order to allow indexing to occur. Note that this clustered index precisely resembles the PK/Clustered index on T1. If there were non-clustered indexes on T1, we could replicate the same on T2, and SQL Server would manage all of the required updates in the background

Note, also, that we've used "ISNULL" to restore the non-nullness of the Val1 column, in the metadata. We know, ourselves, that no null value is possible (based on the combination of the deleted column, and the check constraints), but SQL Server is unable to derive this information

Next, we want to be able to use our time machine. What was T2 like at some point in the past?

create function dbo.T2_At (
 @At datetime2
)
returns table
as
 return (select
  ID,
  ISNULL(Val1,'') as Val1,
  Val2
 from
  dbo._T2
 where
  (ValidTo is null or ValidTo > @At) and
  (ValidFrom is null or ValidFrom < @At) and
  Deleted = 0
 )

Finally though, we want to enable external applications to be able to treat T2 as if it were a table (specifically, as if it were T1):

create trigger T_T2_I
on dbo.T2
instead of insert
as
 set nocount on
 ;with Dup as (
  select i.ID,i.Val1,i.Val2,t.Dupl,CASE WHEN t2.ID is null THEN 0 ELSE 1 END as Existing
  from
   inserted i
    left join
   dbo._T2 t2
    on
     i.ID = t2.ID and
     t2.Deleted = 1 and
     t2.ValidTo is null
    cross join
   (select 0 union all select 1) t(Dupl)
 )
 merge into dbo._T2 t2
 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0
 when matched then update set ValidTo = SYSDATETIME()
 when not matched and Dupl = 1 then
  insert (ID,Val1,Val2,Deleted,ValidFrom)
  values (Dup.ID,Dup.Val1,Dup.Val2,0,CASE WHEN Dup.Existing=1 THEN SYSDATETIME() END);

The use of merge here is why I said we need to use SQL Server 2008 or later (other non-SQL Server products may also be usable) - we need to be able to UPDATE an existing row (if such exists), and to INSERT a new row, in a single statement (assuming the non-existence of deferred constraints). We're inserting a new row which potentially references a previous deleted row, whilst simultaneously "end dating" the same previous row

Each of the triggers uses a similar pattern. We use a CTE to cross join with a table containing two rows (t), so that we can perform two different actions during the merge - an update of the previous row and an insert of the new row that replaces it

create trigger T_T2_U
on dbo.T2
instead of update
as
 set nocount on
 ;with Dup as (
  select i.ID,i.Val1,i.Val2,t.Dupl
  from
   inserted i
    cross join
   (select 0 union all select 1) t(Dupl)
 )
 merge into dbo._T2 t2
 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0
 when matched then update set ValidTo = SYSDATETIME()
 when not matched then
  insert (ID,Val1,Val2,Deleted,ValidFrom)
  values (Dup.ID,Dup.Val1,Dup.Val2,0,SYSDATETIME());
go
create trigger T_T2_D
on dbo.T2
instead of delete
as
 set nocount on
 ;With Dup as (
  select d.ID,t.Dupl
  from
   deleted d
    cross join
   (select 0 union all select 1) t(Dupl)
 )
 merge into dbo._T2 t2
 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0
 when matched then update set ValidTo = SYSDATETIME()
 when not matched then
  insert (ID,Deleted,ValidFrom)
  values (Dup.ID,1,SYSDATETIME());

Finally, let's run some queries, to see the similarities and differences between our two tables, T1 and T2

insert into T1 (ID,Val1,Val2)
select 1,'abc','def' union all
select 2,'ghi',null union all
select 3,'jkl','mno'
go
insert into T2 (ID,Val1,Val2)
select 1,'abc','def' union all
select 2,'ghi',null union all
select 3,'jkl','mno'
(3 row(s) affected)

(3 row(s) affected)

Hardly unexpected

insert into T1 (ID,Val1,Val2)
select 1,'pqr',null
go
insert into T2 (ID,Val1,Val2)
select 1,'pqr',null
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_T1'. Cannot insert duplicate key in object 'dbo.T1'.
The statement has been terminated.
Msg 2627, Level 14, State 1, Procedure T_T2_I, Line 6
Violation of UNIQUE KEY constraint 'UQ_T2_ValidFrom'. Cannot insert duplicate key in object 'dbo._T2'.
The statement has been terminated.

Similar, but not identical

insert into T1 (ID,Val1,Val2)
select 4,null,null
go
insert into T2 (ID,Val1,Val2)
select 4,null,null
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'Val1', table 'Flange.dbo.T1'; column does not allow nulls. INSERT fails.
The statement has been terminated.
Msg 547, Level 16, State 0, Procedure T_T2_I, Line 6
The MERGE statement conflicted with the CHECK constraint "CK_T2_Val1_NotNull". The conflict occurred in database "Flange", table "dbo._T2".
The statement has been terminated.

Definitely, a different type of error, but if you care about is error or not, it's alright

update T1 set Val1 = null where ID=3
go
update T2 set Val1 = null where ID=3
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'Val1', table 'Flange.dbo.T1'; column does not allow nulls. UPDATE fails.
The statement has been terminated.
Msg 547, Level 16, State 0, Procedure T_T2_U, Line 6
The MERGE statement conflicted with the CHECK constraint "CK_T2_Val1_NotNull". The conflict occurred in database "Flange", table "dbo._T2".
The statement has been terminated.

Again, different error messages, but the same sentiment is working

update T1 set Val2 = 'stu' where ID in (1,2)
go
update T2 set Val2 = 'stu' where ID in (1,2)
(2 row(s) affected)

(2 row(s) affected)
delete from T1 where Val1 = 'ghi'
go
delete from T2 where Val1 = 'ghi'
(1 row(s) affected)

(1 row(s) affected)
select * from T1
select * from T2
select * from _T2
IDVal1Val2
1abcstu
3jklmno
IDVal1Val2
1abcstu
3jklmno
IDVal1Val2DeletedValidFromValidTo
1abcdef0NULL2011-05-26 05:53:30.2990000
1abcstu02011-05-26 05:53:30.2990000NULL
2ghiNULL0NULL2011-05-26 05:53:30.2990000
2ghistu02011-05-26 05:53:30.29900002011-05-26 05:53:30.3430000
2NULLNULL12011-05-26 05:53:30.3430000NULL
3jklmno0NULLNULL

It's amazing how many little details are required to make this work.

No comments:

Post a Comment