I'm always forgetting all of the nuances of creating a temporal table (my preferred way) in SQL, so I figured I'd put this post here.
We're going to take an existing table, and produce the same table in a temporal version. That is, we're going to produce a version that allows us to determine that state of the data at any time in the past.
There are some nuances here, which I'll hopefully address in later posts: we're only converting a single table. This is only tested on SQL Server 2008 or later. And the history isn't 100% - once a row has been created, then modified, then any enquiry for before the first modification will always return the initial row. I think this can be fixed, but I'll mull on it further
First, we present the table we want to produce a temporal version of:
create table dbo.T1 ( ID int not null, Val1 varchar(10) not null, Val2 varchar(10) null, constraint PK_T1 PRIMARY KEY (ID) )
Now, we introduce the equivalent Backing table for the temporal equivalent:
create table dbo._T2 ( ID int not null, Val1 varchar(10) null, Val2 varchar(10) null, Deleted bit not null, ValidFrom datetime2 null, ValidTo datetime2 null, constraint UQ_T2_ValidFrom UNIQUE CLUSTERED (ID,ValidFrom), constraint UQ_T2_ValidTo UNIQUE (ID,ValidTo), constraint FK_T2_Prev FOREIGN KEY (ID,ValidFrom) references dbo._T2 (ID,ValidTo), constraint FK_T2_Next FOREIGN KEY (ID,ValidTo) references dbo._T2 (ID,ValidFrom), constraint CK_T2_NoTimeTravel CHECK (ValidFrom < ValidTo), constraint CK_T2_Val1_NotNull CHECK ( Deleted = 1 or Val1 is not null), constraint CK_T2_Deletion CHECK ( Deleted = 0 or Val1 is null) )
There's a lot more going on here - we can no longer use "not null" properties on non-key columns. The unique constraints ensure that, for any ID value, only a single row can have a null ValidFrom or null ValidTo (or possibly both). The foreign key constraints ensure that there is an unbroken chain from each row for an ID value to its immediate predecessor and successor.
These constraints, together, ensure that the history for each ID value is unambiguous. You cannot insert values for a particular ID that have overlapping datetime periods.
We have to replace the "not null" properties with check constraints that apply the same constraints - providing the current value isn't considered to be a deleted row
But what we really want to do is to have a "T2" table that represents the original "T1" table. And we achieve that by implementing a view:
create view dbo.T2 with schemabinding as select ID, ISNULL(Val1,'') as Val1, Val2 from dbo._T2 where ValidTo is null and Deleted = 0 go create unique clustered index PK_T2 on dbo.T2 (ID)
I can't think of the last time I defined a view and didn't choose schemabinding. But it's also required in order to allow indexing to occur. Note that this clustered index precisely resembles the PK/Clustered index on T1. If there were non-clustered indexes on T1, we could replicate the same on T2, and SQL Server would manage all of the required updates in the background
Note, also, that we've used "ISNULL" to restore the non-nullness of the Val1 column, in the metadata. We know, ourselves, that no null value is possible (based on the combination of the deleted column, and the check constraints), but SQL Server is unable to derive this information
Next, we want to be able to use our time machine. What was T2 like at some point in the past?
create function dbo.T2_At ( @At datetime2 ) returns table as return (select ID, ISNULL(Val1,'') as Val1, Val2 from dbo._T2 where (ValidTo is null or ValidTo > @At) and (ValidFrom is null or ValidFrom < @At) and Deleted = 0 )
Finally though, we want to enable external applications to be able to treat T2 as if it were a table (specifically, as if it were T1):
create trigger T_T2_I on dbo.T2 instead of insert as set nocount on ;with Dup as ( select i.ID,i.Val1,i.Val2,t.Dupl,CASE WHEN t2.ID is null THEN 0 ELSE 1 END as Existing from inserted i left join dbo._T2 t2 on i.ID = t2.ID and t2.Deleted = 1 and t2.ValidTo is null cross join (select 0 union all select 1) t(Dupl) ) merge into dbo._T2 t2 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0 when matched then update set ValidTo = SYSDATETIME() when not matched and Dupl = 1 then insert (ID,Val1,Val2,Deleted,ValidFrom) values (Dup.ID,Dup.Val1,Dup.Val2,0,CASE WHEN Dup.Existing=1 THEN SYSDATETIME() END);
The use of merge here is why I said we need to use SQL Server 2008 or later (other non-SQL Server products may also be usable) - we need to be able to UPDATE an existing row (if such exists), and to INSERT a new row, in a single statement (assuming the non-existence of deferred constraints). We're inserting a new row which potentially references a previous deleted row, whilst simultaneously "end dating" the same previous row
Each of the triggers uses a similar pattern. We use a CTE to cross join with a table containing two rows (t), so that we can perform two different actions during the merge - an update of the previous row and an insert of the new row that replaces it
create trigger T_T2_U on dbo.T2 instead of update as set nocount on ;with Dup as ( select i.ID,i.Val1,i.Val2,t.Dupl from inserted i cross join (select 0 union all select 1) t(Dupl) ) merge into dbo._T2 t2 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0 when matched then update set ValidTo = SYSDATETIME() when not matched then insert (ID,Val1,Val2,Deleted,ValidFrom) values (Dup.ID,Dup.Val1,Dup.Val2,0,SYSDATETIME()); go create trigger T_T2_D on dbo.T2 instead of delete as set nocount on ;With Dup as ( select d.ID,t.Dupl from deleted d cross join (select 0 union all select 1) t(Dupl) ) merge into dbo._T2 t2 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0 when matched then update set ValidTo = SYSDATETIME() when not matched then insert (ID,Deleted,ValidFrom) values (Dup.ID,1,SYSDATETIME());
Finally, let's run some queries, to see the similarities and differences between our two tables, T1 and T2
insert into T1 (ID,Val1,Val2) select 1,'abc','def' union all select 2,'ghi',null union all select 3,'jkl','mno' go insert into T2 (ID,Val1,Val2) select 1,'abc','def' union all select 2,'ghi',null union all select 3,'jkl','mno'
(3 row(s) affected) (3 row(s) affected)
Hardly unexpected
insert into T1 (ID,Val1,Val2) select 1,'pqr',null go insert into T2 (ID,Val1,Val2) select 1,'pqr',null
Msg 2627, Level 14, State 1, Line 1 Violation of PRIMARY KEY constraint 'PK_T1'. Cannot insert duplicate key in object 'dbo.T1'. The statement has been terminated. Msg 2627, Level 14, State 1, Procedure T_T2_I, Line 6 Violation of UNIQUE KEY constraint 'UQ_T2_ValidFrom'. Cannot insert duplicate key in object 'dbo._T2'. The statement has been terminated.
Similar, but not identical
insert into T1 (ID,Val1,Val2) select 4,null,null go insert into T2 (ID,Val1,Val2) select 4,null,null
Msg 515, Level 16, State 2, Line 1 Cannot insert the value NULL into column 'Val1', table 'Flange.dbo.T1'; column does not allow nulls. INSERT fails. The statement has been terminated. Msg 547, Level 16, State 0, Procedure T_T2_I, Line 6 The MERGE statement conflicted with the CHECK constraint "CK_T2_Val1_NotNull". The conflict occurred in database "Flange", table "dbo._T2". The statement has been terminated.
Definitely, a different type of error, but if you care about is error or not, it's alright
update T1 set Val1 = null where ID=3 go update T2 set Val1 = null where ID=3
Msg 515, Level 16, State 2, Line 1 Cannot insert the value NULL into column 'Val1', table 'Flange.dbo.T1'; column does not allow nulls. UPDATE fails. The statement has been terminated. Msg 547, Level 16, State 0, Procedure T_T2_U, Line 6 The MERGE statement conflicted with the CHECK constraint "CK_T2_Val1_NotNull". The conflict occurred in database "Flange", table "dbo._T2". The statement has been terminated.
Again, different error messages, but the same sentiment is working
update T1 set Val2 = 'stu' where ID in (1,2) go update T2 set Val2 = 'stu' where ID in (1,2)
(2 row(s) affected) (2 row(s) affected)
delete from T1 where Val1 = 'ghi' go delete from T2 where Val1 = 'ghi'
(1 row(s) affected) (1 row(s) affected)
select * from T1 select * from T2 select * from _T2
ID | Val1 | Val2 |
---|---|---|
1 | abc | stu |
3 | jkl | mno |
ID | Val1 | Val2 |
---|---|---|
1 | abc | stu |
3 | jkl | mno |
ID | Val1 | Val2 | Deleted | ValidFrom | ValidTo |
---|---|---|---|---|---|
1 | abc | def | 0 | NULL | 2011-05-26 05:53:30.2990000 |
1 | abc | stu | 0 | 2011-05-26 05:53:30.2990000 | NULL |
2 | ghi | NULL | 0 | NULL | 2011-05-26 05:53:30.2990000 |
2 | ghi | stu | 0 | 2011-05-26 05:53:30.2990000 | 2011-05-26 05:53:30.3430000 |
2 | NULL | NULL | 1 | 2011-05-26 05:53:30.3430000 | NULL |
3 | jkl | mno | 0 | NULL | NULL |
It's amazing how many little details are required to make this work.
No comments:
Post a Comment