Friday 27 May 2011

Temporal Tables - Foreign Keys - Part 1 of ...

Once you start building Temporal Tables in your database, you inevitably start thinking about foreign key constraints - how can they work?


There are so many variations available. For today's post, we'll build a non-temporal table (T2) that references a temporal table (T1/_T1). We want that, if there's a referencing row in T2, that deleting the corresponding row from T1 should not be allowed. Nor should it be possible to insert a row into T2 if there's no corresponding row in T1 - either the row doesn't exist, or it has never existed.


We'll build T1 in an identical fashion to how we built T2 in the previous post. I'll include the code for it at the end of this post also.


First, we have to augment our backing table slightly - we need something that can uniquely distinguish Active, undeleted rows. We're looking for something we can apply a unique index against (together with ID) so that it can be the target of the foreign key constraint. ValidTo, by itself, isn't suitable, since the row we'd want to target has a null value - and nulls in foreign keys mean the constraint isn't checked.


My first thought when building this is to add a computed column:


--This DOES NOT WORK:
alter table dbo._T1 add Active as CASE WHEN ValidTo is null and Deleted = 0 THEN 1 ELSE 0 END persisted

The reason that this isn't going to work is when we go to create the index. We'd like to only add rows with Active=1 to the index (there are going to be lots with Active=0), which sounds like a job for a filtered index. But filtered indexes can't be the target of a foreign key constraint. So we need something that's going to be unique across all rows:


alter table dbo._T1 add ActiveEndDate as CASE WHEN ValidTo is null and Deleted = 0 THEN CONVERT(datetime2,'99991231',112) ELSE ValidTo END persisted
go
create unique index UQ_T1_Active on dbo._T1 (ID,ActiveEndDate)

We have to specify the conversion code, because otherwise, SQL Server thinks that it's non-deterministic - it doesn't analyse the string, nor convert it to datetime once - it performs this conversion each time the computed column is computed.


Now, we can introduce our non-temporal table that references T1:


create table dbo.T2 (
ID int not null,
T1ID int not null,
Val3 varchar(10) not null,
T1Active as CONVERT(datetime2,'99991231',112) persisted,
constraint PK_T2 PRIMARY KEY (ID),
constraint FK_T2_T1 FOREIGN KEY (T1ID,T1Active) references dbo._T1 (ID,ActiveEndDate)
)

Relatively straightforward - other than having to add a computed column here (we could have used a plain column and check constraint, but either way, we need the same fixed value that's going to be present in T1 for active rows.


Finally, we can use T1 and T2... as if nothing odd was happening here


insert into dbo.T1(ID,Val1,Val2)
select 1,'abc','def' union all
select 2,'ghi',null
go
update dbo.T1 set Val1 = Val1 + '_ex'
go
insert into dbo.T2 (ID,T1ID,Val3)
select 1,2,'jkl'
go
update dbo.T1 set Val1 = SUBSTRING(Val1,1,3) + '_ex2'
go
delete from dbo.T1 where ID = 1

Everything works as expected - we can delete the row where ID=1, because it's not being referenced. But this provokes an error:


delete from dbo.T1 where ID = 2

Msg 547, Level 16, State 0, Procedure T_T1_D, Line 6
The MERGE statement conflicted with the REFERENCE constraint "FK_T2_T1". The conflict occurred in database "Flange", table "dbo.T2".
The statement has been terminated.

As does:


insert into dbo.T2 (ID,T1ID,Val3)
select 2,3,'mno'

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "FK_T2_T1". The conflict occurred in database "Flange", table "dbo._T1".
The statement has been terminated.

And...


insert into dbo.T2 (ID,T1ID,Val3)
select 3,1,'pqr'

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "FK_T2_T1". The conflict occurred in database "Flange", table "dbo._T1".
The statement has been terminated.

So, this is all working as expected.




The setup code for T1 (I've not included the historic function at this time, since we don't need it here)


create table dbo._T1 (
ID int not null,
Val1 varchar(10) null,
Val2 varchar(10) null,

Deleted bit not null,
ValidFrom datetime2 null,
ValidTo datetime2 null,
constraint UQ_T1_ValidFrom UNIQUE CLUSTERED (ID,ValidFrom),
constraint UQ_T1_ValidTo UNIQUE (ID,ValidTo),
constraint FK_T1_Prev FOREIGN KEY (ID,ValidFrom) references dbo._T1 (ID,ValidTo),
constraint FK_T1_Next FOREIGN KEY (ID,ValidTo) references dbo._T1 (ID,ValidFrom),
constraint CK_T1_NoTimeTravel CHECK (ValidFrom < ValidTo),
constraint CK_T1_Val1_NotNull CHECK (
Deleted = 1 or Val1 is not null),
constraint CK_T1_Deletion CHECK (
Deleted = 0 or Val1 is null)
)
go
create view dbo.T1
with schemabinding
as
select
ID,
ISNULL(Val1,'') as Val1,
Val2
from
dbo._T1
where
ValidTo is null and
Deleted = 0
go
create unique clustered index PK_T1 on dbo.T1 (ID)
go
create trigger T_T1_I
on dbo.T1
instead of insert
as
set nocount on
;with Dup as (
select i.ID,i.Val1,i.Val2,t.Dupl,CASE WHEN t1.ID is null THEN 0 ELSE 1 END as Existing
from
inserted i
left join
dbo._T1 t1
on
i.ID = t1.ID and
t1.Deleted = 1 and
t1.ValidTo is null
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo._T1 t1
using Dup on t1.ID = Dup.ID and t1.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched and Dupl = 1 then
insert (ID,Val1,Val2,Deleted,ValidFrom)
values (Dup.ID,Dup.Val1,Dup.Val2,0,CASE WHEN Dup.Existing=1 THEN SYSDATETIME() END);
go
create trigger T_T1_U
on dbo.T1
instead of update
as
set nocount on
;with Dup as (
select i.ID,i.Val1,i.Val2,t.Dupl
from
inserted i
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo._T1 t1
using Dup on t1.ID = Dup.ID and t1.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then
insert (ID,Val1,Val2,Deleted,ValidFrom)
values (Dup.ID,Dup.Val1,Dup.Val2,0,SYSDATETIME());
go
create trigger T_T1_D
on dbo.T1
instead of delete
as
set nocount on
;With Dup as (
select d.ID,t.Dupl
from
deleted d
cross join
(select 0 union all select 1) t(Dupl)
)
merge into dbo._T1 t1
using Dup on t1.ID = Dup.ID and t1.ValidTo is null and Dup.Dupl = 0
when matched then update set ValidTo = SYSDATETIME()
when not matched then
insert (ID,Deleted,ValidFrom)
values (Dup.ID,1,SYSDATETIME());
go

Thursday 26 May 2011

Temporal table (101)

I'm always forgetting all of the nuances of creating a temporal table (my preferred way) in SQL, so I figured I'd put this post here.

We're going to take an existing table, and produce the same table in a temporal version. That is, we're going to produce a version that allows us to determine that state of the data at any time in the past.

There are some nuances here, which I'll hopefully address in later posts: we're only converting a single table. This is only tested on SQL Server 2008 or later. And the history isn't 100% - once a row has been created, then modified, then any enquiry for before the first modification will always return the initial row. I think this can be fixed, but I'll mull on it further

First, we present the table we want to produce a temporal version of:

create table dbo.T1 (
 ID int not null,
 Val1 varchar(10) not null,
 Val2 varchar(10) null,
 constraint PK_T1 PRIMARY KEY (ID)
)

Now, we introduce the equivalent Backing table for the temporal equivalent:

create table dbo._T2 (
 ID int not null,
 Val1 varchar(10) null,
 Val2 varchar(10) null,
 
 Deleted bit not null,
 ValidFrom datetime2 null,
 ValidTo datetime2 null,
 constraint UQ_T2_ValidFrom UNIQUE CLUSTERED (ID,ValidFrom),
 constraint UQ_T2_ValidTo UNIQUE (ID,ValidTo),
 constraint FK_T2_Prev FOREIGN KEY (ID,ValidFrom) references dbo._T2 (ID,ValidTo),
 constraint FK_T2_Next FOREIGN KEY (ID,ValidTo) references dbo._T2 (ID,ValidFrom),
 constraint CK_T2_NoTimeTravel CHECK (ValidFrom < ValidTo),
 constraint CK_T2_Val1_NotNull CHECK (
  Deleted = 1 or Val1 is not null),
 constraint CK_T2_Deletion CHECK (
  Deleted = 0 or Val1 is null)
)

There's a lot more going on here - we can no longer use "not null" properties on non-key columns. The unique constraints ensure that, for any ID value, only a single row can have a null ValidFrom or null ValidTo (or possibly both). The foreign key constraints ensure that there is an unbroken chain from each row for an ID value to its immediate predecessor and successor.

These constraints, together, ensure that the history for each ID value is unambiguous. You cannot insert values for a particular ID that have overlapping datetime periods.

We have to replace the "not null" properties with check constraints that apply the same constraints - providing the current value isn't considered to be a deleted row

But what we really want to do is to have a "T2" table that represents the original "T1" table. And we achieve that by implementing a view:

create view dbo.T2
with schemabinding
as
 select
  ID,
  ISNULL(Val1,'') as Val1,
  Val2
 from
  dbo._T2
 where
  ValidTo is null and
  Deleted = 0
go
create unique clustered index PK_T2 on dbo.T2 (ID)

I can't think of the last time I defined a view and didn't choose schemabinding. But it's also required in order to allow indexing to occur. Note that this clustered index precisely resembles the PK/Clustered index on T1. If there were non-clustered indexes on T1, we could replicate the same on T2, and SQL Server would manage all of the required updates in the background

Note, also, that we've used "ISNULL" to restore the non-nullness of the Val1 column, in the metadata. We know, ourselves, that no null value is possible (based on the combination of the deleted column, and the check constraints), but SQL Server is unable to derive this information

Next, we want to be able to use our time machine. What was T2 like at some point in the past?

create function dbo.T2_At (
 @At datetime2
)
returns table
as
 return (select
  ID,
  ISNULL(Val1,'') as Val1,
  Val2
 from
  dbo._T2
 where
  (ValidTo is null or ValidTo > @At) and
  (ValidFrom is null or ValidFrom < @At) and
  Deleted = 0
 )

Finally though, we want to enable external applications to be able to treat T2 as if it were a table (specifically, as if it were T1):

create trigger T_T2_I
on dbo.T2
instead of insert
as
 set nocount on
 ;with Dup as (
  select i.ID,i.Val1,i.Val2,t.Dupl,CASE WHEN t2.ID is null THEN 0 ELSE 1 END as Existing
  from
   inserted i
    left join
   dbo._T2 t2
    on
     i.ID = t2.ID and
     t2.Deleted = 1 and
     t2.ValidTo is null
    cross join
   (select 0 union all select 1) t(Dupl)
 )
 merge into dbo._T2 t2
 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0
 when matched then update set ValidTo = SYSDATETIME()
 when not matched and Dupl = 1 then
  insert (ID,Val1,Val2,Deleted,ValidFrom)
  values (Dup.ID,Dup.Val1,Dup.Val2,0,CASE WHEN Dup.Existing=1 THEN SYSDATETIME() END);

The use of merge here is why I said we need to use SQL Server 2008 or later (other non-SQL Server products may also be usable) - we need to be able to UPDATE an existing row (if such exists), and to INSERT a new row, in a single statement (assuming the non-existence of deferred constraints). We're inserting a new row which potentially references a previous deleted row, whilst simultaneously "end dating" the same previous row

Each of the triggers uses a similar pattern. We use a CTE to cross join with a table containing two rows (t), so that we can perform two different actions during the merge - an update of the previous row and an insert of the new row that replaces it

create trigger T_T2_U
on dbo.T2
instead of update
as
 set nocount on
 ;with Dup as (
  select i.ID,i.Val1,i.Val2,t.Dupl
  from
   inserted i
    cross join
   (select 0 union all select 1) t(Dupl)
 )
 merge into dbo._T2 t2
 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0
 when matched then update set ValidTo = SYSDATETIME()
 when not matched then
  insert (ID,Val1,Val2,Deleted,ValidFrom)
  values (Dup.ID,Dup.Val1,Dup.Val2,0,SYSDATETIME());
go
create trigger T_T2_D
on dbo.T2
instead of delete
as
 set nocount on
 ;With Dup as (
  select d.ID,t.Dupl
  from
   deleted d
    cross join
   (select 0 union all select 1) t(Dupl)
 )
 merge into dbo._T2 t2
 using Dup on t2.ID = Dup.ID and t2.ValidTo is null and Dup.Dupl = 0
 when matched then update set ValidTo = SYSDATETIME()
 when not matched then
  insert (ID,Deleted,ValidFrom)
  values (Dup.ID,1,SYSDATETIME());

Finally, let's run some queries, to see the similarities and differences between our two tables, T1 and T2

insert into T1 (ID,Val1,Val2)
select 1,'abc','def' union all
select 2,'ghi',null union all
select 3,'jkl','mno'
go
insert into T2 (ID,Val1,Val2)
select 1,'abc','def' union all
select 2,'ghi',null union all
select 3,'jkl','mno'
(3 row(s) affected)

(3 row(s) affected)

Hardly unexpected

insert into T1 (ID,Val1,Val2)
select 1,'pqr',null
go
insert into T2 (ID,Val1,Val2)
select 1,'pqr',null
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_T1'. Cannot insert duplicate key in object 'dbo.T1'.
The statement has been terminated.
Msg 2627, Level 14, State 1, Procedure T_T2_I, Line 6
Violation of UNIQUE KEY constraint 'UQ_T2_ValidFrom'. Cannot insert duplicate key in object 'dbo._T2'.
The statement has been terminated.

Similar, but not identical

insert into T1 (ID,Val1,Val2)
select 4,null,null
go
insert into T2 (ID,Val1,Val2)
select 4,null,null
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'Val1', table 'Flange.dbo.T1'; column does not allow nulls. INSERT fails.
The statement has been terminated.
Msg 547, Level 16, State 0, Procedure T_T2_I, Line 6
The MERGE statement conflicted with the CHECK constraint "CK_T2_Val1_NotNull". The conflict occurred in database "Flange", table "dbo._T2".
The statement has been terminated.

Definitely, a different type of error, but if you care about is error or not, it's alright

update T1 set Val1 = null where ID=3
go
update T2 set Val1 = null where ID=3
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'Val1', table 'Flange.dbo.T1'; column does not allow nulls. UPDATE fails.
The statement has been terminated.
Msg 547, Level 16, State 0, Procedure T_T2_U, Line 6
The MERGE statement conflicted with the CHECK constraint "CK_T2_Val1_NotNull". The conflict occurred in database "Flange", table "dbo._T2".
The statement has been terminated.

Again, different error messages, but the same sentiment is working

update T1 set Val2 = 'stu' where ID in (1,2)
go
update T2 set Val2 = 'stu' where ID in (1,2)
(2 row(s) affected)

(2 row(s) affected)
delete from T1 where Val1 = 'ghi'
go
delete from T2 where Val1 = 'ghi'
(1 row(s) affected)

(1 row(s) affected)
select * from T1
select * from T2
select * from _T2
IDVal1Val2
1abcstu
3jklmno
IDVal1Val2
1abcstu
3jklmno
IDVal1Val2DeletedValidFromValidTo
1abcdef0NULL2011-05-26 05:53:30.2990000
1abcstu02011-05-26 05:53:30.2990000NULL
2ghiNULL0NULL2011-05-26 05:53:30.2990000
2ghistu02011-05-26 05:53:30.29900002011-05-26 05:53:30.3430000
2NULLNULL12011-05-26 05:53:30.3430000NULL
3jklmno0NULLNULL

It's amazing how many little details are required to make this work.

Tuesday 24 May 2011

Not fit for purpose

In my previous post, I alluded to a need to get newspapers to follow a legally defined route. And this gets us to another problem in modern UK media - the PCC doesn't fit *any* useful purpose.

Quite apart from the fact that they rarely, if ever, actually criticize the papers under their control, we also have the issue that there are papers *not* under their control (such as all of those in the Desmond franchise these days), and that they believe that only those people directly referenced in an article are able to be offended by it (such as the infamous Jan Moir incident).

Privacy/Freedom of the press

I'm not sure where to start here. Obviously, there's something broken with the UK privacy laws.

I've started wondering whether there's a suitable system that meets the right to privacy *and* the right to free speech also. What I ended up thinking about recently was a "secrets registry". This may be a little tricky to explain, and it may still not fit the bill for those seeking privacy.

The idea would be that, if a particular "secret" was due to be published, that the person(s) involved would be able to postpone that publication - for a *short* period of time (currently thinking about 7 days). They would also be able to register:

a) A written statement (length to be confirmed - there would have to be limits based on the proposal below)
b) the same statement in other forms (audio/video/etc)

We'd also need the law to be changed. My proposal would be that, if any media outlet wished to report the story (after the blackout period is over), they would have to carry the statement(s) from each involved party, at least as prominently as any story they run. There's various obvious gotchas here. I'd think that if an outlet wanted to publish stories on multiple days, they shouldn't be required to print the statement every day. OTOH, they ought to have to publish the official statement every so often (say, every week).

Then, we get to the concept of prominence. I think this is (reasonably) easy to solve. If you wish to publish an article on page 1, then the statement must have been recently published on page 1. If the article appears at the top of the page, then the statement must have been recently published at the top of the same (or an earlier) page. Basically, we need to craft these rules to avoid obvious dodges (such as publishing one story at the bottom of page 19, with the statement, to allow them to run a follow up story the day after on page 1, without the statement)

We also need to consider how the non-written media deal with these stories. This is why I proposed that additional materials could be provided alongside the written statement. I'd propose that, if material has been provided that fits the medium (or can be so converted) that it be used. Only if no such material was available, would it be appropriate for actors to re-render the written statement, with some requirement that the actors have a neutral voice for reading the written statement.

Obviously, this is all a pipe-dream - people aren't going to change overnight. But any improvements available?