Monday, January 27, 2014

Unique Filtered Index

Consider a scenario where you need to create a table in which you need to store contact information, and for one of the columns, we need to store passport number for each contact, if one is available. Assuming that passport number is unique for each contact, but not all have passport, how can we ensure the uniqueness for the passport number enter while allowing null values for those contacts who don't have passport number?

Unique constraint/index would come to mind. However, the problem of using unique constraint/index in this scenario is while unique constraint/index would ensure that the data enter is unique, a column with unique constraint/index can only allow one null value in that column.

A quick demo for this using unique index (but the result should be similar if we use unique constraint as well):

IF (OBJECT_ID('dbo.ContactInfo') IS NOT NULL)
BEGIN
DROP TABLE dbo.ContactInfo
END

CREATE TABLE dbo.ContactInfo (
ContactID INT PRIMARY KEY IDENTITY (1,1),
ContactName VARCHAR(100) NOT NULL,
PassportNumber VARCHAR(25) NULL);

CREATE UNIQUE INDEX UI_PassportNumber
ON dbo.ContactInfo(PassportNumber)

Now let's try to test to make sure that we can't enter duplicate passport number:

INSERT INTO dbo.ContactInfo (ContactName, PassportNumber) VALUES ('John Doe', 'B1234567890');
INSERT INTO dbo.ContactInfo (ContactName, PassportNumber) VALUES ('Jane Doe', 'B1234567890');

We would get the following messages:

(1 row(s) affected)
Msg 2601, Level 14, State 1, Line 2
Cannot insert duplicate key row in object 'dbo.ContactInfo' with unique index 'UI_PassportNumber'. The duplicate key value is (B1234567890).
The statement has been terminated.

Which is to be expected, it would insert the information for John Doe just fine, but then it would failed to insert the information for Jane Doe since Jane Doe’s passport number is the same as John Doe’s.

image

Now let’s test to make sure that we can enter multiple contact info with blank passport number.

INSERT INTO dbo.ContactInfo (ContactName, PassportNumber) VALUES ('Mary Doe', NULL);
INSERT INTO dbo.ContactInfo (ContactName, PassportNumber) VALUES ('Jack Doe', NULL);

We would get the following messages:

(1 row(s) affected)
Msg 2601, Level 14, State 1, Line 2
Cannot insert duplicate key row in object 'dbo.ContactInfo' with unique index 'UI_PassportNumber'. The duplicate key value is (<NULL>).
The statement has been terminated.

It would insert the contact info for Marry Doe just fine, but it would fail when trying to insert the contact info for Jack Doe. This is because the unique constraint can allow only one null value.

image

Possible Solution

So how we can ensure that our passport column can accept null values while ensure uniqueness for the non-null value? With SQL Server 2008 and above, one possibility is by using the unique filtered index. A quick demo for this one:

IF (OBJECT_ID('dbo.ContactInfo') IS NOT NULL)
BEGIN
DROP TABLE dbo.ContactInfo
END

CREATE TABLE dbo.ContactInfo (
ContactID INT PRIMARY KEY IDENTITY (1,1),
ContactName VARCHAR(100) NOT NULL,
PassportNumber VARCHAR(25) NULL);

CREATE UNIQUE INDEX UI_PassportNumber
ON dbo.ContactInfo(PassportNumber)
WHERE PassportNumber IS NOT NULL

Let’s try to see what happen if we try to enter contact info with duplicate passport number:

INSERT INTO dbo.ContactInfo (ContactName, PassportNumber) VALUES ('John Doe', 'B1234567890');
INSERT INTO dbo.ContactInfo (ContactName, PassportNumber) VALUES ('Jane Doe', 'B1234567890');

We would get the following messages:

(1 row(s) affected)
Msg 2601, Level 14, State 1, Line 2
Cannot insert duplicate key row in object 'dbo.ContactInfo' with unique index 'UI_PassportNumber'. The duplicate key value is (B1234567890).
The statement has been terminated.

That works as expected. Only the record for John Doe that got inserted.

image

Now let’s try to see what happen if we try to enter multiple contact info with blank passport number:

INSERT INTO dbo.ContactInfo (ContactName, PassportNumber) VALUES ('Mary Doe', NULL);
INSERT INTO dbo.ContactInfo (ContactName, PassportNumber) VALUES ('Jack Doe', NULL);

We would get the following messages:

(1 row(s) affected)

(1 row(s) affected)


Both records are being inserted just fine, which what we are expecting.

image

Conclusion

We can use unique filtered index as a way to enforce uniqueness on filtered data. In the example above, it is to enforce uniqueness on a not null value on the passport number column. There is one caveat. Unlike unique constraint/index where it can be referenced by a foreign key constraint, unique filtered index can’t be referenced by a foreign key constraint.

So if we continue on from the above examples and try to create a new table named PassportVisit with the following definition:

IF (OBJECT_ID('dbo.PassportVisit') IS NOT NULL)
BEGIN
DROP TABLE dbo.PassportVisit
END

CREATE TABLE dbo.PassportVisit (
PassportNumber VARCHAR(25) FOREIGN KEY REFERENCES dbo.ContactInfo(PassportNumber),
DateofVisit DateTime,
CountryofVisit VARCHAR(50))

So we are trying to reference the passport number column on the passport visit table to the passport number on the contact info table. This will be successful if we have a unique constraint/index. But it will fail if we have unique filtered index. It will fail with the following messages:

Msg 1776, Level 16, State 0, Line 6
There are no primary or candidate keys in the referenced table 'dbo.ContactInfo' that match the referencing column list in the foreign key 'FK__PassportV__Passp__73B0DA39'.
Msg 1750, Level 16, State 0, Line 6
Could not create constraint. See previous errors.