Monday, April 16, 2018

About Indexing Character Fields

I had created a few tables in my SQL Server database like this:
CREATE TABLE ref_status(
  status_code VARCHAR(10) NOT NULL,
  description VARCHAR(80) NOT NULL,
  CONSTRAINT ref_status_pk PRIMARY KEY (status_code)
);
CREATE TABLE stuff (
  stuff_id    INT IDENTITY,
  stuff_date  DATETIME,
  stuff_desc  VARCHAR(80),
  status_code VARCHAR(10),
  CONSTRAINT stuff_pk PRIMARY KEY (stuff_id),
  CONSTRAINT stuff_status_fk FOREIGN KEY (status_code)
    REFERENCES ref_status (status_code)
);
I inserted a few status codes into ref_status, 'NEW', 'IN PROGRESS', and 'PRODUCTION'.  Then the developers got busy, and wrote a user interface to insert data into "stuff".  After testing, there were about 10 rows in "stuff".

Then the lead developer asked me, "Can you update the status codes to make them read 'New', 'In progress', and 'Production' instead?"  "Sure", I said.  Because of the foreign key, I felt I needed to do this in three steps.
  1. Insert new rows into "ref_status"  with the new status codes.
  2. Update "stuff" to change the existing status codes to the values of the new codes.
  3. Delete the old status codes from "ref_status".
But step 1 failed with a duplicate key error.  What's going on?  It turns out that in SQL Server, by default, indexes on character fields are case insensitive.  So 'NEW' and 'New' are duplicate keys (and so would be 'new' or even 'neW').  The correct way to do this was simply:
  1. Update the status_code column in one table.
  2. Update the status_code column in the other table.
It didn't matter which table was updated first, and it didn't cause a foreign key problem because the index on "ref_status" DIDN'T change.

Contrast with Oracle

Of course, the reason my first crack at this failed was that in Oracle - remember that I came from the Oracle world - the indexes are case sensitive.  'NEW' and 'New' are two different keys.  Can you make the indexes case insensitive, like in SQL Server?  Yes, BUT you have to do it with a function-based index.  That means that you can't make a case-insensitive primary key constraint on REF_STATUS.  You have to make an unique function based index like this:
CREATE UNIQUE INDEX ref_status_uk ON
  ref_status
   (NLSSORT (status_code, 'NLS_SORT=BINARY_CI') );
Yet to point a foreign key at REF_STATUS, it needs to have an unique or primary key constraint.  Well, I haven't tried this, but here is what I THINK will work.

Even though, by default, creating an unique constraint will automatically create an unique index by the same name, you CAN create the index first.  Then the unique constraint will use the pre-existing index instead of creating a new one.  So after creating the index as described above, do:
ALTER TABLE ref_status ADD
  CONSTRAINT ref_status_uk UNIQUE (status_code);
That way, Oracle will use the existing index to enforce uniqueness.  This much, I am pretty sure works.  Then, theoretically, you can create the foreign key, STUFF_STATUS_FK from STUFF to REF_STATUS.  If anyone wants to try this, please let me know in the comments if it worked.

No comments:

Post a Comment