Tuesday, August 8, 2017

Lookup Tables Revisited

More than ten years ago, I wrote a paper and did an ODTUG presentation about reference tables.  These are also known as lookup tables or code tables.  Some of what I wrote then I would still say today, but other parts were probably wrong then and definitely wrong now.  So, I thought I'd revisit the topic and try to lay out what I think about this kind of data today.
The most common use of a lookup table is for presenting users with a dropdown or list of values (LOV) page for setting a value in the application.  So in its simplest form, it is a table with just two columns, a code, which would be the primary key, and a description to be shown to the user in the list of values.  For instance, you might have a lookup for the States of the United States, where the U. S. Postal Services two character official State abbreviation is the code.  The row for my home State of Virginia would contain: CODE=VA and DESCRIPTION=Virginia.  In my ADDRESS table, I might have a column called STATE that would be set from this lookup.
Here are a few principles that I want reference/lookup/code tables to follow.  These are not hard and fast rules, and I’m sure there are some valid exceptions, but unless I can define a specific exception for a particular use case, I like to follow these.

Lookups as Foreign Keys

When you create a table with a column to be set from a lookup table, define a foreign key (FK) constraint.  That means that lookup tables need to be actual tables.  This is a major departure from my former opinion that there should be a single lookup table that has a key to subdivide it into separate lookups.  There are some nice advantages to the single lookup table, but I am currently working with an application that has one, and the disadvantages outweigh the advantages.  In my next post, I’ll talk about some ways to get the best of both worlds, but we start with one table for each list of values.
If you create the FK constraint, you guarantee that the value MUST come from the lookup table.  No other value will work.  This is important for validating your data.  It gives front-end developers information about which lookup table to use for their list of values.  And it also provides information to code generators, like the ones for generating JPA Entity objects in Eclipse and NetBeans, or the one for generating ADF Business Components in JDeveloper.  Some front-end developers might complain that this removes the flexibility for users to enter values that are not in the lookup tables.  My feeling is that you can give users that flexibility, if you include the ability to add a new value to the lookup table as part of your user interface.  That way the FK is still satisfied, and if one user needed a value that wasn’t in the table, probably other users need that value too.

Structure of Lookup Tables

Just because each lookup gets its own table, doesn’t mean that each lookup table gets a distinct structure.  Instead, I want every lookup table to have the same columns in the same order.  Beyond the basic CODE and DESCRIPTION columns, I like to add:
  • DISPLAY_SEQUENCE – the order in which this value is to appear in a dropdown or LOV.  There might be a unique key constraint on this column, but there doesn’t need to be.  In my State lookup table, I use the U.S. government’s numeric state codes, like 51 for Virginia, for this field.
  • ACTIVE – a Boolean value that says whether this value is currently available for use in a dropdown or LOV.  In my databases, this is a ‘Y’ for yes, or ‘N’ for no, but you may prefer 1 or 0, and some non-Oracle databases allow an actual boolean true or false.  In any case, the default for ACTIVE is true.  This is needed because of the FKs pointing to the lookup table.  As time goes on, you may no longer need a value in the table, but there may still be old data that references it.  So you can’t delete rows from the lookup table.  You need a way to remove it from your user interface, without actually deleting it.
  • Sometimes, I need a short DESCRIPTION for dropdowns, but a longer one for LOVs and reports. So I often add a SHORT_DESCRIPTION column.
While I’m a big fan of surrogate primary keys for most tables in an application, I prefer to make CODE a short natural key.  It is usually a character field.  For instance, take my example of ‘VA’ as the code for Virginia in my STATE lookup.  When CODE is meaningful, you can look at the data table that references the lookup, and know what it is referencing without necessarily having to look at the lookup table.  This saves a lot of time.  This is not possible with surrogate keys because they are by definition meaningless.  It also helps when you move the lookup table from database to database – I’ve seen bad things happen when new surrogate keys get generated by accident in the new database.
Lookup tables are usually fairly small.  Notice that in my standard definition, I have five columns at most.  My State lookup table with about 60 rows (It includes US Territories as well as States) is larger than most – they average at about 10 rows.  Access to a lookup table is almost always by CODE, the primary key.  In Oracle, this makes lookup tables ideal candidates to be Index-Organized tables, tables where the data is stored in the Primary Key index, instead of in separate storage.  Less I/O means faster.

Naming Standards

It doesn’t matter much what your naming standard is, but it does matter that you have one.  One standard that I've seen work well is to prefix all lookup tables with the same prefix, like "REF_" or "LKUP_".  That makes it obvious that this is, in fact, a lookup table, and will make them all sort together in lists of your tables.  Standard naming is especially useful if you want to write some code generators, which I'll talk about in my next post.  I also want standard names for columns, but occasionally, I'll write a VIEW of a lookup table just so that I can give some different names to columns.  Remember that the DISPLAY_SEQUENCE of my State lookup table is being set to the numeric code for the State, and the CODE is the Postal Abbreviation.  So I write:
CREATE VIEW state_codes (postal_abbrev, state_code, state_name) AS
  SELECT code, display_sequence, description
    FROM lkup_state
   WHERE active = 'Y';
This is a little easier for the developers, but still keeps the underlying standard naming.  Since it is a view of just one table and all not null columns are included, you can even do DML on the view.

Lookup Data as Source Code

Finally, lookup tables should normally be populated as part of creating the application database.  There would normally be no such thing as an empty lookup table.  Therefore, I usually consider the data for the lookups as much a part of the source code for the application as the DDL for creating the database.  That means that lookup data should be in a source code version control system such as Git or Subversion – whatever you use for the rest of your source code.  But this poses a special problem – lookup data is not static.  New rows are added occasionally, formerly ACTIVE rows are updated to be inactive, and sometimes the descriptions change.  It isn't just the developers who make these changes.  You will often give certain users the ability to maintain the lookup tables.  So how do you check those changes into your source code repository?
One way is to run a regular job to look for changes, export the data in a known standard form in a working copy of your repository, and then commit the changes.  Or you could be a little more sophisticated, and use a trigger to queue a job only when it is needed.  Some people don't actually store lookup tables in the database.  Instead they use external tables where that actual data is in files outside of the database.  If those files are in a working copy, you can make changes and commit those changes as you would for any other source code.  As long as you are aware that lookup data is part of the source code, you will come up with a way to handle it that works for you.

Conclusion

Lookup tables are an important part of your application database.  I hope that my re-thinking about them has helped you too.  Next time, I'll have a little code for generating an updatable view which can help your developers create a user interface for maintaining your lookup tables.

No comments:

Post a Comment