More than ten years ago, I wrote a paper and did an ODTUG
presentation about reference tables. These are also known as lookup tables or
code tables. Some of what I wrote then I would still say today, but other
parts were probably wrong then and definitely wrong now. So, I thought I'd
revisit the topic and try to lay out what I think about this kind of data
today.
The most common use of a lookup table is for presenting
users with a dropdown or list of values (LOV) page for setting a value in the
application. So in its simplest form, it is a table with just two columns, a
code, which would be the primary key, and a description to be shown to the user
in the list of values. For instance, you might have a lookup for the States of
the United States, where the U. S. Postal Services two character official State
abbreviation is the code. The row for my home State of Virginia would contain:
CODE=VA and DESCRIPTION=Virginia. In my ADDRESS table, I might have a column
called STATE that would be set from this lookup.
Here are a few principles that I want reference/lookup/code
tables to follow. These are not hard and fast rules, and I’m sure there are
some valid exceptions, but unless I can define a specific exception for a
particular use case, I like to follow these.
Lookups as Foreign Keys
When you create a table with a column to be set from a
lookup table, define a foreign key (FK) constraint. That means that lookup
tables need to be actual tables. This is a major departure from my former
opinion that there should be a single lookup table that has a key to subdivide
it into separate lookups. There are some nice advantages to the single lookup
table, but I am currently working with an application that has one, and the
disadvantages outweigh the advantages. In my next post, I’ll talk about some
ways to get the best of both worlds, but we start with one table for each list
of values.
If you create the FK constraint, you guarantee that the
value MUST come from the lookup table. No other value will work. This is
important for validating your data. It gives front-end developers information
about which lookup table to use for their list of values. And it also provides
information to code generators, like the ones for generating JPA Entity objects
in Eclipse and NetBeans, or the one for generating ADF Business Components in
JDeveloper. Some front-end developers might complain that this removes the
flexibility for users to enter values that are not in the lookup tables. My
feeling is that you can give users that flexibility, if you include the ability
to add a new value to the lookup table as part of your user interface. That
way the FK is still satisfied, and if one user needed a value that wasn’t in
the table, probably other users need that value too.
Structure of Lookup Tables
Just because each lookup gets its own table, doesn’t mean
that each lookup table gets a distinct structure. Instead, I want every lookup
table to have the same columns in the same order. Beyond the basic CODE and DESCRIPTION
columns, I like to add:
- DISPLAY_SEQUENCE – the order in which this value is to appear in a dropdown or LOV. There might be a unique key constraint on this column, but there doesn’t need to be. In my State lookup table, I use the U.S. government’s numeric state codes, like 51 for Virginia, for this field.
- ACTIVE – a Boolean value that says whether this value is currently available for use in a dropdown or LOV. In my databases, this is a ‘Y’ for yes, or ‘N’ for no, but you may prefer 1 or 0, and some non-Oracle databases allow an actual boolean true or false. In any case, the default for ACTIVE is true. This is needed because of the FKs pointing to the lookup table. As time goes on, you may no longer need a value in the table, but there may still be old data that references it. So you can’t delete rows from the lookup table. You need a way to remove it from your user interface, without actually deleting it.
- Sometimes, I need a short DESCRIPTION for dropdowns, but a longer one for LOVs and reports. So I often add a SHORT_DESCRIPTION column.
While I’m a big fan of surrogate primary keys for most
tables in an application, I prefer to make CODE a short natural key. It is
usually a character field. For instance, take my example of ‘VA’ as the code
for Virginia in my STATE lookup. When CODE is meaningful, you can look at the
data table that references the lookup, and know what it is referencing without
necessarily having to look at the lookup table. This saves a lot of time.
This is not possible with surrogate keys because they are by definition
meaningless. It also helps when you move the lookup table from database to
database – I’ve seen bad things happen when new surrogate keys get generated by
accident in the new database.
Lookup tables are usually fairly small. Notice that in my
standard definition, I have five columns at most. My State lookup table with
about 60 rows (It includes US Territories as well as States) is larger than
most – they average at about 10 rows. Access to a lookup table is almost
always by CODE, the primary key. In Oracle, this makes lookup tables ideal
candidates to be Index-Organized tables, tables where the data is stored in the
Primary Key index, instead of in separate storage. Less I/O means faster.
Naming Standards
It doesn’t matter much what your naming standard is, but it
does matter that you have one. One standard that I've seen work well is to
prefix all lookup tables with the same prefix, like "REF_" or
"LKUP_". That makes it obvious that this is, in fact, a lookup
table, and will make them all sort together in lists of your tables. Standard
naming is especially useful if you want to write some code generators, which
I'll talk about in my next post. I also want standard names for columns, but
occasionally, I'll write a VIEW of a lookup table just so that I can give some
different names to columns. Remember that the DISPLAY_SEQUENCE of my State
lookup table is being set to the numeric code for the State, and the CODE is
the Postal Abbreviation. So I write:
CREATE VIEW
state_codes (postal_abbrev, state_code, state_name) AS
SELECT code,
display_sequence, description
FROM
lkup_state
WHERE
active = 'Y';
This is a little easier for the developers, but still keeps
the underlying standard naming. Since it is a view of just one table and all
not null columns are included, you can even do DML on the view.
Lookup Data as Source Code
Finally, lookup tables should normally be populated as part
of creating the application database. There would normally be no such thing as
an empty lookup table. Therefore, I usually consider the data for the lookups
as much a part of the source code for the application as the DDL for creating
the database. That means that lookup data should be in a source code version
control system such as Git or Subversion – whatever you use for the rest of
your source code. But this poses a special problem – lookup data is not
static. New rows are added occasionally, formerly ACTIVE rows are updated to
be inactive, and sometimes the descriptions change. It isn't just the
developers who make these changes. You will often give certain users the
ability to maintain the lookup tables. So how do you check those changes into
your source code repository?
One way is to run a regular job to look for changes, export
the data in a known standard form in a working copy of your repository, and
then commit the changes. Or you could be a little more sophisticated, and use
a trigger to queue a job only when it is needed. Some people don't actually
store lookup tables in the database. Instead they use external tables where
that actual data is in files outside of the database. If those files are in a
working copy, you can make changes and commit those changes as you would for
any other source code. As long as you are aware that lookup data is part of
the source code, you will come up with a way to handle it that works for you.
Conclusion
Lookup tables are an important part of your application
database. I hope that my re-thinking about them has helped you too. Next
time, I'll have a little code for generating an updatable view which can help
your developers create a user interface for maintaining your lookup tables.
No comments:
Post a Comment