Thursday, August 24, 2017

Code Generation for Lookup Tables - Part 1


In my last post, I outlined some of my thoughts about lookup tables, and how they should be structured and organized. Now I want to write some code generators, to make it easier to create and manage lookup tables.  Since I want lookup tables to have the same columns and follow some naming conventions, why should I write the same basic code every time I create a new lookup table?  While my thoughts about lookup tables should probably apply to just about any relational database, stored procedure code is not as portable.  I work in an Oracle database, so all the code I'm going to show you is written in PL/SQL.  But you may be able to do something similar in the procedural language for your database.

An Index to the Lookup Tables

One thing I like to have is a table to store a list of all the lookup tables in the schema.  This isn't absolutely necessary, if you use a naming convention for your lookup table names, since you could write a view of a catalog view, like Oracle's USER_TABLES.  However, I'm going to create a real table named LOOKUP_TABLES:
CREATE TABLE lookup_tables (
  Table_name VARCHAR2(30),
  Decription VARCHAR2(255)
);

So my code generator is going to insert a row into this table anytime that I create a new lookup table, and delete that row anytime I drop a lookup table.  Once you start using your lookup tables, and start having foreign keys that reference them, you probably aren't going to drop any lookup tables, but you might during development, so I'll include a drop.

The Code Generator Package

In PL/SQL, I tend to write packages for all of my stored procedures, with few or no stand-alone functions or procedures.  While I'm going to show the procedures separately, you may assume that they are all inside a package named LOOKUP_UTIL.  There are two special characteristics of LOOKUP_UTIL.
In all of my databases, I have created a schema named COMMON to hold all database objects that are shared among many application schemas and users.  Most of these have EXECUTE or SELECT granted to PUBLIC, and many have public synonyms.  Since many applications may generate lookup tables, LOOKUP_UTIL would correctly be created in the COMMON schema.  By the way, some lookup tables might be shared between applications – my lookup table for U.S. States and Territories would be a prime example of this.  It might also be best created in COMMON.
 The second characteristic of LOOKUP_UTIL is that it is an invoker's rights package, using the "AUTHID CURRENT_USER" clause.  That means that it executes with the database privileges of the user (or schema) that called it.  It is going to create tables and other database objects – it should only do that in the current user's schema, and using privileges granted to that user, not COMMON, the owner of the package.  Many of the packages in the COMMON schema share this characteristic.

Creating and Dropping Lookup Tables

Here is the code for creating a new lookup table.  It creates the table with the standard columns as I explained in my previous post and adds a row to LOOKUP_TABLES.  The same description to be stored in LOOKUP_TABLES is also used as the table comment.  There are a few things that are here for demonstration purposes that might not be in real code.  For instance, I am using DBMS_OUTPUT to show the generated code in case the code gets an error upon execution.  In the real code, these would probably be in a call to a logging package.  I use LOGGER (https://github.com/OraOpenSource/Logger), and I highly recommend it.  If you have users who create lookup tables in schemas other than their own, with the CREATE ANY privilege, you may want to add an "owner" parameter.
PROCEDURE new_lookup_table (
  table_name        lookup_tables.table_name%TYPE,
  description       lookup_tables.description%TYPE DEFAULT NULL,
  tablespace_name   user_tablespaces.tablespace_name%TYPE DEFAULT NULL
)
AS
  v_table_name  lookup_tables.table_name%TYPE;
  create_cmd    VARCHAR2 (2000);
BEGIN
  IF table_name IS NULL THEN
    raise_application_error (
      -20000,
      'Table name of new lookup table may not be null.'
    );
  ELSIF UPPER (table_name) LIKE 'LKUP_%' THEN
    v_table_name  := UPPER (table_name);
  ELSE
    v_table_name  := 'LKUP_' || SUBSTR (UPPER (table_name), 1, 25);
  END IF;

  BEGIN
    INSERT INTO   lookup_tables (table_name, description)
         VALUES   (v_table_name, description);
  EXCEPTION
    WHEN DUP_VAL_ON_INDEX THEN
      raise_application_error (
        -20002,
        'Lookup table named ' || v_table_name || ' already exists.'
      );
  END;

  create_cmd      :=
       'CREATE TABLE '
    || v_table_name
    || '(
  code              VARCHAR2(10) NOT NULL,
  short_description VARCHAR2(30),
  description       VARCHAR2(255),
  display_sequence  NUMBER,
  active            CHAR(1) DEFAULT ''Y'' NOT NULL,
  CONSTRAINT '
    || SUBSTR (v_table_name, 1, 27)
    || '_PK PRIMARY KEY (code))'
    || '
ORGANIZATION INDEX'
    || CASE
         WHEN tablespace_name IS NOT NULL THEN
           '
TABLESPACE ' || tablespace_name
         ELSE
           NULL
       END;

  BEGIN
    EXECUTE IMMEDIATE create_cmd;
  EXCEPTION
    WHEN OTHERS THEN
      DBMS_OUTPUT.put_line (create_cmd);
      RAISE;
  END;

  EXECUTE IMMEDIATE
    'COMMENT ON TABLE ' || v_table_name || ' IS ''' || description || '''';
END new_lookup_table;

The code for dropping a lookup table is similar.  Note that my version includes the CASCADE CONSTRAINTS clause, so that if there are foreign keys referencing the lookup table being dropped, those constraints will be dropped too.  You might prefer not to use this clause and allow the drop to fail if there are references to the table.
PROCEDURE drop_lookup_table (table_name lookup_tables.table_name%TYPE)
IS
  v_table_name  lookup_tables.table_name%TYPE;
  drop_cmd      VARCHAR2 (2000);
BEGIN
  IF table_name IS NULL THEN
    raise_application_error (
      -20001,
      'Table name of lookup table to be dropped may not be null.'
    );
  ELSIF UPPER (table_name) LIKE 'LKUP_%' THEN
    v_table_name  := UPPER (table_name);
  ELSE
    v_table_name  := 'LKUP_' || SUBSTR (UPPER (table_name), 1, 25);
  END IF;

  DELETE FROM   lookup_tables
        WHERE   UPPER (table_name) = v_table_name;

  IF SQL%ROWCOUNT = 0 THEN
    raise_application_error (
      -20003,
         'Table named '
      || v_table_name
      || ' does not exist in the lookup table catalog.'
    );
  END IF;

  drop_cmd  := 'DROP TABLE ' || v_table_name || ' CASCADE CONSTRAINTS';

  BEGIN
    EXECUTE IMMEDIATE drop_cmd;
  EXCEPTION
    WHEN OTHERS THEN
      DBMS_OUTPUT.put_line (drop_cmd);
      RAISE;
  END;
END drop_lookup_table;

Next Time

That's enough generation code for now.  I'll continue in my next post with some reasons for simulating a single table of tables with a view.  I'll show you the code for generating that view, and for generating an API to make the view updatable.

Tuesday, August 8, 2017

Lookup Tables Revisited

More than ten years ago, I wrote a paper and did an ODTUG presentation about reference tables.  These are also known as lookup tables or code tables.  Some of what I wrote then I would still say today, but other parts were probably wrong then and definitely wrong now.  So, I thought I'd revisit the topic and try to lay out what I think about this kind of data today.
The most common use of a lookup table is for presenting users with a dropdown or list of values (LOV) page for setting a value in the application.  So in its simplest form, it is a table with just two columns, a code, which would be the primary key, and a description to be shown to the user in the list of values.  For instance, you might have a lookup for the States of the United States, where the U. S. Postal Services two character official State abbreviation is the code.  The row for my home State of Virginia would contain: CODE=VA and DESCRIPTION=Virginia.  In my ADDRESS table, I might have a column called STATE that would be set from this lookup.
Here are a few principles that I want reference/lookup/code tables to follow.  These are not hard and fast rules, and I’m sure there are some valid exceptions, but unless I can define a specific exception for a particular use case, I like to follow these.

Lookups as Foreign Keys

When you create a table with a column to be set from a lookup table, define a foreign key (FK) constraint.  That means that lookup tables need to be actual tables.  This is a major departure from my former opinion that there should be a single lookup table that has a key to subdivide it into separate lookups.  There are some nice advantages to the single lookup table, but I am currently working with an application that has one, and the disadvantages outweigh the advantages.  In my next post, I’ll talk about some ways to get the best of both worlds, but we start with one table for each list of values.
If you create the FK constraint, you guarantee that the value MUST come from the lookup table.  No other value will work.  This is important for validating your data.  It gives front-end developers information about which lookup table to use for their list of values.  And it also provides information to code generators, like the ones for generating JPA Entity objects in Eclipse and NetBeans, or the one for generating ADF Business Components in JDeveloper.  Some front-end developers might complain that this removes the flexibility for users to enter values that are not in the lookup tables.  My feeling is that you can give users that flexibility, if you include the ability to add a new value to the lookup table as part of your user interface.  That way the FK is still satisfied, and if one user needed a value that wasn’t in the table, probably other users need that value too.

Structure of Lookup Tables

Just because each lookup gets its own table, doesn’t mean that each lookup table gets a distinct structure.  Instead, I want every lookup table to have the same columns in the same order.  Beyond the basic CODE and DESCRIPTION columns, I like to add:
  • DISPLAY_SEQUENCE – the order in which this value is to appear in a dropdown or LOV.  There might be a unique key constraint on this column, but there doesn’t need to be.  In my State lookup table, I use the U.S. government’s numeric state codes, like 51 for Virginia, for this field.
  • ACTIVE – a Boolean value that says whether this value is currently available for use in a dropdown or LOV.  In my databases, this is a ‘Y’ for yes, or ‘N’ for no, but you may prefer 1 or 0, and some non-Oracle databases allow an actual boolean true or false.  In any case, the default for ACTIVE is true.  This is needed because of the FKs pointing to the lookup table.  As time goes on, you may no longer need a value in the table, but there may still be old data that references it.  So you can’t delete rows from the lookup table.  You need a way to remove it from your user interface, without actually deleting it.
  • Sometimes, I need a short DESCRIPTION for dropdowns, but a longer one for LOVs and reports. So I often add a SHORT_DESCRIPTION column.
While I’m a big fan of surrogate primary keys for most tables in an application, I prefer to make CODE a short natural key.  It is usually a character field.  For instance, take my example of ‘VA’ as the code for Virginia in my STATE lookup.  When CODE is meaningful, you can look at the data table that references the lookup, and know what it is referencing without necessarily having to look at the lookup table.  This saves a lot of time.  This is not possible with surrogate keys because they are by definition meaningless.  It also helps when you move the lookup table from database to database – I’ve seen bad things happen when new surrogate keys get generated by accident in the new database.
Lookup tables are usually fairly small.  Notice that in my standard definition, I have five columns at most.  My State lookup table with about 60 rows (It includes US Territories as well as States) is larger than most – they average at about 10 rows.  Access to a lookup table is almost always by CODE, the primary key.  In Oracle, this makes lookup tables ideal candidates to be Index-Organized tables, tables where the data is stored in the Primary Key index, instead of in separate storage.  Less I/O means faster.

Naming Standards

It doesn’t matter much what your naming standard is, but it does matter that you have one.  One standard that I've seen work well is to prefix all lookup tables with the same prefix, like "REF_" or "LKUP_".  That makes it obvious that this is, in fact, a lookup table, and will make them all sort together in lists of your tables.  Standard naming is especially useful if you want to write some code generators, which I'll talk about in my next post.  I also want standard names for columns, but occasionally, I'll write a VIEW of a lookup table just so that I can give some different names to columns.  Remember that the DISPLAY_SEQUENCE of my State lookup table is being set to the numeric code for the State, and the CODE is the Postal Abbreviation.  So I write:
CREATE VIEW state_codes (postal_abbrev, state_code, state_name) AS
  SELECT code, display_sequence, description
    FROM lkup_state
   WHERE active = 'Y';
This is a little easier for the developers, but still keeps the underlying standard naming.  Since it is a view of just one table and all not null columns are included, you can even do DML on the view.

Lookup Data as Source Code

Finally, lookup tables should normally be populated as part of creating the application database.  There would normally be no such thing as an empty lookup table.  Therefore, I usually consider the data for the lookups as much a part of the source code for the application as the DDL for creating the database.  That means that lookup data should be in a source code version control system such as Git or Subversion – whatever you use for the rest of your source code.  But this poses a special problem – lookup data is not static.  New rows are added occasionally, formerly ACTIVE rows are updated to be inactive, and sometimes the descriptions change.  It isn't just the developers who make these changes.  You will often give certain users the ability to maintain the lookup tables.  So how do you check those changes into your source code repository?
One way is to run a regular job to look for changes, export the data in a known standard form in a working copy of your repository, and then commit the changes.  Or you could be a little more sophisticated, and use a trigger to queue a job only when it is needed.  Some people don't actually store lookup tables in the database.  Instead they use external tables where that actual data is in files outside of the database.  If those files are in a working copy, you can make changes and commit those changes as you would for any other source code.  As long as you are aware that lookup data is part of the source code, you will come up with a way to handle it that works for you.

Conclusion

Lookup tables are an important part of your application database.  I hope that my re-thinking about them has helped you too.  Next time, I'll have a little code for generating an updatable view which can help your developers create a user interface for maintaining your lookup tables.