JCFlack's Database Development Journal: RDBMS

Showing posts with label RDBMS. Show all posts

Wednesday, October 3, 2018

Source Control for and deployment of your Database

I just read Jeff Smith's latest blog entry about exporting the DDL for your Oracle database with Oracle SQL Developer. You can do a similar export with TOAD - both the original version for Oracle and the version for SQL Server. Other tools have this ability too. Jeff hints at why you might want to use this feature - to put your database under source control.

I have had the experience of walking into a shop where my predecessor database designer/developer had not done this, and I was shocked. The application developers - the Java programmers, the JavaScript framework developers, the UI developers were all very careful to keep their source code in a source code repository, either with Subversion or Git, so I did not understand why the database guy/gal hadn't done the same with the code to create the database objects used by the application. DDL is SOURCE CODE, and it belongs in the source control system. So if you haven't done it already, follow Jeff's instructions (here) or the instructions for doing the same thing with your favorite database IDE and then put the resulting DDL code into your source code repository. If you don't know how to use your shop's source code control tools, ask your fellow developers for help (by the way, this is a nice way to ingratiate yourself with the developers), or ask your favorite search engine. What? Your shop doesn't do source control? Convince the powers that be that this is ESSENTIAL to a good application development project and there are good tools available at no cost. Do it NOW - I'll wait.

There is one more thing that goes in source control: scripts to load data into the look-up and reference tables. These might be SQL scripts with INSERT commands. I do that, and the scripts go in a folder named "Data". Or in Oracle, you might have data files (maybe CSV) with the SQL*Loader control files to load the data. Your database IDE like SQL Developer can help you with this if the data is already in your development database. If reference or look-up tables are ever changed by users, you will need to update this data and check the changes into source control. You need a plan for propagating the latest changes to all environments. The key is that you should consider the data in these tables as if it was DDL.

So now, you have captured all the DDL for the database objects your application uses and put it in source code control. What next? Most of the tools that let you export your DDL, also create a script to run the DDL. This would let you re-create your objects in a new database, such as when it is time to deploy to the test database. We always have at least four environments - development, test, staging (sometimes called pre-production), and production. I have been in shops where the production database is under the control of a production database administrator (DBA), and the development DBA (often me) does not have rights to run DDL in production. The process is that when a production release is authorized, the production DBA does the database deployment. So my deployment script is always thoroughly tested, and written so that even someone who knows NOTHING about the application can run it. If your tool doesn't write a deployment script, you'll need to write it by hand. I often use (in Windows) the DIR command to list the DDL files, spooling the output to a file, then edit the file. And yes, deployment scripts go to source control - mine go in a folder named "Scripts".

So what about database deployment? The initial deployment is easy. The deployment script just runs ALL the DDL and data load scripts. But subsequent deployments are a bit harder. Views and stored procedures can use the CREATE OR REPLACE syntax (or CREATE OR ALTER in SQL Server) Once a table has been created and contains data, you can't just DROP it and CREATE a new version. You need to run ALTER commands to make some changes, and for bigger changes you might need to do something like:
CREATE TABLE my_table_new ...
INSERT INTO my_table_new (...)
SELECT ...
FROM my_table
DROP my_table;
RENAME my_table_new TO my_table;
I will usually name my deployment script for each new database release with the name of the release, like "myapp_2.1_DEPLOY.sql". Since we use JIRA to manage tasks, I will have a script named after the JIRA task that does the changes associated with that task, such as "MYAPP-123.sql". If the changes involve DDL for an object that can use "CREATE OR REPLACE" syntax, MYAPP-123.sql will simply call the new version of the DDL script for that object. If not, MYAPP-123.sql will contain the ALTER (and other) commands needed. When I create the release deployment script, it calls each of the scripts for the JIRA tasks included in that release. All of these go in the Scripts folder. I also tend to update the CREATE scripts for tables, even if deployment will not be recreating the table - that way I can create a new empty database without running all the ALTER commands.

By the way, if you are using Oracle 11gR2 or later, you probably want to use Edition Based Redefinition in your deployment scripts. This is not used as often as it should be, and will let you do your database deployments in production with no downtime.

Hope this inspires you to put you DDL under source control, and give you some ideas how to organize your database deployments.

Thursday, March 29, 2018

More on my Oracle to SQL Server Transition

I'm continuing to adjust to this new world of Microsoft SQL Server. Remembering my great experiences in the Oracle user community, I've joined PASS, which is the international user group for SQL Server users. One thing PASS offers is "SQL Saturday" which are one day (yes, on Saturday) training events held by local groups all over the world. There are a few SQL Saturdays being held somewhere almost every weekend. On March 24, I drove the 100 miles from my home to Glen Allen, VA (northwest of Richmond) for the Richmond SQL Saturday. It was great, I got to meet a few people, and learned at least two things that will be useful in the work I'm doing right now.

About Table Storage and Indexes

Like Oracle, it is very important to have a Primary Key constraint on all of your tables in SQL Server. By default, SQL Server wants to make this a clustered index. "What is a clustered index?" my Oracle expert friends will ask. In a clustered index, the data from the row is stored with the index entry. That means that a table may only have one clustered index. Oracle experts will recognize this as essentially the same as an Index-Organized Table (IOT). IOTs aren't used that much in Oracle, but maybe they should be. If you often query a table by its primary key, or often do UPDATE or DELETE by primary key, you might get a good performance boost by making it an IOT.

At SQL Saturday, Richmond, I went to a great presentation about indexing, "Indexing for Performance" by Jeffrey Garbus. Mr. Garbus described very well the access path used for various WHERE conditions, with and without indexes, and differences between using a clustered and a non-clustered index. A non-clustered index is a B-tree index in SQL Server, just like a normal index in Oracle and is used in a similar way. In both databases, the optimizer decides whether to use an index or to do a full table scan according to similar criteria, and in fact, may decide not to use an index in situations such as the likelihood that more than a certain percentage of the rows will be returned by the query.

One very important difference between an IOT in Oracle and a table with a clustered index in SQL Server is that an IOT is always indexed with its primary key. While the primary key gets an unique clustered index by default in SQL Server, it is possible to specify that the primary key should get an unique non-clustered index. This leaves it open to create a clustered index on another key, and that index need not be unique. Since a clustered index makes the database store the rest of the data with the index, only one clustered index may be created on a table. Mr. Garbus explained why, for certain tables, it might be better to cluster on a parent key. For instance, in an Order-Entry application, line items for an order are rarely accessed any other way than by the parent order_id. Therefore, it is often a good idea to cluster the order_line_item table on this foreign key, rather than its primary key (order_line_item_id).

This reminded me of another Oracle feature that I believe may be under-used. While IOTs can only use a table's primary key, you can actually store master table data like the "order" table physically next to its detail rows in "order_line_item". Oracle has clustering too, but the syntax and implementation is different. Check out the CREATE CLUSTER command and its use in the Database Concepts Guide.

Tuesday, February 28, 2017

Same blog, new location

This is the continuation of my blog at JJFlash's Oracle Development Journal - the story of an old developer learning new tricks. I'll be posting here odds and ends about Relational Database design and development, especially Oracle RDBMS, Oracle Application Development Framework, Java, and whatever else comes to my mind as an IT expert.

I hope you enjoy it.