Saturday, December 12, 2009

Database Archival Utility

1. Introduction

In any enterprise class application physical delete of data is not a choice. There exist only logical delete. But logical delete brings its own challenges. Logical delete of data brings in ever growing tables which keep on slowing the database.
To overcome problem of growing data, in my recent application, I have architected and designed a utility which will move physically deleted data from application tables to archival tables. Archival tables essentially keep deleted records.
This utility is in two parts. Part 1, converts database schema in an XML file where it is marked that which tables to be archived and which are not. Part 2 of the utility does actual movement of data.

2. Database Schema

Database schema can be though of consists of three categories of tables:
• Master-Master Table
• Master Table
• Transaction tables

Master-Master tables are those tables which contain seed data for an application. This data is independent of any user and its operations. Example - Table containing the name of Countries etc.
Master Tables are those tables which are pivotal to an application and hold information like users etc.
Transaction tables are those tables which stores information about all the transaction performed by/on the entities stored in Master tables.

The relationship among tables can be thought of two types:
• Imported: primary keys of other tables referenced
• Exported: tables which refer table1’s primary key

This relationship among tables can grow up to any depth.
To understand please refer following picture:



Table table1 has imported primary key from table2, which in turn has imported keys from table5 and table6 and exported its primary key to table3 and table4. Table table1 also exported its primary key to table7 which in turn has imported primary keys from table8 and table9 and exported its primary key to table10 and table11.

Apart from these relationships, a well designed database schema must not have any circular relationships among tables.

3. Utility Parts

Data archival utility consists of two parts.
• Part 1- Database Relation XML generator Utility: This utility create an XML representation of Database Schema where some one marks each table into said categories (Master-Master, Master and Transaction tables)

• Part 2- Database Archival Utility: This utility is designed for moving, logically deleted records from the Main Database of application to the Archival database of application. DB Archival utility archives records for:
• Master tables
• Transaction tables in Xing

As Master-Master tables contain only static records, its unnecessary burden on utility to move these records to archival database. Instead it is left to the database administrator to copy records from Main Database’s Master-Master tables to Archival database’s Master-Master tables.

4. Database Relation XML generator Utility

Relation XML generator reads a property file to capture credentials to connect to Main database for generating table relations. This property file contains information about the main database. Structure of property files are described at end of Description section of this document.

Relation XML generator reads another property file to identify table for which relation XML is to be generated.

This Utility must be executed after every schema update.

5. Database Archival Utility

DB Archival utility is architected and designed for moving, deleted records from the Main Database to the Archival database. DB Archival utility archives records for:
• Master tables
• Transaction tables

As Master-Master tables contain only static records, its unnecessary burden on utility to move these records to archival database. Instead it is left to the database administrator to copy records from Main Database’s Master-Master tables to Archival database’s Master-Master tables.
DB Archival utility archive records table wise.

Database Archival utility reads a property file to capture credentials to connect to Main database. It also reads another property file to capture Archival database credentials to connect to Archival database.

Database archival utility runs continuously but archives data on specified time only. This time can be specified in a scheduling property file. This file provides information about when to move the deleted records to Archival database.

Database archival utility archives only those deleted records for which the specified timeframe (delay time) has expired since the record was deleted. Currently this timeframe is common for all tables. This timeframe (Delay time) is specified in a property file.

Database Archival utility reads property file to identify tables, which are to be archived. Then for each table, it reads its relationship XML generated by Database Relationship XML Generator, and moves the deleted records to Archival database tables from Main database tables.

Copy and Delete operations

• DB archival utility copies all imported records to archival database tables to avoid any database constraint failure.
• DB archival utility first copies all exported records to archival database tables then delete them from main database tables.

E.g. Lets assume table1 is specified for database archival
.
Exported and Imported tables for table1 are specified in a XML file, created by DB relation XML generator.

DB Archival utility finds all the rows in table1 to be archived i.e. rows in table where status is “Delete” and these deleted records are older than the delay time specified.

DB archival utility reads the XML file for table1 for finding all the tables which are referenced by table1 (Exported Table List) and all tables referenced by table1 (Imported table List).

DB Archival utility copies data of related imported relations to respective archive tables for a particular table (table1).

For exported relation’s DB Archival utility copies to respective archival table and delete the records from main table and all subsequent tables. If any table is imported in exported Table of table1 then its record is only copied to respective archival table.

DB archival utility copy imported table’s data accessed by records to be archived in the table. This procedure is recursive and copies all the data down the hierarchy in the imported table list.

Then DB archival utility copies the exported table’s data to archival table and deletes them from main tables. This procedure is recursive and copies all the data down the hierarchy in the exported table list.

6. Property File Structure

Five property files are used in Database Relation XML generator and DB Archival Utility. Structure of all property files is described below:

DBCredentials.properties: This property file is read by both utilities to connect to Main database. This file have following keys, all values must be specified and supports only single values.
a. DATABASE_DRIVER
b. DATABASE_IP
c. DATABASE_PORT
d. DATABASE_NAME
e. DATABASE_USER_ID
f. DATABASE_USER_PASSWORD
g. DATABASE_PROTOCOL

ArchiveDBCredentials.properties: This property file is read by DB Archival utility to connect to Archival database. This file have following keys, all values must be specified and supports only single values.
a. DATABASE_DRIVER
b. DATABASE_IP
c. DATABASE_PORT
d. DATABASE_NAME
e. DATABASE_USER_ID
f. DATABASE_USER_PASSWORD
g. DATABASE_PROTOCOL

Scheduler.properties: This property file holds information for Scheduler of Archival utility. This file have following keys:
a. SECONDS
b. MINUTES
c. HOURS
d. DAY-OF-MONTH
e. MONTH
f. DAY-OF-WEEK

TablesToArchive.properties: This property file is read by both utilities to identify Tables to archive. An additional value is read by Database Relation XML generator to identify Master-Master Tables. All other values are read by DB Archival utility. This property file have following keys:

a. TABLE_TO_ARCHIVE

b. MASTER_MASTER_TABLE_LIST

c. Delay time must be specified in property file against appropriate key. By default all keys are set to 0.
 archivalDelayTime_second=0
 archivalDelayTime_minute=0
 archivalDelayTime_hour=0
 archivalDelayTime_day=0

Delay time specifies the time duration after which deleted entity is moved to archival database’s table. Everything must be specified in INTEGER. The least time that can be specified is 0 second.

d. All table names in Main database as key and corresponding table name in archival database as value. All keys and values must be specified in small cases. Although only specifying tables which are in relationship with tables in TABLE_TO_ARCHIVE key will suffice, but to reduce probability of any error provide all tables.

e. Column names of last modified time and current status for all tables to archive must be specified in following manner:
 tableName_LAST_MODIFIED_TIME
 tableName_CURRENT_STATUS

tableName must be provided in small cases. Values for above key must be columns representing LAST_MODIFIED_TIME and CURRENT_STATUS column in respective tables.

7. Assumption in database archival utility

• All relationships in database are maintained at Database level not at code level.
• Archival Database structure will be same as Main database. All relationships will be maintained in archival database too.
• Archival tables must contain one and only one additional column for archival timestamp except for Master-Master table.
• Column names in archival tables must be same as column names in main database table except the archival timestamp column. All other RDBMS components (trigger, stored procedures etc) may not be needed in archival database as it contains only deleted records. If in any case (e.g. reporting etc) any RDBMS components is/are needed they must be modified according to archival Database and ported manually or some other means. Above operation is out of scope for DB Archival utility.
• Tables to archive must be specified in a property file.
• Tables to archive must have a status column and last modification time column. Last modification time column must be stored as timestamp, if it is stored as date, then time part (HH:MM:SS) of date is considered as 00:00:00. Actual column names for status and last modified time must be specified in property file against key
o tableName_LAST_MODIFIED_TIME
o tableName_CURRENT_STATUS
• Archival table names must be specified in property file against there table names as key.
• Deleted records must have there status “Delete” or “delete” or “DELETE” or any possible lower and upper case version of “DELETE”.

No comments:

Post a Comment