Database re-engineering: the 3 phases to get it done.

For more details about this analysis, please have a look at the following IBM Redbooks (available on the IBM site):

  • Modernizing IBM i Applications from the Database up to the User Interface and Everything in Between
  • Tools and Solutions for Modernizing Your IBM i Applications

In the previous posts I analyzed the main steps of a deep modernization of IBM i (https://linkedibmi.com/2017/12/08/ibm-four-milestones-deep-modernization/) and I showed the main reasons to move from a program-centric model to a data-centric model, passing from the old DDS technology to modern SQL and DDL (https://linkedibmi.com/2017/12/17/ibm-modernization-extended-database-dds-versus-ddl/).

With this analysis I want to go deeper into the database evolution through the re-engineering process and the tools available to support this path.

IBM has developed a modern strategy, organized in a series of phases, to simplify the database modernization process and adopt Structured Query Language (SQL) and Data Definition Language (DDL). The new database will contain the items needed to ensure data integrity and to stay in line with the new business requirements. This approach is shown in the following figure:

The reasons for re-engineering are certainly rooted in business requirements, namely the need:

  • to move into new markets: this can create language support and currency problems for companies
  • to have better and more advanced query reporting: growth demands better data analysis to trace, study and develop business plans and to follow market needs
  • to access the application at any time, now that a global approach to e-commerce is a must-have
  • to raise the limits of the database: as a consequence of business growth, many companies require higher database capacity limits
  • to improve data duplication, integrity and business logic across the application: as I mentioned in the previous post, it is important to move most of the logic to the data level, avoiding data redundancy


The suggested framework for re-engineering is composed of the following layers (a sketch of how they map to SQL objects follows the list):

  • Physical data layer: it consists of the physical tables, constraints and indexes. This layer should not be accessed directly by programs; it is designed to hold data and ensure integrity. It is normalized to third normal form and uses identity columns as primary keys.
  • Data access layer: it contains stored procedures and service programs. It provides a single controlled point for writing, updating and deleting data, so programs must go through this layer to manipulate data.
  • Virtual layer: it consists of views that bring pieces of data together across physical tables. Views can be used to de-normalize the data in a way that programs can easily consume, and also to expose aggregated information that is not stored, such as a total amount.
  • Application layer: it contains the programs that present the information to the end user. These programs can be written in legacy or modern languages, including PHP, .NET and Java, and they work on data coming from the data access layer.
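
As a minimal sketch of how these layers can map to SQL objects on Db2 for i (the APPLIB schema and the order tables are hypothetical, not taken from the Redbooks):

    -- Physical data layer: normalized tables, never accessed directly by programs
    CREATE TABLE APPLIB.ORDER_HEADER (
      ORDER_ID    INTEGER GENERATED ALWAYS AS IDENTITY,
      CUSTOMER_ID INTEGER NOT NULL,
      ORDER_DATE  DATE NOT NULL,
      CONSTRAINT ORDER_HEADER_PK PRIMARY KEY (ORDER_ID)
    );

    CREATE TABLE APPLIB.ORDER_DETAIL (
      ORDER_ID   INTEGER NOT NULL,
      LINE_NO    INTEGER NOT NULL,
      QUANTITY   DECIMAL(9, 2) NOT NULL,
      UNIT_PRICE DECIMAL(11, 2) NOT NULL,
      CONSTRAINT ORDER_DETAIL_PK PRIMARY KEY (ORDER_ID, LINE_NO)
    );

    -- Virtual layer: a view that exposes an aggregate that is not stored
    CREATE VIEW APPLIB.ORDER_TOTALS AS
      SELECT H.ORDER_ID,
             SUM(D.QUANTITY * D.UNIT_PRICE) AS ORDER_TOTAL
        FROM APPLIB.ORDER_HEADER H
        JOIN APPLIB.ORDER_DETAIL D ON D.ORDER_ID = H.ORDER_ID
       GROUP BY H.ORDER_ID;

    -- Data access layer: a stored procedure as the controlled write point
    CREATE PROCEDURE APPLIB.INSERT_ORDER (
      IN P_CUSTOMER_ID INTEGER,
      IN P_ORDER_DATE  DATE)
    LANGUAGE SQL
    BEGIN
      INSERT INTO APPLIB.ORDER_HEADER (CUSTOMER_ID, ORDER_DATE)
        VALUES (P_CUSTOMER_ID, P_ORDER_DATE);
    END;

The application layer would then call INSERT_ORDER and read ORDER_TOTALS, never touching the physical tables directly.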

 

Phase 0: Discovery, study and tools

The main mission of Phase 0 is to discover whether your database has been designed correctly with a relational database approach, studying whether all relationships are declared through Referential Integrity (RI) constraints.

In this phase you should also assess how much you know, what tools you have and what you need to carry out the modernization by re-engineering (more resources, education, tools, and so on).

In this phase each table must be reviewed to satisfy at least these core requirements (a DDL sketch follows the list):

  • Each table is defined by using SQL Data Definition Language (DDL).
  • The table contains a unique key constraint (also known as the business key).
  • If the table is the parent in a parent-child relationship, it contains a primary key constraint (also known as the surrogate key).
  • The table contains time stamp columns that identify the date and time the row was inserted into the table and when the last change occurred to the row.
  • The table is being logged by using journal support.
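
A minimal DDL sketch of a table that satisfies these requirements (all names are hypothetical; the last change column uses the Db2 for i row change time stamp support):

    CREATE TABLE APPLIB.CUSTOMER (
      -- surrogate key, used as the primary key
      CUSTOMER_ID   INTEGER GENERATED ALWAYS AS IDENTITY,
      -- business key, enforced by a unique constraint
      CUSTOMER_CODE CHAR(8) NOT NULL,
      CUSTOMER_NAME VARCHAR(50) NOT NULL,
      -- time stamps recording when the row was inserted and last changed
      CREATED_TS    TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP,
      CHANGED_TS    TIMESTAMP NOT NULL GENERATED ALWAYS
                      FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP,
      CONSTRAINT CUSTOMER_PK PRIMARY KEY (CUSTOMER_ID),
      CONSTRAINT CUSTOMER_UK UNIQUE (CUSTOMER_CODE)
    );

    -- Tables created in an SQL schema are journaled automatically on IBM i;
    -- otherwise, journaling can be started with STRJRNPF.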

It is important to have the necessary skills to develop this re-engineering pattern, and IBM provides education support on DB2 that you can consult at http://www.ibm.com/systems/power/software/i/db2/education/index.html

The following database modernization tools are also available:

 

There are also third-party solutions, such as:

 

Phase 1: Transformation

Phase 1 reviews the current state of the physical data model (PDM) from Phase 0, to make sure that relationships are correctly defined. The PDM can then be used to build the Logical Data Model (LDM), which will be useful for each new database enhancement. This step is heavy to carry out, so it is better to use a graphical data modelling tool and to split the entire work into multiple manageable projects.

The tasks that must be completed are the following (a sketch of the surrogate objects follows the list):

  • create an LDM (optional)
  • create the DDL for the new SQL tables
  • create surrogate files, indexes and views
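
On IBM i the surrogate objects typically reproduce the access paths and record layouts of the legacy files, so existing programs can keep working while the data moves to the new SQL tables. A minimal sketch, assuming the hypothetical CUSTOMER table above replaces a legacy CUSTF physical file:

    -- index that reproduces the keyed access path of the legacy file
    CREATE INDEX APPLIB.CUSTOMER_IX1 ON APPLIB.CUSTOMER (CUSTOMER_CODE);

    -- surrogate view that reproduces the legacy record layout, so existing
    -- programs can be repointed to it with minimal changes
    CREATE VIEW APPLIB.CUSTF AS
      SELECT CUSTOMER_CODE AS CFCODE,
             CUSTOMER_NAME AS CFNAME
        FROM APPLIB.CUSTOMER;

Db2 for i also allows a RCDFMT clause on CREATE VIEW when existing programs depend on the record format name.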


The creation of the LDM can easily be done with an automated tool.

The individual subtasks performed during the creation of the LDM are listed here:

  • Create and update the domain model with the core fundamental items.

In this task we define a domain model (data dictionary) that collects all the attributes used for the columns of the tables (a sketch of one way to make a domain concrete in SQL follows this list).

  • Create and update the glossary model with naming standards and abbreviations.

This task lets you define a standard for naming and abbreviations, for compliance and documentation purposes.

  • Add the core fundamental items to the LDM entities.

With this task, primary/foreign keys, row change time stamps and other missing core fundamental items are added to the LDM tables.

  • Transform the LDM to a physical data model.

When all the core fundamental attributes have been added to the LDM, we need to transform the LDM into a Physical Data Model (PDM), to ensure that the attributes are transformed correctly and to generate the DDL of the new tables.
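
As an illustration of the domain model idea, a domain from the data dictionary can be made concrete at the database level with an SQL distinct type; a minimal sketch with hypothetical names:

    -- a "money" domain defined once in the data dictionary
    CREATE DISTINCT TYPE APPLIB.MONEY AS DECIMAL(11, 2);

    -- the domain is then reused consistently wherever an amount is stored
    CREATE TABLE APPLIB.INVOICE (
      INVOICE_ID INTEGER GENERATED ALWAYS AS IDENTITY,
      AMOUNT     APPLIB.MONEY NOT NULL,
      CONSTRAINT INVOICE_PK PRIMARY KEY (INVOICE_ID)
    );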

The following figure summarizes these steps:

 

Phase 2: Isolation

The objective of this phase is to create a virtual data, or abstraction, layer. The layer is created by decoupling the HLL program database code from the presentation and business logic. The process includes the following tasks (a sketch of an I/O module follows the list):

  • create SQL views of the database (an iterative task of Phase 1 too)
  • create SQL I/O modules that contain SQL procedures to access the SQL views
  • create a handler program to intercept the HLL program I/O operations and run the appropriate SQL service program I/O function
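
A minimal sketch of such an SQL I/O procedure, reading through the hypothetical surrogate view above; a handler (for example, one built with Rational Open Access: RPG Edition) would call routines like this instead of performing record-level I/O:

    -- SQL I/O module: fetch one customer row through the surrogate view
    CREATE PROCEDURE APPLIB.GET_CUSTOMER (
      IN  P_CODE CHAR(8),
      OUT P_NAME VARCHAR(50))
    LANGUAGE SQL
    BEGIN
      SELECT CFNAME
        INTO P_NAME
        FROM APPLIB.CUSTF
       WHERE CFCODE = P_CODE;
    END;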

 

Phase 3: Integrity

In this phase we add relationships and dependencies between the files of our model. This lets us move part of the program logic to the database level. There are several ways to add and enforce business rules at the database level, including constraints, routines (functions, triggers and procedures), views and so on. Regarding constraints, the following types exist (a sketch follows the list):

  • UNIQUE (forbids duplicate values in one or more columns)
  • PRIMARY KEY (same as UNIQUE; however, NULL values are not allowed)
  • FOREIGN KEY (also known as a referential constraint; defines the logical rules that are imposed on INSERT, UPDATE and DELETE operations to ensure integrity between data relationships)
  • CHECK (a logical rule that ensures data integrity at the column level)
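
A minimal sketch of adding such constraints to the hypothetical tables used in the earlier examples:

    -- referential constraint: every order must belong to an existing customer
    ALTER TABLE APPLIB.ORDER_HEADER
      ADD CONSTRAINT ORDER_CUSTOMER_FK
          FOREIGN KEY (CUSTOMER_ID)
          REFERENCES APPLIB.CUSTOMER (CUSTOMER_ID)
          ON DELETE RESTRICT;

    -- check constraint: a column-level business rule
    ALTER TABLE APPLIB.ORDER_DETAIL
      ADD CONSTRAINT ORDER_DETAIL_QTY_CK CHECK (QUANTITY > 0);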

 
