Database Normalization in depth

Database normalization is the rules to design of tables to minimize data redundancy i.e. data duplicate which ensure data consistency. It can also defined as dividing a single table in two or more and defining relationships between those tables.

Benefits of Database Normalization

  1. Avoid Repetitive entries
  2. Avoiding unnecessary disk space wastage
  3. Increase speed and flexible query.

 

 Table without Normalization

We have a Shipment table where whenever the customer books his/her shipment. The information are stored in database table as shown below:

1

So here in this table we store the application name from where the ShipmentRequest is coming, along with its hosted IP Address. These fields are duplicated in the table. We also have Commodity Type column also which remains consistent and duplicated in the table. We can see all these mentioned columns are repeated. Right now we are inserting all these details for each shipment in the table.

Drawbacks of this Design

  • Disk space wastage
  • Inconsistent Data
  • DML queries will become slow due

Disk space wastage

As we insert new records to the table. The disk space automatically increases rapidly for unnecessary data.

Inconsistent Data

If business team tells us that commodity Type Handicraft has been changed to Handicraft/Products and that should reflect in reports as well, we need to update the whole table wherever this Commodity Type is defined.

Slow DML queries

As discussed if we have to update the records we have to update all records which will lead to slow down of DML queries

Normalized structure of table

Below is the normalized tables design. We can normalize the main table to more extent which we will do after a while, But for understanding purpose lets check the same for now.

Main Shipment table Design
2

Commodity Type Table
3

Application table

4

Now we can clearly see all the duplicate records are not inserted into the table rather than we refer their respective id from their respective table. So now if we have to update the Commodity Type we need to update single row.

First Normal Form

The table to be in First Normal form the records should be single unit rather than duplicate. No multiple values or repetitions group of fields.

Sample ShipperDetails table where we have Shipper details and their respective GSTIN Number with state name as shown below:

5

According to 1NF we should not have repetitive group of fields. In our ShipperDetails table we have repetitive GSTIN and City Columns which are repetitive. We have some customer which has GSTIN number in only 1 state than respective other GSTIN columns are set to null hence resulting in wasted disk space. While in future our customer may increase his business and opt for other GST as well in that case we need to alter our tables and add the new GSTIN and city column. In order to implement 1NF we need to modify the table structure and make compliant to 1NF.

6

So we can see we have broken down the the single unit with Single GSTIN, State, CustomerFirstName and CustomerSecondName.

2NF

The table is state to be in Second normal form. If it fulfills the conditions for First Normal form. In Second normal form states that All the non-key columns are dependent on the table’s primary key.

In the above mentioned Table we have column name state as varchar type. If we have millions of records we would have repeative state name present in this table, We know there are 37 States as per GST regulations. So we will move our state column to a new table called statemaster with additional state details for our purpose as shown below:

7

Now we will change the State column from our [ShipperDetails]table to StateId as shown below with respect to the state present in [ShipperDetails] and create relationship between them using foreign key.

8

Create relationship between stateMaster and [ShipperDetails] table using Foreign key

9

Third Normal Form
A table is said to be in Third Normal form if meets all condition of First and Second normal form.

The table doesn’t contains column that are depends on other non-key field. i.e. columns that are not fully dependent upon Primary key for example:Total in our table which is dependent on Unit Price and Qty, Total column is not dependent on our PrimaryKey, We can compute Total by multiplying Unit Price and Qty.

So in order to make our [ShipperDetails] table in Third normal form we have to remove the Total column from the table as shown below:

10

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.