2. Normalisation
Normalisation is the process of identifying and eliminating data anomalies and redundancy in a relational database, which improves data integrity and efficiency. In practice, it removes repeated data and leads to a better database design.
4. Normalisation
Basically, we are making a database where:
1. Records are unique, so two people named John Smith won’t break the database.
2. We aren’t doubling up data, so if we update something we only need to do it in one place.
3. We can add data without creating empty fields.
4. We can delete data without losing other data we would like to keep.
5. What happens in Normalisation?
During normalisation, a relation (table) is split into a number of smaller tables suitable for implementation in a relational database.
6. Let’s look at an example.
EB Games is a store that sells a variety of consoles, for example the Microsoft Xbox One S ($399), the Sony PlayStation 4 ($509) and the Sony PS Vita ($148).
Here are EB Games’ sales records:
Cust Name   | Item              | Shipping Address      | Supplier  | Supplier Phone | Price
------------+-------------------+-----------------------+-----------+----------------+------
John Smith  | XBox One          | 35 Palm St, Melville  | Microsoft | (08) BUY-XBOX  | $399
Roger Banks | Playstation 4     | 47 Campus Rd, Murdoch | Sony      | (08) BUY-SONY  | $509
Evan Wilson | XBox One, PS Vita | 38 Rock Av, Coogee    | Wholesale | Toll Free      | $547
John Smith  | Playstation 4     | 47 Campus Rd, Murdoch | Sony      | (08) BUY-SONY  | $509
7. 1st Normal Form
1. Each cell must be single-valued.
2. Entries in a column must be of the same type.
3. Each row must be uniquely identified: add a unique ID, or add more columns to make it unique.
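Problems with the sales table:
1. Not a single value: Evan Wilson’s Item cell holds two items (XBox One, PS Vita).
2. Not the same type: “Wholesale” and “Toll Free” are not a supplier name and a phone number like the other entries in their columns.
3. Rows not uniquely identified: Cust Name doesn’t account for different people with the same name (there are two John Smiths).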
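The three problems listed above can also be spotted mechanically. The sketch below is a minimal illustration, assuming the sales records are held as Python dictionaries keyed by the column names in the table. The `find_1nf_problems` helper is hypothetical, and its "same type" test (a phone number starts with a bracketed area code) is a simple heuristic chosen for this data, not a general rule.

```python
from collections import Counter

# Column names from the sales table; the data is copied from the table above.
COLUMNS = ("Cust Name", "Item", "Shipping Address", "Supplier", "Supplier Phone", "Price")
RAW = [
    ("John Smith", "XBox One", "35 Palm St, Melville", "Microsoft", "(08) BUY-XBOX", "$399"),
    ("Roger Banks", "Playstation 4", "47 Campus Rd, Murdoch", "Sony", "(08) BUY-SONY", "$509"),
    ("Evan Wilson", "XBox One, PS Vita", "38 Rock Av, Coogee", "Wholesale", "Toll Free", "$547"),
    ("John Smith", "Playstation 4", "47 Campus Rd, Murdoch", "Sony", "(08) BUY-SONY", "$509"),
]
sales = [dict(zip(COLUMNS, row)) for row in RAW]


def find_1nf_problems(rows):
    """Return human-readable 1NF problems found in rows (hypothetical helper)."""
    problems = []
    for i, row in enumerate(rows):
        # Rule 1: a comma in the Item cell means it holds more than one value.
        if "," in row["Item"]:
            problems.append(f"row {i}: Item is not single-valued: {row['Item']!r}")
        # Rule 2 (heuristic): every Supplier Phone should look like "(08) ...".
        if not row["Supplier Phone"].startswith("("):
            problems.append(f"row {i}: Supplier Phone is not a phone number: {row['Supplier Phone']!r}")
    # Rule 3: Cust Name cannot identify a row if the same name appears twice.
    for name, count in Counter(row["Cust Name"] for row in rows).items():
        if count > 1:
            problems.append(f"Cust Name {name!r} appears {count} times, so it cannot identify a row")
    return problems


for problem in find_1nf_problems(sales):
    print(problem)
```

Running it reports Evan Wilson’s two-item cell, the “Toll Free” phone entry, and the repeated name John Smith, matching problems 1 to 3.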
9. Here is that table normalised to 1st Normal Form.

Cust ID | Cust Name   | Item          | Shipping Address      | Supplier  | Supplier Phone | Price
--------+-------------+---------------+-----------------------+-----------+----------------+------
1000    | John Smith  | XBox One      | 35 Palm St, Melville  | Microsoft | (08) BUY-XBOX  | $399
1001    | Roger Banks | Playstation 4 | 47 Campus Rd, Murdoch | Sony      | (08) BUY-SONY  | $509
1002    | Evan Wilson | XBox One      | 38 Rock Av, Coogee    | Microsoft | (08) BUY-XBOX  | $399
1002    | Evan Wilson | PS Vita       | 38 Rock Av, Coogee    | Sony      | (08) BUY-SONY  | $148
1003    | John Smith  | Playstation 4 | 47 Campus Rd, Murdoch | Sony      | (08) BUY-SONY  | $509

Primary key: Cust ID and Item together

1. To get single values, Evan Wilson’s entry was split into two rows, one for the XBox One and one for the PS Vita.
2. “Wholesale” and “Toll Free” were removed and replaced with proper values that reflect the information expected.
3. A customer ID (Cust ID) was added so that each customer is uniquely identified; we can now see clearly that the two John Smiths are two different customers. Because Evan Wilson now has two rows, Cust ID alone does not identify a row, so the primary key is the combination of Cust ID and Item.
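The three changes above can be sketched as a small Python transformation. This is a minimal illustration, not the only way to do it. It assumes, as the slide does, that each original row is a different customer (so the two John Smiths get different IDs), that IDs are handed out from 1000, and that prices are stored as whole dollars. `to_first_normal_form` and `ITEM_DETAILS` are hypothetical names.

```python
# Unnormalised rows: (Cust Name, Item cell, Shipping Address).
RAW = [
    ("John Smith", "XBox One", "35 Palm St, Melville"),
    ("Roger Banks", "PlayStation 4", "47 Campus Rd, Murdoch"),
    ("Evan Wilson", "XBox One, PS Vita", "38 Rock Av, Coogee"),
    ("John Smith", "PlayStation 4", "47 Campus Rd, Murdoch"),
]

# Proper per-item values that replace "Wholesale", "Toll Free" and "$547".
# Prices are whole dollars (an assumption for this sketch).
ITEM_DETAILS = {
    "XBox One": ("Microsoft", "(08) BUY-XBOX", 399),
    "PlayStation 4": ("Sony", "(08) BUY-SONY", 509),
    "PS Vita": ("Sony", "(08) BUY-SONY", 148),
}


def to_first_normal_form(raw, first_id=1000):
    """Give each original row a Cust ID and split multi-valued Item cells."""
    rows = []
    for offset, (name, item_cell, address) in enumerate(raw):
        cust_id = first_id + offset  # one new ID per original customer row
        # One output row per item in the cell, each with its own supplier details.
        for item in (part.strip() for part in item_cell.split(",")):
            supplier, phone, price = ITEM_DETAILS[item]
            rows.append((cust_id, name, item, address, supplier, phone, price))
    return rows


table_1nf = to_first_normal_form(RAW)
for row in table_1nf:
    print(row)

# Cust ID repeats (Evan Wilson has two rows), but (Cust ID, Item) is unique.
cust_ids = [row[0] for row in table_1nf]
composite_keys = [(row[0], row[2]) for row in table_1nf]
print(len(set(cust_ids)) < len(cust_ids))              # True
print(len(set(composite_keys)) == len(composite_keys))  # True
```

The last two checks show why the key has to be the pair (Cust ID, Item) rather than Cust ID on its own.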
10. 2nd Normal Form
1. All attributes (non-key columns) must be dependent on the whole key.
So which attributes in the previous table are not dependent on Cust ID?
The Supplier, Supplier Phone and Price are not dependent on Cust ID; they depend only on the item. They are independent of the person buying the console: Microsoft is the supplier of the XBox regardless of who is buying it.
So we will take these columns out into their own table.
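The split can be sketched in Python by projecting the 1NF table onto each key and checking, as we go, that the key really does determine the other columns. This is a minimal sketch under stated assumptions: the 1NF table is a list of tuples, prices are whole dollars, and `project` is a hypothetical helper written for illustration.

```python
# The 1NF table: (Cust ID, Cust Name, Item, Shipping Address, Supplier, Supplier Phone, Price).
TABLE_1NF = [
    (1000, "John Smith", "XBox One", "35 Palm St, Melville", "Microsoft", "(08) BUY-XBOX", 399),
    (1001, "Roger Banks", "PlayStation 4", "47 Campus Rd, Murdoch", "Sony", "(08) BUY-SONY", 509),
    (1002, "Evan Wilson", "XBox One", "38 Rock Av, Coogee", "Microsoft", "(08) BUY-XBOX", 399),
    (1002, "Evan Wilson", "PS Vita", "38 Rock Av, Coogee", "Sony", "(08) BUY-SONY", 148),
    (1003, "John Smith", "PlayStation 4", "47 Campus Rd, Murdoch", "Sony", "(08) BUY-SONY", 509),
]
CUST_ID, CUST_NAME, ITEM, ADDRESS, SUPPLIER, PHONE, PRICE = range(7)


def project(rows, key_col, value_cols):
    """Build {key: values}, raising if one key value maps to two different value tuples."""
    table = {}
    for row in rows:
        key = row[key_col]
        values = tuple(row[i] for i in value_cols)
        # setdefault stores the first values seen; a later mismatch means no dependency.
        if table.setdefault(key, values) != values:
            raise ValueError(f"column {key_col} does not determine columns {value_cols}")
    return table


# Cust Name and Shipping Address depend only on Cust ID ...
customers = project(TABLE_1NF, CUST_ID, (CUST_NAME, ADDRESS))
# ... while Supplier, Supplier Phone and Price depend only on the Item.
stock = project(TABLE_1NF, ITEM, (SUPPLIER, PHONE, PRICE))

print(customers)
print(stock)
```

By contrast, `project(TABLE_1NF, CUST_ID, (ITEM,))` raises `ValueError`, because customer 1002 bought two different items. Notice too that `customers` and `stock` on their own no longer record who bought what.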
12. I have used the name of the item as a primary key (Stock ID) to make it a little clearer what is going on. You probably wouldn’t do this in real life; there would be a stock number associated with each item.
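Customer table (primary key: Cust ID)

Cust ID | Cust Name   | Shipping Address
--------+-------------+----------------------
1000    | John Smith  | 35 Palm St, Melville
1001    | Roger Banks | 47 Campus Rd, Murdoch
1002    | Evan Wilson | 38 Rock Av, Coogee
1003    | John Smith  | 47 Campus Rd, Murdoch

Stock table (primary key: Stock ID)

Stock ID      | Supplier  | Supplier Phone | Price
--------------+-----------+----------------+------
XBox One      | Microsoft | (08) BUY-XBOX  | $399
PlayStation 4 | Sony      | (08) BUY-SONY  | $509
PS Vita       | Sony      | (08) BUY-SONY  | $148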
So we have the non-dependent data out, but now we can’t see who bought what, so we add a table called a junction table to join them together.
Junction table (primary key: Cust ID and Stock ID together)

Cust ID | Stock ID
--------+--------------
1000    | XBox One
1001    | PlayStation 4
1002    | XBox One
1002    | PS Vita
1003    | PlayStation 4
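To see how the junction table lets us recover who bought what, here is a minimal sketch using Python’s built-in `sqlite3` module as a stand-in for whatever database is actually used. The table and column names (`customer`, `stock`, `customer_stock`, `cust_id`, `stock_id` and so on) are assumptions for illustration, and, as on the slide, the item name is used as the Stock ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_id          INTEGER PRIMARY KEY,
    cust_name        TEXT NOT NULL,
    shipping_address TEXT NOT NULL
);
CREATE TABLE stock (
    stock_id       TEXT PRIMARY KEY,
    supplier       TEXT NOT NULL,
    supplier_phone TEXT NOT NULL,
    price          INTEGER NOT NULL
);
-- Junction table: each row links one customer to one stock item.
CREATE TABLE customer_stock (
    cust_id  INTEGER NOT NULL REFERENCES customer(cust_id),
    stock_id TEXT    NOT NULL REFERENCES stock(stock_id),
    PRIMARY KEY (cust_id, stock_id)
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (1000, "John Smith", "35 Palm St, Melville"),
    (1001, "Roger Banks", "47 Campus Rd, Murdoch"),
    (1002, "Evan Wilson", "38 Rock Av, Coogee"),
    (1003, "John Smith", "47 Campus Rd, Murdoch"),
])
conn.executemany("INSERT INTO stock VALUES (?, ?, ?, ?)", [
    ("XBox One", "Microsoft", "(08) BUY-XBOX", 399),
    ("PlayStation 4", "Sony", "(08) BUY-SONY", 509),
    ("PS Vita", "Sony", "(08) BUY-SONY", 148),
])
conn.executemany("INSERT INTO customer_stock VALUES (?, ?)", [
    (1000, "XBox One"), (1001, "PlayStation 4"), (1002, "XBox One"),
    (1002, "PS Vita"), (1003, "PlayStation 4"),
])

# Joining through the junction table answers "who bought what?"
purchases = conn.execute("""
    SELECT c.cust_id, c.cust_name, s.stock_id, s.price
    FROM customer_stock AS cs
    JOIN customer AS c ON c.cust_id = cs.cust_id
    JOIN stock    AS s ON s.stock_id = cs.stock_id
    ORDER BY c.cust_id, s.stock_id
""").fetchall()
for row in purchases:
    print(row)
```

Making (`cust_id`, `stock_id`) the composite primary key mirrors the two “Primary Key” labels on the junction table. One consequence of this design is that the same customer cannot be recorded buying the same item twice unless another column, such as an order number, is added to the key.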
14. 3rd Normal Form
1. Every field (column) can be determined only by the key of its table, and by no other column.
Which field in our example can be determined by a column other than the key?
In our example, the Supplier Phone depends on the Supplier, not on the Stock ID. If we left it there and a supplier changed their number, we would need to change it in multiple places.
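The “multiple places” problem can be demonstrated with a small `sqlite3` sketch of the stock table as it stands before this step. The table and column names are assumptions, and the new number `(08) NEW-SONY` is invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The stock table before 3NF: each item row repeats its supplier's phone number.
conn.execute(
    "CREATE TABLE stock (stock_id TEXT PRIMARY KEY, supplier TEXT, supplier_phone TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO stock VALUES (?, ?, ?, ?)", [
    ("XBox One", "Microsoft", "(08) BUY-XBOX", 399),
    ("PlayStation 4", "Sony", "(08) BUY-SONY", 509),
    ("PS Vita", "Sony", "(08) BUY-SONY", 148),
])

# A careless update that touches only one item leaves Sony with two different numbers.
conn.execute(
    "UPDATE stock SET supplier_phone = ? WHERE stock_id = ?", ("(08) NEW-SONY", "PS Vita")
)
sony_phones = conn.execute(
    "SELECT DISTINCT supplier_phone FROM stock WHERE supplier = 'Sony' ORDER BY supplier_phone"
).fetchall()
print(sony_phones)  # [('(08) BUY-SONY',), ('(08) NEW-SONY',)]

# The correct fix has to find and change every Sony row.
cur = conn.execute(
    "UPDATE stock SET supplier_phone = ? WHERE supplier = ?", ("(08) NEW-SONY", "Sony")
)
print(cur.rowcount)  # 2
```

With only three items the damage is small, but every extra Sony product would add one more row that must be kept in step.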
So we pull out the Supplier, making it the primary key of its own table and a foreign key in the Stock table. This means you can only add suppliers from the Supplier table to the Stock table.
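The foreign-key rule above can be sketched with Python’s `sqlite3` module. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other databases enforce them by default. The table and column names, the rejected row (“Hypothetical Console” from “No Such Supplier”) and the new phone number are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE supplier (
    supplier       TEXT PRIMARY KEY,
    supplier_phone TEXT NOT NULL
);
CREATE TABLE stock (
    stock_id TEXT PRIMARY KEY,
    supplier TEXT NOT NULL REFERENCES supplier(supplier),  -- foreign key
    price    INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO supplier VALUES (?, ?)", [
    ("Microsoft", "(08) BUY-XBOX"),
    ("Sony", "(08) BUY-SONY"),
])
conn.executemany("INSERT INTO stock VALUES (?, ?, ?)", [
    ("XBox One", "Microsoft", 399),
    ("PlayStation 4", "Sony", 509),
    ("PS Vita", "Sony", 148),
])

# A stock item whose supplier is not in the supplier table is rejected.
try:
    conn.execute(
        "INSERT INTO stock VALUES (?, ?, ?)", ("Hypothetical Console", "No Such Supplier", 100)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True

# A supplier's phone number now lives in exactly one row.
cur = conn.execute(
    "UPDATE supplier SET supplier_phone = ? WHERE supplier = ?", ("(08) NEW-SONY", "Sony")
)
print(cur.rowcount)  # 1
```

Changing Sony’s number now touches a single row, and the Stock table cannot refer to a supplier that does not exist.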
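Customer table (primary key: Cust ID)

Cust ID | Cust Name   | Shipping Address
--------+-------------+----------------------
1000    | John Smith  | 35 Palm St, Melville
1001    | Roger Banks | 47 Campus Rd, Murdoch
1002    | Evan Wilson | 38 Rock Av, Coogee
1003    | John Smith  | 47 Campus Rd, Murdoch

Stock table (primary key: Stock ID; Supplier is a foreign key)

Stock ID      | Supplier  | Price
--------------+-----------+------
XBox One      | Microsoft | $399
PlayStation 4 | Sony      | $509
PS Vita       | Sony      | $148

Junction table (primary key: Cust ID and Stock ID together)

Cust ID | Stock ID
--------+--------------
1000    | XBox One
1001    | PlayStation 4
1002    | XBox One
1002    | PS Vita
1003    | PlayStation 4

Supplier table (primary key: Supplier)

Supplier  | Supplier Phone
----------+---------------
Microsoft | (08) BUY-XBOX
Sony      | (08) BUY-SONY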
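As a check that nothing was lost along the way, the final tables can be joined back together. Following each key into the next table should reproduce the 1st Normal Form table exactly. The sketch below is minimal and assumes the tables are held as Python dictionaries with prices in whole dollars. `rebuild_sales` is a hypothetical helper.

```python
# The four final tables as plain Python data (prices in whole dollars).
customers = {
    1000: ("John Smith", "35 Palm St, Melville"),
    1001: ("Roger Banks", "47 Campus Rd, Murdoch"),
    1002: ("Evan Wilson", "38 Rock Av, Coogee"),
    1003: ("John Smith", "47 Campus Rd, Murdoch"),
}
stock = {"XBox One": ("Microsoft", 399), "PlayStation 4": ("Sony", 509), "PS Vita": ("Sony", 148)}
suppliers = {"Microsoft": "(08) BUY-XBOX", "Sony": "(08) BUY-SONY"}
junction = [
    (1000, "XBox One"), (1001, "PlayStation 4"), (1002, "XBox One"),
    (1002, "PS Vita"), (1003, "PlayStation 4"),
]


def rebuild_sales(customers, stock, suppliers, junction):
    """Join the normalised tables back into one row per purchase."""
    rows = []
    for cust_id, stock_id in junction:
        name, address = customers[cust_id]  # Cust ID -> customer table
        supplier, price = stock[stock_id]   # Stock ID -> stock table
        phone = suppliers[supplier]         # Supplier -> supplier table
        rows.append((cust_id, name, stock_id, address, supplier, phone, price))
    return rows


# The 1st Normal Form table, for comparison.
TABLE_1NF = [
    (1000, "John Smith", "XBox One", "35 Palm St, Melville", "Microsoft", "(08) BUY-XBOX", 399),
    (1001, "Roger Banks", "PlayStation 4", "47 Campus Rd, Murdoch", "Sony", "(08) BUY-SONY", 509),
    (1002, "Evan Wilson", "XBox One", "38 Rock Av, Coogee", "Microsoft", "(08) BUY-XBOX", 399),
    (1002, "Evan Wilson", "PS Vita", "38 Rock Av, Coogee", "Sony", "(08) BUY-SONY", 148),
    (1003, "John Smith", "PlayStation 4", "47 Campus Rd, Murdoch", "Sony", "(08) BUY-SONY", 509),
]
print(rebuild_sales(customers, stock, suppliers, junction) == TABLE_1NF)  # True
```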
And we are done! Yes, there is a 4th Normal Form, but we don’t need to know about that.