2. DEFINATION
■ Database normalization or normalization : is the
process of organizing the attributes and relations of
a relational database to reduce data redundancy and
improve data redundancy.
■ Normalization is also the process of simplifying the
design of a database so that it achieves the optimal
structure composed of atomic elements.
■ It include classify like given below
■ 1st NF ,2ndNF,3rdNF, BCNF,4thNF,5thNF.
2
3. 1st NF
■ As per First Normal Form, no two Rows of data must contain repeating group of
information i.e each set of column must have a unique value, such that multiple
columns cannot be used to fetch the same row. Each table should be organized
into rows, and each row should have a primary key that distinguishes it as unique.
■ The Primary key is usually a single column, but sometimes more than one
column can be combined to create a single primary key. For example consider a
table which is not in First normal form
■ e.g
■ StudentTable :
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
3
4. 1st NF
■ After 1st NF.
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
1st NF
4
5. ■ As per the Second Normal Form there must not be any partial
dependency of any column on primary key. It means that for a table that
has concatenated primary key, each column in the table that is not part
of the primary key must depend upon the entire concatenated key for its
existence. If any column depends only on one part of the concatenated
key, then the table fails Second normal form.
■ In example of First Normal Form there are two rows for Adam, to include
multiple subjects that he has opted for.While this is searchable, and
follows First normal form, it is an inefficient use of space.Also in the
aboveTable in First Normal Form, while the candidate key is
{Student, Subject}, Age of Student only depends on Student column,
which is incorrect as per Second Normal Form.To achieve second normal
form, it would be helpful to split out the subjects into an independent
table, and match them up using the student names as foreign keys.
2nd NF
5
6. 2nd NF
■ After 2nd NF
■ New StudentTable following 2NF will be :
■ In SubjectTable the candidate key will be {Student, Subject} column.
Now, both the above tables qualifies for Second Normal Form and will
never suffer from Update Anomalies.Although there are a few complex
cases in which table in Second Normal Form suffers Update Anomalies,
and to handle those scenariosThird Normal Form is there.
Student Age
Adam 15
Alex 14
Stuart 17
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
2nd NF
2nd NF
6
7. ■ Third Normal form applies that every non-prime attribute of table must be
dependent on primary key, or we can say that, there should not be the case
that a non-prime attribute is determined by another non-prime attribute.
So this transitive functional dependency should be removed from the table
and also the table must be in Second Normal form. For example, consider a
table with following fields.
■ Student_DetailTable :
■ In this table Student_id is Primary key, but street, city and state depends
upon Zip.The dependency between zip and other fields is called transitive
dependency. Hence to apply 3NF, we need to move the street, city and
state to new table, with Zip as primary key.
Student_id Student_name DOB Street city State Zip
3rd NF
7
8. 3rd NF
■ New Student_DetailTable :
■ AddressTable :
■ The advantage of removing transtive dependency is,
■ Amount of data duplication is reduced.
■ Data integrity achieved.
Student_id Student_name DOB Zip
Zip Street city state
3rd NF
8
9. Boyce and Codd Normal Form
(BCNF)
■ Boyce and Codd Normal Form is a higher version of theThird Normal form.
This form deals with certain type of anamoly that is not handled by 3NF. A 3NF
table which does not have multiple overlapping candidate keys is said to be in
BCNF. For a table to be in BCNF, following conditions must be satisfied:
■ R must be in 3rd Normal Form
■ and, for each functional dependency ( X ->Y ), X should be a super Key.
consider the following relationship:R(A,B,C,D)
and following dependwncies
A ->BCD
BC->AD
D ->B
above relationship is already in 3rd NF.keys are A and BC.
hence ,in functional dependency, A->BCD ,A is the super key .in second relation,
BC->AD,BC is also akey.but in,D->b,d is not a key.
Hence we can break our relation ship R into two relationship R1 and R2.
9
10. Boyce and Codd Normal Form
(BCNF)
R(A,B,C,D)
R1(A,D,C) R2(D,B)
10
11. 4th NF
■ Fourth normal form (4NF) is a level of database normalization where there are
no non-trivial multivalued dependencies other than a candidate key.
■ It builds on the first three normal forms (1NF, 2NF and 3NF) and the Boyce-
Codd Normal Form (BCNF). It states that, in addition to a database meeting
the requirements of BCNF, it must not contain more than one multivalued
dependency.
■ Multivalued dependency is best illustrated using an example. In a table
containing a list of three things - college courses, the lecturer in charge of
each course and the recommended book for each course - these three
elements (course, lecturer and book) are independent of one another.
Changing the course’s recommended book, for instance, has no effect on the
course itself.This is an example of multivalued dependency: An item depends
on more than one value. In this example, the course depends on both lecturer
and book.
■ Thus, 4NF states that a table should not have more than one of these
dependencies. 4NF is rarely used outside of academic circles.
11
12. 4th NF
■ E.g 4th NF table will be:
■ After 4th NF
Subject Lecturer books
math raj b1
math jay b2
physics kanu p1
chemestry vinod c1
Subject Lecturer
math raj
math jay
physics kanu
chemestry vinod
Subject books
math b1
math b2
physics p1
chemestry c1
12
13. 5th NF
■ A database is said to be in 5NF, if and only if,
■ It's in 4NF
■ If we can decompose table further to eliminate redundancy and anomaly,
and when we re-join the decomposed tables by means of candidate keys,
we should not be losing the original data or any new record set should not
arise. In simple words, joining two or more decomposed table should not
lose records nor create new records.
Subject Lecturer Semester
math raj 1
math jay 2
physics kanu 1
chemestry vinod 2
chemestry divy 1
13
14. 5th NF
In above table, Rose takes both Mathematics and Physics class for
Semester 1, but she does not take Physics class for Semester 2. In this
case, combination of all these 3 fields is required to identify a valid data.
Imagine we want to add a new class - Semester3 but do not know which
Subject and who will be taking that subject. We would be simply inserting
a new entry with Class as Semester3 and leaving Lecturer and subject as
NULL.As we discussed above, it's not a good to have such entries.
Moreover, all the three columns together act as a primary key, we cannot
leave other two columns blank!
Hence we have to decompose the table in such a way that it satisfies all
the rules till 4NF and when join them by using keys, it should yield correct
record. Here, we can represent each lecturer's Subject area and their
classes in a better way.We can divide above table into three - (SUBJECT,
LECTURER), (LECTURER,CLASS), (SUBJECT,CLASS)
14
15. 5th NF
■ After 5th NF.
Subject Lecturer
math raj
math jay
physics kanu
chemestry vinod
chemestry divy
Lecturer Semester
raj 1
jay 2
kanu 1
vinod 2
divy 1
Subject Semester
math 1
math 2
physics 1
chemestry 2
chemestry 1
15
16. 5th NF
■ Now, each of combinations is in three different tables. If we
need to identify who is teaching which subject to which
semester, we need join the keys of each table and get the
result.
■ For example, who teaches Physics to Semester 1, we would be
selecting Physics and Semester1 from table 3 above, join with
table1 using Subject to filter out the lecturer names.Then join
with table2 using Lecturer to get correct lecturer name.That is
we joined key columns of each table to get the correct data.
Hence there is no lose or new data - satisfying 5NF condition.
16