dcsimg
 

Database Normalization

Friday Feb 8th 2008 by David Faour

SWatch Reader Favorite! A poor database design can cripple an application, producing problems with data redundancy, accuracy, consistency and concurrency. Normalization reduces, if not eliminates, these problems.

Discuss this article in the ServerWatch discussion forum

Unsure About an Acronym or Term?
Search the ServerWatch Glossary
 
A poor database design can cripple an application, producing problems with redundancy, inaccuracy, consistency and concurrency of data. Normalization is a process that serves to reduce, if not eliminate, these problems with data. Since most businesses use 3rd normal form in the logical model, I'll take you through 1st, 2nd, and 3rd NF's.

First, normal forms requires there be no multi-valued attributes, and no repeating groups. A multi-valued attribute would contain more than one value for that field in each row.

Consider the following StudentCourses table:

StudentIDCourse
123453100,3600,3900
543211300,2300,1200

In this table, the Course field is a multi-valued attribute. There is not a single value for each field.

Now, consider this StudentCourses table:

StudentIDCourse1Course2Course3
12345310036003900
54321130023001200

The Course1, Course2, Course3 fields represent repeating groups.

The proper way to store this data follows. First Normal form is satisfied.

StudentIDCourse
123453100
123453600
123453900
543211300
543212300
543211200

In the first two designs, selecting students that are enrolled in a certain course is difficult. Say I want to do the following:

Tell me all of the students enrolled in course 3100.  In the first design, you'll have to pull all of the course data and parse it somehow. And in the second design, you'll have to check 3 different fields for course 3100. In the final design, a simple Select StudentID from StudentCourses where Course=3100.

Original date of publication, 02/21/2001

Second Normal Form requires that any non-key field be dependent upon the entire key. For example, consider the StudentCourses table below, where StudentID and CourseID form a compound primary key.

StudentIDCourseIDStudentNameCourseLocationGrade
123453100AprilMath BuildingA
123451300AprilScience BuildingB

The Student Name field does not depend at all on CourseID, but only on Student ID. CourseLocation has be dependency on StudentID, but only on CourseID.

This data should be split into three tables as follows:

Students Table

StudentID Name
12345April

Courses Table

CourseIDCourseLocation
3100Math Building
1300Science Building

Student Courses Table

StudentIDCourseIDGrade
123453100A
123451300B

In this example, grade was the only field dependent on the combination of StudentID and CourseID.

Let's suppose that in the first table design, the first row of data was entered with a StudentName of Aprok, a simple typo. Now, suppose the following SQL is run.

Delete from StudentCourses where StudentName="April"

The erroneous "Aprok" row will not be deleted. However, in the final design, using the following SQL:

Delete From StudentCourses where StudentID=12345
will delete every course that April was in by using the ID.

Original date of publication, 02/21/2001

Third Normal Form prohibits transitive dependencies. A transitive dependency exists when any attribute in a table is dependent on any other non-key attribute in that table.

Consider the following example CourseSections Table:

CourseIDSection

ProfessorIDProfessorName
310016789David
130016789David

The professor is uniquely identified by the CourseID and Section of the course. However, ProfessorName depends on ProfessorID and has no relation to CourseID or Section.

This data is properly stored as follows:

Professors Table

ProfessorIDProfessorName
6789David

CourseSections Table

CourseIDSectionProfessorID
310016789
130016789

By splitting the data into two tables, the transitive dependency is removed.

Taking the original design of the CourseSections table introduces the chance that ProfessorName may be Corrupted. Perhaps, on the second row the ProfessorName is entered as Davif, a simple typo. Since there is no such professor Davif, there would be a problem.

Original date of publication, 02/21/2001

Home
Mobile Site | Full Site