Assignment 3 — Data Modeling and Schema Design

This assignment involves several aspects of data modeling and schema design:

Creating an Entity/Relationship model from a textual description of a data model.
Turning your Entity/Relationship model into a relational database schema.
Examining the data in a table, and determining what functional dependencies are ruled out by it.
Using a BCNF decomposition approach to generate a relational database schema from functional dependencies.

This is purely a paper and pencil (or vi or emacs) assignment — no programming is involved.

Part 1: Create a data model for a university

You are going to model a university. Some of the obvious entities are:

Department: A Department has a name (e.g. "Computer Science"), an address, a phone number, and a website.
Professor: A Professor has a unique, university-assigned id, name, email address, phone number, title (e.g. Associate Professor, Lecturer, ...). Each Professor belongs to exactly one Department. One of the Professors in the Department is the chair of that Department.
Course: A course is offered by exactly one Department. Information on a course includes the catalog number (e.g. COMP115), title, a description and prerequisites (zero or more other courses).
Student: Each Student has a name, and a unique, university-assigned id. Students take Courses. Some students may occasionally work as teaching assistants (TAs). A Student can be a TA for zero, one, or more courses per semester.

Your model also has to describe each Offering of a course, e.g. COMP115 in the Spring 2018 semester. An Offering of a Course is taught by one Professor, assisted by zero or more TAs. Associated with a course offering are details of evaluation: a set of tasks (exams, assignments), each with a weight. For example, the tasks might be:

Assignment 1: 5%
Assignment 2: 5%
Assignment 3: 5%
Assignment 4: 5%
Assignment 5: 5%
Assignment 6: 5%
Assignment 7: 5%
Assignment 8: 5%
Midterm: 15%
Final: 30%
Class participation: 15%

These details may vary from one semester to the next, so you need to model this information to be associated with Course Offerings, not Courses. Your model also has to record each student's score (between 0 and 100) for each of these tasks.

With all of this information, the data model does not have to include the student's grade for the course, as it can be computed from the scores on each task. (Since this assignment is about data modeling and not querying, you won't need to write the query that computes the final grade.)
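
As a minimal sketch of why the grade is derivable, the following Python snippet computes a weighted grade from the example weights listed above. The scores are hypothetical, and the function name course_grade is an assumption made for illustration; nothing like it is required for the assignment.

```python
# Weights (in percent) copied from the example task list above.
weights = {f"Assignment {i}": 5 for i in range(1, 9)}
weights.update({"Midterm": 15, "Final": 30, "Class participation": 15})

# Hypothetical scores (0 to 100) for one student; not from the assignment.
scores = {task: 80 for task in weights}
scores["Final"] = 90


def course_grade(weights, scores):
    """Weighted average of task scores, with weights given in percent."""
    if sum(weights.values()) != 100:
        raise ValueError("task weights must sum to 100")
    return sum(weights[task] * scores[task] for task in weights) / 100


grade = course_grade(weights, scores)
print(grade)  # 83.0
```

Because the grade is a function of the stored weights and scores, storing it as well would be redundant.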

Part 1.1: Entity/Relationship diagram

Create an Entity/Relationship model for the university database. Your work should be in the form of an Entity/Relationship diagram, using the notation discussed in class (boxes for entities, labels or diamonds for relationships). For binary relationships, connecting lines should include arrows to indicate the one side of a relationship, as shown here.

A diagram prepared with a drawing tool would be ideal. Export your drawing as a JPG or PNG file; don't submit the Office document. If you must draw by hand, please make sure the result is legible, and take a clear photo with no reflections or shadows.

Part 1.2: List of Entity Attributes

For each entity, list the attributes. Don't include the attributes in the diagram; list them separately, in tabular form.

Part 2: Database schema

Part 2.1: DDL

Provide a database schema corresponding to your Entity/Relationship model. This should be provided as DDL: a set of CREATE TABLE statements (and possibly ALTER TABLE statements). It should be as complete as possible: column names, column types, constraints, primary keys, and foreign keys.

Think carefully about the primary keys for each table. While it is always possible to use a surrogate key (e.g., a generated integer, or a UUID), you should also consider using natural keys when possible. For example, should the primary key of a Course be the course's catalog number or a surrogate (artificial) key?
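
To make the distinction concrete, here is a small sketch using Python's built-in sqlite3 module. The book tables, their columns, and the ISBN value are hypothetical and deliberately unrelated to the university model; the point is only to show the two styles of primary key side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Natural key: the ISBN itself identifies the row.
    CREATE TABLE book_natural (
        isbn  TEXT PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- Surrogate key: a generated integer identifies the row;
    -- the ISBN is kept unique by a separate constraint.
    CREATE TABLE book_surrogate (
        book_id INTEGER PRIMARY KEY,
        isbn    TEXT NOT NULL UNIQUE,
        title   TEXT NOT NULL
    );
""")

# SQLite assigns book_id automatically when it is omitted.
conn.execute(
    "INSERT INTO book_surrogate (isbn, title) VALUES (?, ?)",
    ("978-0-00-000000-0", "Example"),  # made-up ISBN
)
book_id = conn.execute("SELECT book_id FROM book_surrogate").fetchone()[0]
print(book_id)  # 1
```

Note that the surrogate version still needs the UNIQUE constraint to keep the natural identifier from being duplicated.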

Part 2.2: Discuss primary keys

Write a short discussion of how you designed the primary keys in your schema. What are the reasons for introducing surrogate keys? You don't have to discuss every table individually, as that would probably be repetitive. Just describe your general approach.

Part 3: Ruling out FDs based on data

Functional dependencies rule out certain states of a table. For example, if Y → Z applies to table T(Y, Z), then T can't have both rows (1, 10) and (1, 20). So if you look at data, you can rule out certain functional dependencies. If you see rows (1, 10) and (1, 20), then obviously Y → Z does not hold.
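
The check described above can be sketched in Python as follows. The function name violating_pairs and its signature are assumptions made for this illustration. It looks for rows that agree on the left-hand side of a dependency but differ on the right-hand side.

```python
def violating_pairs(rows, columns, lhs, rhs):
    """Return pairs of rows that agree on lhs but differ on rhs.

    An empty result means the data does not rule out lhs -> rhs; it does
    not prove that the dependency holds in general.
    """
    index = {name: i for i, name in enumerate(columns)}

    def project(row, names):
        return tuple(row[index[n]] for n in names)

    seen = {}       # lhs value -> first row seen with that value
    conflicts = []
    for row in rows:
        first = seen.setdefault(project(row, lhs), row)
        if project(first, rhs) != project(row, rhs):
            conflicts.append((first, row))
    return conflicts


# The example from the text: T(Y, Z) with rows (1, 10) and (1, 20).
T = [(1, 10), (1, 20)]
print(violating_pairs(T, ["Y", "Z"], ["Y"], ["Z"]))  # [((1, 10), (1, 20))]
print(violating_pairs(T, ["Y", "Z"], ["Z"], ["Y"]))  # []
```

The returned pair is exactly the kind of evidence that shows a dependency is ruled out.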

Part 3.1: Enumerate all possible functional dependencies for a 3-column table

Suppose we have a table with three columns, X, Y, and Z.

Enumerate all of the non-trivial functional dependencies that could possibly exist for this table. (A dependency such as X, Y → X is trivial, because X is a subset of {X, Y}.)

For example, a table with two columns has two possible non-trivial dependencies, X → Y and Y → X.
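
The triviality test can be written as a subset check. This is a small sketch; the function name is_trivial is an assumption made for illustration.

```python
def is_trivial(lhs, rhs):
    """A dependency lhs -> rhs is trivial when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)


print(is_trivial({"X", "Y"}, {"X"}))  # True
print(is_trivial({"X"}, {"Y"}))       # False
```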

Part 3.2: Which functional dependencies are ruled out by this data?

Suppose the table has these rows:

X   Y   Z
1   5   7
1   6   8
2   6   8
5   7   10
6   8   10
7   1   20
7   2   20
8   9   30
9   9   30

Of the functional dependencies you enumerated, which ones can be eliminated by examining this data? In each case, show the data that leads you to your conclusion.

Part 3.3: Same question, less data

Suppose, instead, the table has just these two rows:

X   Y   Z
1   5   7
1   6   8

What FDs can you rule out due to this data?

Part 3.4: Same question, much less data

Suppose, instead, that the table is empty. What FDs can you rule out now? (Again, show the data.)

Part 4: BCNF decomposition

For each of the problems below, do the following:

Identify the candidate keys.
Do a BCNF decomposition, showing at each step:
- The functional dependency violating BCNF.
- The closure of the left-hand side of that dependency.
- The decomposition to an improved schema.
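
The first two of these steps can be sketched in Python. The relation R(A, B, C) and its dependencies are hypothetical and are not one of the problems below; the helper names closure and bcnf_violations are likewise assumptions. The sketch uses the standard fixed-point computation of an attribute closure and treats a non-trivial dependency as a BCNF violation when its left-hand side is not a superkey.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is covered and rhs adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(columns, fds):
    """Non-trivial FDs whose left-hand side is not a superkey of columns."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(columns)]


# Hypothetical relation R(A, B, C) with A -> B and B -> C.
R = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C'], so A is a key
print(sorted(closure({"B"}, fds)))  # ['B', 'C'], so B is not a superkey
print(bcnf_violations(R, fds))      # [({'B'}, {'C'})]
```

In this hypothetical case, one common textbook decomposition on B → C splits R into R1(B, C), the closure of B, and R2(A, B), which keeps B together with the remaining attribute.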

Part 4.1: Problem 1

Table columns: V, W, X, Y, Z.

Functional dependencies:

V → W, X
X → Y, Z

Part 4.2: Problem 2

Table columns: W, X, Y, Z.

Functional dependencies:

W, X → Y
X → Z
Y → W

Checklist

Here is a list of the items to be submitted for this assignment:

Part 1:
- Part 1.1: Entity/Relationship diagram
- Part 1.2: List of Entity Attributes
Part 2:
- Part 2.1: DDL
- Part 2.2: Discuss primary keys
Part 3:
- Part 3.1: Enumerate all possible functional dependencies for a 3-column table
- Part 3.2: Which functional dependencies are ruled out by this data?
- Part 3.3: Same question, less data
- Part 3.4: Same question, much less data
Part 4:
- Part 4.1: Problem 1
- Part 4.2: Problem 2