1.1. Department, Major, Student 1.2. Department <- Major <- Student 1.3. Entity ids determine entity attributes: DI -> DN, DA MI -> MN, MDI SI -> SN, SMI, SGD Entity ids determine ids of associated ids in the one-to-many direction): SI -> MI MI -> DI MDI is a copy of DI, and MI is a copy of SMI. So: DI -> MDI MDI -> DI MI -> SMI SMI -> MI 1.4. University is not BCNF Candidate keys: (SI) DI -> DN, DA is a violation Closure: {DI, DN, DA, MDI} Decomposition: - D(DI, DN, DA, MDI) - MS(MI, MN, SI, SN, SMI, SGD, DI) D is BCNF MS is not BCNF Candidate keys: (SI) MI -> MN is a violation Closure: {MI, MN, SMI} Decomposition: - M(MI, MN, SMI) - S(SI, SN, SGD) M is BCNF S is BCNF COMMENTS: - MDI and SMI are useless, as they just copy DI and MI. They could be removed from the M and D tables, although this cannot be accomplished by the BCNF decomposition. - I was quite lenient in grading, accepting a wide variety of different decompositions leading to approximately the same result as obtained above, (e.g. omitting the SI/SMI and DI/MDI dependencies). Also, any order of decompositions was accepted, (since the decomposition algorithm specifies the selection of any FD violating BCNF). ---------------------------------------------------------------------- 2.1. select distinct name from salesperson join sale using (salesperson_id) where when = '1/5/2018' 2.2. select average(price) from sale join sale_date using (when) where weekday 2.3. select name from salesperson except select name from salesperson join sale using (salesperson_id) COMMENTS: - A number of students had the clever idea of using an outer join: select name from salesperson p left join sale s using (salesperson_id) where s.sale_id is null This answer received full credit. - To be really precise, the difference should be done on ids, because there could be two different salespeople with the same name, e.g. select name from ( select salesperson_id from salesperson except select salesperson_id from sale ) x This answer is actually better. I accepted both answers. 2.4. select d.when, sum(s.price) from sale_date d left join sale s using (when) group by d.when order by d.when ---------------------------------------------------------------------- 3. GENERAL COMMENTS: This is a question about B+-trees, not B-trees. Answers that relied on records (as opposed to keys) in interior nodes were penalized. 3.1. [10, 20] 3.2. [3,7] | | | +---+ | +---+ | | | [1,2] [3,6] [7,8] COMMENTS: - Many students provided trees that, while correct for the set of records, were not obtainable from the starting point by inserting 2 and then 6. 3.3. [37,55] | | | +------+ | +-----+ | | | [11,23] [37,42,53] [59,73] 3.4. [51] | | +-----------+ +---------+ | | [27,49] [82] | | | | | +-----+ | +----+ +----+ +--+ | | | | | [11,23,25] [37,42] [49,50] [64,73,80] [83,92,97] COMMENTS: - Interior node capacity is three keys. 49 is promoted to the [27,51,82] node, requiring a split and further promotion of 51. This leaves [82] all by itself. This violates the rule on space utilization (min 50%), but this is unavoidable in a B+-tree with wuch a low (and odd) capacity. 3.5. Two possible answers, depending on whether the middle node is merged with its left neighbor or its right neighbor, (both acceptable). 1) [55] | | +--+ +--+ | | [11,23,42] [64,73] 2) [42] | | +--+ +--+ | | [11,23] [42,64,73] COMMENTS: Leaving a leaf with 0-1 records was not an acceptable answer. ---------------------------------------------------------------------- 4. GENERAL COMMENTS: There seems to have been some confusion on this question. An index can be used even if the predicate specifies a PREFIX of the key columns. E.g., consider 4.1. The predicate is a = 1 and b = 2 and c > 3. The abc index can be used for this predicate. The ca index can ALSO be used, because the c column is the first column of the index. The ca index won't help with the a = 1 part of the query, but the index has still been used. So many students seem to have assumed that ALL index columns must be used (to consider the index for the query), that I put down the low scores on this question to a lack of clarity in the question itself. I will be scaling up the scores on this question. The answer below shows what indexes have been used, and which columns of the index. So for 4.1 again, abc indicates that all columns of the abc index can be used to evaluate the query. c(a) indicates that only the c column of the ca index is used. (I did not ask you to specify which columns are used. That information is provided here to explain my answers.) 4.1. abc c(a) 4.2. abc ca 4.3. a(bc) ca 4.4. a(bc) ca 4.5. ab(c) 4.6. a(bc) 4.7. a(bc) ca 4.8. c(a)