OPEN BOOKS -
OPEN NOTES - CLOSED ELECTRONIC DEVICES other than simple
calculators. Show all work on these pages.
1. Shorter answer
2. B+ Trees
Consider a B+-tree that is stored on a disk with page size 8192 bytes, with 8000 bytes usable. A key requires 40 bytes of storage, a page pointer requires 10 bytes of storage, and a RID requires 12 bytes. The index (unclustered) uses Alternative 2 for the data entries in leaf nodes. Assume that you have a file that is indexed with 1,000,000 records and that the nodes (both leaf and internal) are full. Find how many levels the tree has, and how many nodes at each level.
3. Relational Operators
You are given the following two relations:
Employees (eid:integer, did:integer, salary:integer, age:integer)
Departments (did:integer, dname:string, floor:integer, budget:integer)
Employee occupies 500 pages and Departments occupies 100 pages.
Assume that all attributes within the same relation have the same length. Employees are uniformly distributed over 1000 departments. An Employee page fits 40 tuples, while a Department page fits 50 tuples. Age values are uniformly distributed in the range [21:60]. Budgets are recorded in units of $1000 and are uniformly distributed in the range 100 to 500. Assume there are B=22 buffers available. Ignore the final output cost of the result for all cases.
(a) What is the I/O cost to perform a block nested loop join of the two tables?
(b) Assume that you are computing the projection of Employee on (did, age) with duplicate elimination using a sort-based approach. How much data needs to be sorted? _______ pages. Show the passes that are needed and the estimated i/o cost for each, and the total i/o cost. As usual, don't include any i/o cost for writing out the sorted data.
(c) You are given the query:
SELECT SUM(D.budget)
FROM Departments D
WHERE budget > 400
What is the approximate I/O cost to execute this query given a clustered Alternative 1 B+-tree index on budget? Show calculation of the appropriate reduction factor first.
(d) What is the I/O cost to execute this query using an unclustered Alternative 2 B+-tree index on budget? Is a table scan better?
4. Query Optimization. Use the same setup as in the last problem, and also given a clustered B+-tree index for Departments on budget and a unclustered hash index for Employee on did. You are given the query:
SELECT D.did, E.age
FROM Departments D, Employees E
WHERE D.did = E.did AND E.age > 50 AND D.budget > 400
Find the best (lowest i/o cost) nested-loops join plan.
a. Show the query tree for the best nested loops plan, the one that can use both indexes.
b. Specify the access method for each table (i.e., what index is in use)
c. Estimate the i/o cost of the query.
5. Concurrency Control
2) (30 points.) Consider the following schedule
R1(A) W1(A) R2(B) R1(B) R2(A) W1(B) C1 W2(A) C2
a. Identify all conflicting pairs of operations above by connecting them with arcs above the printed line.
b. Now draw the dependency graph below.
c. Explain how you know from b. that the schedule is or is not serializable.
d. The logic of the program for T1 is as follows. T1 is deducting 10 from account A, checking the balances of two accounts A and B, and subtracting 70 from B as long as the sum of balances A and B will remain > 0. The program for T2 is subtracting 70 from account A as long as the sum of the two balances will remain > 0.
Rewrite the schedule to show how it would run with strict two-phase locking in force, with A and B originally both 50. When you see a deadlock, choose T1 to abort. Don’t retry the aborted transaction (retries are optional and depend on the programming details.) Show the history with locking operations and determine the final values of A and B.
6. Recovery. Again consider the history from the last problem:
a. Copy your final schedule, from problem 5d: Drop the locking calls for this problem, and be sure to include only actions that actually happened in the final schedule.
Executed Schedule:
b. Assume A is on page P1 and B is on page P2. The system crashes after all these operations have been logged, including the abort handling operations. Show the log just before the crash by finishing the list below. Then show the normal recovery (no crash during recovery.) Assume that the ARIES protocol is set up for this DBMS. Show the dirty page table and transaction table at the end of the Analysis phase. Show the steps of the redo and undo phases. Show the resulting additions to the log file (after complete recovery). Use the back of the previous page for more space.
00 Begin checkpoint
10 End checkpoint
20 Update: T1 writes P1
--- CRASH ---
Recovery: