CS634 – Homework 5
Due Wed., Apr. 18 in class, on paper. Note that Mon., Apr. 16 is a holiday, 20 points
Transactions, Locking, MariaDB in the Cloud
Problem 1 Loading the
bench table into your own mysql
See directory /data/htdocs/cs634/setquery for the files that were previously used to load the bench table on Oracle that you queried in homework 2. Also there is loadbench_mysql.sh for the mysql bulk load. Create a hw5 directory under ~/cs634 on pe07.cs.umb.edu for your files for this assignment. Copy everything from setquery to your hw5 directory. Convert them to set up the bench table in your own mysql installation (in your cloud VM), in the database <username>db that you created in hw3. Copy them (use scp on your VM to do this) to a setquery directory (~/setquery) on your cloud system. Then compile the data generator program setq_dbg.c (gcc -o setq_dbg setq_dbg.c). If gcc is not there, see StackOverflowQuestion. Read the header comment to see how to run it, and create a datafile to load table bench with 1M (1,000,000) rows. Best to try it out for 10 rows before creating the big file. Use loadbench_mysql.sh for the bulk load.
Files: in order of use (we don't create a tablespace in mysql):
setq_dbg.c: data generator program, compile and run to
make input file bench.dat
setq_cre.sql: create table (modified from Oracle case)
loadbench_mysql.sh: load the table from /var/lib/mysql-files/bench.dat
setq_index1.sql: build secondary indexes (modified from Oracle case)
analyze.sql: run analyze on bench (modified from Oracle case)
check_bench.sql: get basic info on table from catalogs (modified from
Oracle case)
scan.sql: sample table scan, with one row of output
queryK500K.sql: your query from hw2 that uses K500K (on column with
index)
queryK250K.sql: your query from hw2 that uses K250K (on column without
index)
Leave the final versions of these files in your hw5 directory, i.e., copy them back from the cloud if necessary. In your homework, report on your problems and transcript of run the last four files listed above. Also include output listing all the indexes on table bench. Note that "script" is available in the cloud Linux environment.
Problem 2 Query Plans for MariaDB.
a. With this bench table setup in your cloud VM, try the queries we looked at in class 13, as follows. Show the MariaDB "explain" output for them and say whether the plan is the same or different from Oracle's (as reported in the class13 slides) and if so, how. Optionally, also compare to pe07's mysql, which now has the bench table publicly available in database setq_db. Note that pe07 has a much more powerful CPU than the one we get on our cloud VMs for free.
i. select max(s1) from bench where k500k=2;
ii. select max(s1) from bench where k100=2;
iii. select max(b1.s2) from bench b1, bench b2 where b1.k40k=b2.kseq and b2.k5=2;
b. Try the query I found that uses hash join after I set join_cache_level=8 in /etc/mysql/mariadb.conf.d, restarted the server with "service mysql restart", and used the command MariaDB> set join_cache_level=8. The join_cache_level value returns to the default when you exit mysql. MariaDB refuses to acknowledge the existence of the system variable join_cache_level until you put it in the conf file, and even then still uses the default level of 2 (no hash joins) until you use the set command. See doc on MariaDB system variables. Here's the query:
select max(b1.s2) from bench b1, bench b2 where b1.k250k=b2.k250k;
Try the query (report timing and plan) before and after editing the conf file and setting join_cache_level to 8 (its highest level, allowing all its join methods to be considered). See if the hash join improves the timing. A better test would involve rebooting Linux to clear the buffer cache before each execution (optional, and see next problem about rebooting). You can leave the config-file edit in place because it is only effective if you do the set command in a session.
Note that we can bring
down the whole Linux system by using
the stop button on the VM Instance Details page, accessible from the
SSH window by the gear icon in the upper right corner. Then we can
start it again from there. Stopping the Linux system brings down the
mysql server, while bringing it up starts up the mysql server. We can
leave the OS up and bring down just the mysql server itself by using
"sudo service mysql stop", and bring it back up with "sudo service
mysql start". Or do both with "sudo service mysql restart". You can
see that the OS knows about mysql, via scripts in /etc/init.d. See
"man service" for details if interested. Note that a DBA normally has
sudo access to the Linux server for a database.
a. Show that is is
necessary to reboot the Linux server, not just restart the mysql
server, to force mysql (as originally set up) to read disk, thus
taking much longer to answer a scan query. Specifically: reboot Linux
server, run scan.sql, restart mysql server,
run queryK250K.sql,
which also does a table scan.
Finally, run scan2.sql, another table-scan query of your choice.
Display query/output and explain your results. We need to use
different table-scan queries because it appears that mysql and MariaDB
sometimes cache query results, leading to almost instantaneous
query answers.
b. To make mysql more like Oracle, and avoid buffering data twice over,
turn off the use of OS buffering by putting innodb_flush_method
= O_DIRECT
in my.cnf or its included file (in the [mysqld]
area). Restart mysql to put this into action. Rerun the experiments of
part a. Now the buffer pool should be the only source of buffering, and
it is too small (128MB) to hold the bench table.
c. Reconfigure the buffer pool size to 500MB and rerun the three-scans experiment. Explain how you increased the buffer size and the results of the experiment. Put the buffer pool size back to its original value when you're done.
Problem 3X (don't
turn in anything for this) edited 4/12
Please check that you can set up a working (Apache) web server in your cloud VM by following these steps:
1. Go to your Compute
Engine VM Instances page and look at the Firewall section. If "Allow
HTTP Traffic" is not checked, click Edit at the top of the page and
then check this field. Then click Save at the bottom of the page.
2. Connect to your VM
with ssh. Browse to https://cloud.google.com/compute/docs/tutorials/basic-webserver-apache,
and copy the command to install apache (sudo apt-get update &&
sudo apt-get install apache2 -y) to your ssh window on your VM. The
next command on that page installs a simple html file at the root of
the web server (echo '<!doctype
html><html><body><h1>Hello
World!</h1></body></html>' | sudo tee
/var/www/html/index.html). You can display it by "cat
/var/www/html/index.html" on your VM and by browsing to your IP
address as explained in the tutorial.
3. (as in older
version of these instructions) Put a simple hello.html file in
/var/www/html/hello.html and access it by browsing to
http:<your External IP address>/hello.html. Your External IP
address is shown on your VM Instances page. It is a link there, but
the link address may use https: instead of http: so clicking on it may
not work immediately: you may need to edit out the s in your browser
address bar, and add /hello.html. You may also need to check "Allow
HTTP Traffic" on the VM Instance Details page. Let me know if you have
trouble with this.
Worked example of Schedules and 2PL locking, for next questions:
Recall the example from class 18: original sequence
R1(A) R2(A) R2(B) R1(B) W2(B) R2(B) W2(B) C1 C2
Running with strict 2PL, with T2 requesting a write lock for R(B) W(B):
The following sequence shown on the slide assumes the R1(B) executes after X2(B) but before the R2(B):
S1(A) R1(A) S2(A) R2(A) X2(B) <S1 (B)-blocked> R2(B) W2(B) C2 <S1 (B)-unblocked> R1(B) C1
But the simple interpretation of the above sequence is as follows, where T2 does the R2(B) just after getting the lock for it, so use this as a model:
S1(A) R1(A) S2(A) R2(A) X2(B) R2(B) <S1 (B)-blocked> W2(B) C2 <S1 (B)-unblocked> R1(B) C1
Problem 5
R1(A) R2(B) W1(B) W2(C)R3(C)W3(A) C1 C2 C3
R2(X) W3(Y) R1(Y) W2(X) W1(X) W1(Y) W4(Y),W4(X) C1 C2 C3 C4
Problem 7
Run the following sequence under Strict 2PL. Show the waits-for-graph as the sequence of actions progresses.
W1(A) R2(B) W2(B) W3(C) R1(B) W1(B) W2(C) W3(A) C1 C2 C3 C4