cs634 hw5

Due Wed., Apr. 18 in class, on paper. Note that Mon., Apr. 16 is a holiday, 20 points

See directory /data/htdocs/cs634/setquery for the files that were previously used to load the bench table on Oracle that you queried in homework 2. Also there is loadbench_mysql.sh for the mysql bulk load. Create a hw5 directory under ~/cs634 on pe07.cs.umb.edu for your files for this assignment. Copy everything from setquery to your hw5 directory. Convert them to set up the bench table in your own mysql installation (in your cloud VM), in the database <username>db that you created in hw3. Copy them (use scp on your VM to do this) to a setquery directory (~/setquery) on your cloud system. Then compile the data generator program setq_dbg.c (gcc -o setq_dbg setq_dbg.c). If gcc is not there, see StackOverflowQuestion. Read the header comment to see how to run it, and create a datafile to load table bench with 1M (1,000,000) rows. Best to try it out for 10 rows before creating the big file. Use loadbench_mysql.sh for the bulk load.

setq_dbg.c: data generator program, compile and run to
        make input file bench.dat

        setq_cre.sql: create table (modified from Oracle case)

        loadbench_mysql.sh: load the table from /var/lib/mysql-files/bench.dat

        setq_index1.sql: build secondary indexes (modified from Oracle case)

        analyze.sql: run analyze on bench (modified from Oracle case)

        check_bench.sql: get basic info on table from catalogs (modified from
        Oracle case)

        scan.sql: sample table scan, with one row of output

        queryK500K.sql: your query from hw2 that uses K500K (on column with
        index)

        queryK250K.sql: your query from hw2 that uses K250K (on column without
        index)

Leave the final versions of these files in your hw5 directory, i.e., copy them back from the cloud if necessary. In your homework, report on your problems and transcript of run the last four files listed above. Also include output listing all the indexes on table bench. Note that "script" is available in the cloud Linux environment.

a. With this bench table setup in your cloud VM, try the queries we looked at in class 13, as follows. Show the MariaDB "explain" output for them and say whether the plan is the same or different from Oracle's (as reported in the class13 slides) and if so, how. Optionally, also compare to pe07's mysql, which now has the bench table publicly available in database setq_db. Note that pe07 has a much more powerful CPU than the one we get on our cloud VMs for free.

iii. select max(b1.s2) from bench b1, bench b2 where b1.k40k=b2.kseq and b2.k5=2;

b. Try the query I found that uses hash join after I set join_cache_level=8 in /etc/mysql/mariadb.conf.d, restarted the server with "service mysql restart", and used the command MariaDB> set join_cache_level=8. The join_cache_level value returns to the default when you exit mysql. MariaDB refuses to acknowledge the existence of the system variable join_cache_level until you put it in the conf file, and even then still uses the default level of 2 (no hash joins) until you use the set command. See doc on MariaDB system variables. Here's the query:

Try the query (report timing and plan) before and after editing the conf file and setting join_cache_level to 8 (its highest level, allowing all its join methods to be considered). See if the hash join improves the timing. A better test would involve rebooting Linux to clear the buffer cache before each execution (optional, and see next problem about rebooting). You can leave the config-file edit in place because it is only effective if you do the set command in a session.

Problem 3 Data Buffering in Mysql: as DBA, you control this important configuration

Note that we can bring down the whole Linux system by using the stop button on the VM Instance Details page, accessible from the SSH window by the gear icon in the upper right corner. Then we can start it again from there. Stopping the Linux system brings down the mysql server, while bringing it up starts up the mysql server. We can leave the OS up and bring down just the mysql server itself by using "sudo service mysql stop", and bring it back up with "sudo service mysql start". Or do both with "sudo service mysql restart". You can see that the OS knows about mysql, via scripts in /etc/init.d. See "man service" for details if interested. Note that a DBA normally has sudo access to the Linux server for a database.

a. Show that is is necessary to reboot the Linux server, not just restart the mysql server, to force mysql (as originally set up) to read disk, thus taking much longer to answer a scan query. Specifically: reboot Linux server, run scan.sql, restart mysql server, run queryK250K.sql, which also does a table scan. Finally, run scan2.sql, another table-scan query of your choice. Display query/output and explain your results. We need to use different table-scan queries because it appears that mysql and MariaDB sometimes cache query results, leading to almost instantaneous query answers.

b. To make mysql more like Oracle, and avoid buffering data twice over, turn off the use of OS buffering by putting innodb_flush_method = O_DIRECT in my.cnf or its included file (in the [mysqld] area). Restart mysql to put this into action. Rerun the experiments of part a. Now the buffer pool should be the only source of buffering, and it is too small (128MB) to hold the bench table.

c. Reconfigure the buffer pool size to 500MB and rerun the three-scans experiment. Explain how you increased the buffer size and the results of the experiment. Put the buffer pool size back to its original value when you're done.

Problem 3X (don't turn in anything for this) edited 4/12

Please check that you can set up a working (Apache) web server in your cloud VM by following these steps:

1. Go to your Compute Engine VM Instances page and look at the Firewall section. If "Allow HTTP Traffic" is not checked, click Edit at the top of the page and then check this field. Then click Save at the bottom of the page.

2. Connect to your VM with ssh. Browse to https://cloud.google.com/compute/docs/tutorials/basic-webserver-apache, and copy the command to install apache (sudo apt-get update && sudo apt-get install apache2 -y) to your ssh window on your VM. The next command on that page installs a simple html file at the root of the web server (echo '<!doctype html><html><body><h1>Hello World!</h1></body></html>' | sudo tee /var/www/html/index.html). You can display it by "cat /var/www/html/index.html" on your VM and by browsing to your IP address as explained in the tutorial.

3. (as in older version of these instructions) Put a simple hello.html file in /var/www/html/hello.html and access it by browsing to http:<your External IP address>/hello.html. Your External IP address is shown on your VM Instances page. It is a link there, but the link address may use https: instead of http: so clicking on it may not work immediately: you may need to edit out the s in your browser address bar, and add /hello.html. You may also need to check "Allow HTTP Traffic" on the VM Instance Details page. Let me know if you have trouble with this.

The following sequence shown on the slide assumes the R1(B) executes after X2(B) but before the R2(B):

S₁(A) R₁(A) S₂(A) R₂(A) X₂(B) <S₁ (B)-blocked> R₂(B) W₂(B) C₂ <S₁ (B)-unblocked> R₁(B) C₁

But the simple interpretation of the above sequence is as follows, where T2 does the R2(B) just after getting the lock for it, so use this as a model:

S₁(A) R₁(A) S₂(A) R₂(A) X₂(B) R₂(B) <S₁ (B)-blocked> W₂(B) C₂ <S₁ (B)-unblocked> R₁(B) C₁

Run the following sequence under Strict 2PL. Show the waits-for-graph as the sequence of actions progresses.