From Maura:
I had planned to finish up chapter 6 with some stunning examples of graphical representations of data. We never got there. Students had many questions about the homework, including how to calculate measures of central tendency from a histogram, so we spent most of the class reviewing that with the Erewhon housing data. I did sneak in some work on graphical representations of data: we first looked at the graph of combined house and condo sales and took time to make observations about it. The class made some good quantitative observations, which was good practice. Then we did the hard (but ultimately worthwhile) work of calculating the median and mean from the grouped data. There were some complaints along the lines of “When are we going to do math again?” The Excel work is tedious for some students, especially those who are less comfortable with it. At the end I showed them how great spreadsheets are for “what if” questions: we added a few house sales and saw the median and mean update immediately. That’s powerful, and worth the tedium.
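The calculation of a median and mean from grouped data can also be sketched outside a spreadsheet. This is a minimal Python sketch, assuming made-up price bins and counts (not the actual Erewhon housing data) and one common convention: for the mean, every sale in a bin is treated as if it sold at the bin midpoint; for the median, the sales in the median bin are assumed to be spread evenly across it. The function names are hypothetical, and the spreadsheet method used in class may differ in detail.

```python
# Hypothetical price bins (in $1000s) and sale counts: illustrative only,
# NOT the Erewhon housing data from the homework.
bins = [(0, 100), (100, 200), (200, 300), (300, 400), (400, 500)]
counts = [4, 10, 8, 5, 3]


def grouped_mean(bins, counts):
    """Estimate the mean by treating every sale in a bin as its midpoint."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    weighted = sum((lo + hi) / 2 * n for (lo, hi), n in zip(bins, counts))
    return weighted / total


def grouped_median(bins, counts):
    """Estimate the median by linear interpolation inside the median bin."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    half = total / 2
    cumulative = 0
    for (lo, hi), n in zip(bins, counts):
        if cumulative + n >= half:
            # Assume the n sales in this bin are spread evenly over [lo, hi).
            return lo + (half - cumulative) / n * (hi - lo)
        cumulative += n


# "What if": two more (hypothetical) sales land in the top bin.
more = counts[:-1] + [counts[-1] + 2]

print(round(grouped_mean(bins, counts), 1), grouped_median(bins, counts))  # 226.7 212.5
print(round(grouped_mean(bins, more), 1), grouped_median(bins, more))      # 240.6 225.0
```

As in the spreadsheet, changing the counts and recomputing updates both estimates at once, which is the "what if" effect described above.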
From Ethan:
Last class before spring break. I had tentatively planned to have them look at some data visualization examples in groups and present their conclusions. Several students wanted help with the US income distribution problem from the homework, so we did that first. Of course “first” turned out to be most of the class, but it was a very useful class. Here are the high points:
- While working out the mean household income I came up with $644K. Red flag! That’s larger than the largest entry being averaged, so we knew immediately that something was wrong. It took several minutes to track down just what: essentially, we had used [=Cn*Dn] where [=Bn*Dn] was correct.
- Dealing with the missing 3% of households. First I did a simple example: computing a grade partway through the semester, before the final, when the weights of the graded items sum to less than 100%. Then we added a row to the spreadsheet for the remaining households and discovered how dramatically the mean changed depending on the mean income we assigned to that top 3%. The power of “what if” in Excel was on display. So was the extent of income disparity in the US, so we were visualizing data after all.
- Looking at the source of the picture. In this case it was Wikipedia, which lists the Census Bureau as the source of the data (so probably a Wikipedia page that can be trusted). It was clear that the chart there was created in Excel. I suggested that we edit that Wikipedia page to replace the chart with one displaying the same information without the visually distracting fake 3D, and add a note about the missing 3% to the caption. I was very tempted to do it on the fly, right in class. I may assign it as an optional exercise. Of course only the first person who tries can do it, since then it will be done …
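The arithmetic behind the first two bullets above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the income brackets and household fractions are made up (not the census figures from the homework), and the helper names weighted_mean and in_range are hypothetical. The sketch shows the range check that exposes a result like $644K, the "grade so far" calculation with weights summing to less than 100%, and a "what if" row for the missing top 3%.

```python
# Hypothetical income table (NOT the homework's census figures): a
# representative income for each bracket, in $1000s, and the fraction of
# households in that bracket. The fractions sum to 0.97, mimicking a table
# that leaves out the top 3% of households.
incomes = [10, 30, 60, 100, 175]
fractions = [0.20, 0.25, 0.25, 0.17, 0.10]


def weighted_mean(values, weights):
    """Sum of value*weight divided by the total weight actually present.

    Dividing by sum(weights) rather than by 1 is what makes a "grade so
    far" (weights summing to less than 100%) come out right.
    """
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def in_range(mean, values):
    """Sanity check: any average must lie between the smallest and the
    largest value being averaged."""
    return min(values) <= mean <= max(values)


# Mean income of the 97% of households the (hypothetical) table covers.
covered = weighted_mean(incomes, fractions)

# A result like $644K fails the check against this table immediately.
suspicious = in_range(644, incomes)

# "What if": add a row for the missing top 3% with an assumed mean income
# and see how far the overall mean moves.
what_if = {
    top: weighted_mean(incomes + [top], fractions + [0.03])
    for top in (250, 500, 1000, 2000)
}

# Grade so far: homework 90 (30%) and midterm 80 (30%); final not yet taken.
grade_so_far = weighted_mean([90, 80], [0.30, 0.30])

print(round(covered, 1), suspicious)
for top, mean in what_if.items():
    print(top, round(mean, 1))
print(round(grade_so_far, 1))
```

With these made-up numbers the overall mean rises from 66.5 to 119 (in $1000s) as the assumed mean income of the top 3% goes from 250 to 2000, the same kind of sensitivity the spreadsheet revealed. The range check catches only gross errors; a wrong column can still produce a plausible-looking mean.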
We did have fifteen minutes left to spend on data visualization. We looked very quickly at the three links I’d prepared:
- Charles Blow’s NY Times graphic (local copy)
- The Billion Dollar-o-Gram
- NY Times interactive federal budget graphic
All three contributed to the ongoing discussion of where the money in this country comes from and where it goes. I promised more along those lines when we reconvene after break and get to income tax computations (before April 15).