Tony Hirst of the Open University pretty much left us all with our jaws on the floor at the Banff Learning Analytics conference (#LAK11). In his presentation he romped through a variety of methods to pull, claw, scrape, coax, tease and otherwise harvest structured and unstructured data from sources as varied as PDFs, APIs, HTML tables and other screen-scrapable pages, CSVs and other utilities. The data is imported and mashed up in Google Spreadsheets, cleansed so that it is consistent and comparable, and then queried, analyzed and visualized. As Tony put it: you then have a “conversation with the data.”
That is a nicely humanizing concept, a cherry on the cake of a home-made and ingenious approach to making a tasty soup from a handful of leftovers and wilted vegetables in the crisper. The world will likely remain a messy place for a long, long time, so it is good to know that we don’t have to wait for perfect IT systems and teams, or for a semantic web, to do some interesting and creative work in data analysis.
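To make the import, cleanse and query steps a little more concrete, here is a minimal sketch in plain Python. It is not Tony's method, which was built on Google Spreadsheets; it is just the same idea in a different tool. The sample data and the helper names (`to_int`, `load_rows`, `total_by_course`) are all invented for illustration.

```python
import csv
import io

# Hypothetical messy CSV, invented for illustration only:
# stray spaces, inconsistent capitalisation, a thousands separator, a missing value.
RAW_CSV = """Course, Enrolled ,Completed
 Maths,"1,200",900
physics , 800 ,n/a
 Maths ,300,250
"""


def to_int(text):
    """Turn strings like ' 1,200 ' into 1200; return None for non-numeric values."""
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None


def load_rows(raw):
    """Import step plus cleansing: parse the CSV text and normalise every field."""
    reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
    header = [name.strip().lower() for name in next(reader)]
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        row = dict(zip(header, record))
        rows.append({
            "course": row["course"].strip().title(),
            "enrolled": to_int(row["enrolled"]),
            "completed": to_int(row["completed"]),
        })
    return rows


def total_by_course(rows, field):
    """Query step: sum one numeric field per course, ignoring missing values."""
    totals = {}
    for row in rows:
        if row[field] is not None:
            totals[row["course"]] = totals.get(row["course"], 0) + row[field]
    return totals


rows = load_rows(RAW_CSV)
print(total_by_course(rows, "enrolled"))  # {'Maths': 1500, 'Physics': 800}
```

Once the two spellings of "Maths" have been made consistent, the rows become comparable and a simple question ("how many enrolled per course?") gets a sensible answer, which is roughly where the conversation with the data can start.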