Thursday, March 31, 2011

Visualization: Pixels, Degrees and lots of data!

My visualization project has led me down a rabbit hole I never knew existed.  When I think of the world, I think about the miles between location X and location Y, which can be translated into differences in latitude and longitude.  I never knew the actual conversion, but I knew it was mathematically straightforward, and I had never really thought about how this works in something like Google Maps.

Mapping Coordinates and Pixels
Of course, when dealing with mapping software the world can no longer be represented as an ellipsoid.  The standard way to project a globe onto a flat surface is the Mercator projection.  Using this projection, we can display the globe as a flat map.  There is some weirdness about the way the Mercator projection works, which you can read about here.  Once the world has been projected into this form we can easily display the map, but Google maps this coordinate system onto yet another coordinate system: their tile system.  This tile system is likely not a surprise to anyone who has ever used Google Maps, but the way that it works did surprise me a little bit.  Google has a predefined tile size (256 x 256 pixels by default).  As you zoom in, the entire world is broken up into more and more tiles, but the viewport still shows about the same number of them.  Each zoom level doubles the number of tiles along each side, so zoom level z is a 2^z x 2^z grid.  For example, at zoom level 1 this is a 2x2 grid, but at the 19th zoom level it's 524,288 x 524,288.  That's a lot of tiles!
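To make the coordinate-to-tile mapping concrete, here is a minimal sketch in Python.  It assumes the usual Web Mercator convention, where zoom level 0 is a single tile and every level doubles the grid on each side.  The function names (tiles_per_side, latlng_to_tile) are my own for illustration and are not part of any Google API.

```python
import math

TILE_SIZE = 256  # pixels per tile edge, the default mentioned above

# Latitude limit of the square Web Mercator map (assumed convention).
MAX_LAT = 85.05112878


def tiles_per_side(zoom):
    """Number of tiles along one edge of the world at this zoom level."""
    return 2 ** zoom


def latlng_to_tile(lat, lng, zoom):
    """Return the (tile_x, tile_y) containing a lat/lng at a zoom level.

    Tile (0, 0) is the top-left (north-west) corner of the map.
    """
    n = tiles_per_side(zoom)
    # Clamp latitude so the Mercator formula stays finite near the poles.
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    sin_lat = math.sin(math.radians(lat))
    # Project to "world" coordinates in the range [0, 1].
    x = (lng + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    # Scale to the tile grid and clamp to valid tile indices.
    tile_x = min(max(int(math.floor(x * n)), 0), n - 1)
    tile_y = min(max(int(math.floor(y * n)), 0), n - 1)
    return tile_x, tile_y


if __name__ == "__main__":
    # Chicago at zoom level 3 lands in tile (2, 2).
    print(latlng_to_tile(41.85, -87.65, 3))
    print(tiles_per_side(19))
```

Multiplying the projected x and y by TILE_SIZE * 2^zoom instead of 2^zoom would give pixel coordinates rather than tile indices.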

Why does it matter?
Now the question is, where does this fit into my visualization project?  What I need is the ability to map a lat/long (or group of lat/longs) into a particular tile at every zoom level.  The initial scope of this project is to gather about a week's worth of data and allow the user to view this data at all zoom levels.  The data is currently being gathered at about 6 tweets/second.  6 * 60 * 60 gives us 21,600 tweets per hour, and 21,600 * 24 * 7 gives 3,628,800 (call it 4 million) for the week.  The obvious (and bad) solution to this problem would be to simply create a database of the 4 million tweets at each zoom level.  Then, when I wanted information for a particular tile (for a particular time), I would calculate the number of tweets in that tile/time, make a color out of that number and color the tile.  Obviously this solution is very bad, since it rescans the raw tweets for every tile that gets drawn, and we can do much better.  The initial plan is to do a bunch of data preprocessing that will allow the data to be grouped by both location and time, but that is a topic for another day.
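As a rough illustration of what "number of tweets in a tile/time" means, here is a small Python sketch that counts tweets per (zoom, tile, hour) and turns a count into a color.  This is not the preprocessing scheme I plan to write about later.  The tweet format (lat, lng, unix timestamp), the one-hour bucket, the red color ramp, and all function names are assumptions made up for the example.

```python
import math
from collections import Counter


def latlng_to_tile(lat, lng, zoom):
    """Web Mercator lat/lng -> (tile_x, tile_y); zoom 0 is a single tile."""
    n = 2 ** zoom
    lat = max(min(lat, 85.05112878), -85.05112878)
    sin_lat = math.sin(math.radians(lat))
    x = (lng + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    return (min(max(int(x * n), 0), n - 1),
            min(max(int(y * n), 0), n - 1))


def count_tweets(tweets, zoom_levels, bucket_seconds=3600):
    """Count tweets per (zoom, tile_x, tile_y, time_bucket).

    tweets is an iterable of (lat, lng, unix_timestamp) tuples
    (a hypothetical format chosen for this sketch).
    """
    counts = Counter()
    for lat, lng, timestamp in tweets:
        bucket = int(timestamp // bucket_seconds)
        for zoom in zoom_levels:
            tile_x, tile_y = latlng_to_tile(lat, lng, zoom)
            counts[(zoom, tile_x, tile_y, bucket)] += 1
    return counts


def count_to_color(count, max_count):
    """Map a count to a hex color from black to full red (arbitrary ramp)."""
    if max_count <= 0:
        return "#000000"
    level = int(255 * min(count, max_count) / max_count)
    return "#{:02x}0000".format(level)


# The back-of-the-envelope volume from the paragraph above.
TWEETS_PER_WEEK = 6 * 60 * 60 * 24 * 7  # 3,628,800

if __name__ == "__main__":
    sample = [(41.85, -87.65, 0), (41.85, -87.65, 10), (0.0, 0.0, 3700)]
    counts = count_tweets(sample, zoom_levels=[0, 3])
    print(counts[(3, 2, 2, 0)], count_to_color(counts[(3, 2, 2, 0)], 10))
```

Counting once up front like this means drawing a tile becomes a single lookup instead of a scan over millions of rows, which is the general direction the preprocessing is meant to take.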

