Diving in to spatial interpolation

Now that I’m playing with some spatial data, it would be nice to make the sorts of geographic heatmaps that we all so admire, like this one:

I find anything spatially oriented to be a particularly difficult thing to get into. There are tons of blog posts about how to do a regression or build a decision tree or do image classification, but I’ve only come across a few really nice ones for geographical data (Zev Ross, I’m looking at you). Even the nice blog posts seem to have a hitch in them. I have yet to try one out in R that worked perfectly without any further research. For example, in the post I linked to above, the US Census links to download data were no longer providing the same extensive data as before, and there were also some hidden dependencies in the loaded libraries that showed up as some kind of ‘lacking permission’ error message, which was quite frustrating.

This time around I thought I’d start nice and easy by finding some code to go over and commenting it to explain it to myself. In this case I found same sample code on a page hosted by the University of Tartu. It was itself recycled from an earlier RDocs posting. That can sometimes indicate a real dearth of examples, when even examples you find are recycling older examples. Which is of course what I’m doing now as a learning exercise.

Here’s the code in full if you want to see the code in full. For me, the purpose of going through and commenting the code was to force myself to understand each step. When you start something new it’s easy to let your eyes glide over as you tell yourself you understand, but then when you want to code something up yourself you can spend quite a lot of time frustrated because you missed some simple detail.

So I started by looking at the libraries the script used:

# ggmap is spatial visualization with ggplot2

# ggplot2 can plot just about anything, including maps

# gstat is designed to do statistics operations specifically designed for geographic data
# for example it provides methods to determine whether trends are directional (estimate the spatio-temporal anisotropy)

# according to its own catalog, this provides "classes and methods for spatial data"
# as well as utility functions for plotting data as maps
# also methods for retrieving coordinates
# we will use the following methods: coordinates() & gridded()

# fairly generic sounding description again
# "set of tools for manipulating and reading geographic data, in particular ESRI shapefiles"
# we use it for readShapePoly()

The crucial ones here are gstat and sp. Based on mere impressions from reading this script, it seems as though gstat does the spatially-oriented computations and sp provides the sort of data objects you’d want to operate on with gstat.

Then the script does a lot of plain vanilla work. It geocodes some locations using ggmap::geocode and then joins these coordinates to the original place names. Also joined to the coordinates (loaded in via data frames from a few .csv files) are information for the weather stations for temperatures on two dates.

The handwavey bit for me, even after reading the code, is this:
coordinates(estonia_air_temperature_2_test) = ~x + y
At this point the data frame being passed to coordinates does have x and y columns that correspond to longitude and latitude respectively. All I could find out about the coordinates() method is that it makes a `spatial` object, whatever that is. From looking at the attributes, you can see that the spatial object retains all the original info in the data frame and also adds some spatial info, including a bounding box attribute.

You can also make a grid once you’ve made a spatial object, like so. The trick is that it’s already  grid numerically…you’re just formally declaring it as such

grd <- expand.grid(x = seq(from = x.range[1], to = x.range[2], by = 0.1), y = coordinates(grd) <- ~x + y
gridded(grd) <- TRUE

And then finally the ‘magic’ (which we should all really know in detail from reading the source code…but I won’t do that as a newbie) happens in the gstat::idw function as we see here:

idw <- idw(formula = may12 ~ 1, locations = estonia_air_temperature_2_test, newdata = grd)

We use our existing info in a data frame for locations and our newly formed grid as the newdata. The formula projects one of the temperature columns onto constant 1, which means we just want to estimate that column’s value.

Finally, you’ll want to plot the data. It turns out that idw returns some kind of output that can work with gmap::geom_tile to create a heat map, which is what you’ll have if you don’t put your original points or your geographic data into your image:


Of course this is far more informative if we put our actual data points on the map and even an ESRI shape file showing political contours for the region. Voila.


I promise the political boundaries are there even if they’re not very clear. Why not run the script yourself and read the comments I’ve added to the script in full? Thanks very much to the original crafters of this code. I got to learn a relative lot in a relative short period of time thanks to this helpful example.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s