By Adam Grossman on August 29, 2013.
Take NOAA’s Global Forecast System (GFS). It has a resolution of 0.5 degrees, which is 1/10th the resolution (and 1/100th the number of points) of our final image. Here is a comparison of GFS versus Quicksilver for a portion of northern Africa:
So how do we create the final high-resolution product from these low resolution sources? The trick is to realize that, below a certain scale, local changes in temperature are caused not by the weather itself, but by geographical effects. Things such as elevation, foliage cover, ground albedo, and terrain can and do skew the temperature away from its baseline. These are called microclimate effects.
Our existing sources provide a decent map of the weather conditions, but are too low resolution to resolve the microclimate effects caused by geography. To solve that, we need to generate a model of how the geography affects temperature on a local scale, and apply it as a perturbation to the low resolution data.
First, we gather temperature measurements from thousands of ground stations around the globe, using NOAA’s freely available Integrated Surface Database. These provide precise and accurate data on the temperature at specific points on the Earth, at specific times.
Next, we gather as much data as we can about the local geography at these points. We determine elevation using the USGS’s Global Multi-resolution Terrain Elevation Data, which has worldwide coverage at a resolution of 30 arc-seconds.
We also pull in land-surface measurements from NASA’s two Moderate Resolution Imaging Spectroradiometers (MODIS) aboard the Terra and Aqua satellites. This provides us with crucial high resolution data on land-surface temperatures at nearly every point on Earth. The satellites revisit each location too infrequently to provide a real-time snapshot, but the data does give us the average high and low ground temperatures for a given month, which show us which areas are generally cooler or hotter than their surroundings, and what the day/night temperature swings are. However, because the data represents land-surface temperature rather than the air temperature we’re after, we can’t use it directly.
MODIS also provides us with detailed maps of vegetation cover, and ground emissivity and albedo.
Next, we want to determine how all these variables — elevation, daily ground temperature swing, vegetation, emissivity and albedo, plus latitude and day of the year — are correlated with observed average air temperature from the ground station measurements. We can’t use them to predict the actual temperature observations themselves, since those are determined largely by the weather, but we can correlate them with differences in average daily high and low temperatures from one station to another.
For example, if there are two nearby stations and one of them is on the top of a mountain while the other is in a valley, we may find that — on average — the station on the mountain is 4 degrees cooler during the day and 10 degrees cooler at night. By correlating this average temperature difference with our geographical parameters at the two points, we can create a map of how the microclimate in the region varies with changing conditions (in this case, primarily elevation).
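To make the mountain-and-valley example concrete, here is a minimal Python sketch of the averaging step. The readings are made up for illustration (they are not Integrated Surface Database records), and the helper names `average_high_low` and `station_offset` are hypothetical.

```python
from statistics import mean

# Made-up daily (high, low) readings, in degrees, for two nearby stations.
valley = [(30.0, 18.0), (31.0, 19.0), (29.0, 17.0)]
mountain = [(26.0, 8.0), (27.0, 9.0), (25.0, 7.0)]

def average_high_low(days):
    """Return the mean daily high and mean daily low for one station."""
    highs = [high for high, _ in days]
    lows = [low for _, low in days]
    return mean(highs), mean(lows)

def station_offset(station, reference):
    """Average (day, night) temperature difference of station vs. reference."""
    s_high, s_low = average_high_low(station)
    r_high, r_low = average_high_low(reference)
    return s_high - r_high, s_low - r_low

day_offset, night_offset = station_offset(mountain, valley)
print(day_offset, night_offset)
```

Averaging over many days is what removes the day-to-day weather and leaves only the persistent difference between the two sites. Those persistent differences are the quantities that get correlated with elevation and the other geographical parameters.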
However, there is a problem: once we have all this data, how do we compute the correlation? Normally, you’d just perform a linear regression on the data. But a linear regression assumes that our variables are independent from each other and exhibit a linear relationship with the observed temperature. This almost certainly isn’t the case: the variables can be strongly correlated in complex non-linear ways. For example, how much vegetation there is depends, in part, on elevation and latitude. Ground emissivity, in turn, is influenced by vegetation. The daily temperature swing is affected by all of these.
Instead, we need to perform a nonlinear regression. But that poses a difficulty: to pick a good nonlinear regression model, it’s useful to understand how the variables are related. But without doing complex physical modeling of the sun/earth/biosphere interactions, it isn’t so clear in our case.
So the approach we took was to pick a tool we already use in Dark Sky: the neural net. You may recall that we used neural nets to classify radar blobs in our radar image cleaning system. But at their core, neural nets, despite their romantic-sounding name, are simply nonlinear function estimators. And their main advantage for us is that they can be used when the nature of the nonlinearity isn’t precisely known. Properly tuned and trained with a sufficient corpus of training data, they can often perform as well as or better than other nonlinear regression techniques.
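For readers who have not seen a neural net used as a regression tool, the following Python sketch fits a tiny one-hidden-layer network to a nonlinear target using plain stochastic gradient descent. Everything in it is an assumption made for illustration: the synthetic data, the eight `tanh` hidden units, the learning rate, and the number of passes. It says nothing about the architecture or training setup actually used for Quicksilver.

```python
import math
import random

random.seed(0)

# Synthetic stand-in data (not real station data): the target depends on the
# two inputs through a product, which a straight-line fit cannot capture.
samples = []
for _ in range(60):
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    samples.append(((a, b), math.sin(3 * a) * b))

HIDDEN = 8
w1 = [[random.gauss(0, 0.5) for _ in range(2)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [random.gauss(0, 0.5) for _ in range(HIDDEN)]
b2 = 0.0

def predict(x):
    """Return the hidden activations and the network output for one input."""
    h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
         for j in range(HIDDEN)]
    return h, sum(w2[j] * h[j] for j in range(HIDDEN)) + b2

def mean_squared_error():
    return sum((predict(x)[1] - t) ** 2 for x, t in samples) / len(samples)

initial_loss = mean_squared_error()

LEARNING_RATE = 0.05
for _ in range(300):
    for x, target in samples:  # one gradient step per sample
        h, out = predict(x)
        err = out - target
        for j in range(HIDDEN):
            # Backpropagate through tanh using the pre-update output weight.
            grad_hidden = err * w2[j] * (1 - h[j] ** 2)
            w2[j] -= LEARNING_RATE * err * h[j]
            w1[j][0] -= LEARNING_RATE * grad_hidden * x[0]
            w1[j][1] -= LEARNING_RATE * grad_hidden * x[1]
            b1[j] -= LEARNING_RATE * grad_hidden
        b2 -= LEARNING_RATE * err

final_loss = mean_squared_error()
print(initial_loss, final_loss)
```

The network is never told that the target is a sine multiplied by the second input. It adjusts its weights until its output matches the samples more closely, which is the property that matters when the form of the nonlinearity is unknown.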
So that’s what we did here. The end result was two neural nets, one that computes average daily high temperature for a given set of inputs, and one that computes average daily low temperature. From there, we can figure out the average temperature at any point during the day:
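The exact interpolation between the two outputs is not spelled out here, so the Python sketch below is only one plausible way to do it. It assumes a simple cosine curve that reaches the daily high at 15:00 local time and the daily low twelve hours later. The function name `diurnal_temperature` and the `peak_hour` parameter are hypothetical.

```python
import math

def diurnal_temperature(hour, daily_low, daily_high, peak_hour=15.0):
    """Estimate the average temperature at a local hour (0-24).

    Assumes a cosine-shaped daily cycle peaking at peak_hour; real daily
    cycles are not symmetric, so treat this as a rough approximation.
    """
    midpoint = (daily_high + daily_low) / 2.0
    amplitude = (daily_high - daily_low) / 2.0
    phase = 2.0 * math.pi * (hour - peak_hour) / 24.0
    return midpoint + amplitude * math.cos(phase)

# With an average low of 10 and high of 20, mid-morning sits halfway between.
print(diurnal_temperature(9.0, 10.0, 20.0))
```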
The key thing to realize, though, is that the output of these neural nets can’t be used directly. The temperature at a given point and time depends much more on the specific weather conditions at that point than on the geographical effects of elevation, etc. What they provide us, though, is the relative temperature difference, i.e., the temperature deviation due to microclimate effects. We can use this relative difference to adjust the low resolution temperature models we already have, effectively creating a much higher resolution map.
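One way to picture this adjustment is the Python sketch below. It assumes that each coarse model value stands for the average over its cell, and that the fine-grained climatological estimates are already laid out on a grid with `factor` by `factor` points per coarse cell. The function name `downscale` and the blocky, non-interpolated treatment of the coarse grid are simplifications made for illustration.

```python
def downscale(coarse, fine_climate, factor):
    """Perturb each coarse cell with the local deviation of fine_climate.

    coarse:       2D list of model temperatures, one value per coarse cell.
    fine_climate: 2D list of modeled average temperatures, factor x factor
                  values per coarse cell.
    """
    rows, cols = len(coarse), len(coarse[0])
    result = [[0.0] * (cols * factor) for _ in range(rows * factor)]
    for r in range(rows):
        for c in range(cols):
            block = [(i, j)
                     for i in range(r * factor, (r + 1) * factor)
                     for j in range(c * factor, (c + 1) * factor)]
            block_mean = sum(fine_climate[i][j] for i, j in block) / len(block)
            for i, j in block:
                # Keep only the deviation from the cell average, so the
                # weather signal still comes from the coarse model.
                result[i][j] = coarse[r][c] + (fine_climate[i][j] - block_mean)
    return result

coarse = [[20.0]]                            # one coarse cell
fine_climate = [[12.0, 14.0], [16.0, 18.0]]  # made-up fine-scale averages
fine = downscale(coarse, fine_climate, 2)
print(fine)
```

Because only deviations from the cell mean are added, the average of the fine values within each cell still equals the coarse model value.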
Take GFS again. Each grid point of GFS data corresponds, roughly, to the average temperature over a 35 mile area. Knowing the specific geographical conditions at points within that area, we can adjust the temperature for each point. The end result is a much higher resolution image:
At this point we’re almost done. The last step is to bias the output based on actual ground station measurements. Models such as GFS are notoriously unreliable when it comes to predicting near-surface temperature (they’re mostly used for their precipitation forecasts). Fortunately, the bias tends to be relatively smooth and uniform, so we can use real-time ground station measurements to tweak the model.
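The interpolation used for this correction is not described here. As an illustration only, the Python sketch below spreads station biases (observed minus modeled temperature) over the map with inverse-distance weighting, a common choice for smooth fields. The function name `idw_bias`, the flat x/y coordinates, and the `power` parameter are all assumptions.

```python
import math

def idw_bias(point, stations, power=2.0):
    """Inverse-distance-weighted bias estimate at point.

    stations: non-empty list of ((x, y), bias), where bias is the observed
    temperature minus the modeled temperature at that station.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), bias in stations:
        distance = math.hypot(point[0] - sx, point[1] - sy)
        if distance == 0.0:
            return bias  # exactly on a station: use its bias directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * bias
        weight_total += weight
    return weighted_sum / weight_total

# Two made-up stations: the model runs 2 degrees cold at one, 1 warm at the other.
stations = [((0.0, 0.0), 2.0), ((10.0, 0.0), -1.0)]
midpoint_bias = idw_bias((5.0, 0.0), stations)
corrected = 18.0 + midpoint_bias  # modeled temperature plus estimated bias
print(midpoint_bias, corrected)
```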
Once that’s done, we do it again for each data source we have and then perform a weighted average of all of them to produce a final high-res grayscale image, which the fine folks at MapBox have converted into a pretty tileset:
Of course, there are several problems with the way we generate these images that can lead to inaccuracies:
Despite these issues, we encourage people to play with the data. It can be downloaded as big honkin’ 16-bit GeoTIFFs, here, which update once an hour. If you do anything fun with it, please let us know!