Climate Change

About this dataset: http://www.cru.uea.ac.uk/documents/421974/1295957/Info+sheet+%231.pdf/c612fc7e-babb-463c-b5e3-124ac76680c5

Monthly temperature readings in degree's celsius from baseline established by 50 year average. So, +1.2 means "1.2 degrees celsius above the baseline." Columns from several datasources, but we'll be focusing on three reading synthesized by the Climate Research Unit calculated back to 1850. They provide three data points: "NH" for "Northern Hemisphere", "SH" for "Southern Hemisphere", and a "GL" for "Global."

In [2]:
# usual set up
%pylab inline
rcParams['figure.figsize'] = 14,7
import pandas as pd
Populating the interactive namespace from numpy and matplotlib

In [3]:
# load the data from the file
data = pd.read_csv('/Users/oranlooney/Dropbox/code/python/scipy/climate_change_data.csv')
In [4]:
# focus in on the data we're interested in today
data = data[['Year', 'Month', 'Date', 'CRU3 NH', 'CRU3 SH', 'CRU3 GL']]
data = data.rename(columns={'CRU3 NH':'North', 'CRU3 SH':'South', 'CRU3 GL':'Global'})
In [5]:
# sanity check: yep, they're numbers
data.head()
Out[5]:
Year Month Date North South Global
0 1850 1 1850.042 -2.98 -1.26 -2.12
1 1850 2 1850.125 0.30 -1.56 -0.63
2 1850 3 1850.208 -1.99 0.44 -0.77
3 1850 4 1850.292 -1.43 -0.86 -1.14
4 1850 5 1850.375 -1.37 -0.46 -0.92
In [6]:
# statistical moments and ranges of each column
data.describe()
Out[6]:
Year Month Date North South Global
count 1956.000000 1956.000000 1956.000000 1941.000000 1941.000000 1941.000000
mean 1930.993865 6.500000 1931.493865 -0.558032 -0.413122 -0.485095
std 47.054694 3.452935 47.055573 0.582007 0.398859 0.415996
min 1850.000000 1.000000 1850.042000 -3.220000 -1.960000 -2.160000
25% 1890.000000 3.750000 1890.771000 -0.860000 -0.670000 -0.740000
50% 1931.000000 6.500000 1931.500000 -0.540000 -0.410000 -0.500000
75% 1972.000000 9.250000 1972.229000 -0.240000 -0.150000 -0.250000
max 2011.000000 12.000000 2011.958333 1.720000 0.930000 1.000000

why mean and std are (sometimes) meaningless: Abscombe's Quartet

Anscombe's Quaret

Anscombe's Quaret

In [7]:
# start by simply plotting all available data to get an intuitive picture
data.plot(x='Date', y=['North', 'South', 'Global'])
Out[7]:
<matplotlib.axes.AxesSubplot at 0x10808b390>
In [8]:
# let's zoom in on the last data to get a close up picture of how the data varies.
last = max(data['Date'])
last_decade = data[ data['Date'] >= last-10][['Date', 'North', 'South', 'Global']]
last_decade.plot(x='Date')
linear = linspace(last-10, last, 100)

plot(linear, 0.3 + 0.3*cos(linear*2*pi), 'y--')
Out[8]:
[<matplotlib.lines.Line2D at 0x1080970d0>]
In [9]:
# I suspect seasonal variation, so I'm going to use a Fast Fourier Transform to check for periodicity in the data.
import statsmodels
from scipy import fftpack as fft
In [10]:
# Check the 'North' temperature data for seasonal variation
temps = data['North'].fillna(method='pad')

# rfft => real fast-fourier transform
weights = fft.rfft(temps)
frequencies = fft.rfftfreq(temps.size)

plot(frequencies, weights)
ylim([-10, 250])

annotate('annual', xy=(1.0/12.0, 80), xytext=(1/12.0, 100), arrowprops=dict(arrowstyle='->',))
Out[10]:
<matplotlib.text.Annotation at 0x108684f50>
In [11]:
# Check the global temperature data for seasonal variation
temps = data['Global'].fillna(method='pad')

# rfft => real fast-fourier transform
weights = fft.rfft(temps)
frequencies = fft.rfftfreq(temps.size)

plot(frequencies, weights)
ylim([-10, 250])
Out[11]:
(-10, 250)
In [12]:
# fit a linear regression model to the 'Global' temperature data
import statsmodels.api as sm

X = sm.add_constant(data['Date'])
Y = data['Global'].fillna(method='pad')

linear_model = sm.OLS(Y, X)

results = linear_model.fit()
results.summary()
Out[12]:
OLS Regression Results
Dep. Variable: Global R-squared: 0.363
Model: OLS Adj. R-squared: 0.362
Method: Least Squares F-statistic: 1111.
Date: Sun, 19 Jan 2014 Prob (F-statistic): 2.64e-193
Time: 15:30:43 Log-Likelihood: -644.13
No. Observations: 1956 AIC: 1292.
Df Residuals: 1954 BIC: 1303.
Df Model: 1
coef std err t P>|t| [95.0% Conf. Int.]
const -10.8928 0.312 -34.858 0.000 -11.506 -10.280
Date 0.0054 0.000 33.337 0.000 0.005 0.006
Omnibus: 44.414 Durbin-Watson: 0.924
Prob(Omnibus): 0.000 Jarque-Bera (JB): 70.180
Skew: 0.210 Prob(JB): 5.76e-16
Kurtosis: 3.827 Cond. No. 7.93e+04
In [13]:
# let's graph the fit line to see if it makes sense
data.plot(x='Date', y=['Global'])
plot(data['Date'], data['Date'] * 0.0054 - 10.89, 'r-', linewidth=2, label='fit line')
Out[13]:
[<matplotlib.lines.Line2D at 0x1080cc7d0>]