Climate Change

About this dataset: http://www.cru.uea.ac.uk/documents/421974/1295957/Info+sheet+%231.pdf/c612fc7e-babb-463c-b5e3-124ac76680c5

Monthly temperature readings in degree's celsius from baseline established by 50 year average. So, +1.2 means "1.2 degrees celsius above the baseline." Columns from several datasources, but we'll be focusing on three reading synthesized by the Climate Research Unit calculated back to 1850. They provide three data points: "NH" for "Northern Hemisphere", "SH" for "Southern Hemisphere", and a "GL" for "Global."

In [2]:

# usual set up
%pylab inline
rcParams['figure.figsize'] = 14,7
import pandas as pd

Populating the interactive namespace from numpy and matplotlib

In [3]:

# load the data from the file
data = pd.read_csv('/Users/oranlooney/Dropbox/code/python/scipy/climate_change_data.csv')

In [4]:

# focus in on the data we're interested in today
data = data[['Year', 'Month', 'Date', 'CRU3 NH', 'CRU3 SH', 'CRU3 GL']]
data = data.rename(columns={'CRU3 NH':'North', 'CRU3 SH':'South', 'CRU3 GL':'Global'})

In [5]:

# sanity check: yep, they're numbers
data.head()

Out[5]:

	Year	Month	Date	North	South	Global
0	1850	1	1850.042	-2.98	-1.26	-2.12
1	1850	2	1850.125	0.30	-1.56	-0.63
2	1850	3	1850.208	-1.99	0.44	-0.77
3	1850	4	1850.292	-1.43	-0.86	-1.14
4	1850	5	1850.375	-1.37	-0.46	-0.92

In [6]:

# statistical moments and ranges of each column
data.describe()

Out[6]:

	Year	Month	Date	North	South	Global
count	1956.000000	1956.000000	1956.000000	1941.000000	1941.000000	1941.000000
mean	1930.993865	6.500000	1931.493865	-0.558032	-0.413122	-0.485095
std	47.054694	3.452935	47.055573	0.582007	0.398859	0.415996
min	1850.000000	1.000000	1850.042000	-3.220000	-1.960000	-2.160000
25%	1890.000000	3.750000	1890.771000	-0.860000	-0.670000	-0.740000
50%	1931.000000	6.500000	1931.500000	-0.540000	-0.410000	-0.500000
75%	1972.000000	9.250000	1972.229000	-0.240000	-0.150000	-0.250000
max	2011.000000	12.000000	2011.958333	1.720000	0.930000	1.000000

why mean and std are (sometimes) meaningless: Abscombe's Quartet

Anscombe's Quaret

In [7]:

# start by simply plotting all available data to get an intuitive picture
data.plot(x='Date', y=['North', 'South', 'Global'])

Out[7]:

<matplotlib.axes.AxesSubplot at 0x10808b390>

In [8]:

# let's zoom in on the last data to get a close up picture of how the data varies.
last = max(data['Date'])
last_decade = data[ data['Date'] >= last-10][['Date', 'North', 'South', 'Global']]
last_decade.plot(x='Date')
linear = linspace(last-10, last, 100)

plot(linear, 0.3 + 0.3*cos(linear*2*pi), 'y--')

Out[8]:

[<matplotlib.lines.Line2D at 0x1080970d0>]

In [9]:

# I suspect seasonal variation, so I'm going to use a Fast Fourier Transform to check for periodicity in the data.
import statsmodels
from scipy import fftpack as fft

In [10]:

# Check the 'North' temperature data for seasonal variation
temps = data['North'].fillna(method='pad')

# rfft => real fast-fourier transform
weights = fft.rfft(temps)
frequencies = fft.rfftfreq(temps.size)

plot(frequencies, weights)
ylim([-10, 250])

annotate('annual', xy=(1.0/12.0, 80), xytext=(1/12.0, 100), arrowprops=dict(arrowstyle='->',))

Out[10]:

<matplotlib.text.Annotation at 0x108684f50>

In [11]:

# Check the global temperature data for seasonal variation
temps = data['Global'].fillna(method='pad')

# rfft => real fast-fourier transform
weights = fft.rfft(temps)
frequencies = fft.rfftfreq(temps.size)

plot(frequencies, weights)
ylim([-10, 250])

Out[11]:

(-10, 250)

In [12]:

# fit a linear regression model to the 'Global' temperature data
import statsmodels.api as sm

X = sm.add_constant(data['Date'])
Y = data['Global'].fillna(method='pad')

linear_model = sm.OLS(Y, X)

results = linear_model.fit()
results.summary()

Out[12]:

OLS Regression Results
Dep. Variable:	Global	R-squared:	0.363
Model:	OLS	Adj. R-squared:	0.362
Method:	Least Squares	F-statistic:	1111.
Date:	Sun, 19 Jan 2014	Prob (F-statistic):	2.64e-193
Time:	15:30:43	Log-Likelihood:	-644.13
No. Observations:	1956	AIC:	1292.
Df Residuals:	1954	BIC:	1303.
Df Model:	1

	coef	std err	t	P>\|t\|	[95.0% Conf. Int.]
const	-10.8928	0.312	-34.858	0.000	-11.506 -10.280
Date	0.0054	0.000	33.337	0.000	0.005 0.006

Omnibus:	44.414	Durbin-Watson:	0.924
Prob(Omnibus):	0.000	Jarque-Bera (JB):	70.180
Skew:	0.210	Prob(JB):	5.76e-16
Kurtosis:	3.827	Cond. No.	7.93e+04

In [13]:

# let's graph the fit line to see if it makes sense
data.plot(x='Date', y=['Global'])
plot(data['Date'], data['Date'] * 0.0054 - 10.89, 'r-', linewidth=2, label='fit line')

Out[13]:

[<matplotlib.lines.Line2D at 0x1080cc7d0>]