Friday, August 15, 2008

Graph of NH SSTs and Named Storms Questioned

I have written about the association between the number of named storms in the Atlantic basin and Northern Hemisphere sea surface temperature anomalies several times now (last time here). I am quite confident there's a causal association there (even considering the possibility of coincidental trends).

The problem is that my posts on the subject have been met with disbelief. You see, the scientific literature is not clear on the matter, and not even top climate scientists seem to agree on whether the association exists. That's why I'm making this spreadsheet available.

In particular, there is a graph that is very difficult to deny. Sometimes you can express doubt about mathematical analyses on technical grounds, but clear and easily reproducible graphs are difficult to argue with. The graph in question shows 17-year central moving averages of both northern hemisphere sea surface temperature anomalies and the number of named storms in the Atlantic basin, from the 1850s to the present time.

In the new spreadsheet I'm making available, I calculated both 15-year and 21-year moving averages of both data sets. You will find comments in column headers with the URLs of where the raw data comes from. Having to do this seems over the top, but there really are people who apparently don't believe the original graph is real; plus they seem to be misunderstanding the graph completely, as you can see in the comments section of this post at AccuWeather.com.
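For readers who would rather check the smoothing outside of a spreadsheet, here is a rough Python sketch of how a central moving average can be computed; the series names in the usage comment are placeholders, not the actual spreadsheet columns.

```python
import numpy as np

def centered_moving_average(values, window):
    """Centered (central) moving average; window should be odd (e.g. 15 or 21).
    Returns an array of the same length, with NaN at the edges where the
    full window is not available."""
    values = np.asarray(values, dtype=float)
    half = window // 2
    out = np.full(len(values), np.nan)
    for i in range(half, len(values) - half):
        out[i] = values[i - half:i + half + 1].mean()
    return out

# Hypothetical usage: annual NH SST anomalies and named-storm counts,
# one value per year, aligned on the same year index.
# sst_cma = centered_moving_average(nh_sst_anomalies, 15)
# storm_cma = centered_moving_average(named_storm_counts, 21)
```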

The 15-year and 21-year CMA graphs are posted below, in that order.





Comment Policy

I will state my comment policy here, for future reference. I do not enable comment moderation. The only comments I delete are those that are clearly in violation of Blogger's Content Policy. Scrutiny is more than welcome. If you believe I made a mistake, tell me. If you believe I'm making things up, you absolutely should tell me, but you better be right.

Tuesday, August 12, 2008

NOAA Study Seems To Confirm Observation From 07/14 Post

Not so long ago I wrote a follow-up to an earlier analysis on the association between the number of named storms in the Atlantic basin and northern hemisphere sea surface temperatures. At the end of the post I listed a number of conclusions, one of which was the following.

The graph provides support for the contention that old storm records are unreliable. I would not recommend using storm counts prior to 1890.


I had posted a graph of 17-year central moving averages of NH sea surface temperature and named storm series, reproduced below. You will note I had placed a vertical line around the year 1890 in order to indicate there was some sort of point of change there.



I didn't use any mathematical analysis to determine that 1890 was in any way special. It was simply obvious, visually, that something was not right in the named storms series prior to 1890. Of course, the central moving average smoothing helped in terms of being able to see that.

Enter Vecchi & Knutson (2008), a NOAA study of North Atlantic historical cyclone activity. The authors determined, based on known ship tracks, that early ships missed many storms, especially in the 19th century.

Now, this study is being touted as evidence that global warming and the number of storms in the Atlantic are not associated. Clearly, that is nonsense, if you just look at the figure above. If you'd like to see some Math, I have done a detrended cross-correlation analysis as well. All that is necessary to demonstrate an association is to do a linear detrending on series that go from 1900 to the present time. The detrending should take care of any problems related to unreliability of old storm counts. I can further report that even after detrending the series based on 6th-order polynomial fits, a statistically significant association is still there, provided storms are presumed to lag temperatures by at least one year.
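For readers who want to try this kind of detrended, lagged comparison themselves, here is a rough Python sketch (not the exact calculation behind the post); the series names are placeholders for the named-storm counts and NH sea surface temperatures from 1900 onward.

```python
import numpy as np
from scipy import stats

def detrend(series, years, order=1):
    """Remove a polynomial trend of the given order (1 = linear,
    6 = sixth-order) and return the residuals."""
    x = np.asarray(years, float)
    x = x - x.mean()                      # center years for numerical stability
    coeffs = np.polyfit(x, series, order)
    return np.asarray(series, float) - np.polyval(coeffs, x)

def lagged_correlation(sst, storms, years, lag=1, order=1):
    """Correlate detrended storm counts with detrended SST anomalies
    'lag' years earlier. Returns Pearson's r and its p-value."""
    sst_res = detrend(sst, years, order)
    storm_res = detrend(storms, years, order)
    if lag > 0:
        sst_res, storm_res = sst_res[:-lag], storm_res[lag:]
    return stats.pearsonr(sst_res, storm_res)

# r, p = lagged_correlation(nh_sst, named_storms, np.arange(1900, 2007),
#                           lag=1, order=6)
```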

About The Disingenuous "Global Warming Challenge" by JunkScience.com

I read somewhere that JunkScience.com had issued a "global warming challenge" some time back that is promoted as follows.

$500,000 will be awarded to the first person to prove, in a scientific manner, that humans are causing harmful global warming.


That's also what people will say whenever they tout the "challenge." If you are certain anthropogenic global warming is real, you should be able to prove it. Who wouldn't want to make $500,000?

But as you can imagine, there's a catch. You need to falsify two hypotheses.


UGWC Hypothesis 1

Manmade emissions of greenhouse gases do not discernibly, significantly and predictably cause increases in global surface and tropospheric temperatures along with associated stratospheric cooling.

UGWC Hypothesis 2

The benefits equal or exceed the costs of any increases in global temperature caused by manmade greenhouse gas emissions between the present time and the year 2100, when all global social, economic and environmental effects are considered.


Hypothesis #1 should be falsifiable now. The only issue I have with it is that they have made it unnecessarily difficult (to cover their asses, no doubt) by including stratospheric cooling as a requirement. Don't get me wrong. I'm sure stratospheric cooling is an important matter to climate scientists, but why does it matter to the challenge? Isn't surface temperature warming due to anthropogenic causes interesting enough?

Technically, the issue is that there's not a lot of data on stratospheric temperatures, as far as I know. Considering lags and so forth, it's probably difficult to demonstrate an association in a decisive way. I haven't run the numbers, but this is my preliminary guess.

Hypothesis #2 is not falsifiable right now. We'd have to wait until about 2100 to either validate it or falsify it. Peak oil is probably looming or behind us, so we can't say what might happen by 2100. There are policy decisions to consider. There might be technological advances that change the general outlook. If we make certain assumptions, then sure, it's theoretically possible to give confidence ranges on certain predictions, such as sea level rises or changes in storm intensity.

Clearly, the "challenge" is designed such that it's impossible or nearly impossible to win. Despite its name, JunkScience.com is not a site about junk science. If you visit it you will see it's nothing but a propaganda outlet for global warming denialism books and videos. A site that is truly about junk science would probably discuss things like the paranormal, Homeopathy, the vaccine-autism hypothesis, etc. JunkScience.com does not.

In fact, what is the evidence that JunkScience.com has $500,000 to give out? Have they been collecting pledges? If they have collected funds, and there's no winner to their challenge, which I can almost certainly assure you there won't be, will they keep the money?

Call me cynical, but I doubt JunkScience.com is either capable or willing to give out $500,000 to anybody, regardless of the entries they receive.

Counter-Challenge

Here's a counter-challenge for JunkScience.com. Reduce the stakes if you need to. Then change the requirements of the challenge to include a single hypothesis to falsify, as follows.

Manmade emissions of greenhouse gases do not discernibly, significantly and predictably cause increases in global temperatures.


What's there to fear, JunkScience.com?

Friday, August 8, 2008

Just in case there are any doubts about anthropogenic influence in atmospheric CO2

You would think this is the least controversial aspect of the global warming debate, but you'd be surprised. I realized this after reading some of the comments in a post by Anthony Watts about a recent correction in the way Mauna Loa data is calculated (see also reactions by Tamino and Lucia).

Tamino subsequently wrote an interesting post on differences in CO2 trends as observed at three different sites: Mauna Loa (Hawaii), Barrow (Alaska) and the South Pole station. Most notably, there's a pronounced difference in the annual cycle between these stations, which, according to Tamino, is explained by there being more land mass in the Northern Hemisphere. I would imagine higher CO2 emissions in the Northern Hemisphere might also play a role, but I'm speculating.

In this post I want to show that available data is quite clear about anthropogenic influence in atmospheric CO2. Additionally, I want to discuss how we can tell that excess CO2 stays in the atmosphere for a long time.

I will use about 170 years of data for this. There's a reconstruction of CO2 concentrations from 1832 to 1978 made available by CDIAC, and derived by Etheridge et al. (1998) from the Law Dome DE08, DE08-2, and DSS ice cores. You will note that there's an excellent match between these data and Mauna Loa data for the period 1958 to 1978. Mauna Loa data has an offset of 0.996 ppmv relative to Etheridge et al. (1998), so I applied this simple adjustment to it in order to end up with a dataset that goes from 1832 to 2004.

CDIAC also provides data on global CO2 emissions. What we need, however, is an estimate of the excess anthropogenic CO2 that would be expected to remain in the atmosphere at any given point in time. We could simply calculate cumulative emissions since 1751 for any given year, but this is not necessarily accurate. Some excess CO2 is probably reclaimed by the planet every year. What I will do is make an assumption about the atmospheric half-life of CO2 in order to obtain a dataset of presumed excess CO2. I will use a half-life of 24.4 years (i.e. 0.972 of excess CO2 remains after 1 year). I should note that I have tried this same analysis with half-lives of 50, 70 and 'infinite' years, and the general results are the same.
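For concreteness, here is a minimal sketch of this kind of accumulation model (not necessarily the exact spreadsheet formula); `annual_emissions` is a placeholder for the CDIAC emissions series, and whether the decay is applied before or after adding the current year's emissions is a modeling choice.

```python
import numpy as np

def excess_co2(emissions, half_life_years=24.4):
    """Running total of excess CO2 under exponential decay with a fixed
    atmospheric half-life. The per-year retention factor follows from the
    half-life: 0.5 ** (1 / 24.4) is roughly 0.972, as noted above."""
    retention = 0.5 ** (1.0 / half_life_years)
    total, out = 0.0, []
    for e in emissions:
        total = total * retention + e   # decay last year's excess, add this year's
        out.append(total)
    return np.array(out)

# excess = excess_co2(annual_emissions, half_life_years=24.4)
```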

Figure 1 shows the time series of the two data sets.

co2 concentration and emissions

The trends are clear enough. CO2 emissions appear to accumulate in the atmosphere and are then observed in ice cores (and at various other sites like Mauna Loa). Every time we compare time series, though, there's a possibility that we're looking at coincidental trends. A technique that can be used to control for potentially coincidental trends is called detrended cross-correlation analysis (Podobnik & Stanley, 2007). In our case, the detrended cross-correlation is obvious enough graphically, and we'll leave it at that. See Figure 2. Basically, we take the time series and remove their trends, which are given by third-order polynomial fits. You can do the same thing with linear or second-order fits. The third-order fit is a better fit and produces more fluctuations around the trend, which makes the correlation more obvious and less likely to be explained by coincidence.

detrended residuals co2 concentration emissions

With that out of the way, how do we know that excess CO2 stays in the atmosphere for a long time? First, let's check what the scientific literature says on the subject, specifically, Moore & Braswell (1994):

If one assumes a terrestrial biosphere with a fertilization flux, then our best estimate is that the single half-life for excess CO2 lies within the range of 19 to 49 years, with a reasonable average being 31 years. If we assume only regrowth, then the average value for the single half-life for excess CO2 increases to 72 years, and if we remove the terrestrial component completely, then it increases further to 92 years.


In general, it is widely accepted that the atmospheric half-life of CO2 is measured in decades, not years.

One type of analysis that I have attempted is to select the half-life hypothesis that maximizes the Pearson's correlation coefficient of the series from Figure 1. If I do this, I find that the best half-life is about 24.4 years. Nevertheless, I had attempted the same exercise with the Mauna Loa series (1958-2004) previously, and the best half-life then seems to be about 70 years. It varies depending on the time frame, and there's not necessarily a trend in the half-life. This just goes to show that there's uncertainty in the calculation, and that the half-life model is a simplification of the real world.

Another approach we can take is to try to estimate the weight of excess CO2 currently in the atmosphere, and see how this compares to data on emissions. The current excess of atmospheric CO2 is agreed to be roughly 100 ppmv. If by 'atmosphere' we mean 20 km above ground (this is fairly arbitrary) then the volume of the atmosphere is about 1.03×10¹⁰ km³. This would mean that the total volume of excess CO2 is 1.03×10⁶ km³, or 1.03×10¹⁵ m³. The density of CO2 is 1.98 kg/m³, so the total weight of excess CO2 should be about 2.03×10¹⁵ kg, or 2,030,000 million metric tons.
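For transparency, here is the same back-of-envelope arithmetic in code form, under the post's simplifying assumptions (sea-level CO2 density maintained all the way up to 20 km); the Earth surface area figure (about 5.1×10⁸ km²) is a standard value, not something taken from the post.

```python
EARTH_SURFACE_KM2 = 5.1e8     # approximate surface area of the Earth (assumed)
SHELL_HEIGHT_KM = 20.0        # the arbitrary "top of atmosphere" used above
EXCESS_FRACTION = 100e-6      # ~100 ppmv excess CO2, as a volume fraction
CO2_DENSITY_KG_M3 = 1.98      # density of CO2 at sea-level conditions

atmosphere_km3 = EARTH_SURFACE_KM2 * SHELL_HEIGHT_KM   # ~1.0e10 km^3
excess_m3 = atmosphere_km3 * EXCESS_FRACTION * 1e9     # km^3 -> m^3, ~1.0e15
excess_kg = excess_m3 * CO2_DENSITY_KG_M3              # ~2.0e15 kg
print(excess_kg / 1e9, "million metric tons")          # roughly 2,000,000
```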

Something is not right, though. If we add all annual CO2 emissions from 1751 to 2004, we come up with 334,000 million metric tons total. This can't be. I'd suggest that CDIAC data does not count all sources of anthropogenic emissions of CO2. It obviously isn't considering feedbacks either. Furthermore, our assumptions in the calculations above might not be accurate (specifically, that a 100 ppmv excess is maintained up to an altitude of 20 km). In any case, it's hard to see how these numbers would support the notion that the half-life of CO2 is low.

Sunday, August 3, 2008

Why the 1998-2008 Temperature Trend Doesn't Mean a Whole Lot

Suppose I wanted to determine whether the current temperature trend is consistent with some projected trend. In order to do this, let's say I calculate the temperature slope of the last 200 days, and its confidence interval in the standard manner. Then I check to see if the projected trend is in the confidence interval. But maybe I want a tighter confidence interval. I could use more data points in this case, say, temperatures in the last 1,000 minutes. If we assume temperature series approximate AR(1) with white noise, this should be fine.

That makes no sense at all, does it?

Intuitively, it seems that confidence intervals on temperature slopes (when we want to compare them with a long-term trend) should depend more on the working time range than on the number of data points, or on how well those data points fit a linear regression. We should have more confidence in a 20-year trend than in a 10-year trend, almost regardless of whether we use monthly data as opposed to annual data. Certainly, the standard slope confidence interval calculation is not going to do it. We need to come up with a different method to compare short-term trends with long-term ones.

I will suggest one such method in this post. First, we need to come up with a long projected trend we can test the method on. We could use a 100-year IPCC trend line, if there is such a thing. For simplicity, I will use a third-order polynomial trend line as my "projected trend." Readers can repeat the exercise with any arbitrary trend line if they so wish. I should note that the third-order polynomial trend line projects a temperature change rate of 2.2C / century from 1998 to 2008.

The following is a graph of GISS global annual mean temperatures, along with the "projected trend." For the year 2008 I'm using 0.44C as the mean temperature. You can use other temperature data sets and monthly data too. I don't think that will make a big difference.

GISS temperature

We have 118 years of 11-year slopes we can analyze. There are different ways to do this. To make it easy to follow, I will detrend the temperature series according to our projected trend. This way we can compare apples with apples as far as slopes go. The detrended series is shown in the following graph.

detrended GISS temperature

The long term slope of detrended temperatures is, of course, zero. All 11-year slopes in the detrended series will distribute around zero. We know that the 1998-2008 slope is -1.53C / century. The question we want an answer for is whether the 1998-2008 slope is unusual compared to 11-year slopes observed historically, which would indicate there's likely a point of change away from the projected trend.

We can start by visualizing the distribution of 11-year slopes throughout the detrended series. The following is a graph of the number of years in slope ranges of width 0.2C / century. For example, the number of years that have slopes between 0.1 and 0.3 is 10.

GISS detrended temperature 11-year slope distribution

This is roughly a normal distribution of years according to their slopes. In it, approximately 95% of years have slopes in the -2.7 to 2.7 range. That is, 4 years have slopes of -2.7 or lower, and 3 years have slopes of 2.7 or higher. I put forth that the real confidence interval for 11-year temperature slopes relative to long-term 3rd-order polynomial trend lines is approximately ± 2.7 C / century.
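For anyone who wants to reproduce the slope distribution, here is a rough Python sketch; `detrended_giss` is a placeholder for the detrended annual series, and slopes are converted to degrees C per century to match the discussion above.

```python
import numpy as np

def eleven_year_slopes(detrended, window=11):
    """Ordinary least-squares slope over each 'window'-year span of the
    detrended series, expressed in degrees C per century."""
    detrended = np.asarray(detrended, float)
    x = np.arange(window)
    slopes = []
    for i in range(len(detrended) - window + 1):
        slope_per_year = np.polyfit(x, detrended[i:i + window], 1)[0]
        slopes.append(slope_per_year * 100.0)
    return np.array(slopes)

# slopes = eleven_year_slopes(detrended_giss)
# Empirical ~95% range of historical 11-year slopes:
# lo, hi = np.percentile(slopes, [2.5, 97.5])
```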

The 11-year slope for 1998 is only -1.53C / century, well within the estimated confidence interval. Therefore, it's a little premature to say that the 1998-2008 trend falsifies 2C / century. Of course, if 2009 is a cold year, that might change this evaluation.

Saturday, August 2, 2008

Wherein I Revise Previous Sensitivity Estimate Down to 3.13C

I found an annual reconstruction of CO2 atmospheric concentrations that goes from 1832 to 1978. It is made available by CDIAC and it comes from Etheridge et al. (1998). There's a more than adequate match between this data and the data collected at Mauna Loa, Hawaii for the range 1958 to 1978.

Naturally, I thought this CO2 data would be more accurate than that estimated from emissions, which I had used in my calculation of climate sensitivity to CO2 doubling. (BTW, that calculation was based on 150 years of data). So I reran the analysis, and the following is the new formula for the rate of temperature change (R) given a CO2 concentration in ppmv (C) and a temperature anomaly in degrees Celsius (T).

R = 0.0857 ( 10.398 log C - 26 - T )

The equilibrium temperature (T') is calculated as follows.

T' = 10.398 log C - 26

This means that climate sensitivity to CO2 doubling (based on this model which only considers this one forcing) is most likely 3.13 degrees Celsius.

I also rebuilt the hindcast graph, which follows.

global warming hindcast co2

I think this is a subjectively better hindcast than the original. Note that it even predicts a nearly flat temperature trend in the 1950s. This is simply what the more accurate CO2 data does. While sensitivity is lower (I had originally estimated it at 3.46C), the range of CO2 concentrations is wider. Estimations based on emissions produce a concentration of about 295 ppmv in 1850. Etheridge et al. (1998) puts the concentration at 283.5 ppmv at that point.

The model predicts that the rate of temperature change should be about 2.1C / century in 2007.

I also wanted to attempt a 1000-year hindcast. I had previously discussed the 1781-year temperature reconstruction that is the product of Mann & Jones (2003). It just so happens that there's also a 1000-year CO2 reconstruction from Etheridge et al. (1998). Well, this more ambitious hindcast didn't turn out to be as accurate. At first I thought this is just what happens when you fail to consider other important climate forcings. But then I went back and examined other 1000-year temperature reconstructions. I'm sure readers have seen that graph many times. It turns out that there's considerable uncertainty in these types of reconstructions.

Either way, I will post my first attempt at a 1000-year hindcast below. The red line is the reconstruction from Mann & Jones (2003). I also added a green line, which is a reconstruction based on glacier records that comes from Oerlemans (2005).

global warming 1000-year hindcast

It could be better. I'm now curious as to what would happen if other major climate forcings were considered.

Thursday, July 24, 2008

The "Hockey Stick" is Fine

There seems to be considerable controversy over the well-known "hockey stick" temperature reconstruction of the last two millennia – Mann & Jones (2003). I have even found what look like accusations of fraud, all embedded in discussions of very complicated statistics and algorithmic procedures that the average person couldn't possibly hope to evaluate.

I'm not interested in getting involved in the politics of the whole thing. I just want to point out that the raw data of the temperature reconstruction is made available by the NOAA Paleoclimatology Program. I contend that most people reading this can double-check if the raw data tells us we are living in unusually warm times – which is basically what the "hockey stick" construct conveys.

Of course, there are those who will say that we are living in unusually warm times relative to most of the last thousand years simply because the little ice age has ended. But we can control for this fairly easily.

There is a general temperature trend historically. We can remove this trend from the data, and then check if we're still living in unusually warm times after the removal. Specifically, we want to remove the warming trend that is a natural part of the end of the little ice age.

I would suggest that a 4th-order polynomial trend line will capture the general temperature trend of the last 1781 years more than sufficiently. (Excel will produce polynomial trend lines for you, up to 6th-order ones). The trend is characterized by a medieval warm period, followed by a period of cooling, and a subsequent period of warming. We can detrend the temperature time series based on the polynomial fit and see if the modern era remains special.

This sort of detrending methodology has apparently been used in climatology before. Holme et al. (2008) point out that "sophisticated statistical methods have been applied to [climate] series, but perhaps sometimes these methods might even be too sophisticated." They further claim that "the [detrending] method provides a rigorous way of defining climate 'events', and allows comparison of long-term trends and events in time series of climatic records from different archives."

The detrending method in Holme et al. is actually more sophisticated than what we can do in a straightforward manner, but the authors are interested in long-term quasiperiodic trends.

Let's first see what the temperature time series looks like, along with the proposed 4th-order polynomial fit. We will only be looking at the global temperature reconstruction in this post.

mann & jones hockey stick temperature reconstruction

In order to detrend the time series, we simply subtract temperatures modeled by the polynomial equation from observed (reconstructed) temperatures. The Y axis offset is not important to this analysis. (Note that in the equation shown in the figure above, x = year - 199). The result of the detrending procedure is illustrated in the following figure.

detrended mann & jones hockey stick temperature reconstruction

So now we have a nice detrended temperature time series, which – if I may be redundant – has an entirely flat trend. What do we do with it?

Let's sort data rows by detrended temperature in descending order. If we look at the top 5% of years (89 years) ranked in this manner, we see that they have detrended temperatures greater than 0.123. In other words, if we were to pick a year at random from the data set, there is only a 5% chance that its detrended temperature is greater than 0.123. (If you must know, the residuals of the polynomial regression are normally distributed).
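Here is a rough Python sketch of the same exercise, for anyone who wants to verify it outside of Excel; `recon_years` and `recon_temps` are placeholders for the reconstruction's year and temperature columns, and the 0.123 threshold should come out as roughly the 95th percentile of the detrended values.

```python
import numpy as np

def warm_years_after_detrend(years, temps, order=4, pct=95, start=1968):
    """Detrend the reconstruction with a polynomial of the given order and
    return the years at or after 'start' whose detrended temperature exceeds
    the top-(100 - pct)% threshold, along with the threshold itself."""
    years = np.asarray(years, float)
    temps = np.asarray(temps, float)
    x = years - 199.0                 # same year offset used in the figure
    detrended = temps - np.polyval(np.polyfit(x, temps, order), x)
    threshold = np.percentile(detrended, pct)
    return years[(years >= start) & (detrended > threshold)], threshold

# warm, threshold = warm_years_after_detrend(recon_years, recon_temps)
```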

In statistics, a 5% probability is the standard for rejection of hypotheses. If we hypothesize that a given year is not an unusually warm year, its detrended temperature should be 0.123 or lower. Yet, this is not the case for many of the years in the modern era, as shown in the following figure.

mann & jones hockey stick temperature reconstruction warm years

All but 3 of the years from 1968 to 1980 are statistically warm years, even after detrending the whole 1781-year time series. This cannot be explained as a consequence of the culmination of the little ice age. Clearly, we are in the midst of a "climate event."

Is it an unprecedented event? If you only consider the 1968-1980 range as special, then no. There was an 11-year "climate event" between the years 668 and 678 when detrended temperatures were higher than 0.123. That is the closest precedent that can be found in the 1781-year temperature series. If we consider that temperatures have increased after 1980, then I'd have to agree with Mann & Jones that modern era global warming "dwarfs" anything from the last 2 millennia.

Sunday, July 20, 2008

Global Warming Forecast - Based on 3.46C Model

So far we have estimated climate sensitivity to CO2 doubling, and tested the results of the analysis with a hindcast. I will close the series with a forecast.

It will be a simple forecast in the sense that we will only consider CO2 trends. While I would caution this is an important limitation of the forecast, I would also note the hindcast had the same exact limitation. Of course, it's quite possible that in analyses of historic data, CO2 acts as a proxy of other anthropogenic forcings. The behavior of this confounding in the past may differ from its future behavior.

That said, the part of the forecast that I really can't be very confident about has to do with projecting future CO2 atmospheric concentrations. This basically amounts to attempting to predict human behavior and world-wide policy decisions. What I will do is to simply define 2 scenarios based on the Mauna Loa data, as follows.
  • Scenario A: A second-order polynomial forecast of CO2 concentrations.

  • Scenario B: A third-order polynomial forecast of CO2 concentrations.

Each scenario is illustrated in the following graph.

co2 polynomial forecasts
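For what it's worth, the two CO2 scenarios can be reproduced roughly as follows; `ml_years` and `ml_co2` are placeholders for the Mauna Loa annual means, and the end year is arbitrary.

```python
import numpy as np

def co2_scenario(ml_years, ml_co2, order, end_year=2100):
    """Polynomial extrapolation of the Mauna Loa series
    (order 2 = Scenario A, order 3 = Scenario B)."""
    ml_years = np.asarray(ml_years, float)
    x0 = ml_years.mean()                          # center years for stability
    coeffs = np.polyfit(ml_years - x0, ml_co2, order)
    future = np.arange(ml_years[-1] + 1, end_year + 1)
    return future, np.polyval(coeffs, future - x0)

# yrs_a, co2_a = co2_scenario(ml_years, ml_co2, order=2)   # Scenario A
# yrs_b, co2_b = co2_scenario(ml_years, ml_co2, order=3)   # Scenario B
```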

If it is true that peak oil is either looming or behind us, I would say Scenario B is considerably more likely.

To get "high" and "low" estimates I was initially planning to use the 95% confidence interval of the rate of temperature change formula. This range produces forecasts that are very similar. So instead what I did is produce new formulas for sensitivities of 3.0C (low) and 4.0C (high). For additional details on how the forecast is done, see the hindcast post.

The resulting forecasts of each scenario are illustrated in the following graphs.

global warming forecast

global warming forecast

Again, I consider scenario B to be more probable. We'll see how they do. Under either scenario it would seem that a global temperature anomaly of 1 degree Celsius by the early 2020s is a done deal. The model also tells us that it takes about 10 years for temperatures to level off after CO2 concentrations do. Under scenario B, we are apparently at a peak in the rate of temperature change – roughly 2C/century. This rate will begin to drop. It will be 1.5C/century by 2035.

Friday, July 18, 2008

How Well Does a Sensitivity of 3.46C Hindcast?

[Note: Revised 08/02/2008]

In the last post we estimated the most likely climate sensitivity to CO2 doubling by means of an analysis of temperature change rates. The result (3.46C) is in the high end of the range of sensitivities considered plausible by the scientific community. A hindcast should not only tell us if the estimate is in fact too high, but it should also test some of the other results from the analysis. And to make it interesting, we will do a hindcast of the last 150 years. Sound crazy? See Figure 1.

global warming hindcast co2

This turned out much better than I expected. In fact, I suspect the chart might beg disbelief among some readers, so I'm making the spreadsheet available here (XLS). Formulas can be verified to match those of the analysis.

The only inputs to the hindcast are (1) CO2 atmospheric concentrations from 1853 to 2004 (estimated in ppmv as described at the end of this post), and (2) observed temperatures from 1853 to 1856. The observed temperatures used (Column D) are actually central moving averages of period 7.

My expectation for the hindcast was that error would accumulate, and in the end we would have a deviation from the observed temperature trend, but hopefully not a big one. That's because the temperature for year Y is predicted in the hindcast by taking the temperature in Y-2 and adding twice the predicted temperature change rate in Y-1. Intuitively, it doesn't seem like this technique would tend to maintain accuracy over a time series this long.

There is a good reason why the model hindcasts this well, nevertheless. First, it helps that formulas were derived in part from the data we're hindcasting. But more importantly, what we're looking at is a self-correcting system. Local variability cannot make the system resolve its imbalance any faster or slower. If temperature becomes higher than it should be, for whatever reason, the temperature change rate will drop. Similarly, temperatures lower than they should be will be corrected by a positive change in the rate. Sooner or later, the observed trend will rejoin the predicted trend.

This speculative observation is testable in the hindcast. We can break the chain of predicted temperatures, insert artificial values, and see if the model resolves. This can be done in the spreadsheet by modifying one of the predicted temperature columns (e.g. column K, any row greater than 9). What I did is introduce an artificial warming between 1910 and 1913 so it ended up at 0.1C. The results can be seen in Figure 2.

global warming hindcast

I think that's interesting, and I'm sure there's some insight about what's been occurring since 1998 somewhere in there.

For those who are interested in the details, the following is a recap of the results from the analysis that are used to produce the hindcast.

  1. T' = 11.494 log C - 28.768

  2. R = (T' - T) * 0.0915

  3. An unexplained lag of 3 years for imbalance to take effect on the rate of temperature change.


Where

  • C = The atmospheric concentration of CO2 given in ppmv.

  • T' = The equilibrium temperature, given in degrees Celsius anomalies as defined in CRUTEM3v data set.

  • T = The observed temperature. In the hindcast, this is actually the predicted temperature, except for 4 years we use as inputs.

  • R = The rate of temperature change, given in degrees Celsius per year.


The high and low hindcast predictions are based on the confidence interval given in the formula for R.

As an example, the following is how the predicted temperature for 1857 is calculated.

T(1857) = T(1855) + 2 * R(1856)

R(1856) = 0.0915 * (T'(1853)-T(1853))

That's all the hindcast is.
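In code form, the recursion amounts to the following sketch; `co2_series` and the four seed temperatures are placeholders for the spreadsheet's input columns, with both series starting in 1853.

```python
import numpy as np

A, B, D = 11.494, -28.768, 0.0915   # T' = A*log10(C) + B ;  R = D*(T' - T)

def hindcast(co2_ppmv, seed_temps):
    """Reproduce the hindcast recursion described above. co2_ppmv[i] is the
    CO2 concentration for year 1853 + i; seed_temps holds the four observed
    (smoothed) temperatures for 1853-1856 that seed the recursion."""
    t_eq = A * np.log10(np.asarray(co2_ppmv, float)) + B    # equilibrium T'
    t = list(seed_temps)                                     # 1853-1856
    for i in range(len(t), len(co2_ppmv)):
        # R for year i-1 uses the imbalance three years earlier (index i-4)
        r_prev = D * (t_eq[i - 4] - t[i - 4])
        t.append(t[i - 2] + 2.0 * r_prev)                    # T(Y) = T(Y-2) + 2*R(Y-1)
    return np.array(t)

# predicted = hindcast(co2_series, observed_temps_1853_to_1856)
```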

Next up: We'll attempt a forecast.

Tuesday, July 15, 2008

Here's How You Can Estimate CO2 Climate Sensitivity From Historic Data

Most likely value = 3.46C

[Note: Revised 08/02/2008]

When I first became interested in the science of Global Warming (which was not too long ago) I had some substantial misconceptions. For example, I thought the current temperature anomaly (about 0.6C globally) was due to the current levels of greenhouse gases in the atmosphere, primarily CO2 (about 380 ppmv). Reality is more complicated. The issue is not that there's some lag between greenhouse gas concentrations and temperature either – it's a bit more complicated than that.

I've been learning about a concept called CO2 climate sensitivity, which is defined as the equilibrium temperature increase expected if the atmospheric concentration of CO2 were to double. The word equilibrium needs to be emphasized. At current CO2 concentrations, I would estimate the equilibrium temperature anomaly should be 0.89C, but the actual temperature anomaly is only about 0.6C. There's a significant imbalance, and the imbalance is corrected by temperature change. Simplifying, the mechanism that causes temperature change is called CO2 forcing.

There is much debate and uncertainty about the most likely climate sensitivity value. For a good overview, see James' Empty Blog.

What I want to do in this post is go over a relatively simple analysis where we estimate climate sensitivity by using publicly available historic data. We will also come up with formulas that tell us the most likely equilibrium temperature for a given CO2 concentration, and the most likely temperature change rate for a given actual temperature and CO2 concentration. The plausibility of these results will be illustrated with a graph.

First, let's go over some of the underlying theory. Given the way climate sensitivity is defined, it's clear that the expected equilibrium temperature change is the same for any doubling of CO2 concentrations, be it from 100 to 200 ppmv, or 1000 to 2000 ppmv. This tells me there's a logarithmic relationship between temperature and CO2 concentrations (assuming all else is equal) as follows:

T' = a log C + b


T' is the equilibrium temperature and C is the atmospheric concentration of CO2; a and b are constants. Climate sensitivity is thus

S = (a log 2C + b) - (a log C + b) = a log 2


When the observed temperature (T) differs from the equilibrium temperature (T'), there's imbalance. We will define imbalance (I) as follows.

I = T' - T


Further, I put forth that temperature change rate is given by

R = d I


where d is a constant. We're guessing a bit here, but the above is consistent with Newton's Law of Cooling.

Finally, let me define a construct (J) that I will use in the analysis. It is simply the imbalance minus the constant b, as follows.

J = I - b = a log C - T


If we know S, then we know a. When we have S, a and C for any given year, we can calculate J for any given year. Since we should be able to determine the temperature change rate (R) for any given year, we can model J vs. R (a linear relationship). The relationship between J and R should be equivalent to the relationship between I and R, except for a shift given by the constant b.

Here's the plan. We need to test different hypotheses on the value of S. The way we determine a hypothesis is good is by checking if the resulting relationship between I and R is suitable. And we measure this by means of the "goodness of fit" of the linear association between J and R. (This methodology is called "selection of hypotheses by goodness of fit" and it seems adequate in this case, judging by Figure 3, which I will mention shortly).
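As a rough illustration of that selection loop (not the exact spreadsheet calculation), one could do something like the following; `co2`, `temps` and `rates` are placeholders for the smoothed series, assumed to be already aligned to account for the lag and smoothing details discussed below.

```python
import numpy as np
from scipy import stats

def goodness_of_fit(S, co2, temps, rates):
    """R-squared of the linear fit between J = a*log10(C) - T and the
    temperature change rate R, for a hypothesized sensitivity S."""
    a = S / np.log10(2.0)                 # from S = a * log 2
    j = a * np.log10(co2) - temps
    return stats.linregress(j, rates).rvalue ** 2

# Sweep candidate sensitivities and keep the best-fitting one:
# candidates = np.arange(1.0, 6.01, 0.01)
# best_S = max(candidates, key=lambda s: goodness_of_fit(s, co2, temps, rates))
```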

Before I get into the nuances of the analysis (which are important) I wanted to show the reader how I chose the best value of S. Figure 1 models S vs. the goodness of fit of the linear association between J and R.



This tells us that the value of S that makes most sense is 3.46.

After we have determined the most likely value of S, we can calculate the constant b. The linear association between J and R is as follows.

R = 0.09152J - 2.63281


The slope should be the same in the association between I and R, except here the intercept must be zero.

R = 0.09152I


Therefore, b may be calculated as follows.

0.091521I = 0.091521(I - b) - 2.63281
b = -2.63281 / 0.091521 = -28.768


Figure 2 is the scatter graph that illustrates the association between imbalance (I) and temperature change rate (R) when we assume S=3.46. This confirms the slope of the linear fit and the "goodness of fit" we had previously found.

co2 climate sensitivity

A very important graph is one that shows the R and I time series side by side, under the same assumption (S=3.46). See Figure 3.

co2 climate sensitivity

Figure 3 validates much of the underlying theory. It's one of those graphs that, once again, show anthropogenic global warming to be an unequivocal reality.

Figure 3 can also be used to visually check different values of S. When S is less than 3.46, you will see the imbalance (I) time series rotate in a clockwise direction. When it is greater than 3.46, it will rotate in a counter-clockwise direction. This provides subjective confidence about the adequacy of the hypothesis selection methodology.

Note that the imbalance (I) time series in Figure 3 is shifted three years to the right. An initial inspection of the graph clearly showed there was a lag of 3 years between imbalance and temperature change rate. I would've expected the effect to be immediate, but that's why it's important to put your data in graphs. I couldn't begin to theorize why it takes time for imbalance to take effect, but this finding needs to be taken into account in the analysis; otherwise the results won't make sense.

Another important aspect of the analysis is that time series noise needs to be reduced, otherwise you probably won't notice details like the 3 year lag. I calculated central moving averages of period 7 from the CRUTEM3v global data set. For example, the "smooth" temperature for 1953 is calculated as the average between 1950 and 1956. Additionally, the temperature change rate (R) is calculated based on the "smooth" temperatures, looking 4 years ahead and 4 years in the past. If you also consider the 3 year imbalance lag, this leaves us with a workable time range spanning 1859 to 2000.

How do I get CO2 concentration data spanning that time frame? I discussed how I estimate that here. Basically, I try to find the best possible constant half-life of extra CO2 by matching emission data with the Hawaii data. The best half-life is 70 years or so.

I should note that this technique produces pre-industrial CO2 concentrations that are higher than I believe is generally accepted. My estimate gives about 294 ppmv for the 1700s. From ice cores, I understand the concentration has been determined to be 284 ppmv circa 1830. However, I can report that I tried a different estimation method that produces a value closer to 284 ppmv in the early 1800s, and this data produces much poorer fits in the analysis. For this reason, I went with my original estimation based on a constant half-life.

Let's look at the results of the analysis.

S = 3.46

T' = 11.494 log C - 28.768

R = 0.0915I [ 95% CI 0.074I to 0.109I ]


Temperatures are given as anomalies in degrees Celsius, as defined in the CRUTEM3v data set. The rate of change (R) is given in degrees per year.

What's the confidence interval on S? We'll leave that as an unsolved exercise. It's not only that there's uncertainty on the various data sets used, but it's unclear how we would calculate the uncertainty on the best "goodness of fit." It's not a matter of calculating confidence intervals on R² values, which is easy. We basically have to determine the likelihood that the best "goodness of fit" is other than the one we found. This seems non-trivial, but maybe a reader can suggest a method. From what I've seen in a visual inspection of Figure 3, I would say S is unlikely to fall outside the range 2.8 to 4.0. Of course, things might happen in the future which invalidate these results, as they are applicable to historic data.

Next up: We'll see how well these results hind-cast.

Monday, July 14, 2008

Hurricanes and Global Warming - Revisited

I previously wrote an analysis of the association between sea surface temperature and named storms. The post met with some scrutiny that was actually pretty decent, primarily from a commenter named Kenneth Fritsch over at Climate Audit. I understand Climate Audit is one of the major AGW denial blogs.

I had conjectured that when detrending time series, closer fits will tend to better control for coincidence. This intuition makes perfect sense, in my view. Consider that detrending with a linear fit is better than not detrending at all. After that, it's not hard to imagine there are coincidental time series where linear detrending does not make sense at all. I've also found time series where a second-order detrending is quite poor, and I've had to use a third-order detrending. The cumulative CO2 emissions time series is a case in point.

The problem with detrending too closely is that there is some loss of information. To give you an example, if we only had 7 data points and detrended them using a 6th-order fit, the fit would be perfect, and we'd be left with zero information. This is presumably not so much of an issue when you have many data points, but there has to be some loss of information either way.

Kenneth had tried my analysis with a 6th-order detrending and found that statistical significance was lost. This was interesting, but I subsequently pointed out that if you attempted the association by assuming there's a lag of 1 year between temperature and storms, statistical significance remained. I had previously found a lag of 1 year produced a better association than a lag of 0 years, and the 6th-order detrending confirms it. The 6th-order detrending is pretty remarkable too. There are no hints of cycles in a visual inspection of the detrended time series.

The exercise left me quite sure that there was still an association, but I got the sense that there's something missing as far as convincing some readers. I think many people are unconvinced by slopes, confidence intervals and theoretical Math. You need a good graph to be convincing. Unfortunately, both the temperature data and the storms data contain a lot of noise. You can sort of see a pattern if you look closely, but it's not something that is slam dunk convincing.

So I had an idea. We just need to smooth out the noise. And what's a simple way to smooth out noise? We just get central moving averages. In fact, this idea is so simple that I'd be very surprised if no one has thought of it before. Here's what I did. For the year 1859 I calculated the "smooth" temperature as the average of raw temperatures from 1851 to 1867. For the year 1860, it was the 1852-1868 average, and so forth. Same for named storms. The resulting graph follows.

hurricanes storms global warming temperature

At times I think a better name for this blog might have been "Deny This." :)

Some remarks:
  • The effect given by a straight comparison of the time series appears to be 8 storms for every 1 degree (C). This is somewhat higher than the effect I had previously reported from an analysis of the residuals, which was 6 storms for every 1 degree.

  • The graph provides support for the contention that old storm records are unreliable. I would not recommend using storm counts prior to 1890.

  • My prediction that at an anomaly of 2 degrees (C) the average season will be similar to the 2005 season is unchanged.

  • The lag from the graph appears to be 2 years, and not 1 year, as suggested by various analyses of residuals.

Saturday, July 12, 2008

Post on Global Warming Appears to Upset Denialists

A couple weeks ago I wrote a post in my primary blog that, if I may say so myself, convincingly and conclusively shows anthropogenic global warming is a reality. I believe the analysis is such that you don't need to have a degree in Math to follow it.

Not surprisingly, some global warming "skeptics" showed up in the comments and argued some points that are, frankly, not relevant to the analysis. But they were mostly civil. More recently, however, a commenter shows up, saying things like...

Good grief!

There is too much wrong with this analysis to do a thorough critique...

There is nothing at all impressive about your statistics...


Personally, I find these types of comments fairly rude, but that wouldn't matter so much if the commenter had actually advanced some challenges of note. Have you ever encountered guys like this? While this is the first time I've come across global warming denialists, I do have considerable experience with their anti-science counterparts in the autism community. We call them "anti-vaxers" and "the mercury militia." I doubt global warming denialists are nearly as nasty, though. But I digress.

Additionally, it's a little funny that the guy hadn't apparently read the post at all, judging by the following comment.

Also, there's not a thinking person on the planet who disagrees that from 1850 to present both carbon dioxide and temperature have increased. That alone will cause a positively-sloped line.


In the first paragraph of my post I had made it perfectly clear that my intention was to test a methodology that controls for potentially coincidental trends. In the first paragraph! I don't think I would've bothered to do a global warming analysis otherwise. You have to keep in mind that I have no dog in this fight (except perhaps for the fact that I live in this warmed up planet). My interest in the topic is scientific and not political.

This is a good opportunity to repost more clear versions of the figures from the analysis, nevertheless. Figure 1 shows the two time series without any adjustments. Figure 2 shows the residuals of the time series relative to the modeled trend lines. I've come to realize that a more intuitive way to think of Figure 2 is as a detrending of the time series from Figure 1. Note that in Figure 2 the residuals of temperature are calculated from a temperature time series that is 10 years ahead of observed values. I've also widened the CO2 Y scale a bit for clarity.

co2 temperature

detrended co2 temperature cross-correlation

I encourage the reader to click on the figures to get familiar with their nuances. Print them if you prefer. I hereby also grant permission to use these images in any way the reader sees fit.

Note that Figure 2 includes linear fits of both detrended time series. The fits are completely flat. This means that the temperature residuals are not associated with the year, and neither are the cumulative CO2 residuals. Any independent property of the year should not associate with either. If the residuals cross-associate, at 99.99999999% confidence, then it's very difficult to argue that we're not looking at an actual effect.

Let me get back to some of the points the commenter raised.

If you wish to prove Anthropogenic Global Warming, you'll need to use temperatures from the whole globe. You cannot simply ignore the entire Southern Hemisphere. And you really should test other temperature data sets using your methodology...


Here the commenter seems to be suggesting that finding an effect of CO2 on Northern Hemisphere (NH) temperatures is not convincing enough. Unless we can show the whole planet is affected, it doesn't really matter if CO2 is warming the NH. Plus we have to show this using all data sets. Amazing.

When I first did the analysis, I didn't know much about all the data sets available. I just wanted to find one that contains as many data points as possible. When it came time to pick a data set, I chose a NH one simply because most CO2 is generated in the NH, and so by choosing this data set theoretically less noise would be introduced in the analysis.

The general temperature trend behavior is similar when you compare the globe with the NH and SH, even though the size of the effect of greenhouse gases varies. This is true of all data sets. If the commenter hopes the analysis won't hold if we look at different temperature data sets, frankly, he's engaging in self-deception.

When you're trying to validate a theory, you have to use measurements of what's ACTUALLY IN THE THEORY. For AGW, this means you have to model the CO2 concentrations in the atmosphere.


Here the commenter is suggesting that cumulative human CO2 emissions are not a good proxy of the CO2 concentrations in the atmosphere. This is not true, as I will elaborate on, but in any case, how does this explain the association found?

As far as I know, data on CO2 atmospheric concentration is only available for the range 1958 to 2004. I don't believe this is enough for this type of analysis considering how noisy the data in question is. Would you find Figure 2 convincing if you could only see a third of the graph? But more importantly, early on I realized that if I wanted to make an argument about anthropogenic global warming, it was key to look at the human contribution of CO2.

I have modeled cumulative CO2 emissions vs. atmospheric concentrations at Mauna Loa, Hawaii. The fit is excellent. For those who are versed in statistics, if I put both data sets in a scatter and do a linear fit, the R² of the fit is 0.9981.

I can get slightly better fits by assuming there's a constant half-life of CO2. To do this I use a simple model where our total atmospheric contribution at any point in time is calculated as follows.

total(year) = (total(year - 1) + emissions(year)) * constant


The constant is what tells us how much of the extra CO2 we've put into the atmosphere remains after 1 year. Of course, we're assuming that naturally produced CO2 is in equilibrium with the environment, which was roughly the case before the industrial revolution.

I've tested different values of constant and compared the resulting R² fit measures of the linear association between total emissions and atmospheric concentrations. The results can be seen in the following graph.

goodness of fit co2 half-life

What this tells us is that the best values of constant are somewhere between 0.99 and 0.9908. These translate to an atmospheric half-life between 69 and 75 years.
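Here is a rough sketch of that sweep; `emissions`, `mauna_loa` and `offset_1958` are placeholders (the offset being the index of 1958 in the emissions series), and the grid of candidate constants is arbitrary.

```python
import numpy as np
from scipy import stats

def accumulate(emissions, constant):
    """total(year) = (total(year - 1) + emissions(year)) * constant"""
    total, out = 0.0, []
    for e in emissions:
        total = (total + e) * constant
        out.append(total)
    return np.array(out)

def fit_r2(constant, emissions, mauna_loa, offset):
    """R-squared of the linear fit between accumulated emissions and the
    Mauna Loa concentrations; 'offset' aligns the two series at 1958."""
    totals = accumulate(emissions, constant)[offset:offset + len(mauna_loa)]
    return stats.linregress(totals, mauna_loa).rvalue ** 2

# constants = np.arange(0.985, 0.9951, 0.0005)
# best = max(constants, key=lambda c: fit_r2(c, emissions, mauna_loa, offset_1958))
# Half-life implied by a retention constant c:  np.log(0.5) / np.log(c)
```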

None of this detracts from the fact that cumulative emissions are an excellent proxy of our contribution to atmospheric concentrations. But in case readers have any doubts, the following is a graph of anthropogenic CO2 contribution where we assume a half-life of 69 years. Please compare and contrast with Figure 1.

co2 cumulative emissions trend 69-year half-life

Evidently, this is all just a distraction from the facts in evidence: An association was found, and data imprecisions cannot explain it away.

Monday, July 7, 2008

Shouldn't It Be Considerably Warmer?

In a prior residual correlation analysis of cumulative CO2 emissions and northern hemisphere temperatures the effect found appeared to be much larger than expected for short-term fluctuations. It was a clear effect, too, in the sense that it was evident graphically. I speculated that cumulative CO2 emissions are probably not a good reflection of actual atmospheric concentrations because some CO2 probably does get removed from the atmosphere after some time.

That finding piqued my interest, though. In the original analysis, I basically assumed the half-life of CO2 was 'infinite'. We were only interested in fluctuations from the general trend, so the assumption was sufficient to prove a point then.

I subsequently went ahead and calculated human CO2 contribution assuming a constant atmospheric half-life of 50 years. (A constant half-life doesn't match up with the numbers very well, but we'll set this aside for the time being). Going from a half-life of 'infinite' to a half-life of 50 years, I expected to see a decreased effect.

Instead, the effect was about the same, using the best fluctuation lag I had previously found: 8 years. The slope was 3.181×10⁻⁵ ± 9.927×10⁻⁶. By matching up with atmospheric concentration data sampled at Mauna Loa, Hawaii, this translates to 0.081 (± 0.025) degrees (C) for every 1 ppmv increase in CO2 concentration. (I've done the analysis in other ways which I'm not going to go into, and I'm confident this is about right).

Keeping in mind that this was a northern hemisphere temperature analysis, the effect is still huge. Assuming the relationship is linear, it would mean that a fluctuation of 100 ppmv should result in a temperature fluctuation of about 8 degrees (C). At this point I started to think about where the error might be. Of course, there are subtleties involved in how such a result should be interpreted, and I'll get to that, but I kept coming back to a graph I had previously seen.



In this graph we see that, historically, a fluctuation of 100 ppmv CO2 corresponds to a fluctuation of 8 to 10 degrees (C). I realize there are feedbacks involved, but this is interesting nevertheless.

Could it be that at current CO2 levels the expected temperature anomaly should be 5 or 10 degrees, as opposed to 1 degree? Let's consider the finding that a fluctuation of 1 ppmv should result in a temperature increase of about 0.05 degrees globally. In the analysis, 8 years were enough for this temperature increase to be realized for such a small fluctuation. Let's round that to 10. Temperature cannot increase with arbitrary speed I suppose. If it takes 10 years for a 0.05 degree increase, could it be that it takes 1,000 years for an expected 5 degree increase to materialize?

No, I don't think so. The rate of temperature increase cannot be constant or bounded by such a low value. If it were, we would not be able to detect short-term CO2 increase effects. Temperature would already be slowly working its way up towards a target, and small greenhouse gas fluctuations would not have an effect on the rate of increase. So instead of 1,000 years, we could be talking about hundreds or less.

What's going on with the data is not very intuitive, so I came up with an analogy that I believe is helpful. Imagine the planet is a car and its temperature is the speed of the car. Pumping CO2 into the atmosphere would be analogous to pressing the gas pedal. When you press the gas pedal, there will be an immediate effect: the speed of the car (temperature) will begin to increase, but it will take some time until it reaches a stable speed. The more you press the gas pedal, the faster the speed increases, but the target stable speed is farther ahead.

This suggests we've been looking at the results of the fluctuation analysis all wrong. It tells us not about the effects of CO2 concentrations on temperature, but about its effects on temperature increase. This is an important distinction. In the end, what we're seeing in the analysis is that for every 1 ppmv fluctuation, there's a fluctuation of about 0.008 degrees per year in the rate of increase of temperature (maybe 0.005 globally). But once again, this relationship cannot possibly be linear. It all gets fairly complicated from this point forward.

I presume climate models take this into account, either implicitly or explicitly. But I've never heard it explained this way. It is mistaken to suppose that current CO2 levels are what drive current temperature levels; they actually drive the rate of increase of temperature up to a target temperature that is probably very far off yet. I'm no climate scientist, but this seems quite obvious in retrospect.

If my intuition is correct, some additional questions come to mind.

  • If CO2 were to level off at current levels, would temperature continue to increase? For how long? Up to what point?
  • Does this all mean CO2 levels should be brought down to at most 300 ppmv for species on this planet to be able to survive long term?
  • Should we expect an acceleration of the rate of increase of temperature? Is there a limit to how fast it can increase?

Saturday, July 5, 2008

"There is a much better correlation between sun activity and temperature"

Shortly after I wrote my first post on global warming, a commenter noted that "there is a much better correlation between sun activity and temperature." I've read other blog discussions on the topic, and this seems to come up from time to time.

So I decided to put the data in scatters to see if there's any merit to this claim. I'm not going to standardize the data in any way. These will be straight plots of existing data.

First, let's look at a scatter (Figure 1) of atmospheric CO2 concentration vs. global temperature anomalies 8 years later from 1959 to 1999 (corresponding to 1967 to 2007 for temperature).



Why 8 years later? This is the best lag I found in my initial analysis of CO2 emissions vs. temperature anomalies. Even without this lag, you will find a similar association. The 8 year lag is probably an underestimate when we're talking about long-term increases in CO2. That was a lag applicable to a fluctuating trend. (And yes, this is bad news).

Finally, let's look at a scatter (Figure 2) of sunspot number vs. global temperature anomaly, between 1881 and 2007.



Is that what they call a "much better correlation"?

Saturday, June 28, 2008

Hurricanes and Temperature are Indeed Associated

There is apparently considerable climate science that can be cited to show there's a clear association between global warming and either the number of hurricanes in any given season or their intensity. See, for example, Hurricanes and Global Warming - Is There a Connection?, written by a number of climate scientists who run RealClimate.org. There is both basic science and computer modeling that can be used to predict what should occur under certain warming scenarios.

I'm generally inclined to trust scientific consensus and published science, particularly if it's peer-reviewed, unless I can advance a seriously strong argument explaining why I do not. Nevertheless, there's nothing like analyzing data first hand. Because I understand this, and because I understand some people out there don't trust some published science at all under the pretext of "conflicts of interest," I've acquired the habit of writing posts where I walk the reader through very accessible analyses of publicly available data. I combine this with a very lenient comment policy. My pledge is to only remove comments that clearly violate Blogger's content policy.

I already did this type of analysis in my post titled Anthropogenic Global Warming is Absolutely Occurring. This time I will look into the claim that global warming might have had an effect on the number of named storms in the Atlantic Basin, given that some people appear to doubt this claim. In doing so, I will try to go over additional details of the methodology that I might have left out of my previous post.

I will use data on the number of named storms from 1851 to 2006 provided by NOAA. I will use ocean surface temperature data for the northern hemisphere provided by the Climatic Research Unit of the University of East Anglia. For accuracy, since we're interested in the hurricane season, I will use June-November averages for each year.

Let's start by putting these two data sets in a chart, side by side. This will be Figure 1, which also shows trend lines for both temperature and storm trends. The trend lines are third-order polynomial fits (easily produced with Excel).

[Figure 1: named storms and NH sea surface temperature, 1851-2006, with third-order polynomial trend lines]

The reader will note that both trends are pointing upward, at least for the last 60 years. This is not what we are interested in, however. We want to control for the fact that there could be a coincidence of upward trends. That's where the third-order polynomial fits come in.

The polynomial fits provide us with a time-based model of each trend. For any given year they tell us what the "expected" temperature and number of storms should be. Of course, a given year might have more or fewer storms than expected. It will also have a higher or lower temperature than expected. In the end, what we want to find out is whether years with higher temperature than expected tend to have more storms than expected, and vice versa.

By subtracting trend-line equation values from observed values, residuals of temperature and storms can be produced for each year. These residuals represent how different from "expected" an observed value is in a given year. Residuals are generally time-independent: in our case, if you produce a scatter chart of year vs. temperature residual or storm residual, you will see the scatter trend is entirely flat. This is a basic check worth doing after computing the residuals.
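For readers who prefer code to spreadsheet formulas, here is a minimal sketch of the residual step, assuming you've loaded the storm counts and June-November SST anomalies into arrays; the array names are placeholders.

```python
import numpy as np

years = np.arange(1851, 2007)        # 1851-2006, matching the NOAA storm counts
# storms = np.array([...])           # named storms per season, from NOAA
# sst = np.array([...])              # June-November NH SST anomaly, from CRU

def detrend(y, t=years, degree=3):
    """Residuals from a third-order polynomial fit of y against time."""
    coeffs = np.polyfit(t, y, degree)
    return y - np.polyval(coeffs, t)

# storm_resid = detrend(storms)
# sst_resid = detrend(sst)
# Check: regressing either residual series on the year should give a slope of
# essentially zero -- the "entirely flat" scatter described above.
```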

Figure 2 is a scatter chart of temperature residuals vs. storm residuals. The trend of this scatter should be flat, unless there's association between temperature and number of storms.

[Figure 2: scatter of temperature residuals vs. storm residuals, with linear trend]

What we see in Figure 2 is that if we fit a linear trend to the scatter, we do get a positive slope of 3.43 (storms per degree C). Now, we need to verify that we can state, with statistical confidence, that the slope is actually positive. In this case we can: the 95% confidence interval of the slope is 0.25 to 6.61. This is not a slam-dunk finding like the one for the correlation between cumulative CO2 emissions and temperature, but it is statistically significant, which means an association between temperature and number of storms is demonstrated.
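The post does this in Excel; an equivalent sketch with scipy is shown below, assuming the sst_resid and storm_resid arrays from the previous step.

```python
from scipy import stats

def slope_with_ci(x, y, confidence=0.95):
    """Least-squares slope of y on x, with a t-based confidence interval."""
    res = stats.linregress(x, y)
    tcrit = stats.t.ppf(0.5 + confidence / 2, df=len(x) - 2)
    half = tcrit * res.stderr
    return res.slope, (res.slope - half, res.slope + half)

# slope, (lo, hi) = slope_with_ci(sst_resid, storm_resid)
# Reported above: a slope of about 3.43 with a 95% interval of 0.25 to 6.61.
```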

Given the methodology used, this result cannot be explained as a coincidental trend.

There are some peculiarities in the data which are interesting. For example, it is clear that the 2005 Atlantic season was an unusual one, even after controlling for the time trend of named storms. It could be placed in a group of seasons that only occur every 50 years or so. Evidently, the fact that the seasons that came after 2005 did not measure up is inconsequential to the finding that temperature is associated with the number of named storms.

We can, however, pose the following question: what sort of temperature increase would be required for the average season to be like the 2005 season? Given the slope of the scatter in Figure 2 (essentially, the 2005 storm residual divided by that slope), it would seem that a temperature anomaly of 4.05 degrees (C) would be required. The current temperature anomaly is about 0.6 degrees (C), so such an eventuality appears to be far off. Or is it?

I ran a second residual correlation analysis of temperature vs. number of named storms one year later. This actually produces a considerably steeper slope (6.36) and the confidence interval is entirely positive even at 99.993% confidence. I can't really explain why this would be the case. But here's the thing. If we were to take this new slope at face value, a temperature anomaly of 2.18 degrees (C) would be enough to make the average season similar to the 2005 season.
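As a quick consistency check on those two estimates: each anomaly, multiplied by its corresponding slope, implies roughly the same surplus of storms over the trend line, which is presumably the 2005 residual being divided through in both cases.

```python
# Both "2005-like season" thresholds correspond to about the same storm surplus.
print(4.05 * 3.43)   # ~13.9 extra storms, using the unlagged slope
print(2.18 * 6.36)   # ~13.9 extra storms, using the one-year-lag slope
```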

Anthropogenic Global Warming is Absolutely Occurring

[Originally posted at Natural Variation.]

I need to ask for the reader's indulgence, as this post is not about autism, except insofar as determining the merit of correlations has become a perseveration of mine. You see, it is trivial to come up with naive correlations of autism trends vs. practically anything about the modern world. The administrative prevalence of autism has been increasing almost continuously since records have been kept. Concurrent upward trends in nearly anything, from vaccines to environmental pollution, from trans fats to electromagnetic radiation, and so on, are easy to come by.

In my latest post at LB/RB I suggested that instead of correlating trends in a naive manner, we could attempt to correlate the residuals of time regression models of each trend. A residual is a delta or difference between an observed value and a modeled value. (Here's a concise explanation).

When modeling real-world phenomena, regression models will never (or almost never) be perfect fits. For all sorts of reasons, even if simply random fluctuation, there will be deviations from a modeled trend. If there's a causative relationship between two trends, the residuals of (or deviations from) corresponding close-fitting regression models should correlate with one another as well. By this I don't mean that the residuals should always be in the same direction, but they should be in the same direction more often than not, on average.

The nice thing about this technique is that it is completely accessible to anyone with Excel installed. It can also be illustrated graphically, as the reader will see.
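For readers who'd rather script it than use Excel, the whole technique fits in a few lines; this is a generic sketch, with the time axis, the two series, and the polynomial degree left as parameters.

```python
import numpy as np
from scipy import stats

def detrend(t, y, degree=3):
    """Residuals of y after removing a polynomial time trend."""
    return y - np.polyval(np.polyfit(t, y, degree), t)

def residual_correlation(t, series_a, series_b, degree=3):
    """Regress the residuals of one series on the residuals of the other."""
    ra = detrend(t, series_a, degree)
    rb = detrend(t, series_b, degree)
    return stats.linregress(ra, rb)   # slope, r, p-value, stderr of the residual scatter
```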

So it occurred to me to test this idea in a different field of science where there's controversy over correlation vs. causation. I thought global warming would be a great candidate. After all, the spoof about a decrease in the number of pirates correlating with many other arbitrary trends appears to originate in the global warming debate (see this).

To summarize what I found, there is a strong and statistically significant correlation between cumulative human CO2 emissions and northern hemisphere temperature anomalies. Because of the methodology used, I'm quite confident this cannot be explained by coincidence, data collection errors, solar output as a confound, or causation in the opposite direction.

Now, I fully recognize that I'm only superficially familiar with the debate over anthropogenic global warming. I am also not versed in climatology. Therefore, I cannot be entirely sure that this type of analysis hasn't been done before. Google and Google Scholar searches didn't seem to turn up anything, and given the importance of the topic, I thought it was not only prudent but necessary to put this evidence out there. As always, scrutiny and discussion are welcome.

Northern hemisphere temperature data from 1850 to 2004 was obtained from the Climatic Research Unit of the University of East Anglia, UK.

Global CO2 emission data was obtained from CDIAC. I did not use CO2 atmospheric concentration data because temperature increases can theoretically cause this concentration to increase. Human emissions are what we're interested in. More specifically, I calculated cumulative CO2 emissions for every year since 1850. Greenhouse temperature anomalies are presumably caused by the total amount of CO2 in the atmosphere, not by the emissions in any given year. Since CO2 stays in the atmosphere for 50 to 200 years (source), modeling the cumulative human contribution of CO2 should be adequate.
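The cumulative series is just a running sum of the annual emissions; a sketch follows, where the file and column names are placeholders for the CDIAC download.

```python
import pandas as pd

# Placeholder file/column names; substitute your copy of the CDIAC data.
emissions = pd.read_csv("global_co2_emissions.csv", index_col="year")["emissions"]
cumulative = emissions.loc[1850:2004].cumsum()   # total emitted through each year
```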

Figure 1 (click to enlarge) is a graph of the general time trends of these two sets of data. It also shows the modeled trend lines we will use to calculate residuals. In this analysis we're using third-order polynomial models. They seem to give a considerably closer fit than second-order polynomial models.

[Figure 1: cumulative CO2 emissions and NH temperature anomalies, 1850-2004, with third-order polynomial trend lines]

I calculated the residuals and built a scatter graph matching cumulative CO2 (X axis) and temperature (Y axis) residuals for each year from 1850 to 2004. As expected, the slope of a linear regression of the scatter was positive (1.9×10⁻⁵) and statistically significant (95% confidence interval 1.13×10⁻⁵ to 2.66×10⁻⁵).

[Note: Instructions on how to calculate the slope confidence interval of a linear regression with Excel can be found here.]

I suspected, however, that there should be a lag between cumulative CO2 fluctuations and temperature fluctuations. It presumably takes some time for heat to be trapped. I proceeded to create a moving-average trend line of the temperature residuals. It did in fact have a similar shape to the cumulative CO2 residuals graph, but it appeared to lag it by about 10 years. The reader should be able to roughly see this lag in Figure 1.

So I re-ran the whole analysis by only considering the years 1850 to 1997 and correlating CO2 residuals with residuals of temperature 10 years later. The correlation between these two sets of data is remarkable. Let's start with a bar graph of both sets of residuals, Figure 2.

[Figure 2: bar graph of cumulative CO2 residuals and temperature residuals 10 years later]

Figure 2 is a good graph to get a subjective sense of the correlation. Let's see if the math confirms this. Figure 3 is the scatter graph of the residuals.

[Figure 3: scatter of cumulative CO2 residuals vs. temperature residuals 10 years later]

The slope of a linear regression of the scatter is 2.6×10⁻⁵, and it is statistically significant (95% confidence interval 1.88×10⁻⁵ to 3.33×10⁻⁵). Even the 99.99999999% confidence interval is entirely positive. Unless anthropogenic global warming is a reality, there is no apparent reason why the residuals of cumulative human CO2 emissions should correlate so well with the residuals of temperature 10 years later throughout the last 150 years.
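A sketch of how this lagged regression can be reproduced, assuming year-indexed pandas Series co2_resid and temp_resid built with the detrending step sketched earlier (the names are placeholders, and a consecutive yearly index is assumed):

```python
import pandas as pd
from scipy import stats

lag = 10
# Shift temperature residuals back by `lag` positions so each year's cumulative
# CO2 residual lines up with the temperature residual `lag` years later, then
# drop the unmatched ends.
paired = pd.concat([co2_resid, temp_resid.shift(-lag)], axis=1, keys=["co2", "temp"]).dropna()
result = stats.linregress(paired["co2"], paired["temp"])
print(result.slope)   # reported above as about 2.6e-5
```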

The slope of the scatter is actually steeper than expected, if you consider the naive correlation between cumulative CO2 emissions and temperature. There are probably several reasons for this. The one I believe to be most likely is that over time CO2 does get removed from the atmosphere. Adding this consideration to the analysis should produce a more accurate slope. The other potential reasons don't bode so well for our species.

[Update 2/22/2010: I have written a follow-up titled Statistical Proof of Anthropogenic Global Warming v2.0.]

Hello World

I will use this blog to write about topics unrelated to autism. This was prompted by a post on Global Warming that I wrote on my primary blog, Natural Variation. I have at least a few more such posts planned.