A rant about "apples and oranges" in meteorological papers
Posted: 10 April 2006. Updated: 27 April 2006: some minor revisions and fixed typos
As usual, this is my opinion - comments can be sent to me at firstname.lastname@example.org.
This essay is motivated by an accumulating sense of frustration with what I see as a basic sloppiness in the scientific literature of meteorology. Perhaps this occurs in other fields as well. The mistakes I'm going to discuss are effective "show stoppers" when considering the validity of the analysis done in a scientific paper. They are fundamental flaws that reveal a weak foundation in any scientific argument. When I encounter these errors in a paper or proposal I'm reviewing, it's basically grounds for immediate rejection. Such mistakes are perhaps excusable for an undergraduate meteorology major. They are absolutely unacceptable for anyone who wishes to call him/herself a scientist.
Imagine that an author is proposing to do a comparison of his/her work with that of another (already published) paper. Does it not make logical sense that if a calculation is being made by the author, it should be identical to that done by the author(s) of the work to which the results will be compared? Suppose one is calculating, say, some diagnostic variable based on observations. The method used in the paper being referenced should be given with enough detail that it's obvious precisely how to duplicate those calculations. And the new calculations should then follow that very same procedure, without any departure, if a comparison is to be made.
There are many sources for possible errors of the sort to which I'm referring. The observational data might differ in some way. For example, there are many different sources for rawinsonde data, and they might differ in various ways - perhaps one data set employed methods that find values of wind at mandatory and significant pressure levels by interpolating the winds to those levels in one way, whereas another data set uses a different interpolation scheme. The interpolated values would therefore likely be similar but not precisely the same between the two data sets. If an averaging method is employed, the averaging schemes might differ between data sets. A slightly different scheme could be used to compute something like mixing ratio from the raw temperature, relative humidity, and pressure values. Any algorithmic difference, however minor, can become critical. These are details that simply must be reconciled if a proper comparison is to be made. The authors of scientific papers are obligated to provide all of these apparently minor details if other scientists are to be able to replicate their results and do proper comparisons. And authors purporting to do comparisons are similarly obligated to make calculations using the identical procedures and data.
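To make the mixing ratio example concrete, here is a minimal sketch of how two reasonable-looking schemes give slightly different answers from identical raw observations. The two saturation vapor pressure formulas (a Bolton 1980 form and a Magnus form with Alduchov and Eskridge 1996 coefficients), the function names, and the sample observation are all my choices for illustration; they are not taken from any particular data set or paper discussed here.

```python
import math

EPSILON = 0.622  # approximate ratio of molecular weights, water vapor to dry air


def es_bolton(t_c):
    """Saturation vapor pressure (hPa) for temperature in deg C, Bolton (1980) form."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))


def es_magnus_ae(t_c):
    """Saturation vapor pressure (hPa), Magnus form with Alduchov-Eskridge (1996) coefficients."""
    return 6.1094 * math.exp(17.625 * t_c / (t_c + 243.04))


def mixing_ratio_g_per_kg(t_c, rh_percent, p_hpa, es_func):
    """Mixing ratio (g/kg) from temperature, relative humidity, and pressure."""
    e = rh_percent / 100.0 * es_func(t_c)  # actual vapor pressure (hPa)
    return 1000.0 * EPSILON * e / (p_hpa - e)


# Hypothetical observation: 25 deg C, 70% relative humidity, 1000 hPa.
w_a = mixing_ratio_g_per_kg(25.0, 70.0, 1000.0, es_bolton)
w_b = mixing_ratio_g_per_kg(25.0, 70.0, 1000.0, es_magnus_ae)
print(f"scheme A: {w_a:.3f} g/kg, scheme B: {w_b:.3f} g/kg, difference: {w_a - w_b:.3f} g/kg")
```

The difference is small for a single observation, but it is systematic, and it is exactly the kind of detail that has to be stated in a paper if someone else is to reproduce the calculation.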
Unfortunately, I find it common for these details not to be provided by manuscript authors, and not to be a matter of concern for those doing new, related research. Hence, it is common for authors making calculations using ostensibly similar data sets to produce incompatible results, and neither side is able to come to a reconciliation of such conflicts. Although the differences are often minor, they sometimes are such that diametrically opposed interpretations result. Effectively, this situation is a manifestation of the old cliché about comparing apples and oranges. You just can't make useful comparisons in such situations. The devil is in the details ...
It's continually annoying to find that many purporting to do science seem to have no regard for the limitations of data sets. In this day of an apparent obsession with climate change, it's of fundamental importance to use observational data in an intelligent and logical way. Many data sets have acute problems regarding their long-term stationarity. Three compelling examples with which I'm familiar are the rawinsonde data (Schwartz and Doswell 1991), the tornado reports database (Brooks et al. 2003) and the nontornadic severe reports database (Doswell et al. 2005). I'm also aware that the database on tropical cyclones has some important stationarity issues.
a. Secular changes
When the methodology for gathering data changes, this usually introduces a secular change in the data. As documented in the references, the data sets with which I'm most familiar are rife with such inhomogeneities. The tornado and nontornadic severe thunderstorm reports exemplify this sort of problem, for a host of reasons. If you consider how reports of these events are being tallied today, and then consider how they were done in, say, the 1950s, trying to compare events in the 1950s with those in the 21st century is simply absurd (see Speheger et al. 2002). To say nothing about the vast differences in reporting between the beginning of the 21st century and the beginning of the 20th century, or the 19th century. These secular trends are overwhelming in the case of nontornadic severe thunderstorm reports.
Even something as large as tropical cyclones involves many challenges - for example, tropical cyclones at sea in the era before geostationary satellites were simply not detected until they made landfall or a ship encountered them. In today's world, there are frequent data-gathering flights by research aircraft using dropsondes and other technology to monitor such things as tropical cyclone windspeeds on a frequent basis. Even as recently as 10 years ago, this sort of intensive monitoring simply was not done. The future might hold even more technological advances in observing capability. Hence, not only the number, but the intensity of tropical cyclones was observed very differently in the past than it is now, or will be in the future.
To attempt to use the existing records of weather events to determine if climate change is affecting such things as the frequency and intensity of severe storms is just not possible. This has been discussed by Chris Landsea in the context of tropical cyclones - he's basically saying that the observations simply do not support any conclusions regarding the possible influence of climate change on tropical cyclone frequency and/or intensity. I tend to agree with his concerns. However, I want to be very clear about my position - which does not attempt to address tropical cyclones at all (not my area of expertise) but rather is focused on severe thunderstorms and tornadoes - the observational data cannot be used to support any statement about the possible effects of global warming on the frequency or intensity of such events. There may or may not be some relationship between climate change and severe thunderstorms and tornadoes, but the data just do not permit any scientifically and statistically valid conclusions.
There are frequent attempts by misguided or ignorant authors to relate long-period climatological cycles (such as the El Niño/Southern Oscillation or the North Atlantic Oscillation) to observed severe weather/tornadoes. In my opinion, all such attempts are doomed from the start. The putative causality chains between mesoscale weather and processes governing climatic variability are long, highly nonlinear, and tenuous at best. The observed frequencies and intensities are too perilous a basis for even contemplating such things, however plausible it seems to postulate connections between the general circulation and the actual weather (severe thunderstorms and tornadoes). If someone wants to consider observations that are far more homogeneous over long periods of record - such as temperature or rainfall observations - that might well be justifiable. But it just isn't reasonable to use the existing record of severe thunderstorms and tornadoes to detect connections with climate variability.
If it were possible to develop an observing system for severe thunderstorm and tornado events that is perfect (highly unlikely in the near future), and it could be implemented today, then many decades would have to pass (perhaps more than 100 years) before we could even begin to have some confidence in the results of an attempted test of the statistical association between climate variability and severe thunderstorms/tornadoes. You need a large sample size that would include a number of climatic oscillation cycles to have any statistically valid conclusions to draw. The existing record is so laced with secular trends that it's worthless for this purpose and it's not going to be possible at any time in the near future to develop a record of observed events that could even remotely be considered useful for this process.
b. Sample size
One way to produce a more homogeneous record is to limit attention to the extreme events. As we have shown (see the References, below), the relatively high-end events for severe thunderstorms and tornadoes have been more resistant than the marginal events to secular trends. But "more resistant" doesn't mean "immune" and there are indications that in recent years, a secular growth in the frequency of the most intense events has begun (see Doswell et al. 2005). Therefore, it appears that restricting attention to the extreme events is not going to work. But even more problematic is that such a strategy has the consequence of reducing the sample size. This means that statistical testing is even more likely to be incapable of supporting any hypothesis. There are many clear indications from the existing observational records that we have a very poor sample of the extreme events. If we consider, for example, violent tornadoes, we find that the statistics of violent tornadoes are dominated by a very small number of days. The "super outbreak" of 1974 stands out like a sore thumb in the statistics of violent tornadoes. No day of comparable magnitude is contained within the record of tornadoes. Is that because it truly is the most important tornado outbreak in history, or does it mean that some events in the past were highly underestimated because of the way we obtain the data? The Tri-State tornado of 18 March 1925 is the longest-track tornado in history. Is that because it truly is an extreme event or is it because it really was a series of tornadoes and the apparent continuity of the path is simply an artifact due to its having occurred in 1925 rather than 2005? See Doswell and Burgess (1988) for some discussion of this issue. We have not seen the like of such an event since 1925. This seems to indicate that we have no real idea what the frequency of long-track tornadoes really is - our sample of such events is simply too small to have any confidence in any frequency estimates.
Therefore, anything that reduces the sample size can make it virtually impossible to say anything valid. Tornadoes and severe thunderstorms are relatively rare events at any given place. Even in regions of relatively high frequency, a long time can pass between significant events. The higher the threshold chosen for an "event", the longer the average interval between such events. Increasing the threshold simply reduces the sample size. It's inescapable that the existing data do not permit any conclusive interpretations of the observations.
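The trade-off between threshold and sample size can be sketched with a toy simulation. Everything in it is an assumption made purely for illustration: the exponential distribution of event "magnitudes", the 50-year record, the 20 events per year, and the thresholds have nothing to do with any real tornado or severe thunderstorm record.

```python
import random


def exceedance_summary(magnitudes, years, thresholds):
    """For each threshold, return (threshold, event count, crude mean interval in years).

    The interval is simply record length divided by event count, or None when
    no event in the record reaches the threshold.
    """
    rows = []
    for thr in thresholds:
        n = sum(1 for m in magnitudes if m >= thr)
        interval = years / n if n else None
        rows.append((thr, n, interval))
    return rows


rng = random.Random(0)  # fixed seed so the toy record is repeatable
YEARS = 50              # hypothetical record length
EVENTS_PER_YEAR = 20    # hypothetical reporting rate

# Synthetic event magnitudes: many weak events, very few extreme ones.
magnitudes = [rng.expovariate(1.0) for _ in range(YEARS * EVENTS_PER_YEAR)]

summary = exceedance_summary(magnitudes, YEARS, [0.0, 1.0, 2.0, 4.0, 6.0])
for thr, n, interval in summary:
    label = "no events" if interval is None else f"{interval:.2f} yr between events"
    print(f"threshold {thr:>4.1f}: {n:5d} events, {label}")
```

Each step up in threshold leaves fewer events to work with, so any estimate of how often the most extreme events occur rests on a handful of cases, or on none at all.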
See here for a discussion related to sample size in the tornado data base.
Comparing observations from the past with current (and future) observations can be a case of "apples versus oranges." Please, don't waste my time with this pointless speculation. I just don't believe the observations can be used to infer the impacts of climate change on the frequency and/or intensity of severe thunderstorms and tornadoes - barring some methodological breakthrough I don't currently foresee. There might be valid conclusions one could draw about other observed variables that are empirically connected to severe convective storms, such as the location and strength of jet streams, or low-level moisture patterns - but it's a large leap from these "proxy" variables to actual severe convective storm events.
It's scientifically and logically essential to draw comparisons from comparable information. Anything less is fundamentally flawed and is a waste of everyone's time and resources. I don't believe that inappropriate comparisons should be as common as I find them to be. Yes, it's extra effort to be careful, but in scientific papers, extra effort should be the norm, not an exception.
(all of these are available here)
Brooks, H.E., C.A. Doswell III, and M.P. Kay, 2003: Climatological estimates of local daily tornado probability for the United States. Wea. Forecasting, 18, 626-640.
Doswell, C.A. III, and D.W. Burgess, 1988: On some issues of United States tornado climatology. Mon. Wea. Rev., 116, 495-501.
Doswell, C.A. III, H.E. Brooks, and M. Kay, 2005: Climatological distributions of daily local nontornadic severe thunderstorm probability in the United States. Wea. Forecasting, 20, 577-595.
Schwartz, B.E. and C.A. Doswell III, 1991: North American rawinsonde observations: Problems, concerns, and a call to action. Bull. Amer. Meteor. Soc., 72, 1885-1896.
Speheger, D.A., C.A. Doswell III, and G.J. Stumpf, 2002: The tornadoes of 3 May 1999: Event verification in central Oklahoma and related issues. Wea. Forecasting, 17, 362-381.