Posted: 01 November 2008 Updated: 25 November 2008: some minor rewording and added a missing reference
This essay is intended as a response to the articles by de Elía, R., and R. Laprise (2003, 2005), who apparently were unaware of the widespread use of subjective probability in weather forecasting - it has not been reviewed and so (as usual) represents only my opinion. My experiences with the BAMS editor suggest to me that submitting this for formal publication will not be a productive exercise - hence, its appearance here.
Comments can be sent to me at cdoswell # earthlink.net (either use the email hyperlink or cut and paste the address into your emailer after replacing ' # ' with '@'). If you're not willing to see your comments posted here (or to give me a reasonable explanation for why not), then don't waste my time and yours.
Anyone who has attempted to forecast the weather surely has found that weather forecasting involves uncertainty. There are many sources for this uncertainty, with two distinctly different origins: (1) uncertainty about the initial state of the atmosphere and (2) uncertainty about the processes that govern the evolution from the initial state to the time for which the forecast is valid - that is, model uncertainty. Lorenz (1963) was the first to demonstrate that uncertainty in the initial conditions, even when extremely small, can result in a complete loss of predictability at some point in the forecast whenever the equations governing the evolution of the atmospheric state are nonlinear, even if those equations are known perfectly. We're unable to observe the state of the atmosphere without error, and the limited resolution of our observations means that two seemingly identical states might include unsampled variability, so the match is not truly perfect. Since the science of the atmosphere (and its mathematical models) also is likely to remain less than perfectly understood indefinitely, it's been established firmly that forecasts involve uncertainty, and will continue to do so for the indefinite future despite advances both in observational capability and in our collective understanding of the atmosphere.
Another consequence of Lorenz's work is that the rate of divergence between two similar initial states varies as a function of that initial state. In some situations, forecast confidence decays rapidly, while in other situations, forecast confidence remains high for a relatively long time. Again, anyone forecasting the weather has likely experienced this situational dependence, whereby some weather patterns have high confidence and others have low confidence associated with them. It's no coincidence that the study of nonlinear dynamics was pioneered by a meteorologist (the late Edward Lorenz).
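Lorenz's sensitivity to initial conditions is easy to demonstrate numerically. The following sketch is my own toy illustration (forward-Euler integration of the Lorenz 1963 equations with a small step, adequate for demonstration but not research-grade): two initial states differing by one part in a million are evolved in parallel, and their separation grows by orders of magnitude.

```python
# Two nearly identical initial states evolved under the Lorenz (1963)
# equations.  Toy demonstration only: forward-Euler with a small step.
def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance (x, y, z) one forward-Euler step of the Lorenz-63 system."""
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def separation(s1, s2):
    """Euclidean distance between two states."""
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-6, 1.0, 1.0)           # differs by one part in a million
initial = separation(a, b)
for _ in range(4000):                # about 20 model time units
    a, b = lorenz_step(a), lorenz_step(b)
print(initial, separation(a, b))     # the tiny difference has exploded
```

The separation eventually saturates at roughly the size of the attractor itself, which is precisely the "complete loss of predictability" described above.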
Predictability of the weather is clearly a function of the scale of atmospheric processes. In general, as the spatial scale increases, the time scale of predictability increases – forecasting the location of a synoptic-scale front 12 h ahead is much easier than forecasting details about a thunderstorm along that front even 1 h ahead. When considering examples of specific types of phenomena, the variability of the atmosphere means that, for example, one front is virtually never identical to another, similar front. Therefore, the predictability of a given type of event can vary from one example to another.
Does the atmosphere ever truly repeat itself? Our current understanding of the atmosphere, as described in the deterministic equations that encapsulate our knowledge, indicates that if the state of the entire atmosphere ever came back to some state it had occupied in the past, it would be perfectly periodic and would repeat itself endlessly. This abstract notion was discussed by Lorenz (1967, Ch. 1) – it seems highly unlikely that the atmosphere would ever find itself in a state that exactly repeated a previous state in all detail down to tiny gusts of wind, despite our inability to rule out such a possibility by logic alone. The governing equations include periodic solutions but our experience strongly suggests that the weather never repeats exactly. Lorenz (1967, p. 6) observes that "… non-linearity does not assure [emphasis added] non-periodicity" but despite producing weather systems that at least appear to be similar to those of the past, the evolution of those systems eventually diverges from those previous examples. Given that processes in the atmosphere are coupled to the underlying surface (which changes on geological time scales), including the oceans (which have comparably nonlinear governing equations and observational uncertainty), as well as to inputs from outside the atmosphere (which change on astronomical time scales), if the exceedingly unlikely premise of a return to a former atmospheric state should ever occur, the atmospheric evolution could only repeat if these non-atmospheric contributions also were identical.
Analog forecasting based on finding previous atmospheric conditions that are similar to a current state has some prospects for success, provided the atmospheric state associated with the analogs is not within a part of the phase space where slightly different initial states diverge rapidly. Of course, analog forecasting cannot result in perfect forecasts, in part because of our inability to make perfect observations, but primarily because the atmospheric state is virtually certain never to repeat itself precisely. I’ll assume hereafter, for the purposes of this essay, that the atmosphere is not periodic – the state of the atmosphere never repeats.
The point of this rather abstract introduction is to indicate that purely deterministic forecasting can never be absolutely certain, barring some unforeseeable breakthrough. To the extent that the level of uncertainty can be estimated, a forecast that does not include uncertainty information is at best incomplete. This notion has been expressed before in a variety of contexts (e.g., Sanders 1963; Fleming 1971; Ntelekos et al. 2006). Nevertheless, in my experience, many people, including some forecasters, are unwilling to accept that uncertainty is inevitable and an important component of any forecast. There are several common arguments that I hear repeatedly to justify avoiding having uncertainty statements included in weather forecasts. This essay is an attempt to provide a response to these common objections. My goal is to motivate the acceptance of what the science of meteorology (notably, as embodied in the work of Lorenz) has shown to be inevitable – forecasts are uncertain and we do our forecast users a disservice by not providing uncertainty information as a routine part of all weather forecasts.
One major source of concern expressed by many opponents of probabilistic forecasting is that human forecasters can only guess at their uncertainty - this was the substance of the material presented by de Elía and Laprise in their two articles. According to the proponents of this argument, probabilities derived this way are not objective and so are not "true" probabilities. Addressing this objection requires pursuing several threads, so bear with me as I try to respond to this widespread concern.
To begin, I suspect that many forecasters do not realize that it is possible in principle to calculate forecast uncertainty. This involves the so-called Liouville equation (Ehrendorfer 1994), which is an expression of what has been called "stochastic dynamics" by Epstein (1969). Statistics and dynamics can be combined in such a way that the probability density functions (pdfs - to be distinguished from the Adobe "PDF" file format) for state variables are forecast in a way comparable to the way numerical weather prediction (NWP) forecasts the values of the state variables themselves (Thompson 1985). The complexity and daunting computational requirements for accomplishing this goal mean that this concept remains rather far from a practical system for predicting forecast uncertainty. Nevertheless, it provides a theoretical basis for a wholly objective process by which uncertainty information could be generated.
Since stochastic dynamics remains for the moment technically impractical for use in weather forecasting, on what basis can forecast probabilities be derived in an objective fashion? In classic probability theory, the determination of probabilities follows from either a symmetry property of the system being considered or is derived from what de Elía and Laprise (2005 – hereafter DL05) call a "frequentist" interpretation. DL05 state, "There are several schools of thought regarding the interpretation of probabilities, and they intend to capture the meaning of probability in all its complexities." They go on to assert that, "All of them, however, suffer flaws whose severity varies depending on the perspective of the individual users … "
It isn't evident that atmospheric processes involve some symmetry property comparable to the sides of a coin or the faces of a die. Hence, it seems unlikely that symmetry can be exploited to determine probabilities.
A purely frequentist approach to probability traditionally involves determining the outcome of a large number of trials involving identical objects (e.g., the classic statistical textbook notion of drawing colored balls from an urn). Probability in this context is derived as the limit of frequencies determined from sampling trials as the number of trials goes to infinity. If my assumption that the atmosphere never repeats itself is valid, all atmospheric states are unique and different. Therefore, it's logically impossible to conduct even a small sample of trials regarding a particular atmospheric state. Every atmospheric state is a sample of one, and will never happen again. It appears that we cannot use a frequentist approach to develop forecast probabilities, either. At most, we could categorize situations into groupings that involve some degree of similarity and attempt to derive frequencies for events associated with patterns that fit within a single grouping. This latter method has been used in some studies but is subject to question on the basis of the grouping process.
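To make the frequentist limit concrete, here is a small simulation (entirely hypothetical, with the event probability chosen arbitrarily as 0.3): the relative frequency over repeated, identical trials approaches the underlying probability only as the number of trials grows large – exactly the kind of repetition that unique, never-repeating atmospheric states cannot supply.

```python
# Hypothetical illustration of the frequentist limit: the relative
# frequency of an event over many repeated, identical trials converges
# to its probability (chosen arbitrarily here as 0.3).
import random

def relative_frequency(p_true, n_trials, seed=0):
    """Fraction of n_trials independent events, each occurring with
    probability p_true, that actually occur."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return hits / n_trials

for n in (10, 1000, 100_000):
    print(n, relative_frequency(0.3, n))
```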
However, there are ways to develop forecast probabilities that are closely related to a frequentist interpretation. These involve statistical methods such as screening regression (e.g., as in the so-called model output statistics [MOS] methods) or ensemble methods. A detailed treatment of these is well beyond the scope of this essay, but interested readers can consult Glahn and Lowry (1972) or Klein and Glahn (1974) for more information. Briefly, MOS is based on the relationship between a combination of state variables derived from model output for many cases and the observed events. Probabilities are extrapolated from event ("predictand") frequencies in relation to a limited suite of state variables ("predictors") selected from model output for forecasts during some fixed period. A key element is that the set of predictor variables is relatively small, so very similar "states" can be found readily by considering only those variables. A statistically-derived relationship between predictor variables and the predictand can be used to estimate probabilities with a frequentist interpretation. However, the limited set of predictor variables means that it is a necessarily limited description of the atmospheric state. It can be thought of as "squinting" at the atmosphere so that the image is very blurry and then describing that blurry image's characteristics only in terms of what can be "seen" – to the exclusion of aspects not "visible" to the system.
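As a rough sketch of the MOS idea (my own toy, using a one-predictor logistic regression in place of the operational screening-regression procedure), suppose a single model-output predictor, say forecast relative humidity, is statistically related to an observed rain/no-rain predictand. All of the data below are synthetic, under the invented assumption that the true rain probability equals the forecast humidity.

```python
# Toy MOS-flavored sketch: logistic regression linking one model-output
# "predictor" (humidity, 0-1) to a yes/no "predictand" (rain observed).
# Synthetic data; the assumed truth is P(rain) = humidity.
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit w, b of P(event) = sigmoid(w.x + b) by stochastic gradient
    descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def prob(w, b, x):
    """Probability estimate for a new predictor vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

rng = random.Random(1)
X = [[rng.random()] for _ in range(500)]            # predictor: humidity
y = [1 if rng.random() < xi[0] else 0 for xi in X]  # predictand: rain?
w, b = fit_logistic(X, y)
print(round(prob(w, b, [0.9]), 2), round(prob(w, b, [0.1]), 2))
```

The fitted relationship assigns a high rain probability to moist model atmospheres and a low one to dry model atmospheres, with a frequentist interpretation inherited from the training sample.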
Ensemble probabilities are determined by considering explicitly the uncertainty in the initial conditions and/or the governing model equations. Each ensemble member can be viewed as a possible realization of the state of the atmosphere at the valid time of the forecast. The ensemble then accounts for the uncertainties in initial conditions and/or model dynamics by revealing the variability in the forecasts within the set of realizations (i.e., the ensemble). Ensemble prediction is a sort of "poor man’s stochastic dynamics" (Thompson 1985) that can be and is being implemented as a practical forecast system with current technology. The pdfs of the forecast variables could be determined by something akin to Monte Carlo methods, rather than being forecast directly.
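A minimal sketch of how ensemble probabilities arise (the member values below are invented): each member is treated as an equally likely realization, so the event probability is simply the fraction of members in which the event occurs.

```python
# Sketch with invented member values: the event probability is the
# fraction of (equally weighted) ensemble members showing the event.
def ensemble_probability(members, event):
    """Fraction of ensemble members for which event(member) is True."""
    return sum(1 for m in members if event(m)) / len(members)

# Ten hypothetical ensemble forecasts of 24-h precipitation (mm):
precip = [0.0, 0.2, 1.5, 3.0, 0.0, 7.2, 0.1, 4.4, 0.0, 2.8]
p_measurable = ensemble_probability(precip, lambda mm: mm >= 0.25)
print(p_measurable)   # 5 of the 10 members reach 0.25 mm, so 0.5
```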
Finally, forecast uncertainty information can be provided by what are called subjective probabilities (Murphy and Winkler 1971). Apparently, de Elía and Laprise were unaware of the long successful history of the use of subjective probabilities in weather forecasting. In DL05, it is said that
A distinctive characteristic of this interpretation of probability is that different persons may have conflicting beliefs about the likelihood of occurrence of an event, and all are equally valid in principle (although we will tend to believe whoever has the best record with respect to previous forecasts). It is also highly possible, as is the case with other subjective appreciations, that even the experts would be somewhat affected by nonrational behavior … and cognitive illusions … . The main advantage of this interpretation is the fact that it allows predictions of situations that have no previous record.
This represents a succinct statement of the main objection to probabilistic forecasting that is the topic of this section. Give 100 forecasters the same information and you might obtain 100 different probability estimates. According to this sort of interpretation, subjective probability is little more than guessing; it's seen as nothing more substantial than the opinion of the individual forecaster, and not a valid probability value. Murphy and Winkler (1971) have offered a cogent set of arguments indicating this is an inappropriate view of subjective probability, at least as used by operational weather forecasters.
Moreover, such a conclusion seems to be based on a very negative view of human weather forecasters. If 100 forecasters would arrive at 100 different probability estimates for some event in a given forecast situation, how different would those estimates be? What would that distribution of those probability forecasts look like? Would they run the gamut from zero to 100%? Would they be evenly distributed across some range or would they have one or more peaks within that range? What would the relationship be between those subjective probability estimates and the observed outcomes?
It turns out that I'm not forced to speculate about the quality of subjective forecast probability estimates. Since the inception of probability of precipitation (PoP) forecasting in 1965, National Weather Service (NWS) forecasters have been producing PoP forecasts routinely, so a basis for answering some of these questions regarding subjective probability estimation already exists. On average, it appears that NWS PoP forecasts are much more than mere guesses – they exhibit many characteristics of a good forecast (as defined in Murphy 1993), including the specific property called reliability – the observed frequency of precipitation corresponds rather well to the forecast probabilities (Murphy 1993). It even has been shown (e.g., Murphy and Winkler 1982) that subjective forecast probability estimates of rare events, such as tornadoes and severe thunderstorms, are reliable.
Provided subjective probability estimates are used in accordance with the laws of probability, they are valid probabilities. Or, perhaps more correctly, they are valid probability estimates. How accurate those estimates are is an issue that can be resolved by verification. The key to successful subjective probability estimation is that forecasters need some experience with doing it and must be given feedback about how well they are doing. This is referred to as "calibration" of the forecasts (see Doswell 2004). When forecasters are properly calibrated, individual differences remain, but the probability estimates produced by calibrated forecasters will not be nearly as diverse as those who deny their value in estimating uncertainty suggest. Some variation of the uncertainty estimates from one forecaster to the next is inevitable, but subjective probabilities are inherently part of a human forecaster's cognitive processes, whether forecasters recognize it or not. Experienced, calibrated forecasters are good enough at estimating their uncertainty that this common objection is simply not a valid basis for rejecting probabilistic forecasting. Information useful to users about forecast uncertainty is contained in subjective probabilistic forecasts because they have been calibrated via verification.
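The calibration check described above can be sketched in a few lines (the PoP forecasts and rain observations below are invented for illustration): bin the forecast probabilities and compare each bin's mean forecast with its observed event frequency. For a reliable forecaster the two track each other closely.

```python
# Reliability sketch with invented forecast/observation pairs: compare
# each probability bin's mean forecast to its observed frequency.
from collections import defaultdict

def reliability_table(forecasts, outcomes, bin_width=0.2):
    """Return {bin index: (mean forecast, observed frequency, count)}."""
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        key = min(int(p / bin_width), int(round(1 / bin_width)) - 1)
        bins[key].append((p, o))
    return {k: (sum(p for p, _ in v) / len(v),
                sum(o for _, o in v) / len(v),
                len(v))
            for k, v in sorted(bins.items())}

pops = [0.1, 0.1, 0.3, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9, 0.9]
rain = [0,   0,   0,   1,   0,   1,   0,   1,   1,   1]
for mean_p, freq, n in reliability_table(pops, rain).values():
    print(f"forecast {mean_p:.2f}  observed {freq:.2f}  (n={n})")
```

A real verification would use thousands of forecast/observation pairs per bin, but the bookkeeping is exactly this.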
Another common objection to probabilistic forecasting is that users don't want probabilities. They want categorical forecasts – "Will it freeze in my citrus grove next Saturday or not?" Probabilities are perceived by some as the formal expression of forecasters being unable to make a decision. It's understandable that forecast users would want to have forecasters make dichotomous (yes/no) forecasts for weather events. In effect, such a forecast makes the weather-related decision for the user. For the hypothetical citrus forecast, if the forecast is for subfreezing temperatures, some freeze protection mechanism should be employed, whereas if the forecast is for above-freezing temperatures, no protection is needed. Unfortunately, making decisions for users in this way fails to account for all the non-meteorological elements that go into such a decision - forecasters would have no way of knowing all those non-meteorological factors for any but a tiny minority of forecast users.
Recent evidence (Morss et al. 2008) indicates that users of "deterministic" forecasts already understand what seems obvious - that there's some uncertainty about such forecasts, even when that uncertainty isn't stated explicitly. The problem is with not providing quantitative information about the level of the forecaster's uncertainty in a given situation – this leaves the user in the position of having to guess what that uncertainty might be, perhaps on the basis of his/her own experience. Deterministic forecasts don't allow the forecaster to clarify the uncertainties for the user – in some cases the uncertainty is lower than average, while in other cases it is higher. And the expected situation might be bimodal – either it's likely to be warm and sunny, or cool and rainy – rather than a unimodal distribution of values around a most likely forecast value. Some users clearly would benefit from this "withheld" information in making their own weather-related decisions.
Of course, the forecaster may experience some anxiety about making this decision on behalf of the user. Why? Because of the inevitable uncertainty. A dichotomous forecast implies a high level of confidence about the outcome that is, in most instances, unwarranted. And a forecast user is forced by experience to recognize that the forecasts are not always right. If the user accepts a forecast for subfreezing temperatures in the citrus grove, there are costs associated with taking protective action. But if a forecast for above-freezing temperatures turns out to be wrong, the crop could be a total (or partial) loss. A user’s decision about what to do with the weather information received involves more than just the uncertain weather forecasts. As discussed by Murphy (1985b), a decision depends on the cost/loss ratio for the user, which is not known to the forecaster in almost all cases. Cost/loss ratios can vary widely among users and the forecaster cannot possibly know everyone’s cost/loss ratio. The only logical choice in this situation is for the forecaster to produce the best possible forecast and leave the decision about how to use that forecast information up to the user.
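Murphy's cost/loss reasoning can be reduced to a few lines (the dollar figures here are invented): a user pays a cost C to protect, or risks losing L with probability p if unprotected, so protecting minimizes the expected expense exactly when p exceeds C/L. That threshold differs from user to user, which is why the forecaster cannot make the decision for everyone.

```python
# Cost/loss decision rule in miniature; the dollar figures are invented.
def expected_expense(p, cost, loss, protect):
    """Expected expense of one decision under forecast probability p:
    pay `cost` if protecting, otherwise risk `loss` with probability p."""
    return cost if protect else p * loss

def best_action(p, cost, loss):
    """Protect iff the event probability exceeds the cost/loss ratio."""
    return "protect" if p > cost / loss else "do nothing"

# A grower with a $10k protection cost and a $100k crop (C/L = 0.1)
# should protect even at a 20% freeze probability:
print(best_action(0.2, 10_000, 100_000))
# A user with C/L = 0.5 facing the same forecast should not:
print(best_action(0.2, 50_000, 100_000))
```

The same 20% probability thus dictates opposite actions for different users, something no single dichotomous forecast can accommodate.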
When the forecast is wholly dichotomous, however, then any uncertainty about the weather forecast is left for the user to estimate! A user who knows those forecasts are not perfect must make decisions based on any particular forecast by some means. A user for whom weather information is critical likely would need to develop some objective verification of the forecasts to guide the estimation of the inevitable uncertainty. Otherwise, the user would simply have to guess how to use the dichotomous forecast. But forecasters already have that information and have used it to calibrate their own forecast uncertainty estimates. Why not pass that along to the users as a routine part of the forecast?
Ideally, one property of forecasts is that they should be unbiased. Forecasters, however, may be influenced to make decisions in a biased way owing to an asymmetric penalty function. That is, for example, the choice to issue or not issue a tornado warning is strongly biased by the fact that no one is ever killed by a nonevent. A false alarm decreases warning verification scores, but a failure to issue a warning in the event a tornado actually occurs can be responsible for human casualties. In this case, asymmetry in the perceived penalty typically results in overforecasting the event. It is also easy to provide examples where such an asymmetry favors underforecasting.
But providing uncertainty information as part of every forecast (for events that have a subjective probability above some chosen threshold) would be very helpful to most forecast users (see the 'Sidebar' at the end of this essay). In particular, its greatest value would be to that segment of the forecast users who are comfortable with incorporating that uncertainty information into their decision-making. It always is possible to convert probabilistic forecasts into dichotomous forecasts by choosing threshold probabilities for some event, but it's not possible, in general, to go the other way – a dichotomous forecast product represents an irretrievable loss of information.
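That one-way conversion can be sketched in a couple of lines (the function name and threshold values are mine): a probability plus a user-chosen threshold yields a categorical statement, but nothing in the categorical statement lets anyone recover the probability.

```python
# One-way conversion: probability + user threshold -> yes/no statement.
def to_categorical(p, threshold):
    """Collapse a probabilistic forecast into a dichotomous one."""
    return "event expected" if p >= threshold else "no event expected"

# The same 35% forecast serves users with different thresholds:
print(to_categorical(0.35, 0.5))   # high-threshold user: "no event expected"
print(to_categorical(0.35, 0.2))   # low-threshold user: "event expected"
```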
Furthermore, providing probabilistic information removes a source of anxiety for forecasters – conveying uncertainty would no longer force the forecaster to make a binary decision. As an additional benefit, it would lessen the tendency to produce biased forecasts in the presence of asymmetric penalty functions. Forecasters would simply need to make accurate estimates of the probability of some event and leave the user's decision of what to do with the information to the user, who after all is the only one who knows the nonmeteorological input necessary for his or her personal decision in the face of uncertainty about the weather.
Another common objection to probabilistic forecasts is that users simply don't understand probability and can't make effective use of it. On the whole, those advocating this view seem to visualize the community of users as some sort of monolithic block, wherein all users are equally ignorant about probability. I disagree strongly with this viewpoint. As noted earlier, many users for whom weather information is of critical importance surely can and will make good use of uncertainty information in their decision-making. Of course, this does not rule out the likelihood that some users would be bewildered or confused by the addition of uncertainty information, choosing either to ignore it or even to seek its removal from the forecast.
It's important to understand that I'm not advocating we replace wholly dichotomous forecasts with probability statements. Rather, I'm advocating the addition of probabilistic forecast information to the forecast. As already noted, thresholds allow the systematic and consistent conversion of probabilistic forecasts to dichotomous statements, so if some users prefer not to use the probabilistic expression of the forecast, a categorical statement could easily be made available. But by not providing any forecast uncertainty information, we ignore the needs of those users most able to take advantage of it. We reduce the forecast to the lowest common denominator of the users and thereby limit the value of our products to the rest of the spectrum of forecast users.
Furthermore, there's no clear need for users to understand abstract probability theory to make effective use of probabilities. As a component of the process leading up to the choice to provide PoPs in NWS forecasts in 1965, studies (Hughes 1965) were done to find out how effective words could be in conveying uncertainty. It was found that phrases such as "chance of" or "likely" meant different things to different people, whereas a quantitative number representing probability is completely without ambiguity. A 30% probability is unambiguously less than a 40% probability. The only concern is whether or not forecasters can distinguish the level of uncertainty to some level of precision. Apparently, based on verification of PoPs in NWS forecasts, they can in fact do so within the context of the existing format for PoP forecasts, even when relatively inexperienced at the task. Murphy (personal communication) indicates that much of the confusion about probability is not tied to the use of a probability estimate, per se. Rather, the confusion is almost entirely about the event being forecast. Does a PoP refer to the probability of measurable rain within the 8-inch circle representing the mouth of the local NWS office's rain gauge? Or does it represent the fraction of the area that will experience measurable rainfall? Or does it represent the chance that rain will fall at the user's home? Is it an area or a point probability? In my experience, this confusion might even characterize some fraction of the forecasters who issue the forecasts! Morss et al. (2008) have validated that many forecast users don't understand what is being forecast with probability of precipitation (PoP) despite the decades that have passed since its introduction in the mid-1960s.
Such confusion continues to the present, and likely will continue into the future. Nevertheless, this confusion evidently does not prevent NWS forecasters from issuing reliable PoP forecasts, as the verification shows, provided they are properly calibrated. And this confusion need not prevent users from making effective use of the information that PoPs convey. You don't have to be able to construct a proof of Bayes' theorem to incorporate forecast uncertainty into your decision-making. Abstract probability theory is a challenging subject, but understanding it thoroughly is simply not necessary to use probability to understand uncertainty. Sanders (1963) said:
Aside from the merits or disadvantages of offering quantitative probabilities to the general public or other users of meteorological information, it is urged that probability be acknowledged as the proper internal language of forecasters.
Assuming that probability is the proper language of uncertainty, at least among forecasters, I would go one step farther than Sanders and assert that probability is not so inaccessible to users that we should forbid its use in forecasts to the public. The experience of the NWS with PoP should be a strong message that probability need not be a taboo word, despite the errors made in its introduction (notably, the absence of public education in the meaning and use of probabilistic information about precipitation). Rather, I believe it should become the lingua franca of uncertainty – familiar to forecaster and forecast user alike. Attempts to circumvent the use of probability numbers by verbiage are not likely to be effective, as words are always subject to misinterpretation.
There already is a broad understanding within atmospheric science that forecasts should incorporate uncertainty information. The American Meteorological Society has a formal statement about it [here] which I support wholeheartedly. Changing the forecast format to add uncertainty information requires considerable effort. Among other things, that effort would have to include studies about how best to express uncertainty for the events within the forecast, and that, in turn, requires interaction with a broad spectrum of forecast users. This should not just be a bureaucratic decision to alter the forecast format to include probability – as scientists we should appreciate the value of making decisions based on gathering the appropriate information. The change I am advocating also would require a substantial public education program to inform forecast users about the coming change. As part of that public education, the definitions of the events being forecast would need to be made widely available.
Additionally, forecasters themselves need preparation for such a change. With the single exception of PoPs, forecasters are not experienced with formulating subjective probabilities. If the change to add probabilistic forecast information were to result in poor probability estimates from the forecasters, public reaction to the change likely would be unduly negative. In order to accomplish this change effectively, then, forecasters would need training in formulating subjective probabilities for a variety of events, along with the verification feedback necessary to calibrate their estimates.
Forecasts without uncertainty information impose a burden, both on forecasters and on forecast users. Forecasters are forced to make difficult decisions having large societal consequences without being able to express their level of confidence in their assessment of the meteorological situation. Users are forced to attempt to use those forecasts without any direct knowledge of how certain they are, despite knowing full well that forecasts are not always correct. An expression of uncertainty is simply an admission of the reality of weather forecasts that both forecasters and users already experience routinely. It is my belief that an honest statement of forecaster confidence in the forecast would increase the value of the forecasts to users and eventually result in an improved relationship between forecasters and the general public. Further, such a statement should be in probabilistic terms.
In my own personal experience, the Norman area forecast for one Saturday in the fall several years ago involved a quasistationary front. This meteorological situation produced an approximately equal probability for two very different scenarios. It would likely either be fair and sunny with temperatures around 70°F or be cloudy and rainy with temperatures below 40°F. It happened that there was an OU football game scheduled that day and there would be 80,000 people at the game. The forecast format did not allow for a probabilistic statement about the situation and the forecaster on duty had to make a dichotomous choice for that afternoon. Unfortunately, that forecast decision turned out to be the incorrect one – the forecast was for warm and sunny, whereas the observed weather was cold and rainy. Many attending the game were unprepared for the actual weather.
However, being a meteorologist aware of the uncertainty, I wore shorts and a t-shirt to the game but carried a backpack with cold, rainy weather gear to put on in the event the front wiggled south of us, which it did. Had the forecast format allowed the forecaster to express the uncertainty properly, many of the unhappy game fans would have been informed enough to come prepared for either outcome. Since the forecast format didn't permit that option, valuable information effectively was blocked from the users.
I do not see the outcome as being the result of a poor forecast – any forecaster would have struggled with the decision, and if we had 100 forecasters make the decision independently, the result likely would be a bimodal distribution of their forecasts. In fact, the variability contained in that ensemble of forecasts would likely be a reasonably accurate reflection of the uncertainty. But the format of the forecast as issued at the time did not allow the forecaster to make that uncertainty known.
What users might want from forecasts is 100% confidence in the forecasts. But the reality is that forecasters cannot provide that level of confidence, in general. And most intelligent users know that. On a summer day with temperatures well over 100°F, the probability of subfreezing temperatures in, say, the next 12 hours is certainly small, and indistinguishable from zero on the vast majority of such days. Near-absolute certainty of that order is possible in some situations, but certainly not as a general rule. Therefore, except in very unusual circumstances, it would not be necessary to produce a forecast probability for subfreezing temperatures in the middle of an extended heat wave. Rather, a useful product would be a probability density function, which also could have multiple modes.