The DIME website help
Jochen Broecker, CATS
Disclaimer
This web page is under construction. Graphs and data published on this page may still be unreliable, erroneous or mislabeled. Under no circumstances will the CATS group or any member be liable in any way for any content of this web site, including, but not limited to, any errors or omissions in any content, or for any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via the DIME web site.
Background Information on Ensemble Forecasts
The atmosphere is a chaotic system. This means that its future behaviour is sensitive to small changes in its present state. This property is popularly known as the "butterfly effect" since it implies that the flapping of a butterfly's wings could change the behaviour of the atmosphere. In practice, the inability to precisely determine the current state of the atmosphere, due to the limited number of meteorological observations and the finite resolution of satellite sensors, sets a limit on how far into the future the weather can be predicted - even if forecasting models were perfect (which they are not). So-called "twin model" experiments, in which numerical models of the atmosphere are started using different initial conditions to reflect uncertainty in the initial state, indicate that chaos typically prohibits the deterministic prediction of atmospheric behaviour at times much beyond a few days.
This is an average; the predictability of the atmosphere changes over time. Ensemble forecasting is an attempt to quantify the predictability of the atmosphere. Seasonal forecasting attempts to forecast interannual variations in climate at lead times of 3 months and beyond. This does not contradict the results of twin model experiments because seasonal forecasts are forecasting climate rather than weather. Climate forecasts are probabilistic. An example of a seasonal forecast might be that Southern California can expect higher than average rainfall next winter - the forecast cannot specify on which days the rain will fall. Seasonal forecasts demonstrate skill at lead times of 6 months and beyond in some parts of the world. They are particularly skillful in the areas surrounding the tropical Pacific due to the El Niño phenomenon.
The atmosphere is a chaotic system. This means that small differences in its present state can lead to large differences in its future state. This behaviour is popularly known as "the butterfly effect" since it implies that the flapping of a butterfly's wings can change the course of a hurricane. In practice, limitations in how accurately we can measure the current state of the atmosphere are more important than disturbances caused by butterflies.
Weather forecasts are made by feeding an estimate of the current state of the atmosphere into a computer simulation and then simulating how the atmosphere will behave over the next couple of weeks. Sometimes a small error in the estimate of the current state will make a big difference to how the forecast turns out. Other times, because the atmosphere is in a more stable state, small errors are less important.
To determine whether the atmosphere is in a predictable, or an unpredictable, state ensemble forecasts are made. An ensemble forecast consists of a number of simulations made by making small changes to the estimate of the current state used to initialize the simulation. These small changes are designed to reflect the uncertainty in the estimate.
The ensemble forecasts on this website are based on the ensemble forecasts produced by the U.S. National Centers for Environmental Prediction (NCEP). They have been enhanced, however, by including other sources of error which are not accounted for in the NCEP ensembles. The result is an ensemble of 100 future outcomes, each of which can be regarded as equally likely. On some days these 100 outcomes will all be very similar; on other days they will be quite different.
Ensemble forecasts contain more information than a single forecast. But how can this information be displayed so that end-users can quickly and easily know what they need to know? The graphical user interface (GUI) on this site provides several options for displaying the forecasts.
Fan Charts show cumulative probability intervals. There is a 5% chance that the actual weather will fall in each of the shaded bands.
Spaghetti Plots show each of the ensemble members over the next few days. From this type of plot it can be quite easy to see roughly what fraction of the ensemble members fall above or below a certain threshold on a certain day.
Probability Plots are bar charts showing the fraction of ensemble members in a range of intervals for the selected date in the future.
Summary provides a text summary of the ensemble for the selected date in the future. The summary gives information such as the ensemble mean, the standard deviation, the median and the 10th and 90th percentiles.
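As a rough illustration of what the Summary and Probability Plot views report, the following Python sketch computes the same kind of statistics for one forecast date. The function names, the sample ensemble and the choice of percentile interpolation are assumptions made for this example; they are not taken from the DIME software.

```python
import statistics

def summarize_ensemble(members):
    """Return summary statistics for the ensemble values of one forecast date."""
    # With n=10, statistics.quantiles returns the 9 cut points between deciles:
    # the first is the 10th percentile, the last is the 90th percentile.
    # The "inclusive" interpolation method is an assumption of this sketch.
    deciles = statistics.quantiles(members, n=10, method="inclusive")
    return {
        "mean": statistics.fmean(members),
        "std": statistics.stdev(members),
        "median": statistics.median(members),
        "p10": deciles[0],
        "p90": deciles[-1],
    }

def fraction_above(members, threshold):
    """Fraction of ensemble members strictly above `threshold`."""
    return sum(1 for m in members if m > threshold) / len(members)

# Hypothetical 10-member temperature ensemble (deg C), for illustration only.
ensemble = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
summary = summarize_ensemble(ensemble)
prob_above_7 = fraction_above(ensemble, 7.0)
```

The value returned by `fraction_above` is the kind of number one reads off a spaghetti plot by eye: the share of ensemble members beyond a threshold on a given day.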
One of our aims is to improve the GUI to make it straightforward to use, and to make sure it provides information in a form that end-users can understand. Feedback concerning the GUI, suggesting possible refinements or new features, will be very welcome.
The advantage of ensemble forecasts is that they give an estimate of the probability of a particular type of weather occurring. For example, the probability of it raining, or the probability of it freezing. What do these probabilities mean? Does saying that there is a 50% chance of rain mean that you don't know anything? Well, no, it depends on how often it rains on average. In Britain it does actually rain on about 50% of days, so saying that there is a 50% chance of rain is really little better than a guess. However, if it only rained on 5% of days, on average, (as it does in arid places), saying that there is a 50% chance of rain tomorrow is an informative forecast. In fact, even saying that there is a 20% chance of rain is telling you something that you wouldn't have guessed.
How can you make a decision based on an ensemble forecast? With a traditional single "deterministic" forecast making decisions is easy. If the forecast for tomorrow says it will be fine and sunny we go for a picnic; if the forecast says it will rain we don't go for a picnic. But what if it doesn't rain in 80% of the ensemble members and it rains in the other 20% - do we go for a picnic or not?
The best way to think about this type of decision is in terms of utility. Utility is a term economists like to use; it is just a measure for quantifying how much we like different outcomes. For example, let's say that the utility of staying at home and watching the telly is 0 (the absolute numbers don't really matter, only the relative amounts). Also, let's say the utility of going on a picnic when the weather is sunny is +10, but going on a picnic and being rained on has a utility of -30. Note that all these values will depend on the individual: if you like walking in the rain you might give being rained on a higher utility.
We can summarize all of these utilities in a utility matrix:
| | NO RAIN | RAIN |
| STAY AT HOME | 0 | 0 |
| GO ON PICNIC | +10 | -30 |
If we know that the probability of rain is P, we can work out our expected utility for each action (staying at home or going on a picnic). The expected utility if we stay at home is
$\text{Expected Utility (home)} = (1-P) \times 0 + P \times 0 = 0$
The expected utility in this case is zero since whether it rains or not the utility of staying at home is zero.
The expected utility of going on a picnic is
$\text{Expected Utility (picnic)} = (1-P) \times (+10) + P \times (-30) = 10 - 40P$
The expected utility is calculated by taking a weighted average of the utility in the case of no rain and the case of rain; the weights are the corresponding probabilities.
So should we go on the picnic? Well the expected utility of going on the picnic exceeds the expected utility of staying at home when
$(1-P) \times (+10) + P \times (-30) > 0$
This will occur when
$P < 0.25$
So we should go on the picnic only if the probability of rain is less than 25%. We demand such a low probability because, in this case, the discomfort of being rained on greatly outweighs the pleasure of going on a dry picnic. If we make our decision this way it is still possible that we will go on the picnic and be rained on, or that we will stay at home and miss out on a sunny day. But, averaged over a year say, we will maximize our happiness if we make decisions that maximize our expected utility for each individual choice.
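The picnic decision can be written down in a few lines of Python. This is only a sketch of the expected-utility rule described above: the utilities are the ones from the utility matrix, while the function and variable names are made up for this example.

```python
def expected_utility(p_rain, utility_no_rain, utility_rain):
    """Probability-weighted average of the utilities of one action."""
    return (1.0 - p_rain) * utility_no_rain + p_rain * utility_rain

# Utility matrix from the text: action -> (utility if no rain, utility if rain).
UTILITIES = {
    "stay at home": (0.0, 0.0),
    "go on picnic": (10.0, -30.0),
}

def best_action(p_rain, utilities=UTILITIES):
    """Return the action with the highest expected utility.

    If both actions have the same expected utility, the first one listed
    (staying at home) is returned.
    """
    return max(utilities, key=lambda a: expected_utility(p_rain, *utilities[a]))

choice_at_20_percent = best_action(0.20)  # below the 25% threshold
choice_at_30_percent = best_action(0.30)  # above the 25% threshold
```

Changing the entries of `UTILITIES` shifts the threshold, which is how individual preferences (for example, not minding the rain) enter the decision.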
Placing quantitative utilities on experiences such as being rained on can be very tricky, and is often not worth the effort for everyday low-risk decisions. In many business decisions, however, the utility can be equated with money and this is a natural quantification of the desirability of outcomes (more profit is better than less profit, and small losses are better than large losses).
In business the principle of maximization of expected profits can be used when making decisions under uncertainty.
Suppose a pavement cafe has room outside for an extra 20 people when the weather is fine. This extra capacity, however, can only be exploited if an extra waiter is available to wait on the extra customers. The owner of the cafe must decide two days in advance whether to ask one of her waiters to work an extra afternoon shift. The owner could look at a traditional, single forecast and, if the forecast says it won't rain, ask the waiter to work. This is not the best thing to do, though. If the owner has access to an ensemble or probability forecast they can do better. If the probability of rain, according to the ensemble forecast, is P, should the owner ask for an extra waiter? To make this decision the owner must be able to estimate the potential losses and gains that are at stake. Suppose it costs £40 to have a waiter work the extra shift. If it doesn't rain the extra capacity will result in an extra income of £200, less the £40 required to pay the waiter. If it does rain the owner must still pay the waiter the £40 but gets no extra income to offset this loss. These gains and losses can be summarized in a "decision matrix":
| | NO RAIN | RAIN |
| NO EXTRA WAITER | £0 | £0 |
| EXTRA WAITER | +£200-£40=+£160 | -£40 |
The expected profit if the owner doesn't ask for the extra waiter is £0 - this is certain. If the owner asks for an extra waiter the expected profit is calculated by averaging over the two possible outcomes, rain and no rain, weighting with the corresponding probabilities of $P$ and $1-P$ respectively.
$\text{Expected Profit} = P \times (-40) + (1-P) \times (+160) = 160 - 200P$ (in £)
If the expected profit obtained by asking the waiter to work exceeds the expected profit of not asking the waiter to work (which is £0) then the owner should ask the waiter to work. That is, if
$160 - 200P > 0$
This condition is satisfied if P<0.8. Thus, if the probability of rain is less than 80% the owner should ask a waiter to work an extra shift. It is important to note that often, when the owner has asked the waiter to work, it will in fact rain and the owner will lose £40. But these occasions will be more than made up for by the times when it doesn't rain and the owner gets the extra income from the 20 seats outside. To appreciate this, look at Figure 1.1. The blue line shows the cumulative extra income of a cafe owner who only uses the traditional style, single forecast. If the forecast is for no rain the owner asks for an extra waiter. The red line shows the cumulative extra income of an owner who makes their decisions with an ensemble forecast, as described above.
Figure 1.1:
A comparison of the cumulative income of a fictitious cafe owner using a traditional forecast (blue line) and one using an ensemble forecast (red line). All forecasts and verifications were obtained from the U.S. National Centers for Environmental Prediction.
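The cafe owner's rule generalizes to any two-outcome decision in which doing nothing yields a profit of zero. The sketch below, in Python, computes the break-even rain probability from the two entries of the decision matrix; the function names are hypothetical and the formula assumes a positive profit without rain and a loss with rain.

```python
def expected_profit(p_rain, profit_no_rain, profit_rain):
    """Probability-weighted average profit of taking the action."""
    return (1.0 - p_rain) * profit_no_rain + p_rain * profit_rain

def break_even_probability(profit_no_rain, profit_rain):
    """Rain probability at which the expected profit is exactly zero.

    Solves (1 - P) * profit_no_rain + P * profit_rain = 0 for P,
    assuming profit_no_rain > 0 > profit_rain.
    """
    return profit_no_rain / (profit_no_rain - profit_rain)

# Figures from the cafe example in the text.
WAITER_COST = 40.0
EXTRA_INCOME = 200.0
profit_no_rain = EXTRA_INCOME - WAITER_COST  # +160
profit_rain = -WAITER_COST                   # -40

threshold = break_even_probability(profit_no_rain, profit_rain)  # 160 / 200

def ask_extra_waiter(p_rain):
    """True if asking for the extra waiter has a positive expected profit."""
    return expected_profit(p_rain, profit_no_rain, profit_rain) > 0.0
```

With these numbers the threshold comes out at 0.8, matching the condition P<0.8 derived above; a cheaper waiter or a larger extra income would push it higher.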
With a traditional single forecast, measuring how good it was is straightforward: how far was it from what actually happened? With an ensemble things are a little trickier. Suppose you had an ensemble of rainfall forecasts and it rains in 90% of the ensemble members, implying there is a 90% chance of rain. Then suppose it doesn't rain - was the forecast wrong? It is not really possible to evaluate any single ensemble forecast. Instead, you have to consider their performance over a period of time. Two properties are desirable in an ensemble forecast: reliability and sharpness.
Reliability is a measure of how accurate the probabilities are. Ideally, if we consider all the days for which the forecast said there was a 75% chance of rain then it should have rained on 75% of those days; if we consider all the days for which the forecast said there was a 10% chance of rain, it should have rained on 10% of those days, and so on. If the actual weather lies within the range of the ensemble when it's plotted on a fan chart or spaghetti plot then the forecast is reliable. However, there's more to good forecasting than reliability; we must also consider sharpness.
Sharpness is a measure of how precise the forecasts are. If it rains on 50% of days (about right for parts of the U.K.) I can issue a forecast saying there is a 50% chance of rain every single day, and according to the definition given above these forecasts will be very reliable. While reliable, however, these forecasts will not be very informative, because they lack sharpness. The more forecasts that are issued with high (close to 100%) or low (close to 0%) probabilities, the more sharpness we say the forecast has. Another way of thinking about sharpness is that it describes how narrow the ensemble forecast is when it's plotted as a fan chart or spaghetti plot.
The overall quality of an ensemble forecast depends on both its reliability and its sharpness. For a given level of reliability more sharpness means better forecasts, and vice versa. Another way of saying this is that when we see the ensemble plotted on a fan chart or spaghetti plot we want to be confident that the actual weather will lie within the spread of the ensemble (reliability), but also, we'd like that spread to be as narrow as possible (sharpness).
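Both properties can be checked on an archive of past probability forecasts. The Python sketch below groups forecasts by the probability that was issued and compares each group with the observed frequency (reliability), and uses the average distance of the issued probabilities from 50% as one simple sharpness measure. The function names and the particular sharpness measure are assumptions of this example, not the measures used by DIME.

```python
from collections import defaultdict

def reliability_table(forecast_probs, outcomes, n_bins=10):
    """Compare issued probabilities with what actually happened.

    `outcomes` holds 1 if the event (e.g. rain) occurred and 0 otherwise.
    Returns {bin index: (mean issued probability, observed frequency, count)}.
    """
    bins = defaultdict(list)
    for p, o in zip(forecast_probs, outcomes):
        # A probability of exactly 1.0 is placed in the top bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, o))
    table = {}
    for index, pairs in sorted(bins.items()):
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        table[index] = (mean_p, freq, len(pairs))
    return table

def sharpness(forecast_probs):
    """Mean distance of the issued probabilities from 0.5.

    0.0 means every forecast said 50%; 0.5 means every forecast said 0% or 100%.
    """
    return sum(abs(p - 0.5) for p in forecast_probs) / len(forecast_probs)

# Hypothetical archives: a "50% every day" forecaster and a sharper one.
flat_table = reliability_table([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
sharp_table = reliability_table([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
```

In a reliable forecast the first two entries of each tuple are close to one another; the flat forecaster achieves this with a sharpness of zero, which is exactly the uninformative case described above.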
Figure 2.1:
A screenshot of the DIME Operational Ensemble Forecast start page.
Click on one of the regions to view probabilistic forecasts of weather variables for that region using a Java Applet. If your browser does not support Java Applets download Java for free from Sun Microsystems or select the GUI screenshots to see some examples of the graphical user interface in action.
The aim of this site is to introduce users to the concept of probabilistic and ensemble forecasting.
For an explanation of ensembles, and how they can be used, see Chapter 1.
The DIME Forecast Interpretation Methods Website
Figure 3.1:
A screenshot of the DIME Forecast Interpretation Methods start page.
General Information About This Website
The aim of this site is to introduce users to the DIME research
concerning the comparison of probabilistic and ensemble
forecasts. Considering the pure ensemble forecasts as a raw
product, DIME strives to disseminate long-term performance
evaluations according to the various demands of different end
users.
A problem users are faced with is the lack of information on
how the multitude of available products compare. Users not only have to find the most appropriate forecast source, but also have to decide among the various existing post-processing and dressing techniques. The potential benefits of these choices depend heavily on the particular end
user's application. DIME addresses this by comparing and
evaluating various dressing techniques applied to various ensemble forecast sources. The results
of these studies are disseminated via this web site.
Interactive pages allow the user to choose the source, location,
variable and post-processing technique to be evaluated.
Forecasts of different variables of interest for various
locations are calculated in the form of probability density
functions (for example, by "kernel dressing"). These
forecasts are not operational in real time but are compared in
terms of their long-term performance, using various measures of
performance. The result is a plot of the skill over lead time.
The Locations and Variables Selection Form
On this page you can select various results from DIME's forecast
evaluation research. On this and the following page you can specify
two of the many forecast products that have been generated and
evaluated by DIME. Eventually a plot of their relative performance is
shown. To obtain an absolute performance you will have to compare the
forecast in question to climatology as the reference forecast.
To choose among the forecast products on this web site, you will have
to specify your location and variable of choice. For a closer
explanation of these parameters, see the subsequent
Sections 3.2.1 and 3.2.2. On a
subsequent form, you are asked to specify the two forecasts you
actually want to compare. This form is explained in more detail in
Section 3.3.
Locations
The parameter locations specifies, as the name
suggests, the location on the globe the forecast is made for.
The locations for which DIME has made forecast evaluations are
given in Table 3.1.
Table 3.1:
The locations for which DIME has made forecast evaluations
| Airport | IATA-Code | Position | Runway Elevation | GMT Offset | Verifications | Source |
| Frankfurt International Airport | FRA | 008:34:00E 050:02:00N | 364 ft | -1 | Temperature 10m over ground at 12 UTC | faked by taking NCEP high resolution forecast initialized at 0 UTC with lead time 12h plus a little observation noise. |
| London Heathrow Airport | LHR | 000:27:00W 051:28:00N | 80 ft | 0 | Temperature 10m over ground at 12 UTC | WMO Station Nr 03772 |
| Tokyo Narita International Airport | NRT | ? | ? | ? | Temperature 10m over ground at 12 UTC | faked by taking NCEP high resolution forecast initialized at 0 UTC with lead time 12h plus a little observation noise. |
| Chicago O'Hare Airport | ORD | 087:54:00W 041:59:00N | 667 ft | +6 | Temperature 10m over ground at 12 UTC | faked by taking NCEP high resolution forecast initialized at 0 UTC with lead time 12h plus a little observation noise. |
| Sydney Airport | SYD | 151:10:00E 033:56:00S | 21 ft | -10 | Temperature 10m over ground at 12 UTC | faked by taking NCEP high resolution forecast initialized at 0 UTC with lead time 12h plus a little observation noise. |
Variables
The parameter variable specifies, as the name suggests, the variable the forecast is made for. As you can tell from Section 3.2.1, the following variable is currently employed:
Temperature 2m over ground for 12h UTC
for all available locations. Recall that for some locations (as mentioned in Section 3.2.1) the actual variable was not available. Instead, the observation was "faked". Here is a closer description.
- WMO Station
- Actual weather stations maintained by the World Meteorological Organization. They have a unique identification number. See the list of locations (Section 3.2.1) for the WMO stations employed.
- Faked NCEP
- Emulated observations for 12h UTC, obtained by using the NCEP high resolution forecast (see Section 3.3.1) initialized at 0 UTC with lead time 12h with some ( 10%) observation noise added.
The Forecast Interpretation Selection Form
Figure 3.2:
A screenshot of the DIME Forecast Interpretation Methods selection page.
This page (see Figure 3.2) allows you to select the two forecast interpretation methods you actually want to compare. The left and right columns specify the first and second forecast interpretation method, respectively. To fully specify the forecast, choose the source for the raw forecasts (see Section 3.3.1) in the pulldown menu "Forecast Source" and the actual interpretation method (see Section 3.3.2) in the pulldown menu "Forecast Interpretation".
Forecast Data Sources
The principal forecast data sources for DIME are the National Centers for Environmental Prediction (NCEP) in the USA and the European Centre for Medium-Range Weather Forecasts (ECMWF) in Reading, UK. Table 3.2 gives a closer description.
Table 3.2:
The forecast data sources for DIME
| Source | Initialisation Time | Ensemble Members | Grid |
| NCEP High Resolution | 0h UTC | 1 Unperturbed | 2.5° × 2.5° Lat-Lon Grid |
| NCEP Ensemble | 0h UTC | 10 Perturbed + 1 Unperturbed | 2.5° × 2.5° Lat-Lon Grid |
| ECMWF High Resolution | 12h UTC | 1 Unperturbed | 2.5° × 2.5° Lat-Lon Grid |
| ECMWF Ensemble | 12h UTC | 50 Perturbed + 1 Unperturbed | 2.5° × 2.5° Lat-Lon Grid |
Note: Different Initialisation Times
Note that ECMWF has an initialisation time 12h later than NCEP. As explained in Sections 3.2.1 and 3.2.2, all verification data has verification time 12h UTC. Therefore, forecasts from ECMWF and NCEP verifying at the same time differ in their "age" by 12 hours. We adopted the convention that for a forecast with lead time $n$ days, we use the most recent forecast available being at least $n$ days old. E.g. a forecast verifying at 5 Jan. 2005, 12h UTC with lead time 4 days blending NCEP and ECMWF ensembles would use the NCEP ensemble issued on 1 Jan. 2005, 0h UTC with lead time 4d, 12h and the ECMWF ensemble issued on 1 Jan. 2005, 12h UTC with lead time 4d. Thus all forecasts have at least 4 days lead time, but as little excess lead time as possible. We think this agrees with what one would do in practice.
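The convention for picking the forecast issue time can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes exactly one initialisation per day at a fixed UTC hour (0h for NCEP, 12h for ECMWF, as in Table 3.2).

```python
from datetime import datetime, timedelta

def latest_issue_time(verification_time, lead_days, init_hour_utc):
    """Most recent daily initialisation that is at least `lead_days` old.

    Assumes one forecast per day, initialised at `init_hour_utc` (UTC).
    """
    latest_allowed = verification_time - timedelta(days=lead_days)
    candidate = latest_allowed.replace(
        hour=init_hour_utc, minute=0, second=0, microsecond=0
    )
    if candidate > latest_allowed:
        # Today's run is too young; fall back to the previous day's run.
        candidate -= timedelta(days=1)
    return candidate

# Example from the text: verification on 5 Jan. 2005, 12h UTC, lead time 4 days.
verification = datetime(2005, 1, 5, 12)
ncep_issue = latest_issue_time(verification, 4, 0)    # NCEP: 0h UTC runs
ecmwf_issue = latest_issue_time(verification, 4, 12)  # ECMWF: 12h UTC runs
ncep_lead = verification - ncep_issue                 # actual lead time used
```

For the example date this picks the runs issued on 1 Jan. 2005 for both centres, with the NCEP run carrying 12 hours of excess lead time.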
Forecast Interpretation Methods
Figure 3.3:
The form to select the forecast sources and the actual forecast interpretation ("dressing") method. The forecast interpretation method menu is pulled down.
The pulldown menu "Forecast Interpretation" (see Figures 3.2 and 3.3) allows you to select the method that is used to interpret the raw forecast. A brief explanation of the available methods is given in the list below. For a thorough explanation of the forecast interpretation methods, the training algorithms and other features the reader is referred to the emtool-manual [emtoolmanual].
List of Forecast Interpretation Methods
The following is a list of the forecast interpretation methods currently available. Some of them treat the ensemble output as scenarios (MOS-approach), some of them are more general (EMOS-approach). The following two sections explain briefly the techniques of dressing and boosting that are essential for the presented interpretations.
- Climatology
- Ignores the ensemble altogether and fits a kernel estimator (see Section 3.3.2) to the observed data
- Gauss
- This interpretation assigns a Gaussian to each ensemble. The mean
and the standard deviation
of the Gaussian are set to
where
and
are the mean and the standard deviation, respectively, of the input ensemble (the standard deviation of a singleton ensemble is taken to be 1, for example if the input "ensemble" is in fact the high resolution forecast). The parameters
and
are trained by minimizing ignorance over a training set. A first version of this ensemble interpretation method did not have the parameter
, but its performance was very poor. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
- Jewson
- Essentially like Gauss, but the mean
and the standard deviation
of the Gaussian are
where
and
are the mean and the standard deviation, respectively, of the ensemble (the standard deviation of a singleton ensemble is taken to be 1). The parameters
are trained by minimizing ignorance over a training set. This method was suggested by Steve Jewson in [jewson03]. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
- Kernel Dressing (Version 1)
- Each ensemble is kernel dressed (see 3.3.2)
where the mean
and the standard deviation
of the kernel that dresses ensemble member
are
where
and
are the
th ensemble member and the standard deviation of the ensemble, respectively (the standard deviation of a singleton ensemble is taken to be 1, for example if the input "ensemble" is in fact the high resolution forecast). The parameters
and
are trained by minimizing ignorance over a training set. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
- Kernel Dressing (Version 2)
- Essentially like Version 1, but the mean
and the standard deviation
of the kernel that dresses ensemble member
are
where
and
are the
th ensemble member and the standard deviation of the ensemble, respectively (the standard deviation of a singleton ensemble is taken to be 1, for example if the input "ensemble" is in fact the high resolution forecast). The parameters
are trained by minimizing ignorance over a training set. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
- Error Dressing
- In this method, a set of ensemble member error distributions is calculated from an archive of ensembles and verifications. For an ensemble of 11 members, we can calculate 11 distributions, namely the distribution of best member errors, a distribution of second best member errors etc. down to a distribution of 11th best member errors. To each of these distributions, a climatology is fitted (see above). Each of these climatologies (or even combinations) could potentially serve as a kernel for dressing individual ensemble members. We adopted a "greedy" strategy where the kernel is chosen that yields the smallest ignorance. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
- Sufficient Statistics Emos
- Essentially like Gauss, but the mean
and the standard deviation
of the Gaussian are determined by a nonlinear fitting procedure using radial basis functions (RBFs).
Here
is a symmetric input feature determined from the ensemble. See subsection 3.3.2 for a closer explanation.
If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
- Sufficient Statistics Emos (Boosted Weights 3.3.2)
- Essentially like normal Sufficient Statistics Emos, but instead of weighting over all constituent ensembles, a series of models is blended together using boosting, see 3.3.2.
- Error Dressing (Boosted Weights 3.3.2)
- Essentially like normal Error Dressing, but instead of weighting over all constituent ensembles, a series of models is blended together using boosting, see 3.3.2.
- Parzen
- Essentially like Kernel Dressing Version 1, but all parameters (including the weights over constituent ensembles) are trained simultaneously on ignorance, essentially turning the training of Parzen estimators into a single large optimisation problem.
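To give a feel for what "dressing" an ensemble means, here is a minimal Python sketch of Gaussian kernel dressing: each ensemble member is replaced by a Gaussian kernel and the forecast density is the equally weighted average of those kernels. The parameters `offset` and `bandwidth` are hypothetical stand-ins for the trained parameters mentioned in the list above; how the DIME methods actually map the ensemble to kernel means and widths is not reproduced here.

```python
import math

def kernel_dressed_density(y, members, offset=0.0, bandwidth=1.0):
    """Forecast density at `y` from equally weighted Gaussian kernels.

    Each kernel is centred on one ensemble member shifted by `offset`
    and has standard deviation `bandwidth` (both assumed, not DIME's).
    """
    total = 0.0
    for x in members:
        z = (y - (x + offset)) / bandwidth
        total += math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.pi))
    return total / len(members)

# Hypothetical 5-member temperature ensemble (deg C), for illustration only.
members = [11.2, 12.0, 12.4, 13.1, 14.0]
density_near_centre = kernel_dressed_density(12.5, members, bandwidth=0.8)
density_in_tail = kernel_dressed_density(18.0, members, bandwidth=0.8)
```

In a trained method the free parameters would be chosen to minimize a skill score such as ignorance over an archive of past forecasts and verifications, as described in the list above.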
Kernel Dressing
Boosting Weights
Symmetric Input Features
Presentation of Results
Figure 3.4:
A screenshot of the DIME Forecast Interpretation Methods start page.
This page shows the results of the forecast evaluation you have
chosen to see. The table shows again the specs of the two forecast
interpretations you wanted to compare. The viewgraph shows the
relative skill of the models over lead time.
Note: Smaller is better, i.e. if the bar is entirely in the negative
range, interpretation 1 is significantly better than
interpretation 2.
The Specs Panel
Figure 3.5:
A screenshot of the Specs Panel.
The specs panel records the specs you have chosen and to which the plot
corresponds. It is an automatically generated tabular legend of the plot. The entries
can be interpreted using the following small translation table:
| Location | IATA-Code (see Table 3.1) |
| Variable | t2m = Temperature 2m over Ground (only option available at the moment) |
| product category | evaluation (only option available at the moment) |
| Interpretation 1 | First Forecast Interpretation Method (see the List of Forecast Interpretation Methods) |
| Source 1 | First Forecast Source (see Table 3.2) |
| Interpretation 2 | Second Forecast Interpretation Method (see the List of Forecast Interpretation Methods) |
| Source 2 | Second Forecast Source (see Table 3.2) |
Skill vs Lead Time Plots
Figure 3.6:
A screenshot of the skill vs the lead time. The bootstrap bars are the 5% and 95% quantiles. By definition of the skill, smaller is better. Consequently, the first forecast is significantly better than the second if the bootstrap bars are fully in the negative range.
Evaluating probabilistic forecasts requires taking the probabilistic character of the forecasts into account. Error measures for point forecasts (like the root mean square error) could be applied to the forecast mean. This procedure, though, would discard a lot of potentially useful information from the forecast distribution. Therefore, other measures have to be employed to value forecast distributions. Various measures, commonly termed skill scores, have been proposed. Although these skill scores should share some common properties ensuring consistency of results (see [broecker04]), there is no skill score that could be regarded as the most general skill score. Different skill scores value different aspects of the forecast, similar to the mean absolute error and the root mean square error valuing outliers differently.
Skill scores are defined by the skill function $S(p, y)$, where $p$ is the forecast distribution and $y$ is the verification. Usually forecasts $p_i$ and verifications $y_i$ are available from a series of days $i = 1, \dots, N$. The skill of a forecast is defined as
$\frac{1}{N} \sum_{i=1}^{N} S(p_i, y_i)$
Since $p$ is in fact a function, it can enter $S$ in functional form (e.g. being integrated over).
DIME allows the forecast performance to be evaluated in terms of the following skill scores (currently only Ignorance is available)
| Ignorance | $ S = $ |
| Brier Score | $ S = $ |
The plots (e.g. Figure 3.6) show skill vs lead time. By definition of the skill, smaller is better. Consequently, the first forecast is significantly better than the second if the bootstrap bars are fully in the negative range.
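As an illustration of how a score of this kind rewards good forecast distributions, the Python sketch below evaluates the ignorance of Gaussian forecasts. It uses the commonly quoted definition of ignorance as the negative base-2 logarithm of the forecast density at the verification; the base of the logarithm, the restriction to Gaussian forecasts and all function names are assumptions of this example.

```python
import math

def gaussian_pdf(y, mean, std):
    """Density of a Gaussian forecast distribution at `y`."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def ignorance(density_at_verification):
    """Ignorance of one forecast: -log2 of the density at the verification."""
    return -math.log2(density_at_verification)

def mean_ignorance(forecasts, verifications):
    """Average ignorance over days; each forecast is a (mean, std) pair."""
    scores = [
        ignorance(gaussian_pdf(y, mean, std))
        for (mean, std), y in zip(forecasts, verifications)
    ]
    return sum(scores) / len(scores)

# Hypothetical verifications (deg C) and two competing forecasters.
observed = [10.0, 12.0, 9.0]
sharp_and_right = [(10.2, 1.0), (11.8, 1.0), (9.1, 1.0)]
vague = [(10.2, 4.0), (11.8, 4.0), (9.1, 4.0)]
skill_difference = mean_ignorance(sharp_and_right, observed) - mean_ignorance(vague, observed)
```

Here `skill_difference` is negative: the forecaster who concentrates probability near what actually happened gets the smaller (better) score, in line with the "smaller is better" convention of the plots.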
The Bootstrap Bars
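The eventual goal is to decide whether forecast 1 is better or worse than forecast 2. By definition of the skill, smaller is better. Consequently, the difference of the skill of forecast 1 and forecast 2 should be significantly smaller than zero. The skill score, however, is a random quantity. To get a handle on the possible variations of this quantity, we bootstrap-resample the results. That is, if we look at two forecasts $p^{(1)}_i$ and $p^{(2)}_i$ at a number of days $i = 1, \dots, N$, with skill function $S$ and verifications $y_i$, we not only calculate
$\Delta = \frac{1}{N} \sum_{i=1}^{N} \left[ S(p^{(1)}_i, y_i) - S(p^{(2)}_i, y_i) \right]$
but also
$\Delta^{*} = \frac{1}{N} \sum_{k=1}^{N} \left[ S(p^{(1)}_{i_k}, y_{i_k}) - S(p^{(2)}_{i_k}, y_{i_k}) \right]$
where $(i_1, \dots, i_N)$ is generated by drawing with replacement $N$ numbers from the set $\{1, \dots, N\}$.
Figure 3.6 shows error bars of the skill difference $\Delta$ over the lead time. The median of all $\Delta^{*}$'s along with the 5% and 95% percentiles are shown.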
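The resampling step can be sketched in Python as follows. The sketch takes the daily scores of the two forecasts as given, resamples the days with replacement, and reports the 5%, 50% and 95% points of the resampled mean differences. The number of resamples, the fixed random seed and the simple rank-based percentile rule are assumptions of this example, not settings documented for DIME.

```python
import random

def bootstrap_skill_difference(scores_1, scores_2, n_resamples=1000, seed=0):
    """Bootstrap the mean daily score difference (forecast 1 minus forecast 2).

    Returns the (5%, 50%, 95%) points of the resampled mean differences.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_1, scores_2)]
    n = len(diffs)
    resampled = []
    for _ in range(n_resamples):
        # Draw n day indices with replacement from {0, ..., n-1}.
        indices = [rng.randrange(n) for _ in range(n)]
        resampled.append(sum(diffs[i] for i in indices) / n)
    resampled.sort()

    def point(q):
        # Simple rank-based percentile of the sorted resampled values.
        return resampled[min(int(q * n_resamples), n_resamples - 1)]

    return point(0.05), point(0.50), point(0.95)

# Hypothetical daily scores (smaller is better) for two forecasts over 20 days.
scores_1 = [1.0, 1.2, 0.9, 1.1] * 5
scores_2 = [2.0, 2.1, 1.9, 2.2] * 5
low, median, high = bootstrap_skill_difference(scores_1, scores_2)
```

If the whole interval from `low` to `high` lies below zero, as it does for these made-up scores, forecast 1 would be called significantly better than forecast 2 in the sense used for the bootstrap bars.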
How to Get Help
On this and the following pages you will find a couple of help buttons. Clicking these buttons will open this help document and take you to the respective section. E.g. by clicking the help button next to the specs panel you will be taken to the section explaining the meaning of the specs panel, which is part of this manual (Section 3.4.1).
How to Get More Information
Click on the More Information button, and you will be taken to a section of this help document where the purpose of the page you are currently looking at is explained in more detail.
The translation was initiated by jochen on 2005-08-10