The DIME website help

Jochen Broecker, CATS


Disclaimer
This web page is under construction. Graphs and data published on this page may still be unreliable, erroneous or mislabeled. Under no circumstances will the CATS group or any member be liable in any way for any content of this web site, including, but not limited to, any errors or omissions in any content, or for any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via the DIME web site.


Background Information on Ensemble Forecasts

How far into the future can the weather be predicted?

The atmosphere is a chaotic system. This means that its future behaviour is sensitive to small changes in its present state. This property is popularly known as the "butterfly effect" since it implies that the flapping of a butterfly's wings could change the behaviour of the atmosphere. In practice, the inability to precisely determine the current state of the atmosphere, due to the limited number of meteorological observations and the finite resolution of satellite sensors, sets a limit on how far into the future the weather can be predicted - even if forecasting models were perfect (which they are not). So-called "twin model" experiments, in which numerical models of the atmosphere are started using different initial conditions to reflect uncertainty in the initial state, indicate that chaos typically prohibits the deterministic prediction of atmospheric behaviour at times much beyond a few days.

This limit is an average; the predictability of the atmosphere changes over time. Ensemble forecasting is an attempt to quantify the predictability of the atmosphere. Seasonal forecasting attempts to forecast interannual variations in climate at lead times of 3 months and beyond. This does not contradict the results of twin model experiments because seasonal forecasts are forecasting climate rather than weather. Climate forecasts are probabilistic. An example of a seasonal forecast might be that Southern California can expect higher than average rainfall next winter - the forecast cannot specify on which days the rain will fall. Seasonal forecasts demonstrate skill at lead times of 6 months and beyond in some parts of the world. They are particularly skillful in the areas surrounding the tropical Pacific due to the El Niño phenomenon.

What is an ensemble forecast?

The atmosphere is a chaotic system. This means that small differences in its present state can lead to large differences in its future state. This behaviour is popularly known as "the butterfly effect" since it implies that the flapping of a butterfly's wings can change the course of a hurricane. In practice, limitations in how accurately we can measure the current state of the atmosphere are more important than disturbances caused by butterflies.

Weather forecasts are made by feeding an estimate of the current state of the atmosphere into a computer simulation and then simulating how the atmosphere will behave over the next couple of weeks. Sometimes a small error in the estimate of the current state will make a big difference to how the forecast turns out. Other times, because the atmosphere is in a more stable state, small errors are less important.

Ensemble forecasts are made to determine whether the atmosphere is in a predictable or an unpredictable state. An ensemble forecast consists of a number of simulations, each made with small changes to the estimate of the current state used to initialize the simulation. These small changes are designed to reflect the uncertainty in the estimate.

The ensemble forecasts on this website are based on the ensemble forecasts produced by the U.S. National Centers for Environmental Prediction (NCEP). They have been enhanced, however, by including other sources of error which are not accounted for in the NCEP ensembles. The result is an ensemble of 100 future outcomes, each of which can be regarded as equally likely. On some days these 100 outcomes will all be very similar; on other days they will be quite different.

Ensemble forecasts contain more information than a single forecast. But how can this information be displayed so that end-users can quickly and easily know what they need to know? The graphical user interface (GUI) on this site provides several options for displaying the forecasts.

Fan Charts show cumulative probability intervals. There is a 5% chance that the actual weather will fall in each of the shaded bands.

Spaghetti Plots show each of the ensemble members over the next few days. From this type of plot it can be quite easy to see roughly what fraction of the ensemble members fall above or below a certain threshold on a certain day.

Probability Plots are bar charts showing the fraction of ensemble members in a range of intervals for the selected date in the future.

Summary provides a text summary of the ensemble for the selected date in the future. The summary gives information such as the ensemble mean, the standard deviation, the median and the 10th and 90th percentiles.
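
The quantities behind the Probability Plots and the Summary can be sketched in a few lines of Python. This is only an illustration and not the code used by the site: the function names, the 11-member example ensemble and the choice of the "inclusive" percentile convention are all assumptions made here.

```python
import statistics

def summarize_ensemble(members):
    """Return the summary statistics listed above for one forecast date."""
    # With n=10, statistics.quantiles returns the nine decile cut points;
    # the first is the 10th percentile and the last is the 90th percentile.
    deciles = statistics.quantiles(members, n=10, method="inclusive")
    return {
        "mean": statistics.fmean(members),
        "std": statistics.stdev(members),      # sample standard deviation
        "median": statistics.median(members),
        "p10": deciles[0],
        "p90": deciles[-1],
    }

def fraction_below(members, threshold):
    """Fraction of ensemble members that fall below a threshold."""
    return sum(1 for x in members if x < threshold) / len(members)

# Hypothetical temperature ensemble in degrees Celsius (made-up numbers).
ensemble = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
summary = summarize_ensemble(ensemble)
frost_probability = fraction_below(ensemble, 0.0)
```

If every member is regarded as equally likely, `frost_probability` is the ensemble's estimate of the probability of frost on that date.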

One of our aims is to improve the GUI to make it straightforward to use, and to make sure it provides information in a form that end-users can understand. Feedback concerning the GUI, suggesting possible refinements or new features, will be very welcome.

How can ensembles be used?

The advantage of ensemble forecasts is that they give an estimate of the probability of a particular type of weather occurring, for example the probability of rain or the probability of frost. What do these probabilities mean? Does saying that there is a 50% chance of rain mean that you don't know anything? Well, no, it depends on how often it rains on average. In Britain it does actually rain on about 50% of days, so saying that there is a 50% chance of rain is really little better than a guess. However, if it only rained on 5% of days on average (as it does in arid places), saying that there is a 50% chance of rain tomorrow is an informative forecast. In fact, even saying that there is a 20% chance of rain is telling you something that you wouldn't have guessed.

How can you make a decision based on an ensemble forecast? With a traditional single "deterministic" forecast making decisions is easy. If the forecast for tomorrow says it will be fine and sunny we go for a picnic, if the forecast says it will rain we don't go for a picnic. But what if it doesn't rain in 80% of the ensemble members and it rains in the other 20% - do we go for a picnic or not?

The best way to think about this type of decision is in terms of utility. Utility is a term economists like to use; it is just a measure for quantifying how much we like different outcomes. For example, let's say that the utility of staying at home and watching the telly is 0 (the absolute numbers don't really matter, only the relative amounts). Also, let's say the utility of going on a picnic when the weather is sunny is +10, but going on a picnic and being rained on has a utility of -30. Note that all these values will depend on the individual; if you like walking in the rain you might give being rained on a higher utility.

We can summarize all of these utilities in a utility matrix:

               NO RAIN   RAIN
STAY AT HOME   0         0
GO ON PICNIC   +10       -30

If we know that the probability of rain is P, we can work out our expected utility for each action (staying at home or going on a picnic). The expected utility if we stay at home is

   $\displaystyle \text{Expected Utility (home)} = 0 \times (1-P) + 0 \times P = 0$

The expected utility in this case is zero since whether it rains or not the utility of staying at home is zero.

The expected utility of going on a picnic is

   $\displaystyle \text{Expected Utility (picnic)} = +10 \times (1-P) - 30 \times P = 10 - 40P$

The expected utility is calculated by taking a weighted average of the utility in the case of no rain and the utility in the case of rain; the weights are the corresponding probabilities.

So should we go on the picnic? Well the expected utility of going on the picnic exceeds the expected utility of staying at home when

$\displaystyle 10 - 40P > 0$

This will occur when

$\displaystyle P < 1/4$

So we should go on the picnic only if the probability of rain is less than 25%. We demand such a low probability because, in this case, the discomfort of being rained on greatly outweighs the pleasure of going on a dry picnic. If we make our decision this way it is still possible that we will go on the picnic and be rained on, or that we will stay at home and miss out on a sunny day. But, averaged over a year say, we will maximize our happiness if we make decisions that maximize our expected utility for each individual choice.
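
The picnic calculation can be written as a short Python sketch. The utility values are the ones from the utility matrix above; the function names and the dictionary layout are illustrative choices made here.

```python
def expected_utility(utility_dry, utility_rain, p_rain):
    """Weighted average of the two utilities, weighted by (1 - P) and P."""
    return utility_dry * (1.0 - p_rain) + utility_rain * p_rain

def best_action(utilities, p_rain):
    """Pick the action with the largest expected utility.

    utilities maps an action name to (utility if dry, utility if rain).
    On a tie the action listed first is returned.
    """
    return max(utilities, key=lambda a: expected_utility(*utilities[a], p_rain))

# Utility matrix from the text: (no rain, rain).
UTILITIES = {
    "stay at home": (0, 0),
    "go on picnic": (10, -30),
}

choice_at_20_percent = best_action(UTILITIES, 0.20)
choice_at_30_percent = best_action(UTILITIES, 0.30)
```

With these utilities the choice switches from the picnic to staying at home as the probability of rain passes 25%, in agreement with the inequality above.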

Placing quantitative utilities on experiences such as being rained on can be very tricky, and is often not worth the effort for everyday low-risk decisions. In many business decisions, however, the utility can be equated with money and this is a natural quantification of the desirability of outcomes (more profit is better than less profit, and small losses are better than large losses).

How can ensembles be used in business?

In business the principle of maximization of expected profits can be used when making decisions under uncertainty.

Suppose a pavement cafe has room outside for an extra 20 people when the weather is fine. This extra capacity, however, can only be exploited if an extra waiter is available to wait on the extra customers. The owner of the cafe must decide two days in advance whether to ask one of her waiters to work an extra afternoon shift. The owner could look at a traditional, single forecast and, if the forecast says it won't rain, ask the waiter to work. This is not the best thing to do, though. If the owner has access to an ensemble or probability forecast she can do better. If the probability of rain, according to the ensemble forecast, is P, should the owner ask for an extra waiter? To make this decision the owner must be able to estimate the potential losses and gains that are at stake. Suppose it costs £40 to have a waiter work the extra shift. If it doesn't rain the extra capacity will result in an extra income of £200, less the £40 required to pay the waiter. If it does rain the owner must still pay the waiter the £40 but gets no extra income to offset this loss. These gains and losses can be summarized in a "decision matrix":

                  NO RAIN            RAIN
NO EXTRA WAITER   £0                 £0
EXTRA WAITER      +£200-£40=+£160    -£40

The expected profit if the owner doesn't ask for the extra waiter is £0 - this is certain. If the owner asks for an extra waiter the expected profit is calculated by averaging over the two possible outcomes, rain and no rain, weighting with the corresponding probabilities of $P$ and $(1-P)$ respectively.

   $\displaystyle \text{Expected Profit} = P \times (-40) + (1-P) \times (+160) = 160 - 200P$

If the expected profit obtained by asking the waiter to work exceeds the expected profit of not asking the waiter to work (which is £0) then the owner should ask the waiter to work. That is, if

$\displaystyle 160 - 200P > 0$

This condition is satisfied if P<0.8. Thus, if the probability of rain is less than 80% the owner should ask a waiter to work an extra shift. It is important to note that often, when the owner has asked the waiter to work, it will in fact rain and the owner will lose £40. But these occasions will be more than made up for by the times when it doesn't rain and the owner gets the extra income from the 20 seats outside. To appreciate this, look at Figure 1.1. The blue line shows the cumulative extra income of a cafe owner who only uses the traditional style, single forecast. If the forecast is for no rain the owner asks for an extra waiter. The red line shows the cumulative extra income of an owner who makes their decisions with an ensemble forecast, as described above.

Figure 1.1: A comparison of the cumulative income of a fictitious cafe owner using a traditional forecast (blue line) and one using an ensemble forecast (red line). All forecasts and verifications were obtained from the U.S. National Centers for Environmental Prediction.
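
The cafe owner's decision rule, and the kind of comparison shown in Figure 1.1, can be sketched in Python as follows. The costs are those of the decision matrix above. The six days of forecasts and outcomes are made-up numbers, not the NCEP data behind the figure, and the "single forecast" rule (ask for a waiter when the rain probability is below 50%) is an assumption used here to stand in for a traditional forecast.

```python
EXTRA_INCOME = 200  # pounds, extra income on a dry day with an extra waiter
WAITER_COST = 40    # pounds, cost of the extra shift

def expected_profit(p_rain):
    """Expected profit of asking for the extra waiter: 160 - 200 P."""
    return p_rain * (-WAITER_COST) + (1 - p_rain) * (EXTRA_INCOME - WAITER_COST)

def profit(hire, rained):
    """Actual profit for one day, read off the decision matrix."""
    if not hire:
        return 0
    return -WAITER_COST if rained else EXTRA_INCOME - WAITER_COST

def cumulative_income(days, hire_rule):
    """Running total of profit when hire_rule(p_rain) decides each day."""
    total, history = 0, []
    for p_rain, rained in days:
        total += profit(hire_rule(p_rain), rained)
        history.append(total)
    return history

def ensemble_rule(p_rain):
    # Ask for the waiter whenever the expected profit is positive (P < 0.8).
    return expected_profit(p_rain) > 0

def single_forecast_rule(p_rain):
    # Assumed stand-in for a single forecast that says "no rain".
    return p_rain < 0.5

# Made-up (probability of rain, did it rain?) pairs for six days.
days = [(0.1, False), (0.6, False), (0.7, True),
        (0.9, True), (0.3, False), (0.65, False)]

income_ensemble = cumulative_income(days, ensemble_rule)
income_single = cumulative_income(days, single_forecast_rule)
```

In this toy series the ensemble rule loses £40 on the third day, but it ends the period ahead because it also captures the dry days on which the rain probability lay between 50% and 80%.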

How are ensembles evaluated?

With a traditional single forecast, measuring how good it was is straightforward: how far was it from what actually happened? With an ensemble things are a little trickier. Suppose you had an ensemble of rainfall forecasts and it rains in 90% of the ensemble members, implying there is a 90% chance of rain. Then suppose it doesn't rain - was the forecast wrong? It is not really possible to evaluate any single ensemble forecast. Instead, you have to consider their performance over a period of time. Two properties are desirable in an ensemble forecast: reliability and sharpness.

Reliability is a measure of how accurate the probabilities are. Ideally, if we consider all the days for which the forecast said there was a 75% chance of rain, then it should have rained on 75% of those days; if we consider all the days for which the forecast said there was a 10% chance of rain, it should have rained on 10% of those days, and so on. If the actual weather lies within the range of the ensemble when it's plotted on a fan chart or spaghetti plot then the forecast is reliable. However, there's more to good forecasting than reliability; we must also consider sharpness.

Sharpness is a measure of how precise the forecasts are. If it rains on 50% of days (about right for parts of the U.K.) I can issue a forecast saying there is a 50% chance of rain every single day, and according to the definition given above these forecasts will be very reliable. Although reliable, these forecasts will not be very informative, because they lack sharpness. The more forecasts are issued with high (close to 100%) or low (close to 0%) probabilities, the sharper we say the forecast is. Another way of thinking about sharpness is that it describes how narrow the ensemble forecast is when it's plotted as a fan chart or spaghetti plot.

The overall quality of an ensemble forecast depends on both its reliability and its sharpness. For a given level of reliability more sharpness means better forecasts, and vice versa. Another way of saying this is that when we see the ensemble plotted on a fan chart or spaghetti plot we want to be confident that the actual weather will lie within the spread of the ensemble (reliability), but also, we'd like that spread to be as narrow as possible (sharpness).
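
Both properties can be estimated from an archive of probability forecasts and outcomes. The Python sketch below is illustrative only: the binning into ten probability classes and the particular sharpness measure (mean distance of the forecast probability from 50%, rescaled to lie between 0 and 1) are assumptions made here, and the example data are invented.

```python
def reliability_table(probabilities, outcomes, n_bins=10):
    """Compare forecast probabilities with observed frequencies.

    Returns one (mean forecast probability, observed frequency, count)
    tuple for every non-empty probability bin, in increasing order.
    """
    bins = [[] for _ in range(n_bins)]
    for p, happened in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the last bin
        bins[index].append((p, happened))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frequency = sum(1 for _, h in members if h) / len(members)
            table.append((mean_p, frequency, len(members)))
    return table

def sharpness(probabilities):
    """0 if every forecast says 50%, 1 if every forecast says 0% or 100%."""
    return 2 * sum(abs(p - 0.5) for p in probabilities) / len(probabilities)

# Invented archive: ten days with a 10% forecast (rain on one of them)
# and four days with a 75% forecast (rain on three of them).
probs = [0.1] * 10 + [0.75] * 4
rained = [True] + [False] * 9 + [True] * 3 + [False]
table = reliability_table(probs, rained)
```

For a reliable forecast the first two entries of each row are close to each other; here both rows match exactly because the example was constructed that way.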

The DIME Operational Ensemble Forecast Website

Figure 2.1: A screenshot of the DIME Operational Ensemble Forecast start page.

General Information About This Website

Click on one of the regions to view probabilistic forecasts of weather variables for that region using a Java Applet. If your browser does not support Java Applets, download Java for free from Sun Microsystems or select the GUI screenshots to see some examples of the graphical user interface in action.

The aim of this site is to introduce users to the concept of probabilistic and ensemble forecasting.

For an explanation of ensembles, and how they can be used, see Chapter 1.

The UK Forecasts

The Forecast Viewer

Variables

Viewing Options

Forecast Data Sources

Best Member Dressing


The DIME Forecast Interpretation Methods Website

Figure 3.1: A screenshot of the DIME Forecast Interpretation Methods start page.


General Information About This Website

The aim of this site is to introduce users to the DIME research concerning the comparison of probabilistic and ensemble forecasts. Considering the pure ensemble forecasts as a raw product, DIME strives to disseminate long-term performance evaluations that meet the various demands of different end users.

A problem users face is the lack of information on how the multitude of available products compare. A user not only has to find the most appropriate forecast source, but also has to decide among the various existing post-processing and dressing techniques. The potential benefits of these choices depend heavily on the particular end-user application. DIME addresses this by comparing and evaluating different dressing techniques applied to various ensemble forecast sources. The results of these studies are disseminated via this web site. Interactive pages allow the user to choose the source, location, variable and post-processing technique to be evaluated. Forecasts of different variables of interest for various locations are calculated in the form of probability density functions (for example, by ``kernel dressing''). These forecasts are not operational in real time but are compared in terms of their long-term performance, using various measures of performance. The result is a plot of the skill over lead time.


The Locations and Variables Selection Form

On this page you can select various results from DIME's forecast evaluation research. On this and the following page you can specify two of the many forecast products that have been generated and evaluated by DIME. Eventually a plot of their relative performance is shown. To obtain an absolute performance, compare the forecast in question with climatology as the reference forecast.

To choose among the forecast products on this web site, you will have to specify your location and variable of choice. For a closer explanation of these parameters, see the subsequent Sections 3.2.1 and 3.2.2. On a subsequent form, you are asked to specify the two forecasts you actually want to compare. This form is explained in more detail in Section 3.3.


Locations

The parameter locations specifies, as the name suggests, the location on the globe the forecast is made for. The locations for which DIME has made forecast evaluations are given in Table 3.1.

Table 3.1: The locations for which DIME has made forecast evaluations
Frankfurt International Airport
  IATA-Code : FRA
  Position : 008:34:00E 050:02:00N
  Runway Elevation : 364 ft
  GMT Offset : -1
  Verifications : Temperature 2m over ground at 12 UTC
  Source : faked by taking NCEP high resolution forecast initialized at 0 UTC with lead time 12h plus a little observation noise.
London Heathrow Airport
  IATA-Code : LHR
  Position : 000:27:00W 051:28:00N
  Runway Elevation : 80 ft
  GMT Offset : 0
  Verifications : Temperature 2m over ground at 12 UTC
  Source : WMO Station Nr 03772
Tokyo Narita International Airport
  IATA-Code : NRT
  Position : ?
  Runway Elevation : ?
  GMT Offset : ?
  Verifications : Temperature 2m over ground at 12 UTC
  Source : faked by taking NCEP high resolution forecast initialized at 0 UTC with lead time 12h plus a little observation noise.
Chicago O'Hare Airport
  IATA-Code : ORD
  Position : 087:54:00W 041:59:00N
  Runway Elevation : 667 ft
  GMT Offset : +6
  Verifications : Temperature 2m over ground at 12 UTC
  Source : faked by taking NCEP high resolution forecast initialized at 0 UTC with lead time 12h plus a little observation noise.
Sydney Airport
  IATA-Code : SYD
  Position : 151:10:00E 033:56:00S
  Runway Elevation : 21 ft
  GMT Offset : -10
  Verifications : Temperature 2m over ground at 12 UTC
  Source : faked by taking NCEP high resolution forecast initialized at 0 UTC with lead time 12h plus a little observation noise.



Variables

The parameter variable specifies, as the name suggests, the variable the forecast is made for. As you can tell from Section 3.2.1, the following variable is currently employed:
  Temperature 2m over ground for 12h UTC
for all available locations. Recall that for some locations (as mentioned in Section 3.2.1) actual observations of the variable were not available. Instead, the observation was ``faked''. The two kinds of verification source are described below.
WMO Station
  Actual weather stations maintained by the World Meteorological Organisation. Each has a unique identification number. See the list of locations (Section 3.2.1) for the WMO stations employed.
Faked NCEP
  Emulated observations for 12h UTC, obtained by taking the NCEP high resolution forecast (see Section 3.3.1) initialized at 0 UTC with lead time 12h and adding some (10%) observation noise.


The Forecast Interpretation Selection Form

Figure 3.2: A screenshot of the DIME Forecast Interpretation Methods selection page.
This page (see Figure 3.2) allows you to select the two forecast interpretation methods you want to compare. The left and right columns specify the first and second forecast interpretation method, respectively. To fully specify a forecast, choose the source for the raw forecasts (see Section 3.3.1) in the pulldown menu ``Forecast Source'' and the interpretation method (see Section 3.3.2) in the pulldown menu ``Forecast Interpretation''.


Forecast Data Sources

The main forecast data sources for DIME are the National Centers for Environmental Prediction (NCEP) in the USA and the European Centre for Medium-Range Weather Forecasts (ECMWF) in Reading, UK. Table 3.2 gives a more detailed description.

Table 3.2: The forecast data sources for DIME
NCEP High Resolution
  Initialisation Time : 0h UTC
  Ensemble Members : 1 Unperturbed
  Grid : $2.5^\circ \times 2.5^\circ$ Lat-Lon Grid
NCEP Ensemble
  Initialisation Time : 0h UTC
  Ensemble Members : 10 Perturbed + 1 Unperturbed
  Grid : $2.5^\circ \times 2.5^\circ$ Lat-Lon Grid
ECMWF High Resolution
  Initialisation Time : 12h UTC
  Ensemble Members : 1 Unperturbed
  Grid : $2.5^\circ \times 2.5^\circ$ Lat-Lon Grid
ECMWF Ensemble
  Initialisation Time : 12h UTC
  Ensemble Members : 50 Perturbed + 1 Unperturbed
  Grid : $2.5^\circ \times 2.5^\circ$ Lat-Lon Grid



Note: Different Initialisation Times

Note that ECMWF has an initialisation time 12h later than NCEP. As explained in Sections 3.2.1 and 3.2.2, all verification data have verification time 12h UTC. Therefore, forecasts from ECMWF and NCEP verifying at the same time differ in their ``age'' by 12 hours. We adopted the convention that for a forecast with lead time $n$ days, we use the most recent available forecast that is at least $n$ days old. For example, a forecast verifying on 5 Jan. 2005, 12h UTC with lead time 4 days that blends the NCEP and ECMWF ensembles would use the NCEP ensemble issued on 1 Jan. 2005, 0h UTC with lead time 4d 12h and the ECMWF ensemble issued on 1 Jan. 2005, 12h UTC with lead time 4d. Thus all forecasts have at least 4 days lead time, but as little excess lead time as possible. We think this agrees with what one would do in practice.
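
The convention can be expressed as a small Python function. The function name and its arguments are illustrative and not part of any DIME software; the initialisation hours are those of Table 3.2.

```python
from datetime import datetime, timedelta

def issue_time(verification_time, lead_days, init_hour_utc):
    """Most recent initialisation time that is at least lead_days old.

    init_hour_utc is the daily initialisation hour of the forecast
    source (0 for NCEP, 12 for ECMWF).
    """
    latest_allowed = verification_time - timedelta(days=lead_days)
    candidate = latest_allowed.replace(
        hour=init_hour_utc, minute=0, second=0, microsecond=0)
    if candidate > latest_allowed:
        # That day's run is too young, so fall back to the previous day.
        candidate -= timedelta(days=1)
    return candidate

verification = datetime(2005, 1, 5, 12)       # 5 Jan. 2005, 12h UTC
ncep_issue = issue_time(verification, 4, 0)    # NCEP runs at 0h UTC
ecmwf_issue = issue_time(verification, 4, 12)  # ECMWF runs at 12h UTC
ncep_actual_lead = verification - ncep_issue
ecmwf_actual_lead = verification - ecmwf_issue
```

For the example in the text this yields 1 Jan. 2005, 0h UTC for NCEP (actual lead time 4d 12h) and 1 Jan. 2005, 12h UTC for ECMWF (actual lead time 4d).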


Forecast Interpretation Methods

Figure 3.3: The form to select the forecast sources and the actual forecast interpretation (``dressing'') method. The forecast interpretation method menu is pulled down.
The pulldown menu ``Forecast Interpretation'' (see Figures 3.2 and 3.3) allows you to select the method used to interpret the raw forecast. A brief explanation of the available methods is given in the List of Forecast Interpretation Methods. For a thorough explanation of the forecast interpretation methods, the training algorithms and other features, the reader is referred to the emtool manual [emtoolmanual].


List of Forecast Interpretation Methods

The following table lists the forecast interpretation methods currently available. Some of them treat the ensemble output as scenarios (MOS approach); others are more general (EMOS approach). The following two sections briefly explain the techniques of dressing and boosting that are essential for the presented interpretations.
Climatology
Ignores the ensemble altogether and fits a kernel estimator (see [*]) to the observed data.
Gauss
This interpretation assigns a Gaussian to each ensemble. The mean $\mu$ and the standard deviation $\sigma$ of the Gaussian are set to
$\displaystyle \mu = a + m$
$\displaystyle \sigma = b \cdot s$

where $m$ and $s$ are the mean and the standard deviation of the input ensemble, respectively (the standard deviation of a singleton ensemble is set to 1, for example if the input ``ensemble'' is in fact the high resolution forecast). The parameters $a$ and $b$ are trained by minimizing ignorance over a training set. A first version of this ensemble interpretation method did not have the parameter $a$, but its performance was very poor. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
Jewson
Essentially like Gauss, but the mean $\mu$ and the standard deviation $\sigma$ of the Gaussian are
$\displaystyle \mu = a_1 + a_2 m$
$\displaystyle \sigma = b_1 + b_2 s$

where $m$ and $s$ are the mean and the standard deviation of the ensemble, respectively (the standard deviation of a singleton ensemble is set to 1). The parameters $a_1, a_2, b_1, b_2$ are trained by minimizing ignorance over a training set. This method was suggested by Steve Jewson in [jewson03]. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
Kernel Dressing (Version 1)
Each ensemble is kernel dressed (see 3.3.2), where the mean $\mu$ and the standard deviation $\sigma$ of the kernel that dresses ensemble member $x_i$ are
$\displaystyle \mu = a + x_i$
$\displaystyle \sigma = b \cdot s,$

where $x_i$ is the $i$th ensemble member and $s$ is the standard deviation of the ensemble (the standard deviation of a singleton ensemble is set to 1, for example if the input ``ensemble'' is in fact the high resolution forecast). The parameters $a$ and $b$ are trained by minimizing ignorance over a training set. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
Kernel Dressing (Version 2)
Essentially like Version 1, but the mean $\mu$ and the standard deviation $\sigma$ of the kernel that dresses ensemble member $x_i$ are
$\displaystyle \mu = a_1 + a_2 x_i$
$\displaystyle \sigma = b_1 + b_2 s$

where $x_i$ is the $i$th ensemble member and $s$ is the standard deviation of the ensemble (the standard deviation of a singleton ensemble is set to 1, for example if the input ``ensemble'' is in fact the high resolution forecast). The parameters $a_1, a_2, b_1, b_2$ are trained by minimizing ignorance over a training set. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
Error Dressing
In this method, a set of ensemble member error distributions is calculated from an archive of ensembles and verifications. For an ensemble of 11 members, we can calculate 11 distributions, namely the distribution of best member errors, the distribution of second best member errors, and so on down to the distribution of 11th best member errors. To each of these distributions, a climatology is fitted (see above). Each of these climatologies (or even combinations of them) could potentially serve as a kernel for dressing individual ensemble members. We adopted a ``greedy'' strategy in which the kernel that yields the smallest ignorance is chosen. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
Sufficient Statistics Emos
Essentially like Gauss, but the mean $\mu$ and the standard deviation $\sigma$ of the Gaussian are determined by a nonlinear fitting procedure using radial basis functions (RBFs):
$\displaystyle \mu = \mathrm{rbf}_1(\vec{x})$
$\displaystyle \sigma = \mathrm{rbf}_2(\vec{x})$

Here $\vec{x}$ is a symmetric input feature determined from the ensemble. See Section 3.3.2 for a closer explanation. If more than one ensemble is involved (e.g. if the NCEP and the ECMWF ensemble are combined), the interpretations for each ensemble are blended together in a weighted sum. The weights are calculated in a second training stage.
Sufficient Statistics Emos (Boosted Weights 3.3.2)
Essentially like normal Sufficient Statistics Emos, but instead of weighting over all constituent ensembles, a series of models is blended together using boosting, see 3.3.2.
Error Dressing (Boosted Weights 3.3.2)
Essentially like normal Error Dressing, but instead of weighting over all constituent ensembles, a series of models is blended together using boosting, see 3.3.2.
Parzen
Essentially like Kernel Dressing Version 1, but all parameters (including the weights over constituent ensembles) are trained simultaneously on ignorance, so that training the Parzen estimator becomes a single large optimisation problem.
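
To make the Gauss and Kernel Dressing (Version 1) entries concrete, the Python sketch below evaluates the forecast density they produce for a single ensemble and given parameters $a$ and $b$. It is not the emtool implementation. Several points are assumptions made for illustration: the kernels are taken to be Gaussian, all ensemble members receive equal weight, the sample standard deviation is used for $s$, and the training of $a$ and $b$ is left out.

```python
import math
import statistics

def normal_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def ensemble_spread(members):
    """Standard deviation s of the ensemble; set to 1 for a singleton."""
    return statistics.stdev(members) if len(members) > 1 else 1.0

def gauss_density(v, members, a, b):
    """Gauss: one Gaussian with mu = a + m and sigma = b * s."""
    m = statistics.fmean(members)
    s = ensemble_spread(members)
    return normal_pdf(v, a + m, b * s)

def kernel_dressing_density(v, members, a, b):
    """Kernel Dressing (Version 1): one kernel per member x_i with
    mu = a + x_i and sigma = b * s, averaged with equal weights."""
    s = ensemble_spread(members)
    return sum(normal_pdf(v, a + x, b * s) for x in members) / len(members)

# Made-up three-member ensemble and parameter values.
members = [1.0, 2.0, 3.0]
gauss_at_2 = gauss_density(2.0, members, a=0.0, b=1.0)
dressed_at_2 = kernel_dressing_density(2.0, members, a=0.0, b=1.0)
```

Both functions would fail for an ensemble whose members are all identical, since $s$ would then be zero; how that case is handled in practice is not described here.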


Kernel Dressing


Boosting Weights


Symmetric Input Features


Presentation of Results

Figure 3.4: A screenshot of the DIME Forecast Interpretation Methods results page.
This page shows the results of the forecast evaluation you have chosen to see. The table again shows the specs of the two forecast interpretations you wanted to compare. The graph shows the relative skill of the models over lead time. Note: smaller is better, i.e. if the bar lies entirely in the negative range, interpretation 1 is significantly better than interpretation 2.


The Specs Panel

Figure 3.5: A screenshot of the Specs Panel.
The specs panel records the specs you have chosen and to which the plot corresponds. It is an automatically generated tabular legend of the plot. The entries can be interpreted using the following small translation table:
  Location : IATA-Code (see Table 3.1)
  Variable : t2m = Temperature 2m over Ground (only option available at the moment)
  product category : evaluation (only option available at the moment)
  Interpretation 1 : First Forecast Interpretation Method (see the List of Forecast Interpretation Methods)
  Source 1 : First Forecast Source (see Table 3.2)
  Interpretation 2 : Second Forecast Interpretation Method (see the List of Forecast Interpretation Methods)
  Source 2 : Second Forecast Source (see Table 3.2)


Skill vs Lead Time Plots

Figure 3.6: A screenshot of the skill vs the lead time. The bootstrap bars are the 5% and 95% quantiles. By definition of the skill, smaller is better. Consequently, the first forecast is significantly better than the second if the bootstrap bars are fully in the negative range.
Evaluating probabilistic forecasts requires taking the probabilistic character of the forecasts into account. Error measures for point forecasts (like the root mean square error) could be applied to the forecast mean. This procedure, though, would discard a lot of potentially useful information from the forecast distribution. Therefore, other measures have to be employed to value forecast distributions. Various measures, commonly termed skill scores, have been proposed. Although these skill scores should share some common properties ensuring consistency of results (see [broecker04]), there is no skill score that could be regarded as the most general one. Different skill scores value different aspects of the forecast, just as the mean absolute error and the root mean square error value outliers differently.

Skill scores are defined by the skill function $ S(p, v) \in \mathbb{R}$, where $ p$ is the forecast distribution and $ v$ is the verification. Usually forecasts $ p_i$ and verifications $ v_i$ are available from a series of days $ i = 1 \ldots N$. The skill of a forecast is defined as

$\displaystyle S := \frac{1}{N} \sum_i S(p_i, v_i).$

Since $ p_i$ is in fact a function, it can enter $ S$ in functional form (e.g. being integrated over).

DIME allows the forecast performance to be evaluated in terms of the following skill scores (currently only Ignorance is available):

Ignorance : $S = -\log(p(v))$
Brier Score : $S = \vert p - v\vert^2$
The plots (e.g. Figure 3.6) show skill vs lead time. By definition of the skill, smaller is better. Consequently, the first forecast is significantly better than the second if the bootstrap bars are fully in the negative range.
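
As a small illustration of the averaging formula and the two scores, the Python sketch below evaluates them for a binary event (rain or no rain). The restriction to a binary event, the use of the natural logarithm, the function names and the three days of data are assumptions made for this example only.

```python
import math

def ignorance(p, v):
    """Ignorance: minus the log of the probability (or density) that the
    forecast distribution p assigns to the verification v."""
    return -math.log(p(v))

def brier(p_event, v):
    """Brier score for a binary event: p_event is the forecast probability
    of the event, v is 1 if the event occurred and 0 otherwise."""
    return (p_event - v) ** 2

def skill(score, forecasts, verifications):
    """Average of score(p_i, v_i) over all days i = 1 ... N."""
    pairs = list(zip(forecasts, verifications))
    return sum(score(p, v) for p, v in pairs) / len(pairs)

def as_distribution(p_rain):
    """Turn a rain probability into a function of the outcome (1 or 0)."""
    return lambda v: p_rain if v == 1 else 1.0 - p_rain

# Made-up rain probabilities and outcomes (1 = rain, 0 = no rain).
rain_probs = [0.9, 0.2, 0.6]
observed = [1, 0, 0]

brier_skill = skill(brier, rain_probs, observed)
ignorance_skill = skill(
    ignorance, [as_distribution(p) for p in rain_probs], observed)
```

For both scores a smaller value indicates a better forecast, in line with the convention used in the plots.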


The Bootstrap Bars

The eventual goal is to decide whether forecast 1 is better or worse than forecast 2. By definition of the skill, smaller is better. Consequently, for forecast 1 to be better, the difference between the skill of forecast 1 and the skill of forecast 2 should be significantly smaller than zero. The skill score, however, is a random quantity. To get a handle on the possible variations of this quantity, we bootstrap-resample the results. That is, if we look at two forecasts $p_i$ and $q_i$ on a number of days $i = 1 \ldots N$, we not only calculate

$\displaystyle S_0 = \frac{1}{N} \sum_i S(p_i, v_i)$

but also

$\displaystyle S_{\cal B} = \frac{1}{N} \sum_{i \in {\cal B}} S(p_i, v_i),$

where $ {\cal B}$ is generated by drawing with replacement $ N$ numbers from the set $ 1 \ldots N$.

Figure 3.6 shows error bars of the skill $S$ over the lead time. The median of all $S_{\cal B}$'s is shown along with the 5th and 95th percentiles.
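
The resampling step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used for the plots: the same resampled days are used for both forecasts (a paired bootstrap of the skill difference), the number of resamples and the simple percentile rule are arbitrary choices, and the per-day scores in the example are invented.

```python
import random

def bootstrap_skill_difference(scores_1, scores_2, n_resamples=1000, seed=0):
    """Bootstrap the difference in mean skill between two forecasts.

    scores_1[i] and scores_2[i] are the scores of forecast 1 and
    forecast 2 on day i. Returns the 5th percentile, the median and the
    95th percentile of the resampled mean differences.
    """
    rng = random.Random(seed)
    n = len(scores_1)
    differences = [s1 - s2 for s1, s2 in zip(scores_1, scores_2)]
    resampled = []
    for _ in range(n_resamples):
        # Draw N day indices with replacement from 0 ... N-1.
        sample = [differences[rng.randrange(n)] for _ in range(n)]
        resampled.append(sum(sample) / n)
    resampled.sort()

    def percentile(q):
        return resampled[min(int(q * n_resamples), n_resamples - 1)]

    return percentile(0.05), percentile(0.5), percentile(0.95)

# Invented per-day scores: forecast 1 is better by exactly 1 every day.
scores_forecast_1 = [1.0, 2.0, 3.0]
scores_forecast_2 = [2.0, 3.0, 4.0]
lower, median, upper = bootstrap_skill_difference(
    scores_forecast_1, scores_forecast_2)
forecast_1_significantly_better = upper < 0
```

Forecast 1 is judged significantly better when the whole bar, from the 5th to the 95th percentile, lies below zero.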


How to Get Help

On this and the following pages you will find a couple of help buttons. Clicking one of these buttons opens this help document and takes you to the respective section. For example, clicking the help button next to the specs panel takes you to the section explaining the meaning of the specs panel, which is part of this manual (Section 3.4.1).


How to Get More Information

Click on the More Information button, and you will be taken to a section of this help document where the purpose of the page you are currently looking at is explained in more detail.

About this document ...

This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -dir ../html -split 0 -html_version 4.0 main.tex

The translation was initiated by jochen on 2005-08-10