• # The formula.

We consider a histogram bin with N entries of weighted events with weights w_i, i = 1, ..., N.
The quantity of interest is the sum of the weights, sum_w,

sum_w = sum {w_i} (i=1,N).

The error on sum_w is then given as

err(sum_w) = sqrt( sum {w_i^2} ).

• # Derivation of the formula.

The derivation of the above formula is based only on error propagation and intrinsic Poissonian statistics.
The variance var(sum_w) of sum_w (i.e. the "error on the weighted number of events" in that bin) is given by error propagation

(err(sum_w))^2 == var(sum_w) = sum {var(w_i)} (i=1,N)

, i.e. adding the squares of the errors on the weighted events.
The variance var(w_i) of weight w_i is determined only by the statistical fluctuation of the number of events considered,

var(w_i) = var(w_i * 1 event) = w_i^2 * var(1 event) = w_i^2,

with Poissonian fluctuation of the number of events ("1 event"), and taking w_i to be a constant for event i.

If this sounds difficult at first glance, just do the exercise and construct the error propagation for 100 events split into two groups: 90 events with w_i == 1.0 and 10 events with w_i == 0.1.
We have
sum_w = 90*1 + 10*0.1 = 91 events,
the statistical fluctuation is coming from sqrt(90) and sqrt(10), giving
var(w_i) = 1^2 * 90 + 0.1^2 * 10 = 90.1 ,
i.e. the error on sum_w is sqrt(90.1) = 9.49 .
The relative error is 9.49/91 = 0.104.
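The exercise above can be checked numerically; a minimal Python sketch of the same 90/10 toy sample:

```python
# Check of the worked example above: 90 events with w = 1.0
# and 10 events with w = 0.1 in one histogram bin.
import math

weights = [1.0] * 90 + [0.1] * 10

sum_w = sum(weights)                           # weighted bin content
err = math.sqrt(sum(w * w for w in weights))   # err(sum_w) = sqrt(sum w_i^2)

print(round(sum_w, 1))        # 91.0
print(round(err, 2))          # 9.49
print(round(err / sum_w, 3))  # 0.104
```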

# Number of Equivalent Events.

The number of equivalent events is defined as
N_equ = ( sum {w_i} )^2 / sum {w_i^2} .

This number relates the sample of N weighted events to N_equ events with w==1 that would have the same relative statistical fluctuation.

For the example above, the number of equivalent events is
N_equ = (sum_w)^2 / sum {w_i^2} = 91^2 / 90.1 = 91.9 events.
This means the statistical fluctuation is about as good (or bad) as for 92 events with event weight == 1.
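A short Python sketch of N_equ for the same toy sample as above:

```python
# Number of equivalent events for the 90/10 toy sample:
# N_equ = (sum w_i)^2 / sum(w_i^2).
import math

weights = [1.0] * 90 + [0.1] * 10

sum_w = sum(weights)
sum_w2 = sum(w * w for w in weights)
n_equ = sum_w ** 2 / sum_w2

print(round(n_equ, 1))  # 91.9
# The relative error of the weighted sample equals that of
# n_equ unweighted (w == 1) events:
print(round(math.sqrt(sum_w2) / sum_w, 3))  # 0.104
print(round(1.0 / math.sqrt(n_equ), 3))     # 0.104
```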

For the MC-files for the atm-nu's in the Nature paper:
For 7000 events we get N_equ=2200,
while the number of events in the data sample is 188.
So, the "equivalent" statistics of MC is only about 12 times the data !
In certain regions of variable space, or for different distribution functions of the weights (which is the relevant quantity here!), this can be much better or worse!
Note also that the other MC sample used (Eva's events) contains 2100 events (weight = 1.) and has the same statistical significance.

• # Booking, Plotting weighted errors in PAW.

Only a single command in HBOOK or PAW (also from the shell!) is needed to get weighted error handling 'correctly' (sorry to all who knew this...) -
Invoke statistics BEFORE filling the histo:

call hbook1 (id,............)   ! book the histogram first
call hidopt (id,'stati')        ! enable statistics BEFORE filling
.....
call hfill (id,......)          ! then fill with weighted events

If you also want error bars on the plot, add "call hbarx(id)" after hidopt.

The number of equivalent events is obtained with "x=hstati(id,...)" or from the PAW shell via
$HINFO(id,'events').

• # An erroneous way of statistics reasoning.

See here for Gary's wrong way to derive the correct formula.
This is a recommended exercise for all who believed he was right, since the mistake in his ansatz is dangerous in similar cases.

You can see immediately that the formula given in the ansatz must be wrong from the weight==1 limit, which comes out absolutely wrong:
error = 0 instead of sqrt(N)!

The problems:
(1) His "ansatz" neglects any statistical fluctuation of the data sample.
(2) He makes a numerically wrong assumption about the second term being small.

Neglecting the second term introduces up to a 100% error in the formula!
For our cos(zenith) distribution it is about 30%.
By chance, these two mistakes compensate to give the correct final formula (Poissonian statistics and quadratic error propagation).

# Original discussion of Errors and binning of the cos(zenit)-Plot for Nature.

Find here the first discussion from August 2nd, 2000.

Find here also the cos(zenith) plots, for bin = 0.025, bin = 0.050, and bin = 0.10, recently updated with the binomial error bars for the data.

For consideration of Poissonian statistics only, here are the cos(zenith) plots, for bin = 0.025, bin = 0.050, and bin = 0.10. (The latest suggestion was to use Poissonian error bars for the Nature paper.)

# Consequences:

- Fig.3 in Nature draft - already changed.

- Fig.2 in Nature draft - the error of the ratios and the fit results will (slightly?) change; the error of the fit results will change for sure.

- Any statistical check (Chi2, run test, Kolmogorov, ...) of the MC-data compatibility, e.g. for the cos(zenith) distribution, will change quantitatively.

# Errors in unweighted histograms:

How does one correctly assign errors to unweighted events?
The true statistics: the distribution of N_k entries in a histogram bin, with N entries in the histogram in total, follows a binomial distribution.

The error of N_k comes out to
err(N_k) = sqrt (N_k * (1 - N_k/N) ).

Neglecting this gives an overestimate of the errors of up to 13% in the cos(zenith) plots.
The effect can also become large, e.g. for an exponential distribution with a large fraction of entries in a small number of bins.

Using binomial statistics means that the number of trials is considered fixed to the number of entries in the given histogram. This is the case e.g. if you compare the data to another data set, i.e. when making an (implicitly normalized) density or shape distribution.
Since we are doing this, the application of binomial statistics seems adequate in our case.
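As an illustration, a short Python sketch comparing the binomial and Poissonian bin errors; the per-bin contents are hypothetical toy numbers (not the Nature-paper histogram), chosen only to sum to N = 188 like the data sample:

```python
# Binomial vs. Poissonian bin errors for an unweighted histogram.
# err_binomial(N_k) = sqrt(N_k * (1 - N_k/N)), err_poisson(N_k) = sqrt(N_k).
import math

N = 188                              # total entries in the histogram
bin_contents = [60, 50, 40, 25, 13]  # hypothetical N_k per bin, summing to N

for n_k in bin_contents:
    err_binomial = math.sqrt(n_k * (1.0 - n_k / N))  # trials fixed to N
    err_poisson = math.sqrt(n_k)                     # plain sqrt(N_k)
    # The binomial error is always <= the Poissonian one.
    print(n_k, round(err_binomial, 2), round(err_poisson, 2))
```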