The following should merely be regarded as an addendum to Chapter 8 of Bevington and Robinson's Data Reduction and Error Analysis. This is therefore not a self-contained manual on "how to do error analysis in the Graduate and Senior Lab and See you later". Referral to Bevington and Robinson for concepts, definitions, theory and the practice of Error-analysis will be necessary. Furthermore what is given here is a mere minimum of how to fit an arbitrary function to a given set data. There are several other methods, which are equally as valid. One can make one's error-analysis really as elaborate as one wants in any way which is scientifically, mathematically and logically sound and accepted. Some of these are also discussed in Bevington and Robinson. Be very careful though of prepackaged error-analysis programs, -routines or magic boxes, if you do not know how they work and what exactly they return as results. It is therefore not advisable to use the Trendline command in the Insert menu of Excel. As said, what follows below is a minimum of requirements and just one particular way of doing it. We will proceed by example.
Consider for instance the Geiger counter data from an irradiated silver piece used as example in chapter 8 (see Table 8.1). The theoretical shape of the data is given by the function
y(t) = a1 + a2 exp(-t/ a4
) + a3 exp(-t/ a5 )
For simplicity's sake we will assume the distribution to be Poisson,
hence the error in each point is just the square root (Only so
when p<<1 ; Chapter 2 Bevington and Robinson; be
careful when doing real data analysis, whether such assumptions
are valid; test!). Also, we will ignore the error in the time,
since we will assume it to be far smaller than the error in the
counts. If this is not the case, see Chapter 6, page 100 in Bevington
and Robinson. This will not however detract from or change the
method to fit functions described below.
A first necessity is obviously to enter the data into Excel. In
most cases you will have to do this manually, though for some
experiments it has been automated; in that case the TA will explain
it to you. It is useful to leave some rows free at the top; you
will see why. HINT: From experience always immediately label your
data columns rows appropriately and with the right units; this
will make life a lot easier.
You should now have three columns of data: one for the x-coordinate
(time), one for the y coordinate (counts) and one for the error
in the y coordinate (Dcts).
In the rows left free at the top just type as text the function
you will be fitting, this is convenient as a reminder. Label a
set of cells with the appropriate parameter names leaving a set
of cells adjacent to the names free. This is where the actual
values of the parameters will appear. Next (I usually skip a column
to keep clearly separated the fir from the actual data) we create
a column for the fitted function as follows. Select the first
cell (below the labeling cell) in the column for the fitted function
and type there an equal sign followed by the expression of the
function, using the proper cell address (A1,B23,AC34, etc.) for
value of the function variable and for value of the to be fitted
parameters, in which case each element of the cell address should
be preceded by a "dollarsign" ($A$1, $B$23, $AC$34,
etc). For mathematical expressions such as sine, cosine, exponential
etc. use the proper Excel command (it can be found under Function...
in the Insert Menu or in the help menu). After entering
the expression, pressing return will return the calculated value.
If the cell address now is not preceded by a "dollarsign"
Excel regards the address-referral as relative to the original
cell (three left, one up), if not as absolute (The cell:
A1). Thus copying this first cell with the function-expression
down will leave the parameter-value unchanged while picking out
exactly the proper x-value. This procedure is standard for any
Excel routine and will be used frequently.
The next column we set up is the column of normalized residuals; this is just the deviation divided by the error for each data point viz. (y -fit(y))/error(y). How to set such a column up is explained in the previous paragraph. The definition of c2 is the sum of the squares of the normalized residuals, (Bevington and Robinson page 67). To find this value, use the SUMSQ command in Excel. Pick an unused cell in the free area left on top and type an equal sign followed by SUMSQ(argument) where argument is a set of cell addresses. Cell addresses seperated by a comma are distinct, while those seperated by a colon are continuous viz. A1,A10 are the cells A1 and A10, while A1:A10 are the cells A1, A2, A3 through A10. Do not forget to label your cell. The normalized c2 is then the c2 divided by the number of degrees of freedom i.e. number of data points minus number of fitting parameters (Bevington and Robinson page 68). The closer the normalized c2 will be to 1 the better the fit. Why? See Bevington and Robinson which will also tell you why a normalized c2 less than 1 is somewhat of a disaster (amongst others, page 107). We are now completely set up to start performing the fit with Excel.
To find the best fitting funtion to the data we want to minimize
c2. We do this with the
resident Solver routine in Excel, which can be found in
the Tools menu. WARNING: OUTSIDE OF THE LAB NOT ALL INSTALLATIONS
OF EXCEL HAVE THIS ROUTINE. A Solver Parameters submenu
will appear asking for the proper parameters. Our Target Cell
will be the c2 cell, which
we want to minimize (min). This we would like to accomplish by
changing the values of the parameters a1....a5
hence we enter their respective addresses (in our case $A$6:$E$6,
note the colon and the dollarsigns. the latter are required in
the Solver submenu). The Solver Options are set
to perform a hybrid Grid-Gradient search for a minimum in the
c2 hypersurface in parameter
space (page 150ff Bevington and Ronbinson). There is no
need to change the default settings, they are mainly concerned
with optimizing the calculation with respect to CPU and memory
use. For those interested look under Excel help for the Solver
Options Dialog Box. It gives a short description of each option.
Before starting the Solver, however, read carefully pages
147 and 148 on Starting values and local minima. The initial
values for the parameters a1....a5
are rather important otherwise the Solver search routine
could get stuck in a local minimum of the c2
surface. We are however looking for the absolute one. Several
tries with the Solver routine while manually adjusting
the parameters if necessary ought to give you the absolute minimum
of the c2 surface.
Besides using the c2 value as a gauge for correctness of the fit it is also very useful to look at the distribution of the normalized residuals. Ideally it should be completely random. If it has any distinguishing patterns, it might indicate that the function your are fitting does not conform to shape of the data. Another kind of function or a combination of two might then be called for as the one to be fitted.
Once we have obtained a best fit to our data we must find some
way to extract the errors of the parameters themselves. We do
this by performing a c2
variation (page 146/7 Bevington and Robinson) for each parameter
that we want to know the uncertainty of. Say one wants to know
the uncertainty in a4. Then one does the following.
One performs several Solver routines, each time for the
values of a1 , a2 , a3
and a5 ($A$6;$B$6;$C$6;$E$6) while keeping
a4 constant at some manually chosen value, differing
for each time one performs the routine. We thus obtain a set of
values of c2 each corresponding
to a value of a4. A plot hereof should be a
parabola with the minimum at the true c2.
The error in a4 is then given by the difference
between the true value of a4 and that where
c2 is exactly 1 more than
the minimum. The exact value can be found by fitting a parabola
to a c2 vs. a4
plot (Figure 8.3 page 146 Bevington and Robinson).
Lastly, as data tables are intrinsically rather incomprehensible,
we would like to represent our data graphically. This can be done
with the Chart function in the Insert menu or the
short-cut Chartwizard button. A Chartwizard submenu
will ask you for the range i.e. the set of cells you want to plot.
Easiest is to give one set of x and y values, in this case Range:
=$A$11:$A$69,$B$11:$B$59. Note the use of the colons and commas;
the equality and dollarsigns are necessary. The rest of the steps
of the submenu speak for themselves. Recommended is the X-Y scattering
graph, though, it is the most flexible. The graphical tricks that
can be performed are innumerable. Clicking on what one wants changed
usually gives the submenu to make it possible. Only two useful
commands will be mentioned here.
One is to insert error-bars into the graph. This is done by selecting
(clicking on) the data series to which one wants to introduce
error-bars. The formula-bar i.e. the bar just above the line of
column labels A,B,C,D,...will tell you whether you have done so
successfully. Then Error-bars in the Insert menu
will activate the Error-bar submenu. Naturally we will choose
Custom Error-bars. The input here is the same format as
in Range i.e. an equality sign at the beginning and dollarsigns
in front of the cell addresses are necessary. Clicking on a error-bar
one can change its shape and format. It is especially advisable
to change the Weight (thickness) to the minimum, if not
the error-bars will be rather fat and clog the picture.
The other useful command is to have the fit and actual data represented
in one graph. This is done by the New Data command in the
Insert menu. One is than simply asked for the range (set
of cells) which give the y-values for the new data series. The
x-values are of course already known. The graph will then show
both the actual data as well as the fitted curve. Double clicking
on any Data series allows you to change its format (Lines, Dots,
sparsing, thickness color, Error-bars(also) etc.).
Ask for a printout of an Excel worksheet for a complete analysis
according to the above prescription of the Data from table 8.1
in Bevington and Robinson. Once again this is only one way to
do one's error analysis, which exhibits the minimal requirements
and once again the use of the Trendline command
in the Insert menu is not advisable unless you
know exactly what it does.
Written by Koenraad Schalm. Last updated 1/15/98 by Laszlo Mihaly.