PHY 445 / PHY 515, Advanced Physics Laboratory

Least Square fits to an arbitrary function using Excel

(Hybrid grid-gradient searches of c2 surfaces for nonlinear fits)

Introduction

The following should merely be regarded as an addendum to Chapter 8 of Bevington and Robinson's Data Reduction and Error Analysis. This is therefore not a self-contained manual on "how to do error analysis in the Graduate and Senior Lab and See you later". Referral to Bevington and Robinson for concepts, definitions, theory and the practice of Error-analysis will be necessary. Furthermore what is given here is a mere minimum of how to fit an arbitrary function to a given set data. There are several other methods, which are equally as valid. One can make one's error-analysis really as elaborate as one wants in any way which is scientifically, mathematically and logically sound and accepted. Some of these are also discussed in Bevington and Robinson. Be very careful though of prepackaged error-analysis programs, -routines or magic boxes, if you do not know how they work and what exactly they return as results. It is therefore not advisable to use the Trendline command in the Insert menu of Excel. As said, what follows below is a minimum of requirements and just one particular way of doing it. We will proceed by example.

Set-up

Consider for instance the Geiger counter data from an irradiated silver piece used as example in chapter 8 (see Table 8.1). The theoretical shape of the data is given by the function

y(t) = a1 + a2 exp(-t/ a4 ) + a3 exp(-t/ a5 )

For simplicity's sake we will assume the distribution to be Poisson, hence the error in each point is just the square root (Only so when p<<1 ; Chapter 2 Bevington and Robinson; be careful when doing real data analysis, whether such assumptions are valid; test!). Also, we will ignore the error in the time, since we will assume it to be far smaller than the error in the counts. If this is not the case, see Chapter 6, page 100 in Bevington and Robinson. This will not however detract from or change the method to fit functions described below.

A first necessity is obviously to enter the data into Excel. In most cases you will have to do this manually, though for some experiments it has been automated; in that case the TA will explain it to you. It is useful to leave some rows free at the top; you will see why. HINT: From experience always immediately label your data columns rows appropriately and with the right units; this will make life a lot easier.

You should now have three columns of data: one for the x-coordinate (time), one for the y coordinate (counts) and one for the error in the y coordinate (Dcts). In the rows left free at the top just type as text the function you will be fitting, this is convenient as a reminder. Label a set of cells with the appropriate parameter names leaving a set of cells adjacent to the names free. This is where the actual values of the parameters will appear. Next (I usually skip a column to keep clearly separated the fir from the actual data) we create a column for the fitted function as follows. Select the first cell (below the labeling cell) in the column for the fitted function and type there an equal sign followed by the expression of the function, using the proper cell address (A1,B23,AC34, etc.) for value of the function variable and for value of the to be fitted parameters, in which case each element of the cell address should be preceded by a "dollarsign" ($A$1, $B$23, $AC$34, etc). For mathematical expressions such as sine, cosine, exponential etc. use the proper Excel command (it can be found under Function... in the Insert Menu or in the help menu). After entering the expression, pressing return will return the calculated value. If the cell address now is not preceded by a "dollarsign" Excel regards the address-referral as relative to the original cell (three left, one up), if not as absolute (The cell: A1). Thus copying this first cell with the function-expression down will leave the parameter-value unchanged while picking out exactly the proper x-value. This procedure is standard for any Excel routine and will be used frequently.

The next column we set up is the column of normalized residuals; this is just the deviation divided by the error for each data point viz. (y -fit(y))/error(y). How to set such a column up is explained in the previous paragraph. The definition of c2 is the sum of the squares of the normalized residuals, (Bevington and Robinson page 67). To find this value, use the SUMSQ command in Excel. Pick an unused cell in the free area left on top and type an equal sign followed by SUMSQ(argument) where argument is a set of cell addresses. Cell addresses seperated by a comma are distinct, while those seperated by a colon are continuous viz. A1,A10 are the cells A1 and A10, while A1:A10 are the cells A1, A2, A3 through A10. Do not forget to label your cell. The normalized c2 is then the c2 divided by the number of degrees of freedom i.e. number of data points minus number of fitting parameters (Bevington and Robinson page 68). The closer the normalized c2 will be to 1 the better the fit. Why? See Bevington and Robinson which will also tell you why a normalized c2 less than 1 is somewhat of a disaster (amongst others, page 107). We are now completely set up to start performing the fit with Excel.

Fitting

To find the best fitting funtion to the data we want to minimize c2. We do this with the resident Solver routine in Excel, which can be found in the Tools menu. WARNING: OUTSIDE OF THE LAB NOT ALL INSTALLATIONS OF EXCEL HAVE THIS ROUTINE. A Solver Parameters submenu will appear asking for the proper parameters. Our Target Cell will be the c2 cell, which we want to minimize (min). This we would like to accomplish by changing the values of the parameters a1....a5 hence we enter their respective addresses (in our case $A$6:$E$6, note the colon and the dollarsigns. the latter are required in the Solver submenu). The Solver Options are set to perform a hybrid Grid-Gradient search for a minimum in the c2 hypersurface in parameter space (page 150ff Bevington and Ronbinson). There is no need to change the default settings, they are mainly concerned with optimizing the calculation with respect to CPU and memory use. For those interested look under Excel help for the Solver Options Dialog Box. It gives a short description of each option.

Before starting the Solver, however, read carefully pages 147 and 148 on Starting values and local minima. The initial values for the parameters a1....a5 are rather important otherwise the Solver search routine could get stuck in a local minimum of the c2 surface. We are however looking for the absolute one. Several tries with the Solver routine while manually adjusting the parameters if necessary ought to give you the absolute minimum of the c2 surface.

Besides using the c2 value as a gauge for correctness of the fit it is also very useful to look at the distribution of the normalized residuals. Ideally it should be completely random. If it has any distinguishing patterns, it might indicate that the function your are fitting does not conform to shape of the data. Another kind of function or a combination of two might then be called for as the one to be fitted.

Errors in the parameters

Once we have obtained a best fit to our data we must find some way to extract the errors of the parameters themselves. We do this by performing a c2 variation (page 146/7 Bevington and Robinson) for each parameter that we want to know the uncertainty of. Say one wants to know the uncertainty in a4. Then one does the following. One performs several Solver routines, each time for the values of a1 , a2 , a3 and a5 ($A$6;$B$6;$C$6;$E$6) while keeping a4 constant at some manually chosen value, differing for each time one performs the routine. We thus obtain a set of values of c2 each corresponding to a value of a4. A plot hereof should be a parabola with the minimum at the true c2. The error in a4 is then given by the difference between the true value of a4 and that where c2 is exactly 1 more than the minimum. The exact value can be found by fitting a parabola to a c2 vs. a4 plot (Figure 8.3 page 146 Bevington and Robinson).

Graphical presentation

Lastly, as data tables are intrinsically rather incomprehensible, we would like to represent our data graphically. This can be done with the Chart function in the Insert menu or the short-cut Chartwizard button. A Chartwizard submenu will ask you for the range i.e. the set of cells you want to plot. Easiest is to give one set of x and y values, in this case Range: =$A$11:$A$69,$B$11:$B$59. Note the use of the colons and commas; the equality and dollarsigns are necessary. The rest of the steps of the submenu speak for themselves. Recommended is the X-Y scattering graph, though, it is the most flexible. The graphical tricks that can be performed are innumerable. Clicking on what one wants changed usually gives the submenu to make it possible. Only two useful commands will be mentioned here.

One is to insert error-bars into the graph. This is done by selecting (clicking on) the data series to which one wants to introduce error-bars. The formula-bar i.e. the bar just above the line of column labels A,B,C,D,...will tell you whether you have done so successfully. Then Error-bars in the Insert menu will activate the Error-bar submenu. Naturally we will choose Custom Error-bars. The input here is the same format as in Range i.e. an equality sign at the beginning and dollarsigns in front of the cell addresses are necessary. Clicking on a error-bar one can change its shape and format. It is especially advisable to change the Weight (thickness) to the minimum, if not the error-bars will be rather fat and clog the picture.

The other useful command is to have the fit and actual data represented in one graph. This is done by the New Data command in the Insert menu. One is than simply asked for the range (set of cells) which give the y-values for the new data series. The x-values are of course already known. The graph will then show both the actual data as well as the fitted curve. Double clicking on any Data series allows you to change its format (Lines, Dots, sparsing, thickness color, Error-bars(also) etc.).

Closing remarks

Ask for a printout of an Excel worksheet for a complete analysis according to the above prescription of the Data from table 8.1 in Bevington and Robinson. Once again this is only one way to do one's error analysis, which exhibits the minimal requirements and once again the use of the Trendline command in the Insert menu is not advisable unless you know exactly what it does.


Written by Koenraad Schalm. Last updated 1/15/98 by Laszlo Mihaly.