PHY 445/6 / PHY 515/6

Error analysis

Introduction

In any measurement the numerical value that you obtain from your instrument is always somewhat different from the true value of the physical quantity. Your goal is to come to terms with that fact of life and characterize the numerical value accordingly: estimate the error of your measurement. A few definitions: Experimentalists often use the words "error" and "uncertainty" interchangeably. Get used to it.

Statistical errors can, in general, be estimated in only one way: by repeating the measurement several times. This means, in particular, that you cannot estimate the statistical error by simply looking at the precision of your instrument, or by counting significant digits, etc. If you claim, for example, that a ruler's tick marks let you measure to within 1.0 mm, that is not a valid estimate of the statistical error. To determine the statistical error, you must make the same measurement with the same ruler many times and evaluate the scatter of the results. A detailed illustration of this concept follows later.
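As a rough preview, here is a minimal Python sketch of what "repeating the measurement" buys you. The readings are invented for illustration (they are not from the course data files), and the sketch assumes the readings scatter randomly around a common value, so that the sample standard deviation is a reasonable estimate of the statistical error of a single reading.

```python
import math
import statistics

# Hypothetical repeated readings of the same length, in mm (invented numbers).
readings_mm = [100.2, 99.8, 100.5, 99.9, 100.1, 100.3, 99.7, 100.0]

n = len(readings_mm)
mean = statistics.mean(readings_mm)

# Sample standard deviation (n - 1 in the denominator):
# the statistical error of ONE reading.
sigma = statistics.stdev(readings_mm)

# Standard error of the mean: the statistical error of the AVERAGE of n readings.
sigma_mean = sigma / math.sqrt(n)

print(f"mean of {n} readings      = {mean:.3f} mm")
print(f"error of a single reading = {sigma:.3f} mm")
print(f"error of the mean         = {sigma_mean:.3f} mm")
```

Note that the scatter of these made-up readings has nothing to do with the 1 mm spacing of the tick marks; it comes out of the repeated readings themselves.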

There are certain physical situations where you may be able to characterize the statistical errors mathematically, without repeating the measurements. For example, having counted N nuclear decay processes, you will often assume that the counts follow a probability distribution characterized by fluctuations of the order of N^(1/2). However, if you follow this route, you must justify your assumptions.
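The counting case can be sketched in a few lines of Python. The function name counting_error is made up for this illustration, and the sketch assumes the counts follow Poisson statistics, which is exactly the kind of assumption you would have to justify.

```python
import math

def counting_error(n_counts):
    """Return (absolute, relative) statistical error of a count.

    Assumes Poisson statistics, so the fluctuation is sqrt(N).
    """
    if n_counts <= 0:
        raise ValueError("need a positive number of counts")
    sigma = math.sqrt(n_counts)
    return sigma, sigma / n_counts

for n in (100, 10_000):
    sigma, relative = counting_error(n)
    print(f"N = {n}: error = {sigma:.0f} counts, relative error = {relative:.1%}")
```

Under this assumption the relative error falls as 1/N^(1/2): counting 100 times longer improves the relative precision by a factor of 10.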

Often you can use measured statistical errors on one of your data points to infer the statistical errors on the other points. (In other words you do not necessarily need to repeat ALL measurements many times, but for each experiment be sure to repeat at least one measurement several times to get a measure of the statistical error.)

Systematic errors are not so easy to measure. Try to estimate these on a physical basis, or include them in your model. You should strive to reduce systematic errors to as close to zero as possible, or at least they should be small compared to your statistical errors. If the unaccounted-for systematic errors in an experiment are larger than the statistical errors, then you will not be able to verify any physical model. Such an experiment is, by definition, a poor experiment.

You must account for the statistical error on your measured points by representing these uncertainties as error bars on your plots. In nearly every lab, you are varying some quantity, X, and then measuring some other quantity, Y. Measure or estimate the statistical error on Y. Then plot Y vs. X with little bars on Y that represent the statistical errors. A plot without error bars is just plain wrong.

Usually we can assume that the quantity X is measured with great accuracy, and we can forget about the error there. But in more delicate situations there may be error bars in the X direction as well.

Once you have plotted the points, do a fit to some model function that describes the physics you expect to verify. If your model is simple (it has one or two fitting parameters) you may turn the problem into fitting a straight line to the data. This can be done by eye (graph paper and ruler) if you are careful.
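If you prefer a calculation to graph paper, one common option is a weighted least-squares straight-line fit. The sketch below is only one way to do it and is not the method prescribed by this course; the function name weighted_line_fit and the data points are invented for illustration. It assumes the errors are on Y only, that they are independent, and that each point is weighted by 1/sigma^2.

```python
import math

def weighted_line_fit(x, y, sigma):
    """Fit y = a + b*x by weighted least squares.

    Returns (a, b, sigma_a, sigma_b): intercept, slope, and their
    statistical errors, with each point weighted by 1/sigma_i**2.
    """
    w = [1.0 / s**2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    if delta == 0:
        raise ValueError("need at least two distinct x values")
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    return a, b, math.sqrt(Sxx / delta), math.sqrt(S / delta)

# Invented example: Y measured at five settings of X, same error on each point.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
sigma = [0.2] * len(x)

a, b, sigma_a, sigma_b = weighted_line_fit(x, y, sigma)
print(f"intercept = {a:.2f} +/- {sigma_a:.2f}")
print(f"slope     = {b:.2f} +/- {sigma_b:.2f}")
```

The errors on the slope and intercept returned here are meaningful only if the error bars you supply are realistic, which is the question the next paragraphs address.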

Once you have done a best fit, however, you need to address the following question: is the best fit a good fit? In other words, does the model fit the data within the uncertainties prescribed by the error bars? Your goal is not to measure a number. Your goal is not even to measure the "right" number. Really, your goal is to determine if the physical model is supported by the data. To do this you must numerically answer the question: "Does the data fit the model to within the statistical uncertainties of the measurements?"

The first step is to plot the data (with error bars) and the fitted curve on the same graph. In order to answer "is this a good fit?" numerically, you should calculate the χ² (chi-squared) of the data relative to the model. (See later how to do it.) No matter how you have done your fit, always calculate the χ². By dividing it by the number of degrees of freedom (usually N-2 for a linear fit to N points) you end up with a number called the reduced χ². Understand the following:

The reduced χ² is the experimentalist's universal criterion for comparing data with models. If you learn to use the χ², you will have gained a powerful tool for evaluating the meaning of nearly any new experimental result in any field.
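For orientation, here is a minimal Python sketch of the calculation, using the usual definition: χ² is the sum over all points of ((Y_i - model(X_i)) / sigma_i)^2, and the reduced χ² is that sum divided by the number of degrees of freedom. The function names, the data points, and the line Y = 2X + 1 (treated here as if it were the result of a two-parameter fit) are all invented for illustration.

```python
def chi_squared(x, y, sigma, model):
    """Sum of squared deviations from the model, each in units of its error bar."""
    return sum(((yi - model(xi)) / si) ** 2 for xi, yi, si in zip(x, y, sigma))

def reduced_chi_squared(x, y, sigma, model, n_params):
    """Chi-squared divided by the degrees of freedom, N minus fitted parameters."""
    dof = len(x) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi_squared(x, y, sigma, model) / dof

# Invented data with error bars on Y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1]
sigma = [0.2] * len(x)

def line(xi):
    # Hypothetical straight line, standing in for a fit with two parameters.
    return 2.0 * xi + 1.0

chi2 = chi_squared(x, y, sigma, line)
chi2_red = reduced_chi_squared(x, y, sigma, line, n_params=2)
print(f"chi-squared = {chi2:.2f}, reduced chi-squared = {chi2_red:.2f}")
```

As a rule of thumb, a reduced χ² near 1 means the points scatter around the model about as much as the error bars say they should. A value much larger than 1 suggests that the model does not describe the data or that the errors are underestimated; a value much smaller than 1 suggests that the errors are overestimated.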

A little preparation

The purpose of the next few pages is to introduce you to the basic ideas of error analysis. In the process you will learn about using Microsoft Excel. (Just because I am using Excel for illustrating the basic concepts, you are NOT required to use it. Feel free to pick your own software, but always test it before use!)

Please start Excel on your computer in another window. Click on "Help", "About". If you see the copyright notice with "97 SR-1", then we are using the same version. Most likely you will have a more recent version, and you may expect some differences between the screenshots I present and your real Excel screens, but the differences may not be too important.

Click on "Tools" and check whether "Solver..." and "Data Analysis..." are there. If not, try to install them via "Tools", "Add-Ins...". This is crucial: without the Solver package you can read the text in the next pages, but you cannot really learn to set up your own data analysis routines.

Use standard Windows commands to create a directory for your work. Download this data file and this one (by using the web browser) into your directory. The data files contain sets of (simulated) voltage measurements in units of volts.

You are ready to proceed to the first page of discussions.


This page was created by Laszlo Mihaly. Last updated 1/8/98.