Statistical Adjustment of Data
W. Edwards Deming
Publisher: Dover Publications (1964)
Details: 261 pages, Paperback
Topics: Data Analysis, Statistics
MAA Review [Reviewed by Robert W. Hayden, on 04/21/2012]
Yes, this is the same Deming who later became famous as a quality control guru in the United States and Japan. Earlier in his career, he made important contributions to a number of other areas of statistics, and this is one of those. This Dover edition is a reprint of the 1943 edition, and a statistician reading it today is in a position a bit like a mathematician reading Gauss (whom Deming frequently cites and sometimes quotes, in Latin). Even the title needs explaining to a modern audience.
“Adjusting” data sounds a bit suspect. Here are examples of what the term meant in 1943. First, imagine you have measured the angles of a triangle as accurately as you can. The results add up to 179.4 degrees. Clearly, that cannot be. If you want this triangle to obey the laws of Euclidean geometry, you need to adjust the measurements so that they sum to 180 degrees. For example, you could add 0.2 degrees to each measurement. Such adjustments are applied all the time in surveying and astronomy.
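The 0.2-degree correction is exactly what an equal-weight least-squares adjustment gives: minimizing the sum of squared corrections, subject to the angles summing to 180 degrees, spreads the 0.6-degree misclosure evenly. Below is a minimal sketch; the angle readings are hypothetical, and the optional `variances` argument (our assumption, not taken from Deming's text) shows how less precise measurements would absorb more of the correction.

```python
def adjust_angles(angles, total=180.0, variances=None):
    """Least-squares adjustment of measured angles to a known total.

    Minimizes sum(v_i**2 / variances[i]) subject to sum(angles) + sum(v) == total.
    The solution is v_i = misclosure * variances[i] / sum(variances), so with
    equal variances every angle receives the same correction.
    """
    if variances is None:
        variances = [1.0] * len(angles)  # equal precision assumed
    misclosure = total - sum(angles)
    var_sum = sum(variances)
    return [a + misclosure * var / var_sum for a, var in zip(angles, variances)]


readings = [59.8, 60.1, 59.5]  # hypothetical readings summing to 179.4
print(adjust_angles(readings))  # about [60.0, 60.3, 59.7]
```

Passing, say, `variances=[1, 1, 2]` would give the third angle twice the correction of each of the others, on the assumption that it was measured less precisely.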
To take a more statistical example based on real life, imagine an on-campus fraternity that is noted for the comments it yells at female students walking by. A survey is sent to students asking whether they think the fraternity should be shut down. Although the student body as a whole is 55% female, 75% of the responses come from female students. You might wish to adjust the resulting contingency table so that the gender marginals agree with the known distribution of the student body. Such adjustments are important to the U.S. Census Bureau, for which Deming was an advisor at the time this work was published.
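One simple way to make the gender marginals agree is to rescale each gender's row of the table by the ratio of its known share to its observed share. The sketch below does this for a hypothetical 2×2 table of 200 responses. It only matches one set of margins; when both margins are known, iterative proportional fitting alternates such rescalings. The function and the counts are illustrative assumptions, not Deming's worked example.

```python
def adjust_rows_to_shares(table, shares):
    """Rescale each row so its total matches a known population share.

    table:  {row label: {column label: count}}
    shares: {row label: known proportion}, proportions summing to 1
    """
    grand_total = sum(sum(row.values()) for row in table.values())
    adjusted = {}
    for label, row in table.items():
        factor = shares[label] * grand_total / sum(row.values())
        adjusted[label] = {col: n * factor for col, n in row.items()}
    return adjusted


def share_for_shutdown(table):
    """Fraction of (possibly adjusted) responses favoring shutting the fraternity down."""
    total = sum(sum(row.values()) for row in table.values())
    return sum(row["shut"] for row in table.values()) / total


# Hypothetical survey: 150 of 200 responses (75%) come from female students.
responses = {"female": {"shut": 120, "keep": 30},
             "male": {"shut": 15, "keep": 35}}
adjusted = adjust_rows_to_shares(responses, {"female": 0.55, "male": 0.45})

print(share_for_shutdown(responses))  # 0.675
print(share_for_shutdown(adjusted))   # about 0.575
```

In this made-up example, reweighting toward the true gender mix lowers the apparent support for closure from 67.5% to about 57.5%, because the over-represented group held the stronger view.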
As a final example, imagine fitting a curve to experimental data. Although the data do not perfectly match an inverse square law, theory in the area where the data were gathered may indicate that this is the true relationship. So we use least squares to fit such a model and regard the predicted y for each observed x as an adjusted version of the observed value, subject to the constraint that the adjusted values must lie on such a curve. Of course, least squares regression is a central part of statistics today.
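For the inverse square law y = k/x², the least-squares fit has a closed form. Substituting u = 1/x² turns the model into a line through the origin, so k = Σuy / Σu². The fitted values k/x² are then the "adjusted" observations, and by construction they lie exactly on the curve. A minimal sketch with hypothetical measurements:

```python
def fit_inverse_square(xs, ys):
    """Least-squares fit of the model y = k / x**2.

    With u = 1/x**2, minimizing sum((y - k*u)**2) gives k = sum(u*y) / sum(u*u).
    The fitted values are the 'adjusted' observations: they satisfy the
    constraint of lying exactly on an inverse-square curve.
    """
    us = [1.0 / x**2 for x in xs]
    k = sum(u * y for u, y in zip(us, ys)) / sum(u * u for u in us)
    adjusted = [k * u for u in us]
    return k, adjusted


xs = [1.0, 2.0, 3.0, 4.0]     # hypothetical measurements
ys = [10.2, 2.4, 1.15, 0.6]
k, adjusted = fit_inverse_square(xs, ys)
print(k, adjusted)
```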
The general principle here is that the data are adjusted to fit a constraint believed to be a known fact. This is in contrast to the modern usage as in “mortality rates were adjusted for age and gender.” This modern sort of adjustment uses data on one variable to help interpret data on another rather than using known quantities as constraints.
A modern reader will be impressed by how much the cheap availability of electronic computers has changed what statisticians need to know. Deming devotes lots of space to computations and approximations. He includes advice on how to lay out the work in a (large) table. As the discussion progresses, these tables take on a life of their own, and statistical situations are described in terms of what these tables look like. This seems less odd if we recall that matrices came into being in a similar fashion, but unfortunately Deming’s tables are mainly of historical interest today. And that could well be said of the entire book, which is not to say that it is of no interest. Certainly Deming offers much practical advice along the way that goes unheeded much too often today. At places the exposition seemed hard to follow, perhaps because Deming assumed the reader was aware of the issues of the day, perhaps because of his writing style, or perhaps because of the density of the reviewer.
Deming uses the method of least squares exclusively, and he gives many reasons for preferring it that we rarely hear today. These may be of interest to teachers of statistics. Deming does not mention the flaw that became a concern in later years: results can be thrown far off by one or two wild values. On the other hand, Deming wisely counsels that data should never be thrown away just because we don’t like them, e.g., because a value is more than some magic number of standard deviations from the mean. If the system under study actually produces such values, it is only honest to include them. For regression, Deming treats the cases where we wish to minimize the errors in y, or in x, or in both. Most introductory statistics textbooks treat only the first case and do not even mention the others.
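To see why the three cases differ, the sketch below computes all three straight-line slopes from the same data. It uses the standard closed-form formulas: regression of y on x (errors in y only), regression of x on y solved for y (errors in x only), and the errors-in-both-variables fit now commonly called Deming regression. The parameter `delta`, the assumed ratio of the y-error variance to the x-error variance, is something the analyst must supply; `delta=1` gives orthogonal regression. The data are hypothetical.

```python
import math


def three_slopes(xs, ys, delta=1.0):
    """Slopes of a straight-line fit under three error assumptions.

    delta is the assumed ratio (variance of errors in y) / (variance of errors in x).
    Assumes the sample covariance sxy is nonzero.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    errors_in_y = sxy / sxx  # ordinary regression of y on x
    errors_in_x = syy / sxy  # regress x on y, then solve the line for y
    a = syy - delta * sxx
    errors_in_both = (a + math.sqrt(a * a + 4 * delta * sxy * sxy)) / (2 * sxy)
    return errors_in_y, errors_in_x, errors_in_both


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
print(three_slopes(xs, ys))
```

For positively correlated data, the errors-in-both slope always falls between the other two. All three coincide only when the points lie exactly on a line.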
Recommended for those interested in the relatively short history of statistics. Probably not worth reading from cover to cover for most others, though many will find it referenced in the literature as the original source of many ideas, and the place to go for details.
After a few years in industry, Robert W. Hayden taught mathematics at colleges and universities for 32 years and statistics for 20 years. In 2005 he retired from full-time classroom work. He now teaches statistics online at statistics.com and does summer workshops for high school teachers of Advanced Placement Statistics. He contributed the chapter on evaluating introductory statistics textbooks to the MAA's Teaching Statistics.