## Remarks on Our Statistical Methodology

The results in *The Alumni Factor* are almost all based on surveys taken by graduates of some of the finest colleges and universities across the country. Ideally, every alumnus from every single school in the US would be included in our survey, but this is simply not possible - at least not yet. Fortunately, we can instead appeal to standard statistical methods to take a *sample* of alumni from the nation's best schools, and then make valid statistical conclusions based on that sample. Over 52,000 alumni (a significant sample size) answered our questionnaires, and they responded to a wide variety of interesting questions on topics ranging from the strength of their friendships to current earning power to certain political leanings. A total sample of 52,000 is a very good start, and as we accumulate more and more data, we'll be able to draw conclusions involving more schools and more topical areas of interest.

**Confidence Bounds**

What do 52,000 observations buy in terms of the ability to assess alumni opinions of their schools? A great deal, it turns out. We designed our survey to obtain about 200 respondents from each school, spread out over various majors and graduation years. In any survey in which you wish to make statistically valid conclusions, the general rule is, the more observations, the better. The sample sizes we are dealing with in *The Alumni Factor* allow us to answer all of the interesting questions we pose with a reasonably high level of statistical confidence.

Let's look at a typical example: Suppose that 57.0% of the 200 respondents from College ABC strongly agreed with the statement "My College Developed Me Intellectually." Besides noting that a bit more than half of this particular sample put forth strong agreement to the statement, does the 57.0% figure tell us anything else? Yes. The beautiful thing about using statistics is that such a sample size allows us to generalize these findings to the *entire population* of students from College ABC (subject to some caveats we discuss below). In fact, we can state with 95% confidence that the true proportion *p* of the College ABC alumni population that is in strong agreement is in the range:

0.57 - 1.96 √[0.57 (1 - 0.57) / 200] < *p* < 0.57 + 1.96 √[0.57 (1 - 0.57) / 200]

That is, we are 95% confident that *p* lies in the range 57.0% +/- 6.9% (50.1% to 63.9%). Thus, 200 observations give us a reasonable idea of where people at any school stand with respect to the statement in play.
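The interval above uses the standard normal approximation for a proportion. Here is a minimal Python sketch of the calculation, using the 57.0% and 200-respondent figures from the College ABC example (the function name is ours, for illustration):

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """95% confidence interval for a population proportion,
    via the normal approximation: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lo, hi = proportion_ci(0.57, 200)
print(f"95% CI: {lo:.3f} to {hi:.3f}")  # 0.501 to 0.639
```

The half-width of the interval, 1.96 √[p̂(1 − p̂)/n], is exactly the ±6.9% quoted above.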

The +/- 6.9% interval range in the above example is often called the *sampling error*, and represents the uncertainty in the results due to inherent randomness in the particular sample we are using. As more respondents contribute to the survey, the sampling error will tend to decrease, and we'll be able to make more-precise statements about the parameter of interest - e.g., the proportion *p* of alumni who agree with a certain statement.
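To see how quickly the sampling error shrinks, here is a quick sketch (again using the hypothetical 57% agreement figure) of the ±margin at 95% confidence for a few sample sizes. Because the margin scales as 1/√n, quadrupling the sample size halves the error:

```python
import math

# Margin of error (+/-) at 95% confidence for p_hat = 0.57,
# showing how sampling error shrinks as the sample size n grows.
for n in (200, 800, 3200):
    margin = 1.96 * math.sqrt(0.57 * (1 - 0.57) / n)
    print(f"n = {n:5d}: +/- {margin:.1%}")  # 6.9%, 3.4%, 1.7%
```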

**Data and Sampling Issues**

There are other ways for uncertainty to enter the system. For instance, if we ask the wrong question or address it to the wrong group of people, then we introduce what is known as systematic bias into the analysis. We can then end up being very confident about the answer to an incorrect question. Examples of systematic bias include:

- Limiting a poll on political beliefs to theater majors - which might bias answers to the liberal side
- Wording a question in a biased way, such as "In light of the known ameliorative effects of medical marijuana for patients experiencing debilitating nausea and pain, are you in favor of its carefully supervised and limited use?"

*The Alumni Factor* tried to avoid systematic bias by asking our questions precisely and fairly, without introducing wording that would tend to lead answers in a certain direction. Our interest lies solely in assessing what alumni think of their schools. Moreover, we asked the *same set of questions* to all respondents. So if there is some unrecognized systematic error, it seems reasonable to expect that the error would affect all schools in the same direction. For instance, if respondent self-selection from a particular school tended to bias results toward a more-liberal response, then we might expect the same phenomenon to appear for most other schools; and then we could at least do a fair "apples-to-apples" comparison of any *differences* between schools.

We were also on the lookout for so-called *outlier effects*. Outliers are individual observations that differ from "average" observations by a substantial margin (say, by more than three "standard deviations"). Outliers often result in skewed interpretations of data, and so they must at least be pointed out, if not dropped altogether from any analysis. To this end, suppose that three recent graduates from the Economics Department of XYZ University now make salaries of $55,000, $45,000 and $2,600,000, respectively (it turns out the third guy currently plays in the NBA). That's an average of $900,000, which is badly skewed by the single outlier.
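The salary example can be checked in a couple of lines. The median is a common robust alternative to the mean when an outlier like this is present (the figures are the hypothetical ones above):

```python
from statistics import mean, median

salaries = [55_000, 45_000, 2_600_000]  # third grad plays in the NBA
avg = mean(salaries)
med = median(salaries)
print(f"mean:   ${avg:,.0f}")   # $900,000 -- dragged up by the outlier
print(f"median: ${med:,.0f}")   # $55,000 -- barely affected
```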

Finally, what if some data are just plain bad? Some respondents may fill out the survey incompletely, not take the task seriously, or make obvious errors. With these potential problems in mind, we vigilantly scour our database looking for incomplete, inconsistent, or goofy data; and we deal with the offending data appropriately.

**Regressions**

As you go through *The Alumni Factor*, you will find numerous examples of what is known as *regression analysis*; for example, see Chapter 3 (Alumni Rate Their College Experience) where we plot weighted averages of SAT scores vs. Intellectual Development. We've inserted a regression line on the scatterplot, which gives a rough indication of the relationship between the two variables (though it does not reveal cause and effect, i.e., whether SAT scores tend to help or hinder Intellectual Development). In addition, the scatterplot shows that the relationship is not exact, since many of the points fall quite a bit off the line.

The regression's *coefficient of determination*, *R*², is a measure used to quantify the strength of the relationship between the variables. If the variables are highly correlated with each other, then *R*² will be close to one; if the variables are essentially independent, then *R*² will be close to zero. So if we use the temperature on Mars to predict IBM's stock price, the resulting *R*² will be close to zero. But if we use a person's SAT scores to predict his ACT scores, *R*² will likely be closer to one. For the analysis in Chapter 3, we find that *R*² = 0.3148, indicating a modest correlation between SAT scores and Intellectual Development.
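For readers curious about the arithmetic, here is a small sketch that computes *R*² from scratch for a made-up set of paired SAT/ACT scores (the data are invented purely for illustration; for a simple regression with one predictor, *R*² equals the squared correlation coefficient):

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x
    (with one predictor, this equals the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical paired scores: strongly related variables give R^2 near 1.
sat = [1100, 1200, 1300, 1400, 1500]
act = [24, 26, 29, 31, 33]
r2 = r_squared(sat, act)
print(round(r2, 4))  # about 0.994
```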

**Adding Things Up**

In order to come up with the Overall Ranks for each of the 227 prestigious schools in our study, we add up the scores each school receives on each of 15 metrics - Intellectual Development, Social Development, etc. (all of the components are listed in the Methodology section). How is this done? The easiest thing would simply be to add up the 15 rankings that each school receives from the corresponding metrics. For example, if School ABC were #23 in Intellectual Development, #112 in Social Development, etc., we could add up these 15 numbers to obtain a grand total; and then rank the grand totals of the 227 schools. The school with the lowest grand total would have Overall Rank #1; the school with the second-lowest total would have Overall Rank #2, etc. This method is all well and good, but it doesn't take into account the magnitude of score differences among schools within one of the 15 metrics. We instead use what are commonly referred to as *standardized Z-scores* for each school for each metric. To obtain our Overall Ranks, we add up the 15 Z-scores for each school, and then rank the 227 schools according to these sums.
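A standardized Z-score is simply (raw score − mean) / standard deviation, computed across all schools for a given metric. Here is a toy sketch of the summing-and-ranking step with three hypothetical schools and two metrics (the real calculation, of course, uses all 227 schools and 15 metrics):

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a metric across schools: (x - mean) / stdev."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Toy data: three hypothetical schools scored on two metrics.
schools = ["ABC", "DEF", "GHI"]
metric1 = [4.2, 3.8, 4.0]   # e.g., Intellectual Development
metric2 = [3.6, 3.9, 3.0]   # e.g., Social Development

z1, z2 = z_scores(metric1), z_scores(metric2)
totals = {s: a + b for s, a, b in zip(schools, z1, z2)}

# Highest Z-score total gets Overall Rank #1.
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)  # ['ABC', 'DEF', 'GHI']
```

Standardizing before summing means a school that is far ahead of the pack on one metric gets credit for the *size* of that lead, not just its rank position.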

All told, the conclusions in this book are drawn from statistical analyses that are as rigorous as the data allows - and with more than 52,000 graduates, the data is very reliable. There are still some cases where we do not have enough data to draw sound conclusions, and we have done our best to indicate those instances. We are continuing to collect data and will add schools to our rankings when the data we gather is sufficient to draw sound conclusions.

Professor David M. Goldsman, PhD
*Statistician*
*Stewart School of Industrial and Systems Engineering*
*Georgia Institute of Technology*