National EMSC Data Analysis Resource Center

Statistical Help



A wise man once said, “Never begin data collection without calculating the necessary sample size!” OK, so perhaps a wise man didn’t say that, but a wise user of statistics will always have a plan to succeed before data collection begins.

A big part of planning to succeed is figuring out how many observations you will need in order to meet the objectives of your project. Taking observations costs time and money, so we want to make sure we get just the right amount to make inferences about our outcomes of interest.

Sample size calculation is a more complex topic than can be covered in-depth here, but there are several key items you should start thinking about before you consult with a statistician or other researcher familiar with sample size calculations.

First, if you have worked with data that is similar to the data you are going to be gathering, or you have researched similar work done by others, write down what you think the biggest and smallest possible outcomes could be.

For example, if you are working with height data, you know it is not possible for someone to be 4 inches tall. You also know it is not possible for people to be 20 feet tall. If you are familiar with human heights, you will have a pretty good idea what the tallest and shortest values possible could be. Writing down these extremes gives you a rough estimate of how variable your data will be, which is a key input to any sample size calculation.
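One common rule of thumb turns a plausible range into a rough standard deviation: if the data are roughly bell-shaped, about 95% of values fall within 2 standard deviations of the mean, so the range spans about 4 standard deviations. The sketch below assumes that rule; the function name `sd_from_range` and the 54 to 78 inch height range are hypothetical illustrations, not values from this article. Some analysts divide by 6 instead when the range is meant to cover nearly all possible values.

```python
def sd_from_range(smallest: float, largest: float, divisor: float = 4.0) -> float:
    """Rough standard deviation estimate from a plausible range.

    Assumes roughly bell-shaped data, where about 95% of values lie
    within 2 SD of the mean, so the range spans about 4 SD.
    """
    if largest <= smallest:
        raise ValueError("largest must be greater than smallest")
    return (largest - smallest) / divisor


# Hypothetical plausible range for adult heights, in inches
print(sd_from_range(54, 78))  # 6.0
```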

Second, did you know that when you take a sample there is a chance of concluding there is a difference between your subgroups of interest, when in fact, your population does not have a true difference between the subgroups?

Since you are observing a sample, and not the entire population, sometimes you will get the wrong answer simply due to chance. However, you can choose the probability of this occurrence. Do you want there to be a 10% chance of making this kind of error, or a 1% chance? This probability is called the significance level (often written α), and this kind of mistake is called a Type I error. Keep in mind, however, that the smaller you make the chance of making this kind of error, the larger your sample size will have to be.
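To see why a smaller significance level costs more observations, it helps to look at the critical value it implies. Assuming a two-sided test and a normal approximation, the significance level α corresponds to the z value that cuts off α/2 in each tail; common sample size formulas grow with the square of this value. The helper `z_for_significance` below is a hypothetical name used only for illustration.

```python
from statistics import NormalDist


def z_for_significance(alpha: float) -> float:
    """Two-sided critical z value for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    # Smaller alpha -> larger critical value -> larger required sample
    print(f"alpha = {alpha:.2f} -> z = {z_for_significance(alpha):.3f}")
```

The printed values are roughly 1.645, 1.960, and 2.576, so moving from a 10% to a 1% significance level makes the critical value about 1.6 times larger.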

Third, did you know you can conclude there is not a difference in your subgroups of interest when in actuality there is a difference between the subgroups? This is called a Type II error. Again, since you are observing a sample, and not the entire population, sometimes you will get the wrong answer simply due to chance. And again, you can choose the probability of this occurrence. Do you want there to be a 20% chance of making this kind of error, or a 5% chance? This probability is often written β, and the power of your study is 1 − β: a 20% chance of this error means 80% power, and a 5% chance means 95% power. Once more, the smaller this probability is (that is, the higher the power), the larger your sample size will have to be.
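The Type II error rate β and power are two sides of the same choice: power = 1 − β. Under the usual normal approximation, the chosen power enters sample size formulas through the z value with probability 1 − β below it. This is a sketch under that assumption; `z_for_power` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_for_power(beta: float) -> float:
    """z value matching a Type II error rate beta (power = 1 - beta)."""
    return NormalDist().inv_cdf(1 - beta)


for beta in (0.20, 0.05):
    power = 1 - beta
    # Lower beta (higher power) -> larger z -> larger required sample
    print(f"beta = {beta:.2f}, power = {power:.0%} -> z = {z_for_power(beta):.3f}")
```

Going from 80% to 95% power roughly doubles this z value (about 0.84 to about 1.64), which is why higher power requires more observations.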

Finally, you need to think about the size of a difference between your subgroups of interest that is meaningful. Let’s use a nutritional supplement example.

What if the actual difference between the supplement (intervention) and eating the same number of calories (control) is 0.02 lb? Is that really meaningful? What if the actual difference is 20 lb?

You need to decide how large a difference between your groups would be meaningful, because that is the difference your project must be able to detect. Keep in mind that the smaller the difference you’d like to be able to detect, the larger your sample size is going to need to be.
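The ingredients above (variability, significance level, power, and meaningful difference) come together in sample size formulas. As one illustration, a widely used normal-approximation formula for comparing two group means, assuming a two-sided test, equal group sizes, and the same standard deviation σ in both groups, is n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ², where δ is the difference you want to detect. A statistician may refine this (for example with t-based adjustments or allowances for dropout). The function name and the 10 lb standard deviation of weight change are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for comparing two means.

    Normal approximation, two-sided test, equal group sizes, common SD:
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / delta**2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)  # round up: you cannot enroll a fraction of a person


# Hypothetical SD of weight change: 10 lb
for delta in (20, 5, 0.02):
    print(f"detect a {delta} lb difference: {n_per_group(delta, sd=10)} per group")
```

Because δ appears squared in the denominator, halving the difference you want to detect roughly quadruples the required sample size; detecting 0.02 lb would take millions of participants per group.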

Along these same lines, sometimes data are collected from a single population to estimate the population mean. In this case, you will need to think about how close the estimate from your sample should be to the actual population value. This allowable distance is called the margin of error.
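For estimating a single mean, a standard normal-approximation formula gives the sample size needed for a chosen margin of error E: n = (z·σ/E)², where z comes from the confidence level (about 1.96 for 95%). The sketch below assumes that formula; `n_for_mean` is a hypothetical name, and the 6 inch standard deviation for height is an illustrative assumption, not a value from this article.

```python
import math
from statistics import NormalDist


def n_for_mean(margin: float, sd: float, confidence: float = 0.95) -> int:
    """Sample size so a confidence interval for a mean has half-width `margin`.

    Normal approximation: n = (z * sd / margin) ** 2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical: height SD of about 6 inches
for margin in (1.0, 0.5):
    print(f"margin of ±{margin} in: n = {n_for_mean(margin, sd=6)}")
```

Halving the margin of error from 1 inch to 0.5 inch roughly quadruples the required sample size.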

Since you are taking a sample, there is a degree of variability in the estimate you will get. You have probably seen political polls that estimate the percent in favor of a particular candidate. The reports will usually give the estimate with a margin of error.

For example, a newspaper recently reported that one candidate for mayor was favored by 63% ± 3% of voters. The ± 3% means the polling agency is confident (typically at a 95% confidence level) that the actual percentage is between 60% and 66%, based on their sample data.
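For a percentage like the poll result, the usual normal-approximation margin of error is z·√(p(1 − p)/n). As a back-of-the-envelope check, assuming a 95% confidence level and a simple random sample (the article does not state the poll's method or sample size), the sketch below works out roughly how many respondents a 63% ± 3% result implies. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error for a sample proportion (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


def n_for_proportion(p: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n whose margin of error for proportion p is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n = n_for_proportion(0.63, 0.03)
moe = margin_of_error(0.63, n)
print(n, f"{0.63 - moe:.1%} to {0.63 + moe:.1%}")
```

Under these assumptions, a sample of about 1,000 respondents produces the reported 60% to 66% range.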

I bet you already guessed it, but keep in mind: the smaller your margin of error, the larger your sample size.


rev. 04-Aug-2022