When the population standard deviation is unknown and the sample size is greater than 30, what table value should be used in computing a confidence interval for a mean?

A confidence interval is a way of using a sample to estimate an unknown population value. For estimating the mean, there are two types of confidence intervals that can be used: z-intervals and t-intervals. In the following lesson, we will look at how to use the formula for each of these types of intervals.


Z-Intervals

This procedure is often used in textbooks as an introduction to the idea of confidence intervals, but is not really used in actual estimation in the real world. Even so, it is common enough that we will talk about it here!

What makes it strange? Well, in order to use a z-interval, we assume that \(\sigma\) (the population standard deviation) is known. As you can imagine, if we don’t know the population mean (that’s what we are trying to estimate), then how would we know the population standard deviation?

When to use a z-interval

Setting the discussion above aside, the general rule for when to use a z-interval calculation is:

Use a z-interval when the population standard deviation is known and either:
  • the sample size is greater than or equal to 30, OR
  • the original population is normal.

Formula for the z-interval

If these conditions hold, we will use this formula for calculating the confidence interval:

\(\overline{x} \pm z_{c}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)

where \(z_{c}\) is a critical value from the normal distribution (see below) and \(n\) is the sample size.

Common values of \(z_{c}\) are:

  • 90% confidence: \(z_{c} = 1.645\)
  • 95% confidence: \(z_{c} = 1.96\)
  • 99% confidence: \(z_{c} = 2.575\)
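The critical values above can be reproduced from the standard normal distribution. The sketch below is a minimal example using Python's standard-library `statistics.NormalDist` (Python 3.8+); the helper name `z_critical` is our own, chosen for illustration.

```python
from statistics import NormalDist

def z_critical(confidence_level: float) -> float:
    """Return z_c such that the central `confidence_level` area of N(0, 1) lies in [-z_c, z_c]."""
    alpha = 1 - confidence_level
    # Half of alpha goes in each tail, so z_c is the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%}: {z_critical(cl):.3f}")
```

The 99% value prints as 2.576 (the exact quantile is about 2.5758); tables that list 2.575 or 2.58 are simply rounding differently.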

Example using a z-interval

Suppose that in a sample of 50 college students in Illinois, the mean credit card debt was $346. Suppose that we also have reason to believe (from previous studies) that the population standard deviation of credit card debts for this group is $108. Use this information to calculate a 95% confidence interval for the mean credit card debt of all college students in Illinois.

Solution

Since we wish to estimate the mean, we immediately know we will be using either a t-interval or a z-interval. Looking a bit closer, we see that we have a large sample size (\(n = 50\)) and we know the population standard deviation. Therefore, we will use a z-interval with \(z_{c} = 1.96\). From reading the problem, we also have:

  • Sample mean: \(\overline{x} = 346\)
  • Sample size: \(n = 50\)
  • Population standard deviation: \(\sigma = 108\)

Applying the formula:

\(\begin{align}\overline{x} &\pm z_{c}\left(\dfrac{\sigma}{\sqrt{n}}\right)\\ 346 &\pm 1.96\left(\dfrac{108}{\sqrt{50}}\right)\end{align}\)

The \(\pm\) indicates that we need to perform two different operations: a subtraction and an addition.

Left hand endpoint:

\(346 - 1.96\left(\dfrac{108}{\sqrt{50}}\right) = 316.1\)

Right hand endpoint:

\(346 + 1.96\left(\dfrac{108}{\sqrt{50}}\right) = 375.9\)

This gives our 95% confidence interval for \(\mu\), the population mean, as \(\boxed{(316.1, 375.9)}\).

Interpretation

We are 95% confident that the mean amount of credit card debt for all college students in Illinois is between $316.10 and $375.90.

Of course, this is a very particular statement, so please make sure you study how to interpret confidence intervals in general so you can understand exactly what it means.

Other ways to write this interval

Another way to present this interval would be to calculate the margin of error:

\(1.96\left(\dfrac{108}{\sqrt{50}}\right) = 29.9\)

and write the interval as:

\(\boxed{\$346 \pm \$29.9}\)

Both versions are correct, and the version you use depends on your audience and perhaps your teacher's or professor's preference. You can read more about different ways to write intervals here: Three ways to write a confidence interval.
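As a sketch, the credit card calculation can be checked in a few lines of Python. The function `z_interval` is a hypothetical helper, not part of any library; it assumes \(\sigma\) is known and takes \(z_c\) as an input.

```python
from math import sqrt

def z_interval(xbar: float, sigma: float, n: int, z_c: float) -> tuple[float, float]:
    """z-interval for a mean when the population standard deviation sigma is known."""
    margin = z_c * sigma / sqrt(n)  # margin of error: z_c * (sigma / sqrt(n))
    return xbar - margin, xbar + margin

# Credit card example: n = 50, xbar = 346, sigma = 108, 95% confidence (z_c = 1.96)
low, high = z_interval(346, 108, 50, 1.96)
print(f"({low:.1f}, {high:.1f})")  # (316.1, 375.9)
```

Returning both endpoints (rather than only the margin) makes it easy to print the interval in either of the forms shown above.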

T-intervals

The much more realistic scenario is using a t-interval to estimate an unknown population mean. This interval relies on our sample standard deviation in calculating the margin of error. All this means for us is that the formula will be very similar, but the critical value will no longer come from the normal distribution. Instead, it will come from Student's t-distribution.

When to use a t-interval

The rules for when to use a t-interval are as follows.

Use a t-interval when the population standard deviation is UNKNOWN and either:
  • the sample size is greater than or equal to 30, OR
  • the original population is normal.
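The two sets of rules can be combined into one decision. The function below is a hypothetical helper that encodes only the rules of thumb stated in this lesson; what to do when neither condition holds (a small sample from a non-normal population) is outside this lesson, so the sketch just reports that case.

```python
def choose_interval(sigma_known: bool, n: int, population_normal: bool) -> str:
    """Apply this lesson's z-interval / t-interval rules of thumb."""
    # Both intervals require a normal population or a sample of at least 30.
    if not (population_normal or n >= 30):
        return "neither (conditions not met)"
    # Known sigma -> z-interval; unknown sigma (use s) -> t-interval.
    return "z-interval" if sigma_known else "t-interval"

print(choose_interval(sigma_known=True, n=50, population_normal=False))   # credit card example
print(choose_interval(sigma_known=False, n=38, population_normal=False))  # meetings example
```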

Formula for the t-interval

The formula for a t-interval is:

\(\overline{x} \pm t_{c}\left(\dfrac{s}{\sqrt{n}}\right)\)

where \(t_{c}\) is a critical value from the t-distribution, \(s\) is the sample standard deviation and \(n\) is the sample size.

Finding \(t_c\)

The value of \(t_{c}\) depends on the sample size through the use of “degrees of freedom” where \(df = n - 1\). We will use this to look up the value of \(t_{c}\) in a table (a nice free version of that table can be found here, or typically in the back of your textbook if you are currently taking a class).

Example using a t-interval

Suppose that a sample of 38 employees at a large company were surveyed and asked how many hours a week they thought the company wasted on unnecessary meetings. The mean number of hours these employees stated was 12.4 with a standard deviation of 5.1. Calculate a 99% confidence interval to estimate the mean amount of time all employees at this company believe is wasted on unnecessary meetings each week.

Solution

As before, since we are estimating a mean with a confidence interval, we know it will either be a t-interval or a z-interval. In this case, we have a large sample (\(n = 38\)), but we only have the sample standard deviation. If you aren’t sure of that – read closely. The standard deviation of 5.1 was in the context of the sample, so \(s = 5.1\). Thus, we will go ahead and use a t-interval since \(\sigma\) is unknown.

Before we can do that, however, we need to look up the critical value. To know which row of the t-table to use, we find the degrees of freedom: \(n - 1 = 38 - 1 = 37\). Reading the t-table at 37 degrees of freedom and 99% confidence gives \(t_c = 2.715\).

Now that we have that, we plug the values into the formula and do the calculations to get our two endpoints. Remember that we have:

  • Sample mean: \(\overline{x} = 12.4\)
  • Sample size: \(n = 38\)
  • Sample standard deviation: \(s = 5.1\)
  • Critical value: \(t_c = 2.715\)

Therefore the interval is:

\(\begin{align} \overline{x} &\pm t_{c}\left(\dfrac{s}{\sqrt{n}}\right)\\ 12.4 &\pm 2.715\left(\dfrac{5.1}{\sqrt{38}}\right)\end{align}\)

This gives us the following two endpoints for our interval.

Left hand endpoint:

\(12.4 - 2.715\left(\dfrac{5.1}{\sqrt{38}}\right) = 10.2\)

Right hand endpoint:

\(12.4 + 2.715\left(\dfrac{5.1}{\sqrt{38}}\right) = 14.6\)

99% Confidence Interval for \(\mu\): \(\boxed{(10.2, 14.6)}\)

Interpretation

“We are 99% confident that the mean amount of time that all employees at this company think is wasted on meetings each week is between 10.2 and 14.6 hours.”

The same warning applies here – make sure you take the time to truly study what this means.
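The t-interval arithmetic can be sketched the same way. The helper `t_interval` is hypothetical; it takes \(t_c\) as an input because Python's standard library has no t-distribution, so the value 2.715 is the table lookup from the example.

```python
from math import sqrt

def t_interval(xbar: float, s: float, n: int, t_c: float) -> tuple[float, float]:
    """t-interval for a mean using the sample standard deviation s.

    t_c must be looked up for df = n - 1 (for example, in a t-table).
    """
    margin = t_c * s / sqrt(n)  # margin of error: t_c * (s / sqrt(n))
    return xbar - margin, xbar + margin

# Meetings example: n = 38, xbar = 12.4, s = 5.1, 99% confidence, df = 37 -> t_c = 2.715
low, high = t_interval(12.4, 5.1, 38, 2.715)
print(f"({low:.1f}, {high:.1f})")  # (10.2, 14.6)
```

If SciPy happens to be available, `scipy.stats.t.ppf(0.995, 37)` should give the same critical value to about three decimal places; that dependency is an assumption, not something this lesson requires.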


Other Considerations

Confidence intervals are most often calculated with statistical software packages such as SAS, SPSS, or R, with Excel, or even with a graphing calculator. It is helpful to calculate them by hand once or twice to get a feel for the concept, but you should also take the time to learn how to calculate them using one of these common tools. Which tool you use depends on the course you are taking or the field you are working in.

Additional Reading

If you are currently taking a statistics course, we have a ton of free statistics lessons and videos. Be sure to check out the statistics section on MathBootCamps for more articles like this one!

A confidence interval for a population mean when the population standard deviation is known is based on the conclusion of the Central Limit Theorem that the sampling distribution of the sample means follows an approximately normal distribution.

Consider the standardizing formula for the sampling distribution developed in the discussion of the Central Limit Theorem:

\(Z_1 = \dfrac{\overline{X} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} = \dfrac{\overline{X} - \mu}{\sigma/\sqrt{n}}\)

Notice that \(\mu\) is substituted for \(\mu_{\overline{X}}\) because the Central Limit Theorem tells us that the expected value of \(\overline{X}\), \(\mu_{\overline{X}}\), is \(\mu\), and \(\sigma_{\overline{X}}\) is replaced with \(\sigma/\sqrt{n}\), also from the Central Limit Theorem.

In this formula we know \(\overline{X}\), \(\sigma_{\overline{X}}\), and \(n\), the sample size. (In actuality we do not know the population standard deviation, but we do have a point estimate for it, \(s\), from the sample we took. More on this later.) What we do not know is \(\mu\) or \(Z_1\). We can solve for either one of these in terms of the other. Solving for \(\mu\) in terms of \(Z_1\) gives:

\(\mu = \overline{X} \pm Z_1\left(\dfrac{\sigma}{\sqrt{n}}\right)\)

Remembering that the Central Limit Theorem tells us that the distribution of the \(\overline{X}\)'s, the sampling distribution for means, is normal, and that the normal distribution is symmetrical, we can rearrange terms thus:

\(\overline{X} - Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right) \le \mu \le \overline{X} + Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)

This is the formula for a confidence interval for the mean of a population.

Notice that \(Z_\alpha\) has been substituted for \(Z_1\) in this equation. This is where a choice must be made by the statistician: the analyst must decide the level of confidence they wish to impose on the confidence interval. \(\alpha\) is the probability that the interval will not contain the true population mean, and the confidence level is defined as \(1 - \alpha\). \(Z_\alpha\) is the number of standard deviations \(\overline{X}\) lies from the mean with a certain probability. If we choose \(Z_\alpha = 1.96\), we are asking for the 95% confidence interval because we are setting the probability that the true mean lies within the range at 0.95. If we set \(Z_\alpha\) at 1.645, we are asking for the 90% confidence interval because we have set the probability at 0.90. These numbers can be verified by consulting the standard normal table: divide either 0.95 or 0.90 in half, find that probability (0.475 or 0.45) inside the body of the table, and then read from the top and left margins the number of standard deviations it takes to reach that probability.

In reality, we can set whatever level of confidence we desire simply by changing the \(Z_\alpha\) value in the formula. It is the analyst's choice. Common convention in economics and most social sciences sets confidence intervals at the 90, 95, or 99 percent levels; levels below 90% are considered of little value. The level of confidence of a particular interval estimate is denoted by \(1 - \alpha\).

A good way to see the development of a confidence interval is to graphically depict the solution to a problem requesting a confidence interval. This is presented in Figure 8.2 for the example in the introduction concerning the number of downloads from iTunes. That case was for a 95% confidence interval, but other levels of confidence could have just as easily been chosen depending on the need of the analyst. However, the level of confidence MUST be pre-set and not subject to revision as a result of the calculations.

Figure 8.2

For this example, let's say we know that the actual population mean number of iTunes downloads is 2.1. The true population mean falls within the range of the 95% confidence interval. There is absolutely nothing to guarantee that this will happen. Further, if the true mean falls outside of the interval we will never know it. We must always remember that we will never ever know the true mean. Statistics simply allows us, with a given level of probability (confidence), to say that the true mean is within the range calculated. This is what was called in the introduction, the "level of ignorance admitted".

Here again is the formula for a confidence interval for an unknown population mean assuming we know the population standard deviation:

\(\overline{X} - Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right) \le \mu \le \overline{X} + Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)

It is clear that the confidence interval is driven by two things: the chosen level of confidence, \(Z_\alpha\), and the standard deviation of the sampling distribution. The standard deviation of the sampling distribution is in turn affected by two things: the standard deviation of the population and the sample size we chose for our data. Here we wish to examine the effects of each of the choices we have made, the confidence level and the sample size, on the calculated confidence interval.

For a moment we should ask just what we desire in a confidence interval. Our goal was to estimate the population mean from a sample. We have forsaken the hope that we will ever find the true population mean, and population standard deviation for that matter, for any case except where we have an extremely small population and the cost of gathering the data of interest is very small. In all other cases we must rely on samples. With the Central Limit Theorem we have the tools to provide a meaningful confidence interval with a given level of confidence, meaning a known probability of being wrong. By meaningful confidence interval we mean one that is useful. Imagine that you are asked for a confidence interval for the ages of your classmates. You have taken a sample and find a mean of 19.8 years. You wish to be very confident so you report an interval between 9.8 years and 29.8 years. This interval would certainly contain the true population mean and have a very high confidence level. However, it hardly qualifies as meaningful. The very best confidence interval is narrow while having high confidence. There is a natural tension between these two goals: the higher the level of confidence, the wider the confidence interval, as in the case of the students' ages above. We can see this tension in the equation for the confidence interval.

\(\mu = \overline{x} \pm Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)

The confidence interval increases in width as \(Z_\alpha\) increases, and \(Z_\alpha\) increases as the level of confidence increases. There is a tradeoff between the level of confidence and the width of the interval. Looking at the formula again, we see that the sample size also plays an important role in the width of the confidence interval. The sample size, \(n\), shows up in the denominator of the standard deviation of the sampling distribution. As the sample size increases, the standard deviation of the sampling distribution decreases, and so does the width of the confidence interval, holding the level of confidence constant. This relationship was demonstrated in [link]. Again we see the importance of having large samples for our analysis, although we then face a second constraint: the cost of gathering data.

Another way to approach confidence intervals is through the use of something called the Error Bound. The Error Bound gets its name from the recognition that it provides the boundary of the interval derived from the standard error of the sampling distribution. In the equations above, the interval is simply the estimated mean (the sample mean) plus or minus something. That something is the Error Bound, which is driven by the probability we desire to maintain in our estimate, \(Z_\alpha\), times the standard deviation of the sampling distribution. The Error Bound for a mean is given the name Error Bound Mean, or EBM.

To construct a confidence interval for a single unknown population mean \(\mu\), where the population standard deviation is known, we need \(\overline{x}\) as an estimate for \(\mu\) and we need the margin of error. Here, the margin of error is called the error bound for a population mean (abbreviated EBM). The sample mean \(\overline{x}\) is the point estimate of the unknown population mean \(\mu\).

The confidence interval estimate will have the form:

(point estimate - error bound, point estimate + error bound) or, in symbols, \((\overline{x} - EBM,\ \overline{x} + EBM)\)

The mathematical formula for this confidence interval is:

\(\overline{X} - Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right) \le \mu \le \overline{X} + Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)

The margin of error (EBM) depends on the confidence level (abbreviated CL). The confidence level is often considered the probability that the calculated confidence interval estimate will contain the true population parameter. However, it is more accurate to state that the confidence level is the percent of confidence intervals that contain the true population parameter when repeated samples are taken. Most often, the person constructing the confidence interval chooses a confidence level of 90% or higher because they want to be reasonably certain of their conclusions.

There is another probability called alpha (α). α is related to the confidence level, CL. α is the probability that the interval does not contain the unknown population parameter.
Mathematically, 1 - α = CL.

A confidence interval for a population mean with a known standard deviation is based on the fact that the sampling distribution of the sample means follows an approximately normal distribution. Suppose that our sample has a mean of \(\overline{x} = 10\), and we have constructed the 90% confidence interval (5, 15) where EBM = 5.

To get a 90% confidence interval, we must include the central 90% of the probability of the normal distribution. If we include the central 90%, we leave out a total of α = 10% in both tails, or 5% in each tail, of the normal distribution.

Figure 8.3

To capture the central 90%, we must go out 1.645 standard deviations on either side of the calculated sample mean. The value 1.645 is the z-score from a standard normal probability distribution that puts an area of 0.90 in the center, an area of 0.05 in the far left tail, and an area of 0.05 in the far right tail.
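The claim that ±1.645 captures the central 90% can be checked numerically. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)
z = 1.645

left_tail = std_normal.cdf(-z)                    # area to the left of -1.645
right_tail = 1 - std_normal.cdf(z)                # area to the right of +1.645
center = std_normal.cdf(z) - std_normal.cdf(-z)   # area between -1.645 and +1.645

print(f"left tail {left_tail:.3f}, center {center:.3f}, right tail {right_tail:.3f}")
```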

It is important that the standard deviation used be appropriate for the parameter we are estimating, so in this section we need to use the standard deviation that applies to the sampling distribution for means, which we studied with the Central Limit Theorem: \(\sigma/\sqrt{n}\).

To construct a confidence interval estimate for an unknown population mean, we need data from a random sample. The steps to construct and interpret the confidence interval are:

  • Calculate the sample mean \(\overline{x}\) from the sample data. Remember, in this section we know the population standard deviation \(\sigma\).
  • Find the z-score from the standard normal table that corresponds to the confidence level desired.
  • Calculate the error bound EBM.
  • Construct the confidence interval.
  • Write a sentence that interprets the estimate in the context of the situation in the problem.
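Steps 2 to 4 can be sketched as a small Python function and applied to the exam-score example that follows (\(\overline{x} = 68\), \(\sigma = 3\), \(n = 36\), 90% confidence). The name `ci_known_sigma` is hypothetical, and the z-score comes from `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def ci_known_sigma(sample_mean: float, sigma: float, n: int, cl: float):
    """Steps 2-4: z-score for the confidence level, error bound EBM, and the interval."""
    alpha = 1 - cl
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}: area alpha/2 lies to its right
    ebm = z * sigma / sqrt(n)                # EBM = Z_{alpha/2} * sigma / sqrt(n)
    return sample_mean - ebm, sample_mean + ebm, ebm

# Exam-score example: xbar = 68 (step 1 already done), sigma = 3, n = 36, CL = 0.90
low, high, ebm = ci_known_sigma(68, 3, 36, 0.90)
print(f"EBM = {ebm:.4f}, CI = ({low:.2f}, {high:.2f})")
```

Because the unrounded \(Z_{0.05} \approx 1.6449\) is used instead of 1.645, the EBM comes out as 0.8224 rather than 0.8225; the interval agrees with the worked example to two decimal places. Step 5, the written interpretation, is left to you.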

We will first examine each step in more detail, and then illustrate the process with some examples.

When we know the population standard deviation σ, we use a standard normal distribution to calculate the error bound EBM and construct the confidence interval. We need to find the value of z that puts an area equal to the confidence level (in decimal form) in the middle of the standard normal distribution Z ~ N(0, 1).

The confidence level, CL, is the area in the middle of the standard normal distribution. CL = 1 – α, so α is the area that is split equally between the two tails. Each of the tails contains an area equal to \(\alpha/2\).

The z-score that has an area to the right of \(\alpha/2\) is denoted by \(Z_{\alpha/2}\).

For example, when CL = 0.95, α = 0.05 and \(\alpha/2 = 0.025\); we write \(Z_{\alpha/2} = Z_{0.025}\).

The area to the right of \(Z_{0.025}\) is 0.025 and the area to the left of \(Z_{0.025}\) is 1 – 0.025 = 0.975.

\(Z_{\alpha/2} = Z_{0.025} = 1.96\), using a standard normal probability table. We will see later that we can use a different probability table, the Student's t-distribution, for finding the number of standard deviations of commonly used levels of confidence.

The error bound formula for an unknown population mean μ when the population standard deviation σ is known is

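  • \(EBM = \left(Z_{\alpha/2}\right)\left(\dfrac{\sigma}{\sqrt{n}}\right)\)
  • The confidence interval estimate has the format \((\overline{x} - EBM,\ \overline{x} + EBM)\) or the formula: \(\overline{X} - Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right) \le \mu \le \overline{X} + Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)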

The graph gives a picture of the entire situation.

\(CL + \alpha/2 + \alpha/2 = CL + \alpha = 1\)

Figure 8.4

Suppose we are interested in the mean scores on an exam. A random sample of 36 scores is taken and gives a sample mean (sample mean score) of 68 (\(\overline{X} = 68\)). In this example we have the unusual knowledge that the population standard deviation is 3 points. Do not count on knowing the population parameters outside of textbook examples. Find a confidence interval estimate for the population mean exam score (the mean score on all exams).

Find a 90% confidence interval for the true (population) mean of statistics exam scores.


To find the confidence interval, you need the sample mean, \(\overline{x}\), and the EBM.

  • \(\overline{x} = 68\)
  • \(EBM = \left(Z_{\alpha/2}\right)\left(\dfrac{\sigma}{\sqrt{n}}\right)\)
  • σ = 3; n = 36; The confidence level is 90% (CL = 0.90)

CL = 0.90 so α = 1 – CL = 1 – 0.90 = 0.10

\(\alpha/2 = 0.05\), so \(Z_{\alpha/2} = Z_{0.05}\).

The area to the right of \(Z_{0.05}\) is 0.05 and the area to the left of \(Z_{0.05}\) is 1 – 0.05 = 0.95.

\(Z_{\alpha/2} = Z_{0.05} = 1.645\)

This can be found using a computer or a probability table for the standard normal distribution. Because the common levels of confidence in the social sciences are 90%, 95%, and 99%, it will not be long until you become familiar with the numbers 1.645, 1.96, and 2.576.

\(EBM = (1.645)\left(\dfrac{3}{\sqrt{36}}\right) = 0.8225\)

\(\overline{x} - EBM = 68 - 0.8225 = 67.1775\)

\(\overline{x} + EBM = 68 + 0.8225 = 68.8225\)

The 90% confidence interval is (67.1775, 68.8225).

Interpretation

We estimate with 90% confidence that the true population mean exam score for all statistics students is between 67.18 and 68.82.

Suppose we change the original problem in Example 8.1 by using a 95% confidence level. Find a 95% confidence interval for the true (population) mean statistics exam score.

\(\mu = \overline{x} \pm Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)

\(\mu = 68 \pm 1.96\left(\dfrac{3}{\sqrt{36}}\right)\)

\(67.02 \le \mu \le 68.98\)

σ = 3; n = 36; The confidence level is 95% (CL = 0.95).

CL = 0.95 so α = 1 – CL = 1 – 0.95 = 0.05

\(Z_{\alpha/2} = Z_{0.025} = 1.96\)

Notice that the EBM is larger for a 95% confidence level (0.98) than for the 90% level in the original problem (0.8225).

Comparing the results

The 90% confidence interval is (67.18, 68.82). The 95% confidence interval is (67.02, 68.98). The 95% confidence interval is wider. If you look at the graphs, because the area 0.95 is larger than the area 0.90, it makes sense that the 95% confidence interval is wider. To be more confident that the confidence interval actually does contain the true value of the population mean for all statistics exam scores, the confidence interval necessarily needs to be wider. This demonstrates a very important principle of confidence intervals: there is a trade-off between the level of confidence and the width of the interval. We want a narrow confidence interval, because very wide intervals provide little useful information, but we would also like a high level of confidence in our interval. For a fixed sample size, we cannot have both.

  • Increasing the confidence level makes the confidence interval wider.
  • Decreasing the confidence level makes the confidence interval narrower.

And again here is the formula for a confidence interval for an unknown mean assuming we have the population standard deviation:

\(\overline{X} - Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right) \le \mu \le \overline{X} + Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)

The standard deviation of the sampling distribution was provided by the Central Limit Theorem as \(\sigma/\sqrt{n}\). While we infrequently get to choose the sample size, it plays an important role in the confidence interval. Because the sample size is in the denominator of the equation, as \(n\) increases it causes the standard deviation of the sampling distribution to decrease and thus the width of the confidence interval to decrease. We have met this before as we reviewed the effects of sample size on the Central Limit Theorem. There we saw that as \(n\) increases the sampling distribution narrows until, in the limit, it collapses on the true population mean.

Suppose we change the original problem in Example 8.1 to see what happens to the confidence interval if the sample size is changed.

Leave everything the same except the sample size. Use the original 90% confidence level. What happens to the confidence interval if we increase the sample size and use n = 100 instead of n = 36? What happens if we decrease the sample size to n = 25 instead of n = 36?

\(\mu = \overline{x} \pm Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)
\(\mu = 68 \pm 1.645\left(\dfrac{3}{\sqrt{100}}\right)\)
\(67.5065 \le \mu \le 68.4935\)
If we increase the sample size n to 100, we decrease the width of the confidence interval relative to the original sample size of 36 observations.

\(\mu = \overline{x} \pm Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\)
\(\mu = 68 \pm 1.645\left(\dfrac{3}{\sqrt{25}}\right)\)
\(67.013 \le \mu \le 68.987\)
If we decrease the sample size n to 25, we increase the width of the confidence interval by comparison to the original sample size of 36 observations.

  • Increasing the sample size makes the confidence interval narrower.
  • Decreasing the sample size makes the confidence interval wider.
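The sample-size effect can be reproduced the same way. This sketch keeps the rounded critical value 1.645 used in the text so the endpoints match the worked results.

```python
from math import sqrt

xbar, sigma, z = 68, 3, 1.645  # exam-score example, 90% confidence (rounded z from the text)
intervals = {}
for n in (25, 36, 100):
    se = sigma / sqrt(n)   # standard deviation of the sampling distribution, sigma / sqrt(n)
    ebm = z * se           # error bound EBM
    intervals[n] = (xbar - ebm, xbar + ebm)
    print(f"n = {n:3d}: ({xbar - ebm:.4f}, {xbar + ebm:.4f}), width {2 * ebm:.4f}")
```

The width shrinks in proportion to \(1/\sqrt{n}\): going from n = 25 to n = 100 quadruples the sample size and halves the width.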

We have already seen this effect when we reviewed the effects of changing the size of the sample, n, on the Central Limit Theorem. See Figure 7.7 to see this effect. There we saw that as the sample size increased, the standard deviation of the sampling distribution decreased. This is why we prefer the sample mean from a large sample to one from a small sample, all other things held constant.

Thus far we assumed that we knew the population standard deviation. This will virtually never be the case. We will have the sample standard deviation, s, however. This is a point estimate for the population standard deviation and can be substituted into the formula for confidence intervals for a mean under certain circumstances. We just saw the effect the sample size has on the width of confidence interval and the impact on the sampling distribution for our discussion of the Central Limit Theorem. We can invoke this to substitute the point estimate for the standard deviation if the sample size is large "enough". Simulation studies indicate that 30 observations or more will be sufficient to eliminate any meaningful bias in the estimated confidence interval.
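The claim about 30 observations comes from simulation studies. The sketch below is not those studies; it is a small Monte Carlo illustration under our own assumptions (a normal population with hypothetical μ = 50 and σ = 10, 2,000 repetitions, a fixed random seed). It builds 95% z-intervals using s in place of σ and reports the fraction that contain μ.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(n: int, reps: int = 2000, cl: float = 0.95, seed: int = 1) -> float:
    """Fraction of z-intervals built with s (instead of sigma) that contain the true mean."""
    rng = random.Random(seed)
    mu, sigma = 50.0, 10.0  # hypothetical normal population
    z = NormalDist().inv_cdf(1 - (1 - cl) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample standard deviation
        ebm = z * s / sqrt(n)  # s substituted for sigma
        if xbar - ebm <= mu <= xbar + ebm:
            hits += 1
    return hits / reps

for n in (5, 30, 80):
    print(f"n = {n:2d}: coverage {coverage(n):.3f}")
```

For a normal population, coverage should fall noticeably short of 0.95 at n = 5 and sit close to 0.95 by n = 30; a skewed population would need its own check.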

Spring break can be a very expensive holiday. A sample of 80 students is surveyed, and the average amount spent by students on travel and beverages is $593.84. The sample standard deviation is approximately $369.34.

Construct a 92% confidence interval for the population mean amount of money spent by spring breakers.

We begin with the confidence interval for a mean. We use the formula for a mean because the random variable is dollars spent and this is a continuous random variable. The point estimate for the population standard deviation, s, has been substituted for the true population standard deviation because with 80 observations there is no concern for bias in the estimate of the confidence interval.

\(\mu = \overline{x} \pm \left[Z_{\alpha/2}\dfrac{s}{\sqrt{n}}\right]\)

Substituting the values into the formula, we have:

\(\mu = 593.84 \pm \left[1.75\,\dfrac{369.34}{\sqrt{80}}\right]\)

\(Z_{\alpha/2}\) is found on the standard normal table by looking up 0.46 (half of 0.92) in the body of the table and finding the number of standard deviations on the side and top of the table: 1.75. The solution for the interval is thus:

\(\mu = 593.84 \pm 72.2636 = (521.58, 666.10)\)

\(\$521.58 \le \mu \le \$666.10\)
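As a check on the spring break arithmetic, the sketch below recomputes the interval with the table value 1.75 and with the unrounded 92% critical value from `statistics.NormalDist` (about 1.7507).

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n, cl = 593.84, 369.34, 80, 0.92         # spring break example
z_table = 1.75                                     # read from the table (area 0.46)
z_exact = NormalDist().inv_cdf(1 - (1 - cl) / 2)   # unrounded Z_{alpha/2}

for label, z in (("table z", z_table), ("exact z", z_exact)):
    ebm = z * s / sqrt(n)  # s substituted for sigma because n = 80 is large
    print(f"{label} = {z:.4f}: EBM {ebm:.4f}, ({xbar - ebm:.2f}, {xbar + ebm:.2f})")
```

The two versions differ by a few cents at each endpoint, which is the effect of rounding \(Z_{\alpha/2}\) to two decimals when reading the table.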

The general form for a confidence interval for a single population mean, known standard deviation, normal distribution is given by \(\overline{X} - Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right) \le \mu \le \overline{X} + Z_{\alpha}\left(\dfrac{\sigma}{\sqrt{n}}\right)\). This formula is used when the population standard deviation is known.

CL = confidence level, or the proportion of confidence intervals created that are expected to contain the true population parameter

α = 1 – CL = the proportion of confidence intervals that will not contain the population parameter

\(z_{\alpha/2}\) = the z-score with the property that the area to the right of the z-score is \(\alpha/2\); this is the z-score used in the calculation of EBM, where α = 1 – CL.
