When a finding is considered statistically significant, the likelihood that the results occurred by chance is less than what?

It's easy for non-scientists to misunderstand the term significant when they come across it in an article. In everyday English, the word means "important." But when researchers say the findings of a study were "statistically significant," they do not necessarily mean the findings are important.

Statistical significance refers to whether any differences observed between groups being studied are "real" or whether they are simply due to chance. These can be groups of workers who took part in a workplace health and safety intervention or groups of patients participating in a clinical trial.

Let's consider a study evaluating a new weight loss drug. Group A received the drug and lost an average of four kilograms (kg) in seven weeks. Group B didn't receive the drug but still lost an average of one kg over the same period. Did the drug produce this three-kg difference in weight loss? Or could it be that Group A lost more weight simply by chance?

Statistical testing starts off by assuming something impossible: that the two groups of people were exactly alike from the start. This means the average starting weight in each group was the same, and so were the proportions of lighter and heavier people.

Mathematical procedures are then used to examine differences in outcomes (weight loss) between the groups. The goal is to determine how likely it is that the observed difference — in this case, the three-kg difference in average weight loss — might have occurred by chance alone.
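
To make this concrete, here is a minimal sketch in Python of how such a comparison is often carried out. The per-person weight losses are made-up numbers, the scipy library is assumed to be available, and the t-test is just one common choice, not necessarily the test used in any particular study.

```python
# Hypothetical per-person weight losses (kg); the numbers are illustrative only.
from scipy import stats

group_a = [4.5, 3.8, 5.1, 4.2, 3.6, 4.0, 3.9, 4.7]  # received the drug
group_b = [1.2, 0.8, 1.5, 0.9, 1.1, 0.7, 1.3, 1.0]  # did not receive the drug

# Welch's t-test asks: if the drug had no effect at all, how likely is a
# difference in average weight loss at least as large as the one observed?
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here would indicate that a difference this large is unlikely to arise by chance alone.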

The "p" value

Now here's where it gets complicated. Scientists use the term "p" to describe the probability of observing such a large difference purely by chance in two groups of exactly-the-same people. In scientific studies, this is known as the "p-value."

If it is unlikely enough that the difference in outcomes occurred by chance alone, the difference is pronounced "statistically significant."

Mathematical probabilities like p-values range from 0 (no chance) to 1 (absolute certainty). So 0.5 means a 50 per cent chance and 0.05 means a 5 per cent chance.

In most sciences, results yielding a p-value of .05 are considered on the borderline of statistical significance. If the p-value is under .01, results are considered statistically significant, and if it is below .005 they are considered highly statistically significant.

But how does this help us understand the meaning of statistical significance in a particular study? Let's go back to our weight loss study. If the results yield a p-value of .05, here is what the scientists are saying: "Assuming the two groups of people being compared were exactly the same from the start, there's a very good chance — 95 per cent — that the three-kg difference in weight loss would NOT be observed if the weight loss drug had no benefit whatsoever." From this finding, scientists would infer that the weight loss drug is indeed effective.

If you notice the p-value of a finding is .01 but prefer it expressed differently, just subtract the p-value from the number 1 (1 minus .01 equals .99). Thus a p-value of .01 means there is an excellent chance — 99 per cent — that the difference in outcomes would NOT be observed if the intervention had no benefit whatsoever.

Not all statistical testing is used to determine the effectiveness of interventions. Studies that seek associations — for example, whether new employees are more vulnerable to injury than experienced workers — also rely on mathematical testing to determine if an observation meets the standard for statistical significance.

Source: At Work, Issue 40, Spring 2005: Institute for Work & Health, Toronto


Published: 27th September 2021

When reading about or conducting research, you are likely to come across the term ‘statistical significance’. ‘Significance’ generally refers to something having particular importance – but in research, ‘significance’ has a very different meaning. Statistical significance is a term used to describe how certain we are that a difference or relationship between two variables exists and isn’t due to chance. When a result is identified as being statistically significant, this means that you are confident that there is a real difference or relationship between two variables, and it’s unlikely that it’s a one-off occurrence.

However, it’s commonplace for statistical significance (i.e., being confident that chance wasn’t involved in your results) to be confused with general significance (i.e., having importance). A statistically significant finding may, or may not, have any real-world utility. Therefore, having a thorough understanding of what statistical significance is, and what factors contribute to it, is important for conducting sound research.

1 – Hypotheses:

A hypothesis is a particular type of prediction for what the outcomes of research will be, and comes in two forms. A null hypothesis predicts that there is no difference or relationship between two groups or variables of interest and therefore the two groups or variables are equal. In contrast, an alternate hypothesis predicts that there is a difference or relationship between two groups or variables of interest. In this case, the two groups or variables are not equal, and so could be greater or less than one another.

A key purpose of statistical significance testing is to determine how likely it is that your observed result occurred by chance if the null hypothesis were true. If the result could easily have arisen by chance, we do not reject (retain) the null hypothesis and conclude there is no evidence of a difference; a chance result is not expected to hold in the real world. However, if the result is very unlikely to have arisen by chance, we reject the null hypothesis and conclude there is a difference; a result that is unlikely to be due to chance is likely to reflect something real. This in turn affects the conclusions that you can draw from your research.
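
As a rough illustration of this logic, the sketch below uses a permutation test on made-up data (numpy assumed available): if the null hypothesis were true, the group labels would be interchangeable, so shuffling them many times shows how often chance alone produces a difference as large as the one observed.

```python
import numpy as np

rng = np.random.default_rng(0)
group_1 = np.array([4.5, 3.8, 5.1, 4.2, 3.6, 4.0])  # illustrative values only
group_2 = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 0.7])
observed_diff = group_1.mean() - group_2.mean()

# Under the null hypothesis the labels are interchangeable, so shuffle them
# repeatedly and count how often chance produces a difference this large.
pooled = np.concatenate([group_1, group_2])
n_permutations = 10_000
count = 0
for _ in range(n_permutations):
    rng.shuffle(pooled)
    diff = pooled[:len(group_1)].mean() - pooled[len(group_1):].mean()
    if abs(diff) >= abs(observed_diff):
        count += 1

p_value = count / n_permutations
print(f"observed difference = {observed_diff:.2f}, p = {p_value:.4f}")
```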

2 – The Likelihood of Error:

When dealing with chance, there is always the possibility of error – including Type I and Type II errors. A Type I error occurs when the null hypothesis is rejected when it should have been retained (i.e., a false positive). This means the results are identified as significant when they actually occurred by chance; because they occurred by chance, they are unlikely to hold in the real world and should have been identified as non-significant. A Type II error occurs when the null hypothesis is retained when it should have been rejected (i.e., a false negative). This means the results are identified as non-significant when they actually did not occur by chance; because they did not occur by chance, they are likely to hold in the real world and should have been identified as significant.
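
A quick simulation can show what a Type I error rate looks like in practice. In this sketch (numpy and scipy assumed available), both groups are drawn from exactly the same population, so any "significant" result is a false positive; at alpha = .05, roughly 5% of tests come out significant by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments = 5_000
false_positives = 0

for _ in range(n_experiments):
    a = rng.normal(loc=0.0, scale=1.0, size=30)  # drawn from the same
    b = rng.normal(loc=0.0, scale=1.0, size=30)  # population as group a
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1  # the null is true, so this is a Type I error

print(f"Type I error rate ≈ {false_positives / n_experiments:.3f}")  # about 0.05
```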

3 – Alpha and p Values:

Prior to any statistical analyses, it is important to decide what you will count as statistically significant. This threshold is referred to as the alpha value, and it represents the probability of making a Type I error (i.e., rejecting the null hypothesis when it is true). Alpha values are typically set at .05 (5%), meaning we accept a 5% risk of a Type I error (in other words, 95% confidence that we won't make one). More conservative tests use smaller alpha values such as .01 (1%), meaning we are 99% confident we won't make a Type I error. Alpha is not to be confused with the p value, which is the calculated probability of obtaining a result at least as extreme as the one observed if chance alone were at work. For statistical significance, alpha is used as the threshold and the p value is compared to it: if the p value is above the alpha value (p > .05), the result is not statistically significant; if it is below alpha (p < .05), it is statistically significant.
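
The decision rule itself is simple to express in code. The sketch below is a minimal illustration, with the function name and the example p values chosen purely for demonstration.

```python
def is_significant(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when the p value falls below the pre-set alpha threshold."""
    return p_value < alpha

print(is_significant(0.032))              # True: 0.032 < .05
print(is_significant(0.210))              # False: 0.210 > .05
print(is_significant(0.032, alpha=0.01))  # False under the stricter .01 alpha
```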

4 – One or Two Tailed Tests:

Your hypotheses will determine which type of significance test you need to conduct. A one-tailed hypothesis predicts a specific direction of the difference (higher, lower) or relationship (positive, negative) between the two groups or variables of interest. With a one-tailed test, your alpha value stays the same but is concentrated in one tail of the distribution, so the p value for a result in the predicted direction is half of what the two-tailed p value would be. A two-tailed hypothesis does not predict a specific direction of the difference or relationship, so a two-tailed test spreads the test across both directions and the p value covers both. Two-tailed tests are more widely used in research than one-tailed tests.
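
The sketch below contrasts the two on the same made-up data, using scipy's `alternative` argument (assumed available) to switch between a two-tailed and a one-tailed, directional test.

```python
from scipy import stats

group_a = [4.5, 3.8, 5.1, 4.2, 3.6, 4.0]  # illustrative values only
group_b = [1.2, 0.8, 1.5, 0.9, 1.1, 0.7]

# Two-tailed: "the groups differ" (in either direction).
_, p_two = stats.ttest_ind(group_a, group_b, alternative="two-sided")

# One-tailed: "group A is greater than group B" (one specific direction).
_, p_one = stats.ttest_ind(group_a, group_b, alternative="greater")

print(f"two-tailed p = {p_two:.4f}")
print(f"one-tailed p = {p_one:.4f}")  # half the two-tailed p for this direction
```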

5 – Sample Size and Power:

Statistical power refers to the probability that the statistical test you are using will correctly reject a false null hypothesis. Type II errors are reduced by having enough statistical power, which is generally kept at 80% or higher. Statistical power is increased by having an adequate sample size; if your study is under-powered because you don't have enough participants, a real difference or relationship may fail to reach statistical significance. Generally, if the alternate hypothesis is true and there is a difference or relationship to be observed, a larger sample increases the chances of detecting it. If you see a difference or relationship between two small groups, you could reasonably expect it to be detected more clearly, with a smaller p value, if the groups became larger.
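
As an illustration, the sketch below uses the statsmodels library (assumed available) to estimate how many participants per group would be needed to detect a medium-sized effect with 80% power; the effect size of 0.5 is an assumed, illustrative value.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Participants needed per group to detect a medium effect (Cohen's d = 0.5)
# with 80% power at a two-tailed alpha of .05.
n_per_group = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05,
                                   alternative="two-sided")
print(f"required sample size per group ≈ {n_per_group:.0f}")  # roughly 64
```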

Determining Statistical Significance Using Hand Calculations:

  1. Determine your thresholds and tailed tests: Before performing any analyses, decide what your alpha value is (.05 or .01), and whether you are performing a one-tailed or two-tailed test.
  2. Determine your critical value: This step is unique to calculations done by hand. A critical value is a number that corresponds to the probability equal to your pre-determined alpha value, and for hand calculations will serve as the threshold for significance. Critical values are based on the number of tails in your test and your alpha value, which is why these parameters are determined first. There are different sets of critical values for each different type of statistical test you are conducting – these are easily accessible in statistics textbooks, or online.
  3. Calculate your test statistic: With your parameters set, perform the hand calculations needed. Your observed test statistic is the final numerical result.
  4. Compare your observed test statistic (Step 3) to the critical value (Step 2), and draw your conclusions (a worked sketch follows this list):

    a. If your observed statistic is greater than the critical value (observed > critical), reject the null hypothesis. This means that the probability of this finding occurring by chance is less than your alpha level (e.g., 5%), and is evidence in support of a likely real-world difference or relationship between the two groups or variables.

    b. If your observed statistic is less than the critical value (observed < critical), retain the null hypothesis. This means that the probability of this finding occurring by chance is greater than your alpha level (e.g., 5%), and suggests that there is no evidence of a real-world difference or relationship between the two groups or variables.
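
The sketch below walks through these four steps for an independent-samples t-test on made-up data (numpy and scipy assumed available); scipy's `t.ppf` stands in for the critical-value table a textbook would provide.

```python
import numpy as np
from scipy import stats

# Illustrative data only.
group_a = np.array([4.5, 3.8, 5.1, 4.2, 3.6, 4.0])
group_b = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 0.7])

# Step 1: alpha = .05, two-tailed test.
alpha = 0.05

# Step 2: the critical value for n1 + n2 - 2 degrees of freedom.
n1, n2 = len(group_a), len(group_b)
df = n1 + n2 - 2
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Step 3: the observed test statistic (pooled-variance t).
pooled_var = ((n1 - 1) * group_a.var(ddof=1) +
              (n2 - 1) * group_b.var(ddof=1)) / df
t_observed = (group_a.mean() - group_b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Step 4: compare and conclude.
print(f"observed t = {t_observed:.2f}, critical t = {t_critical:.2f}")
if abs(t_observed) > t_critical:
    print("observed > critical: reject the null hypothesis")
else:
    print("observed <= critical: retain the null hypothesis")
```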

Determining Statistical Significance Using Software Packages:

  1. Determine your thresholds and tailed tests: Before performing any analyses, decide what your alpha value is (.05 or .01), and whether you are performing a one-tailed or two-tailed test.
  2. Calculate your test statistic: With your parameters set, perform the calculations needed. Your observed test statistic is the final numerical result, and the software package will report the specific p value alongside it.
  3. Compare your observed p value (Step 2) to your alpha value (Step 1), and draw your conclusions (see the sketch after this list):

    a. If your p value is less than your alpha value (p < .05), reject the null hypothesis. This means that the probability of this finding occurring by chance is less than your alpha level (e.g., 5%), and is evidence in support of a likely real-world difference or relationship between the two groups or variables.

    b. If your p value is greater than your alpha value (p > .05), retain the null hypothesis. This means that the probability of this finding occurring by chance is greater than your alpha level (e.g., 5%), and suggests that there is no evidence of a real-world difference or relationship between the two groups or variables.
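
Here is a minimal sketch of the same workflow with a software package, using scipy as an assumed example; the package reports the exact p value, which is then compared against the alpha chosen in Step 1.

```python
from scipy import stats

# Step 1: threshold chosen before the analysis.
alpha = 0.05

# Illustrative data only.
group_a = [4.5, 3.8, 5.1, 4.2, 3.6, 4.0]
group_b = [1.2, 0.8, 1.5, 0.9, 1.1, 0.7]

# Step 2: the software returns the test statistic and its exact p value.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Step 3: compare p to alpha and draw the conclusion.
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: retain the null hypothesis")
```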

Effect Sizes:

Just because a result has statistical significance, it doesn't mean that the result has any real-world importance. To help ‘translate’ the result to the real world, we can use an effect size. An effect size is a numerical index of how much your dependent variable of interest is affected by the independent variable, and helps determine whether the observed effect is important enough to matter in the real world. Therefore, effect sizes should be interpreted alongside your significance results. There are two main types of effect size. The first is Cohen’s d, which indexes the size of the difference between two groups in units of standard deviation; a score of 0.2 is a small effect, 0.5 is a medium effect, and 0.8 is a large effect. The other is eta-squared, which measures the strength of the relationship between two variables; a score of .05 is a weak effect, .10 is a medium effect, and .15 is a strong effect. Both effect sizes can be calculated by hand, or they can be calculated for you by statistics software.
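
The sketch below shows one common way to compute Cohen's d by hand, using a pooled standard deviation; the data are made up for illustration.

```python
import numpy as np

group_a = np.array([4.5, 3.8, 5.1, 4.2, 3.6, 4.0])  # illustrative values only
group_b = np.array([1.2, 0.8, 1.5, 0.9, 1.1, 0.7])

# Pooled standard deviation of the two groups.
n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n1 - 1) * group_a.var(ddof=1) +
                     (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2))

# Cohen's d: the difference between group means in standard-deviation units.
cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")  # 0.8 or above would count as a large effect
```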
