
Statistics 101 – Inference and Hypothesis Testing (Part 1 of 3)

As a generalist consultant you are unlikely to need statistics for day-to-day project work (there are specialists to call on when they are needed). The workaday numerical tool is Excel, which, with the Analysis ToolPak add-in, gives most consultants more than enough of what they need.

However, you are likely to build a lot of financial models (again in Excel): models that culminate in a P&L for a business unit, project it into the future, and then vary assumptions on the drivers of value (market demand, pricing sensitivities, COGS, SG&A). Typically, strategies are then evaluated with standard approaches (e.g. NPV of the project vs NPV of the next best alternative) and sensitivities are run on the results.
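As a rough sketch of the NPV comparison mentioned above, here is a minimal Python example; all cash flows and the discount rate are hypothetical numbers invented for illustration, not data from any real project:

```python
# A minimal NPV comparison sketch; all figures are hypothetical.

def npv(rate, cash_flows):
    """Net present value, with cash_flows[0] occurring at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

project   = [-1000, 300, 400, 500, 400]  # initial outlay, then annual inflows
next_best = [-800, 250, 300, 350, 300]   # the next best alternative

rate = 0.10  # assumed discount rate
print(f"Project NPV:   {npv(rate, project):.2f}")
print(f"Next best NPV: {npv(rate, next_best):.2f}")
```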

Nonetheless, statistical knowledge can come in handy, and it is valuable to understand the concepts well enough to apply them to your business scenarios.

Whenever we observe data, we are usually observing one or a few samples from a much larger population. For example, if we are looking at daily stock market returns for GOOGL in 2018, we are looking at a sample from one particular period. Often we try to understand the characteristics of the underlying population, all of GOOGL's daily returns, from the sample that we collected. The process we follow to infer those characteristics is called statistical inference.

To draw conclusions about the underlying population, we must first define a few common terms:

  • Mean – the average value in a sample ($latex \bar{x} $) or population ($latex \mu $)
  • Variance ($latex \sigma^{2} $) – a measure of how far observations are spread out from the mean in a sample or population (technically, the average squared difference from the mean)
  • Standard Deviation – the square root of the variance ($latex \sigma $)
  • Probability Distribution – a description of how values in a sample or population are distributed (e.g. the normal, log-normal, and exponential distributions). It gives the likelihood that an observation will fall within a given range of the possible values.
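A quick Python sketch of the first three quantities; the returns array here is invented purely for illustration:

```python
import numpy as np

# A hypothetical sample of daily returns, invented for illustration
returns = np.array([0.012, -0.004, 0.007, -0.011, 0.003, 0.009, -0.006])

mean = returns.mean()        # x-bar, the sample mean
var  = returns.var(ddof=1)   # sample variance (divides by n - 1)
std  = returns.std(ddof=1)   # sample standard deviation

print(f"mean = {mean:.4f}, variance = {var:.6f}, std dev = {std:.4f}")
```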

Distribution of Sample Means and the Confidence Interval

When we draw a sample from the population, we usually do not know the shape of the underlying distribution. If we draw multiple samples from the population and plot the distribution of the means of each sample, we will get the distribution of sample means. This distribution has three special properties:

  1. The distribution is approximately normal for sufficiently large samples, regardless of the shape of the underlying population (this is the Central Limit Theorem)
  2. The mean of the distribution is the mean of the population
  3. The standard deviation of the distribution is called the standard error

However, collecting enough samples to build this distribution empirically is time consuming. Instead, we can estimate its standard deviation (the standard error) from a single sample using the following rule:

$latex \text{Standard Error} = \dfrac{\text{Standard Deviation}}{\sqrt{\text{Sample Size}}} $

$latex \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} $

We also need to make a few assumptions:

  • The sample is randomly drawn
  • The observations are independent
  • The sample size is sufficiently large, n > 30 is a rule of thumb
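Given these assumptions, a short simulation makes the properties concrete. The sketch below draws many samples from a deliberately skewed (exponential) population and checks that the spread of the sample means matches $latex \sigma / \sqrt{n} $; the population and sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100             # size of each sample
n_samples = 10_000  # number of samples to draw

# Draw samples from a deliberately non-normal (exponential) population
samples = rng.exponential(scale=1.0, size=(n_samples, n))
sample_means = samples.mean(axis=1)

sigma = 1.0  # std dev of an exponential population with scale 1
print("std of sample means:", sample_means.std())  # simulated standard error
print("sigma / sqrt(n):    ", sigma / np.sqrt(n))  # ~0.1, matching the rule
```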

Knowing the standard error and the fact that the distribution is normal, we then only need to know the mean in order to build the distribution. For example, suppose that the average daily return of GOOGL in 2018 was -3.57% and the standard deviation of the daily returns was 1%. The standard error is then roughly 1% / sqrt(300) ≈ 0.06% (using 300 as the number of business days on which GOOGL's stock traded during the year). Based on the normal distribution, we can be 95% sure that the sample mean ($latex \bar{x} $) is within a certain distance (1.96 times the standard error ($latex \sigma_{\bar{x}} $)) of the true population mean ($latex \mu $):

$latex \mu - 1.96\sigma_{\bar{x}} \le \bar{x} \le \mu + 1.96\sigma_{\bar{x}} $

If we rearrange this equation, we can similarly infer that the true population mean ($latex \mu $) is within a certain distance (1.96 times the standard error ($latex \sigma_{\bar{x}} $)) of the sample mean ($latex \bar{x} $) with a 95% confidence level:

$latex \bar{x} - 1.96\sigma_{\bar{x}} \le \mu \le \bar{x} + 1.96\sigma_{\bar{x}} $

$latex -3.57\% - 1.96\dfrac{1\%}{\sqrt{300}} \le \mu \le -3.57\% + 1.96\dfrac{1\%}{\sqrt{300}} $

$latex -3.68\% \le \mu \le -3.46\% $

The range described above is known as the 95% confidence interval: if we repeated the sampling many times, intervals constructed this way would contain the true population mean 95% of the time.
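The interval can be reproduced in a few lines of Python; the 1.96 multiplier is the standard normal value that leaves 2.5% in each tail:

```python
import math

x_bar = -0.0357  # sample mean of daily returns
sigma = 0.01     # standard deviation of daily returns
n = 300          # trading days in the sample

se = sigma / math.sqrt(n)  # standard error of the mean
z = 1.96                   # normal critical value for 95% confidence

low, high = x_bar - z * se, x_bar + z * se
print(f"95% CI: [{low:.2%}, {high:.2%}]")  # [-3.68%, -3.46%]
```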

Hypothesis Testing

Understanding the true population is important, but we can also gain insights by examining the relative difference between two sets of data. To test for the existence and significance of a difference, we use hypothesis testing, which is an extension of what we did above.

The 5 steps to hypothesis testing are as follows:

  1. Form a good hypothesis – this will be driven by the problem you are trying to solve
  2. Formalize the hypothesis – this will define your test and involves stating a null hypothesis (your default assumption about the data) and an alternative hypothesis (a statement that you will test against the null hypothesis). The two hypotheses are always stated in a way that makes them mutually exclusive. That is, if one is true, the other is false
  3. Find a test statistic – this can be a z-score or a t-score
  4. Analyze the data – find the value of the test statistic and the p-value. The p-value is the probability of obtaining a result at least as far from the null hypothesis as the one observed, assuming the null hypothesis is true. You can use the p-value to tell you whether the difference between your result and the null hypothesis is significant.
  5. Interpret the results – this is a written explanation of your conclusion. Explain whether you are able to reject the null hypothesis based on the statistical evidence.

For example, if we want to see whether GOOGL returns were different from bank interest (say 2%) last year we could follow the five steps outlined above:

  1. Step 1: Our hypothesis is that GOOGL daily returns are on average different from 2%.
  2. Step 2:
    • Null hypothesis ($latex H_{0} $) is what we are looking to reject and is always stated as an equality: that the average return equals 2%.
    • Alternative hypothesis ($latex H_{A} $) is then its complement: that the average return does not equal 2%.
  3. Step 3: As a general rule of thumb, use a t-score when your sample size is below 30 and has an unknown population standard deviation. If you know the standard deviation of the population and your sample size is above 30, then use the z-score.
  4. Step 4: To find the test statistic, we would use

$latex Z = \dfrac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} = \dfrac{-3.57\% - 2.00\%}{\frac{1\%}{\sqrt{300}}} = -96.48 $

This step is typically automated by Excel or software such as R. The test statistic comes out as a z-score: the number of standard errors between the null hypothesis and the sample mean. We can compare this z-score to the critical value (1.96 if we are using the 95% confidence level) to determine statistical significance, but we usually just look at the p-value.

Excel or another software package will normally provide the p-value. It represents the probability of obtaining a sample mean at least as far from the hypothesized value as the one observed, assuming the null hypothesis is true. For example, a one-tailed p-value of 0.02 would mean that, if the null hypothesis were true, there would be only a 2% chance of drawing a sample whose mean lies at least as far below 2% as -3.57% does. We then compare the p-value against a significance threshold called alpha. If the one-tailed p-value is lower than alpha/2 (the per-tail cutoff for a two-tailed test; equivalently, a two-tailed p-value is compared against alpha), there is strong evidence against the null hypothesis and we reject it. If the p-value is larger than the cutoff, the evidence against the null hypothesis is weak, so we fail to reject it.
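A sketch of the same test in Python, using SciPy for the p-value; the inputs are the example's figures (with raw data and an unknown population standard deviation, scipy.stats.ttest_1samp would be the t-test analogue):

```python
import math
from scipy import stats

x_bar = -0.0357  # observed average daily return
mu_0  = 0.02     # hypothesized mean under the null
sigma = 0.01     # (assumed known) standard deviation
n = 300          # trading days in the sample

z = (x_bar - mu_0) / (sigma / math.sqrt(n))  # test statistic
p_two_tailed = 2 * stats.norm.sf(abs(z))     # two-tailed p-value

print(f"z = {z:.2f}")             # z ≈ -96.48
print(f"p = {p_two_tailed:.3g}")  # effectively 0: reject the null
```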

  5. Step 5: We can then translate the statistical jargon into plain English. Based on the test, GOOGL's average daily return last year was significantly different from the return obtained from bank interest (2%).

One-tailed or two-tailed test?

There are two types of hypothesis tests; what we just described is a two-tailed test. The alternative is a one-tailed test, where we test whether the sample mean is greater than (or less than) the hypothesized value. At the same significance threshold (alpha), it is easier to reject the null hypothesis with a one-tailed test, because the rejection zone on the relevant side is double the one used in the two-tailed test (alpha rather than alpha/2); the sketch after the list below makes this concrete.

As the two-tailed test is more conservative, it should be used unless:

  • We have strong outside evidence (business reasons or practical constraints) that the true mean can only deviate in one direction. For example, if we are testing to see if the returns on a high-yield bond, on average, are higher than the returns on a government bond, there is justification for a one-tailed test, or
  • We only care about deviations in one direction; a deviation in the other direction would be treated as if there were no deviation.
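A small numerical sketch of the difference (z = 1.8 is an arbitrary example statistic): the one-tailed p-value is half the two-tailed one, so the same evidence can clear alpha = 0.05 one-tailed while failing it two-tailed.

```python
from scipy import stats

z = 1.8  # an arbitrary example test statistic

p_two_tailed = 2 * stats.norm.sf(abs(z))  # deviation in either direction
p_one_tailed = stats.norm.sf(z)           # deviation in one direction only

print(f"two-tailed p = {p_two_tailed:.4f}")  # ~0.0719: fail to reject at alpha = 0.05
print(f"one-tailed p = {p_one_tailed:.4f}")  # ~0.0359: reject at alpha = 0.05
```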


Jason Oh is a management consultant at Novantas with expertise in scaling profitability for retail banks (consumer / commercial finance) and diversified financial service firms (credit card / wealth management / direct bank).

