# Basics of Statistics – Hypothesis


Today, I want to discuss a very important topic in inferential statistics: the hypothesis. A hypothesis is a claim about a population that we test against evidence collected from data over a certain period. In statistics, we frame a hypothesis in two parts: the assumption that is already defined goes under the null hypothesis, and anything outside that assumption falls under the alternative hypothesis.

For example, a water bottle manufacturer labels its bottles with a standard volume of 350 ml. Customers will be happy if a bottle contains exactly 350 ml of water, and a little more is also fine for them. From the manufacturer’s point of view, however, each bottle must be filled with exactly 350 ml of purified water. Why?

If the bottles are filled with less than 350 ml of water, the company is cheating the customer, which is not expected from a good and reputable company. But if the bottles are filled with more than 350 ml, the company gives up part of its profit, which is also not desirable.

So, how do we solve this problem statistically? First, we need to collect samples of the product from each batch, each with a reasonably large sample size. Next, we calculate the mean of the actual fill volume for each sample and plot the sampling distribution of these means.
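The sampling step can be sketched in Python. This is only an illustration on simulated data: the fill variation (`ASSUMED_SIGMA_ML`), the number of batches, and the sample size are hypothetical values chosen for the example, not measurements from a real production line.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the simulated numbers are repeatable

TARGET_ML = 350.0        # labelled volume from the example
ASSUMED_SIGMA_ML = 2.0   # hypothetical fill variation (assumption)
BATCHES = 30             # hypothetical number of batches (assumption)
SAMPLE_SIZE = 50         # hypothetical bottles measured per batch (assumption)


def simulate_batch(n):
    """Return n simulated fill volumes in ml for one batch."""
    return [random.gauss(TARGET_ML, ASSUMED_SIGMA_ML) for _ in range(n)]


# One mean per batch: together these means form the sampling distribution.
sample_means = [mean(simulate_batch(SAMPLE_SIZE)) for _ in range(BATCHES)]

print(f"lowest sample mean:  {min(sample_means):.2f} ml")
print(f"highest sample mean: {max(sample_means):.2f} ml")
print(f"mean of the means:   {mean(sample_means):.2f} ml")
```

A histogram of `sample_means` would give the sampling distribution plot. In practice the simulated batches would be replaced by real measurements.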

Now, let us take H0 as the null hypothesis (mean volume = 350 ml) and H1 as the alternative hypothesis (mean volume > 350 ml or < 350 ml). We now need to set the boundary of acceptance. Let us assume a 95% confidence level: any sample mean that falls inside the central 95% area of the distribution curve under the null hypothesis is treated as “NOT REJECTING the null hypothesis”, and any sample mean that falls outside that 95% area is treated as “REJECTING the null hypothesis”.

[Figure: null hypothesis curve with the 95% confidence region]

The boundary z value for the 95% region is ±1.96.
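The 1.96 boundary can be checked with Python's standard library. The helper name `two_sided_critical_z` is made up for this sketch, and it assumes a two-sided test on a standard normal curve.

```python
from statistics import NormalDist


def two_sided_critical_z(confidence):
    """Return the z value that bounds the central `confidence` area
    of the standard normal curve (two-sided test)."""
    alpha = 1.0 - confidence                 # total area left in both tails
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z_95 = two_sided_critical_z(0.95)
print(round(z_95, 2))  # 1.96
```

The remaining 5% is split equally between the two tails, which is why the lookup uses 0.975 rather than 0.95.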

Now, suppose we find that the mean of a sample of size 50 is 350.5 ml. We take it here that this value lies inside the 95% area (the actual z value of the sample mean can be computed with the z statistic, but that is a separate topic). Hence, we cannot reject the null hypothesis in this case. We repeat this practice for different sample or batch sizes of the product, apply the hypothesis test to each, and find the percentage of samples, out of the total set, for which the null hypothesis is not rejected.
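For readers who want to see the z calculation, here is a minimal sketch. The article gives the sample mean (350.5 ml) and the sample size (50) but not the spread of the fill volumes, so the population standard deviation `ASSUMED_SIGMA_ML = 2.0` is an assumption made only for this example.

```python
from math import sqrt

MU_0 = 350.0             # mean volume under the null hypothesis (ml)
SAMPLE_MEAN = 350.5      # observed sample mean from the example (ml)
N = 50                   # sample size from the example
ASSUMED_SIGMA_ML = 2.0   # hypothetical population standard deviation (assumption)
Z_CRITICAL = 1.96        # boundary of the 95% region, two-sided


def z_statistic(sample_mean, mu_0, sigma, n):
    """z = (sample mean - hypothesised mean) / standard error of the mean."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))


z = z_statistic(SAMPLE_MEAN, MU_0, ASSUMED_SIGMA_ML, N)
reject_null = abs(z) > Z_CRITICAL

print(f"z = {z:.2f}, reject H0: {reject_null}")
```

With the assumed standard deviation of 2 ml, z is about 1.77, which lies inside ±1.96, so the null hypothesis is not rejected. A smaller standard deviation would give a larger z and could change the decision.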

Suppose our limit of acceptance is 99%. If more than 99% of the tested samples are consistent with the null hypothesis (that is, more than 99% of all sample means lie inside the 95% acceptance region of the null hypothesis curve), then we can release the product to the market with little risk of customer complaints. In the opposite case, if fewer than 99% of the samples fall inside the acceptance region, we should stop the production and distribution of that product and plan for future improvement.
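The release decision can be sketched as a small function. The list of sample means, the standard deviation, and the function name `share_not_rejected` are all hypothetical, chosen only to show the mechanics.

```python
from math import sqrt

MU_0 = 350.0             # mean volume under the null hypothesis (ml)
ASSUMED_SIGMA_ML = 2.0   # hypothetical population standard deviation (assumption)
N = 50                   # bottles per sample
Z_CRITICAL = 1.96        # boundary of the 95% acceptance region
ACCEPTANCE_LIMIT = 0.99  # required share of samples inside the region


def share_not_rejected(sample_means, mu_0, sigma, n, z_critical):
    """Fraction of sample means whose z value lies inside +/- z_critical."""
    standard_error = sigma / sqrt(n)
    inside = [m for m in sample_means
              if abs((m - mu_0) / standard_error) <= z_critical]
    return len(inside) / len(sample_means)


# Hypothetical sample means from five batches (illustrative values only).
sample_means = [350.1, 349.8, 350.5, 350.9, 349.6]

share = share_not_rejected(sample_means, MU_0, ASSUMED_SIGMA_ML, N, Z_CRITICAL)
release_product = share > ACCEPTANCE_LIMIT

print(f"share inside acceptance region: {share:.0%}")
print(f"release product: {release_product}")
```

In this made-up data set, the batch with a mean of 350.9 ml falls outside the acceptance region, so only 4 of 5 samples pass and the 99% limit is not met.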

I hope I was able to explain the concept of hypothesis in a simple way. Please share your comments if you have any suggestions.

IT engineer, Machine Learning enthusiast and aqua hobbyist.