Hypothesis Testing

Hypothesis testing is a statistical method for deciding whether sample data provide enough evidence to reject an assumption about a population. A hypothesis can be any assumption that can be checked against data. For example, we can apply hypothesis testing to find the better-performing version of a product: the claim that both versions perform the same is the null hypothesis, and the claim that one version performs better is the alternate hypothesis.

The null hypothesis is the default assumption: the outcome we consider most likely unless the data give strong evidence against it.

Hypothesis tests are typically done for two reasons:

  1. To check an established process or accepted claim against data.

  2. To use a null hypothesis to decide between two competing statements.

For example:

Consider a webpage that normally gets an average user session duration of five minutes. We make some changes to the page in order to increase session duration. After the changes, across 1000 observed sessions, the average session duration increased by 10 minutes.

So here the null hypothesis would be:

H0: Average session duration is the same after the change.

The alternate hypothesis is:

Ha: Average session duration increased after the change.

In a table of probability distributions, we look for the edge value that marks the last acceptable observation under H0. As long as the observed value does not go past this edge, it is consistent with the null hypothesis. This edge is what statistical significance refers to.

You can learn how to read probability distribution tables from the following link:

https://www.accessengineeringlibrary.com/content/book/9780071432085/back-matter/appendix1

Statistical Significance

Assume the data gathered for the hypothesis test is normally distributed. That means the probability of each individual observation follows the normal curve: values near the mean are the most likely, and values far from it are the least likely.

Statistical significance is a threshold, either a minimum or a maximum, on a probability distribution that marks where the most probable values of an observation end.

The null statement in the hypothesis test refers to observations that are considered the norm or most common. This is why we define a border value that separates the most and least probable values in the observations.

Once the data is collected according to the hypotheses, we need a standard to test it against. We require a threshold in order to reject or retain the null hypothesis. This threshold, which bounds the observations most likely to support the null hypothesis, can be found using a probability chart.

Confidence, or confidence level, can be used to represent statistical significance.

In Python’s statsmodels library, Excel, or SPSS, you might find a ‘conf’ return value in statistical tests.
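
To make the link between confidence level and threshold concrete, here is a minimal sketch using only Python's standard library. It assumes the test statistic follows a standard normal distribution under the null hypothesis. The function name `critical_z` is made up for this illustration and is not part of any library.

```python
from statistics import NormalDist


def critical_z(confidence: float, two_tailed: bool = False) -> float:
    """Return the z value that bounds the acceptance region under H0.

    Assumes the test statistic is standard normal when H0 is true.
    """
    alpha = 1.0 - confidence  # significance level
    # A two-tailed test splits alpha across both ends of the curve.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


one_tailed_cutoff = critical_z(0.95)
two_tailed_cutoff = critical_z(0.95, two_tailed=True)
print(f"one-tailed cutoff at 95% confidence: {one_tailed_cutoff:.3f}")
print(f"two-tailed cutoff at 95% confidence: {two_tailed_cutoff:.3f}")
```

A 95% confidence level corresponds to a significance level of 0.05. An observed test statistic beyond the cutoff falls in the region where we would reject the null hypothesis.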

P-value

A probability chart can be used to determine the range of observations most likely to support the null hypothesis.

So we calculate the probability related to the alternate hypothesis. In short, the p-value is the probability, assuming the null hypothesis is true, of getting observations at least as extreme as the ones actually observed. If the p-value falls below the significance level, which places the observation in the region belonging to the alternate hypothesis, it can be used to support the alternate hypothesis.
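
As a rough sketch of how a p-value could be computed for the webpage example, the snippet below runs a one-sided z-test with Python's standard library. The sample mean of 5.2 minutes and the standard deviation of 3.0 minutes are made-up numbers for illustration, not figures from the example, and the function name `one_sided_p_value` is hypothetical. With 1000 sessions, the sample standard deviation is used as a stand-in for the population value.

```python
import math
from statistics import NormalDist


def one_sided_p_value(sample_mean, null_mean, std_dev, n):
    """Return (z, p) where p = P(mean at least this large | H0 is true)."""
    standard_error = std_dev / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    p = 1.0 - NormalDist().cdf(z)  # area in the upper tail beyond z
    return z, p


# Hypothetical numbers: baseline of 5 minutes, 1000 observed sessions.
z, p = one_sided_p_value(sample_mean=5.2, null_mean=5.0, std_dev=3.0, n=1000)
alpha = 0.05  # significance level, i.e. 95% confidence

print(f"z = {z:.2f}, p-value = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The smaller the p-value, the less likely the observed data would be if the null hypothesis were true.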

A persona is an ideal type of audience that’s the focus of a marketing group for a product. It is very important to understand who you’re selling to and when.

For example:

A smartphone company is about to launch its 5G phone around Diwali. Before launching the new 5G model, it needs to clear out its unsold stock, so the company is selling that stock during the festival. What kind of audience should it target to sell the old stock, and when is the right time to bring the new phones to market?

There can be two hypothesis tests here: one for the audience and one for the timing of the sale.

1st test

Hypothesis Formulation

H0 : The remaining stock should be sold before Diwali begins, to avoid drawing attention away from the new stock.

(Common practice in the market)

Ha : New stock will still sell if we sell the old stock at a discount.

Data Gathering

The next step is to gather sales data from a previous year in which old stock was sold in a similar way. In this case, we can’t use current data.

2nd Test

We can use the low-budget buyer persona and take specific steps to get the attention of the audience under that persona.

The company can focus on buyers who match this persona, since they are likely to buy the old stock under various discount schemes.

Many e-commerce websites have discount sales before big festivals. For example, the Great Amazon Festival and Flipkart’s Big Billion Days.

Hypothesis Formulation

H0 : Ads focused on discounted prices are set for the low-budget buyer persona.

Ha : Focused ads won’t change the sale numbers.

Data Gathering

Gather data about the audiences that clicked the ads and those that bought. Those who clicked and those who bought should be treated as different groups.

Types of Tailed Test

  • One-tailed test
  • Two-tailed test

One-tailed test

One-tailed tests have one direction: the alternate hypothesis mean is either larger or smaller than the null hypothesis mean, that is, µa > µ0 or µa < µ0.

Two-tailed test

Two-tailed tests cover both directions: the alternate hypothesis mean is only considered different from the null hypothesis mean (µa ≠ µ0), whether it is larger or smaller.
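
The choice of tail changes how the p-value is read from the same test statistic. The sketch below illustrates this with Python's standard library, assuming a z statistic that is standard normal under the null hypothesis. The function `p_value` and its `alternative` labels are named here for illustration only, and the z value of 1.8 is a made-up input.

```python
from statistics import NormalDist


def p_value(z: float, alternative: str = "two-sided") -> float:
    """p-value for a z statistic under a standard normal null distribution."""
    cdf = NormalDist().cdf
    if alternative == "greater":    # Ha: alternate mean is larger
        return 1.0 - cdf(z)
    if alternative == "less":       # Ha: alternate mean is smaller
        return cdf(z)
    if alternative == "two-sided":  # Ha: alternate mean is different
        return 2.0 * (1.0 - cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.8  # hypothetical test statistic
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: p = {p_value(z, alt):.4f}")
```

For a positive z, the two-sided p-value is double the "greater" one, so the same data can be significant at the 0.05 level in a one-tailed test but not in a two-tailed test.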

Types of Sampled Test

  • One-sample test
  • Two-sample test

One-sample test

With a one-sample test, you have data from only one sample. You test whether its mean differs from a known or hypothesized population mean.

Two-sample test

Two-sample tests have data from two samples. We compare the two population means for differences.

Types of Hypothesis Tests

T-test

Assumptions

  • In the T-test, we compare the means of two populations.
  • Standard deviation is unknown.
  • Sample size is less than or equal to 30.
  • Default null hypothesis is no difference between the sample means.
  • Data is normally distributed.
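
To show what the T-test actually computes, here is a minimal sketch of the t statistic for the one-sample and two-sample cases, using only Python's standard library. The function names and the session-duration numbers are invented for illustration. The two-sample version uses Welch's form, which does not assume equal variances; that is a choice made for this sketch, not something stated above.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, hypothesized_mean):
    """t statistic for one sample against a hypothesized population mean."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return (mean(sample) - hypothesized_mean) / standard_error


def two_sample_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(
        stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    )
    return (mean(a) - mean(b)) / standard_error


# Hypothetical session durations in minutes, before and after a change.
before = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2]
after = [5.6, 5.9, 5.4, 6.1, 5.8, 5.7]

print(f"one-sample t (after vs. 5.0): {one_sample_t(after, 5.0):.2f}")
print(f"two-sample t (after vs. before): {two_sample_t(after, before):.2f}")
```

The resulting t value is then compared against the critical value from a t-distribution table for the chosen significance level and degrees of freedom.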

Z-test

Assumptions

  • Standard deviation is known.
  • Data samples are independent and drawn from a normal distribution.
  • Sample size can be greater than 30.
  • In the Z-test, we calculate the difference between two means.
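
Under these assumptions, a Z-test for the difference between two means could look like the sketch below. It uses only Python's standard library. The function name `two_sample_z_test`, the simulated data, and the known standard deviation of 2.0 are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean


def two_sample_z_test(a, b, sigma_a, sigma_b):
    """Two-tailed z-test for the difference of two means, known std devs."""
    standard_error = math.sqrt(sigma_a ** 2 / len(a) + sigma_b ** 2 / len(b))
    z = (mean(a) - mean(b)) / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Simulated samples (not real data): two page versions, 50 sessions each,
# drawn from normal distributions with a known standard deviation of 2.0.
rng = random.Random(42)
version_a = [rng.gauss(5.0, 2.0) for _ in range(50)]
version_b = [rng.gauss(6.0, 2.0) for _ in range(50)]

z, p = two_sample_z_test(version_b, version_a, sigma_a=2.0, sigma_b=2.0)
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

Because the standard deviations are treated as known, the statistic is compared against the normal distribution rather than the t-distribution.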
