Chi-squared Test of Independence

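The chi-squared test of independence is a non-parametric test used to determine whether there is a significant relationship between two categorical variables. It compares the observed frequency of each combination of categories with the frequency that would be expected if the two variables were unrelated.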

Contact at TJT@TechnicalJockey.com , if you are looking for an Instructor Based Online Training or Corporate Training in R !

The assumptions of the chi-square test are:

1. The data in the cells are frequencies or counts of cases.

2. The levels of the variables are mutually exclusive.

3. Each subject contributes data to one and only one cell.

4. The groups are independent. There is no interdependency between the groups being compared.

5. The variables are categorical, or have been converted to categorical form.

6. The sample data are displayed in a contingency table, and the expected frequency count for each cell of the table is at least 5.
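
The expected-count condition in assumption 6 can be checked before running the test. The sketch below is only an illustration: the two tables are made-up counts, and expected_counts() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: expected counts under independence,
# computed from the row and column totals of a count table.
expected_counts <- function(tbl) {
  outer(rowSums(tbl), colSums(tbl)) / sum(tbl)
}

# Made-up counts, for illustration only
large_table <- matrix(c(30, 10, 20, 20, 10, 30), nrow = 2)
small_table <- matrix(c(3, 2, 1, 4), nrow = 2)

# TRUE when every expected count is at least 5
large_ok <- all(expected_counts(large_table) >= 5)
small_ok <- all(expected_counts(small_table) >= 5)

large_ok
small_ok
```

Here large_ok is TRUE and small_ok is FALSE, so only the first table meets the assumption.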

Expected frequencies :

The expected frequency is calculated for each cell in a contingency table as:

E = (nr × nc) / n

Where

E - the expected value of the cell,

nr - the total number of sample observations in row level r,

nc - the total number of sample observations in column level c,

n - the total sample size.
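
As a rough illustration of the formula, the expected counts for every cell can be computed at once from the row and column totals. The table below is made up for this sketch and is not the data set used later.

```r
# Made-up 2 x 3 table of counts, for illustration only
observed <- matrix(c(30, 10, 20, 20, 10, 30), nrow = 2,
                   dimnames = list(group = c("A", "B"),
                                   answer = c("yes", "maybe", "no")))

n  <- sum(observed)       # total sample size
nr <- rowSums(observed)   # row totals
nc <- colSums(observed)   # column totals

# E = (nr x nc) / n for every cell: outer() forms all row-by-column products
expected <- outer(nr, nc) / n
expected
```

Each row total is 60, each column total is 40 and n is 120, so every expected count is 60 × 40 / 120 = 20.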


Test statistic :

The test statistic of the chi-square test is:

χ² = ∑ (O − E)² / E

Where

O - the observed value of a cell

E - the expected value of a cell

χ² - the chi-square value

∑ - the sum over all cells of the table
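
The statistic can be computed by hand and compared with the value reported by chisq.test(). This is a minimal sketch on a made-up table, not the data set used later.

```r
# Made-up 2 x 3 table of counts, for illustration only
observed <- matrix(c(30, 10, 20, 20, 10, 30), nrow = 2)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq

# Value computed by R's built-in test, for comparison
unname(chisq.test(observed)$statistic)
```

Four cells differ from their expected count of 20 by 10, each contributing 100 / 20 = 5, so both values are 20.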

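Null hypothesis: there is no association between the two variables.

Alternative hypothesis: there is an association between the two variables.

If the p-value is greater than 0.05, we fail to reject the null hypothesis: the data do not provide enough evidence of an association. If the p-value is less than 0.05, we reject the null hypothesis in favor of the alternative hypothesis. Here 0.05 is the conventional significance level.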

 

Degrees of freedom :

The degrees of freedom are the number of cell counts that are free to vary once the row and column totals of the contingency table are fixed.

The degrees of freedom, df = (Number of rows - 1) × (Number of columns - 1)
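
To show how the degrees of freedom feed into the p-value, the sketch below computes both by hand for a made-up table and compares them with what chisq.test() reports. The 0.05 threshold is the conventional significance level.

```r
# Made-up 2 x 3 table of counts, for illustration only
observed <- matrix(c(30, 10, 20, 20, 10, 30), nrow = 2)

# df = (number of rows - 1) x (number of columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

test <- chisq.test(observed)

# p-value: upper tail of the chi-square distribution with df degrees of freedom
p_value <- pchisq(unname(test$statistic), df = df, lower.tail = FALSE)

df
p_value
p_value < 0.05   # TRUE means the null hypothesis is rejected at the 5% level
```

The hand-computed df and p_value match test$parameter and test$p.value.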

We will use the housetasks data set from STHDA.

We store the online link to the data set in a variable.

file_path <- "http://www.sthda.com/sthda/RDoc/data/housetasks.txt"

We import the data set with the read.delim() function. The argument row.names = 1 uses the first column of the file as row names.

housetasks <- read.delim(file_path, row.names = 1)

We install the "gplots" package for visualization.

install.packages("gplots")

We load the "gplots" package using the following code:

library("gplots")

We want to store the data set as a table. To do this, we first convert the data frame to a matrix with as.matrix() and then convert the matrix to a table with as.table().

dt <- as.table(as.matrix(housetasks))

We transpose the dt table with t(), so that its rows become columns and its columns become rows.

t(dt)

We use balloonplot() to plot the data as dots. In this plot, a dot is bigger when the value in the cell is larger. We use label = FALSE to hide the cell values on the plot, and show.margins = FALSE to hide the row and column totals.

balloonplot(t(dt), main ="housetasks", xlab ="", ylab="", label = FALSE, show.margins = FALSE)

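We use mosaicplot() from the "graphics" package for a second visualization. The "graphics" package ships with R and is attached by default, so it does not need to be installed with install.packages(). Loading it explicitly is optional:

library("graphics")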

We use mosaicplot() to plot how the household tasks are distributed between husband and wife.

The argument shade = TRUE colors each tile according to its residual.

The argument las = 2 produces vertical labels.

mosaicplot(dt, shade = TRUE, las=2, main = "housetasks")

 

From this plot, we can see that the tasks Laundry, Main_meal, Dinner and Breakfast (blue color) are mainly done by the wife.

The chi-square test can be done as :

chisq <- chisq.test(housetasks)

chisq

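In the output, X-squared = 1944.5 is the chi-square value and the degrees of freedom are 36. The p-value is less than 2.2e-16. Because the p-value is far below 0.05, we reject the null hypothesis: the data show an association between the rows and the columns of the table.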

We can see the observed frequencies by using the following code:

chisq$observed

We can see the expected frequencies, rounded to two decimal places, by using the following code:

round(chisq$expected,2)

Pearson residual

Pearson residuals are used to check model fit at each observation in generalized linear models. In a two-way table, they show how far each cell's observed count lies from its expected count. The Pearson residual for a cell is:

r = (O − E) / √E

We can calculate the residuals with the following code:

round(chisq$residuals, 3)
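
As a quick check of the formula, the residuals can be recomputed from the observed and expected counts and compared with the residuals element returned by chisq.test(). The table here is made up for illustration.

```r
# Made-up 2 x 3 table of counts, for illustration only
observed <- matrix(c(30, 10, 20, 20, 10, 30), nrow = 2,
                   dimnames = list(group = c("A", "B"),
                                   answer = c("yes", "maybe", "no")))

test <- chisq.test(observed)

# r = (O - E) / sqrt(E), computed cell by cell
manual_residuals <- (test$observed - test$expected) / sqrt(test$expected)

round(manual_residuals, 3)

# Should agree with the residuals stored in the test result
all.equal(manual_residuals, test$residuals, check.attributes = FALSE)
```

A positive residual means the cell has more observations than expected under independence, and a negative residual means it has fewer.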

The chi-square statistic is the sum of the contributions from the individual cells. Each contribution is the squared Pearson residual of the cell, (O − E)² / E.

If an individual contribution is high, it is because the expected value is low, because the difference between the observed and the expected value is large, or both. If a variable has more than two levels, you might like to consider whether the distinction between one specific level and all the others would be significant on its own.

We can see the chi-square value with chisq$statistic.

We can also find the contribution of each cell (each combination of row and column) to the chi-square value.

It is the squared residual divided by the chi-square value, expressed as a percentage.

contrib <- 100*chisq$residuals^2/chisq$statistic

round(contrib, 3)
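
Because the squared residuals add up to the chi-square value, the percentage contributions add up to 100. The sketch below checks this on a made-up table and lists the cells from largest to smallest contribution.

```r
# Made-up 2 x 3 table of counts, for illustration only
observed <- matrix(c(30, 10, 20, 20, 10, 30), nrow = 2,
                   dimnames = list(group = c("A", "B"),
                                   answer = c("yes", "maybe", "no")))

test <- chisq.test(observed)

# Percentage contribution of each cell to the chi-square value
contrib <- 100 * test$residuals^2 / test$statistic

# The contributions of all cells sum to 100
total <- sum(contrib)
total

# One row per cell, sorted from largest to smallest contribution
contrib_df <- as.data.frame(as.table(contrib))
contrib_df[order(contrib_df$Freq, decreasing = TRUE), ]
```

Sorting the cells this way makes it easier to see which combinations drive the association.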

All rights reserved @ Instrovate Technologies