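The chi-square test of independence is a nonparametric test used to determine whether there is a significant relationship between two categorical variables. It compares the observed frequencies for each combination of categories with the frequencies we would expect if the two variables were independent.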
Contact at TJT@TechnicalJockey.com , if you are looking for an Instructor Based Online Training or Corporate Training in R !
The assumptions of the chi-square test are:
1. The data in the cells should be frequencies, or counts of cases.
2. The levels (categories) of each variable are mutually exclusive.
3. Each subject contributes data to one and only one cell.
4. The groups must be independent; there is no interdependency between the groups being compared.
5. The variables should be categorical, or the data should be converted into categorical form.
6. When the sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
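Assumption 6 can be checked in code before relying on the test. The sketch below is a minimal example under stated assumptions: `expected_counts_ok()` is a hypothetical helper (not part of any package), and `large` and `small` are made-up toy tables. It reads the `expected` component returned by `chisq.test()`. Note that `chisq.test()` itself warns "Chi-squared approximation may be incorrect" when any expected count is below 5; the helper silences that warning because it reports the condition explicitly.

```r
# Hypothetical helper: TRUE when every expected cell count is at least `min_count`
expected_counts_ok <- function(tab, min_count = 5) {
  # chisq.test() warns about small expected counts; silence that here
  # because this function reports the condition itself
  expected <- suppressWarnings(chisq.test(tab))$expected
  all(expected >= min_count)
}

# Made-up toy tables, for illustration only
large <- matrix(c(40, 10, 10,
                  10, 20, 10), nrow = 2, byrow = TRUE)
small <- matrix(c(3, 1, 2, 4), nrow = 2)

print(expected_counts_ok(large))  # TRUE: smallest expected count is 8
print(expected_counts_ok(small))  # FALSE: expected counts are only 2 and 3
```

If the check fails, the chi-square p-value should be treated with caution.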
Expected frequencies:
The expected frequency of a cell in a contingency table is the count we would expect if the two variables were independent. It is calculated as:
$$E_{rc} = \frac{n_r \times n_c}{n}$$
Where
$E_{rc}$ represents the expected value of the cell in row r and column c,
$n_r$ represents the total number of sample observations in row r,
$n_c$ represents the total number of sample observations in column c,
$n$ represents the total sample size.
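The formula can be applied to every cell at once with base R's `outer()`, which multiplies each row total by each column total. This is a minimal sketch assuming a small made-up table `tab` in place of housetasks; the result should match the `expected` component that `chisq.test()` returns.

```r
# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(40, 10, 10,
                10, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              choice = c("X", "Y", "Z")))

n_r <- rowSums(tab)  # row totals: 60, 40
n_c <- colSums(tab)  # column totals: 50, 30, 20
n <- sum(tab)        # total sample size: 100

# E = n_r * n_c / n for every (row, column) pair
expected <- outer(n_r, n_c) / n
print(expected)  # row A: 30 18 12, row B: 20 12 8

# Expected counts as computed by chisq.test(), for comparison
builtin <- chisq.test(tab)$expected
```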
Test statistic:
The test statistic of the chi-square test is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where
$O$ is the observed value of a cell,
$E$ is the expected value of that cell,
$\chi^2$ is the chi-square value, and
$\sum$ denotes the sum over all cells of the table.
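Computing the statistic by hand makes the formula concrete. This sketch again assumes a made-up table `tab`. It is a 2 x 3 table on purpose: for 2 x 2 tables `chisq.test()` applies Yates' continuity correction by default (`correct = TRUE`), so its statistic would then differ from the plain formula.

```r
# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(40, 10, 10,
                10, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              choice = c("X", "Y", "Z")))

# Expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Sum of (O - E)^2 / E over all cells
chi_sq <- sum((tab - expected)^2 / expected)
print(chi_sq)  # about 18.06

# Statistic reported by chisq.test(), for comparison
builtin <- unname(chisq.test(tab)$statistic)
```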
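Null hypothesis: assumes that there is no association between the two variables (they are independent).
Alternative hypothesis: assumes that there is an association between the two variables.
If the p-value is greater than the significance level (commonly 0.05), we fail to reject the null hypothesis: the data give no evidence of an association. If the p-value is less than 0.05, we reject the null hypothesis in favor of the alternative and conclude that the variables are associated.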
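The decision rule can be written directly in R. A minimal sketch on a made-up table `tab`: the p-value is the upper-tail probability of the chi-square distribution, `pchisq(statistic, df, lower.tail = FALSE)`, which is what `chisq.test()` reports as `p.value`. The value `alpha = 0.05` is the conventional significance level, not a requirement.

```r
# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(40, 10, 10,
                10, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              choice = c("X", "Y", "Z")))

res <- chisq.test(tab)

# Upper-tail probability of the chi-square distribution
p_manual <- pchisq(res$statistic, df = res$parameter, lower.tail = FALSE)

alpha <- 0.05  # conventional significance level
decision <- if (res$p.value < alpha) {
  "reject H0: the variables appear to be associated"
} else {
  "fail to reject H0: no evidence of association"
}

print(unname(p_manual))
print(decision)
```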
Degrees of freedom:
For a contingency table, the degrees of freedom are the number of cell counts that can vary freely once the row and column totals are fixed.
The degrees of freedom are $df = (r - 1) \times (c - 1)$, where r is the number of rows and c is the number of columns.
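The degrees of freedom follow directly from the table's dimensions. A sketch on a made-up 2 x 3 table `tab`, checked against the `parameter` component of `chisq.test()`. The same formula gives the 36 degrees of freedom reported for housetasks, which has 13 tasks (rows) and 4 columns: (13 - 1) x (4 - 1) = 36.

```r
# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(40, 10, 10,
                10, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              choice = c("X", "Y", "Z")))

df_manual <- (nrow(tab) - 1) * (ncol(tab) - 1)  # (2 - 1) * (3 - 1) = 2
df_builtin <- unname(chisq.test(tab)$parameter)

print(c(manual = df_manual, chisq.test = df_builtin))
```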
We will use the housetasks data set from STHDA and import it directly from its online link.
file_path <- "http://www.sthda.com/sthda/RDoc/data/housetasks.txt"
We read the tab-delimited file with the read.delim() function, using the first column as row names.
housetasks <- read.delim(file_path, row.names = 1)
We install the gplots package for visualization:
install.packages("gplots")
We load the gplots package with the following code:
library("gplots")
We store the data set as a table: as.matrix() converts the data frame into a matrix, and as.table() then converts the matrix into a table.
dt <- as.table(as.matrix(housetasks))
The t() function transposes the table, so that its rows become columns and its columns become rows:
t(dt)
We use balloonplot() to draw the table as a grid of dots, where a larger dot means a larger value. label = FALSE hides the cell values on the plot, and show.margins = FALSE hides the row and column totals.
balloonplot(t(dt), main ="housetasks", xlab ="", ylab="", label = FALSE, show.margins = FALSE)
The mosaicplot() function comes from the graphics package, which ships with base R and is attached by default, so it does not need to be installed. Loading it explicitly is optional:
library("graphics")
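We use mosaicplot() to show how the household tasks are shared between husband and wife.
The argument shade = TRUE colors each tile according to its residual: blue for cells with more observations than expected under independence, red for cells with fewer.
The argument las = 2 draws the axis labels perpendicular to the axes.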
mosaicplot(dt, shade = TRUE, las=2, main = "housetasks")
From this plot, we can see that Laundry, Main_meal, Dinner and breakfast (blue tiles) are done by the wife more often than expected under independence, that is, they are mainly done by the wife.
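The shading can also be read numerically. With shade = TRUE, mosaicplot() draws an extended mosaic in which, by default, tiles are colored by their Pearson residuals under independence, using cut points at absolute values 2 and 4, with positive residuals in blue and negative ones in red. The sketch below, on a made-up table `tab`, lists the cells that cross the first cut point; the `cells` label matrix is an illustrative construction, not part of mosaicplot().

```r
# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(40, 10, 10,
                10, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              choice = c("X", "Y", "Z")))

r <- chisq.test(tab)$residuals

# Label each cell as "row:column"
cells <- outer(rownames(r), colnames(r), paste, sep = ":")

# Cells beyond the first cut point used by shade = TRUE
more_than_expected <- cells[r > 2]    # would be shaded blue
fewer_than_expected <- cells[r < -2]  # would be shaded red

print(round(r, 2))
print(more_than_expected)
print(fewer_than_expected)
```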
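The chi-square test can be run as follows:
chisq <- chisq.test(housetasks)
chisq
In the output, X-squared = 1944.5 is the chi-square statistic, there are 36 degrees of freedom, and the p-value is less than 2.2e-16. Since the p-value is far below 0.05, we reject the null hypothesis: the row variable (the task) and the column variable (who does it) are significantly associated.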
We can view the observed frequencies with the following code:
chisq$observed
We can view the expected frequencies, rounded to two decimal places, with the following code:
round(chisq$expected, 2)
Pearson residuals:
Pearson residuals can be used to check the model fit at each observation for generalized linear models. For a cell in a two-way table, the Pearson residual is:
$$r = \frac{O - E}{\sqrt{E}}$$
We can extract the residuals, rounded to three decimal places, with the following code:
round(chisq$residuals, 3)
The chi-square statistic is the sum of the contributions of the individual cells, where each cell contributes its squared Pearson residual, $r^2 = (O - E)^2 / E$.
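Both the residual formula and the claim that the statistic is the sum of the cell contributions can be checked numerically. A minimal sketch on a made-up 2 x 3 table `tab`, comparing hand-computed residuals with the `residuals` component of `chisq.test()`.

```r
# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(40, 10, 10,
                10, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              choice = c("X", "Y", "Z")))

expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residual for each cell: (O - E) / sqrt(E)
resid_manual <- (tab - expected) / sqrt(expected)
print(round(resid_manual, 3))

res <- chisq.test(tab)

# Each cell contributes r^2; the contributions add up to the statistic
stat_from_resid <- sum(resid_manual^2)
```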
If an individual contribution is high, it is either because the expected value is low or because the difference between the observed and expected values is large. If the independent variable has more than two values, you may want to check whether the distinction between one specific value and all the others is significant.
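One way to make that comparison is to collapse the variable into "one value versus all the others" and rerun the test on the resulting 2 x 2 table. A sketch under stated assumptions: `tab` is a made-up table, the column variable `choice` plays the role of the variable with more than two values, and the choice of `level <- "Y"` is purely illustrative.

```r
# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(40, 10, 10,
                10, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              choice = c("X", "Y", "Z")))

level <- "Y"  # the value to compare against all the others (illustrative choice)
others <- colnames(tab) != level

# Collapse the table into "Y" versus "not Y" (a 2 x 2 table)
collapsed <- cbind(tab[, level], rowSums(tab[, others, drop = FALSE]))
colnames(collapsed) <- c(level, paste0("not_", level))
print(collapsed)

# For 2 x 2 tables chisq.test() applies Yates' continuity correction by default
res <- chisq.test(collapsed)
print(res$p.value)
```

Repeating this for each value shows which specific categories drive the overall association.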
The chi-square value itself is stored in chisq$statistic.
We can also find the contribution (in %) of each cell to the total chi-square score.
It is the squared Pearson residual of the cell divided by the chi-square value, multiplied by 100:
contrib <- 100*chisq$residuals^2/chisq$statistic
round(contrib, 3)
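Because the squared residuals add up to the statistic, the percentage contributions always sum to 100, which makes a useful sanity check. A sketch on a made-up table `tab`, using the same expression as above to find the cell with the largest share.

```r
# Made-up 2 x 3 table of counts, for illustration only
tab <- matrix(c(40, 10, 10,
                10, 20, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              choice = c("X", "Y", "Z")))

chisq <- chisq.test(tab)

# Percentage of the chi-square statistic contributed by each cell
contrib <- 100 * chisq$residuals^2 / chisq$statistic
print(round(contrib, 2))

total <- sum(contrib)  # 100, up to floating-point error
```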

