Data Bias
Categorical Variables - Chi-Sqaure
Hypotheses
H_0: tested variables are independent (No Bias)
H_a Bias: tested variables are not independent (Bias)
# create a crosstab of variables to be tested
crosstab = pd.crosstab(df['sex'], df['class'])
from scipy import stats
from scipy.stats import chi2_contingency
stats.chi2_contingency(crosstab)
The first value is the Chi-square value, followed by the p-value, then comes the degrees of freedom, and lastly it outputs the expected frequencies as an array.
For the Chi-square test assumes that expected frequencies will be greater or equal to 5. If a cell has an expected frequency less that 5, then the Fisher’s Exact test should be use to overcome this problem.
Reject the null hypothesis if the p-value is less than 0.05 and concludes that the results indicate that there is a relationship between the two variables.