Using SPSS to Understand Research and Data Analysis.

  • 7.1b The logic of the Chi Square statistic

The preceding discussion suggests that the two variables, gender and masc, are not independent, i.e., there is a relationship between one's gender and one's masculinity level. We will test this hypothesis by examining the frequencies obtained in a crosstabulation of the variables.

To illustrate how we could determine whether or not two variables are related using a crosstabulation table, we will simplify things for the moment and assume that there are only 50 men and 50 women in our sample who have been classified as either low or high in masculinity. (In our ezdata file, there are actually 110 men and 118 women.)

The crosstabulation table of this example would have four cells that comprise a matrix of the four possible combinations of the two levels of the two variables. The data in the cells would be the frequencies (i.e., the number) of men and women who were classified as either low or high in masculinity.

The table would present these four cells as a 2 x 2 contingency matrix. A contingency matrix classifies individuals into a given cell contingent upon their exhibiting a particular combination of one level of the first variable combined with one level of the second variable (e.g., being both a man and in the high masculinity category).

If there is no relationship between gender and masc (i.e., if they are independent), and if we assume for simplicity that half of the sample falls in each masculinity category, then the printout would show just as many low-masculine men as high-masculine men, and the same would hold for women. Further, among high-masculinity employees, there would be an equal frequency of men and women, and the same would be true for low-masculinity employees. In other words, the 100 men and women would distribute themselves evenly across the four cells. Thus, the crosstabulation would show equal frequencies in all four cells of this contingency table (25 per cell), as shown in Table 7.1.

Table 7.1

                          Gender
Masculinity         Male    Female    Total
Low-masculine         25        25       50
High-masculine        25        25       50
Total                 50        50      100

In looking at this hypothetical table, we can see that, indeed, there is no pattern of frequencies beyond what would be expected by chance, indicating that the two variables are independent of each other. Specifically, looking down the columns of Table 7.1, we see that among the total of 50 men, 25 are low-masculine and 25 are high-masculine. The same is true for women - 25 are low-masculine and 25 are high-masculine. Thus, from this hypothetical crosstabulation, we would conclude that men are equally likely to be low-masculine or high-masculine, and so are women.

Further, looking across the rows, among the total of 50 low-masculine employees, 25 are men and 25 are women. And for the 50 high-masculine employees, 25 are men and 25 are women. Thus, we would conclude that men are as likely as women to be low in masculinity, and that women are as likely as men to be high in masculinity. Again, this is the pattern of frequencies that would be expected by chance alone if the two variables are unrelated, as is the case in this table.
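The frequencies "expected by chance alone" can be worked out from the table's totals. The sketch below is an illustration rather than part of the SPSS procedure; it assumes the usual formula for independence (row total times column total, divided by the overall total), and the function name expected_frequencies is made up for this example.

```python
def expected_frequencies(observed):
    """Return the cell counts expected if the row and column variables are independent.

    observed is a list of rows, each row a list of counts.
    Expected count for a cell = (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical table of independent variables:
# rows = low-masculine, high-masculine; columns = male, female
table_independent = [[25, 25], [25, 25]]
expected_independent = expected_frequencies(table_independent)
print(expected_independent)  # [[25.0, 25.0], [25.0, 25.0]]
```

Because every row total and every column total is 50, each expected count is 50 x 50 / 100 = 25, which matches the observed counts exactly.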

However, if the hypothesized relationship really exists, then there would be a pattern of frequencies that is different from chance expectations. A hypothetical example of a pattern reflecting a real relationship between gender and masc is illustrated in Table 7.2.

Table 7.2

                          Gender
Masculinity         Male    Female    Total
Low-masculine         10        40       50
High-masculine        40        10       50
Total                 50        50      100

It can be seen from the hypothetical data in Table 7.2 that there is a higher frequency of men who are high-masculine (40 out of 50) than men who are low-masculine (only 10 out of 50). Further, there is a higher frequency of women who are low-masculine (40 out of 50) than women who are high-masculine (only 10 out of 50). It can also be seen that among the 50 high-masculine employees, there is a higher frequency of men (40) than women (10). Last, among the 50 low-masculine employees, there is a higher frequency of women (40) than men (10).

Thus, this table shows that there is a clear relationship between an employee's gender and his/her masculinity level. Masculinity level varies systematically across gender, with men being more likely than women to be high-masculine and women being more likely than men to be low-masculine.
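One way to make this pattern easier to read is to convert each cell to a percentage of its column total, so that men and women can be compared directly. The following is a small sketch using the hypothetical 10/40/40/10 counts; the function name percent_within_columns is an assumption made for illustration.

```python
def percent_within_columns(observed):
    """Convert each cell to a percentage of its column (here: gender) total."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [
        [100 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in observed
    ]


# Hypothetical table with a relationship:
# rows = low-masculine, high-masculine; columns = male, female
table_related = [[10, 40], [40, 10]]
pct_related = percent_within_columns(table_related)

for label, row in zip(["Low-masculine", "High-masculine"], pct_related):
    print(f"{label:<15} male {row[0]:.0f}%   female {row[1]:.0f}%")
```

With these counts, 80% of the men but only 20% of the women fall in the high-masculine row, which is the systematic difference described above.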

Of course, the above examples were constructed to be very clear-cut. Bivariate frequency distributions often do not lend themselves to such an easy visual determination of whether the frequencies indicate that the two variables are related or not. Only by running the crosstabulation procedure and computing the appropriate statistical test could we answer the question about this hypothesized relationship between gender and masculinity. The statistical test of interest here is called chi square.

As we will see, SPSS will compute a Pearson chi square value that will answer the question of whether the actual data in our ezdata file demonstrate a statistically significant relationship (i.e., a real one) between gender and masculinity, or whether the two variables are statistically independent (i.e., any apparent pattern in the frequencies is not real, but due to random chance).

Chi square is computed by comparing the frequencies actually observed in our sample with those that would be expected to occur by chance alone. If there is a large difference between the observed and the expected frequencies, a large value for chi square will be obtained. More importantly, the probability associated with this chi square value is computed. This probability determines whether the chi square value is statistically significant. The general convention used by researchers is that if this probability is .05 or lower, then we reject random chance as an explanation and conclude that this is a real (statistically significant) relationship.
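The comparison of observed and expected frequencies can be sketched by hand for the two hypothetical tables. This is an illustration under stated assumptions, not a reproduction of SPSS output: it assumes the standard Pearson formula (the sum over cells of the squared difference between observed and expected, divided by expected), and the probability function shown applies only to a 2 x 2 table, which has one degree of freedom. The function names are made up for this example.

```python
import math


def pearson_chi_square(observed):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected by chance alone for this cell
            exp = row_totals[i] * col_totals[j] / n
            chi_square += (obs - exp) ** 2 / exp
    return chi_square


def probability_df1(chi_square):
    """Probability of a chi square this large or larger by chance.

    Assumption: valid only for 1 degree of freedom (a 2 x 2 table).
    """
    return math.erfc(math.sqrt(chi_square / 2))


# rows = low-masculine, high-masculine; columns = male, female
table_independent = [[25, 25], [25, 25]]
table_related = [[10, 40], [40, 10]]

chi_independent = pearson_chi_square(table_independent)
chi_related = pearson_chi_square(table_related)

print(chi_independent, probability_df1(chi_independent))  # 0.0 1.0
print(chi_related, probability_df1(chi_related) <= 0.05)  # 36.0 True
```

In the evenly distributed table, observed and expected frequencies are identical, so chi square is 0. In the second table each cell differs from its expected count of 25 by 15, contributing 225 / 25 = 9 per cell, for a chi square of 36 and a probability far below .05.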

Thus, chi square is an inferential statistic - it allows us to make inferences from our sample to the population regarding the hypothesized relationship. This process begins with the assumption that any apparent relationship is due to chance - this is called the null hypothesis (Ho). Based on the obtained probability, we either retain or reject the null hypothesis using the following decision rule:

  • If the probability is less than or equal to .05, reject Ho and conclude the relationship is significant.
  • If the probability is greater than .05, retain Ho and conclude the relationship is due to chance.

Thus, if the obtained probability is less than or equal to .05, we would conclude that the pattern of frequencies discussed in the crosstabs table is a real one (not due to chance), indicating that masculinity level is, indeed, significantly related to gender.
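The decision rule can be written out as a short function. This is only a sketch of the convention described above; the names ALPHA and decide are hypothetical, and the probabilities passed in are made-up values, not results from the ezdata file.

```python
ALPHA = 0.05  # conventional cutoff described in the text


def decide(probability, alpha=ALPHA):
    """Apply the decision rule to the probability reported for chi square."""
    if probability <= alpha:
        return "reject Ho: the relationship is statistically significant"
    return "retain Ho: the pattern is attributed to chance"


# Made-up probabilities for illustration only
for p in (0.03, 0.05, 0.20):
    print(p, "->", decide(p))
```

Note that a probability of exactly .05 falls on the "reject" side of the rule, because the cutoff is "less than or equal to."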