Loglinear Model Analysis in SAS and SPSS

 

Overview

Identifiability Constraints

       Structural and Sampling Zeros; default options for cells with zero counts

 In SPSS

 In SAS

Examples

       Data Set

SAS, PROC CATMOD

SAS, PROC GENMOD

SPSS

References

 

 

Loglinear models are used to analyze categorical data. They model the means of cell counts in contingency tables by describing the association patterns among a set of categorical variables, without specifying any variable as a response (dependent) variable. Quoting from reference (3), p. 166: “…, loglinear models are most natural when at least two variables are response variables. When only one variable is a response, it is more sensible to use logit models directly.” When there is a single response variable, a logit model corresponds to a loglinear model, and its results are easier to analyze and interpret, particularly when the response variable is binary.

The term loglinear comes from the form of the model: the natural logarithm of the expected cell counts is modeled as a linear function of the effects of the categorical variables and their interactions. For example, suppose that we want to investigate relationships between three categorical variables, X, Y and Z, where X has I categories, Y has J categories and Z has K categories. Writing $m_{ijk}$ for the expected count in cell (i, j, k), the full (saturated) loglinear model is

 

$$\log m_{ijk} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ},$$

 

for each combination of the levels i = 1, 2, …, I, j = 1, 2, …, J and k = 1, 2, …, K of the categorical variables X, Y and Z. In many situations, simpler models, containing a subset of the parameters of the saturated model, may be adequate. For example:

(i) If all three categorical variables are mutually independent, the following, much simpler model describes the relationships between X, Y and Z:

$$\log m_{ijk} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z}.$$

(ii) If X and Z are associated but both are independent of Y, the following model describes the relationships between X, Y and Z:

$$\log m_{ijk} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ik}^{XZ}.$$

(iii) If X and Y are conditionally independent, that is, X and Y are independent when we control for Z, the following model describes the relationships between X, Y and Z:

$$\log m_{ijk} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}.$$

(iv) If there is no three-factor interaction (the no-three-factor-interaction, or homogeneous association, model), the model is

$$\log m_{ijk} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ},$$

and it implies that the conditional odds ratios between any two variables are the same at each level of the third variable.

 

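     The examples of log-linear models given above, (i) – (iv), are hierarchical models: whenever a hierarchical model contains a higher-order term, it also contains all lower-order terms formed from the variables in that term. For example, a model containing $\lambda_{ik}^{XZ}$ must also contain $\lambda_i^{X}$ and $\lambda_k^{Z}$.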
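The hierarchy condition can be checked mechanically. The short Python sketch below is only an illustration (it is not part of SAS or SPSS): each model term is written as a string of one-letter variable names, such as "XZ" for the X*Z interaction, and the hypothetical helper is_hierarchical reports whether every lower-order term contained in a higher-order term is also in the model.

```python
from itertools import combinations


def is_hierarchical(terms):
    """Return True if every lower-order term contained in a model term is also present.

    Each term is a string of one-letter variable names, e.g. "XZ" for the X*Z
    interaction; the intercept is implicit.
    """
    model = {frozenset(term) for term in terms}
    for term in model:
        # Every non-empty proper subset of a term must itself be a model term.
        for size in range(1, len(term)):
            for sub in combinations(sorted(term), size):
                if frozenset(sub) not in model:
                    return False
    return True


# Models (i)-(iv) from the text.
models = {
    "(i)": ["X", "Y", "Z"],
    "(ii)": ["X", "Y", "Z", "XZ"],
    "(iii)": ["X", "Y", "Z", "XZ", "YZ"],
    "(iv)": ["X", "Y", "Z", "XY", "XZ", "YZ"],
}
for name, terms in models.items():
    print(name, is_hierarchical(terms))

# Not hierarchical: X*Y*Z without the two-factor interactions it contains.
print("X Y Z XYZ", is_hierarchical(["X", "Y", "Z", "XYZ"]))
```

All four models (i)–(iv) pass the check, while a model with the three-factor term but without its two-factor interactions does not.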

 

     The parameters of log-linear models are usually estimated by the maximum likelihood (ML) method. Poisson or multinomial distributions of the cell counts are most commonly used in the log linear model analysis. The ML estimates are identical for both of these distributions. They are also the same for the negative multinomial distribution.  

 

     The parameters of a log-linear model can be estimated by ML using PROC GENMOD or PROC CATMOD in SAS, or the LOGLINEAR option on the ANALYZE menu in SPSS. However, there are certain important technical differences between these procedures. Proper interpretation of results depends on understanding these differences.

 

(i)                  Identifiability Constraints

 

 In order for log-linear models to be identifiable, that is, to have unique parameter estimates, certain constraints have to be imposed on the parameters. The constraints used in PROC GENMOD in SAS and in the Analyze, Loglinear procedures in SPSS are the same as in the analysis of variance procedures (ANOVA, GLM and MIXED in SAS, General Linear Model in SPSS): the parameter corresponding to the last category of a variable is set to zero, which is equivalent to defining a dummy variable for each category and leaving the last category out as the reference. For interactions, the parameters for any combination of categories that involves the last category of one of the constituent variables are also set to zero. PROC CATMOD uses different constraints: the parameters of each effect must sum to zero over all categories of each variable. For example, for a 2 × 2 table, that is, two variables, X and Y, each with two categories:

 

$$\lambda_1^{X} + \lambda_2^{X} = 0, \qquad \lambda_1^{Y} + \lambda_2^{Y} = 0,$$

$$\lambda_{11}^{XY} + \lambda_{12}^{XY} = 0, \qquad \lambda_{21}^{XY} + \lambda_{22}^{XY} = 0,$$

$$\lambda_{11}^{XY} + \lambda_{21}^{XY} = 0, \qquad \lambda_{12}^{XY} + \lambda_{22}^{XY} = 0,$$

which implies that $\lambda_1^{X} = -\lambda_2^{X}$, $\lambda_1^{Y} = -\lambda_2^{Y}$ and $\lambda_{11}^{XY} = -\lambda_{21}^{XY} = -\lambda_{12}^{XY} = \lambda_{22}^{XY}$. As a result, only one parameter is estimated for X (for its first category), one for Y (also for its first category) and one for the interaction of X and Y (for the first categories of X and Y). The values of the remaining parameters follow from the equations above.
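Under these sum-to-zero constraints only (I − 1)(K − 1) of the I × K interaction parameters are free; the rest follow from the zero row and column sums. A minimal Python sketch of that bookkeeping (the helper complete_effect_coded and the value 0.5 are hypothetical, chosen only for illustration):

```python
def complete_effect_coded(free):
    """Fill an I x K interaction table under sum-to-zero (effect-coding) constraints.

    `free` holds the (I-1) x (K-1) free parameters, i.e. those for the first I-1
    categories of the row variable and the first K-1 categories of the column variable.
    """
    # Last column: each row must sum to zero.
    rows = [list(r) + [-sum(r)] for r in free]
    # Last row: each column must sum to zero.
    last_row = [-sum(col) for col in zip(*rows)]
    return rows + [last_row]


# 2 x 2 table: a single free parameter lambda_11^{XY}.
table = complete_effect_coded([[0.5]])
print(table)  # [[0.5, -0.5], [-0.5, 0.5]]
```

The result reproduces the pattern λ11 = −λ21 = −λ12 = λ22 stated above. The same completion applies to the 2 × 4 x*z interaction in the PROC CATMOD example later on, where three parameters are printed and the other five follow from the constraints.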

 

(ii)                Structural and Sampling Zeros; default options for cells with zero counts.

 

     By default, cells with zero counts are treated as sampling zeros in SPSS, but as structural zeros in PROC CATMOD in SAS. The explanation for PROC GENMOD in SAS is given below. To change the default options, do the following:

 

1.      In SPSS

 

To make SPSS treat cells with zero counts as structural zeros, you need to create a new variable that will indicate whether the cell is a structural zero (0) or not (1). If your data is in a case-by-case form then a new, aggregated data set will need to be created; it will list all combinations of levels of the categorical variables and will include a count variable (number of observations in each cell). Then select Analyze, Loglinear, General and in the General Loglinear Analysis window, enter the new variable in the Cell Structure box.

 

2.      In SAS

 

PROC CATMOD: To make PROC CATMOD in SAS treat cells with zero counts as sampling zeros, you need to change the value of the cell count variable from 0 to a small positive number, such as 1E-20 (10 to the power of -20). If your data are in case-by-case form, a new, aggregated data set must first be created; it lists all combinations of levels of the categorical variables and includes a count variable (the number of observations in each cell). The aggregation can be done with PROC FREQ using the SPARSE and OUT= options. For example, suppose that we want to study the associations between three categorical variables, X, Y and Z, and the data are in case-by-case form. The following program first creates the aggregated data set (the OUT= data set of PROC FREQ), then replaces zero values of the cell count variable (called count in the example) with 1E-20, and finally runs PROC CATMOD.

 

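      /* Aggregate the case-by-case data set ONE: one row per x*y*z     */
      /* combination, including empty cells (SPARSE), with the cell     */
      /* frequency stored in the variable COUNT.                        */
      proc freq data=one;
         tables x*y*z / sparse out=combos noprint;
      run;

      proc print data=combos;   /* optional: inspect the aggregated data */
      run;

      /* Treat empty cells as sampling zeros */
      data combos2;
         set combos;
         if count=0 then count=1e-20;
      run;

      proc catmod data=combos2;
         weight count;
         model x*y*z=_response_
               / freq pred=freq noparm noresponse;
         loglin x y z x*y;
      run;
      quit;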
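To make the effect of the SPARSE option concrete, here is a small Python sketch of the same aggregation. It is an illustration only: the helper aggregate_sparse and the raw records are hypothetical. It lists every combination of the observed levels, counts the records in each cell and, optionally, replaces empty cells by 1E-20, as the DATA step above does.

```python
from collections import Counter
from itertools import product


def aggregate_sparse(records, zero_value=None):
    """Aggregate case-by-case records into one row per combination of levels.

    Every combination of the observed levels of each variable is listed, including
    combinations that never occur (count 0), similar in spirit to
    TABLES x*y*z / SPARSE OUT=... in PROC FREQ. If zero_value is given (e.g. 1e-20),
    empty cells get that value instead of 0.
    """
    counts = Counter(records)
    n_vars = len(records[0])
    # Observed levels of each variable, in sorted order.
    levels = [sorted({rec[pos] for rec in records}) for pos in range(n_vars)]
    rows = []
    for cell in product(*levels):
        n = counts.get(cell, 0)
        if n == 0 and zero_value is not None:
            n = zero_value
        rows.append((*cell, n))
    return rows


# Hypothetical raw data: one (x, y, z) tuple per subject.
raw = [(1, 1, 1), (1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]
for row in aggregate_sparse(raw, zero_value=1e-20):
    print(row)
```

With two levels per variable the sketch returns all 8 cells, four of which are empty and therefore receive 1E-20.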

 

PROC GENMOD: To run a loglinear model analysis in PROC GENMOD, the data have to be in aggregated form, with a cell count variable that is used as the dependent variable in the MODEL statement. If all possible combinations of the categories of the categorical variables are listed, with the count variable equal to zero for the empty cells, then the zeros are treated as sampling zeros. If only the non-zero cells are included in the data set (the empty cells are deleted), then the empty cells are treated as structural zeros. For example, let data set one contain all cells, with a variable count equal to 0 for the empty cells. In the first PROC GENMOD step of the following program the zero cells are treated as sampling zeros; in the second, after the zero cells have been deleted from the data set, they are treated as structural zeros.

 

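      /* Zero cells kept in the data: treated as sampling zeros */
      proc genmod data=one;
         class x y z;
         model count = x y z x*y / dist=poisson obstats type3;
      run;

      /* Zero cells deleted from the data: treated as structural zeros */
      data one1;
         set one;
         if count=0 then delete;
      run;

      proc genmod data=one1;
         class x y z;
         model count = x y z x*y / dist=poisson obstats type3;
      run;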

 

         Examples

 

      We will use the following data set to show how to run the loglinear model analysis in SPSS and in SAS and how to interpret results. We will analyze the associations between three categorical variables, X with 2 categories (1 and 2), Y with 3 categories (1, 2, and 3) and Z with four categories (1, 2, 3 and 4). The variable named count contains the cell frequencies for all combinations of categories of X, Y and Z.

 

x y z count
1 1 1 13
1 1 2 12
1 1 3 25
1 1 4 27
1 2 1  3
1 2 2  8
1 2 3 10
1 2 4 12
1 3 1  7
1 3 2  9
1 3 3 17
1 3 4 14
2 1 1 12
2 1 2 17
2 1 3  8
2 1 4 10
2 2 1 13
2 2 2 17
2 2 3 10
2 2 4 13
2 3 1 14
2 3 2  3
2 3 3 14
2 3 4 17

 

1.      SAS, PROC CATMOD

-         Program

 

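options ls=78;

data one;
   input x y z count @@;   /* four cells per data line */
   cards;
1 1 1 13  1 1 2 12  1 1 3 25  1 1 4 27
1 2 1  3  1 2 2  8  1 2 3 10  1 2 4 12
1 3 1  7  1 3 2  9  1 3 3 17  1 3 4 14
2 1 1 12  2 1 2 17  2 1 3  8  2 1 4 10
2 2 1 13  2 2 2 17  2 2 3 10  2 2 4 13
2 3 1 14  2 3 2  3  2 3 3 14  2 3 4 17
;
run;

/* Saturated model: all main effects and interactions */
proc catmod data=one;
   weight count;
   model x*y*z =_response_
         / noparm noprofile noiter noresponse oneway;
   loglin x| y| z;
run;

/* Main effects and all two-factor interactions (no x*y*z) */
proc catmod data=one;
   weight count;
   model x*y*z =_response_
         / noparm noprofile noiter noresponse oneway;
   loglin x| y| z @2;
run;

/* Conditional independence of y and z given x; prints estimates and predicted probabilities */
proc catmod data=one;
   weight count;
   model x*y*z =_response_
         / noprofile noiter noresponse oneway pred=prob;
   loglin x y z x*y x*z;
run;
quit;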
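The bar operator in the LOGLIN statements is shorthand for a list of effects. The Python sketch below (expand_bar is a hypothetical helper written only to illustrate the expansion) builds the effects in the order in which they appear in the SAS output tables: each new variable is added on its own and then crossed with every effect already in the list. An @n suffix is mimicked by keeping only effects with at most n variables.

```python
def expand_bar(variables, max_order=None):
    """Expand bar notation such as x|y|z into the list of model effects.

    Each new variable is appended on its own and then crossed with every effect
    already in the list; max_order mimics the @n suffix.
    """
    effects = []
    for var in variables:
        crossed = [effect + [var] for effect in effects]
        effects = effects + [[var]] + crossed
    if max_order is not None:
        effects = [e for e in effects if len(e) <= max_order]
    return ["*".join(e) for e in effects]


print(expand_bar(["x", "y", "z"]))
# ['x', 'y', 'x*y', 'z', 'x*z', 'y*z', 'x*y*z']
print(expand_bar(["x", "y", "z"], max_order=2))
# ['x', 'y', 'x*y', 'z', 'x*z', 'y*z']
```

So loglin x| y| z; is the saturated model, and loglin x| y| z @2; drops only the x*y*z term.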

 

 

-         Interpretation of Results

 

The output of the first proc catmod with the loglin statement loglin x| y| z; (the saturated model, main effects and all interactions are included) contains the following table.

 

                   Maximum Likelihood Analysis of Variance

              Source               DF   Chi-Square    Pr > ChiSq
              --------------------------------------------------
              x                     1         0.00        0.9847
              y                     2         7.14        0.0281
              x*y                   2        11.15        0.0038
              z                     3         9.66        0.0217
              x*z                   3         9.88        0.0196
              y*z                   6         8.71        0.1906
              x*y*z                 6         9.60        0.1427

              Likelihood Ratio      0          .           .

 

The "Maximum Likelihood Analysis of Variance" table displays significance tests for each effect in the specified model. The Chi-Square test for each effect is a Wald test based on the information matrix from the likelihood calculations. The Likelihood Ratio statistic at the bottom is a goodness-of-fit test for the model. It compares the specified model with the saturated model and is equal to -2 times the difference of the log likelihoods for the specified and the saturated models. Since in this example the specified and the saturated models are the same, the difference is 0.

            The Wald test for x*y*z does not indicate significance of the third order interaction. We can remove it from the model and rerun proc catmod with the loglin statement  loglin x| y| z @2; which specifies a model with the main effects and all possible interactions of order 2. Here is the "Maximum Likelihood Analysis of Variance" table for this model.

 

                Maximum Likelihood Analysis of Variance

              Source               DF   Chi-Square    Pr > ChiSq
              --------------------------------------------------
              x                     1         0.28        0.5967
              y                     2         7.95        0.0188
              x*y                   2        11.59        0.0030
              z                     3         9.65        0.0218
              x*z                   3        11.34        0.0100
              y*z                   6         7.72        0.2591

              Likelihood Ratio      6        10.67        0.0990

 

Since the Likelihood Ratio statistic compares the model without the third order interaction (x*y*z) with the saturated model, it is the likelihood ratio test for the significance of x*y*z, that is, it tests the same hypothesis as the Wald test for x*y*z in the previous  "Maximum Likelihood Analysis of Variance" table (for the saturated model).

The Wald test indicates that the y*z interaction is not significant. We will remove it from the model and test whether the model of conditional independence of Y and Z given X fits the data.

 

          Maximum Likelihood Analysis of Variance

              Source               DF   Chi-Square    Pr > ChiSq
              --------------------------------------------------
              x                     1         0.25        0.6193
              y                     2         6.82        0.0330
              z                     3         8.47        0.0372
              x*y                   2        11.44        0.0033
              x*z                   3        11.21        0.0107

              Likelihood Ratio     12        19.18        0.0844

 

The model fits the data reasonably well, as indicated by the Likelihood Ratio test (p-value = 0.0844). Because the MODEL statement of the third PROC CATMOD step does not include the NOPARM option, the output also includes a table of parameter estimates.
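For this particular model the maximum likelihood fitted counts have a closed form, $\hat m_{ijk} = n_{ij+}\,n_{i+k}/n_{i++}$, because Y and Z are independent within each level of X. That makes it possible to check the Likelihood Ratio statistic outside SAS. A Python sketch using the counts from the data set above (variable names are illustrative):

```python
import math
from collections import defaultdict

# Cell counts from the example data set, in x, y, z order (z varies fastest).
counts = [13, 12, 25, 27, 3, 8, 10, 12, 7, 9, 17, 14,
          12, 17, 8, 10, 13, 17, 10, 13, 14, 3, 14, 17]
cells = [(x, y, z) for x in (1, 2) for y in (1, 2, 3) for z in (1, 2, 3, 4)]
n = dict(zip(cells, counts))

# Margins n_{i++}, n_{ij+} and n_{i+k}.
n_x, n_xy, n_xz = defaultdict(int), defaultdict(int), defaultdict(int)
for (x, y, z), c in n.items():
    n_x[x] += c
    n_xy[x, y] += c
    n_xz[x, z] += c

# Closed-form ML fitted counts for the model x y z x*y x*z.
m = {(x, y, z): n_xy[x, y] * n_xz[x, z] / n_x[x] for (x, y, z) in cells}

# Likelihood ratio statistic G^2 = 2 * sum n log(n / m) (there are no empty cells).
g2 = 2 * sum(n[c] * math.log(n[c] / m[c]) for c in cells)

# Residual df: 24 cells minus 12 free parameters
# (intercept 1, x 1, y 2, z 3, x*y 2, x*z 3).
df = len(cells) - (1 + 1 + 2 + 3 + 2 + 3)
print(f"G2 = {g2:.4f}, df = {df}")
```

The statistic agrees with the Likelihood Ratio value of 19.18 on 12 degrees of freedom reported by PROC CATMOD. For models without closed-form estimates (for example, the no-three-factor-interaction model), iterative fitting such as that done by PROC CATMOD or PROC GENMOD is needed.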

 

                 Analysis of Maximum Likelihood Estimates

                                    Standard        Chi-
        Parameter        Estimate      Error      Square    Pr > ChiSq
        ----------------------------------------------------------------
        x         1       -0.0303     0.0610        0.25        0.6193
        y         1        0.2000     0.0796        6.30        0.0120
                  2       -0.1636     0.0871        3.53        0.0603
        z         1       -0.2071     0.1103        3.53        0.0603
                  2       -0.1176     0.1058        1.23        0.2667
                  3        0.1018     0.0991        1.05        0.3044
        x*y       1 1      0.2470     0.0796        9.62        0.0019
                  1 2     -0.2367     0.0871        7.39        0.0066
        x*z       1 1     -0.2634     0.1103        5.71        0.0169
                  1 2     -0.1212     0.1058        1.31        0.2521
                  1 3      0.2434     0.0991        6.03        0.0141

 

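            The parameter estimates in the table can be used to compute odds ratios. For example, suppose that we want the odds ratio that compares the odds of Z=1 vs. Z=4 at X=1 with the odds of Z=1 vs. Z=4 at X=2. Since the model has the form

$$\log m_{ijk} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ},$$

the log of the desired odds ratio, at any fixed level j of Y (the terms $\lambda$ and $\lambda_j^{Y}$ cancel), is

$$
\begin{aligned}
\log\frac{m_{1j1}\,m_{2j4}}{m_{2j1}\,m_{1j4}}
&= \log m_{1j1} + \log m_{2j4} - \log m_{2j1} - \log m_{1j4}\\
&= \lambda_1^{X} + \lambda_1^{Z} + \lambda_{1j}^{XY} + \lambda_{11}^{XZ} + \lambda_2^{X} + \lambda_4^{Z} + \lambda_{2j}^{XY} + \lambda_{24}^{XZ}\\
&\quad - \lambda_2^{X} - \lambda_1^{Z} - \lambda_{2j}^{XY} - \lambda_{21}^{XZ} - \lambda_1^{X} - \lambda_4^{Z} - \lambda_{1j}^{XY} - \lambda_{14}^{XZ}\\
&= \lambda_1^{X} + \lambda_1^{Z} + \lambda_{11}^{XZ} + \lambda_2^{X} + \lambda_4^{Z} + \lambda_{24}^{XZ}\\
&\quad - \lambda_2^{X} - \lambda_1^{Z} - \lambda_{21}^{XZ} - \lambda_1^{X} - \lambda_4^{Z} - \lambda_{14}^{XZ}.
\end{aligned}
$$

Now, taking into account the identifiability constraints of PROC CATMOD, which in this example are

$$\lambda_1^{X} + \lambda_2^{X} = 0,$$

$$\lambda_{11}^{XZ} + \lambda_{12}^{XZ} + \lambda_{13}^{XZ} + \lambda_{14}^{XZ} = 0, \qquad \lambda_{21}^{XZ} + \lambda_{22}^{XZ} + \lambda_{23}^{XZ} + \lambda_{24}^{XZ} = 0,$$

$$\lambda_{11}^{XZ} + \lambda_{21}^{XZ} = 0, \quad \lambda_{12}^{XZ} + \lambda_{22}^{XZ} = 0, \quad \lambda_{13}^{XZ} + \lambda_{23}^{XZ} = 0, \quad \lambda_{14}^{XZ} + \lambda_{24}^{XZ} = 0,$$

we get

$$
\begin{aligned}
\log\frac{m_{1j1}\,m_{2j4}}{m_{2j1}\,m_{1j4}}
&= \lambda_{11}^{XZ} + \lambda_{24}^{XZ} - \lambda_{21}^{XZ} - \lambda_{14}^{XZ}\\
&= 2\lambda_{11}^{XZ} - 2\lambda_{14}^{XZ}\\
&= 4\lambda_{11}^{XZ} + 2\lambda_{12}^{XZ} + 2\lambda_{13}^{XZ}\\
&= 4(-0.2634) + 2(-0.1212) + 2(0.2434)\\
&= -0.8092.
\end{aligned}
$$

Hence, the odds ratio is exp(-0.8092) = 0.4452.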
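The same arithmetic in a few lines of Python, as an independent check. The three estimates are copied from the parameter estimates table; the rest of the x*z table is filled in with the PROC CATMOD sum-to-zero constraints.

```python
import math

# PROC CATMOD estimates of lambda^{XZ}_{1k} for z = 1, 2, 3.
free = [-0.2634, -0.1212, 0.2434]

# Row x = 1 must sum to zero, which gives the z = 4 entry;
# each column must sum to zero, so row x = 2 is the negative of row x = 1.
row1 = free + [-sum(free)]
row2 = [-v for v in row1]
lam = {(1, k + 1): v for k, v in enumerate(row1)}
lam.update({(2, k + 1): v for k, v in enumerate(row2)})

# Log odds ratio: Z=1 vs Z=4, X=1 vs X=2 (the same for every level of Y).
log_or = lam[1, 1] + lam[2, 4] - lam[2, 1] - lam[1, 4]
shortcut = 4 * free[0] + 2 * free[1] + 2 * free[2]  # 4*l11 + 2*l12 + 2*l13
print(f"log OR = {log_or:.4f}, shortcut = {shortcut:.4f}, OR = {math.exp(log_or):.4f}")
```

Both routes give -0.8092, and the odds ratio is about 0.4452.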

It is easier to compute the odds and odds ratios under the identifiability constraints used in proc genmod and in SPSS. For the computations, see the next section.

If the option pred=prob is used in the MODEL statement, then, in addition to the table of parameter estimates, the following table of observed and predicted probabilities for each combination of values of the variables X, Y and Z is printed.

 

                     The CATMOD Procedure

            Maximum Likelihood Predicted Values for Probabilities

               --------Observed-------    -------Predicted-------
                              Standard                   Standard
x    y    z    Probability       Error    Probability       Error    Residual
------------------------------------------------------------------------------
1    1    1         0.0426      0.0116          0.037       0.008      0.0056
1    1    2         0.0393      0.0111         0.0466      0.0091      -0.007
1    1    3          0.082      0.0157         0.0836      0.0126      -0.002
1    1    4         0.0885      0.0163         0.0852      0.0127      0.0033
1    2    1         0.0098      0.0057         0.0159       0.004      -0.006
1    2    2         0.0262      0.0092           0.02      0.0047      0.0062
1    2    3         0.0328      0.0102         0.0358      0.0072      -0.003
1    2    4         0.0393      0.0111         0.0365      0.0073      0.0028
1    3    1          0.023      0.0086         0.0226      0.0053      0.0004
1    3    2         0.0295      0.0097         0.0285      0.0061       0.001
1    3    3         0.0557      0.0131          0.051       0.009      0.0047
1    3    4         0.0459       0.012          0.052      0.0091      -0.006
2    1    1         0.0393      0.0111         0.0406      0.0078      -0.001
2    1    2         0.0557      0.0131         0.0385      0.0075      0.0172
2    1    3         0.0262      0.0092         0.0333      0.0069      -0.007
2    1    4         0.0328      0.0102         0.0416      0.0079      -0.009
2    2    1         0.0426      0.0116         0.0458      0.0085      -0.003
2    2    2         0.0557      0.0131         0.0434      0.0082      0.0123
2    2    3         0.0328      0.0102         0.0376      0.0075      -0.005
2    2    4         0.0426      0.0116          0.047      0.0086      -0.004
2    3    1         0.0459       0.012         0.0415      0.0079      0.0044
2    3    2         0.0098      0.0057         0.0393      0.0077       -0.03
2    3    3         0.0459       0.012          0.034       0.007      0.0119
2    3    4         0.0557      0.0131         0.0425       0.008      0.0132

 

The predicted probabilities printed in this table can also be used to compute odds ratios. To compute the same odds ratio as the one obtained from the parameter estimates, fix Y at some level j and divide the ratio P(X=1, Y=j, Z=1)/P(X=1, Y=j, Z=4) by P(X=2, Y=j, Z=1)/P(X=2, Y=j, Z=4). It does not matter whether Y is fixed at 1, 2 or 3, since under this model the odds ratio does not depend on Y. For Y=1, we get (0.037/0.0852) / (0.0406/0.0416) = 0.4450, the same as above up to rounding of the printed probabilities.
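Because the fitted values of this model have the closed form $\hat m_{ijk} = n_{ij+}\,n_{i+k}/n_{i++}$, the fitted probabilities, and hence the odds ratio, can be recomputed from the data without the rounding of the printed table. A Python sketch (names are illustrative) that does this for each level of Y:

```python
import math

# Cell counts from the example data set, in x, y, z order (z varies fastest).
counts = [13, 12, 25, 27, 3, 8, 10, 12, 7, 9, 17, 14,
          12, 17, 8, 10, 13, 17, 10, 13, 14, 3, 14, 17]
cells = [(x, y, z) for x in (1, 2) for y in (1, 2, 3) for z in (1, 2, 3, 4)]
n = dict(zip(cells, counts))
total = sum(counts)


def margin(keep):
    """Sum the counts over all variables except those at positions `keep` (0=x, 1=y, 2=z)."""
    out = {}
    for cell, c in n.items():
        key = tuple(cell[p] for p in keep)
        out[key] = out.get(key, 0) + c
    return out


n_x, n_xy, n_xz = margin([0]), margin([0, 1]), margin([0, 2])


def prob(x, y, z):
    """Fitted cell probability under the model x y z x*y x*z (closed form)."""
    return n_xy[x, y] * n_xz[x, z] / n_x[(x,)] / total


for y in (1, 2, 3):
    odds_x1 = prob(1, y, 1) / prob(1, y, 4)  # odds of Z=1 vs Z=4 at X=1
    odds_x2 = prob(2, y, 1) / prob(2, y, 4)  # odds of Z=1 vs Z=4 at X=2
    print(f"Y={y}: odds ratio = {odds_x1 / odds_x2:.4f}")
```

The odds ratio is identical for every level of Y (about 0.4451), which agrees with the values obtained from the parameter estimates up to rounding.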

 

 

2.      SAS, PROC GENMOD

 

-         Program

 

proc genmod data=one;
   class x y z;
   model count=x| y| z
         / dist=poisson type3 type1;
run;

proc genmod data=one;
   class x y z;
   model count=x| y| z @2
         / dist=poisson type3 type1;
run;

proc genmod data=one;
   class x y z;
   model count=x y z x*y x*z
         / dist=poisson type3 type1;
run;

 

-         Interpretation of Results

The output of the first proc genmod with the model statement model count=x| y| z; (the saturated model, main effects and all interactions are included) and the option type3 contains the following tables.

 

                   Criteria For Assessing Goodness Of Fit

         Criterion                 DF           Value        Value/DF

         Deviance                   0          0.0000           .
         Scaled Deviance            0          0.0000           .
         Pearson Chi-Square         0          0.0000           .
         Scaled Pearson X2          0          0.0000           .
         Log Likelihood                      499.7848

                   LR Statistics For Type 3 Analysis

                                           Chi-
                 Source           DF     Square    Pr > ChiSq

                 x                 1       0.00        0.9847
                 y                 2       7.06        0.0294
                 x*y               2      11.82        0.0027
                 z                 3       9.99        0.0187
                 x*z               3      10.72        0.0133
                 y*z               6      10.20        0.1166
                 x*y*z             6      10.67        0.0990

 

The Deviance is the likelihood ratio statistic -2(log likelihood for the specified model – log likelihood for the saturated model), which compares the specified model with the saturated model. Since the saturated model was specified in this run of PROC GENMOD, the deviance is 0. The next table, LR Statistics For Type 3 Analysis, displays significance tests for each effect in the specified model. The Chi-Square statistic for each effect is a likelihood ratio test (PROC CATMOD prints Wald tests in its "Maximum Likelihood Analysis of Variance" table). For example, the Chi-Square value for x*y*z is equal to -2(log likelihood for the model without the third-order interaction x*y*z – log likelihood for the saturated model).

The LR test for x*y*z does not indicate significance of the third order interaction. We can remove it from the model and rerun proc genmod with the model statement  model count=x| y| z @2; which specifies a model with the main effects and all possible interactions of order 2. Here are the goodness-of-fit and LR test tables for this model.

 

                 Criteria For Assessing Goodness Of Fit

         Criterion                 DF           Value        Value/DF

         Deviance                   6         10.6746          1.7791
         Scaled Deviance            6         10.6746          1.7791
         Pearson Chi-Square         6         10.6749          1.7792
         Scaled Pearson X2          6         10.6749          1.7792
         Log Likelihood                      494.4475

                     LR Statistics For Type 3 Analysis

                                           Chi-
                 Source           DF     Square    Pr > ChiSq

                 x                 1       0.28        0.5962
                 y                 2       7.80        0.0203
                 x*y               2      11.95        0.0025
                 z                 3       9.76        0.0208
                 x*z               3      11.69        0.0085
                 y*z               6       8.50        0.2036

 

 

The Deviance is equal to -2(log likelihood for the specified model – log likelihood for the saturated model) = 2(499.7848 (the saturated model, from the previous table) – 494.4475) = 2 × 5.3373 = 10.6746. The Chi-Square value for y*z is -2(log likelihood for the specified model without y*z – log likelihood for the specified model); here the specified model, model count=x| y| z @2;, does not include the third-order interaction x*y*z. The LR test indicates that the y*z interaction is not significant. We will remove it from the model and test whether the model of conditional independence of Y and Z given X fits the data.

 

               Criteria For Assessing Goodness Of Fit

         Criterion                 DF           Value        Value/DF

         Deviance                  12         19.1766          1.5981
         Scaled Deviance           12         19.1766          1.5981
         Pearson Chi-Square        12         16.6976          1.3915
         Scaled Pearson X2         12         16.6976          1.3915
         Log Likelihood                      490.1964

                   LR Statistics For Type 3 Analysis

                                           Chi-
                 Source           DF     Square    Pr > ChiSq

                 x                 1       0.25        0.6189
                 y                 2       6.76        0.0341
                 z                 3       8.55        0.0360
                 x*y               2      11.77        0.0028
                 x*z               3      11.51        0.0092

 

 

The Deviance, 2(499.7848 – 490.1964) = 19.1766 (up to rounding of the printed log likelihoods), where 499.7848 is the log likelihood of the saturated model and 490.1964 that of the specified model, has approximately a chi-square distribution with 12 degrees of freedom if the model holds. The corresponding p-value is 0.0844, the same as for the Likelihood Ratio test printed by PROC CATMOD. Hence, the model of conditional independence of Y and Z given X fits the data reasonably well. The output of PROC GENMOD also includes a table of parameter estimates.
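The deviances and their p-values can also be reproduced outside SAS. The Python sketch below computes the chi-square upper-tail probability from the series expansion of the regularized lower incomplete gamma function, a standard numerical method that is not necessarily what SAS uses internally; chi2_sf is a hypothetical helper name.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(df/2, x/2);
    adequate for moderate x such as the statistics in this example.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    ap = a
    while term > total * 1e-15:
        ap += 1.0
        term *= z / ap
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Deviances from the printed PROC GENMOD log likelihoods (saturated model: 499.7848).
dev_no3way = 2 * (499.7848 - 494.4475)   # model x| y| z @2, 6 df
dev_condind = 2 * (499.7848 - 490.1964)  # model x y z x*y x*z, 12 df
print(f"{dev_no3way:.4f} on 6 df, p = {chi2_sf(dev_no3way, 6):.4f}")
print(f"{dev_condind:.4f} on 12 df, p = {chi2_sf(dev_condind, 12):.4f}")
```

The second deviance comes out as 19.1768 rather than 19.1766 only because the printed log likelihoods are rounded to four decimals. The p-values match the 0.0990 and 0.0844 shown in the SAS output.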

 

 

 

                       Analysis Of Parameter Estimates

                                     Standard   Wald 95% Confidence      Chi-
 Parameter           DF   Estimate      Error          Limits          Square

 Intercept            1     2.5629     0.1977     2.1754     2.9503    168.09
 x           1        1     0.2013     0.2699    -0.3277     0.7303      0.56
 x           2        0     0.0000     0.0000     0.0000     0.0000       .
 y           1        1    -0.0211     0.2052    -0.4233     0.3811      0.01
 y           2        1     0.0991     0.1993    -0.2914     0.4896      0.25
 y           3        0     0.0000     0.0000     0.0000     0.0000       .
 z           1        1    -0.0253     0.2250    -0.4664     0.4157      0.01
 z           2        1    -0.0780     0.2281    -0.5250     0.3691      0.12
 z           3        1    -0.2231     0.2372    -0.6880     0.2417      0.89
 z           4        0     0.0000     0.0000     0.0000     0.0000       .
 x*y         1   1    1     0.5147     0.2764    -0.0269     1.0564      3.47
 x*y         1   2    1    -0.4527     0.3021    -1.0449     0.1394      2.25
 x*y         1   3    0     0.0000     0.0000     0.0000     0.0000       .
 x*y         2   1    0     0.0000     0.0000     0.0000     0.0000       .
 x*y         2   2    0     0.0000     0.0000     0.0000     0.0000       .
 x*y         2   3    0     0.0000     0.0000     0.0000     0.0000       .
 x*z         1   1    1    -0.8095     0.3361    -1.4683    -0.1507      5.80
 x*z         1   2    1    -0.5250     0.3246    -1.1613     0.1112      2.62
 x*z         1   3    1     0.2041     0.3072    -0.3979     0.8061      0.44
 x*z         1   4    0     0.0000     0.0000     0.0000     0.0000       .
 x*z         2   1    0     0.0000     0.0000     0.0000     0.0000       .
 x*z         2   2    0     0.0000     0.0000     0.0000     0.0000       .
 x*z         2   3    0     0.0000     0.0000     0.0000     0.0000       .
 x*z         2   4    0     0.0000     0.0000     0.0000     0.0000       .
 Scale                0     1.0000     0.0000     1.0000     1.0000

 

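The parameter estimates in the table can be used to compute odds ratios. For example, suppose that we want the odds ratio that compares the odds of Z=1 vs. Z=4 at X=1 with the odds of Z=1 vs. Z=4 at X=2. Since the model has the form

$$\log m_{ijk} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_k^{Z} + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ},$$

the log of the desired odds ratio, at any fixed level j of Y (the terms $\lambda$ and $\lambda_j^{Y}$ cancel), is

$$
\begin{aligned}
\log\frac{m_{1j1}\,m_{2j4}}{m_{2j1}\,m_{1j4}}
&= \log m_{1j1} + \log m_{2j4} - \log m_{2j1} - \log m_{1j4}\\
&= \lambda_1^{X} + \lambda_1^{Z} + \lambda_{1j}^{XY} + \lambda_{11}^{XZ} + \lambda_2^{X} + \lambda_4^{Z} + \lambda_{2j}^{XY} + \lambda_{24}^{XZ}\\
&\quad - \lambda_2^{X} - \lambda_1^{Z} - \lambda_{2j}^{XY} - \lambda_{21}^{XZ} - \lambda_1^{X} - \lambda_4^{Z} - \lambda_{1j}^{XY} - \lambda_{14}^{XZ}\\
&= \lambda_1^{X} + \lambda_1^{Z} + \lambda_{11}^{XZ} + \lambda_2^{X} + \lambda_4^{Z} + \lambda_{24}^{XZ}\\
&\quad - \lambda_2^{X} - \lambda_1^{Z} - \lambda_{21}^{XZ} - \lambda_1^{X} - \lambda_4^{Z} - \lambda_{14}^{XZ}.
\end{aligned}
$$

Since, under the PROC GENMOD constraints, $\lambda_2^{X} = 0$, $\lambda_4^{Z} = 0$, $\lambda_{14}^{XZ} = 0$, $\lambda_{24}^{XZ} = 0$ and $\lambda_{21}^{XZ} = 0$,
we get

$$\log\frac{m_{1j1}\,m_{2j4}}{m_{2j1}\,m_{1j4}} = \lambda_{11}^{XZ} = -0.8095.$$

Hence, the odds ratio is exp(-0.8095) = 0.4451.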
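The two parameterizations give the same odds ratios apart from rounding of the printed estimates. A short Python check compares them for every pair of Z levels. The estimates are copied from the PROC GENMOD and PROC CATMOD parameter tables, and log_odds_ratio is a hypothetical helper.

```python
import math
from itertools import combinations

# x*z estimates at x = 1 for z = 1, 2, 3, copied from the two parameter tables.
genmod_free = [-0.8095, -0.5250, 0.2041]  # PROC GENMOD: last categories set to 0
catmod_free = [-0.2634, -0.1212, 0.2434]  # PROC CATMOD: sums over each index are 0

# Complete 2 x 4 tables lam[x][z] (0-based indices) under each set of constraints.
genmod = [genmod_free + [0.0], [0.0, 0.0, 0.0, 0.0]]
catmod_row1 = catmod_free + [-sum(catmod_free)]
catmod = [catmod_row1, [-v for v in catmod_row1]]


def log_odds_ratio(lam, k1, k2):
    """Log odds ratio of Z=k1 vs Z=k2, X=1 vs X=2 (0-based z indices)."""
    return lam[0][k1] + lam[1][k2] - lam[1][k1] - lam[0][k2]


for k1, k2 in combinations(range(4), 2):
    g = log_odds_ratio(genmod, k1, k2)
    c = log_odds_ratio(catmod, k1, k2)
    print(f"Z={k1 + 1} vs Z={k2 + 1}: GENMOD {g:+.4f}   CATMOD {c:+.4f}")
```

Under reference coding each log odds ratio against the last category of Z reduces to a single printed estimate. Under effect coding it is a combination of several estimates, but the resulting odds ratios agree.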

 

 

 

3.      SPSS

-         How to run?

 

            There are two choices in the SPSS menus for running loglinear models. If you first want to find a model that fits the data, start with Analyze, Loglinear, Model Selection. After deciding on a model, use Analyze, Loglinear, General to get parameter estimates.

 

-         Interpretation of Results

 

            The output below was obtained for the data listed at the beginning of the Examples section with the following choices from the SPSS menus: Analyze, Loglinear, Model Selection, Enter in a single step, Saturated model, and, in Options, Association Table (Display for Saturated Model), Delta=0. The Weight Cases option in the Data menu was used to weight cases by the variable count.

 

 Goodness-of-fit test statistics

    Likelihood ratio chi square =      .00000    DF = 0  P =  -INF
             Pearson chi square =      .00000    DF = 0  P =  -INF

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

            Both the likelihood ratio and the Pearson tests compare the specified model with the saturated model. Since the specified model is saturated, both test statistics are 0.

 

Tests that K-way and higher order effects are zero.

         K     DF   L.R. Chisq    Prob  Pearson Chisq    Prob   Iteration

         3      6       10.675   .0990         10.675   .0990           3
         2     17       42.460   .0006         40.203   .0012           2
         1     23       58.792   .0001         57.990   .0001           0

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The likelihood ratio (L.R.) and Pearson tests compare the saturated model with the model without the K-way and higher-order effects. For example, for K=3, L.R. Chisq = -2(log likelihood for the model without the third-order interaction x*y*z – log likelihood for the saturated model). For K=2, L.R. Chisq = -2(log likelihood for the model without the third-order interaction x*y*z and all second-order interactions x*y, x*z, y*z (that is, the model with main effects only) – log likelihood for the saturated model). For K=1, L.R. Chisq = -2(log likelihood for the intercept-only model – log likelihood for the saturated model). In this example, only the test for K=3 (the third-order interaction x*y*z) is not significant.

 

Tests that K-way effects are zero.

         K     DF   L.R. Chisq    Prob  Pearson Chisq    Prob   Iteration

         1      6       16.332   .0121         17.787   .0068           0
         2     11       31.785   .0008         29.528   .0019           0
         3      6       10.675   .0990         10.675   .0990           0

            The likelihood ratio (L.R.) and Pearson tests compare two models that differ only in the K-way effects: the model containing all effects up to order K and the model containing all effects up to order K-1. Each statistic is therefore the difference of consecutive rows of the previous table. For example, for K=3, L.R. Chisq = -2(log likelihood for the model without the third-order interaction x*y*z – log likelihood for the saturated model) = 10.675. For K=2, L.R. Chisq = -2(log likelihood for the main-effects-only model – log likelihood for the model with main effects and all second-order interactions x*y, x*z, y*z) = 42.460 - 10.675 = 31.785. For K=1, L.R. Chisq = -2(log likelihood for the intercept-only model – log likelihood for the main-effects model x, y, z) = 58.792 - 42.460 = 16.332.
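Each row of the "Tests that K-way effects are zero" table can be reproduced from the "K-way and higher order effects" table by subtracting consecutive rows. A small Python check using the printed SPSS values:

```python
# "Tests that K-way and higher order effects are zero" (SPSS output):
# K -> (DF, L.R. Chisq, Pearson Chisq)
k_and_higher = {1: (23, 58.792, 57.990), 2: (17, 42.460, 40.203), 3: (6, 10.675, 10.675)}

# "Tests that K-way effects are zero": subtract the row for K+1 from the row for K;
# the highest order (K = 3) has nothing above it, so its row is unchanged.
k_way = {}
for k, (df, lr, pearson) in k_and_higher.items():
    if k + 1 in k_and_higher:
        df_next, lr_next, pearson_next = k_and_higher[k + 1]
        df, lr, pearson = df - df_next, lr - lr_next, pearson - pearson_next
    k_way[k] = (df, round(lr, 3), round(pearson, 3))

for k, row in k_way.items():
    print(k, row)
```

The differences reproduce the degrees of freedom and both chi-square columns of the K-way table exactly.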

 

 

* * * * * * * *  H I E R A R C H I C A L   L O G   L I N E A R  * * * * * * * *

 

 Tests of PARTIAL associations.

 

  Effect Name                                    DF  Partial Chisq    Prob  Iter

 

 

  X*Y                                             2         11.949   .0025     2

  X*Z                                             3         11.693   .0085     2

  Y*Z                                             6          8.502   .2036     2

  X                                               1           .266   .6063     2

  Y                                               2          7.577   .0226     2

  Z                                               3          8.489   .0369     2

 

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

            The Partial Chisq is the likelihood ratio statistic. For each effect, it compares the hierarchical model that includes the listed effect and all other effects of the same order (and hence all lower order effects) with the same model minus the listed effect. In this example, for effect X*Y, Partial Chisq = -2(log likelihood for the model with main effects x, y, z and interactions x*z, y*z – log likelihood for the model with main effects x, y, z and all second order interactions x*y, x*z, y*z). For effect X*Z, Partial Chisq = -2(log likelihood for the model with main effects x, y, z and interactions x*y, y*z – log likelihood for the model with main effects x, y, z and all second order interactions x*y, x*z, y*z). For effect Y*Z, Partial Chisq = -2(log likelihood for the model with main effects x, y, z and interactions x*y, x*z – log likelihood for the model with main effects x, y, z and all second order interactions x*y, x*z, y*z). For effect X, Partial Chisq = -2(log likelihood for the model with main effects y, z – log likelihood for the model with main effects x, y, z). For effect Y, Partial Chisq = -2(log likelihood for the model with main effects x, z – log likelihood for the model with main effects x, y, z). For effect Z, Partial Chisq = -2(log likelihood for the model with main effects x, y – log likelihood for the model with main effects x, y, z).
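For the main effects, the models being compared are independence models whose fitted counts are products of the one-way margins. The log likelihood of the main-effects model then splits into one term per variable. As a result, the three main-effect partial chi-squares should add up exactly to the K=1 statistic in the "Tests that K-way effects are zero" table. The small Python check below uses only the printed values. The same additivity does not hold in general for the second order partial associations.

```python
# Main-effect partial chi-squares (DF, L.R.) printed by SPSS.
partial_main = {"x": (1, 0.266), "y": (2, 7.577), "z": (3, 8.489)}

# K=1 row of "Tests that K-way effects are zero": DF and L.R. chi-square.
k1_df, k1_lr = 6, 16.332

sum_df = sum(df for df, _ in partial_main.values())
sum_g2 = sum(g2 for _, g2 in partial_main.values())
print(f"sum of main-effect partials: DF={sum_df}, L.R.={sum_g2:.3f}")
print(f"K=1 test:                    DF={k1_df}, L.R.={k1_lr:.3f}")
```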

            The “Tests that K-way and higher order effects are zero” table shows that the 3-way interaction x*y*z is not significant (p = .0990). The “Tests that K-way effects are zero” table shows that the 2-way interactions as a group are significant (p = .0008). It therefore makes sense to look at the significance of each second order interaction. In this example, Y*Z is not significant (p = .2036), so we delete it from the model. We then use Loglinear, General to test whether the model of conditional independence of Y and Z given X fits the data, and to print parameter estimates.

 

            The output below was obtained with the following choices in SPSS: Analyze, Loglinear, General; Model: x, y, z, x*y, x*z; and, in Options, Display Estimates and Delta=0. As before, the Weight Cases option in the Data menu was used to weight cases by the variable count.

 

 

                  Goodness-of-fit Statistics

 

                    Chi-Square       DF       Sig.

 

Likelihood Ratio       19.1766       12      .0844

         Pearson       16.6976       12      .1613

 

Both the likelihood ratio and the Pearson tests compare the specified model with the saturated model. The Likelihood Ratio Chi-Square equals -2(log likelihood for the specified model – log likelihood for the saturated model). Neither test is significant at the 0.05 level (p = .0844 and p = .1613). This indicates that the specified model (the model that includes x, y, z, x*y and x*z) fits the data reasonably well.
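The 12 DF follow from parameter counting: 24 cells minus 12 non-redundant parameters. The specified model is obtained from the model without x*y*z by dropping Y*Z. Its L.R. statistic should therefore equal the K=3 statistic plus the partial chi-square for Y*Z (10.675 + 8.502 = 19.177, matching the printed 19.1766 up to rounding). The Python sketch below checks both and recomputes the p-values. It again uses a hand-written `chi2_sf` for integer DF (an assumption, not an SPSS routine) and the category counts X: 2, Y: 3, Z: 4 from the parameter listing.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(chi-square with integer df >= 1 exceeds x), standard library only."""
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** m / math.factorial(m) for m in range(df // 2))
    total = math.erfc(math.sqrt(y))
    for m in range((df - 1) // 2):
        total += math.exp(-y) * y ** (m + 0.5) / math.gamma(m + 1.5)
    return total


# Category counts of X, Y, Z and the number of cells in the table.
I, J, K = 2, 3, 4
cells = I * J * K

# Non-redundant parameters of x, y, z, x*y, x*z: constant, main effects, X*Y, X*Z.
n_params = 1 + (I - 1) + (J - 1) + (K - 1) + (I - 1) * (J - 1) + (I - 1) * (K - 1)
df = cells - n_params

# Goodness-of-fit statistics printed by Loglinear, General.
lr, pearson = 19.1766, 16.6976
p_lr = chi2_sf(lr, df)
p_pearson = chi2_sf(pearson, df)

# Decomposition: L.R. for [XY][XZ] = L.R. for [XY][XZ][YZ] (K=3 test)
# + partial chi-square for Y*Z.
g2_no_three_way, df_no_three_way = 10.675, 6
partial_yz, df_partial_yz = 8.502, 6

print(f"DF={df}, p(L.R.)={p_lr:.4f}, p(Pearson)={p_pearson:.4f}")
print(f"{g2_no_three_way} + {partial_yz} = {g2_no_three_way + partial_yz:.3f} "
      f"on {df_no_three_way + df_partial_yz} DF")
```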

 

                  Parameter Estimates

 

                                               Asymptotic 95% CI

Parameter   Estimate         SE    Z-value      Lower      Upper

 

        1     2.5629      .1977      12.97       2.18       2.95

        2      .2013      .2699        .75       -.33        .73

        3      .0000      .            .          .          .

        4     -.0211      .2052       -.10       -.42        .38

        5      .0991      .1992        .50       -.29        .49

        6      .0000      .            .          .          .

        7     -.0253      .2250       -.11       -.47        .42

        8     -.0780      .2281       -.34       -.52        .37

        9     -.2231      .2372       -.94       -.69        .24

       10      .0000      .            .          .          .

       11      .5147      .2763       1.86       -.03       1.06

       12     -.4527      .3021      -1.50      -1.04        .14

       13      .0000      .            .          .          .

       14      .0000      .            .          .          .

       15      .0000      .            .          .          .

       16      .0000      .            .          .          .

       17     -.8095      .3361      -2.41      -1.47       -.15

       18     -.5250      .3246      -1.62      -1.16        .11

       19      .2041      .3072        .66       -.40        .81

       20      .0000      .            .          .          .

       21      .0000      .            .          .          .

       22      .0000      .            .          .          .

       23      .0000      .            .          .          .

       24      .0000      .            .          .          .

 

Correspondence Between Parameters and Terms of the Design

 

Parameter   Aliased  Term

 

        1            Constant

        2            [X = 1.0000]

        3       x    [X = 2.0000]

        4            [Y = 1.0000]

        5            [Y = 2.0000]

        6       x    [Y = 3.0000]

        7            [Z = 1.0000]

        8            [Z = 2.0000]

        9            [Z = 3.0000]

       10       x    [Z = 4.0000]

       11            [X = 1.0000]*[Y = 1.0000]

       12            [X = 1.0000]*[Y = 2.0000]

       13       x    [X = 1.0000]*[Y = 3.0000]

       14       x    [X = 2.0000]*[Y = 1.0000]

       15       x    [X = 2.0000]*[Y = 2.0000]

       16       x    [X = 2.0000]*[Y = 3.0000]

       17            [X = 1.0000]*[Z = 1.0000]

       18            [X = 1.0000]*[Z = 2.0000]

       19            [X = 1.0000]*[Z = 3.0000]

       20       x    [X = 1.0000]*[Z = 4.0000]

       21       x    [X = 2.0000]*[Z = 1.0000]

       22       x    [X = 2.0000]*[Z = 2.0000]

       23       x    [X = 2.0000]*[Z = 3.0000]

       24       x    [X = 2.0000]*[Z = 4.0000]

 

Note: 'x' indicates an aliased (or a redundant) parameter.

      These parameters are set to zero.
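The aliased parameters in the listing follow a simple rule: any indicator parameter in which some variable is at its last category is set to zero. This leaves 1 + 1 + 2 + 3 + 2 + 3 = 12 free parameters, consistent with 24 cells minus 12 DF. The Python sketch below regenerates the listing under that rule, which is inferred from the table above. The function name `design_parameters` is hypothetical, and labels are printed as [X = 1] rather than SPSS's [X = 1.0000].

```python
from itertools import product

# Category counts of each factor, as shown in the correspondence table.
LEVELS = {"X": 2, "Y": 3, "Z": 4}
# Design terms in the order SPSS lists them: constant, main effects, X*Y, X*Z.
DESIGN = [(), ("X",), ("Y",), ("Z",), ("X", "Y"), ("X", "Z")]


def design_parameters(levels: dict, design: list) -> list:
    """Return (number, aliased, label) for every indicator parameter.

    A parameter is marked aliased when any factor in it is at its last
    category, reproducing the pattern in the SPSS listing.
    """
    rows = []
    for term in design:
        if not term:
            rows.append((len(rows) + 1, False, "Constant"))
            continue
        # Later factors vary fastest, matching the SPSS ordering.
        for cats in product(*(range(1, levels[f] + 1) for f in term)):
            aliased = any(c == levels[f] for f, c in zip(term, cats))
            label = "*".join(f"[{f} = {c}]" for f, c in zip(term, cats))
            rows.append((len(rows) + 1, aliased, label))
    return rows


params = design_parameters(LEVELS, DESIGN)
for number, aliased, label in params:
    print(f"{number:3d}  {'x' if aliased else ' '}  {label}")
```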

 

 

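            The parameter estimates in the “Parameter Estimates” table can be used to compute odds ratios. For example, suppose that we want to compare the odds of Z=1 vs. Z=4 when X=1 with the odds of Z=1 vs. Z=4 when X=2, at a fixed level j of Y. The model has the form

$$\log m_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ},$$

so the log of the desired odds ratio is

$$\begin{aligned}
\log\frac{m_{1j1}\,m_{2j4}}{m_{2j1}\,m_{1j4}} &= \log m_{1j1} + \log m_{2j4} - \log m_{2j1} - \log m_{1j4} \\
&= \left(\lambda_1^X + \lambda_1^Z + \lambda_{1j}^{XY} + \lambda_{11}^{XZ}\right) + \left(\lambda_2^X + \lambda_4^Z + \lambda_{2j}^{XY} + \lambda_{24}^{XZ}\right) \\
&\quad - \left(\lambda_2^X + \lambda_1^Z + \lambda_{2j}^{XY} + \lambda_{21}^{XZ}\right) - \left(\lambda_1^X + \lambda_4^Z + \lambda_{1j}^{XY} + \lambda_{14}^{XZ}\right) \\
&= \lambda_{11}^{XZ} + \lambda_{24}^{XZ} - \lambda_{21}^{XZ} - \lambda_{14}^{XZ}.
\end{aligned}$$

The terms $\lambda$ and $\lambda_j^Y$ appear in all four log means and cancel immediately. The remaining main-effect and X*Y terms cancel in pairs. The aliased parameters are set to zero, so $\lambda_{14}^{XZ} = \lambda_{21}^{XZ} = \lambda_{24}^{XZ} = 0$. We get

$$\log\frac{m_{1j1}\,m_{2j4}}{m_{2j1}\,m_{1j4}} = \lambda_{11}^{XZ} = -0.8095,$$

where $\lambda_{11}^{XZ}$ is parameter 17, [X = 1.0000]*[Z = 1.0000].

Hence, the odds ratio is exp(-0.8095) = 0.4451. For every level of Y, the estimated odds of Z=1 rather than Z=4 when X=1 are about 0.45 times the corresponding odds when X=2.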
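The same derivation works for any two categories a and b of Z. The log odds ratio comparing X=1 with X=2 is $\lambda_{1a}^{XZ} + \lambda_{2b}^{XZ} - \lambda_{2a}^{XZ} - \lambda_{1b}^{XZ}$. The Python sketch below codes that formula using the printed X*Z estimates (parameters 17 to 24, with the aliased ones fixed at zero). The names `LAMBDA_XZ` and `log_odds_ratio` are hypothetical. The sketch also forms a Wald 95% interval, exp(estimate ± 1.96·SE). That interval is valid here only because this particular odds ratio reduces to the single parameter 17; comparisons involving two non-zero parameters would also need their covariance. Its endpoints are the exponentiated CI printed for parameter 17 (-1.47, -.15).

```python
import math

# X*Z estimates from the "Parameter Estimates" table (parameters 17-24);
# aliased parameters are fixed at zero.
LAMBDA_XZ = {
    (1, 1): -0.8095, (1, 2): -0.5250, (1, 3): 0.2041, (1, 4): 0.0,
    (2, 1): 0.0, (2, 2): 0.0, (2, 3): 0.0, (2, 4): 0.0,
}
SE_XZ11 = 0.3361  # standard error of lambda_11^XZ (parameter 17)


def log_odds_ratio(a: int, b: int) -> float:
    """log[(m_1ja m_2jb) / (m_2ja m_1jb)]: odds of Z=a vs Z=b, X=1 relative to X=2.

    Under the model x, y, z, x*y, x*z the Y and X*Y terms cancel, so the
    result is the same at every level j of Y.
    """
    return LAMBDA_XZ[(1, a)] + LAMBDA_XZ[(2, b)] - LAMBDA_XZ[(2, a)] - LAMBDA_XZ[(1, b)]


est = log_odds_ratio(1, 4)
odds_ratio = math.exp(est)
# Wald 95% interval on the log scale, then exponentiated.
lo = math.exp(est - 1.96 * SE_XZ11)
hi = math.exp(est + 1.96 * SE_XZ11)
print(f"odds ratio = {odds_ratio:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
# prints: odds ratio = 0.4451, 95% CI = (0.230, 0.860)
```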

 

 

References

 

1.      SAS/STAT User’s Guide, Version 8, SAS Institute Inc., 1999

2.      SPSS Advanced Models 10.0, SPSS Inc., 1999

3.      Alan Agresti, An Introduction to Categorical Data Analysis, John Wiley and Sons Inc., 1996

4.      Daniel Zelterman, Models for Discrete Data, Oxford University Press, 1999