Pairwise Comparisons in SAS and SPSS
This handout is for users of SAS or
SPSS software who would like to use multiple comparison methods available in
either software package to carry out pairwise comparisons. We
will give a short description of available methods and, for the analyses listed
below, a recommendation based on the comparisons of the procedures provided in
the references listed at the bottom of the handout. Some of the methods,
namely step-down Holm-Bonferroni and Holm-Sidak, are not directly available in
SAS or SPSS, but can be easily implemented using results of appropriate SAS or
SPSS procedures.
Multiple
comparisons procedures are used to control for the
familywise error rate. For example, suppose that we have four groups and we
want to carry out all pairwise comparisons of the group means. There are six
such comparisons: 1 with 2, 1 with 3, 1 with 4, 2 with
3, 2 with 4 and 3 with 4. Such a set of comparisons is called
a family. If we use, for example, a t-test to compare each pair at a certain
significance level ALPHA, then the probability of Type I
error (incorrect rejection of the null hypothesis of equality of means) can be
guaranteed not to exceed ALPHA only individually, for each
pairwise comparison separately, but not for the whole family. To ensure that
the probability of incorrectly rejecting the null hypothesis for any of the
pairwise comparisons in the family does not exceed ALPHA, multiple
comparisons methods that control the familywise error rate (FWE) need to be
used.
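To see how quickly the familywise error rate can grow, the following Python sketch computes it for the four-group example above. It assumes, purely for illustration, that the six tests are independent; real pairwise comparisons share group means and are correlated, so the number is only an indication. The function name familywise_error_rate is hypothetical.

```python
# Illustration only: assumes the m tests are independent, which is NOT true
# for pairwise comparisons of group means (they share data).
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one Type I error among m independent tests."""
    return 1.0 - (1.0 - alpha) ** m

n_groups = 4
n_comparisons = n_groups * (n_groups - 1) // 2  # all pairs of 4 groups
fwe = familywise_error_rate(0.05, n_comparisons)
print(n_comparisons, round(fwe, 3))
```

Under this simplifying assumption, testing each of the six pairs at ALPHA = 0.05 gives a familywise error rate of roughly 0.26, far above the nominal 0.05.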
Multiple
comparisons methods can be divided into two types: single-step methods, based
on simultaneous confidence intervals that allow directional decisions (for
example, mean of group 1 is bigger than mean of group 2), and stepwise,
sequentially rejective, methods that are limited to hypothesis testing and, in
most cases, do not produce simultaneous confidence intervals or lead to
directional decisions. Stepwise methods
are generally more powerful than the corresponding single-step procedures.
Therefore, if the hypothesis testing is the main goal of analysis and
confidence intervals are not needed, the stepwise
methods are preferable.
There
are several tests for pairwise comparisons available in SAS as well as in SPSS.
They are: LSD, Bonferroni, Sidak, Scheffe, REGWQ
(Ryan-Einot-Gabriel-Welch based on range), Tukey, Tukey-Kramer, Gabriel,
Hochberg’s GT2, SNK (Student-Newman-Keuls), Duncan, Waller-Duncan and Dunnett.
In addition, REGWF,
which is the Ryan-Einot-Gabriel-Welch test based on the ANOVA F, and Tukey’s-b test
are available only in SPSS, while the simulation option for computing
approximations to the exact p-values for pairwise comparisons is available in
SAS. SPSS also provides tests for pairwise comparisons in one-way ANOVA with
unequal group variances. The available tests are:
Tamhane’s T2, Dunnett’s T3, Games-Howell and Dunnett’s C.
- LSD
- Sidak
- Scheffe
- Tukey
- Gabriel
- Dunnett
- REGWQ, REGWF, SNK (Student-Newman-Keuls), Duncan
1. Balanced one-way ANOVA, equal variances in groups
2. Unbalanced one-way ANOVA, equal variances in groups
3. One-way ANOVA with unequal variances
4. General balanced ANOVA
5. General unbalanced fixed effect ANOVA
6. Mixed and Repeated Measures ANOVA
The
following are single-step tests that, in addition to pairwise comparisons,
also produce simultaneous confidence intervals: Bonferroni, Sidak, Scheffe,
Tukey, Tukey-Kramer, Gabriel, Hochberg’s GT2 and Dunnett. Below are short
descriptions of these tests.
LSD: The LSD (Least Significant Difference) test is a two-step
test. First the ANOVA F test is performed. If it is
significant at level ALPHA, then all pairwise t-tests are carried out, each at level ALPHA. If the F
test is not significant, then the procedure terminates. The LSD test does not
control the FWE.
Bonferroni: The Bonferroni multiple comparison test
is a conservative test, that is, the FWE is not exactly equal to ALPHA, but is less
than ALPHA in most situations. It is easy to apply and can be used for
any set of comparisons. To get the Bonferroni adjusted p-values, just multiply
the ordinary, unadjusted pairwise p-values (for example, t-test p-values for comparing two
means) by the number of comparisons in the family and take the minimum of the
obtained number and 1. Even though the Bonferroni test controls the FWE,
in many situations it may be too conservative and not have enough power to
detect significant differences.
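The Bonferroni adjustment described above can be sketched in a few lines of Python. The p-values are made up for illustration (they do not come from any SAS or SPSS output), and the function name bonferroni_adjust is hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each unadjusted p-value by the family size and cap at 1."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]

# Made-up unadjusted p-values for the six pairwise comparisons of four groups.
raw = [0.001, 0.012, 0.020, 0.040, 0.300, 0.700]
adjusted = bonferroni_adjust(raw)
for p, p_adj in zip(raw, adjusted):
    print(f"unadjusted {p:.3f} -> Bonferroni adjusted {p_adj:.3f}")
```

A comparison is declared significant at level ALPHA when its adjusted p-value does not exceed ALPHA.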
Sidak: Sidak adjusted
p-values are also easy to compute; the adjusted p-value is equal to
1 - (1 - unadjusted p-value)^k,
where k is the number of comparisons in the family. The Sidak test gives
slightly smaller adjusted p-values than Bonferroni, but it guarantees
strict control of the FWE only when the comparisons are independent as, for
example, orthogonal contrasts.
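As a small illustration of the Sidak formula, the Python sketch below adjusts a set of made-up p-values and checks that each result is no larger than the corresponding Bonferroni value. The function name sidak_adjust and the data are hypothetical.

```python
def sidak_adjust(p_values):
    """Sidak adjusted p-value: 1 - (1 - p)^k, with k the family size."""
    k = len(p_values)
    return [1.0 - (1.0 - p) ** k for p in p_values]

# Made-up unadjusted p-values for a family of six comparisons.
raw = [0.001, 0.010, 0.020, 0.040, 0.300, 0.700]
sidak_adjusted = sidak_adjust(raw)
for p, p_adj in zip(raw, sidak_adjusted):
    bonferroni = min(1.0, len(raw) * p)
    print(f"p={p:.3f}  Sidak={p_adj:.4f}  Bonferroni={bonferroni:.4f}")
```

For small p-values the two adjustments are nearly identical; the difference grows as the unadjusted p-values get larger.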
Scheffe: The Scheffe test is used in ANOVA analysis (balanced, unbalanced, with
covariates). It controls for the FWE for all possible contrasts, not only
pairwise comparisons and is too conservative in cases when pairwise comparisons
are the only comparisons of interest.
Tukey: The Tukey test is based on the
studentized range distribution (standardized maximum difference between the
means). For one-way
balanced ANOVA, the FWE of the Tukey test is exactly equal to the
assumed value of ALPHA. The Tukey test is also exact
for one-way balanced ANOVA with correlated errors when the type of correlation
structure is compound symmetry.
Tukey-Kramer: The Tukey-Kramer test is an extension of the Tukey test to
unbalanced designs. Unlike the Tukey test for balanced designs, it is not exact.
The FWE of the Tukey-Kramer test may be less than ALPHA. It is less
conservative for only slightly unbalanced designs and more conservative when
differences among sample sizes are bigger.
Hochberg’s GT2: The GT2 test is
similar to Tukey, but the critical values are based on
the studentized maximum modulus distribution instead of the studentized range. For
balanced or unbalanced one-way ANOVA, its FWE does not exceed ALPHA. It is
usually more conservative than the Tukey-Kramer test for unbalanced designs and
it is always more conservative than the Tukey test for balanced designs.
Gabriel: Like the GT2 test,
the Gabriel test is based on the studentized maximum
modulus. It is equivalent to the GT2 test for balanced one-way ANOVA. For
unbalanced one-way ANOVA, it is less conservative than GT2, but its FWE may
exceed ALPHA in highly unbalanced designs.
Dunnett: Dunnett’s test
is the test to use when the only pairwise comparisons of interest are comparisons
with a control. It is an exact test, that is, its FWE is exactly equal to ALPHA, for balanced as well as unbalanced
one-way designs.
The
following tests are stepwise tests: REGWQ, REGWF, SNK (Student-Newman-Keuls),
Duncan, Tukey’s-b. These tests do not provide confidence intervals. They just
divide the means into possibly overlapping groups. Means within the
same group are not significantly different; those from
different groups are significantly different at an assumed level ALPHA. The
Bonferroni-Holm and Sidak-Holm step-down tests also belong to this class of tests.
They are not available in SAS or SPSS, but can be easily performed
using the results printed by either software package.
All
tests listed above are step-down tests. They share a common testing scheme that
consists of the following steps (k is the number of means):
- First, the equality of all k means is tested at level ALPHA_k. If the test results in a rejection, then each subset of k-1 means is
tested at level ALPHA_(k-1); otherwise,
the procedure stops.
- In general, if the hypothesis of equality of a set of p means is
rejected at level ALPHA_p, then each subset of p-1 means is tested at level ALPHA_(p-1); otherwise,
the set of p means is considered not to differ significantly and none of its
subsets is tested.
- Continue in this manner until no subsets remain to be tested.
Significance levels ALPHA_k, ALPHA_(k-1), … depend on the
number of comparisons and on the test.
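To make the dependence of the levels on the test concrete, the Python sketch below tabulates the level used for a subset of p means out of k under three step-down tests. The formulas are the commonly cited ones and are not taken from this handout: REGWQ uses 1 - (1 - ALPHA)^(p/k) for p < k-1 and ALPHA otherwise, SNK uses ALPHA for every p, and Duncan uses 1 - (1 - ALPHA)^(p-1). The function names are hypothetical.

```python
# Subset-level significance for p means out of k (commonly cited formulas,
# stated here as an assumption rather than as this handout's definitions).
def regwq_level(p, k, alpha=0.05):
    return alpha if p >= k - 1 else 1.0 - (1.0 - alpha) ** (p / k)

def snk_level(p, k, alpha=0.05):
    return alpha  # the same level for every subset size

def duncan_level(p, k, alpha=0.05):
    return 1.0 - (1.0 - alpha) ** (p - 1)

k = 4  # four group means, as in the example at the start of the handout
levels = {
    p: (regwq_level(p, k), snk_level(p, k), duncan_level(p, k))
    for p in range(k, 1, -1)
}
for p, (regwq, snk, duncan) in levels.items():
    print(f"p={p}: REGWQ={regwq:.4f}  SNK={snk:.4f}  Duncan={duncan:.4f}")
```

With these formulas, REGWQ tightens the level for small subsets while Duncan loosens it for large ones.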
REGWQ, REGWF, SNK
(Student-Newman-Keuls), Duncan
are step-down pairwise comparison procedures for one-way balanced ANOVA.
Although these tests can be obtained in SAS as well as
in SPSS for unbalanced designs (both software packages use the harmonic mean of
the sample sizes as the common sample size), their use in unbalanced cases is
not recommended. It is also not recommended to use the
SNK (Student-Newman-Keuls) and Duncan tests.
Bonferroni-Holm: The biggest advantage of the Bonferroni-Holm step-down
test is that it does not require any assumptions (model or distribution
related) and therefore can be applied to any family of
pairwise comparisons. It is a conservative test: its
FWE does not exceed ALPHA. Here is how it works. Suppose
that there are k pairwise comparisons of interest and the corresponding p-values,
not adjusted for multiple comparisons, are p1, p2, … , pk.
Order the p-values from the smallest to the largest, p(1), p(2),
… , p(k), with the corresponding comparisons C(1), C(2),
… , C(k). Then proceed as follows:
- Compare p(1) with ALPHA/k. If p(1) > ALPHA/k, then stop, retain all hypotheses and conclude that there is
no evidence of differences between the means at significance level ALPHA. If p(1)
<= ALPHA/k, then reject the hypothesis related to comparison C(1),
conclude that the means in comparison C(1) are significantly
different at level ALPHA, and go to the next step.
- Compare p(2) with ALPHA/(k-1). If p(2) > ALPHA/(k-1), then
stop and retain all remaining hypotheses. If p(2) <= ALPHA/(k-1), then
reject the hypothesis related to comparison C(2), conclude that the
means in comparison C(2) are significantly different at level ALPHA, and go to
the next step.
- Compare p(3) with ALPHA/(k-2). If p(3) > ALPHA/(k-2), then
stop and retain all remaining hypotheses. If p(3) <= ALPHA/(k-2), then
reject the hypothesis related to comparison C(3), conclude that the
means in comparison C(3) are significantly different at level ALPHA, and go to
the next step.
- Continue in this manner until the procedure stops
or until all p-values have been compared.
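The Bonferroni-Holm steps can be applied by hand to the unadjusted p-values printed by SAS or SPSS, or with a short script. The Python sketch below is one minimal way to do it; the p-values are made up for illustration and the function name holm_bonferroni is hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one True/False per comparison: True means the hypothesis is rejected."""
    k = len(p_values)
    # Indices of the comparisons, ordered from smallest to largest p-value.
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for step, idx in enumerate(order):
        # Step 0 uses ALPHA/k, step 1 uses ALPHA/(k-1), and so on.
        if p_values[idx] > alpha / (k - step):
            break  # stop and retain this and all remaining hypotheses
        reject[idx] = True
    return reject

# Made-up unadjusted p-values for six pairwise comparisons, in output order.
raw = [0.040, 0.001, 0.700, 0.012, 0.009, 0.300]
decisions = holm_bonferroni(raw)
print(decisions)
```

With these numbers the step-down test rejects three hypotheses (p = 0.001, 0.009 and 0.012), whereas the single-step Bonferroni test, which compares every p-value with ALPHA/6, would reject only the first.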
Sidak-Holm: The testing procedure in the Sidak-Holm method is very
similar to the Bonferroni-Holm method. The only difference is that, instead of comparing the ordered
p-value p(j+1) with ALPHA/(k-j), the Sidak-adjusted value
1 - (1 - p(j+1))^(k-j) is compared with ALPHA, for j = 0, 1, … ,
k-1. The Sidak-Holm test is slightly less conservative than Bonferroni-Holm,
but its control of FWE is guaranteed only for
independent comparisons.
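A minimal Python sketch of the Sidak-Holm step-down test follows. At step j (counting from 0) it applies the Sidak adjustment with exponent k-j to the next smallest p-value and compares the result with ALPHA. The p-values are made up and the function name holm_sidak is hypothetical.

```python
def holm_sidak(p_values, alpha=0.05):
    """Return one True/False per comparison: True means the hypothesis is rejected."""
    k = len(p_values)
    # Indices of the comparisons, ordered from smallest to largest p-value.
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for j, idx in enumerate(order):
        # Sidak adjustment for the k-j hypotheses still under consideration.
        adjusted = 1.0 - (1.0 - p_values[idx]) ** (k - j)
        if adjusted > alpha:
            break  # stop and retain this and all remaining hypotheses
        reject[idx] = True
    return reject

# Made-up unadjusted p-values for six pairwise comparisons, in output order.
raw = [0.040, 0.001, 0.700, 0.0127, 0.009, 0.300]
decisions = holm_sidak(raw)
print(decisions)
```

In this example the third smallest p-value, 0.0127, is rejected because its Sidak-adjusted value is about 0.0498; the Bonferroni-Holm threshold at that step, 0.05/4 = 0.0125, would have stopped the procedure there.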
Waller-Duncan: The Waller-Duncan test is different from all the tests mentioned above. It is based on a Bayesian approach and minimizes an additive loss
function, which is a sum of loss functions for each pairwise comparison. The
individual loss functions are linear with the loss equal to absolute value of
the difference between means multiplied by a constant k0 if the null hypothesis
was incorrectly accepted, or by a constant k1 if the alternative hypothesis was
incorrectly accepted. The ratio K=k1/k0 is a measure of relative seriousness of
a Type I error versus a Type II error and it has to be specified instead of the
significance level ALPHA. The values of K=50, 100 and 500
roughly correspond to ALPHA = 0.10, 0.05 and 0.01,
respectively (see Multiple Comparison Procedures, by Y. Hochberg and A. Tamhane
for details).
In
SPSS, the tests can be chosen as an option in the
analysis of variance procedures: One-way ANOVA in the Compare Means menu and in
the General Linear Model. Click on the Post Hoc button to select a test. In
SAS, the tests are available in PROC GLM as options in the LSMEANS and MEANS
statements, and in PROC MIXED in the LSMEANS statement. Examples of SAS code will be given in the sections below.
1. Balanced one-way ANOVA, equal variances in groups
If all pairwise comparisons of the means are
tested and confidence intervals are also required,
then the Tukey test is recommended. The Tukey test is exact for balanced
one-way ANOVA, that is, the FWE is exactly equal to ALPHA, and it is
more powerful than available alternatives.
If comparisons with a control are
the only ones needed and confidence intervals are required, then Dunnett’s
test is recommended. The FWE of Dunnett’s test is exactly equal
to ALPHA for pairwise
comparisons with a control.
If all pairwise comparisons of the
means are tested, but confidence intervals are not required, then the REGWQ test
is recommended. The FWE of the REGWQ does not exceed ALPHA and REGWQ is
more powerful than the Tukey test when estimating the amount of the difference between means
(confidence intervals) is not required.
SAS code:
Suppose that the name of the grouping variable is group with values 1,
2, 3 and 4, and the name of the variable containing measurements whose group
means we want to compare is y.
- SAS code to obtain the Tukey test with confidence intervals:
proc glm;
  class group;
  model y = group;
  means group / cldiff tukey;
run;
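The above code will print the 95% confidence intervals for all pairwise
differences between the means. (If the cldiff option is omitted in a balanced
design, SAS instead prints the same letter by the means that are not
significantly different.) To print 99%
confidence intervals, change
  means group / cldiff tukey;
to
  means group / cldiff tukey alpha=0.01;
If you want to get p-values for each
comparison, replace the means statement with an lsmeans statement
(the pdiff and adjust= options belong to lsmeans, not means):
  lsmeans group / pdiff adjust=tukey;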
- SAS code to obtain Dunnett’s test for comparisons with group=4:
proc glm;
  class group;
  model y = group;
  means group / dunnett('4');
run;
- SAS code to obtain the REGWQ test:
proc glm;
  class group;
  model y = group;
  means group / regwq;
run;
2. Unbalanced one-way ANOVA, equal variances in groups
If all pairwise comparisons of the means are
tested, then the Tukey-Kramer test is recommended. The
Tukey-Kramer test is not exact, but conservative for unbalanced one-way ANOVA,
that is, the FWE does not exceed ALPHA (may be less
than ALPHA). The Tukey-Kramer test is conservative, because the
critical value used in it is not exact, but an approximation to the exact
value. The approximation is quite accurate for slightly unbalanced designs and
becomes less accurate when the differences in sample sizes increase. The
SIMULATION option in SAS may provide a better approximation and therefore a
less conservative test. The accuracy of the approximation increases with the
number of samples used in the simulation.
If comparisons with a control are the
only ones needed, then Dunnett’s test is recommended. The FWE of Dunnett’s
test is exactly equal to ALPHA
for pairwise comparisons with a control.
In SPSS, the Tukey-Kramer test is obtained for unbalanced design if the Tukey option is
checked in the Post Hoc menu.
In SAS, the following code can be used to obtain the Tukey-Kramer test (assuming group
is the name of a group variable and y is a variable whose group means we want
to compare):
proc glm;
  class group;
  model y = group;
  means group / tukey;
run;
The following program can be used to obtain the simulation test:
proc glm;
  class group;
  model y = group;
  lsmeans group / pdiff cl adjust=simulate(nsamp=100000 seed=278912);
run;
where nsamp is the
number of samples used in simulations and seed is the starting seed for the
random number generation. Higher values of nsamp result in a better test, but
increase the computation time.
The following SAS program may be used to obtain
Dunnett’s test for comparisons with a control. In the program, the control
group is group=4:
proc glm;
  class group;
  model y = group;
  means group / dunnett('4');
run;
3. One-way ANOVA with unequal variances
In
SAS, no tests are available for pairwise comparisons in
one-way ANOVA when the variances in groups are not equal.
In
SPSS, the following tests are available: Tamhane’s T2, Dunnett’s T3, Games-Howell and Dunnett’s C. None of these tests is
exact. T2, T3 and C are conservative procedures, that is, for all of them the
FWE does not exceed ALPHA. T2 is more conservative than
T3; for large samples they are approximately equal. T3 is more conservative
than C for large samples, while C is more conservative for smaller samples. The
Games-Howell test is an extension of the Tukey-Kramer test to the case of unequal
variances. It has higher power (narrower confidence intervals) than T2, T3 or
C, but its FWE may exceed ALPHA. The
Games-Howell test is most liberal (its FWE is most likely to exceed ALPHA) when the
variances of the sample means, σ_i^2 / n_i, are approximately equal.
Recommendation: Dunnett’s T3
or Dunnett’s C should be used for pairwise
comparisons. T3 is recommended when sample sizes in
groups are small; C is recommended when sample sizes are large.
4. General balanced ANOVA
(i) Main effect models
In
general balanced ANOVA with main effects and no interactions, tests recommended
in Section 1, One-Way balanced ANOVA, can be used.
That is, the Tukey test is recommended for all
pairwise comparisons if confidence intervals for mean differences are needed,
and the step-down REGWQ test, when the confidence intervals are not required.
For pairwise comparisons with a control, Dunnett’s test is
recommended.
To
get the test in SPSS, click on Post Hoc, select the main effect of interest and
the test. In SAS, the following program can be used.
proc glm;
  class group edu;
  model y = group edu;
  means group / cldiff tukey;
run;
Replace
the means statement with
  means group / dunnett('4');
to obtain Dunnett’s test for pairwise comparisons with the
control group 4,
and with
  means group / regwq;
to obtain the REGWQ test.
(ii) Models with
interactions
Pairwise
comparisons for main effects are not usually of interest when interactions are
present in the model. If they are, then the tests described above in subsection
(i) can be used. However, in many cases, different
sets of comparisons related to the interactions may be of interest. SPSS does
not provide any multiple comparison procedure for such comparisons. In SAS
multiple comparisons tests can be easily obtained,
without additional programming, only in the case when all pairwise comparisons
among all combinations of the levels of variables involved in the interaction
or only pairwise comparisons with a control are of interest. For example,
suppose that there are two class variables, group and edu, and their
interaction in the model. If we want to carry out all pairwise comparisons
among all combinations of levels of group and edu, then the Tukey test can be used.
It can be obtained with the following program.
proc glm;
  class group edu;
  model y = group edu group*edu;
  lsmeans group*edu / pdiff cl adjust=tukey;
run;
For
a comparison with a control, Dunnett’s test controls the FWE exactly and can be obtained with the following program:
proc glm;
  class group edu;
  model y = group edu group*edu;
  lsmeans group*edu / pdiff=control('1' '3') cl adjust=dunnett;
run;
where the combination (group=1 and edu=3) is the control group.
For
other sets of comparisons, the Bonferroni-Holm step-down
test with the t-test p-values can be used. It controls the FWE. However, it may
be too conservative and in some situations better, more powerful procedures
may be available (see reference 2 for SAS macros), although not ready-made in
SAS or SPSS.
5. General unbalanced fixed effect ANOVA
SPSS
does not have any FWE controlling procedures that can be used
for pairwise comparisons in general unbalanced designs. SPSS users can run General
Linear Model and request comparisons based on the t-test (no multiple
comparison correction) by using COMPARE option in the EMMEANS statement (for
comparisons related to interactions) or selecting Compare Main Effects (overall
main effect comparisons) in the Options menu. Then the t-test p-values can be adjusted for multiple comparisons by applying the Bonferroni-Holm procedure. The procedure is easy to apply, but
it may be overly conservative. It does not provide confidence intervals.
In
SAS, the Dunnett-Hsu, Tukey-Kramer, GT2 and SIMULATE options are available in
the LSMEANS statement. Since it is not known if the Tukey-Kramer test controls
the FWE for pairwise comparisons for general unbalanced designs (it is proven
to guarantee the FWE only in some cases), the more conservative GT2 test or the
approximately exact SIMULATE option is recommended. For pairwise comparisons
with a control, the approximately exact Dunnett-Hsu test, obtained by specifying
adjust=dunnett in the lsmeans statement, or the SIMULATE option is recommended. For example,
proc glm;
  class drug disease;
  model y = drug disease drug*disease;
  lsmeans drug / pdiff cl adjust=gt2;
  lsmeans drug*disease / pdiff cl adjust=gt2;
  * or, to compare with a control level defined as level 1 for drug and level 2 for disease;
  lsmeans drug*disease / pdiff=control('1' '2') cl adjust=gt2;
run;
proc glm;
  class drug disease;
  model y = drug disease drug*disease;
  lsmeans drug / pdiff cl adjust=simulate(seed=198351 acc=0.001);
  lsmeans drug*disease / pdiff cl adjust=simulate(seed=198351 acc=0.001);
  * or, to compare with a control level defined as level 1 for drug and level 2 for disease;
  lsmeans drug*disease / pdiff=control('1' '2') cl adjust=dunnett;
  lsmeans drug*disease / pdiff=control('1' '2') cl
          adjust=simulate(seed=198351 acc=0.001 cvadjust);
run;
For
sets of comparisons that do not include all pairwise comparisons or comparisons
with a control, the Bonferroni-Holm step-down test with the t-test p-values can be carried out. The t-test p-values can be obtained with
the adjust=t option. For example,
  lsmeans drug*disease / pdiff adjust=t;
The
Bonferroni-Holm test is easy to carry out, but may be too
conservative. There are SAS macros available (2) that provide less conservative
adjustments.
6. Mixed and Repeated Measures ANOVA
In
balanced ANOVA with one random factor and one fixed factor, the Tukey test
controls the FWE exactly and can be used for all pairwise comparisons, and
Dunnett’s test can be used for pairwise comparisons with a control. For
example, suppose that subject is a random factor and trial is a fixed factor. In
SPSS, General Linear Model, Univariate, Custom Model with main effects only,
select the Tukey (or Dunnett) test for trial in the Post Hoc menu. In SAS, the
following program can be used to obtain the tests.
proc mixed;
  class trial subject;
  model y = trial / ddfm=satterth;
  random subject;
  lsmeans trial / cl adjust=tukey;
  lsmeans trial / pdiff=control('1') cl adjust=dunnett;
run;
For
general mixed and repeated measures models, SPSS does not have any procedures
that control FWE. In SAS, the SIMULATE option is recommended for all pairwise
comparisons. For pairwise comparisons with a control, adjust=dunnett, which
gives the approximately exact Dunnett-Hsu test, or adjust=simulate can be used
to control the FWE. For example,
proc mixed;
  class id t trt;
  model y = trt x;
  repeated t / type=un subject=id;
  lsmeans trt / pdiff cl adjust=simulate(seed=18713 nsamp=200000);
  * or, to compare with a control level defined as level 1 of trt;
  lsmeans trt / pdiff=control('1') cl adjust=dunnett;
  lsmeans trt / pdiff=control('1') cl
          adjust=simulate(seed=121211 nsamp=200000);
run;
For
sets of comparisons that do not include all pairwise comparisons or comparisons
with a control, the Bonferroni-Holm step-down test with the
t-test p-values is the easiest option but may be too conservative. There are
SAS macros available (2) that provide less conservative adjustments.
In
the general case, when no assumptions about the
distribution or model are made, the following tests are recommended: the Bonferroni test, if confidence intervals in addition to the
tests are required, and the Bonferroni-Holm step-down test,
if confidence intervals are not needed. Both tests control FWE conservatively
(FWE <= ALPHA).
3. SAS/STAT User’s Guide, Version 8, SAS
Institute 1999.