D.

2005-10-31 04:08:11 UTC

I have been teaching myself different multivariate statistical
techniques over the past few months to try to find a viable method to
use on my dataset, and am still somewhat confused.

I gathered data on about 200 people. My independent variables are
genotype at two genes (each genotype can be considered a binary
variable with two roughly equal-sized groupings), gender, and season of
birth (a binary variable separating the year into halves). The
dependent variables of interest are about 15 continuous psychological
scales (based upon past research that derives these scales from factor
analysis of many other individual questions), and about 4 other
continuous variables that I am interested in, such as the age at which
subjects think they will die. The dependents are often significantly
correlated with each other.

The independents are expected to have small and perhaps interactive
effects on the dependents. The analysis is meant to be exploratory. I
expect few, if any, of my "significant" results to hold up to
corrections for multiple testing.
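
To put a rough number on the multiple testing problem, here is a quick
sketch. It assumes a full factorial model on the four binary
independents and exactly 19 dependents (15 scales plus 4 others; my
real counts are only "about" that), with a Bonferroni correction at
alpha = 0.05. These choices are only for illustration, not a decision
about the final analysis.

```python
from math import comb

# Assumed round numbers: the description only gives "about 15" scales
# and "about 4" other continuous measures.
N_FACTORS = 4          # two genotypes, gender, season of birth
N_DEPENDENTS = 15 + 4
ALPHA = 0.05


def count_effects(n_factors: int) -> int:
    """Main effects plus every interaction in a full factorial design."""
    return sum(comb(n_factors, k) for k in range(1, n_factors + 1))


n_effects = count_effects(N_FACTORS)          # 4 main + 6 + 4 + 1 interactions
n_tests = n_effects * N_DEPENDENTS            # univariate tests in total
bonferroni_alpha = ALPHA / n_tests            # per-test threshold
expected_false_positives = ALPHA * n_tests    # if every null were true

print(f"effects per dependent:     {n_effects}")
print(f"univariate tests in total: {n_tests}")
print(f"Bonferroni threshold:      {bonferroni_alpha:.6f}")
print(f"expected false positives:  {expected_false_positives:.2f}")
```

Under those assumptions, that is 285 univariate tests, so uncorrected
testing at 0.05 would be expected to produce about 14 false positives
even if nothing real were going on.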

I considered discriminant function analysis (DFA) or logistic
regression using one of the binary independent variables as a
pseudo-dependent variable. The pseudo-independent continuous variables
would then be ranked by how well they distinguish between the two
levels of the binary pseudo-dependent. I could solve the problem of
multicollinearity by doing a PCA on the continuous variables. I decided
against this approach for three reasons: flipping the
dependent/independent relationship on its head is dubious, factors
found significant in DFA would have to be deconstructed to understand
what they are saying, and further analysis would be needed to evaluate
interactions between the binary independents.

MANOVA seems like a good alternative. It allows interaction effects and
has a fairly straightforward interpretation. However, I have some
concerns:

-For my interpretation I plan to report the results for each
multivariate main effect and the univariate individual effects
regardless of whether the main effect is significant. I realize a
common technique is to proceed to the individual effects only if the
main effect is significant. Is my plan acceptable in an exploratory
analysis?

-The effect of multicollinear dependents on a model is ambiguous. Some
say that correlated dependents are a serious problem
(http://www.matforsk.no/ola/ffmanova.htm), while others present a more
ambiguous case (Cole, D. A., Maxwell, S. E., Arvey, R., & Salas, E.,
"How the Power of MANOVA Can Both Increase and Decrease as a Function
of the Intercorrelations Among the Dependent Variables," Psychological
Bulletin, Vol 115 (3), May 1994, pp. 465-474). Fooling around with my
model so far, I find that changing the number of independents and
dependents in the model changes my P values some, but not a ton. How
much should I be worrying about this assumption?

-Having four interacting binary independents with N=200 causes some
major stratification. I've read that no cell in the analysis should
have an N<20, or alternatively that the N of the smallest cell should
not be outnumbered by the number of dependent variables. I may lower my
number of dependents to fit the latter rule if it is correct.
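
The stratification can be worked out with a quick sketch. It assumes
the 200 subjects are spread perfectly evenly over the cells, which is
the best case; real cells are unlikely to be exactly balanced, so the
smallest one will be at or below these figures. The count of 19
dependents is also an assumption based on my "about 15" plus "about 4".

```python
from itertools import product

# Assumed round numbers from the description of the dataset.
N_SUBJECTS = 200
N_BINARY_FACTORS = 4      # two genotypes, gender, season of birth
N_DEPENDENTS = 15 + 4

# Every combination of the four binary independents is one cell.
cells = list(product([0, 1], repeat=N_BINARY_FACTORS))
n_cells = len(cells)

# Best case: subjects split as evenly as possible across cells.
mean_cell_n = N_SUBJECTS / n_cells
smallest_balanced_cell = N_SUBJECTS // n_cells

# Rule 1 (as I read it): every cell should have at least 20 subjects.
meets_min_20 = smallest_balanced_cell >= 20
# Rule 2: the smallest cell should not have fewer subjects than
# there are dependent variables.
meets_dependents_rule = smallest_balanced_cell >= N_DEPENDENTS

print(f"cells: {n_cells}, mean N per cell: {mean_cell_n}")
print(f"smallest cell (balanced best case): {smallest_balanced_cell}")
print(f"meets N>=20 rule: {meets_min_20}")
print(f"meets dependents rule with {N_DEPENDENTS} DVs: {meets_dependents_rule}")
```

So with the full four-way design there are 16 cells averaging 12.5
subjects each. Neither rule is met with 19 dependents, and the second
rule would allow at most about 12 dependents even with perfectly
balanced cells.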

Is MANOVA a good option for my needs? Would I be better off doing one-,
two-, and three-way ANOVAs and their nonparametric equivalents
individually?

Thanks for any comments you can provide.

.d
