The power of a multivariate approach to genome-wide association studies.

David Houle¹, Jessica Nye¹,², Eladio Marquez¹, William Pitchers³, Alycia Kowalski³, Ian Dworkin³. 1) Dept. of Biological Science, Florida State Univ, Tallahassee, FL; 2) Dept. of Genetics, North Carolina State Univ, Raleigh, NC; 3) Dept. of Zoology, Michigan State University, Lansing, MI.

Genome-wide association studies (GWAS) are almost invariably conducted on one phenotypic trait at a time, despite the fact that organisms present integrated patterns of variation. We demonstrate that a multivariate GWAS has increased power and gives more interpretable results than a set of univariate analyses. We measured the shape of Drosophila melanogaster wings in the Drosophila Genetic Reference Panel (DGRP) using the automated Wingmachine system. We analyzed data with 59 degrees of freedom that capture the size of the wing and the location of all the wing veins. Over 22,000 wings from 165 DGRP lines were measured in two different labs. We analyzed the data by MANOVA, which determines both the direction of the phenotypic effect in the 59-dimensional space and the statistical significance of the effect. Inferred effects were very consistent across labs. After eliminating SNPs in strong gametic disequilibrium (GD) with nearby SNPs, we found that 2711 of 1.5 × 10⁶ SNPs had a significant effect at a false discovery rate of 5%. Causal inferences in the DGRP lines are greatly hampered by random disequilibrium between SNPs across the entire genome. SNPs with minor allele frequencies (MAF) less than 10% are almost certain to be correlated at r² > 0.8 with at least one SNP elsewhere in the genome, usually on a different chromosome. Simulations show that this random GD effect alone can explain the tendency of low-MAF SNPs to have large estimated effects. We have validated hits by comparing their effect vectors with those of RNAi knockdowns for several implicated genes, including ds and dpp. The multivariate vector of phenotypic effects makes informative validation much easier, as it is very unlikely that similar directions of effects will be generated with no causal connection. Similarly, when two effect vectors for SNPs not in GD are similar in direction, this is very likely to be due to similar mechanisms of development.
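The claim that similar effect directions are unlikely to arise by chance in a 59-dimensional space can be illustrated with a small simulation. The sketch below is not the authors' analysis. It assumes that similarity of direction is measured by the cosine between two effect vectors, and that the null is a pair of independent, isotropic random directions. Real wing-shape variation is not isotropic, so this null is only a rough guide. The function names `cosine_similarity` and `null_abs_cosines` are hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two effect vectors (assumed similarity measure)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def null_abs_cosines(n_dims=59, n_pairs=20000, seed=0):
    """Absolute cosines between pairs of independent random directions.

    Assumption: directions are isotropic (standard normal coordinates),
    which ignores the covariance structure of real wing-shape data.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_pairs, n_dims))
    b = rng.standard_normal((n_pairs, n_dims))
    dots = np.einsum("ij,ij->i", a, b)  # row-wise dot products
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return np.abs(dots / norms)


null = null_abs_cosines()
print(f"mean |cos| between unrelated directions: {null.mean():.3f}")
print(f"fraction of pairs with |cos| > 0.5: {(null > 0.5).mean():.5f}")
```

Under these assumptions, unrelated directions are close to orthogonal, so a high cosine between a SNP effect vector and a knockdown effect vector would be hard to attribute to chance. The same comparison made on a single trait could only agree or disagree in sign.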