A fundamental question in human genetics is the degree to which the polygenic character of complex traits derives from polymorphism
in genes with similar or with dissimilar functions. The many genome-wide association studies now being performed offer an
opportunity to investigate this, and although early attempts are emerging, new tools and modeling strategies still need to
be developed and deployed. Towards this goal, we implemented a new algorithm to facilitate the transition from genetic marker
lists (principally those generated by PLINK) to pathway analyses of representational gene sets in either threshold-based or threshold-free
downstream applications (e.g. DAVID, GSEA-P, and Ingenuity Pathway Analysis). We applied the algorithm to several large genome-wide
association studies covering diverse human traits, including type 2 diabetes, Crohn’s disease, and plasma lipid levels.
The approach was validated for plasma HDL levels, where functional categories related to lipid metabolism emerged
as the most significant in two independent studies. From analyses of these samples, we highlight and address numerous issues
related to this strategy, including appropriate gene-based correction statistics, the utility of imputed versus non-imputed
marker sets, and the apparent enrichment of pathways due solely to the positional clustering of functionally related genes.
The latter in particular emphasizes the importance of studies that directly tie genetic variation to functional characteristics
of specific genes. The freely available software, which we call ProxyGeneLD, may resolve an important bottleneck in pathway-based
analyses of genome-wide association data. It has allowed us to identify at least one replicable case of pathway enrichment,
but also to highlight functional gene clustering as a potentially serious problem that may lead to spurious pathway findings
if left uncorrected.