We propose a new testing procedure for the automatic ontological analysis of gene expression data. The objective of
ontological analysis is to retrieve functional annotations, e.g. Gene Ontology terms, that are relevant to the cellular
mechanisms underlying the gene expression profiles, and a large number of tools have been developed for this purpose. Most
existing tools implement the same approach, which exploits rank statistics of genes ordered by the strength of
statistical evidence, e.g. p-values computed by testing hypotheses at the individual gene level. However, such an approach
often causes serious false discoveries. In particular, one of its most crucial drawbacks is that a rank-based approach can
wrongly judge an ontology term to be statistically significant even though all of the genes annotated by that term are
irrelevant to the underlying cellular mechanisms. In this paper, we first point out some drawbacks of the rank-based
approaches from a statistical point of view, and then propose a new testing procedure to overcome them. The proposed method
is theoretically grounded in statistical meta-analysis, and the hypothesis to be tested is stated in a form suited to the
problem of ontological analysis. We perform Monte Carlo experiments to highlight the disadvantages of the rank-based approach
and the advantages of the proposed method. Finally, we demonstrate the applicability of the proposed method through an
ontological analysis of gene expression data from human diabetes.
Keywords Gene Ontology - Gene Expression Data - Statistical Meta-Analysis - Fisher’s Exact Test
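
To make the rank-based approach discussed above concrete, the following is a minimal sketch, not the implementation of any particular tool and not the method proposed in this paper. It assumes one common variant: the genes are ranked by their p-values, the top-ranked genes are selected, and a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) is applied to the overlap with an ontology term. The function names, the parameter top_k, and the toy gene identifiers are all hypothetical.

```python
from math import comb


def fisher_enrichment_pvalue(n_total, n_annotated, n_selected, n_overlap):
    """One-sided Fisher's exact test: P(overlap >= n_overlap) under the
    hypergeometric null of drawing n_selected genes at random."""
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_selected)


def rank_based_enrichment(gene_pvalues, annotated_genes, top_k):
    """Rank genes by p-value (smallest first), keep the top_k genes, and
    test the ontology term for over-representation among them."""
    ranked = sorted(gene_pvalues, key=gene_pvalues.get)
    selected = set(ranked[:top_k])
    annotated = set(annotated_genes) & set(gene_pvalues)
    overlap = len(selected & annotated)
    return fisher_enrichment_pvalue(
        len(gene_pvalues), len(annotated), len(selected), overlap
    )


# Hypothetical toy data: ten genes, three of them annotated by the term.
genes = [f"g{i}" for i in range(10)]
term = {"g0", "g1", "g2"}

# Case 1: the annotated genes carry strong gene-level evidence.
strong = {g: (i + 1) / 100 for i, g in enumerate(genes)}
# Case 2: same ordering, but no gene has a p-value below 0.5.
weak = {g: 0.5 + (i + 1) / 100 for i, g in enumerate(genes)}

print(rank_based_enrichment(strong, term, top_k=3))  # 1/120, about 0.0083
print(rank_based_enrichment(weak, term, top_k=3))    # identical value
```

Because only the ordering of the genes enters the test, both toy cases yield the same term-level p-value, even though in the second case no gene shows any appreciable evidence at the individual gene level. This is one way to see how a rank-based test can call a term significant when none of its annotated genes is relevant.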