This paper introduces a new framework for feature learning in classification, motivated by information theory. We first systematically
study the information structure and present a novel perspective revealing the two key factors in information utilization:
class-relevance and redundancy. We derive a new information decomposition model that introduces a novel concept, class-relevant
redundancy. Subsequently, a new algorithm called Conditional Informative Feature Extraction is formulated, which
maximizes the joint class-relevant information by explicitly reducing the class-relevant redundancies among features. To address
the computational difficulties in information-based optimization, we incorporate Parzen window estimation into the discrete
approximation of the objective function and propose a Local Active Region method which substantially increases the optimization
efficiency. To effectively utilize the extracted feature set, we propose a Bayesian MAP formulation for feature fusion, which
unifies a Laplacian sparse prior and multivariate logistic regression to learn a fusion rule with good generalization capability.
Recognizing the inefficiency caused by treating the extraction stage and the fusion stage separately, we further develop an
improved design of the framework that coordinates the two stages by introducing feedback from the fusion stage to the extraction
stage, which significantly enhances the learning efficiency. Comparative experiments show that our framework achieves
remarkable improvements.
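
To make the role of Parzen window estimation concrete, the following minimal sketch estimates the class-relevant information I(C; Y) of a single scalar feature. It is an illustration under stated assumptions, not the estimator used in this paper: the Gaussian kernel, the fixed bandwidth h, the resubstitution (plug-in) average, and all function names are hypothetical choices made here, and the Local Active Region method is not modeled.

```python
import math


def gaussian_kernel(u, h):
    """1-D Gaussian Parzen window with bandwidth h (assumed kernel choice)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def parzen_density(y, samples, h):
    """Parzen window density estimate at y from a list of scalar samples."""
    return sum(gaussian_kernel(y - s, h) for s in samples) / len(samples)


def parzen_mutual_information(values, labels, h=0.5):
    """Plug-in estimate of I(C; Y) in nats for one scalar feature.

    Averages log p(y_i | c_i) / p(y_i) over the sample points, where both
    densities are Parzen window estimates built from the same samples.
    """
    # Group feature values by class label for the class-conditional densities.
    by_class = {}
    for v, c in zip(values, labels):
        by_class.setdefault(c, []).append(v)

    total = 0.0
    for v, c in zip(values, labels):
        p_cond = parzen_density(v, by_class[c], h)   # estimate of p(y | c)
        p_marg = parzen_density(v, values, h)        # estimate of p(y)
        total += math.log(p_cond / p_marg)
    return total / len(values)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    separated = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]    # classes well apart
    constant = [0.0] * 6                             # carries no class information
    print(parzen_mutual_information(separated, labels))
    print(parzen_mutual_information(constant, labels))
```

For two balanced, well-separated classes the estimate approaches log 2 nats, while a feature that takes the same value for every sample yields zero. Each evaluation costs O(N) kernel sums per sample point, which hints at why a more efficient optimization scheme is needed when such estimates sit inside an iterative objective.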