Exploratory factor analysis of 516 amino acid attributes yields six multidimensional patterns, the factor analysis scales
of generalized amino acid information (FASGAI), which describe hydrophobicity, alpha and turn propensities, bulky properties,
compositional characteristics, local flexibility, and electronic properties. These scales are proposed to represent the structures of 48 bitter-tasting
dipeptides and 58 angiotensin-converting enzyme (ACE) inhibitors. Characteristic parameters related to the bioactivities of these
peptides are selected by a genetic algorithm (GA), and quantitative structure–activity relationship (QSAR) models are constructed
by partial least squares (PLS) regression. Leave-one-out cross-validation results are compared with those of a previously reported
structure representation method and show slightly superior or comparable performance. Furthermore, both data sets are divided
into training and test sets to validate the characterization ability of FASGAI. The PLS models built on the training samples
perform satisfactorily in both leave-one-out cross-validation and external validation on the test samples. These results
demonstrate that FASGAI is an effective representation of peptide structures and that FASGAI vectors offer several
advantages, such as explicit physicochemical information, strong characterization ability, and ease of use. They can be
further applied to investigate structure–function relationships of various peptides and even proteins.
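The representation and modeling steps summarized above can be illustrated with a minimal Python/NumPy sketch. It assumes that a peptide is described by concatenating the six FASGAI factor scores of each residue in sequence order (12 variables for a dipeptide), fits a single-response PLS model with the NIPALS algorithm, and scores it by leave-one-out Q². The score table `FASGAI_PLACEHOLDER` and the activities are random synthetic numbers, not the published FASGAI values or the bitter-tasting dipeptide data, and all function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL placeholder: random scores standing in for the published FASGAI
# table (20 residues x 6 factors: hydrophobicity, alpha/turn propensity, bulk,
# composition, flexibility, electronic). Substitute the real values in practice.
_rng = np.random.default_rng(0)
FASGAI_PLACEHOLDER = {aa: _rng.normal(size=6) for aa in AMINO_ACIDS}


def encode_peptide(seq):
    """Concatenate the six factor scores of each residue, N- to C-terminus."""
    return np.concatenate([FASGAI_PLACEHOLDER[aa] for aa in seq])


def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS on mean-centered data."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # residuals that get deflated
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                         # weight: covariance with response
        w /= np.linalg.norm(w)
        t = E @ w                           # score vector
        tt = t @ t
        p = E.T @ t / tt                    # X loading
        q = (f @ t) / tt                    # y loading
        E = E - np.outer(t, p)              # deflate X and y
        f = f - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # regression coefficients in X space
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def loo_q2(X, y, n_components):
    """Leave-one-out Q^2 = 1 - PRESS / total sum of squares."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        preds[i] = pls1_predict(model, X[i:i + 1])[0]
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic demo: 48 random dipeptides with a made-up linear "activity".
rng = np.random.default_rng(1)
peptides = ["".join(rng.choice(list(AMINO_ACIDS), size=2)) for _ in range(48)]
X = np.array([encode_peptide(p) for p in peptides])      # shape (48, 12)
y = X @ rng.normal(size=X.shape[1]) + 0.1 * rng.normal(size=len(peptides))
for a in (1, 2, 3):
    print(f"{a} component(s): LOO Q^2 = {loo_q2(X, y, a):.3f}")
```

With as many components as (linearly independent) variables, PLS reproduces ordinary least squares on the centered data; in practice the number of components would be chosen from the cross-validated Q² rather than set to the full rank.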
Keywords Peptide · Factor analysis scales of generalized amino acid information · Quantitative structure–activity relationship · Partial least squares · Genetic algorithm–partial least squares
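GA-based variable selection followed by external validation on a held-out test set can be sketched in the same way. This version assumes a binary chromosome (one bit per descriptor), fitness equal to the leave-one-out Q² of a two-component PLS model, tournament selection, uniform crossover, bit-flip mutation, and elitism. The GA settings of the original study are not given here, so these operators, the synthetic data (58 samples × 20 descriptors), the hypothetical 44/14 split, and the external Q² formula (1 − PRESS_test / Σ(y_test − ȳ_train)², one of several definitions in use) are all assumptions. The PLS routine is repeated so the block runs on its own.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Compact single-response PLS (NIPALS); returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)
        t = E @ w
        p, q = E.T @ t / (t @ t), (f @ t) / (t @ t)
        E, f = E - np.outer(t, p), f - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(Q)), x_mean, y_mean


def predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def loo_q2(X, y, n_components):
    """Leave-one-out Q^2 of a PLS model."""
    n, press = len(y), 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        press += (y[i] - predict(model, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def ga_select(X, y, n_components=2, pop_size=12, n_gen=10, p_mut=0.05, seed=0):
    """Binary-mask GA maximizing LOO Q^2 of PLS (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]

    def fitness(mask):
        if mask.sum() < n_components:       # too few descriptors for the model
            return -np.inf
        return loo_q2(X[:, mask], y, n_components)

    def tournament(pop, scores):
        i, j = rng.choice(len(pop), size=2, replace=False)
        return pop[i] if scores[i] >= scores[j] else pop[j]

    pop = rng.random((pop_size, n_var)) < 0.5   # one bit per descriptor
    scores = np.array([fitness(m) for m in pop])
    for _ in range(n_gen):
        children = [pop[np.argmax(scores)].copy()]   # elitism: keep the best
        while len(children) < pop_size:
            a, b = tournament(pop, scores), tournament(pop, scores)
            child = np.where(rng.random(n_var) < 0.5, a, b)   # uniform crossover
            child ^= rng.random(n_var) < p_mut                # bit-flip mutation
            children.append(child)
        pop = np.array(children)
        scores = np.array([fitness(m) for m in pop])
    best = int(np.argmax(scores))
    return pop[best], scores[best]


# Synthetic demo: only descriptors 0 and 3 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(58, 20))
y = 2.0 * X[:, 0] - 2.0 * X[:, 3] + 0.2 * rng.normal(size=58)
train, test = np.arange(44), np.arange(44, 58)     # hypothetical 44/14 split

mask, q2_train = ga_select(X[train], y[train])
model = pls1_fit(X[train][:, mask], y[train], 2)
y_hat = predict(model, X[test][:, mask])
q2_ext = 1.0 - np.sum((y[test] - y_hat) ** 2) / np.sum((y[test] - y[train].mean()) ** 2)
print("selected descriptors:", np.flatnonzero(mask))
print(f"LOO Q^2 (train) = {q2_train:.3f}, external Q^2 (test) = {q2_ext:.3f}")
```

Elitism guarantees that the best fitness never decreases between generations. Because each fitness evaluation runs a full leave-one-out loop, the population size and generation count are kept small here; a real study would tune them and might also guard against overfitting the selection to the training set.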