We derive new quantitative descriptors for the 20 naturally occurring amino acids based on multidimensional scaling of 237 physical–chemical properties. We show that a five-dimensional property space can be constructed such that the amino acids have a spatial distribution similar to that in the original high-dimensional property space. Properties that correlate well with the five major components are hydrophobicity, size, preferences for amino acids to occur in α-helices, number of degenerate triplet codons and the frequency of occurrence of amino acid residues in β-strands. Distances computed for pairs of amino acids in the five-dimensional property space are highly correlated with corresponding scores from similarity matrices derived from sequence and 3D structure comparison. We used the five-dimensional property distances to cluster the amino acids into groups depending on a cutoff distance. These groups define a reduced amino acid alphabet for protein folding studies. Our descriptors should provide a quantitative means to identify property motifs in sequences of protein families. Electronic supplementary material to this paper can be obtained by using the Springer Link server located at http://dx.doi.org/10.1007/s00894-001-0058-5.
Keywords Multidimensional scaling – Amino acid – Substitution matrices – BLOSUM – PAM – Physical–chemical properties – Cluster analysis
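As an illustration of the cutoff-based grouping described in the abstract, the following minimal sketch clusters points in a five-dimensional space by Euclidean distance. It is not the authors' implementation: the abstract does not state the linkage rule, so single linkage is assumed here, and the function name `cluster_by_cutoff`, the labels `p1` to `p5` and their coordinates are hypothetical placeholders rather than the published descriptors.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cluster_by_cutoff(points, cutoff):
    """Group items whose pairwise distance is <= cutoff.

    Assumption: single linkage, i.e. two items share a group if they are
    connected by a chain of pairs that each lie within the cutoff.
    """
    parent = {name: name for name in points}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(points, 2):
        if euclidean(points[a], points[b]) <= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in points:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy coordinates in a 5-D space (NOT the paper's descriptors).
toy = {
    "p1": (0.0, 0.0, 0.0, 0.0, 0.0),
    "p2": (0.5, 0.0, 0.0, 0.0, 0.0),
    "p3": (1.0, 0.0, 0.0, 0.0, 0.0),
    "p4": (5.0, 5.0, 0.0, 0.0, 0.0),
    "p5": (5.0, 5.5, 0.0, 0.0, 0.0),
}

print(cluster_by_cutoff(toy, 0.6))
```

Raising the cutoff merges groups into a coarser alphabet, and lowering it splits them. To apply the sketch to real data, the toy dictionary would be replaced by the 20 amino acids mapped to their five-dimensional descriptors.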