This paper describes a novel scheme for automatic identification of a species from its genomic data. Random samples of a given
length (10,000 elements) are taken from a genome sequence of a particular species. A set of 64 keywords is generated using
all possible 3-tuple combinations of the 4 letters: A (for Adenine), T (for Thymine), C (for Cytosine) and G (for Guanine)
representing the four types of nucleotide bases in a DNA strand. These 4^3 = 64 keywords are searched in a sample of the genome sequence and their corresponding frequencies of occurrence are determined.
Upon repeating this process for N randomly selected samples taken from the genome sequence, an N × 64 matrix of frequency
count data is obtained. Then Principal Component Analysis is employed on this data to obtain a Feature Descriptor of reduced
dimension (1 × 64). When feature descriptors are computed for different species, and for different samples taken from
the same species, they are found to be characteristic of a particular species, while wide differences exist between those of
different species. Because the variance of the descriptors for a given genome sequence is negligible, the proposed scheme is
well suited to automatic species identification.
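
The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions, since the abstract does not state them: overlapping occurrences of each keyword are counted, samples are contiguous windows drawn uniformly at random, and the 1 × 64 descriptor is taken to be the first principal axis of the mean-centered N × 64 matrix. The function names (count_keywords, frequency_matrix, feature_descriptor) are hypothetical.

```python
import itertools

import numpy as np

BASES = "ATCG"
# All 4^3 = 64 three-letter keywords over the nucleotide alphabet.
KEYWORDS = ["".join(p) for p in itertools.product(BASES, repeat=3)]
INDEX = {kw: i for i, kw in enumerate(KEYWORDS)}


def count_keywords(sample):
    """Count overlapping occurrences of each keyword (assumption: overlaps count)."""
    counts = np.zeros(len(KEYWORDS), dtype=float)
    for i in range(len(sample) - 2):
        j = INDEX.get(sample[i:i + 3])
        if j is not None:  # skip windows containing other symbols, e.g. 'N'
            counts[j] += 1
    return counts


def frequency_matrix(genome, n_samples, sample_len=10_000, seed=0):
    """Build the N x 64 matrix of keyword counts from random windows."""
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, len(genome) - sample_len + 1, size=n_samples)
    return np.vstack([count_keywords(genome[s:s + sample_len]) for s in starts])


def feature_descriptor(freq):
    """Return a 1 x 64 descriptor: the first principal axis (an assumption)."""
    centered = freq - freq.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    # Principal axes are defined up to sign; fix it so descriptors are comparable.
    if pc1[np.argmax(np.abs(pc1))] < 0:
        pc1 = -pc1
    return pc1
```

Given a genome held as a string of A, T, C and G, calling feature_descriptor(frequency_matrix(genome, n_samples=N)) yields a unit-length vector of 64 values. Descriptors from two genomes could then be compared with any vector distance; the choice of comparison measure is left open here because the abstract does not specify one.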