The emerging field of synthetic biology moves beyond conventional genetic manipulation to construct novel life forms that do not originate in nature. We explore
the problem of designing the provably shortest genomic sequence to encode a given set of genes by exploiting alternate reading
frames. We present an algorithm for designing the shortest DNA sequence simultaneously encoding two given amino acid sequences.
We show that the coding sequences of naturally occurring pairs of overlapping genes approach maximum compression. We also investigate
the impact of alternate coding matrices on overlapping sequence design. Finally, we discuss an interesting application for
overlapping gene design, namely the interleaving of an antibiotic resistance gene into a target gene inserted into a virus
or plasmid for amplification.
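
To make the idea of encoding two amino acid sequences in alternate reading frames concrete, the following is a minimal Python sketch, not the algorithm presented in this work. It assumes the standard genetic code and only checks whether a candidate DNA string encodes one protein in frame 0 and another in frame +1; the names `translate` and `encodes_both` are hypothetical and introduced purely for illustration.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumption:
# the standard table is used; '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna starting at offset `frame`, ignoring a trailing partial codon."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def encodes_both(dna, protein_a, protein_b):
    """Hypothetical helper: True if protein_a occurs in the frame 0 translation
    and protein_b occurs in the frame +1 translation of the same strand."""
    return protein_a in translate(dna, 0) and protein_b in translate(dna, 1)


# Illustrative input: six bases carry "MA" in frame 0 and "W" in frame +1,
# whereas encoding the two peptides separately would need nine bases.
print(translate("ATGGCC", 0), translate("ATGGCC", 1))
print(encodes_both("ATGGCC", "MA", "W"))
```

A design procedure would search over candidate sequences rather than check a single one; this sketch covers only the verification step and considers only two frames on one strand.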
This research was partially supported by NSF grants EIA-0325123 and DBI-0444815.