DNA sequence analysis methods, such as motif discovery, gene detection or phylogeny reconstruction, can often provide important
input for biological studies. Many such methods require a background
model, representing the expected distribution of short substrings in a given DNA region. Most current techniques for modeling this
distribution disregard the evolutionary processes underlying DNA formation. We propose a novel approach to modeling the DNA k-mer distribution that takes evolution and natural selection into account. We derive a computationally
tractable approximation for estimating k-mer probabilities at genetic equilibrium, given a description of evolutionary processes in terms of fitness and mutation
probabilities. We assess the accuracy of this approximation via numerical experiments. Besides providing a generative model
for DNA sequences, our method has further applications in motif discovery.