I am still confused about how to get the e-values. I thought you said in class that t was just the observed count, but if I plug the observed count into the equation in the assignment, erfc(t/sqrt(2)), my numbers are way off.
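The expression erfc(t/sqrt(2)) gives the two-tailed P-value for a normal distribution with mean 0 and standard deviation 1, so t has to be a Z-score, not a raw count. You have to rescale first (that's what the Z-score is for): subtract the expected count from the observed count and divide by the standard deviation.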
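The rescaling step can be sketched in Python as follows. This is only an illustration, not the assignment's reference code: the function names are made up here, and the expected count and standard deviation are assumed to come from whatever null model the assignment specifies.

```python
import math

def z_score(observed, expected, std_dev):
    """Rescale a raw count to a standard normal variable.

    expected and std_dev are assumed to be supplied by the null model;
    how they are computed is not shown here.
    """
    return (observed - expected) / std_dev

def two_tailed_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

def one_tailed_p(z):
    """Probability of a value at least as extreme as z, in the direction of z."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))
```

Passing a raw count such as 50 straight to erfc treats it as being 50 standard deviations from the mean, which is why the resulting numbers look way off.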
I used
counts[kmer[1:]] * counts[kmer[:-1]] / c if c > 0 else 0
to avoid divide-by-zero problems.
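For context, here is a minimal sketch of how that expression might sit inside a function. It assumes that c is the count of the middle of the k-mer, kmer[1:-1], which the expression above does not actually state, and that a single table holds counts for words of length k, k-1, and k-2. The helper names are hypothetical.

```python
from collections import Counter

def count_all_kmers(seq, sizes):
    """Count every substring of each length in sizes (linear sequence)."""
    counts = Counter()
    for k in sizes:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def expected_count(kmer, counts):
    """Expected count of kmer from its two overlapping (k-1)-mers.

    Assumption: c is the count of the shared middle, kmer[1:-1].
    Requires len(kmer) >= 3 so that the middle is non-empty.
    """
    c = counts[kmer[1:-1]]
    # A Counter returns 0 for unseen keys, so the guard covers them too.
    return counts[kmer[1:]] * counts[kmer[:-1]] / c if c > 0 else 0
```

For example, expected_count("ACGT", count_all_kmers(genome, [2, 3, 4])) would use the counts of "CGT", "ACG", and "CG", where genome is any sequence string.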
I did not use pseudocounts; I used maximum-likelihood estimates. If you do use pseudocounts, you have to be careful to compute the marginals correctly. For very tiny pseudocounts, your answer should be very close to mine (unless I have a bug).
I did not circularize the genome, because it did not occur to me to do so. If you do decide to circularize, make that an option that is off by default (so I can test more easily). Bacterial chromosomes are usually circular, but there is a good chance that a program like this could be applied to an incomplete genome, a viral one, or a eukaryotic one, none of which should be circularized.
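If you do add circularization, one way to make it an option that is off by default is sketched below. The function name and the circular parameter are illustrative only; the idea is to append the first k-1 bases to the end of the sequence so that words spanning the origin are counted.

```python
from collections import Counter

def count_kmers(seq, k, circular=False):
    """Count k-mers in seq; wrap around the end only if circular is True."""
    if circular:
        # Appending the first k-1 bases yields exactly the words that
        # span the junction between the end and the start.
        seq = seq + seq[:k - 1]
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

With the default, count_kmers("ACGT", 2) counts AC, CG, and GT; with circular=True it also counts TA. On a command line the same choice could be exposed as a flag that defaults to false.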