Qs about palindrome assignment

Bioinformatics Models and Algorithms

Moderator: KevinKarplus

Re: Qs about palindrome assignment

Postby MiaGrifford » 11/03/2009 08:55 pm

I am still confused about how to get the E-values. I thought you said in class that t was just the observed count, but if I plug the observed count into the equation in the assignment, erfc(t/sqrt(2)), my numbers are way off.
MiaGrifford
Newbie
 
Posts: 7
Joined: 10/02/2009 05:06 pm

Re: Qs about palindrome assignment

Postby KevinKarplus » 11/03/2009 09:03 pm

The erfc function gives the P-value for a normal distribution with mean 0 and standard deviation 1. You have to rescale the observed count first (that's what the Z-score is for), so t is the Z-score, not the raw count.
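
A minimal sketch of that rescaling, assuming the expected count and its standard deviation have already been computed from whatever null model the assignment specifies. The function names and the numbers below are hypothetical, chosen only for illustration; erfc(|z|/sqrt(2)) is the two-sided tail probability of a standard normal.

```python
import math

def z_score(observed, expected, sigma):
    """Rescale an observed count to standard-normal units (mean 0, sd 1)."""
    return (observed - expected) / sigma

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers, not taken from the assignment.
z = z_score(observed=12, expected=30.0, sigma=5.0)
p = two_sided_p_value(z)
print(f"Z = {z:.2f}, P = {p:.3g}")
```

Plugging the raw count (12 here) straight into erfc would treat it as if it were already 12 standard deviations from the mean, which is why the numbers come out far off.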
KevinKarplus
Guru
 
Posts: 75
Joined: 09/08/2009 03:15 pm
Location: PSB 318

Re: Qs about palindrome assignment

Postby sng » 11/04/2009 02:01 pm

I am using: P(W) = P(W1 | W2...Wn-1) * P(Wn | W2...Wn-1) * P(W2...Wn-1)
and then multiplying that by N to get E(C(W)) = N*P(W)

Does it matter, or should we be using:
E(C(W)) =~ C(W1...Wn-1*) * C(*W2...Wn) / C(*W2...Wn-1*)

Also, did you add pseudocounts? If a given C(*W2...Wn-1*) is never observed, you will divide by 0.
sng
Newbie
 
Posts: 6
Joined: 11/04/2009 01:58 pm

Re: Qs about palindrome assignment

Postby KevinKarplus » 11/04/2009 04:41 pm

I did
Code:
counts[kmer[1:]] * counts[kmer[:-1]] / c if c>0 else 0

to avoid divide-by-zero problems.

I did not use pseudocounts, but maximum likelihood estimates.
If you do use pseudocounts, you have to be careful about computing the marginals right.
For very tiny pseudocounts, your answer should be very close to mine (unless I have a bug).
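
A self-contained sketch of the expected-count estimate discussed above, assuming `c` in the one-liner stands for the count of the middle (n-2)-mer, `counts[kmer[1:-1]]`. The helper names `count_all_kmers` and `expected_count` are hypothetical. With maximum-likelihood estimates, N * P(W1 | mid) * P(Wn | mid) * P(mid) reduces to C(prefix) * C(suffix) / C(mid), so the two formulas asked about should agree up to edge effects.

```python
from collections import Counter

def count_all_kmers(seq, k_values):
    """Count every k-mer of each requested length in a linear sequence."""
    counts = Counter()
    for k in k_values:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def expected_count(counts, kmer):
    """Estimate E(C(W)) as C(prefix) * C(suffix) / C(middle).

    Returns 0.0 when the middle was never observed, which avoids
    dividing by zero without adding pseudocounts.
    Assumes len(kmer) >= 3 so that the middle is non-empty.
    """
    c = counts[kmer[1:-1]]  # a Counter returns 0 for unseen keys
    if c == 0:
        return 0.0
    return counts[kmer[:-1]] * counts[kmer[1:]] / c

counts = count_all_kmers("ACGTACGT", k_values=(2, 3, 4))
print(expected_count(counts, "ACGT"))
```

The counts table must hold lengths n-2, n-1, and n for every word length n being scored.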
KevinKarplus
Guru
 
Posts: 75
Joined: 09/08/2009 03:15 pm
Location: PSB 318

Re: Qs about palindrome assignment

Postby sng » 11/04/2009 11:59 pm

Do you want us to circularize the genome?
sng
Newbie
 
Posts: 6
Joined: 11/04/2009 01:58 pm

Re: Qs about palindrome assignment

Postby KevinKarplus » 11/05/2009 06:18 am

I did not circularize the genome, because it did not occur to me to do so. If you do decide to circularize, make that an option that is off by default (so I can test more easily). Bacterial chromosomes are usually circular, but there is a good chance that a program like this could be applied to an incomplete genome, a viral one, or a eukaryotic one, none of which should be circularized.
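
One way such an option might look, as a sketch only: the function name `count_kmers` and the `circular` parameter are hypothetical, and the default is off as requested above. Circularizing is done here by appending the first k-1 bases to the end, so that k-mers spanning the origin are counted once each.

```python
from collections import Counter

def count_kmers(seq, k, circular=False):
    """Count k-mers in seq; wrap around the end only if circular is True."""
    if circular and k > 1:
        # Append the first k-1 bases so windows can span the origin.
        seq = seq + seq[:k - 1]
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

linear = count_kmers("ACGT", 2)
wrapped = count_kmers("ACGT", 2, circular=True)
print(sum(linear.values()), sum(wrapped.values()))
```

A linear sequence of length L yields L-k+1 windows, while the circular version yields L, so the total N used in any probability estimate changes with the option.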
KevinKarplus
Guru
 
Posts: 75
Joined: 09/08/2009 03:15 pm
Location: PSB 318
