similar and complementary sequences #9

Closed
jonathanperrie opened this issue Mar 1, 2022 · 5 comments

@jonathanperrie

Hello,

I was wondering what to do about highly similar and complementary probes produced by PaintSHOP. Would it be best to just filter those that are most similar to all other probes after running the pipeline, or is there some module within the pipeline that can be tuned to address this?

Thanks,

Jonathan

[Screenshot: probe_complementarity]

C2 and Arid2 are complementary for 17/30 bases

@brianbeliveau
Member

Hi,

Can you please clarify what is in your screenshot?

Also, it is not clear a priori that having 17/30 bases of complementarity would be problematic; this would depend greatly on the experimental conditions. We would recommend using the Off-Target Score as a filter for potential issues, provided that the experimental conditions you intend to use do not vary greatly from 390 mM Na+ (i.e., 2x SSC), 42 °C, and 50% formamide for hybridization.

@jonathanperrie
Author

jonathanperrie commented Mar 2, 2022

These are probes generated from the PaintSHOP pipeline for the mouse genome with blockParse_unmasked.py params changed to:
-l, -L: 30, 30
-t, -T: 47, 57
-g, -G: 43, 63
-s 300
-F 30

And some filters on the probes to just whittle them down:
on-target > 95
off-target < 200
max k < 17
secondary structure < 0.25

What you are seeing in the screenshot is the set of probes that pass those secondary thresholds, grouped by sequence and gene ID (hence the order), with values taken as the median (pandas groupby).
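
For readers following along, the filters and grouping described above could be sketched in pandas roughly as follows. This is only an illustration: the column names (`gene_id`, `sequence`, `on_target`, `off_target`, `max_kmer`, `structure`) and the toy table are assumptions, not necessarily the pipeline's actual output schema.

```python
import pandas as pd

# Hypothetical probe table; column names and values are illustrative only.
probes = pd.DataFrame({
    "gene_id": ["C2", "C2", "Arid2", "Arid2"],
    "sequence": ["ACGT", "ACGT", "TTGA", "GGCC"],
    "on_target": [99.0, 97.0, 96.0, 90.0],
    "off_target": [10.0, 30.0, 150.0, 20.0],
    "max_kmer": [5, 7, 12, 4],
    "structure": [0.05, 0.15, 0.20, 0.10],
})


def filter_and_summarize(df):
    """Apply the thresholds from the comment, then take per-group medians."""
    keep = df[
        (df["on_target"] > 95)
        & (df["off_target"] < 200)
        & (df["max_kmer"] < 17)
        & (df["structure"] < 0.25)
    ]
    # Group by gene and sequence; the result is sorted by the group keys.
    return keep.groupby(["gene_id", "sequence"], as_index=False).median(
        numeric_only=True
    )


summary = filter_and_summarize(probes)
print(summary)
```

Rows that share a gene and sequence collapse into one row of medians, which is why the resulting table is ordered by gene and then by sequence.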

@brianbeliveau
Member

I see, thanks for clarifying.

Which pickled model are you loading for the thermodynamic analysis? https://github.com/beliveau-lab/PaintSHOP_pipeline/tree/master/workflow/pickled_models

@jonathanperrie
Author

Sorry for the late reply. This was with the temperature 37 model.

@brianbeliveau
Member

Got it. In that case, the On/Off Target predictions should be reasonable to use as a proxy, with the caveat that the conditions used to generate those calculations (37 °C, 50% formamide, 390 mM Na+) will not be a perfect match to your blockParse conditions.

We have not been able to identify a clearly defined set of rules based on sequence comparison itself to say that being within Hamming distance X in conditions Y is problematic, which is why we instead use the thermodynamic calculations to drive our decision making about what should or should not be filtered. If you are interested in using sequence comparison instead, I'd suggest converting to one-hot encoding and using https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.hamming.html
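
A minimal sketch of that sequence-comparison approach, assuming equal-length probes made up only of A, C, G, and T. The helper names (`one_hot`, `reverse_complement`, `mismatches`, `complementary_bases`) are hypothetical and not part of PaintSHOP. Note that `scipy.spatial.distance.hamming` returns the fraction of differing vector entries, so with a flattened one-hot encoding each mismatched base contributes two differing entries out of four per position.

```python
import numpy as np
from scipy.spatial.distance import hamming

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def one_hot(seq):
    """Flattened one-hot encoding with 4 entries per base (A, C, G, T)."""
    idx = [BASES.index(b) for b in seq.upper()]
    return np.eye(4, dtype=int)[idx].ravel()


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # Each mismatched base changes 2 of the 4 * len(a) one-hot entries,
    # so the SciPy fraction is rescaled by 2 * len(a) to count bases.
    frac = hamming(one_hot(a), one_hot(b))
    return int(round(frac * 2 * len(a)))


def complementary_bases(a, b):
    """Bases of `a` that pair with `b` in one ungapped antiparallel alignment."""
    return len(a) - mismatches(a, reverse_complement(b))


print(mismatches("ACGTACGT", "ACGTACGA"))
print(complementary_bases("GTTA", "AAAC"))
```

This checks only a single full-length, ungapped alignment; shifted or partial overlaps between probes would need a sliding comparison, and, as noted above, a base count alone does not say whether a pair is problematic under given hybridization conditions.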

I'm going to close this out as there does not seem to be any issue with the underlying code itself.
