similar and complementary sequences #9
Comments
Hi, can you please clarify what is in your screenshot? Also, it is not clear a priori that having 17/30 bases of complementarity would be problematic; this would depend greatly on the experimental conditions. We would recommend using the Off-Target Score as a filter for potential issues, provided that the experimental conditions you intend to use do not vary greatly from 390 mM Na+ (i.e., 2x SSC), 42 °C, and 50% formamide for hybridization.
These are probes generated from the PaintSHOP pipeline for the mouse genome, with blockParse_unmasked.py params changed to:
And some filters on the probes to just whittle them down:
What you are seeing is the output of those secondary thresholds, grouped by sequence and gene ID (hence the order), with values taken as the median (pandas groupby).
I see, thanks for clarifying. Which pickled model are you loading for the thermodynamic analysis? https://github.com/beliveau-lab/PaintSHOP_pipeline/tree/master/workflow/pickled_models
Sorry for the late reply. This was with the model for temp 37.
Got it. In that case, the On/Off Target predictions should be reasonable to use as a proxy, with the caveat that the conditions used to generate those calculations (37 °C, 50% formamide, 390 mM Na+) will not be a perfect match to your blockParse conditions.
We have not been able to identify a clearly defined set of rules, based on sequence comparison alone, that says being within Hamming distance X under conditions Y is problematic. That is why we instead use the thermodynamic calculations to drive our decisions about what should or should not be filtered.
If you are interested in using sequence comparison instead, I'd suggest converting to one-hot encoding and using https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.hamming.html
I'm going to close this out, as there does not seem to be any issue with the underlying code itself.
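For anyone who wants to try the sequence-comparison route mentioned above, here is a minimal sketch of one-hot encoding two equal-length probes and comparing them with `scipy.spatial.distance.hamming`. The helper names `one_hot` and `base_mismatches` are hypothetical and not part of PaintSHOP, and the sketch assumes sequences contain only A, C, G, and T. Note that SciPy returns the fraction of differing vector entries, so it has to be rescaled to get a per-base mismatch count.

```python
import numpy as np
from scipy.spatial.distance import hamming

BASES = "ACGT"  # assumption: no N or other ambiguity codes in the probes


def one_hot(seq):
    """Return a flattened one-hot encoding (4 entries per base) of a DNA sequence."""
    idx = [BASES.index(b) for b in seq.upper()]
    encoded = np.zeros((len(idx), len(BASES)), dtype=int)
    encoded[np.arange(len(idx)), idx] = 1
    return encoded.ravel()


def base_mismatches(seq_a, seq_b):
    """Count mismatched bases between two equal-length sequences (hypothetical helper)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    # SciPy's hamming() is the fraction of vector entries that differ.
    frac = hamming(one_hot(seq_a), one_hot(seq_b))
    # Each mismatched base changes 2 of its 4 one-hot entries,
    # so mismatches = frac * (4 * length) / 2.
    return int(round(frac * 4 * len(seq_a) / 2))
```

To look for complementarity rather than similarity, one of the two probes would be reverse-complemented before the comparison. As noted in the thread, no mismatch threshold is known to be problematic in general, so any cutoff applied to this count is a judgment call.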
Hello,
I was wondering what to do about highly similar and complementary probes produced by PaintSHOP. Would it be best to just filter those that are most similar to all other probes after running the pipeline, or is there some module within the pipeline that can be tuned to address this?
Thanks,
Jonathan
C2 and Arid2 are complementary for 17/30 bases
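A count like "17/30" can be reproduced in several ways, and the thread does not say which was used. The sketch below assumes the simplest reading: the two probes are aligned antiparallel over their full length with no offset, and every position that would form a Watson-Crick pair is counted. The function names are hypothetical and not part of PaintSHOP.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def complementary_bases(probe_a, probe_b):
    """Count positions where probe_a pairs with probe_b in a full-length,
    antiparallel alignment with no offset (an assumed definition)."""
    if len(probe_a) != len(probe_b):
        raise ValueError("probes must have the same length")
    # probe_a pairs with probe_b wherever it matches probe_b's reverse complement.
    target = reverse_complement(probe_b)
    return sum(a == t for a, t in zip(probe_a.upper(), target))
```

For two 30-mers, a return value of 17 would correspond to 17/30 complementary bases under this definition. Shifted or partial alignments would need a sliding comparison, which this sketch does not attempt.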