similar and complementary sequences #9

jonathanperrie · 2022-03-01T23:52:23Z

Hello,

I was wondering what to do about highly similar and complementary probes produced by PaintSHOP. Would it be best to just filter those that are most similar to all other probes after running the pipeline, or is there some module within the pipeline that can be tuned to address this?

Thanks,

Jonathan

C2 and Arid2 are complementary for 17/30 bases

brianbeliveau · 2022-03-02T01:11:03Z

Hi,

Can you please clarify what is in your screenshot?

Also, it is not clear a priori that 17/30 bases of complementarity would be problematic; this depends greatly on the experimental conditions. We would recommend using the Off-Target Score as a filter for potential issues, provided that the experimental conditions you intend to use do not vary greatly from 390 mM Na+ (i.e., 2x SSC), 42ºC, 50% formamide for hybridization.

jonathanperrie · 2022-03-02T01:29:23Z

These are probes generated from the PaintSHOP pipeline for the mouse genome with blockParse_unmasked.py params changed to:
-l, L : 30, 30
-t, T : 47, 57
-g, G: 43, 63
-s 300
-F 30

And some filters on the probes to just whittle them down:
on-target > 95
off-target < 200
max k < 17
2nd structure < 0.25

What you are seeing is the output of those secondary thresholds grouped by sequence and gene id (hence the order) with values taken as the median (Pandas groupby).
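
For anyone reproducing this step, the filtering and grouping could look roughly like the sketch below. The column names (`on_target`, `off_target`, `max_kmer`, `prob_structure`, `sequence`, `gene_id`) are placeholders chosen for illustration and are not taken from the PaintSHOP output format; rename them to match your own table.

```python
import pandas as pd


def filter_and_summarize(probes: pd.DataFrame) -> pd.DataFrame:
    """Apply the four thresholds listed above, then take per-group medians.

    All column names here are hypothetical placeholders.
    """
    keep = probes[
        (probes["on_target"] > 95)
        & (probes["off_target"] < 200)
        & (probes["max_kmer"] < 17)
        & (probes["prob_structure"] < 0.25)
    ]
    # Group by sequence and gene id; the median is taken over numeric columns only.
    return keep.groupby(["sequence", "gene_id"], as_index=False).median(numeric_only=True)
```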

brianbeliveau · 2022-03-02T01:38:02Z

I see, thanks for clarifying.

Which pickled model are you loading for the thermodynamic analysis? https://github.com/beliveau-lab/PaintSHOP_pipeline/tree/master/workflow/pickled_models

jonathanperrie · 2022-03-02T02:21:50Z

Sorry for the late reply. This was with the 37ºC model.

brianbeliveau · 2022-03-02T17:16:14Z

Got it. In that case, the On/Off Target predictions should be reasonable to use as a proxy, with the caveat that the conditions used to generate those calculations (37ºC, 50% formamide, 390 mM Na+) will not be a perfect match to your blockParse conditions.

We have not been able to identify a clearly defined set of rules based on sequence comparison itself to say that being within Hamming distance X in conditions Y is problematic, which is why we instead use the thermodynamic calculations to drive our decision making about what should or should not be filtered. If you are interested in using sequence comparison instead, I'd suggest converting to one-hot encoding and using https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.hamming.html
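
A minimal sketch of that sequence-comparison approach, assuming equal-length probes containing only A, C, G and T. The helper names (`one_hot`, `mismatches`, `reverse_complement`, `complementary_bases`) are made up for illustration and are not part of PaintSHOP. Complementarity is scored by comparing one probe against the reverse complement of the other.

```python
import numpy as np
from scipy.spatial.distance import hamming

BASES = "ACGT"


def one_hot(seq):
    """Flattened one-hot encoding (4 entries per base) of a DNA sequence."""
    idx = [BASES.index(b) for b in seq.upper()]
    return np.eye(4, dtype=int)[idx].ravel()


def mismatches(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    # scipy's hamming() returns the fraction of differing entries. Each base
    # mismatch changes 2 of the 4 * L one-hot entries, so rescale by 2 * L.
    return int(round(hamming(one_hot(a), one_hot(b)) * 2 * len(a)))


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def complementary_bases(a, b):
    """Positions at which a could pair with b in a full-length antiparallel alignment."""
    return len(a) - mismatches(a, reverse_complement(b))
```

This only scores the single full-length, ungapped alignment; shifted or partial overlaps would need a sliding comparison. As noted above, there is no established cutoff for how many matching or complementary bases is too many, so any threshold applied to these counts is a judgment call for your conditions.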

I'm going to close this out as there does not seem to be any issue with the underlying code itself.

brianbeliveau closed this as completed Mar 2, 2022
