Skip to content
This repository was archived by the owner on Feb 23, 2024. It is now read-only.

feat: Return the status of unproductive contigs in is_valid function #287

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

nmmsv10x
Copy link
Contributor

@nmmsv10x nmmsv10x commented Aug 24, 2023

Defining failure categories and returning why a contig is not productive.

Todo:

  • Don't return early for no cdr3

JIRA: https://10xtech.atlassian.net/browse/CELLRANGER-7568

@nmmsv10x nmmsv10x marked this pull request as ready for review August 24, 2023 22:04
@nmmsv10x nmmsv10x requested a review from shamoni August 24, 2023 22:05
Comment on lines 23 to 41
pub enum UnproductiveContigCause {
NoCdr3,
Misordered,
NotFull,
TooLarge,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add comments above each failure state describing exactly what criteria need to met to qualify.

// Unwrap gamma/delta mode flag
let gd_mode = is_gd.unwrap_or(false);
let refs = &refdata.refs;
let rheaders = &refdata.rheaders;
let mut ret_vec = Vec::new();
let mut never_full = true;
// two passes, one for light chains and one for heavy chains
for pass in 0..2 {
let mut m = "A";
if pass == 1 {
Copy link
Collaborator

@shamoni shamoni Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ann tuple below is all over the vdj code base and it makes everything so confusing and hard to follow. Converting this into a struct is a lot more work but at-least in this function if we could rename the unpacked fields to something more intuitive it would really help. I think its defined here

// { ( sequence start, match length, ref tig, ref tig start, {mismatches} ) }.

Comment on lines -104 to +170
if pass == 2 || n % 3 == 1 {
// on second pass, go through with checking for stop codon regardless of n % 3 value
if inner_pass == 2 || n % 3 == 1 {
Copy link
Collaborator

@shamoni shamoni Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original code was conflating two checks: (1) full length i.e. having a vstart and jstop and (2) finding a frameshift and/or stop codon. We should split the two checks and specify them as separate fields in the enum. FYI these are the categories described on our software support site: https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/algorithms/annotation#productive

@nmmsv10x nmmsv10x force-pushed the nmmsv/wasted-data-contig branch from 63e7479 to 16a96cf Compare October 30, 2023 21:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants