feature request: conditional filtering of missing values #182

Open
KristinaGomoryova opened this issue Jan 15, 2023 · 8 comments

@KristinaGomoryova

Hi,

Would it be possible to add a function that sets a threshold on the number of replicates in which a protein may be missing in at least one condition? This would be similar to the filter_proteins() and filter_missval() functions in the DEP R package.

Thanks for considering!

@cvanderaa
Collaborator

cvanderaa commented Jan 16, 2023

Hi @KristinaGomoryova ,

If I understand correctly, you want a function where you can provide a group (e.g. experimental condition, phenotype, ...) and a threshold. The function then counts the missing values within each group. If at least one of the groups has a count lower than the threshold, you keep that protein; otherwise you discard it. Is that correct?
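
A minimal sketch of the rule described above, written in plain Python rather than R and not tied to any existing API. The names `missing_per_group`, `keep_protein` and `max_missing` are hypothetical, missing values are represented as `None`, and treating the threshold as inclusive (a group passes when its count does not exceed it) is an assumption.

```python
from collections import defaultdict


def missing_per_group(values, groups):
    """Count the missing entries (None) of one protein within each group."""
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        counts[group] += int(value is None)
    return dict(counts)


def keep_protein(values, groups, max_missing):
    """Keep the protein if at least one group has at most max_missing missing values."""
    counts = missing_per_group(values, groups)
    return any(n <= max_missing for n in counts.values())


# Hypothetical example: two conditions with three replicates each.
groups = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]
protein_a = [1.2, None, 3.1, None, None, None]   # ctrl: 1 missing, trt: 3 missing
protein_b = [None, None, 2.0, None, 1.0, None]   # ctrl: 2 missing, trt: 2 missing

keep_a = keep_protein(protein_a, groups, max_missing=1)  # ctrl passes, so keep
keep_b = keep_protein(protein_b, groups, max_missing=1)  # no group passes, so discard
```

With `max_missing=1`, the first protein is kept because the control group has only one missing value, while the second is discarded because both groups have two.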

@lgatto
Member

lgatto commented Jan 16, 2023

Rather than filtering, I think we should aim for a function that tags proteins that match the desired criteria, either by returning a vector of booleans (for a SE) or a list of booleans, or by adding new rowData variables. That way, the user can either use filterFeatures() or use that variable for mixed imputation.

@cvanderaa
Collaborator

Two parallel ideas:

  • I think what Kristina wants could be implemented in filterNA() by adding a groupBy argument.
  • I like the idea of adding a tag and then calling filterFeatures(), but then should we apply this "tagging" behavior to filterNA() too?

@lgatto
Member

lgatto commented Jan 16, 2023

Yes, filterNA() is also a good suggestion. But personally, I would also want to be able to look at these proteins: they are candidates that could have present/absent patterns and might not be amenable to statistical tests without imputation, which leads to mixed imputation and hence has important downstream implications.

So we could have a function that identifies these proteins, so that

  • we can explore and visualise them (for instance with a heatmap)
  • we can impute them with mixed imputation (randna parameter)
  • to remove these, we could consider adding a fcol parameter to filterNA(), although filterFeatures() would fit out of the box
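
The tag-then-decide idea can be sketched as follows, in plain Python and independent of the package's API. `tag_rows`, `assay` and `max_missing` are hypothetical names, missing values are `None`, and the inclusive threshold is an assumption. The function only returns one boolean per protein; what to do with the tagged rows (inspect, impute or remove) is left to the caller.

```python
def tag_rows(assay, groups, max_missing):
    """Return {protein: bool}; True if some group has at most max_missing missing values."""
    tags = {}
    for protein, values in assay.items():
        missing = {}
        for value, group in zip(values, groups):
            missing[group] = missing.get(group, 0) + int(value is None)
        tags[protein] = any(n <= max_missing for n in missing.values())
    return tags


# Hypothetical data: two groups with three replicates each.
groups = ["A", "A", "A", "B", "B", "B"]
assay = {
    "P1": [5.1, 4.9, 5.3, 5.0, None, 5.2],     # A: 0 missing, B: 1 missing
    "P2": [None, None, 4.0, None, 3.8, None],  # A: 2 missing, B: 2 missing
    "P3": [6.1, 6.0, 5.9, None, None, None],   # A: 0 missing, B: 3 missing
}

tags = tag_rows(assay, groups, max_missing=0)

# The same tags serve both purposes: look at the flagged rows, or drop them.
flagged = {p: v for p, v in assay.items() if not tags[p]}
kept = {p: v for p, v in assay.items() if tags[p]}
```

Keeping the tags separate from the data means the removal step stays optional, which is the point of tagging rather than filtering directly.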

@KristinaGomoryova
Author

Yes, I meant it exactly as you describe it, Chris: the threshold would be the maximum number (or percentage) of missing values allowed per condition, and if at least one condition has a lower value, we want to keep that protein.
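
As a rough illustration of the "number or percentage" choice, a threshold given either way can be turned into a maximum count for a group of a given size. This is a plain Python sketch with a hypothetical helper name, `max_missing_allowed`; rounding a fractional threshold down is an assumption.

```python
import math


def max_missing_allowed(group_size, count=None, fraction=None):
    """Translate a count or a fraction into a maximum number of missing values."""
    if (count is None) == (fraction is None):
        raise ValueError("give exactly one of count or fraction")
    if count is not None:
        # A count cannot exceed the number of replicates in the group.
        return min(count, group_size)
    # Assumption: a fractional threshold is rounded down.
    return math.floor(fraction * group_size)


by_count = max_missing_allowed(3, count=1)        # at most 1 of 3 replicates missing
by_fraction = max_missing_allowed(4, fraction=0.5)  # at most 2 of 4 replicates missing
```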

And I like Laurent's idea that they would just be labelled, although I am not sure these are the present/absent ones. I think these are rather the ones we do want to filter out of the dataset (e.g. proteins that were identified in only 1 out of 3 replicates in most conditions), but I might be wrong here.

@lgatto
Member

lgatto commented Jan 16, 2023

I think we are talking about different things:

  • Indeed, @KristinaGomoryova wants a better way to specify a threshold in filterNA(), that takes groups into account. Yes, that is indeed a sensible request. This would be addressed by a new groupBy argument to filterNA(). @KristinaGomoryova, feel free to send a PR if you want.
  • I was referring to something else, which would be based on a similar logic: are there any proteins that are (mostly) present in one/multiple group(s) and (mostly) absent in another/others? I think this would deserve a new function, such as, for example, naPatterns(), or something along those lines.
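
To make the second point concrete, here is a minimal plain Python sketch of detecting a present/absent pattern. It is not the proposed naPatterns() function; the names `observed_fraction` and `present_absent` and the cut-offs `present_min=0.6` and `absent_max=0.4` for "mostly present" and "mostly absent" are all assumptions chosen for illustration. Missing values are `None`.

```python
def observed_fraction(values, groups):
    """Fraction of non-missing values of one protein within each group."""
    seen, total = {}, {}
    for value, group in zip(values, groups):
        total[group] = total.get(group, 0) + 1
        seen[group] = seen.get(group, 0) + int(value is not None)
    return {g: seen[g] / total[g] for g in total}


def present_absent(values, groups, present_min=0.6, absent_max=0.4):
    """Return (groups where mostly present, groups where mostly absent)."""
    frac = observed_fraction(values, groups)
    present = sorted(g for g, f in frac.items() if f >= present_min)
    absent = sorted(g for g, f in frac.items() if f <= absent_max)
    return present, absent


groups = ["A", "A", "A", "B", "B", "B"]
pattern = present_absent([6.1, 6.0, 5.9, None, None, None], groups)
no_pattern = present_absent([5.1, 4.9, 5.3, 5.0, None, 5.2], groups)

# A protein shows a present/absent pattern when both lists are non-empty.
has_pattern = bool(pattern[0]) and bool(pattern[1])
```

Here the first protein is fully observed in group A and entirely missing in group B, so it is flagged; the second is mostly observed in both groups and is not.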

@KristinaGomoryova
Author

KristinaGomoryova commented Jan 16, 2023

Now I get it, sorry for misunderstanding :)

It would be great to have both of these then!

@lgatto
Member

lgatto commented Jan 16, 2023

I was the one misunderstanding your initial request.
