feature request: conditional filtering of missing values #182
Comments
Hi @KristinaGomoryova, if I understand correctly, you want a function where you can provide a group (e.g. experimental condition, phenotype, ...) and a threshold. The function then looks at the number of missing values within each group. If at least one of the groups has a value lower than the threshold, you keep that protein; otherwise you discard it. Is that correct?
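The rule described above can be sketched in base R. This is only an illustration under stated assumptions, not an existing function of any package: the name tag_by_group_na and its arguments x, group and max_na are hypothetical, the data are assumed to be a plain numeric matrix with features in rows and samples in columns, and "lower than the threshold" is read as "at most max_na missing values".

```r
# Hypothetical helper, not part of any package API.
# x:      numeric matrix, features in rows, samples in columns (NA = missing)
# group:  one condition label per column of x
# max_na: maximum number of missing values tolerated within a condition
#         (assumption: the comparison is inclusive, i.e. <=)
tag_by_group_na <- function(x, group, max_na) {
  stopifnot(is.matrix(x), length(group) == ncol(x))
  group <- droplevels(as.factor(group))
  # Count missing values per feature within each condition
  na_counts <- sapply(levels(group), function(g) {
    rowSums(is.na(x[, group == g, drop = FALSE]))
  })
  # sapply() returns a plain vector when x has a single row; restore the shape
  na_counts <- matrix(na_counts, nrow = nrow(x),
                      dimnames = list(rownames(x), levels(group)))
  # TRUE when at least one condition is at or below the threshold
  rowSums(na_counts <= max_na) > 0
}

# Toy data: 3 proteins, 2 conditions (A, B) with 3 replicates each
x <- matrix(c( 1, NA, NA,   4,  5,  6,
              NA, NA,  3,  NA, NA,  6,
               1,  2,  3,   4,  5,  6),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3"), paste0("S", 1:6)))
group <- rep(c("A", "B"), each = 3)

keep <- tag_by_group_na(x, group, max_na = 1)
# P1 is kept (no NA in B), P2 is not (2 NAs in both A and B), P3 is kept
```

Because the function returns a logical vector rather than a filtered object, the caller can decide whether to subset with it (x[keep, ]) or only store it as an annotation.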
Rather than filtering, I think we should aim for a function that tags proteins that match the desired criteria, either by returning a vector of booleans (for an SE) or a list of booleans, or by adding new rowData variables. That way, the user can either use
Two parallel ideas:
Yes. So we could have a function that identifies these proteins, so that
Yes, I meant it exactly as you describe it, Chris: the threshold would be the maximum number (or percentage) of missing values allowed per condition, and if at least one condition has a lower value, we want to keep that protein. I also like Laurent's idea of just labelling them, although I am not sure these are the present/absent ones. I think they are rather the ones we do want to filter out of the dataset (e.g. proteins that were identified in only 1 out of 3 replicates in most conditions), but I might be wrong here.
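For the percentage variant combined with the labelling idea, a minimal base R sketch could look as follows. Everything here is an assumption made for illustration: label_by_group_pna, max_pna and the pNA_ column prefix are hypothetical names, and a plain data.frame stands in for the rowData that a real implementation would annotate.

```r
# Hypothetical helper, not part of any package API.
# x:       numeric matrix, features in rows, samples in columns (NA = missing)
# group:   one condition label per column of x
# max_pna: maximum proportion (0 to 1) of missing values allowed per condition
label_by_group_pna <- function(x, group, max_pna) {
  stopifnot(is.matrix(x), length(group) == ncol(x),
            max_pna >= 0, max_pna <= 1)
  group <- droplevels(as.factor(group))
  # Proportion of missing values per feature within each condition
  pna <- vapply(levels(group), function(g) {
    rowMeans(is.na(x[, group == g, drop = FALSE]))
  }, numeric(nrow(x)))
  # vapply() returns a plain vector when x has a single row; restore the shape
  pna <- matrix(pna, nrow = nrow(x),
                dimnames = list(rownames(x), paste0("pNA_", levels(group))))
  # One column per condition plus a logical label; nothing is removed
  out <- as.data.frame(pna)
  out$keep <- rowSums(pna <= max_pna) > 0
  out
}

# Toy data: 3 proteins, 2 conditions (A, B) with 3 replicates each
x <- matrix(c( 1, NA, NA,   4,  5,  6,
              NA, NA,  3,  NA, NA,  6,
               1,  2,  3,   4,  5,  6),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3"), paste0("S", 1:6)))
group <- rep(c("A", "B"), each = 3)

labels <- label_by_group_pna(x, group, max_pna = 0.5)
# labels$keep is TRUE for P1 and P3 and FALSE for P2,
# which is missing in 2 of 3 replicates in both conditions
```

Keeping the per-condition proportions next to the label leaves the decision to the user: the same table supports filtering on keep or inspecting which condition drove the result.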
I think we are talking about different things:
Now I get it, sorry for the misunderstanding :) It would be great to have both of these then!
I was the one misunderstanding your initial request.
Hi,
would it be possible to add a function that sets a threshold on the number of replicates in which a protein can be missing in at least one condition, similar to what the filter_proteins() or filter_missval() functions in the DEP R package do?
Thanks for considering!