Added implementation of zapline for power noise removal #1032

Merged 10 commits into mne-tools:main on Feb 19, 2025

ariguiba
Contributor

Information about this PR:

Current issues:

  • The algorithm takes too long to run, even for a small dataset
  • Some artifacts are still visible

welcome bot commented Dec 17, 2024

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️

@ariguiba
Contributor Author

@behinger

@behinger

Thanks Boshra!

  • This already looks good to me. I think zapline is in the conceptually right place (a "replacement" for notch filtering).
  • meegkit as a requirement: someone from the mne-bids-pipeline team has to chime in here for sure. Is that too large? Is it OK? Can it be made optional, or how does the dependency management work?
  • The failing unit tests caused by the deprecated use of numpy.core.numerictype are still a problem to be fixed. Maybe this is something to update upstream in the pyriemann package, can you check? I'm also wondering if we can use meegkit without ASR etc., just the dss.py imports, but I don't know enough about Python.

@larsoner
Member

meegkit as a requirement: someone from the mne-bids-pipeline team has to chime in here for sure. Is that too large? Is it OK? Can it be made optional, or how does the dependency management work?

We could make it optional but really:

$ pip show meegkit
...
Requires: joblib, matplotlib, numpy, pandas, pyriemann, scikit-learn, scipy, statsmodels, tqdm
...

...we already require all of these except statsmodels and pyriemann so I think it's okay just to add it, assuming it's on PyPI and conda-forge, and it does appear to be both places.
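
For reference, the "make it optional" route could be sketched roughly as below. This is only an illustration of the general soft-import pattern, not code from this PR: the helper name `soft_import` and its error message are hypothetical, and the thread settles on simply adding meegkit as a regular requirement.

```python
import importlib


def soft_import(name, purpose):
    """Import an optional module, or fail with a message naming the feature.

    Hypothetical helper for illustration; not part of mne-bids-pipeline.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Re-raise with a hint so users know which feature needs the package.
        raise RuntimeError(
            f"{name} is required for {purpose}; "
            f"install it with `pip install {name}`."
        ) from exc


# Example with a standard-library module, so the sketch runs anywhere.
json = soft_import("json", "the demo")
print(json.dumps({"zapline": True}))
```

With this pattern the import cost and the extra dependencies are only paid by users who actually enable the feature.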

The failing unit tests caused by the deprecated use of numpy.core.numerictype are still a problem to be fixed. Maybe this is something to update upstream in the pyriemann package, can you check? I'm also wondering if we can use meegkit without ASR etc., just the dss.py imports, but I don't know enough about Python.

Either meegkit could make some of these imports optional, or we can just ignore the dtype issue locally in our tests. It would be okay to add another ignore to mne_bids_pipeline/tests/conftest.py
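
A minimal sketch of what "ignoring the issue locally" means, using only the standard-library `warnings` module. The message pattern is an assumption (the exact warning text should be copied from the failing test log), and the function name is hypothetical. In a pytest `conftest.py` the same idea is usually expressed as a `filterwarnings` entry rather than a direct call like this.

```python
import warnings

# Assumed message pattern; check the real warning text in the failing log.
NUMPY_CORE_PATTERN = r"numpy\.core\..* is deprecated"


def ignore_numpy_core_deprecation():
    """Silence only DeprecationWarnings whose message matches the pattern."""
    warnings.filterwarnings(
        "ignore", message=NUMPY_CORE_PATTERN, category=DeprecationWarning
    )


with warnings.catch_warnings():
    # Mimic a test suite that turns every warning into an error.
    warnings.simplefilter("error")
    # The newer filter is checked first, so it overrides "error" for matches.
    ignore_numpy_core_deprecation()
    # Matches the pattern, so it is silenced instead of raised.
    warnings.warn("numpy.core.numerictypes is deprecated", DeprecationWarning)
```

Keeping the pattern narrow means unrelated deprecations still fail the tests.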

@hoechenberger
Member

I'm okay with depending on meegkit. If it ever starts to cause trouble, we can simply drop the functionality again -- it's not a "core" functionality we critically depend on.

@agramfort WDYT?

@behinger

Is there an update on this? How should we move this forward?

@larsoner
Member

@ariguiba do you still want to work on this? If so I'm happy to do a quick review, looks like it might be a few small tweaks then we could get it in!

@behinger if there is no response for a little bit (maybe a week?) then you could take over if you want

@ariguiba
Contributor Author

I think I'm done with my part; honestly, I don't know what more to tweak. I think a decision needs to be made about the following:
As I understand it, the errors occur because the MEEGKit code we're using relies on a deprecated or problematic numpy method.
Also, in my opinion, using the dss_line method may not be the best idea because it seems to be very slow, even on a fairly small dataset.
So I think the best choice would be to take the source code and adapt it to our use-case 1. to remove the problematic numpy method and 2. maybe make it faster when integrated in our pipeline.
But I don't know if it's possible to just reuse the code. What do you think?
Or how would you move forward with this? Are there some small tweaks I can still make?

@larsoner
Member

So I think the best choice would be to take the source code and adapt it to our use-case 1. to remove the problematic numpy method and 2. maybe make it faster when integrated in our pipeline.

I think it would be better to improve meegkit directly if possible. Have you raised the issue over there yet? It's better to improve the upstream package than to start maintaining a parallel implementation.

In the meantime I can hopefully push some commits next week to make CIs happy.

@larsoner
Member

... actually meegkit 0.1.9 landed three days ago, I'll restart CIs to see if it's fixed already

@larsoner
Member

Looks like it ran out of memory. I'll try an 8GB machine, but if that dies too, then I think the implementation will need to be improved before this can proceed.

@larsoner
Member

Looking at the CircleCI example output from https://output.circle-artifacts.com/output/job/22f9deac-3a86-4049-b3c3-4d05693364f6/artifacts/0/site/examples/eeg_matchingpennies.html#generated-output things look okay, WDYT @ariguiba ?

@ariguiba
Contributor Author

Looks good to me too! As I said, the upstream implementation could perhaps be improved in terms of performance, but otherwise it seems to be doing what it should. Thank you for the tweaks 👍

@larsoner larsoner marked this pull request as ready for review February 18, 2025 15:52
@larsoner
Member

@drammock feel free to merge if you're happy

Member

@drammock drammock left a comment

I don't think this is actually getting tested adequately. I spot-checked several dataset test logs on the CIs, and all of them said "computation unnecessary (cached)" under 04 frequency filter, so I think our CIs didn't actually hit any case where zapline_fline=None (see below for why that concerns me).

Member

@larsoner larsoner left a comment

The only dataset where zapline is enabled is eeg_matchingpennies, so you should see it used here:

https://app.circleci.com/pipelines/github/mne-tools/mne-bids-pipeline/4911/workflows/2955b461-eadd-49d4-b353-6a031601b6b3/jobs/76007?invite=true#step-104-1870_63

Note that if you ever look at GHA logs, also make sure that you're looking at the correct one. I suspect you maybe looked at the bottommost one, which should show all steps cached. You need to go one above that to see the first run. (The second one is actually a caching test!) See for example:

image

I'll push a little commit to name the runs to help with that part

Comment on lines +78 to +79
if fline is None:
    return
Member

@drammock there is a short-circuit here for the None case
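
A rough sketch of the short-circuit pattern being discussed, for readers without the diff at hand. The function name `maybe_zapline` and the `denoise` callable are hypothetical stand-ins and not the code in this PR; in particular, the reviewed snippet uses a bare `return`, while this sketch returns the data unchanged to keep the example easy to check.

```python
def maybe_zapline(data, fline, denoise):
    """Apply `denoise` only when a line frequency is configured.

    Hypothetical illustration: `denoise` stands in for the expensive
    line-noise removal call and is never invoked when `fline` is None.
    """
    if fline is None:
        # Short-circuit: the feature is disabled, so skip the costly step.
        return data
    return denoise(data, fline)


def fake_denoise(data, fline):
    # Placeholder for the real computation: just zero out the samples.
    return [0 for _ in data]


print(maybe_zapline([1, 2, 3], None, fake_denoise))
print(maybe_zapline([1, 2, 3], 50.0, fake_denoise))
```

Because the guard sits at the top of the function, a configuration with no line frequency never reaches the slow path at all.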

Member

there is a short-circuit here for the None case

🤦🏻 Sorry, how did I miss that?

@drammock
Member

if you ever look at GHA logs also make sure that you're looking at the correct one. I suspect you maybe looked at the bottommost one, which should show all steps cached. You need to go one above that to see the first run. (The second one is actually a caching test!)

I was looking at the CircleCI runs, but thanks for the GHA tip. The problem was that I somehow missed the short-circuit.

@drammock drammock enabled auto-merge (squash) February 19, 2025 15:49
@drammock drammock merged commit c9b79e0 into mne-tools:main Feb 19, 2025
55 of 56 checks passed

welcome bot commented Feb 19, 2025

🎉 Congrats on merging your first pull request! 🥳 Looking forward to seeing more from you in the future! 💪

5 participants