Skip to content

Feature: Support for file-like objects for EDF/BDF/GDF files #13156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 78 commits into
base: main
Choose a base branch
from

Conversation

szz-dvl
Copy link

@szz-dvl szz-dvl commented Mar 12, 2025

Reference issue (if any)

Feature #8976.

What does this implement/fix?

Added support for file-like objects for EDF/BDF/GDF files.

Additional information

Until today, mne was checking for the file extension to validate file types, we may not have a file name from a binary object, so, if a binary object (file-like) is provided we trust the user (depending on what API function he is using: read_raw_edf, read_raw_gdf, read_raw_bdf). A new enumeration has been declared to propagate the file type to underlying methods:

class FileType(Enum):
    """Enumeration to differentiate files when the extension is not known."""

    GDF = 1
    EDF = 2
    BDF = 3

In the function _read_edf_header a new exception is raised if the header can be properly parsed, informing the user that the file provided is not valid:

try:
     header_nbytes = int(_edf_str(fid.read(8)))
except ValueError:
    raise Exception(
        f"Bad {'EDF' if file_type is FileType.EDF else 'BDF'} file provided."
    )

In the function _read_gdf_header, a new exception is raised when the version of the document can't be properly parsed, informing the user that the file provided is not valid:

try:
    version = fid.read(8).decode()
    edf_info["type"] = edf_info["subtype"] = version[:3]
    edf_info["number"] = float(version[4:])
except ValueError:
    raise Exception("Bad GDF file provided.")        

To accommodate this changes the class RawEDF class has been duplicated (RawBDF) to differentiate between BDF and EDF files when the extension is not known. To avoid changes in the file mne-python/mne/io/tests/test_raw.py.

Additionally 4 new tests have been added to the file mne-python/mne/io/edf/tests/test_edf.py and 2 new tests have been added to the file mne-python/mne/io/edf/tests/test_gdf.py, those tests will open a valid file and check for the channels, and open an invalid file and expect an error. For GDF I'm using an empty file (I don't know where to get bad GDF files).

I've created a helper function to open GDF and EDF files under mne-python/mne/_edf/open.py.

To support in memory file-like objects I'm using the workaround mentioned here, the function has been placed in "fixes.py". The calls to numpy.fromfile have then been replaced by calls to the new function read_from_file_or_buffer. The implementation is as follows:

def read_from_file_or_buffer(
    file: str | bytes | os.PathLike | io.IOBase, dtype: numpy.typing.DTypeLike = float, count: int = -1
):
    """numpy.fromfile() wrapper, handling io.BytesIO file-like streams.

    Numpy requires open files to be actual files on disk, i.e., must support
    file.fileno(), so it fails with file-like streams such as io.BytesIO().

    If numpy.fromfile() fails due to no file.fileno() support, this wrapper
    reads the required bytes from file and redirects the call to
    numpy.frombuffer().

    See https://github.com/numpy/numpy/issues/2230
    """
    try:
        return numpy.fromfile(file, dtype=dtype, count=count)
    except io.UnsupportedOperation as e:
        if not (e.args and e.args[0] == "fileno" and isinstance(file, io.IOBase)):
            raise  # Nothing I can do about it
        dtype = numpy.dtype(dtype)
        buffer = file.read(dtype.itemsize * count)
        return numpy.frombuffer(buffer, dtype=dtype, count=count)
        

Copy link

welcome bot commented Mar 12, 2025

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴

@szz-dvl szz-dvl marked this pull request as draft March 20, 2025 18:28
@szz-dvl szz-dvl marked this pull request as ready for review March 20, 2025 20:30
Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments, once you push and expect tests to pass can you ping me? I'll need to approve the CI workflow manually

@@ -1597,7 +1882,8 @@ def read_raw_edf(
Parameters
----------
input_fname : path-like
Path to the EDF or EDF+ file.
Path to the EDF or EDF+ file or EDF/EDF+ file itself. If a file-like
object is provided, preload must be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
object is provided, preload must be used.
object is provided, preload must be used.
.. versionchanged:: 1.10
Added support for file-like objects.

Same in other functions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsoner Only for public functions, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah read_raw_* should have them, classes not as critical

Comment on lines 2208 to 2210
if not _file_like(input_fname):
input_fname = os.path.abspath(input_fname)
ext = os.path.splitext(input_fname)[1][1:].lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is 95% the same code 3 places, should move to a helper function to be more DRY

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsoner I'm using _check_fname as suggested below, is it enough?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsoner a new function _check_args has been created, I encapsulated the whole conditional in this function.

Comment on lines 1983 to 1984
input_fname = os.path.abspath(input_fname)
ext = os.path.splitext(input_fname)[1][1:].lower()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to prefer pathlib interface, which _check_fname will provide for you

input_fname = _check_fname(fname=input_fname, overwrite="read", must_exist=True)

now input_fname is a pathlib.Path and you can get the extension using a method

@szz-dvl
Copy link
Author

szz-dvl commented Apr 18, 2025

@larsoner Pushing the changes

@szz-dvl
Copy link
Author

szz-dvl commented Apr 19, 2025

@larsoner I took a look at the failing tests, but I don't know the reason they are failing, do you have any clue?

Thanks!

@drammock
Copy link
Member

@larsoner I took a look at the failing tests, but I don't know the reason they are failing, do you have any clue?

Thanks!

CI errors are unrelated, tracking them at #13221

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants