Enhance binary file detection by integrating MIME type analysis, resolves #975 #1003

Closed

@@ -0,0 +1 @@
- [Improvement] This is a non-breaking change. Enhanced is_binary_file function in env file for better binary file detection. (by @Abdul-Muqadim-Arbisoft)
Contributor

This changelog entry is not very informative. As a Tutor user or plugin developer, what change(s) should I expect? Will some files now be considered text or binary that previously were not? If nothing changes, what is the purpose of this PR?
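One way to answer that question is to run the old and new logic side by side. The following is only a sketch under stated assumptions: `BIN_FILE_EXTENSIONS` holds just the extensions visible on this page (the real list in `tutor/env.py` is longer), the names `is_binary_before`, `is_binary_after` and `changed_paths` are hypothetical, and a private `mimetypes.MimeTypes()` registry stands in for the module-level `mimetypes.guess_type()` call used in the PR, to reduce dependence on system MIME files.

```python
import mimetypes
import os

# Assumption: only the extensions visible on this page; the real list in
# tutor/env.py is longer and is cut off in the diff.
BIN_FILE_EXTENSIONS = [".ico", ".woff", ".woff2"]
TEXT_MIME_TYPES = ["application/xml", "application/json"]
TEXT_FILE_EXTENSIONS = [".html", ".xml", ".json", ".css", ".js"]

# Swapped in for this sketch: a private registry instead of the module-level
# mimetypes.guess_type() that the PR calls.
_MIME_DB = mimetypes.MimeTypes()


def is_binary_before(path: str) -> bool:
    """Logic before the PR: extension list only."""
    ext = os.path.splitext(path)[1]
    return ext in BIN_FILE_EXTENSIONS


def is_binary_after(path: str) -> bool:
    """Logic after the PR: MIME type first, extension lists as a fallback."""
    mime_type, _ = _MIME_DB.guess_type(path)
    if mime_type:
        if mime_type.startswith("text/") or mime_type in TEXT_MIME_TYPES:
            return False
        return True
    ext = os.path.splitext(path)[1].lower()
    if ext in TEXT_FILE_EXTENSIONS:
        return False
    return ext in BIN_FILE_EXTENSIONS


def changed_paths(paths):
    """Return the paths whose classification differs between the two versions."""
    return [p for p in paths if is_binary_before(p) != is_binary_after(p)]


if __name__ == "__main__":
    samples = [
        "/home/somefile.ico",
        "/home/index.html",
        "/home/report.pdf",
        "/home/script.sh",
        "/home/file",
    ]
    for path in samples:
        print(path, is_binary_before(path), is_binary_after(path))
```

Under these assumptions, paths such as `report.pdf` and `script.sh` change classification, because Python's built-in table gives them `application/...` MIME types that are not listed in `TEXT_MIME_TYPES`.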

9 changes: 9 additions & 0 deletions tests/test_env.py
@@ -42,6 +42,15 @@ def test_files_are_rendered(self) -> None:
     def test_is_binary_file(self) -> None:
         self.assertTrue(env.is_binary_file("/home/somefile.ico"))

+    def test_is_binary_file_with_text_extension(self) -> None:
+        self.assertFalse(env.is_binary_file("/home/script.js"))
+
+    def test_is_binary_file_with_unrecognized_extension(self) -> None:
+        self.assertFalse(env.is_binary_file("/home/unknown.extension"))
+
+    def test_is_binary_file_without_extension(self) -> None:
+        self.assertFalse(env.is_binary_file("/home/file"))
+
     def test_find_os_path(self) -> None:
         environment = env.JinjaEnvironment()
         path = environment.find_os_path("local/docker-compose.yml")
28 changes: 27 additions & 1 deletion tutor/env.py
@@ -6,6 +6,8 @@
 import typing as t
 from copy import deepcopy

+import mimetypes
+
 import jinja2
 import importlib_resources

@@ -26,6 +28,8 @@
     ".woff",
     ".woff2",
 ]
+TEXT_MIME_TYPES = ["application/xml", "application/json"]
+TEXT_FILE_EXTENSIONS = [".html", ".xml", ".json", ".css", ".js"]
 JinjaFilter = t.Callable[..., t.Any]


@@ -501,7 +505,29 @@ def read_core_template_file(*path: str) -> str:


 def is_binary_file(path: str) -> bool:
-    ext = os.path.splitext(path)[1]
+    """
+    Determines if the specified file is binary based on its MIME type or file extension.
+
+    This function first attempts to guess the MIME type of the file based on its path.
+    If the MIME type indicates that the file is not text and not a known text-based MIME type,
+    it is considered binary. If the MIME type cannot be determined or is not indicative
+    of a binary file, the function then checks the file's extension against a predefined
+    list of binary file extensions, as well as a list of known text file extensions.
+
+    Parameters:
+    - path (str): The path to the file whose type is to be determined.
+
+    Returns:
+    - bool: True if the file is determined to be binary, False otherwise.
+    """
+    mime_type, _ = mimetypes.guess_type(path)
+    if mime_type:
+        if mime_type.startswith("text/") or mime_type in TEXT_MIME_TYPES:
+            return False
+        return True
+    ext = os.path.splitext(path)[1].lower()
+    if ext in TEXT_FILE_EXTENSIONS:
+        return False
     return ext in BIN_FILE_EXTENSIONS
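In the function above, the MIME lookup runs first, so `TEXT_FILE_EXTENSIONS` is consulted only when no MIME type is found. Some Python versions map `.js` to `application/javascript`, which is not in `TEXT_MIME_TYPES`, so the result for `script.js` may vary between environments. The sketch below shows one possible alternative ordering that checks the explicit extension lists first. It is not what the PR implements: the name `is_binary_file_extension_first` is hypothetical, and `BIN_FILE_EXTENSIONS` is again assumed to hold only the extensions visible on this page.

```python
import mimetypes
import os

# Assumption: subset of the real list in tutor/env.py (truncated in the diff).
BIN_FILE_EXTENSIONS = [".ico", ".woff", ".woff2"]
TEXT_MIME_TYPES = ["application/xml", "application/json"]
TEXT_FILE_EXTENSIONS = [".html", ".xml", ".json", ".css", ".js"]


def is_binary_file_extension_first(path: str) -> bool:
    """Hypothetical variant: explicit extension lists win over MIME guessing."""
    ext = os.path.splitext(path)[1].lower()
    # Listed extensions are decided here, whatever the MIME table says.
    if ext in TEXT_FILE_EXTENSIONS:
        return False
    if ext in BIN_FILE_EXTENSIONS:
        return True
    # Unlisted extensions fall back to the MIME type guessed from the path.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        return False
    return not (mime_type.startswith("text/") or mime_type in TEXT_MIME_TYPES)
```

With this ordering, any extension listed in `TEXT_FILE_EXTENSIONS` or `BIN_FILE_EXTENSIONS` gets the same answer on every Python version. Unlisted extensions still depend on the MIME table.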

