Enhance binary file detection by integrating MIME type analysis, resolves #975 #1003

Abdul-Muqadim-Arbisoft · 2024-02-19T14:34:53Z

This update refines the is_binary_file function with a more nuanced detection mechanism. The function first tries to guess the file's MIME type from its path. If the MIME type indicates non-text content or belongs to a set of known binary MIME types, the file is considered binary. The key improvement is that this version also recognizes specific MIME types and file extensions that, despite not starting with 'text/', are commonly associated with text content (e.g., 'application/xml', 'application/json'). These are checked explicitly and treated as text.

When the MIME type is inconclusive or suggests binary content, but the file extension is associated with text content (such as '.html', '.xml', '.json', '.css', '.js'), the function overrides the MIME type and classifies the file as text. Conversely, if the file extension is among those known to be binary, the file is classified as binary. Combining the MIME type with extension lists in this way is intended to distinguish binary from text files more precisely than either signal alone.
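The detection order described above can be sketched roughly as follows. This is a minimal sketch, not the code from this PR: the three sets are hypothetical placeholders, and treating a file whose MIME type cannot be guessed as text is an assumption of the sketch.

```python
import mimetypes
import os

# Hypothetical lists for illustration only; the PR's actual lists may differ.
TEXT_MIME_TYPES = {"application/xml", "application/json", "application/javascript"}
TEXT_EXTENSIONS = {".html", ".xml", ".json", ".css", ".js"}
BINARY_EXTENSIONS = {".png", ".jpg", ".gif", ".ico", ".ttf", ".woff", ".woff2", ".zip", ".gz"}


def is_binary_file(path: str) -> bool:
    """Return True if the file at `path` should be treated as binary (sketch)."""
    # guess_type only looks at the file name; it never reads the file contents.
    mime_type, _encoding = mimetypes.guess_type(path)
    extension = os.path.splitext(path)[1].lower()

    # Known extensions override whatever the MIME guess suggests.
    if extension in TEXT_EXTENSIONS:
        return False
    if extension in BINARY_EXTENSIONS:
        return True

    if mime_type is None:
        # Inconclusive guess: assume text (an assumption of this sketch).
        return False
    # 'text/*' and known text-like 'application/*' types are text; anything else is binary.
    return not (mime_type.startswith("text/") or mime_type in TEXT_MIME_TYPES)
```

Checking the extension lists before trusting the MIME guess matters because `mimetypes.guess_type` also reads platform MIME registries (such as `/etc/mime.types`), so the same extension can map to different types on different systems.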

tutor/env.py

DawoudSheraz

Add a changelog
Squash the commits

Abdul-Muqadim-Arbisoft · 2024-02-28T09:12:10Z

@regisb, can you squash the commits while merging the branch?

DawoudSheraz · 2024-03-17T16:48:13Z

@regisb A reminder for reviewing and squash-merging this PR, thanks.

regisb

Unless I'm mistaken, this PR was designed to address issue #975, right? If yes, then we need to implement the filter that I described in this comment. This filter would then make use of the is_binary_file function to decide whether any given file should be rendered. Does that make sense? If not, we should have a live discussion about this.

Also, please rebase and squash your changes on top of master.

regisb · 2024-03-21T08:01:00Z

changelog.d/20240222_174417_abdul.muqadim_binary_file_detection_update.md

@@ -0,0 +1 @@
+- [Improvement] This is a non-breaking change. Enhanced is_binary_file function in env file for better binary file detection. (by @Abdul-Muqadim-Arbisoft)


This changelog entry is not very informative. As a Tutor user or plugin developer, what change(s) should I expect? Will some files now be considered text (or binary) when they previously were not? If there is no change, what's the purpose of this PR?

regisb · 2024-04-16T09:49:43Z

To clarify my thought above, here's what I have in mind:

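import typing as t

from tutor import hooks

# Sketch for tutor/env.py, where is_binary_file is already defined.
@hooks.Filters.IS_FILE_RENDERED.add()
def _do_not_render_binary_files(result: bool, path: str) -> bool:
    # Binary files are copied as-is instead of being rendered as templates
    if result and is_binary_file(path):
        result = False
    return result


def render_files(files: t.Iterable[str]) -> None:
    for path in files:
        if hooks.Filters.IS_FILE_RENDERED.apply(True, path):
            pass  # render file
        else:
            pass  # copy file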

To understand the benefit of the above approach, consider the following scenarios:

1. A certain plugin needs a certain html file to be copied, not rendered.
2. Plugin 1 needs .zoopla1 files to be copied, not rendered. Same for plugin 2 with .zoopla2 files.
3. The list of binary extensions changes over time.

None of these scenarios can be addressed with simple configuration settings; in scenarios 2 and 3, users would have to run manual config save --append commands to keep up with the changes.
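The three scenarios can be illustrated without installing Tutor. The sketch below uses a minimal stand-in Filter class (an assumption, not Tutor's actual hooks implementation) that only mimics the add() decorator and apply() call from the comment above; the callback names, the binary extension list, and the raw.html path are hypothetical.

```python
from typing import Callable, List

FilterCallback = Callable[[bool, str], bool]


class Filter:
    """Minimal stand-in for a Tutor-style filter (not Tutor's real class)."""

    def __init__(self) -> None:
        self.callbacks: List[FilterCallback] = []

    def add(self) -> Callable[[FilterCallback], FilterCallback]:
        def decorator(func: FilterCallback) -> FilterCallback:
            self.callbacks.append(func)
            return func
        return decorator

    def apply(self, value: bool, path: str) -> bool:
        # Each callback receives the value returned by the previous one.
        for callback in self.callbacks:
            value = callback(value, path)
        return value


IS_FILE_RENDERED = Filter()

# Scenario 3: the core owns this (hypothetical) list and can update it in a
# new release, without any user configuration.
BINARY_EXTENSIONS = (".png", ".ico", ".woff2")


@IS_FILE_RENDERED.add()
def _do_not_render_binary_files(result: bool, path: str) -> bool:
    return result and not path.endswith(BINARY_EXTENSIONS)


# Scenario 2: each plugin registers its own extension.
@IS_FILE_RENDERED.add()
def _plugin1_copy_zoopla1(result: bool, path: str) -> bool:
    return result and not path.endswith(".zoopla1")


@IS_FILE_RENDERED.add()
def _plugin2_copy_zoopla2(result: bool, path: str) -> bool:
    return result and not path.endswith(".zoopla2")


# Scenario 1: a plugin opts a single html file out of rendering.
@IS_FILE_RENDERED.add()
def _plugin_copy_one_html_file(result: bool, path: str) -> bool:
    return result and path != "myplugin/templates/static/raw.html"
```

Because every callback only narrows the result, plugins can register independently without knowing about each other, and no user-side setting has to be kept in sync.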

DawoudSheraz · 2024-05-13T06:40:38Z

Closing this in favor of #1062

Improve binary file detection in is_binary_file function

db2efb3

Abdul-Muqadim-Arbisoft requested a review from regisb February 19, 2024 14:35

Abdul-Muqadim-Arbisoft self-assigned this Feb 19, 2024

Abdul-Muqadim-Arbisoft changed the title from "Enhance binary file detection by integrating MIME type analysis" to "Enhance binary file detection by integrating MIME type analysis, resolves #975" Feb 19, 2024

Code reformatted

af0963e

Abdul-Muqadim-Arbisoft added the enhancement label Feb 19, 2024

Code reformatted

9f5635d

regisb requested a review from DawoudSheraz February 20, 2024 08:08

regisb assigned DawoudSheraz and unassigned Abdul-Muqadim-Arbisoft Feb 20, 2024

DawoudSheraz reviewed Feb 21, 2024


tutor/env.py (outdated, resolved)

Additional checks added for checking binary file

5e3311e

Abdul-Muqadim-Arbisoft requested a review from DawoudSheraz February 21, 2024 11:50

Abdul-Muqadim-Arbisoft and others added 6 commits February 21, 2024 17:16

code reformatted

8b46a8a

Enhance binary file detection by integrating MIME type analysis

479ba54

code reformatted

ecca4f8

Merge branch 'master' into muqadim/binary-file-detection-update

e811c4e

Unit tests added for is_binary_file function in env file

75e031d

Binary_file_extension name corrected

4a3f8a0

DawoudSheraz approved these changes Feb 22, 2024


Abdul-Muqadim-Arbisoft requested review from regisb and removed request for regisb February 22, 2024 12:24

Abdul-Muqadim-Arbisoft and others added 2 commits February 22, 2024 17:46

Changelog entry added

335d179

Update 20240222_174417_abdul.muqadim_binary_file_detection_update.md

08e6794

regisb requested changes Mar 21, 2024


DawoudSheraz closed this May 13, 2024
