
FIX: filename decryption issue #10


Open: wants to merge 8 commits into main


dhgatjeye

Problem Description:
When decrypting URL-encoded filenames that contain Turkish or other non-ASCII characters, the function does not restore the original filename. Example:

Original Filename: Yeni klasör (2)
Decrypted Filename: Yeni+klas%C3%B6r+%282%29
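The decrypted name in this example appears to use HTML form encoding: `+` stands for a space and each `%XX` escape is one UTF-8 byte. A quick standard-library check (a sketch for illustration, not the project's code) shows that `urllib.parse.unquote` alone leaves the `+` signs in place, while `unquote_plus` recovers the original name:

```python
from urllib.parse import unquote, unquote_plus

encoded = "Yeni+klas%C3%B6r+%282%29"

# unquote only decodes the %XX escapes (as UTF-8 by default); '+' is kept.
print(unquote(encoded))       # Yeni+klasör+(2)

# unquote_plus also maps '+' to a space, as in HTML form values.
print(unquote_plus(encoded))  # Yeni klasör (2)

# A literal plus sign survives only if it was escaped as %2B.
print(unquote_plus("Hello+World%2B"))  # Hello World+
```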

The issue stems from incomplete URL decoding: the percent-escaped UTF-8 bytes and the `+` signs used for spaces are not turned back into the original characters. This can:

- Corrupt special characters
- Misinterpret Turkish character encodings
- Potentially break file names across different operating systems
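To illustrate the encoding point: `%C3%B6` is the two-byte UTF-8 sequence for `ö`. If those bytes are decoded with a single-byte encoding such as Latin-1 (chosen here purely for illustration, not something the project is known to do), each byte becomes a separate character and the name turns into mojibake:

```python
from urllib.parse import unquote_plus

encoded = "Yeni+klas%C3%B6r+%282%29"

# Correct: the %XX bytes are interpreted as UTF-8 (also the default).
utf8_name = unquote_plus(encoded, encoding="utf-8")

# Wrong: the same bytes read as Latin-1 give "Ã¶" instead of "ö".
latin1_name = unquote_plus(encoded, encoding="latin-1")

print(utf8_name)    # Yeni klasör (2)
print(latin1_name)  # Yeni klasÃ¶r (2)
```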

This PR fixes these problems.

You can also run the tests:

```python
import pathlib
from typing import Union
from urllib.parse import unquote_plus

# Characters that are not allowed in Windows file or folder names.
INVALID_CHARS = '<>:"|?*\0'
# Reserved Windows device names that cannot be used as file names.
DEVICE_NAMES = {'CON', 'PRN', 'AUX', 'NUL', 'COM1', 'COM2', 'COM3', 'COM4',
                'COM5', 'COM6', 'COM7', 'COM8', 'COM9', 'LPT1', 'LPT2',
                'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9'}


def fix_filename(path: Union[str, pathlib.Path]) -> pathlib.Path:
    path = pathlib.Path(str(path))

    parts = []
    for part in path.parts:
        # Keep the root untouched: '/' on POSIX, drive + root (e.g. 'C:\\') on Windows.
        if part == path.anchor:
            parts.append(part)
            continue

        decoded = part
        if '%' in decoded:
            # unquote_plus maps '+' to a space first and then decodes the %XX
            # escapes as UTF-8, so an escaped plus ('%2B' or '%2b') stays '+'.
            decoded = unquote_plus(decoded)

        # Replace invalid and control characters, then trim dots and spaces.
        cleaned = ''.join(c if c not in INVALID_CHARS and ord(c) >= 32 else '-'
                          for c in decoded)
        cleaned = cleaned.strip('. ')

        if cleaned.upper() in DEVICE_NAMES:
            cleaned = f'_{cleaned}_'

        # Names made only of '+' (e.g. '+++') are kept; empty names and names
        # mixing only spaces and '+' are replaced with '_'.
        if not cleaned or set(cleaned) <= {' ', '+'}:
            parts.append(cleaned if set(cleaned) == {'+'} else '_')
            continue

        parts.append(cleaned)

    return pathlib.Path(*parts)


def test_filename_fixes():
    test_cases = [
        ("Yeni+klas%C3%B6r+%282%29", "Yeni klasör (2)"),
        ("Dosya%20adı.txt", "Dosya adı.txt"),
        ("Yeni klasör (2)+", "Yeni klasör (2)+"),
        ("Document+.txt", "Document+.txt"),
        ("Hello+World%2B", "Hello World+"),
        ("Test%2B+File", "Test+ File"),
        ("%D0%9F%D1%80%D0%B8%D0%BC%D0%B5%D1%80%2B", "Пример+"),
        ("%E4%BD%A0%E5%A5%BD+%2B+File.txt", "你好 + File.txt"),
        ("%F0%9F%98%80+Smile%2B", "😀 Smile+"),
        ("plain_filename.txt", "plain_filename.txt"),
        ("Hello World+.txt", "Hello World+.txt"),
        ("%2B%2B%2B", "+++"),
        ("%2BFile%2B", "+File+"),
        ("No+encoding%21", "No encoding!"),
        ("File%20Name%21%40%23%24.txt", "File Name!@#$.txt"),
        ("%C3%87%C4%B1lg%C4%B1n+Dosya.txt", "Çılgın Dosya.txt"),
    ]

    failed_cases = []
    for encoded, expected in test_cases:
        result = str(fix_filename(pathlib.Path(encoded)))
        if result != expected:
            failed_cases.append({'input': encoded, 'expected': expected, 'got': result})

    if failed_cases:
        print("Failed test cases:")
        for case in failed_cases:
            print(f"Input:    {case['input']}")
            print(f"Expected: {case['expected']}")
            print(f"Got:      {case['got']}\n")
    else:
        print("All test cases passed successfully!")


if __name__ == "__main__":
    test_filename_fixes()
```
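A portability note on the root check: on Windows, the first element of `Path.parts` for an absolute path is the drive *and* the root (for example `C:\`), which equals neither `Path.drive` (`C:`) nor `'/'`. A root test based on those two values would send `C:\` through the invalid-character filter and replace its `:` with `-`. `Path.anchor` holds exactly that first element on both platforms. The pure path classes let you check this on any OS (the paths below are made-up examples):

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\Users\Yeni+klas%C3%B6r")
posix = PurePosixPath("/home/Yeni+klas%C3%B6r")

# The first part is drive + root, so it does not match .drive on Windows.
print(win.parts)              # ('C:\\', 'Users', 'Yeni+klas%C3%B6r')
print(win.drive, win.anchor)  # C: C:\

# .anchor matches the first part for both path flavours.
print(win.parts[0] == win.anchor)      # True
print(posix.parts[0] == posix.anchor)  # True
```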

@giacomoferretti
Owner

Gonna test as soon as possible. The code looks right, but because I never stumbled upon it, I prefer to test it.

@dhgatjeye
Author


Yeah, okay! Thank you.
