Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Humanize byte size error for community image upload #1273

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Meowcenary
Copy link

@Meowcenary Meowcenary commented Feb 7, 2025

❤️ Thank you for your contribution!

Description

Update logo size limit error to show humanized byte size error on community image upload. Add helper function to perform this conversion.

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Frontend

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

run-tests.sh Outdated
@@ -54,7 +54,7 @@ fi

python -m check_manifest
python -m sphinx.cmd.build -qnNW docs docs/_build/html
eval "$(docker-services-cli up --db ${DB:-postgresql} --search ${SEARCH:-opensearch} --mq ${MQ:-redis} --env)"
eval "$(docker services up --db ${DB:-postgresql} --search ${SEARCH:-opensearch} --mq ${MQ:-redis} --env)"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script run-tests.sh has a command that I don’t think is available on OSX. I tried making some changes to see if I could find a match for OSX, but that’s going nowhere. I think the move might be to spin up containers specifically for this, but I’m not sure where to find the files that define the containers for these tests. The --env flag seems to imply there is a Dockerfile or some other type of file that is supposed to be around. I was thinking it might make sense to just write my own and fill it in with the container definitions I need, but that seems wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installing test dependencies should provide you with the docker-services-cli CLI utility: pipenv run pip install -e .[tests,opensearch2] . This is an invenio utility, not a docker provided thing and should work on MacOS. So these sections shouldn't be modified.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give that another shot. I was pretty sure I had run setup tools correctly, but there's a good chance I did something wrong and then went this painful road of ignorance and confusion.

Copy link
Contributor

@fenekku fenekku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devil is in the details. Write up a test for the humanizing function and with suggestions/investigation we should have something that can be submitted 👍

Comment on lines 47 to 48
limit_hr, limit_unit = human_readable_unit()
file_size_hr, file_size_unit = human_readable_unit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functions as verbs + passing argument

Suggested change
limit_hr, limit_unit = human_readable_unit()
file_size_hr, file_size_unit = human_readable_unit()
limit_value, limit_unit = humanize_byte_size(limit)
file_size_value, file_size_unit = humanize_byte_size(file_size)

Comment on lines 52 to 53
"Logo size limit exceeded. Limit: {limit} {limit_unit} Given: {file_size} {file_size_unit} bytes".format(
limit=ceil(limit_hr), limit_unit=limit_unit, file_size=ceil(file_size_hr), file_size_unit=file_size_unit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Logo size limit exceeded. Limit: {limit} {limit_unit} Given: {file_size} {file_size_unit} bytes".format(
limit=ceil(limit_hr), limit_unit=limit_unit, file_size=ceil(file_size_hr), file_size_unit=file_size_unit
"Logo size limit exceeded. Limit: {limit_value} {limit_unit}. Given: {file_size_value} {file_size_unit}".format(
limit_value=limit_value, limit_unit=limit_unit, file_size_value=file_size_value, file_size_unit=file_size_unit

@@ -109,3 +109,13 @@ def on_datastore_post_commit(sender, session):
users = role.users.all()
for user in users:
on_user_membership_change(Identity(user.id))


def human_readable_unit(size):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def human_readable_unit(size):
def humanize_byte_size(size):

for unit in ['B', 'KB', 'MB', 'GB', 'TB', 'PB']:
if size < 1000.0:
return size, unit
size /= 1000.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# at top of file
from decimal import Decimal, ROUND_HALF_UP, getcontext

# in function
# 
old_rounding = getcontext().rounding
getcontext().rounding = ROUND_HALF_UP  # otherwise Python rounds differently than what we, as humans, are used to
# Do check however if we have set the rounding behaviour to ROUND_HALF_UP already
s = Decimal(size)
# do rest of operations with s (naming isn't a deal, just saying to do rest of ops with the Decimal version of the value.
# when ready to return round(s, 2):
s = round(s, 2)
getcontext().rounding = old_rounding
return s, unit

This is really to mimic what the frontend does.

  • Maybe we can add a note about eventually supporting Kili, Mebi...
  • Document the input and outputs

run-tests.sh Outdated
@@ -54,7 +54,7 @@ fi

python -m check_manifest
python -m sphinx.cmd.build -qnNW docs docs/_build/html
eval "$(docker-services-cli up --db ${DB:-postgresql} --search ${SEARCH:-opensearch} --mq ${MQ:-redis} --env)"
eval "$(docker services up --db ${DB:-postgresql} --search ${SEARCH:-opensearch} --mq ${MQ:-redis} --env)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installing test dependencies should provide you with the docker-services-cli CLI utility: pipenv run pip install -e .[tests,opensearch2] . This is an invenio utility, not a docker provided thing and should work on MacOS. So these sections shouldn't be modified.

Copy link
Author

@Meowcenary Meowcenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the feedback @fenekku! I'll add what I can to my ongoing notes and hopefully avoid some of this trouble next time.

run-tests.sh Outdated
@@ -54,7 +54,7 @@ fi

python -m check_manifest
python -m sphinx.cmd.build -qnNW docs docs/_build/html
eval "$(docker-services-cli up --db ${DB:-postgresql} --search ${SEARCH:-opensearch} --mq ${MQ:-redis} --env)"
eval "$(docker services up --db ${DB:-postgresql} --search ${SEARCH:-opensearch} --mq ${MQ:-redis} --env)"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give that another shot. I was pretty sure I had run setup tools correctly, but there's a good chance I did something wrong and then went this painful road of ignorance and confusion.

@Meowcenary Meowcenary force-pushed the humanize_byte_size_error_for_community_image_upload branch from 19404f3 to 5591c32 Compare February 10, 2025 18:12
Copy link
Author

@Meowcenary Meowcenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and (finally) passing all tests locally. I was wondering where I should put a test for the humanize_byte_size function or if it was even something worth including?

I searched for some other functions in utils.py across the tests directory and was able to match a few to tests, but it doesn't seem like everything is unit tested, so I'm not sure if my changes are considered test worthy.

@@ -19,7 +19,9 @@
SearchOptionsMixin,
)
from invenio_records_resources.services.files.links import FileLink
from invenio_records_resources.services.records.config import RecordServiceConfig
from invenio_records_resources.services.records.config import (
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this wasn't causing the test suite to fail, but it triggered when I was running the tests locally. I can leave this out if it doesn't make sense for this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this was triggered locally for you: this line is 81 chars long, which is considered longer than the classic 80 of some python linting tools, but it is not too long for the black linter which we use. If you share what you ran and what you got, it may help.

In any case, let's leave this line as it was before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think isort could caused the problem with this line like here

Copy link
Contributor

@fenekku fenekku Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try adding line_length = 88 to the isort setup.cfg section:

# setup.cfg
[isort]
profile=black
line_length = 88

I guess profile=black isn't quite enough anymore?

EDIT: And why are we getting this ridiculous amount of warnings in the test logs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw it is already fixed on the master branch.
the line_length = 88 seems not to work

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for the followup @utnapischtim

s = round(s, 2)
getcontext().rounding = old_rounding
return s, unit
s /= 1000.0
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not include a note about using a different library as a comment because I didn't want to overstep as a first time contributor. I get the feeling that regular contributors who have more experience will have opinions on this, so it might make more senes to discuss this in the PR.

@Meowcenary Meowcenary requested a review from fenekku February 10, 2025 18:20
super().__init__(
_(
"Logo size limit exceeded. Limit: {limit} bytes Given: {file_size} bytes".format(
limit=ceil(limit), file_size=ceil(file_size)
"Logo size limit exceeded. Limit: {limit_value} {limit_unit} Given: {file_size_value} {file_size_unit} bytes".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Logo size limit exceeded. Limit: {limit_value} {limit_unit} Given: {file_size_value} {file_size_unit} bytes".format(
"Logo size limit exceeded. Limit: {limit_value} {limit_unit} Given: {file_size_value} {file_size_unit}".format(

"Logo size limit exceeded. Limit: {limit} bytes Given: {file_size} bytes".format(
limit=ceil(limit), file_size=ceil(file_size)
"Logo size limit exceeded. Limit: {limit_value} {limit_unit} Given: {file_size_value} {file_size_unit} bytes".format(
limit_value=ceil(limit_value),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ceil should be omitted actually. It's confusing.

@Meowcenary Meowcenary force-pushed the humanize_byte_size_error_for_community_image_upload branch from 5591c32 to 5068c24 Compare February 11, 2025 23:02
Copy link
Author

@Meowcenary Meowcenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully close to being ready for official review.

def humanize_byte_size(size):
"""Converts bytes to largest unit (e.g., KB, MB, GB)."""
s = Decimal(size)
for unit in ["B", "KB", "MB", "GB", "TB", "PB"]:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably unrealistic that anyone is going to upload anything that is even a gigabyte, let alone a terabyte or petabyte, but I don't think it adds a lot of overhead or anything, so it's likely fine.

if s < BYTES_PER_UNIT:
with localcontext() as ctx:
ctx.rounding = ROUND_HALF_UP
s = s.quantize(Decimal("0.00"))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have all calls to Decimal() use a string argument? It seems to be the best practice is to use a string, but from my perspective it doesn't make a difference for whole number values like 1000. I guess consistency would be a reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For small d decimal, a string is the way to go, but for whole numbers it doesn't matter too much like you are saying. It's not one of those consistency things that bothers me beyond that.

@@ -16,6 +18,7 @@
from .generators import CommunityRoleNeed
from .proxies import current_communities, current_identities_cache

BYTES_PER_UNIT = Decimal(1000)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be overkill or be better if it was renamed.

@@ -689,6 +689,9 @@ def test_logo_max_content_length(
data=BytesIO(logo_data),
)
assert res.status_code == 400
assert (
res.json["message"] == "Logo size limit exceeded. Limit: 1.00 MB Given: 4.00 MB"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to look for other places that I could test this new function, but this seemed like a good spot to start. I'm not sure how thorough the tests needs to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good enough 👍

ctx.rounding = ROUND_HALF_UP
s = s.quantize(Decimal("0.00"))
return s, unit
s /= BYTES_PER_UNIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple last things:

  • If we are going the quantize route, we may as well pass the rounding to simplify things: s = s.quantize(q, rounding=ROUND_HALF_UP) (q = Decimal("0.00") once and outside the loop) and remove all the context stuff
  • Let's keep BYTES_PER_UNIT in the function. Was there a particular reason why out? (keeping it out exposes us to problems that we shouldn't have to care about)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Good call on the first point
  • Honestly, the only reason I put it outside of the function was that I thought it was the standard way to place constants. I'm fine with scoping it to a function. For my own understanding, are we doing this to avoid naming collisions or were you thinking of something else?

Copy link
Contributor

@fenekku fenekku Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is module namespaced in python so it wouldn't really be a problem. Constants don't have to be global unless we want them to be part of an interface that other pieces of code use. And with such a large project, my philosophy is to tamp down on the coupling / global access it makes us vulnerable to. Especially here, where "bytes per unit" could be changed to 1024 or 1000 in the function in the future and still make sense as a concept. IOW, we want to keep coupled things close together and not introduce unintended 'dependencies'.

@Meowcenary Meowcenary force-pushed the humanize_byte_size_error_for_community_image_upload branch 2 times, most recently from e3beac7 to c58e322 Compare February 12, 2025 18:09
@Meowcenary Meowcenary requested a review from fenekku February 12, 2025 20:39
@Meowcenary Meowcenary force-pushed the humanize_byte_size_error_for_community_image_upload branch from c58e322 to b059dfc Compare February 12, 2025 21:03
@Meowcenary Meowcenary force-pushed the humanize_byte_size_error_for_community_image_upload branch from b059dfc to d211494 Compare February 12, 2025 21:08
@Meowcenary Meowcenary marked this pull request as ready for review February 12, 2025 21:09
@fenekku fenekku requested a review from utnapischtim February 12, 2025 22:44
Copy link
Contributor

@fenekku fenekku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

For context, @utnapischtim, @Meowcenary is my colleague at Northwestern who's getting into InvenioRDM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants