-
Notifications
You must be signed in to change notification settings - Fork 227
Modify process_local_sources to support mount paths #1635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Modify process_local_sources to support mount paths #1635
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
EDIT: additional changes and testing requested below, so please do not merge in the meantime.
1580f40
to
391b2da
Compare
"""Process and validate data sources.""" | ||
cwd = os.getcwd() | ||
os.getcwd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only returns cwd
. If cwd
is unused, line also can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry right. removed of course
return LocalDataSource(source_name, rel_path, base_path=Path(".")) | ||
else: | ||
raise ValueError(f"Invalid path for local data source '{source_name}': {path}.") | ||
def process_local_sources(local_datasources, datasources, check_dir_traversal=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is also supposed to be staticmethod
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, right.
# to the mount path inside the container. | ||
# This way, we only need to adjust the base_dir to point to the mount path. | ||
absolute_paths = { | ||
source_name: os.path.realpath(params.get("path", None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realpath
would raise an exception if path
is None
. If we know params will always have path
, better to set it as params['path']
IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, fixed.
"This file should contain a JSON object with the data sources to be registered. For local " | ||
"data source, 'type' is 'fs', and 'params' must include: 'path'. For 's3' type, 'params' " | ||
"must include: 'uri', 'access_key_env_name', 'secret_key_env_name', 'secret_name', and " | ||
"optionally 'endpoint'. For azure_blob, 'type' is 'ab', and 'params' must include: " | ||
"'connection_string', 'container_name', and optionally 'folder_prefix'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion would be to point to OpenFL documentation URL that describes data sources in detail with examples, in addition to the format of JSON files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. @teoparvanov what doc file should we insert that info to?
@@ -258,7 +260,7 @@ def calchash(data_path): | |||
sys.exit(1) | |||
with open(datasources_json_path, "r", encoding="utf-8") as file: | |||
data = file.read() | |||
vds = DataSourcesJsonParser.parse(data) | |||
vds = DataSourcesJsonParser.parse(data, check_dir_traversal=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, can we check_dir_traversal
before calling this function? I don't think the class needs to know about whether it is in a valid subdirectory...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. That means we go over the json one more time to look for the local data sources. Makes sense? currently happens only in calchash
Hi @Efrat1, after addressing @MasterSkepticista's comments, I'd request that you additionally test those changes by running a containerized OpenFL federation locally. |
d8751d7
to
de73c55
Compare
Signed-off-by: Dar, Efrat <efrat.dar@intel.com>
de73c55
to
465447c
Compare
Summary
Modify
process_local_sources
to support mount pathsType of Change (Mandatory)
Description (Mandatory)
Use common
base_dir
for all local data sources, andsource_path
s relative to that base.The reason is to simplify path management in containerized environments, such as Docker. By using a common
base_dir
, we can ensure that paths remain consistent when mounting volumes, as only thebase_dir
needs to be adjusted to point to the mount path inside the container. This way, we only need to adjust thebase_dir
to point to the mount path.Testing
Run full experiment in OpenFL and in OpenFL-Security.