Update documentation regarding preserve_metadata #158

Open
Aariq opened this issue Feb 27, 2025 · 4 comments
Comments

@Aariq
Collaborator

Aariq commented Feb 27, 2025

Discussed initially in #141

Ever since rspatial/terra#1714, metadata is no longer stored in an aux.json file for many GDAL drivers. Let's double-check the documentation for the preserve_metadata argument and make sure it reflects this change appropriately. (may not need any change, this is just a reminder to take a look at it)

@brownag
Contributor

brownag commented Mar 16, 2025

I think with recent changes in terra it may be worth considering changing the preserve_metadata argument and associated option name to something a bit more descriptive of what is happening internally, and possibly also expanding it for use in tar_terra_vect() and tar_stars().

Essentially, the name and documentation should capture the idea of using a ZIP archive to retain all files whenever the target "write" method may create multiple files. This would include auxiliary metadata files, but is not technically limited to just metadata.

Something like:

When "drop" (default), any auxiliary files written by terra::writeRaster() are lost. When "zip", auxiliary files are retained by archiving all written files in a ZIP file upon writing and unzipping them upon reading. This adds extra overhead and will slow pipelines. Also note auxiliary files may be impacted by different versions of GDAL and different drivers. Please file an issue at https://github.com/ropensci/geotargets/issues/ with any specific concerns with this functionality. Also note that you can specify this option inside geotargets_option_set() if you want to set this for the entire pipeline.

Some drivers, e.g. ESRI Shapefile, ESRI File Geodatabase, and Zarr, always produce multi-file or directory output. Additional files in these cases are not specifically limited to "metadata" (they could be attributes, SRS, etc.), and this also applies to vector data sources, not just raster.

Relatively recent terra changes to GeoTIFF and COG behavior mean that metadata are always stored internally for those file formats, so testing of the "preserve_metadata" functionality should probably also use a driver that requires multiple files, and should test other user-set metadata from metags() in addition to units and time.

Tests I added in #123 for the gdalraster::addFilesInZip() approach account for the other ways of setting metadata for a SpatRaster and use the "HFA" driver which writes a GDAL .aux.xml file for storing unit, datetime, and user tags at dataset and band level.
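For readers following along, here is a minimal sketch of how the current argument might be exercised with a multi-file driver such as "HFA". It assumes the tar_terra_rast() arguments filetype and preserve_metadata as described above. The helper make_tagged_raster(), the target name tagged_rast, and the tag value are hypothetical and only for illustration. The option name in the commented-out geotargets_option_set() call and the "name=value" form passed to metags() are also assumptions that may vary between package versions.

```r
# _targets.R (illustrative sketch, not taken from the geotargets test suite)
library(targets)
library(geotargets)

# Hypothetical helper: build a small raster carrying unit, time, and a
# user-set tag, i.e. the kinds of metadata discussed in this thread.
make_tagged_raster <- function() {
  r <- terra::rast(nrows = 10, ncols = 10, vals = runif(100))
  terra::units(r) <- "mm"
  terra::time(r) <- as.Date("2025-01-01")
  # Assumption: tags supplied as "name=value" strings
  terra::metags(r) <- "source=simulated"
  r
}

# Assumed pipeline-wide alternative to the per-target argument:
# geotargets_option_set(terra_preserve_metadata = "zip")

list(
  tar_terra_rast(
    tagged_rast,
    make_tagged_raster(),
    filetype = "HFA",           # driver that writes a sidecar .aux.xml
    preserve_metadata = "zip"   # archive every written file, unzip on read
  )
)
```

With preserve_metadata = "drop" in place of "zip", only the main raster file would be kept as the target, so anything the driver wrote to a sidecar file would be lost on read.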

@Aariq
Collaborator Author

Aariq commented Mar 17, 2025

Do you have a proposal for the new argument name?

@brownag
Contributor

brownag commented Mar 17, 2025

> Do you have a proposal for the new argument name?

Perhaps something like multiple_file_policy?

I don't really have a better suggestion, and I think preserve_metadata certainly makes sense given the context of how it developed. But we aren't doing anything explicitly with metadata that terra doesn't do, so the name suggests more involvement from geotargets in the process than there really is. There are cases where terra will fail to preserve metadata even with preserve_metadata="zip", which sort of puts the onus on us to explain why, when it is actually an issue with terra or the GDAL driver being used.

@Aariq
Collaborator Author

Aariq commented Mar 17, 2025

Maybe just multiple_files or extra_files? extra_files = "drop" makes sense.