implementing parquet filetype? #36
So far, the following works for terra SpatVector objects via the GDAL (Geo)Parquet driver:

library(targets)
tar_script({
list(
geotargets::tar_terra_vect(test_terra_parquet,
terra::vect(system.file("ex", "lux.shp", package = "terra")),
filetype = "Parquet")
)
})
tar_make()
#> Loading required namespace: terra
#> ▶ dispatched target test_terra_parquet
#> ● completed target test_terra_parquet [0.012 seconds]
#> ▶ ended pipeline [0.095 seconds]
x <- tar_read(test_terra_parquet)
x
#> class : SpatVector
#> geometry : polygons
#> dimensions : 12, 6 (geometries, attributes)
#> extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
#> source : test_terra_parquet
#> coord. ref. : lon/lat WGS 84 (EPSG:4326)
#> names : ID_1 NAME_1 ID_2 NAME_2 AREA POP
#> type : <num> <chr> <num> <chr> <num> <int>
#> values : 1 Diekirch 1 Clervaux 312 18081
#> 1 Diekirch 2 Diekirch 218 32543
#> 1 Diekirch 3 Redange 259 18664
terra::describe(tar_path_target(test_terra_parquet))
#> [1] "Driver: Parquet/(Geo)Parquet"
#> [2] "Files: _targets/objects/test_terra_parquet"
#> [3] "Size is 512, 512"
#> [4] "Corner Coordinates:"
#> [5] "Upper Left ( 0.0, 0.0)"
#> [6] "Lower Left ( 0.0, 512.0)"
#> [7] "Upper Right ( 512.0, 0.0)"
#> [8] "Lower Right ( 512.0, 512.0)"
#> [9] "Center ( 256.0, 256.0)"

We still need to implement analogous methods for {sf} objects via #13. We may also want to implement a variant that uses the write methods from {arrow} (re: #2), as this may be more efficient for larger targets. It would be interesting to benchmark GDAL vs. Arrow.

I think benchmarking is definitely part of the plan once things are somewhat stable. It would be good to give users an idea of the tradeoffs in speed, size, and dependency requirements.
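A rough sketch of what such a benchmark could look like, timing one write per backend and recording the file size. This is only an illustration under assumptions: `time_write()` is a hypothetical helper, the sf GDAL route stands in for the terra one shown above, and {sfarrow} stands in for whatever {arrow}-based writer is eventually chosen. Each route is skipped if its dependency is missing.

```r
# Hypothetical helper: time a single write and return elapsed seconds
# plus the size of the written file in bytes.
time_write <- function(write_fun) {
  path <- tempfile(fileext = ".parquet")
  on.exit(unlink(path))
  elapsed <- system.time(write_fun(path))[["elapsed"]]
  c(seconds = elapsed, bytes = file.size(path))
}

if (requireNamespace("sf", quietly = TRUE)) {
  nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
  writers <- list()

  # GDAL route: only available when GDAL was built with the Parquet driver.
  if ("Parquet" %in% sf::st_drivers()$name) {
    writers$gdal <- function(path) {
      sf::st_write(nc, path, driver = "Parquet", quiet = TRUE)
    }
  }

  # Arrow route (assumption: sfarrow as the arrow-based writer).
  if (requireNamespace("sfarrow", quietly = TRUE)) {
    writers$arrow <- function(path) sfarrow::st_write_parquet(nc, path)
  }

  # One row per backend that is available on this machine.
  results <- do.call(rbind, lapply(writers, time_write))
  print(results)
}
```

A single timing on a 100-feature dataset is noisy, so a real comparison would repeat each write several times on larger inputs and also time the reads.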
Confirming that parquet doesn't work "out of the box" for an sf object in a plain tar_target():

library(targets)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
tar_dir({
tar_script({
library(targets)
library(sf)
library(arrow)
list(
tar_target(nc, st_read(system.file("shape/nc.shp", package="sf")), format = "parquet")
)
})
tar_make()
tar_read(nc)
})
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
#>
#> Attaching package: ‘arrow’
#>
#> The following object is masked from ‘package:utils’:
#>
#> timestamp
#>
#> ▶ dispatched target nc
#> Reading layer `nc' from data source
#> `/Users/ericscott/Library/R/x86_64/4.4/library/sf/shape/nc.shp'
#> using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> Geodetic CRS: NAD27
#> ✖ errored target nc
#> ✖ errored pipeline [0.226 seconds]
#>
#>Error:
#> ! targets::tar_make() error
#>
#>── Debug target nc ──────────────────────────────────────────────────────────────────────────────────────────────────────
#>tar_meta(nc)$error
#>tar_workspace(nc)
#>
#>── General debugging ────────────────────────────────────────────────────────
#>• tar_errored()
#>• tar_meta(fields = any_of("error"), complete_only = TRUE)
#>• tar_workspace()
#>• tar_workspaces()
#>
#>── How to ────────────────────────────────────────────────────────
#>• Debug: https://books.ropensci.org/targets/debugging.html
#>• Help: https://books.ropensci.org/targets/help.html
#>
#>── Last error message ──────────────────────────────────────────────────────
#>_store_ Can't infer Arrow data type from object inheriting from XY / MULTIPOLYGON / sfg
#>
#>── Last error traceback ────────────────────────────────────────────────────────
#> No traceback available.

Created on 2024-10-03 with reprex v2.1.1
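The error above comes from {arrow} not knowing how to store the sf geometry column. One possible workaround, sketched here as an assumption rather than as what geotargets does, is a custom targets format that delegates reading and writing to GDAL through {sf}. The name `format_sf_parquet` is hypothetical, and the write step only works when GDAL was built with the Parquet driver (check with `"Parquet" %in% sf::st_drivers()$name`).

```r
# Hypothetical custom format: store sf objects as (Geo)Parquet via GDAL.
# Both functions must be self-contained, so packages are referenced with `::`.
format_sf_parquet <- targets::tar_format(
  read = function(path) {
    # The stored file has no extension; the terra::describe() output earlier
    # in this thread suggests GDAL still identifies it as Parquet.
    sf::st_read(path, quiet = TRUE)
  },
  write = function(object, path) {
    # Name the driver explicitly because `path` has no file extension.
    sf::st_write(object, path, driver = "Parquet", quiet = TRUE)
  }
)
```

In a pipeline this would replace the built-in format, e.g. `tar_target(nc, st_read(system.file("shape/nc.shp", package = "sf")), format = format_sf_parquet)`.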
Would it be different from the
I can investigate, but I think all of the
As mentioned in #4, e.g.