st_write significantly slower when writing logical (boolean) field types #1689

Closed
@wkmor1

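I have noticed that writing an sf object to a file is much slower when it contains logical columns than when the same values are stored as integer, double, or character. In the benchmark below, the median time for the logical case is roughly five to six times that of the other types for CSV (about 225 ms vs. 38 to 41 ms) and about 2.5 times for GPKG (about 337 ms vs. 134 to 138 ms). The following reprex illustrates the issue: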

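# Build an sf point layer with n rows and one attribute column of x's type, then write it.
f <- function(x, n, fmt = ".csv") {
  x <- rep(x, n)                      # attribute column: n copies of x
  x <- data.frame(0, 0, x)            # two dummy coordinate columns plus the attribute
  x <- sf::st_as_sf(x, coords = 1:2)  # columns 1 and 2 become the point geometry
  sf::st_write(x, tempfile(fileext = fmt), quiet = TRUE)  # driver is guessed from the extension
}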

microbenchmark::microbenchmark(
  f(FALSE, 10000),
  f(0L, 10000),
  f(0, 10000),
  f("FALSE", 10000),
  f(FALSE, 10000, ".gpkg"),
  f(0L, 10000, ".gpkg"),
  f(0, 10000, ".gpkg"),
  f("FALSE", 10000, ".gpkg"),
  times = 10
)
#> Unit: milliseconds
#>                        expr       min        lq      mean    median        uq
#>             f(FALSE, 10000) 189.83119 204.43109 307.18853 225.08582 256.84633
#>                f(0L, 10000)  35.12176  35.90689  43.35830  38.03695  42.48434
#>                 f(0, 10000)  37.55803  39.72591  47.20795  41.14088  43.24004
#>           f("FALSE", 10000)  34.27908  35.90282  38.39523  37.67921  40.45558
#>    f(FALSE, 10000, ".gpkg") 284.02960 300.72006 350.00660 337.46137 368.82369
#>       f(0L, 10000, ".gpkg") 126.78843 133.17075 154.76982 137.61813 161.53709
#>        f(0, 10000, ".gpkg") 129.27981 129.50055 140.01243 133.58174 140.45955
#>  f("FALSE", 10000, ".gpkg") 125.95240 130.73475 146.92906 135.61447 157.52841
#>         max neval
#>  1025.06738    10
#>    67.04742    10
#>   104.12921    10
#>    44.11557    10
#>   486.99444    10
#>   244.68378    10
#>   173.41739    10
#>   201.76884    10

Created on 2021-06-08 by the reprex package (v2.0.0)
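
Since the integer columns in the benchmark above write at the same speed as double and character columns, one possible interim workaround is to cast logical columns to integer before calling `st_write`. The sketch below is only an illustration under that assumption: `logical_to_integer` is a hypothetical helper, not part of sf, and it changes the field type in the output file (0/1 integers instead of booleans), which may not be acceptable for every format or downstream reader.

```r
# Hypothetical helper (not part of sf): cast every logical column to integer.
# Uses only `names()`, `[[` and `[[<-`, so it should work on a plain
# data.frame and, by assumption, on an sf object as well; the geometry
# column is not logical and is left untouched.
logical_to_integer <- function(x) {
  is_lgl <- vapply(x, is.logical, logical(1))
  for (nm in names(x)[is_lgl]) {
    x[[nm]] <- as.integer(x[[nm]])  # TRUE -> 1L, FALSE -> 0L, NA stays NA
  }
  x
}

# Demonstration on a plain data.frame, so the example runs without sf.
df <- data.frame(flag = c(TRUE, FALSE, NA), value = c(1.5, 2, 3))
converted <- logical_to_integer(df)
str(converted)

# With an sf object the intended use would be something like:
# sf::st_write(logical_to_integer(x), tempfile(fileext = ".gpkg"), quiet = TRUE)
```

Whether this actually recovers the timings of `f(0L, 10000)` would need to be checked by adding the converted case to the benchmark.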

Session info
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.6           pillar_1.6.1         compiler_4.1.0      
#>  [4] highr_0.9            R.methodsS3_1.8.1    R.utils_2.10.1      
#>  [7] class_7.3-19         tools_4.1.0          digest_0.6.27       
#> [10] evaluate_0.14        lifecycle_1.0.0      tibble_3.1.2        
#> [13] R.cache_0.15.0       pkgconfig_2.0.3      rlang_0.4.11        
#> [16] reprex_2.0.0         DBI_1.1.1            microbenchmark_1.4-7
#> [19] yaml_2.2.1           xfun_0.23            e1071_1.7-7         
#> [22] dplyr_1.0.6          withr_2.4.2          styler_1.4.1        
#> [25] stringr_1.4.0        knitr_1.33           generics_0.1.0      
#> [28] fs_1.5.0             vctrs_0.3.8          tidyselect_1.1.1    
#> [31] grid_4.1.0           classInt_0.4-3       glue_1.4.2          
#> [34] R6_2.5.0             sf_0.9-8             fansi_0.5.0         
#> [37] rmarkdown_2.8        purrr_0.3.4          magrittr_2.0.1      
#> [40] units_0.7-1          backports_1.2.1      ellipsis_0.3.2      
#> [43] htmltools_0.5.1.1    assertthat_0.2.1     utf8_1.2.1          
#> [46] KernSmooth_2.23-20   stringi_1.6.2        proxy_0.4-25        
#> [49] crayon_1.4.1         R.oo_1.24.0
