DataLad command to convert ODS/XLSX files into "archival" single-sheet collection of TSVs #14

mih · 2023-06-29T09:27:43Z

This is the format with which we want/need to keep things long-term, and the format a metadata extractor would eat.

At first glance openpyxl looks like a sufficient and lightweight solution. Pandas can also do it, but it is much heavier.

A more detailed analysis was done by @mslw in #8 already.

mih · 2023-06-30T14:38:43Z

https://github.com/pyexcel/pyexcel is an alternative (wrapper) that may be useful for supporting more than xlsx (e.g., ods files).

jsheunis · 2023-07-03T09:48:41Z

Would this just be a utility that runs on a (possibly multi-sheet) input spreadsheet (xlsx or ods format) and then outputs a collection of TSV files, i.e. before any form of validation? Or would it make sense to also incorporate validation, or at least some structured interpretation, into this process? I'm thinking that this command might need to have some understanding of the intended tabby structure for the conversion process, rather than just doing a dumb transformation.

mih · 2023-07-03T11:33:31Z

ATM I am thinking about it as a dumb converter. However, in my brain the optimal point for performing validation is not yet clear.
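
For illustration only, a "dumb" converter along these lines could be sketched as follows. This is a minimal sketch, not the implementation discussed in this thread: it assumes openpyxl is installed, and the function names `write_sheet_tsv` and `xlsx_to_tsvs` as well as the `<workbook>_<sheet>.tsv` naming scheme are hypothetical. No validation or tabby-specific interpretation is performed.

```python
import csv
from pathlib import Path


def write_sheet_tsv(rows, tsv_path):
    """Write an iterable of row tuples to a TSV file (hypothetical helper).

    Empty cells (None) are written as empty fields.
    """
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(["" if v is None else v for v in row])


def xlsx_to_tsvs(xlsx_path, outdir):
    """Dump every worksheet of an XLSX workbook into its own TSV file."""
    # Imported lazily so that the TSV helper above works without openpyxl.
    from openpyxl import load_workbook

    xlsx_path, outdir = Path(xlsx_path), Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    # data_only=True yields cached cell values instead of formulas.
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    written = []
    for ws in wb.worksheets:
        # Assumed naming scheme: <workbook stem>_<sheet title>.tsv
        tsv_path = outdir / f"{xlsx_path.stem}_{ws.title}.tsv"
        write_sheet_tsv(ws.iter_rows(values_only=True), tsv_path)
        written.append(tsv_path)
    wb.close()
    return written
```

Calling `xlsx_to_tsvs("record.xlsx", "out")` would then write one TSV file per sheet into `out/`; where validation should happen is left open, as noted above.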

mih · 2023-07-03T12:40:45Z

pyexcel is not a good option. A change in a dependency broke basic functionality in Feb 2023 and no fix has been released yet, although an applicable fix appears to have been known since a few days after the initial report. pyexcel/pyexcel-xlsx#52

We had better stick to openpyxl (see #8).

mih · 2023-07-03T13:21:42Z

I implemented the XLSX -> TSV part.

We would need to think more about how (and if) we would support the representation of custom contexts and frames when going from tabby (back) to XLSX.

mih · 2023-07-18T09:11:50Z

With #50 settled, we know all the pieces. A record in XLSX format would still carry all the other files (context, overrides, etc). Conversion to TSV brings it into an archival format, with no changes necessary to the non-TSV parts.

The only remaining TODO here is exposing this functionality via the CLI.

mih self-assigned this Jul 3, 2023

mih mentioned this issue Jul 4, 2023

Minimal converters to and from TSV and XLSX #34

Merged

mih removed their assignment Jul 4, 2023

mih mentioned this issue Jul 20, 2023

Support config classes for record sheets #86

Closed
