ExcelSynth

An enumerative synthesizer for recovering Excel formulas from CSVs.

Input: CSV File (samples/example.csv) Output: Formula Mask

_{Col 1 , Col 2 , Col 3 , Col 4 , Col 5
Row 2 , 1. , 10. , 9.5 , 24.
Row 3 , 23. , 12. , 0.5 , 35.
Row 4 , 22. , 2. , -9. , 24.
Row 5 , -1. , 6. , 6.5 , 5.
Row 6 , 59. , 0. , -29.5 , 41.
Row 7 , 11. , -2. , -7.5 , 9.
Row 8 , 115. , 14. , -43.5 , 23.}

_{, , , ,
, , , =(C2-(B2/(1.+1.)) ,
, , , =(C3-(B3/(1.+1.)) ,
, , , =(C4-(B4/(1.+1.)) ,
, , , =(C5-(B5/(1.+1.)) ,
, , , =(C6-(B6/(1.+1.)) ,
, , , =(C7-(B7/(1.+1.)) ,
, =SUM(B2:B7) , =(SUM(C2:C7)/(1.+1.)) , =(C8-(B8/(1.+1.)) , =AVERAGE(E2:E7)}

Installation

Get docker for your OS.
Pull the docker image^#: docker pull padhi/excelsynth.
Run a container over the image: docker run -it padhi/excelsynth.
This would give you a bash shell within ExcelSynth directory.
To run ExcelSynth on samples/unit_test.csv, execute: dune exec bin/App.exe -- samples/unit_test.csv
To run the unit tests, execute: dune runtest

^# Alternatively, you could also build the Docker image locally:

docker build -t padhi/excelsynth github.com/SaswatPadhi/ExcelSynth

Usage

Formula Synthesis from CSV

$ dune exec bin/App.exe -- -h
Synthesize Excel formulas for a CSV file.

  App.exe [flag] ... FILENAME

=== flags ===

  [-check-last-col-aggregations BOOLEAN]     synthesize aggregation formulas for
                                             cells in the last column
  [-check-last-row-aggregations BOOLEAN]     synthesize aggregation formulas for
                                             cells in the last row
  [-check-pointwise-col-operations BOOLEAN]  synthesize pointwise
                                             transformations for columns
  [-check-pointwise-row-operations BOOLEAN]  synthesize pointwise
                                             transformations for rows
  [-constant FLOAT] ...                      additional Boolean/numeric/string
                                             constants
  [-disable-constant-solutions BOOLEAN]      disable constant formulas (e.g.
                                             =0.0) for cells
  [-enable-2d-aggregation BOOLEAN]           use 2D ranges in aggregation
                                             operations
  [-enable-booleans BOOLEAN]                 enable Boolean and conditional
                                             expressions
  [-log-path FILENAME]                       enable logging and output to the
                                             specified path
  [-mask-path FILENAME]                      a known formula mask for the CSV
                                             file
  [-max-expr-size INTEGER]                   maximum cost (AST size) of
                                             expressions to explore
  [-max-threads INTEGER]                     maximum number of threads to create
  [-range STRING]                            a range (in RC:R'C' format) that
                                             bounds the synthesis space
  [-relative-error FLOAT]                    the fractional relative error
                                             allowed in float comparisons
  [-restrict-to-top-left-data BOOLEAN]       only use data to the top left of a
                                             cell in formulas
  [-type-error-threshold FLOAT]              maximum fraction of cells that may
                                             be ignored due to type errors
  [-value-error-threshold FLOAT]             maximum fraction of cells that may
                                             be ignored due to value errors

Bulk Processing (`scripts/evaluate.sh`)

The following input directory structure is required:

<data>
 |
 +-- table_ranges.csv              <--- Contains table ranges for CSV files
 |
 +-- evaluated_csvs                <--- Contains fully evaluated CSV files
 |    |
 |    +-- <a>.csv
 |    |
 |    `-- <b>.csv
 |
 +-- formula_csvs                  <--- Contains CSV files with formulas
      |
      +-- <a>.csv
      |
      `-- <b>.csv

The following output data is generated within this directory:

<data>
 |
 : · · ·
 |
 : · · ·
 |
 +-- extracted_masks               <--- Contents generated by scripts/extract_mask.py
 |    |
 |    +-- <a>.csv                  <--- Ground truth mask from `../formula_csvs/<a>.csv`
 |    |
 |    `-- <b>.csv
 |
 +-- recovered_masks               <--- Contents generated by scripts/recover_mask.py
 |    |
 |    +-- Baseline                 <--- Unrestricted synthesis (over whole sheet)
 |    |    |
 |    |    +-- <a>.csv             <--- Synthesized mask from `../evaluated_csvs/<a>.csv`
 |    |    |
 |    |    `-- <b>.csv
 |    |
 :    : · · ·
 |    |
 |    `-- <table_detector_n>       <--- Synthesis restricted to tables from <table_detector_n>
 |         |
 |         +-- <a>.csv             <--- Synthesized mask from `../evaluated_csvs/<a>.csv`
 |         |
 |         `-- <b>.csv
 |
 `-- comparison_masks              <--- Contents generated by scripts/compare_masks.py
      |
      +-- full                     <--- All cells within a recovered mask are checked
      |    |
      |    +-- Baseline            <--- Evaluation of unrestricted-synthesis masks
      |    |    |
      |    |    +-- <a>.csv        <--- Evaluation of `../recovered_masks/Baseline/<a>.csv`
      |    |    |
      |    |    `-- <b>.csv
      |    |
      :    : · · ·
      |    |
      |    `-- <table_detector_n>  <--- Evaluation of restricted-synthesis masks
      |         |
      |         +-- <a>.csv
      |         |
      |         `-- <b>.csv
      |
      `-- in-table                 <--- Only in-table cells are checked
           |
           +-- Baseline
           |    |
           |    +-- <a>.csv
           |    |
           |    `-- <b>.csv
           |
           : · · ·
           |
           `-- <table_detector_n>
                |
                +-- <a>.csv
                |
                `-- <b>.csv

Extract A Formula Masks from CSVs

$ python3 scripts/extract_mask.py -h
usage: extract_mask.py [-h] -i INPUT_DIR -o OUTPUT_DIR

optional arguments:
  -h, --help            show this help message and exit

  -i INPUT_DIR, --input-dir INPUT_DIR
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR

$ python3 scripts/extract_mask.py -i data/formula_csvs -o data/extracted_masks

Recover Formula Masks from CSVs

$ python3 scripts/recover_mask.py -h
usage: recover_mask.py [-h] -e EVAL_CSV_DIR -o OUTPUT_DIR -c TABLE_RANGE_COLUMN
                       tables_data_csv

positional arguments:
  tables_data_csv

optional arguments:
  -h, --help            show this help message and exit

  -e EVAL_CSV_DIR, --eval-csv-dir EVAL_CSV_DIR
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
  -c TABLE_RANGE_COLUMN, --table-range-column TABLE_RANGE_COLUMN

$ python3 scripts/recover_mask.py -e data/evaluated_csvs -o data/recovered_masks \
                                  -c 1 data/table_ranges.csv

tables_data_csv has extracted table ranges in TABLE_RANGE_COLUMN (0-indexed).

Compare Formula Masks

$ python3 scripts/compare_masks.py -h
usage: compare_masks.py [-h] -g GROUND_TRUTH_DIR -p PREDICTION_DIR -o OUTPUT_DIR

optional arguments:
  -h, --help            show this help message and exit

  -g GROUND_TRUTH_DIR, --ground-truth-dir GROUND_TRUTH_DIR
  -p PREDICTION_DIR, --prediction-dir PREDICTION_DIR
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR

$ python3 scripts/compare_mask.py -g data/extracted_masks \
                                  -p data/recovered_masks/table_detector_1 \
                                  -o data/comparison_masks/table_detector_1

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.travis		.travis
bin		bin
samples		samples
scripts		scripts
src		src
test		test
.gitattributes		.gitattributes
.gitignore		.gitignore
.travis.yml		.travis.yml
Dockerfile		Dockerfile
ExcelSynth.opam		ExcelSynth.opam
LICENSE		LICENSE
README.md		README.md
dune		dune
dune-project		dune-project
dune-workspace		dune-workspace

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ExcelSynth

Installation

Usage

Formula Synthesis from CSV

Bulk Processing (`scripts/evaluate.sh`)

Extract A Formula Masks from CSVs

Recover Formula Masks from CSVs

Compare Formula Masks

About

Releases

Packages

Languages

License

SaswatPadhi/ExcelSynth

Folders and files

Latest commit

History

Repository files navigation

ExcelSynth

Installation

Usage

Formula Synthesis from CSV

Bulk Processing (scripts/evaluate.sh)

Extract A Formula Masks from CSVs

Recover Formula Masks from CSVs

Compare Formula Masks

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Bulk Processing (`scripts/evaluate.sh`)

Packages