This is the companion repository to the following Medium post: "Doing cool data science in Java: how 3 DataFrame libraries stack up".
The data was extracted from Eurostat at the beginning of September 2018. I opened the extracted CSV in LibreOffice and saved it again, because the Eurostat output contained some invalid UTF-8 characters that some CSV importers couldn't handle directly.
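For anyone who would rather clean the file in code than round-trip it through LibreOffice, here is a minimal sketch of one alternative. It is not what was done for this repository: the class name `Utf8Sanitizer` and the command-line arguments are hypothetical, and it assumes that replacing each malformed byte sequence with U+FFFD is acceptable for your use of the data.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of this repository.
public class Utf8Sanitizer {

    /** Decodes bytes as UTF-8, substituting U+FFFD for every malformed sequence. */
    public static String decodeLeniently(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Should not happen because both error actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Utf8Sanitizer <input.csv> <output.csv>");
            return;
        }
        // Read the raw bytes, decode leniently, and write clean UTF-8 back out.
        String cleaned = decodeLeniently(Files.readAllBytes(Path.of(args[0])));
        Files.writeString(Path.of(args[1]), cleaned, StandardCharsets.UTF_8);
    }
}
```

Note that the replacement character ends up in the data, so affected city names would still need a manual look.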
Library | Maintained | Version | Time (ms) |
---|---|---|---|
DuckDb | Y | 1.3.0 | 93 |
DFLib | Y | 1.3.0 | 226 |
Kotlin DataFrame | Y | 1.0-beta2 | 816 |
Tablesaw | Y | 0.44.1 | 820 |
Joinery | N | 1.9 | 1,478 |
Krangl | N | 0.18.4 | 1,796 |
Morpheus | N | 0.9.23 | * |
- Morpheus is no longer maintained and doesn't seem to work on later Java versions (the error is related to accessing `sun.util.calendar.ZoneInfo`).
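If you want to collect comparable timings yourself, a rough approach is to warm up the JVM with a few untimed runs and then time a single run with `System.nanoTime()`. The sketch below only illustrates that idea; the class `SimpleTimer`, its method, and the warm-up count are assumptions, not the harness used to produce the numbers in the table.

```java
import java.util.function.Supplier;

// Hypothetical timing helper, not part of this repository.
public class SimpleTimer {

    /**
     * Runs the task `warmups` times without timing it, then times one more run.
     * Returns the elapsed wall-clock time of that last run in milliseconds.
     */
    public static <T> long timeMillis(Supplier<T> task, int warmups) {
        for (int i = 0; i < warmups; i++) {
            task.get(); // untimed warm-up so JIT compilation skews the result less
        }
        long start = System.nanoTime();
        task.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with a call into one of the library tests.
        long ms = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        }, 3);
        System.out.println("elapsed ms: " + ms);
    }
}
```

For anything beyond a quick comparison, a dedicated benchmarking harness would give more trustworthy numbers than a single timed run.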
The code for the three libraries is in the `Test{libraryname}.java` files. They all use `CheckResult.java` to do a basic correctness check for the top-growing cities.
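To give an idea of what such a check can look like, here is a minimal sketch that compares the first few city names in a result against an expected ranking. It is not the actual content of `CheckResult.java`: the class `TopCitiesCheck`, the method `topMatches`, and the city names are placeholders made up for illustration.

```java
import java.util.List;

// Hypothetical sketch, not the repository's CheckResult.java.
public class TopCitiesCheck {

    /** Returns true if `actual` starts with exactly the entries of `expected`, in order. */
    public static boolean topMatches(List<String> actual, List<String> expected) {
        if (actual.size() < expected.size()) {
            return false;
        }
        return actual.subList(0, expected.size()).equals(expected);
    }

    public static void main(String[] args) {
        // Placeholder names, not real results from the Eurostat data.
        List<String> expected = List.of("CityA", "CityB", "CityC");
        List<String> actual = List.of("CityA", "CityB", "CityC", "CityD");

        if (!topMatches(actual, expected)) {
            throw new AssertionError("Top-growing cities differ: " + actual);
        }
        System.out.println("Top-growing cities match");
    }
}
```

Keeping the check independent of any DataFrame type means each library test only has to hand over a plain list of names.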
As described in the Medium post, I couldn't find a good way to do the pivot step in DataVec, but I included the code I wrote up to that point.