Description
Submitting Author: Stefan Krawczyk (@skrawcz)
Package Name: Hamilton (sf-hamilton on PyPI)
One-Line Description of Package: A general purpose micro-framework for defining dataflows.
Repository Link (if existing): https://github.com/stitchfix/hamilton
Description
Hamilton is a general-purpose micro-framework for creating dataflows from Python functions. Specifically, Hamilton defines a novel paradigm that lets you specify a flow of (delayed) execution that forms a Directed Acyclic Graph (DAG). It was originally built to solve the challenges of wrangling and maintaining production code that creates wide (1000+ column) dataframes, but has since been extended to model the generation of any Python object. Core to Hamilton's design is a clear mapping of function name to dataflow output. That is, Hamilton forces a declarative paradigm expressed through writing Python functions, and aims for DAG clarity, low code upkeep costs, and ease of modification, with code that is always unit testable and naturally documentable.
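To make the mapping of function name to dataflow output concrete, here is a minimal plain-Python sketch of the idea. It deliberately does not use Hamilton's API: the `resolve` helper and the two example functions are hypothetical and exist only for illustration, assuming that each function name denotes an output and each parameter name denotes either a user-supplied input or another function's output.

```python
import inspect


# Each function name is the name of an output. Each parameter name is the
# name of an input, or of another function whose result is required.
def spend_per_signup(spend: float, signups: int) -> float:
    """Marketing spend per signup."""
    return spend / signups


def spend_per_signup_doubled(spend_per_signup: float) -> float:
    """Depends on the output of `spend_per_signup` above."""
    return spend_per_signup * 2


def resolve(name, functions, inputs, cache=None):
    """Hypothetical helper (not Hamilton's API): compute `name` by
    recursively computing the dependencies named in its signature."""
    cache = {} if cache is None else cache
    if name in inputs:
        return inputs[name]
    if name not in cache:
        fn = functions[name]
        kwargs = {
            param: resolve(param, functions, inputs, cache)
            for param in inspect.signature(fn).parameters
        }
        cache[name] = fn(**kwargs)
    return cache[name]


functions = {f.__name__: f for f in (spend_per_signup, spend_per_signup_doubled)}
result = resolve(
    "spend_per_signup_doubled", functions, {"spend": 100.0, "signups": 20}
)
print(result)  # 10.0
```

Nothing runs until an output is requested, which is the "delayed execution" aspect, and each function can be unit tested on its own by calling it directly with plain values.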
Scope
- Please indicate which category or categories this package falls under:
- Data retrieval
- Data extraction
- Data munging
- Data deposition
- Data visualization
- Reproducibility
- Geospatial
- Education
- Unsure/Other (explain below)
- Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
data munging
Hamilton was built for a team to manage their time-series forecasting feature engineering, so its design goal was to help data science teams maintain data munging code well.
reproducibility
Core to reproducibility is sharing code. Most researchers share only their data, not their code. We believe that with Hamilton, researchers could more easily share their implementation, in a standardized way that is approachable to a broad audience.
data extraction
Somewhat unsure here, but Hamilton helps you structure and "orchestrate" the code that does extraction.
data retrieval
Somewhat unsure here, but Hamilton helps you structure and "orchestrate" the code that does retrieval.
- Who is the target audience and what are the scientific applications of this package?
Anyone doing data transformations in Python.
Scientific applications: time-series forecasting, any machine learning, any work that involves executing a dataflow.
- Are there other Python packages that accomplish similar things? If so, how does yours differ?
None that the author is aware of.
- Any other questions or issues we should be aware of:
N/A
P.S. Have feedback/comments about our review process? Leave a comment here