Hamilton #74

Submitting Author: Stefan Krawczyk (@skrawcz)  
Package Name: Hamilton (sf-hamilton on PyPI)
One-Line Description of Package: A general-purpose micro-framework for defining dataflows.
Repository Link (if existing):   https://github.com/stitchfix/hamilton

---

## Description

Hamilton is a general-purpose micro-framework for creating dataflows from Python functions. Specifically, Hamilton defines a novel paradigm that lets you specify a flow of (delayed) execution that forms a Directed Acyclic Graph (DAG). It was originally built to solve the challenges of wrangling and maintaining production code that creates wide (1000+ column) dataframes, but it has since been extended to model the generation of any Python object. Core to Hamilton's design is a clear mapping from function name to dataflow output: a function's name is the output it produces, and its parameter names declare the outputs or inputs it depends on. In this way Hamilton enforces a declarative paradigm expressed through plain Python functions, and it aims for a clear DAG, low code-upkeep costs, and ease of modification, with code that is always unit testable and naturally documentable.
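To make the function-name-to-output mapping concrete, here is a toy, self-contained resolver written only for this description. It is not Hamilton's implementation or API: `execute`, `spend_per_signup`, and the other names are hypothetical. It shows the core idea: a function's name defines a node, its parameter names define that node's dependencies, and only the nodes needed for the requested outputs are computed.

```python
import inspect
from typing import Any, Callable, Dict, FrozenSet, List


# Hypothetical dataflow functions (illustrative names only). Each function's
# name is the output it produces; each parameter name is an upstream output
# or an external input.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups


def spend_per_signup_pct(spend_per_signup: float, target_cost: float) -> float:
    return 100.0 * spend_per_signup / target_cost


def execute(
    functions: Dict[str, Callable[..., Any]],
    outputs: List[str],
    inputs: Dict[str, Any],
) -> Dict[str, Any]:
    """Toy resolver: compute only what the requested outputs need."""
    cache: Dict[str, Any] = dict(inputs)  # external inputs are leaf nodes

    def resolve(name: str, path: FrozenSet[str]) -> Any:
        if name in cache:
            return cache[name]
        if name in path:
            raise ValueError(f"cycle detected at {name!r}")
        if name not in functions:
            raise KeyError(f"no function or input named {name!r}")
        fn = functions[name]
        # Parameter names are the DAG's edges: resolve each dependency first.
        kwargs = {
            param: resolve(param, path | {name})
            for param in inspect.signature(fn).parameters
        }
        cache[name] = fn(**kwargs)  # each node is computed at most once
        return cache[name]

    return {name: resolve(name, frozenset()) for name in outputs}


nodes = {fn.__name__: fn for fn in (spend_per_signup, spend_per_signup_pct)}
result = execute(
    nodes,
    outputs=["spend_per_signup_pct"],
    inputs={"spend": 200.0, "signups": 10, "target_cost": 40.0},
)
print(result)  # {'spend_per_signup_pct': 50.0}
```

The demand-driven traversal mirrors the (delayed) execution described above: nothing runs until an output is requested, and unrequested nodes are never computed.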


## Scope 

- Please indicate which [category or categories][PackageCategories] this package falls under:
	- [x] Data retrieval
	- [x] Data extraction
	- [x] Data munging
	- [ ] Data deposition
	- [ ] Data visualization
	- [ ] Reproducibility
	- [ ] Geospatial
	- [ ] Education
	- [ ] Unsure/Other (explain below)
        
- Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

*data munging*
Hamilton was originally built for a team managing the feature engineering behind their time-series forecasting, so its design goal was to help data science teams maintain data munging code well.
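As a hedged illustration of this feature-engineering use case, the sketch below writes a few hypothetical time-series features in the Hamilton style, where each function name is the feature it produces and each parameter name is a raw input or another feature. The feature names are assumptions for illustration. Hamilton's own examples typically operate on pandas Series; plain lists are used here only to keep the sketch dependency-free.

```python
from typing import List, Optional


# Hypothetical features for a weekly sales forecast (illustrative names only).
def sales_lag_1(sales: List[float]) -> List[Optional[float]]:
    """Previous week's sales; the first week has no history."""
    return [None] + sales[:-1] if sales else []


def sales_rolling_mean_3(sales: List[float]) -> List[Optional[float]]:
    """Trailing 3-week mean; undefined until three weeks of history exist."""
    return [
        None if i < 2 else sum(sales[i - 2 : i + 1]) / 3
        for i in range(len(sales))
    ]


def sales_minus_rolling_mean(
    sales: List[float], sales_rolling_mean_3: List[Optional[float]]
) -> List[Optional[float]]:
    """Deviation of each week's sales from its trailing 3-week mean."""
    return [
        None if mean is None else value - mean
        for value, mean in zip(sales, sales_rolling_mean_3)
    ]


# Each feature is a plain function, so it can be unit tested in isolation
# without building the rest of the dataflow.
def test_sales_lag_1() -> None:
    assert sales_lag_1([1.0, 2.0, 3.0]) == [None, 1.0, 2.0]
    assert sales_lag_1([]) == []


def test_sales_rolling_mean_3() -> None:
    assert sales_rolling_mean_3([3.0, 6.0, 9.0, 12.0]) == [None, None, 6.0, 9.0]


def test_sales_minus_rolling_mean() -> None:
    means = [None, None, 6.0, 9.0]
    assert sales_minus_rolling_mean([3.0, 6.0, 9.0, 12.0], means) == [None, None, 3.0, 3.0]


if __name__ == "__main__":
    test_sales_lag_1()
    test_sales_rolling_mean_3()
    test_sales_minus_rolling_mean()
    print("all feature tests passed")
```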

*reproducibility*
Sharing code is core to reproducibility, yet many researchers share their data but not their code. We believe Hamilton makes it easier to share an implementation in a standardized way that is approachable to a broad audience.

*data extraction*
We are less certain this category applies, but Hamilton helps you structure and "orchestrate" the code that performs extraction.

*data retrieval*
We are less certain this category applies, but Hamilton helps you structure and "orchestrate" the code that performs retrieval.
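To illustrate what "orchestrating" retrieval and extraction code can look like, here is a minimal, hypothetical sketch in the same function-per-output style. Retrieval, extraction, and cleaning are separate functions, so a test can bypass the file read by supplying `raw_text` directly. The function names, the `data_path` input, and the CSV layout are assumptions for illustration only; Hamilton does not prescribe them.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List


def raw_text(data_path: str) -> str:
    """Retrieval: read the raw file. Kept separate so it can be swapped or mocked."""
    return Path(data_path).read_text(encoding="utf-8")


def records(raw_text: str) -> List[Dict[str, str]]:
    """Extraction: parse CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def temperatures_c(records: List[Dict[str, str]]) -> List[float]:
    """Munging: keep rows that have a temperature reading and convert to float."""
    return [float(row["temp_c"]) for row in records if row["temp_c"]]


if __name__ == "__main__":
    # Supply raw_text directly, as a unit test would, instead of reading a file.
    sample = "site,temp_c\nA,21.5\nB,\nC,19.0\n"
    print(temperatures_c(records(sample)))  # [21.5, 19.0]
```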


- Who is the target audience and what are the scientific applications of this package?  

Anyone doing data transformations in Python.

Scientific applications: time-series forecasting, machine learning in general, and any work that involves executing a dataflow.

- Are there other Python packages that accomplish similar things? If so, how does yours differ?

None that the author is aware of.

- Any other questions or issues we should be aware of:
N/A


**P.S.** *Have feedback/comments about our review process? Leave a comment [here][Comments]*


[PackageCategories]: https://www.pyopensci.org/contributing-guide/open-source-software-peer-review/aims-and-scope.html?highlight=data#package-categories

[Comments]: https://github.com/pyOpenSci/governance/issues/8

