Commit d1c1124

thompsonmj, egrace479, and hlapp authored
Prep for initial PyPI release (#7)
* Apply default ruff linting
* Remove outdated design docs
* Add PyPI workflow; update license year
* Change version format to comply with PyPI's requirement for PEP 440 compliance
* README edits in prep for release
* Move development instructions for setup and GNVerifier OpenAPI specs to wiki
* Add badge info and project links
* Add keywords
* Add link-outs to data sources
* Add citation

---------

Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>
1 parent b17e8a7 commit d1c1124

48 files changed: +238 -588 lines changed

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+name: Publish Python 🐍 distribution 📦 to PyPI
+
+on:
+  release:
+    types: [published]
+
+jobs:
+  build-n-publish:
+    name: Build and publish Python 🐍 distribution 📦 to PyPI
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: "3.x"
+    - name: Install pypa/build
+      run: >-
+        python3 -m
+        pip install
+        build
+        --user
+    - name: Build a binary wheel and a source tarball
+      run: >-
+        python3 -m
+        build
+        --sdist
+        --wheel
+        --outdir dist/
+        .
+    - name: Publish distribution 📦 to PyPI
+      if: startsWith(github.ref, 'refs/tags')
+      uses: pypa/gh-action-pypi-publish@release/v1
+      with:
+        password: ${{ secrets.PYPI_API_TOKEN }}
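
Reviewer note: two details of the workflow above are easy to misread. The `>-` folded scalars join their lines into a single command, and the publish step only runs for tag refs. The Python sketch below imitates both behaviours for illustration only. The helper names `fold_block_scalar` and `should_publish` are hypothetical and not part of this commit, and the folding is a simplification that assumes plain lines with no blank or more-indented lines.

```python
def fold_block_scalar(lines):
    """Approximate YAML '>-' folding for simple lines.

    Folding turns each single line break into a space, and the '-'
    chomping indicator drops the trailing newline.
    """
    return " ".join(line.strip() for line in lines)


def should_publish(ref, prefix="refs/tags"):
    """Imitate `if: startsWith(github.ref, 'refs/tags')`.

    GitHub's startsWith ignores case, so both sides are lower-cased here.
    """
    return ref.lower().startswith(prefix.lower())


# The build step from the workflow, as the runner would see it after folding.
build_command = fold_block_scalar(
    ["python3 -m", "build", "--sdist", "--wheel", "--outdir dist/", "."]
)
print(build_command)

# A release tag passes the guard; a branch push does not.
print(should_publish("refs/tags/v0.1.0-beta"))
print(should_publish("refs/heads/main"))
```

Because the workflow triggers on `release: published`, the ref is normally a tag already, so the `if:` guard acts as a second safety check rather than the main trigger.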

CITATION.cff

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+abstract: "A Python package for efficiently aligning organismal taxonomic hierarchies using the Global Names Verifier."
+authors:
+  - family-names: "Thompson"
+    given-names: "Matthew J."
+    orcid: "https://orcid.org/0000-0003-0583-8585"
+  - family-names: "Campolongo"
+    given-names: "Elizabeth G."
+    orcid: "https://orcid.org/0000-0003-0846-2413"
+cff-version: 1.2.0
+date-released: "2025-05-23"
+identifiers:
+  - description: "The GitHub release URL of tag v0.1.0-beta."
+    type: url
+    value: "https://github.com/Imageomics/TaxonoPy/releases/tag/v0.1.0-beta"
+  - description: "The GitHub URL of the commit tagged with v0.1.0-beta"
+    type: url
+    value: "https://github.com/Imageomics/TaxonoPy/tree/<update-after-release>"
+keywords:
+  - imageomics
+  - taxonomy
+  - "taxonomic resolution"
+  - "tree of life"
+  - alignment
+  - hierarchy
+references:
+  - type: software
+    title: "GNverifier -- a reconciler and resolver of scientific names against more than 100 data sources."
+    version: "v1.2.2"
+    authors:
+      - family-names: "Mozzherin"
+        given-names: "Dmitry"
+        orcid: "https://orcid.org/0000-0003-1593-1417"
+    repository-code: "https://github.com/gnames/gnverifier"
+    date-released: "2024-11-04"
+    doi: 10.5281/zenodo.10070488
+    license: MIT
+license: MIT
+message: "If you use this software, please cite it using the metadata from this file."
+repository-code: "https://github.com/Imageomics/TaxonoPy"
+title: "TaxonoPy"
+version: "0.1.0-beta"
+doi: "<update-after-doi>"
+type: software
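
Reviewer note: the commit message mentions changing the version format for PEP 440 compliance, while the citation metadata above keeps the tag-style string `0.1.0-beta`. This page does not show what the package version was changed to, so the sketch below only illustrates the general distinction. It uses the canonical-form regular expression published in PEP 440 (Appendix B) to check whether a string is already in canonical form. The spelling `0.1.0b0` is an assumption about what a canonical beta version would look like, not a value taken from this commit.

```python
import re

# Canonical public version pattern from PEP 440, Appendix B.
CANONICAL_PEP440 = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical(version):
    """Return True if `version` is already in canonical PEP 440 form."""
    return CANONICAL_PEP440.match(version) is not None


# Tag-style string as used in CITATION.cff, versus an assumed canonical spelling.
for candidate in ("0.1.0-beta", "0.1.0b0", "0.1.0"):
    print(candidate, is_canonical(candidate))
```

A non-canonical string is not necessarily rejected: PEP 440 also defines normalization rules for alternative spellings, so this check is stricter than what an upload may require.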

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2024 Imageomics Institute
+Copyright (c) 2025 Imageomics Institute

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 13 additions & 45 deletions
@@ -1,19 +1,18 @@
 # TaxonoPy

-`TaxonoPy` (taxon-o-py) is a command-line tool for creating an internally consistent taxonomic hierarchy using the [Global Names Verifier (gnverifier)](https://github.com/gnames/gnverifier).
+`TaxonoPy` (taxon-o-py) is a command-line tool for creating an internally consistent taxonomic hierarchy using the [Global Names Verifier (gnverifier)](https://github.com/gnames/gnverifier). See below for the structure of inputs and outputs.

 ## Purpose
-The motivation for this package is to create an internally consistent and standardized classification set for organisms in the TreeOfLife-200M (TOL) dataset.
+The motivation for this package is to create an internally consistent and standardized classification set for organisms in a large biodiversity dataset composed from different data providers that may use very similar and overlapping but not identical taxonomic hierarchies.

-This dataset contains over 200 million samples of organisms from four core data providers:
+Its development has been driven by its application in the TreeOfLife-200M (TOL) dataset. This dataset contains over 200 million samples of organisms from four core data providers:

-- The GLobal Biodiversity Information Facility (GBIF)
-- BIOSCAN-5M
-- FathomNet
-- The Encyclopedia of Life (EOL)
+- [The GLobal Biodiversity Information Facility (GBIF)](https://www.gbif.org/)
+- [BIOSCAN-5M](https://biodiversitygenomics.net/projects/5m-insects/)
+- [FathomNet](https://www.fathomnet.org/)
+- [The Encyclopedia of Life (EOL)](https://eol.org/)

-
-This package is a tool for creating an internally consistent classification set for a list of organisms whose entries have inconsistent naming.
+The names (and classification) of taxa may be (and often are) inconsistent across these resources. This package addresses this problem by creating an internally consistent classification set for such taxa.

 ### Input

@@ -42,7 +41,7 @@ Taxonomic authorities exist to standardize classification, but ...
 - A given organism may be missing from some.

 ### Solution
-`TaxonoPy` uses the taxonomic hierarchies provided by the TOL core data providers to query GNVerifier and create a standardized classification for each sample in the TOL dataset. It prioritizes the GBIF backbone taxonomy, since this represents the largest part of the TOL dataset. Where GBIF misses, backup sources such as the Catalogue of Life and Open Tree of Life (OTOL) taxonomy are used.
+`TaxonoPy` uses the taxonomic hierarchies provided by the TOL core data providers to query GNVerifier and create a standardized classification for each sample in the TOL dataset. It prioritizes the [GBIF Backbone Taxonomy](https://verifier.globalnames.org/data_sources/11), since this represents the largest part of the TOL dataset. Where GBIF misses, backup sources such as the [Catalogue of Life](https://verifier.globalnames.org/data_sources/1) and [Open Tree of Life (OTOL) Reference Taxonomy](https://verifier.globalnames.org/data_sources/179) are used.

 ## Installation

@@ -55,20 +54,6 @@ To install the latest version of `TaxonoPy`, run:
 pip install taxonopy
 ```

-### Development Installation with `pip`
-
-Clone the repository and install the package in development mode with an activated virtual environment:
-```console
-git clone git@github.com:Imageomics/TaxonoPy.git
-cd TaxonoPy
-```
-Set up and activate a virtual environment.
-
-Install the package in development mode:
-```console
-pip install -e .[dev]
-```
-
 ### Usage
 You may view the help for the command line interface by running:
 ```console
@@ -96,7 +81,7 @@ options:
 --show-config Show current configuration and exit (default: False)
 --version Show version number and exit
 ```
-#### Commands: `resolve`
+#### Command: `resolve`
 The `resolve` command is used to perform taxonomic resolution on a dataset. It takes a directory of Parquet partitions as input and outputs a directory of resolved Parquet partitions.
 ```
 usage: taxonopy resolve [-h] -i INPUT -o OUTPUT_DIR [--output-format {csv,parquet}] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--log-file LOG_FILE] [--force-input] [--batch-size BATCH_SIZE] [--all-matches]
@@ -128,7 +113,7 @@ Cache Management:
 ```
 It is recommended to keep GNVerifier settings at their defaults.

-#### Commands: `trace`
+#### Command: `trace`
 The `trace` command is used to trace the provenance of a taxonomic entry. It takes a UUID and an input path as arguments and outputs the full path of the entry through TaxonoPy.
 ```console
 usage: taxonopy trace [-h] {entry} ...
@@ -151,7 +136,7 @@ options:
 --verbose Show full details including all UUIDs in group
 ```

-#### Commands: `common-names`
+#### Command: `common-names`
 The `common-names` command is used to merge vernacular names into the resolved output. It takes a directory of resolved Parquet partitions as input and outputs a directory of resolved Parquet partitions with common names.
 ```console
 usage: taxonopy common-names [-h] --resolved-dir ANNOTATION_DIR --output-dir OUTPUT_DIR
@@ -182,21 +167,4 @@ taxonopy common-names \
 TaxonoPy creates a cache of the objects associated with input entries for use with the `trace` command. By default, this cache is stored in the `~/.cache/taxonopy` directory.

 ## Development
-
-This section assumes that you have installed the package in development mode.
-
-### OpenAPI Specification Managment and Type Generation
-
-`TaxonoPy` uses GNVerifier to generate and integrates with its API from its OpenAPI specification.
-
-The script that handles this is `scripts/generate_gnverifier_types.py`, which saves `api_specs/gnverifier_openapi.json` and from this produces `src/taxonopy/types/gnverifier.py`.
-
-To check for changes in the OpenAPI specification, run:
-```console
-python scripts/generate_gnverifier_types.py
-```
-
-If the OpenAPI specification has changed, you will need to decide whether to update the generated types.
-
-The script will save `api_specs/gnverifier_openapi.json.new` and `src/taxonopy/types/gnverifier.py.new` for you to compare with the existing files and decide whether to overwrite them and make any necessary changes to the rest of the codebase.
-
+See the [Wiki Development Page](https://github.com/Imageomics/TaxonoPy/wiki/Development) for development instructions.
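
Reviewer note: for readers who want to drive the documented CLI from a script, the sketch below assembles an argument list for `taxonopy resolve` using only flags that appear in the usage line shown in the README diff (`-i`, `-o`, `--output-format`, `--batch-size`). The helper `build_resolve_command` and the example directory names are hypothetical. The default output format chosen here is an assumption made for the sketch, not a documented default of the tool.

```python
from pathlib import Path


def build_resolve_command(input_dir, output_dir, output_format="parquet", batch_size=None):
    """Build an argv list for `taxonopy resolve` (hypothetical helper)."""
    # The usage line lists exactly these two choices for --output-format.
    if output_format not in ("csv", "parquet"):
        raise ValueError(f"unsupported output format: {output_format}")
    command = [
        "taxonopy", "resolve",
        "-i", str(input_dir),
        "-o", str(output_dir),
        "--output-format", output_format,
    ]
    if batch_size is not None:
        command += ["--batch-size", str(batch_size)]
    return command


# Example directory names are placeholders, not paths from the repository.
command = build_resolve_command("data/input", "data/resolved", batch_size=1000)
print(" ".join(command))

# Default cache location stated in the README, expanded for the current user.
default_cache_dir = Path.home() / ".cache" / "taxonopy"
print(default_cache_dir)
```

With `taxonopy` installed, the resulting list could be passed to `subprocess.run(command, check=True)`. Building the list separately keeps the flags easy to inspect before anything is executed.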

docs/architecture.md

Lines changed: 0 additions & 182 deletions
This file was deleted.
