diff --git a/.github/workflows/publish-to-pypi.yaml b/.github/workflows/publish-to-pypi.yaml
Lines changed: 35 additions & 0 deletions
diff --git a/CITATION.cff b/CITATION.cff
Lines changed: 43 additions & 0 deletions
diff --git a/LICENSE b/LICENSE
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 13 additions & 45 deletions
diff --git a/docs/architecture.md b/docs/architecture.md
Lines changed: 0 additions & 182 deletions
@@ -0,0 +1,35 @@
+name: Publish Python 🐍 distribution 📦 to PyPI
+
+on:
+  release:
+    types: [published]
+
+jobs:
+  build-n-publish:
+    name: Build and publish Python 🐍 distribution 📦 to PyPI
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: "3.x"
+    - name: Install pypa/build
+      run: >-
+        python3 -m
+        pip install
+        build
+        --user
+    - name: Build a binary wheel and a source tarball
+      run: >-
+        python3 -m
+        build
+        --sdist
+        --wheel
+        --outdir dist/
+        .
+    - name: Publish distribution 📦 to PyPI
+      if: startsWith(github.ref, 'refs/tags')
+      uses: pypa/gh-action-pypi-publish@release/v1
+      with:
+        password: ${{ secrets.PYPI_API_TOKEN }}
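The publish step is gated by `if: startsWith(github.ref, 'refs/tags')`. For a `release` event, GitHub sets `github.ref` to `refs/tags/<tag_name>`, so the gate passes for every published release. It would only skip publishing if the workflow were later triggered from a non-tag ref. The sketch below mirrors the check in Python; `publish_step_runs` is a hypothetical helper, and it lowercases both sides because GitHub documents `startsWith` as case-insensitive.

```python
def publish_step_runs(github_ref: str) -> bool:
    """Mirror the workflow's `startsWith(github.ref, 'refs/tags')` gate.

    Hypothetical helper for illustration only. GitHub Actions documents
    startsWith as case-insensitive, so both sides are lowercased here.
    """
    return github_ref.lower().startswith("refs/tags")


if __name__ == "__main__":
    # A published release for tag v0.1.0-beta runs with ref refs/tags/v0.1.0-beta.
    for ref in ("refs/tags/v0.1.0-beta", "refs/heads/main"):
        print(ref, "->", publish_step_runs(ref))
```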
@@ -0,0 +1,43 @@
+abstract: "A Python package for efficiently aligning organismal taxonomic hierarchies using the Global Names Verifier."
+authors:
+- family-names: "Thompson"
+  given-names: "Matthew J."
+  orcid: "https://orcid.org/0000-0003-0583-8585"
+- family-names: "Campolongo"
+  given-names: "Elizabeth G."
+  orcid: "https://orcid.org/0000-0003-0846-2413"
+cff-version: 1.2.0
+date-released: "2025-05-23"
+identifiers:
+  - description: "The GitHub release URL of tag v0.1.0-beta."
+    type: url
+    value: "https://github.com/Imageomics/TaxonoPy/releases/tag/v0.1.0-beta"
+  - description: "The GitHub URL of the commit tagged with v0.1.0-beta"
+    type: url
+    value: "https://github.com/Imageomics/TaxonoPy/tree/<update-after-release>"
+keywords:
+  - imageomics
+  - taxonomy
+  - "taxonomic resolution"
+  - "tree of life"
+  - alignment
+  - hierarchy
+references:
+  - type: software
+    title: "GNverifier -- a reconciler and resolver of scientific names against more than 100 data sources."
+    version: "v1.2.2"
+    authors:
+      - family-names: "Mozzherin"
+        given-names: "Dmitry"
+        orcid: "https://orcid.org/0000-0003-1593-1417"
+    repository-code: "https://github.com/gnames/gnverifier"
+    date-released: "2024-11-04"
+    doi: 10.5281/zenodo.10070488
+    license: MIT
+license: MIT
+message: "If you use this software, please cite it using the metadata from this file."
+repository-code: "https://github.com/Imageomics/TaxonoPy"
+title: "TaxonoPy"
+version: "0.1.0-beta"
+doi: "<update-after-doi>"
+type: software
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2024 Imageomics Institute
+Copyright (c) 2025 Imageomics Institute
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 
@@ -1,19 +1,18 @@
 # TaxonoPy
 
-`TaxonoPy` (taxon-o-py) is a command-line tool for creating an internally consistent taxonomic hierarchy using the [Global Names Verifier (gnverifier)](https://github.com/gnames/gnverifier). 
+`TaxonoPy` (taxon-o-py) is a command-line tool for creating an internally consistent taxonomic hierarchy using the [Global Names Verifier (gnverifier)](https://github.com/gnames/gnverifier). See below for the structure of inputs and outputs.
 
 ## Purpose
-The motivation for this package is to create an internally consistent and standardized classification set for organisms in the TreeOfLife-200M (TOL) dataset.
+The motivation for this package is to create an internally consistent and standardized classification set for organisms in a large biodiversity dataset composed from different data providers that may use very similar and overlapping but not identical taxonomic hierarchies.
 
-This dataset contains over 200 million samples of organisms from four core data providers:
+Its development has been driven by its application in the TreeOfLife-200M (TOL) dataset. This dataset contains over 200 million samples of organisms from four core data providers:
 
-- The GLobal Biodiversity Information Facility (GBIF)
-- BIOSCAN-5M
-- FathomNet
-- The Encyclopedia of Life (EOL)
+- [The Global Biodiversity Information Facility (GBIF)](https://www.gbif.org/)
+- [BIOSCAN-5M](https://biodiversitygenomics.net/projects/5m-insects/)
+- [FathomNet](https://www.fathomnet.org/)
+- [The Encyclopedia of Life (EOL)](https://eol.org/)
 
-
-This package is a tool for creating an internally consistent classification set for a list of organisms whose entries have inconsistent naming. 
+The names (and classification) of taxa may be (and often are) inconsistent across these resources. This package addresses this problem by creating an internally consistent classification set for such taxa. 
 
 ### Input
 
@@ -42,7 +41,7 @@ Taxonomic authorities exist to standardize classification, but ...
 - A given organism may be missing from some.
 
 ### Solution
-`TaxonoPy` uses the taxonomic hierarchies provided by the TOL core data providers to query GNVerifier and create a standardized classification for each sample in the TOL dataset. It prioritizes the GBIF backbone taxonomy, since this represents the largest part of the TOL dataset. Where GBIF misses, backup sources such as the Catalogue of Life and Open Tree of Life (OTOL) taxonomy are used.
+`TaxonoPy` uses the taxonomic hierarchies provided by the TOL core data providers to query GNVerifier and create a standardized classification for each sample in the TOL dataset. It prioritizes the [GBIF Backbone Taxonomy](https://verifier.globalnames.org/data_sources/11), since this represents the largest part of the TOL dataset. Where GBIF misses, backup sources such as the [Catalogue of Life](https://verifier.globalnames.org/data_sources/1) and [Open Tree of Life (OTOL) Reference Taxonomy](https://verifier.globalnames.org/data_sources/179) are used.
 
 ## Installation
 
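The source-priority rule in the Solution paragraph (GBIF first, then backup sources) can be illustrated with a small sketch. It assumes each name has already been verified against several data sources and that results are keyed by the GNverifier data-source IDs from the linked pages (11 for the GBIF Backbone Taxonomy, 1 for the Catalogue of Life, 179 for the OTOL Reference Taxonomy). The function, the result shape, and the exact fallback order are hypothetical and are not TaxonoPy's resolution code, which may weigh additional factors.

```python
from typing import Mapping, Optional, Sequence, Tuple

# GNverifier data-source IDs from the README links; the order is an assumption.
GBIF_BACKBONE = 11
CATALOGUE_OF_LIFE = 1
OTOL_REFERENCE = 179
DEFAULT_PRIORITY: Tuple[int, ...] = (GBIF_BACKBONE, CATALOGUE_OF_LIFE, OTOL_REFERENCE)


def pick_classification(
    matches: Mapping[int, Sequence[str]],
    priority: Sequence[int] = DEFAULT_PRIORITY,
) -> Optional[Tuple[int, Sequence[str]]]:
    """Return (source_id, classification) from the highest-priority source that matched.

    `matches` maps a data-source ID to a classification path (kingdom first).
    A source with no match is either absent or mapped to an empty sequence.
    """
    for source_id in priority:
        path = matches.get(source_id)
        if path:  # missing or empty result: fall back to the next source
            return source_id, path
    return None


if __name__ == "__main__":
    # GBIF missed this name, so the Catalogue of Life result is chosen.
    hits = {1: ["Animalia", "Chordata", "Aves"], 179: ["Animalia", "Chordata", "Aves"]}
    print(pick_classification(hits))  # (1, ['Animalia', 'Chordata', 'Aves'])
```

Keeping the priority as a parameter makes it easy to reorder sources for a dataset dominated by a different provider.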
@@ -55,20 +54,6 @@ To install the latest version of `TaxonoPy`, run:
 pip install taxonopy
 ```
 
-### Development Installation with `pip`
-
-Clone the repository and install the package in development mode with an activated virtual environment:
-```console
-git clone git@github.com:Imageomics/TaxonoPy.git
-cd TaxonoPy
-```
-Set up and activate a virtual environment.
-
-Install the package in development mode:
-```console
-pip install -e .[dev]
-```
-
 ### Usage
 You may view the help for the command line interface by running:
 ```console
@@ -96,7 +81,7 @@ options:
   --show-config         Show current configuration and exit (default: False)
   --version             Show version number and exit
 ```
-#### Commands: `resolve`
+#### Command: `resolve`
 The `resolve` command is used to perform taxonomic resolution on a dataset. It takes a directory of Parquet partitions as input and outputs a directory of resolved Parquet partitions.
 ```
 usage: taxonopy resolve [-h] -i INPUT -o OUTPUT_DIR [--output-format {csv,parquet}] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--log-file LOG_FILE] [--force-input] [--batch-size BATCH_SIZE] [--all-matches]
@@ -128,7 +113,7 @@ Cache Management:
   ```
 It is recommended to keep GNVerifier settings at their defaults.
 
-#### Commands: `trace`
+#### Command: `trace`
 The `trace` command is used to trace the provenance of a taxonomic entry. It takes a UUID and an input path as arguments and outputs the full path of the entry through TaxonoPy.
 ```console
 usage: taxonopy trace [-h] {entry} ...
@@ -151,7 +136,7 @@ options:
   --verbose             Show full details including all UUIDs in group
 ```
 
-#### Commands: `common-names`
+#### Command: `common-names`
 The `common-names` command is used to merge vernacular names into the resolved output. It takes a directory of resolved Parquet partitions as input and outputs a directory of resolved Parquet partitions with common names.
 ```console
 usage: taxonopy common-names [-h] --resolved-dir ANNOTATION_DIR --output-dir OUTPUT_DIR
@@ -182,21 +167,4 @@ taxonopy common-names \
 TaxonoPy creates a cache of the objects associated with input entries for use with the `trace` command. By default, this cache is stored in the `~/.cache/taxonopy` directory.
 
 ## Development
-
-This section assumes that you have installed the package in development mode.
-
-### OpenAPI Specification Managment and Type Generation
-
-`TaxonoPy` uses GNVerifier to generate and integrates with its API from its OpenAPI specification.
-
-The script that handles this is `scripts/generate_gnverifier_types.py`, which saves `api_specs/gnverifier_openapi.json` and from this produces `src/taxonopy/types/gnverifier.py`.
-
-To check for changes in the OpenAPI specification, run:
-```console
-python scripts/generate_gnverifier_types.py
-```
-
-If the OpenAPI specification has changed, you will need to decide whether to update the generated types. 
-
-The script will save `api_specs/gnverifier_openapi.json.new` and `src/taxonopy/types/gnverifier.py.new` for you to compare with the existing files and decide whether to overwrite them and make any necessary changes to the rest of the codebase.
-
+See the [Wiki Development Page](https://github.com/Imageomics/TaxonoPy/wiki/Development) for development instructions.
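The `resolve` and `common-names` commands described in the README hunks chain naturally: the resolved partitions written by `resolve` become the `--resolved-dir` input of `common-names`. A minimal sketch of that chain with `subprocess`, using only the flags shown in the two usage lines; the helper names and directory layout are hypothetical, and `--output-format parquet` is passed explicitly because `common-names` expects a directory of resolved Parquet partitions.

```python
import subprocess
from pathlib import Path
from typing import List


def resolve_cmd(input_dir: Path, output_dir: Path, output_format: str = "parquet") -> List[str]:
    """Build a `taxonopy resolve` call using only flags from its usage line."""
    if output_format not in ("csv", "parquet"):
        raise ValueError("output_format must be 'csv' or 'parquet'")
    return ["taxonopy", "resolve", "-i", str(input_dir), "-o", str(output_dir),
            "--output-format", output_format]


def common_names_cmd(resolved_dir: Path, output_dir: Path) -> List[str]:
    """Build a `taxonopy common-names` call on a directory of resolved partitions."""
    return ["taxonopy", "common-names", "--resolved-dir", str(resolved_dir),
            "--output-dir", str(output_dir)]


def run_pipeline(input_dir: Path, work_dir: Path) -> Path:
    """Hypothetical helper: resolve, then merge common names; return the final directory."""
    resolved = work_dir / "resolved"
    named = work_dir / "resolved_with_common_names"
    for cmd in (resolve_cmd(input_dir, resolved), common_names_cmd(resolved, named)):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return named
```

Calling `run_pipeline(Path("data/input"), Path("data/out"))` requires `taxonopy` to be installed and on `PATH`.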