Merge pull request #18 from phac-nml/dev

0.1.3 Patch Release
phac-nml · Dec 20, 2024 · 0f13505
2 parents b2eac69 + 565bf9c
commit 0f13505

Showing 14 changed files with 311 additions and 122 deletions.
diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml
@@ -0,0 +1,44 @@
+name: GAS branch protection
+# This workflow is triggered on PRs to main branch on the repository
+# It fails when someone tries to make a PR against the phac-nml `main` branch instead of `dev`
+on:
+  pull_request_target:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      # PRs to the phac-nml repo main branch are only ok if coming from the phac-nml repo `dev` or any `patch` branches
+      - name: Check PRs
+        if: github.repository == 'phac-nml/genomic_address_service'
+        run: |
+          { [[ ${{github.event.pull_request.head.repo.full_name }} == phac-nml/genomic_address_service ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
+
+      # If the above check failed, post a comment on the PR explaining the failure
+      # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets
+      - name: Post PR comment
+        if: failure()
+        uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2
+        with:
+          message: |
+            ## This PR is against the `main` branch :x:
+
+            * Do not close this PR
+            * Click _Edit_ and change the `base` to `dev`
+            * This CI test will remain failed until you push a new commit
+
+            ---
+
+            Hi @${{ github.event.pull_request.user.login }},
+
+            It looks like this pull request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `main` branch.
+            The `main` branch on phac-nml repositories should always contain code from the latest release.
+            Because of this, PRs to `main` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
+
+            You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
+            Note that even after this, the test will continue to show as failing until you push a new commit.
+
+            Thanks again for your contribution!
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          allow-repeats: false
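
The shell test in the `Check PRs` step has the shape `{ A && B; } || C`. A minimal Python sketch of that predicate can make the grouping easier to read. The function name `pr_allowed` and its parameters are hypothetical and exist only for illustration; the upstream repository name and the branch names are taken from the workflow above.

```python
def pr_allowed(head_repo: str, head_ref: str,
               upstream: str = "phac-nml/genomic_address_service") -> bool:
    """Hypothetical mirror of `{ A && B; } || C` from the workflow's run step."""
    # A: the PR comes from the upstream repo; B: its branch is `dev`;
    # C: its branch is named exactly `patch` (the repo is not checked here).
    return (head_repo == upstream and head_ref == "dev") or head_ref == "patch"


print(pr_allowed("phac-nml/genomic_address_service", "dev"))      # upstream dev
print(pr_allowed("someone/fork", "dev"))                          # fork dev
print(pr_allowed("someone/fork", "patch"))                        # fork patch
print(pr_allowed("phac-nml/genomic_address_service", "patch-1"))  # near-miss name
```

Under this reading, a branch named exactly `patch` passes regardless of which repository it comes from, while a name such as `patch-1` does not, because the comparison is an exact string match.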
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,46 @@
+# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
+# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
+
+name: Python package
+
+on:
+  push:
+    branches:
+      - dev
+  pull_request:
+  release:
+    types: [published]
+
+
+jobs:
+  test:
+    name: Run pytest
+    # Only run on push if this is the phac-nml dev branch (merged PRs)
+    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'phac-nml/genomic_address_service') }}"
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10","3.12"]
+
+    steps:
+    - uses: actions/checkout@v4
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v3
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install flake8 pytest
+        pip install .
+        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
+    - name: Lint with flake8
+      run: |
+        # stop the build if there are Python syntax errors or undefined names
+        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
+        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+    - name: Test with pytest
+      run: |
+        pytest
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+.DS_Store
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,18 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.3] - 2024-12-20
+
+### `Fixed`
+
+- Converted `data[sample_id]` to a string in the `format_df` function in `assign.py` to prevent an `AttributeError` when non-string values are in the genomic address. [PR14](https://github.com/phac-nml/genomic_address_service/pull/14)
+- Updated `buildNewick` formula to use cophenetic distances for branch lengths, aligning cluster visualization with BioNumerics dendrogram representation. [PR15](https://github.com/phac-nml/genomic_address_service/pull/15)
+
+### `Added`
+
+- Fixed pytests [PR7](https://github.com/phac-nml/genomic_address_service/pull/7)
+- Added github actions for pytest and branch protection [PR7](https://github.com/phac-nml/genomic_address_service/pull/7)
+
 ## v1.0dev - [date]
 
 Initial release of phac-nml/genomic_address_service
@@ -16,3 +28,5 @@ Changed README format to standard DAAD README, added useage arguments.
 ### `Dependencies`
 
 ### `Deprecated`
+
+[0.1.3]: https://github.com/phac-nml/genomic_address_service/releases/tag/0.1.3
diff --git a/genomic_address_service/classes/assign.py b/genomic_address_service/classes/assign.py
@@ -85,7 +85,7 @@ def format_df(self,data,delim='.'):
         self.error_samples = []
         membership = {}
         for sample_id in data:
-            address = data[sample_id].split(delim)
+            address = str(data[sample_id]).split(delim)
             if len(address) != num_thresholds:
                 self.error_samples.append(sample_id)
                 continue
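
The effect of the `str(...)` wrapper can be reproduced in isolation. The sketch below assumes that an address such as `1.1` may arrive as a float rather than a string (for example after a table is parsed with inferred types); `split_address` and the sample `data` dictionary are hypothetical and are not part of `assign.py`.

```python
def split_address(value, delim="."):
    """Hypothetical helper mirroring the patched line in format_df."""
    return str(value).split(delim)


# "r1" holds a float, "r2" holds a string; both should yield two levels.
data = {"r1": 1.1, "r2": "2.1"}

# Without the conversion, a float has no .split method.
try:
    data["r1"].split(".")
    raised = False
except AttributeError:
    raised = True

addresses = {sample_id: split_address(v) for sample_id, v in data.items()}
print(raised)
print(addresses)
```

One caveat with this approach: `str()` on a float drops trailing zeros (`1.10` becomes `"1.1"`), so reading the address column as text upstream may still be the safer option where that matters.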

diff --git a/genomic_address_service/classes/matrix_splitter.py b/genomic_address_service/classes/matrix_splitter.py
diff --git a/genomic_address_service/classes/multi_level_clustering.py b/genomic_address_service/classes/multi_level_clustering.py
@@ -54,7 +54,7 @@ def buildNewick(self,node, newick, parentdist, leaf_names):
             return "%s:%f%s" % (leaf_names[node.id], parentdist - node.dist, newick)
         else:
             if len(newick) > 0:
-                newick = f"):{(parentdist - node.dist) / 2}{newick}"
+                newick = f"):{parentdist - node.dist}{newick}"
             else:
                 newick = ");"
             newick = self.buildNewick(node.get_left(), newick, node.dist, leaf_names)
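
One observable effect of dropping the `/ 2` on internal branches is that every leaf ends up at the same total distance from the root. The sketch below checks this on a three-leaf tree. `Node` is a hypothetical stand-in for the tree node that `buildNewick` walks (only `id`, `dist` and the two children are modelled), and `depths` is an illustrative helper rather than code from the package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a hierarchical-clustering tree node."""
    id: int
    dist: float = 0.0          # merge height; 0 for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def depths(node, parentdist, halve_internal, acc=0.0, out=None):
    """Collect root-to-leaf path lengths under either branch-length rule."""
    if out is None:
        out = {}
    if node.is_leaf():
        # Leaf branch length is parentdist - node.dist in both versions.
        out[node.id] = round(acc + parentdist - node.dist, 6)
        return out
    branch = parentdist - node.dist
    if halve_internal:          # behaviour of the removed line
        branch /= 2
    for child in (node.left, node.right):
        depths(child, node.dist, halve_internal, acc + branch, out)
    return out


# Leaves 0 and 1 merge at height 0.1; that cluster merges with leaf 2 at 0.3.
tree = Node(4, 0.3, Node(3, 0.1, Node(0), Node(1)), Node(2))
old = depths(tree, tree.dist, halve_internal=True)
new = depths(tree, tree.dist, halve_internal=False)
print(old)
print(new)
```

With the halved internal branch, leaves 0 and 1 sit closer to the root than leaf 2; with the patched formula all three leaves sit at the root's merge height.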

diff --git a/genomic_address_service/version.py b/genomic_address_service/version.py
@@ -1 +1 @@
-__version__ = '0.1.2'
+__version__ = '0.1.3'
diff --git a/setup.py b/setup.py
@@ -30,7 +30,7 @@ def read(fname):
     name='genomic_address_service',
     include_package_data=True,
     version=__version__,
-    python_requires='>=3.8.2,<4',
+    python_requires='>=3.10.0,<3.13.0',
     setup_requires=['pytest-runner'],
     tests_require=['pytest'],
     packages=find_packages(exclude=['tests']),
@@ -48,13 +48,16 @@ def read(fname):
     },
 
     install_requires=[
-        'pyarrow==12.0.0',
-        'fastparquet==2023.4.0',
-        'numba==0.57.1',
-        'numpy==1.24.4',
-        'tables==3.8.0',
+        'pyarrow>=14.0.0',
+        'numba==0.59.1',
+        'numpy==1.26.4',
+        'tables==3.9.1',
         'six>=1.16.0',
         'pandas==2.0.2 ',
+        'pytest==8.3.3',
+        'scipy==1.14.1',
+        'psutil==6.1.0',
+        'fastparquet==2023.4.0' #Will drop support of fastparquet in future versions
 
     ],
 

diff --git a/tests/__init__.py b/tests/__init__.py
diff --git a/tests/test_GAS_assign.py b/tests/test_GAS_assign.py
@@ -0,0 +1,43 @@
+import pytest
+import pandas as pd
+from tempfile import NamedTemporaryFile
+import os
+from genomic_address_service.classes.assign import assign
+
+@pytest.fixture
+def mock_dist_file():
+    content = """query_id\tref_id\tdist
+q1\tq1\t0.0
+q1\tr1\t0.1
+q1\tr2\t0.2
+"""
+    with NamedTemporaryFile('w+', suffix='.tsv', delete=False) as tmp:
+        tmp.write(content)
+        tmp.flush()
+        yield tmp.name
+        os.unlink(tmp.name)
+
+@pytest.fixture
+def mock_membership_file():
+    content = """id\taddress_levels_notsplit
+r1\t1.1
+r2\t2.1
+"""
+    with NamedTemporaryFile('w+', suffix='.tsv', delete=False) as tmp:
+        tmp.write(content)
+        tmp.flush()
+        yield tmp.name
+        os.unlink(tmp.name)
+
+def test_initialization(mock_dist_file, mock_membership_file):
+    threshold_map = {"level_0": 0.1, "level_1": 0.2}
+    a = assign(dist_file=mock_dist_file, membership_file=mock_membership_file, threshold_map=threshold_map, linkage_method='single', sample_col='id', address_col='address_levels_notsplit', batch_size=100)
+    assert a.status, "Initialization failed, check error_msgs for details"
+    assert not a.error_msgs, f"Unexpected errors during initialization: {a.error_msgs}"
+    assert isinstance(a.memberships_df, pd.DataFrame), "memberships_df should be a pandas DataFrame"
+
+def test_check_membership_columns(mock_dist_file, mock_membership_file):
+    threshold_map = {"level_0": 0.1, "level_1": 0.2}
+    a = assign(dist_file=mock_dist_file, membership_file=mock_membership_file, threshold_map=threshold_map, linkage_method='single', sample_col='id', address_col='address_levels_notsplit', batch_size=100)
+    cols = ['level_0', 'level_1']
+    assert a.check_membership_columns(cols), "Membership column check failed for valid columns"
diff --git a/tests/test_GAS_mcluster.py b/tests/test_GAS_mcluster.py
@@ -0,0 +1,33 @@
+#!/usr/bin/env python
+import os
+import tempfile
+from genomic_address_service.mcluster import write_clusters  # Adjust the import path based on your project structure
+
+def test_write_clusters():
+    # Create mock cluster data
+    mock_clusters = {
+        '1': ['1', '1', '1'],
+        '2': ['1', '1', '2'],
+        '3': ['1', '2', '3']
+    }
+    num_thresholds = 3
+    delimiter = "."
+
+    # Create a temporary file
+    temp_file = tempfile.NamedTemporaryFile(delete=False)
+    try:
+        # Write mock clusters to the temporary file
+        write_clusters(mock_clusters, num_thresholds, temp_file.name, delimiter)
+
+        # Verify the contents of the file
+        with open(temp_file.name, 'r') as file:
+            lines = file.readlines()
+            # Check the header
+            assert lines[0].strip() == "id\taddress\tlevel_1\tlevel_2\tlevel_3"
+            # Check each line of data
+            assert lines[1].strip() == "1\t1.1.1\t1\t1\t1"
+            assert lines[2].strip() == "2\t1.1.2\t1\t1\t2"
+            assert lines[3].strip() == "3\t1.2.3\t1\t2\t3"
+    finally:
+        # Clean up - delete the temporary file
+        os.remove(temp_file.name)