Merge pull request #18 from phac-nml/dev
0.1.3 Patch Release
sgsutcliffe authored Dec 20, 2024
2 parents b2eac69 + 565bf9c commit 0f13505
Showing 14 changed files with 311 additions and 122 deletions.
44 changes: 44 additions & 0 deletions .github/workflows/branch.yml
@@ -0,0 +1,44 @@
name: GAS branch protection
# This workflow is triggered on PRs to main branch on the repository
# It fails when someone tries to make a PR against the phac-nml `main` branch instead of `dev`
on:
  pull_request_target:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # PRs to the phac-nml repo main branch are only ok if coming from the phac-nml repo `dev` or any `patch` branches
      - name: Check PRs
        if: github.repository == 'phac-nml/genomic_address_service'
        run: |
          { [[ ${{github.event.pull_request.head.repo.full_name }} == phac-nml/genomic_address_service ]] && [[ $GITHUB_HEAD_REF == "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
      # If the above check failed, post a comment on the PR explaining the failure
      # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets
      - name: Post PR comment
        if: failure()
        uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2
        with:
          message: |
            ## This PR is against the `main` branch :x:
            * Do not close this PR
            * Click _Edit_ and change the `base` to `dev`
            * This CI test will remain failed until you push a new commit
            ---
            Hi @${{ github.event.pull_request.user.login }},
            It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `main` branch.
            The `main` branch on phac-nml repositories should always contain code from the latest release.
            Because of this, PRs to `main` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
            You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
            Note that even after this, the test will continue to show as failing until you push a new commit.
            Thanks again for your contribution!
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          allow-repeats: false
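
The shell condition in the `Check PRs` step is compact and easy to misread. The following is a minimal Python sketch of the same boolean logic, not part of the workflow itself; `pr_allowed`, `UPSTREAM`, and the parameter names are hypothetical and exist only for illustration. It assumes that the bash comparison against the quoted string `"patch"` is an exact match, so a head branch named `patch-1` would not pass.

```python
UPSTREAM = "phac-nml/genomic_address_service"


def pr_allowed(head_repo: str, head_ref: str) -> bool:
    """Mirror the `{ A && B; } || C` structure of the workflow's run step."""
    # A && B: the PR comes from the upstream repository's `dev` branch
    from_upstream_dev = head_repo == UPSTREAM and head_ref == "dev"
    # C: the head branch is named exactly `patch` (the repository is not checked)
    from_patch = head_ref == "patch"
    return from_upstream_dev or from_patch


if __name__ == "__main__":
    for repo, ref in [
        (UPSTREAM, "dev"),
        ("someone/genomic_address_service", "dev"),
        ("someone/genomic_address_service", "patch"),
        (UPSTREAM, "patch-1"),
    ]:
        print(f"{repo}:{ref} -> {pr_allowed(repo, ref)}")
```

Under this reading, a fork can open a PR to `main` from a branch called `patch`, because the repository name is only checked in the `dev` case.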
46 changes: 46 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,46 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python package

on:
  push:
    branches:
      - dev
  pull_request:
  release:
    types: [published]


jobs:
  test:
    name: Run pytest
    # Only run on push if this is the phac-nml dev branch (merged PRs)
    if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'phac-nml/genomic_address_service') }}"
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10","3.12"]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install flake8 pytest
          pip install .
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      - name: Test with pytest
        run: |
          pytest
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.DS_Store
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,18 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.3] - 2024-12-20

### `Fixed`

- Converted `data[sample_id]` to a string in the `format_df` function with `assign.py` to prevent `AttributeErrors` when non-string values are in the genomic address. [PR14](https://github.com/phac-nml/genomic_address_service/pull/14)
- Updated `buildNewick` formula to use cophenetic distances for branch lengths, aligning cluster visualization with BioNumerics dendrogram representation. [PR15](https://github.com/phac-nml/genomic_address_service/pull/15)

### `Added`

- Fixed pytests [PR7](https://github.com/phac-nml/genomic_address_service/pull/7)
- Added github actions for pytest and branch protection [PR7](https://github.com/phac-nml/genomic_address_service/pull/7)

## v1.0dev - [date]

Initial release of phac-nml/genomic_address_service
@@ -16,3 +28,5 @@ Changed README format to standard DAAD README, added useage arguments.
### `Dependencies`

### `Deprecated`

[0.1.3]: https://github.com/phac-nml/genomic_address_service/releases/tag/0.1.3
2 changes: 1 addition & 1 deletion genomic_address_service/classes/assign.py
@@ -85,7 +85,7 @@ def format_df(self,data,delim='.'):
         self.error_samples = []
         membership = {}
         for sample_id in data:
-            address = data[sample_id].split(delim)
+            address = str(data[sample_id]).split(delim)
             if len(address) != num_thresholds:
                 self.error_samples.append(sample_id)
                 continue
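
To illustrate why the `str()` call matters, here is a minimal standalone sketch of the loop in the hunk above. It is not the project's `format_df`; `split_addresses` and its return values are hypothetical names, and the sketch assumes that some addresses arrive as numbers (for example, a column of values like `1.1` that a table reader has parsed as floats).

```python
def split_addresses(data, delim=".", num_thresholds=2):
    """Split each address into levels; collect ids with the wrong level count."""
    membership = {}
    error_samples = []
    for sample_id in data:
        # Without str(), a float or int value would raise AttributeError,
        # because numbers have no .split() method.
        address = str(data[sample_id]).split(delim)
        if len(address) != num_thresholds:
            error_samples.append(sample_id)
            continue
        membership[sample_id] = address
    return membership, error_samples


if __name__ == "__main__":
    # r1 is a float, r2 a string, r3 an int with too few levels
    data = {"r1": 1.1, "r2": "2.1", "r3": 7}
    membership, errors = split_addresses(data)
    print(membership)
    print(errors)
```

In this sketch the float `1.1` and the string `"2.1"` are both split into two levels, while the single-level value `7` is reported as an error sample instead of raising an exception.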
113 changes: 0 additions & 113 deletions genomic_address_service/classes/matrix_splitter.py

This file was deleted.

2 changes: 1 addition & 1 deletion genomic_address_service/classes/multi_level_clustering.py
@@ -54,7 +54,7 @@ def buildNewick(self,node, newick, parentdist, leaf_names):
             return "%s:%f%s" % (leaf_names[node.id], parentdist - node.dist, newick)
         else:
             if len(newick) > 0:
-                newick = f"):{(parentdist - node.dist) / 2}{newick}"
+                newick = f"):{parentdist - node.dist}{newick}"
             else:
                 newick = ");"
             newick = self.buildNewick(node.get_left(), newick, node.dist, leaf_names)
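
The effect of dropping the `/ 2` can be seen on a three-leaf tree. The sketch below is a standalone illustration, not the project's method: `Node` is a hypothetical stand-in for a cluster-tree node with `id`, `dist`, `is_leaf()`, `get_left()` and `get_right()`, and the lines after the `get_left()` call, which are not shown in the hunk, are assumed to follow the common recursive pattern for this kind of function. Branch lengths are printed with one decimal for readability.

```python
class Node:
    """Hypothetical minimal tree node; `dist` is the height of the merge."""

    def __init__(self, id, dist=0.0, left=None, right=None):
        self.id, self.dist, self.left, self.right = id, dist, left, right

    def is_leaf(self):
        return self.left is None

    def get_left(self):
        return self.left

    def get_right(self):
        return self.right


def build_newick(node, newick, parentdist, leaf_names):
    """Serialize a tree to Newick, using parentdist - node.dist as branch length."""
    if node.is_leaf():
        return f"{leaf_names[node.id]}:{parentdist - node.dist:.1f}{newick}"
    if len(newick) > 0:
        # Internal branch: the full height difference, not half of it
        newick = f"):{parentdist - node.dist:.1f}{newick}"
    else:
        newick = ");"
    newick = build_newick(node.get_left(), newick, node.dist, leaf_names)
    # Assumed continuation: handle the right child, then open the clade
    newick = build_newick(node.get_right(), f",{newick}", node.dist, leaf_names)
    return f"({newick}"


names = ["A", "B", "C"]
inner = Node(3, dist=2.0, left=Node(1), right=Node(2))  # B and C merge at height 2
root = Node(4, dist=4.0, left=Node(0), right=inner)     # A joins at height 4
tree = build_newick(root, "", root.dist, names)
print(tree)
```

In this example every root-to-leaf path sums to the root height of 4.0 (for B: 2.0 + 2.0). With the previous formula the internal branch would have been 1.0, placing B and C at 3.0 while A stayed at 4.0.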
2 changes: 1 addition & 1 deletion genomic_address_service/version.py
@@ -1 +1 @@
-__version__ = '0.1.2'
+__version__ = '0.1.3'
15 changes: 9 additions & 6 deletions setup.py
@@ -30,7 +30,7 @@ def read(fname):
     name='genomic_address_service',
     include_package_data=True,
     version=__version__,
-    python_requires='>=3.8.2,<4',
+    python_requires='>=3.10.0,<3.13.0',
     setup_requires=['pytest-runner'],
     tests_require=['pytest'],
     packages=find_packages(exclude=['tests']),
@@ -48,13 +48,16 @@ def read(fname):
     },

     install_requires=[
-        'pyarrow==12.0.0',
-        'fastparquet==2023.4.0',
-        'numba==0.57.1',
-        'numpy==1.24.4',
-        'tables==3.8.0',
+        'pyarrow>=14.0.0',
+        'numba==0.59.1',
+        'numpy==1.26.4',
+        'tables==3.9.1',
         'six>=1.16.0',
         'pandas==2.0.2 ',
+        'pytest==8.3.3',
+        'scipy==1.14.1',
+        'psutil==6.1.0',
+        'fastparquet==2023.4.0' #Will drop support of fastparquet in future versions

     ],
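
The new `python_requires` range narrows the supported interpreters to 3.10 through 3.12, which matches the CI matrix. As a rough illustration of what the specifier `>=3.10.0,<3.13.0` admits, here is a small sketch; `in_supported_range` is a hypothetical helper that compares plain release tuples only and does not implement full version-specifier semantics (pre-releases, local versions, and so on).

```python
import sys


def in_supported_range(version, lower=(3, 10, 0), upper=(3, 13, 0)):
    """Return True if a (major, minor, micro) tuple satisfies lower <= v < upper."""
    return lower <= tuple(version[:3]) < upper


if __name__ == "__main__":
    for candidate in [(3, 8, 2), (3, 10, 0), (3, 12, 7), (3, 13, 0)]:
        print(candidate, in_supported_range(candidate))
    # Check the interpreter running this script
    print("current:", in_supported_range(sys.version_info))
```

Interpreters that were accepted under the old `>=3.8.2,<4` range, such as 3.8 and 3.9, fall outside the new one.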

Empty file added tests/__init__.py
Empty file.
43 changes: 43 additions & 0 deletions tests/test_GAS_assign.py
@@ -0,0 +1,43 @@
import pytest
import pandas as pd
from tempfile import NamedTemporaryFile
import os
from genomic_address_service.classes.assign import assign

@pytest.fixture
def mock_dist_file():
    content = """query_id\tref_id\tdist
q1\tq1\t0.0
q1\tr1\t0.1
q1\tr2\t0.2
"""
    with NamedTemporaryFile('w+', suffix='.tsv', delete=False) as tmp:
        tmp.write(content)
        tmp.flush()
        yield tmp.name
    os.unlink(tmp.name)

@pytest.fixture
def mock_membership_file():
    content = """id\taddress_levels_notsplit
r1\t1.1
r2\t2.1
"""
    with NamedTemporaryFile('w+', suffix='.tsv', delete=False) as tmp:
        tmp.write(content)
        tmp.flush()
        yield tmp.name
    os.unlink(tmp.name)

def test_initialization(mock_dist_file, mock_membership_file):
    threshold_map = {"level_0": 0.1, "level_1": 0.2}
    a = assign(dist_file=mock_dist_file, membership_file=mock_membership_file, threshold_map=threshold_map, linkage_method='single', sample_col='id', address_col='address_levels_notsplit', batch_size=100)
    assert a.status, "Initialization failed, check error_msgs for details"
    assert not a.error_msgs, f"Unexpected errors during initialization: {a.error_msgs}"
    assert isinstance(a.memberships_df, pd.DataFrame), "Memberships DataFrame"

def test_check_membership_columns(mock_dist_file, mock_membership_file):
    threshold_map = {"level_0": 0.1, "level_1": 0.2}
    a = assign(dist_file=mock_dist_file, membership_file=mock_membership_file, threshold_map=threshold_map, linkage_method='single', sample_col='id', address_col='address_levels_notsplit', batch_size=100)
    cols = ['level_0', 'level_1']
    assert a.check_membership_columns(cols), "Membership column check failed for valid columns"
33 changes: 33 additions & 0 deletions tests/test_GAS_mcluster.py
@@ -0,0 +1,33 @@
#!/usr/bin/env python
import os
import tempfile
from genomic_address_service.mcluster import write_clusters  # Adjust the import path based on your project structure

def test_write_clusters():
    # Create mock cluster data
    mock_clusters = {
        '1': ['1', '1', '1'],
        '2': ['1', '1', '2'],
        '3': ['1', '2', '3']
    }
    num_thresholds = 3
    delimiter = "."

    # Create a temporary file
    temp_file = tempfile.NamedTemporaryFile(delete=False)
    try:
        # Write mock clusters to the temporary file
        write_clusters(mock_clusters, num_thresholds, temp_file.name, delimiter)

        # Verify the contents of the file
        with open(temp_file.name, 'r') as file:
            lines = file.readlines()
            # Check the header
            assert lines[0].strip() == "id\taddress\tlevel_1\tlevel_2\tlevel_3"
            # Check the first line of data
            assert lines[1].strip() == "1\t1.1.1\t1\t1\t1"
            assert lines[2].strip() == "2\t1.1.2\t1\t1\t2"
            assert lines[3].strip() == "3\t1.2.3\t1\t2\t3"
    finally:
        # Clean up - delete the temporary file
        os.remove(temp_file.name)
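
The assertions in `test_write_clusters` imply a tab-separated layout: an `id` column, the joined `address`, and one `level_N` column per threshold. The sketch below reproduces that layout in memory so the expected lines are easy to see. It is not the project's `write_clusters`; `format_cluster_rows` is a hypothetical helper written only to match the strings asserted in the test.

```python
def format_cluster_rows(clusters, num_thresholds, delimiter="."):
    """Build header and data lines in the layout the test above expects."""
    header = ["id", "address"] + [f"level_{i}" for i in range(1, num_thresholds + 1)]
    lines = ["\t".join(header)]
    for sample_id, levels in clusters.items():
        # The address is the per-threshold cluster codes joined by the delimiter
        address = delimiter.join(levels)
        lines.append("\t".join([sample_id, address, *levels]))
    return lines


mock_clusters = {
    "1": ["1", "1", "1"],
    "2": ["1", "1", "2"],
    "3": ["1", "2", "3"],
}
rows = format_cluster_rows(mock_clusters, num_thresholds=3)
print("\n".join(rows))
```

Each data row repeats the address components as separate columns, which is why sample `3` appears as `1.2.3` followed by `1`, `2`, `3`.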