Commit 43eddc8

Bump version: 0.28.2 → 0.29.0
2 parents fd3a198 + a03279a commit 43eddc8

40 files changed: +974 -280 lines changed

.github/workflows/ci.yml

+1 -1
@@ -22,7 +22,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python: [3.7, 3.8, 3.9 ]
+        python: [3.8, 3.9 ]
         os: [ubuntu-20.04]
     name: Test on Python ${{ matrix.python }}
     steps:

.github/workflows/release.yml

+1 -1
@@ -14,7 +14,7 @@ jobs:
       - name: Setup python
         uses: actions/setup-python@v1
         with:
-          python-version: '3.7'
+          python-version: '3.8'
           architecture: x64
       - name: Install dependencies
         run: pip install -r dev-requirements.txt

.gitignore

+6
@@ -109,3 +109,9 @@ venv.bak/
 .DS_Store
 .vscode/
 .Rhistory
+
+# PyCharm
+/.idea/
+
+# Temp files
+/scratch/

Makefile

+2 -1
@@ -2,7 +2,8 @@

 test:
 	rm -f .coverage
-	nosetests --verbose --with-coverage --cover-package kb_python tests/* tests/dry/*
+	pytest --verbose --cov=kb_python tests/* tests/dry/* && coverage report && coverage xml
+	# nosetests --verbose --with-coverage --cover-package kb_python tests/* tests/dry/*

 check:
 	flake8 kb_python && echo OK
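
Note on the runner switch: pytest collects `unittest.TestCase` subclasses as-is, so tests written for nose in that style should not need rewriting. The sketch below is only an illustration; `TestVersionString` and `run_suite` are hypothetical names and are not taken from the `kb_python` test suite.

```python
import unittest


class TestVersionString(unittest.TestCase):
    """Hypothetical test case, not part of the kb_python test suite."""

    def test_version_has_three_parts(self):
        version = '0.29.0'  # version string introduced by this commit
        self.assertEqual(len(version.split('.')), 3)


def run_suite() -> unittest.TestResult:
    """Run the case with the stdlib runner; pytest would collect the same class."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestVersionString)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Running `pytest` on a file containing this class should discover `test_version_has_three_parts` without any nose-specific configuration.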

README.md

+4 -3
@@ -1,5 +1,5 @@
 # kb-python
-![github version](https://img.shields.io/badge/Version-0.28.0-informational)
+![github version](https://img.shields.io/badge/Version-0.29.0-informational)
 [![pypi version](https://img.shields.io/pypi/v/kb-python)](https://pypi.org/project/kb-python/0.28.0/)
 ![python versions](https://img.shields.io/pypi/pyversions/kb_python)
 ![status](https://github.com/pachterlab/kb_python/workflows/CI/badge.svg)
@@ -10,7 +10,7 @@

 `kb-python` is a python package for processing single-cell RNA-sequencing. It wraps the [`kallisto` | `bustools`](https://www.kallistobus.tools) single-cell RNA-seq command line tools in order to unify multiple processing workflows.

-`kb-python` was developed by [Kyung Hoi (Joseph) Min](https://twitter.com/lioscro) and [A. Sina Booeshaghi](https://twitter.com/sinabooeshaghi) while in [Lior Pachter](https://twitter.com/lpachter)'s lab at Caltech. If you use `kb-python` in a publication please [cite*](#cite):
+`kb-python` was first developed by [Kyung Hoi (Joseph) Min](https://twitter.com/lioscro) and [A. Sina Booeshaghi](https://twitter.com/sinabooeshaghi) while in [Lior Pachter](https://twitter.com/lpachter)'s lab at Caltech. If you use `kb-python` in a publication please [cite*](#cite):
 ```
 Melsted, P., Booeshaghi, A.S., et al.
 Modular, efficient and constant-memory single-cell RNA-seq preprocessing.
@@ -34,7 +34,7 @@ There are no prerequisite packages to install. The `kallisto` and `bustools` bin

 ## Usage

-`kb` consists of four subcommands
+`kb` consists of five subcommands
 ```bash
 $ kb
 usage: kb [-h] [--list] <CMD> ...
@@ -44,6 +44,7 @@ positional arguments:
 compile Compile `kallisto` and `bustools` binaries from source
 ref Build a kallisto index and transcript-to-gene mapping
 count Generate count matrices from a set of single-cell FASTQ files
+extract Extract reads that were pseudoaligned to specific genes/transcripts (or extract all reads that were / were not pseudoaligned)
 ```

 ### `kb ref`: generate a pseudoalignment index

dev-requirements.txt

+3 -2
@@ -1,7 +1,8 @@
 bumpversion==0.6.0
-coverage==5.1
+coverage==5.2.1
 flake8==3.8.2
-nose==1.3.7
+pytest==8.2.2
+pytest-cov==5.0.0
 pre-commit==2.4.0
 sphinx>=3.3.1
 sphinx-autoapi>=1.5.1

docs/conf.py

+1 -1
@@ -24,7 +24,7 @@
 author = 'Kyung Hoi (Joseph) Min'

 # The full version, including alpha/beta/rc tags
-release = '0.28.2'
+release = '0.29.0'
 master_doc = 'index'

 # -- General configuration ---------------------------------------------------

docs/index.rst

+3 -3
@@ -6,7 +6,7 @@
 Welcome to kb-python's documentation!
 =====================================

-This page contains **DEVELOPER** documentation for ``kb-python`` version ``0.28.2``.
+This page contains **DEVELOPER** documentation for ``kb-python`` version ``0.29.0``.
 For user documentation and tutorials, please go to `kallisto | bustools <https://www.kallistobus.tools/>`_.

 Development Prerequisites
@@ -18,7 +18,7 @@ necessary packages by running::
     pip install -r requirements.txt
     pip install -r dev-requirements.txt

-Code qualty and unit tests are strictly enforced for every pull request via
+Code quality and unit tests are strictly enforced for every pull request via
 Github actions.

 Code Quality
@@ -33,7 +33,7 @@ at the root of the repository.

 Unit-testing
 """"""""""""
-``kb-python`` uses ``nose`` to run unit tests. There is a convenient Makefile
+``kb-python`` uses ``pytest`` to run unit tests. There is a convenient Makefile
 rule in place to run all tests.::

     make test

kb_python/__init__.py

+1 -1
@@ -1 +1 @@
-__version__ = '0.28.2'
+__version__ = '0.29.0'

20 binary files changed (contents not shown).

kb_python/config.py

+17 -3
@@ -34,7 +34,14 @@ def get_provided_kallisto_path() -> Optional[str]:
     Returns:
         Path to the binary, `None` if not found
     """
-    bin_filename = 'kallisto.exe' if PLATFORM == 'windows' else 'kallisto'
+    bin_name = 'kallisto'
+    if '_KALLISTO_OPTOFF' in globals():
+        if _KALLISTO_OPTOFF:
+            bin_name = f'{bin_name}_optoff'
+    if '_KALLISTO_KMER_64' in globals():
+        if _KALLISTO_KMER_64:
+            bin_name = f'{bin_name}_k64'
+    bin_filename = f'{bin_name}.exe' if PLATFORM == 'windows' else bin_name
     path = os.path.join(BINS_DIR, PLATFORM, CPU, 'kallisto', bin_filename)
     if not os.path.isfile(path):
         return None
@@ -54,11 +61,18 @@ def get_provided_bustools_path() -> Optional[str]:
     return path


+def set_special_kallisto_binary(k64: bool, optoff: bool):
+    global _KALLISTO_KMER_64
+    global _KALLISTO_OPTOFF
+    _KALLISTO_KMER_64 = k64
+    _KALLISTO_OPTOFF = optoff
+
+
 def get_compiled_kallisto_path(alias: str = COMPILED_DIR) -> Optional[str]:
     """Finds platform-dependent kallisto binary compiled with `compile`.

     Args:
-        Alias: Alias of compiled binary.
+        alias: Alias of compiled binary.

     Returns:
         Path to the binary, `None` if not found
@@ -74,7 +88,7 @@ def get_compiled_bustools_path(alias: str = COMPILED_DIR) -> Optional[str]:
     """Finds platform-dependent bustools binary compiled with `compile`.

     Args:
-        Alias: Alias of compiled binary.
+        alias: Alias of compiled binary.

     Returns:
         Path to the binary, `None` if not found
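
Note on the binary naming in `get_provided_kallisto_path`: the suffixes are appended in a fixed order (`_optoff` first, then `_k64`), and `.exe` is added last on Windows. The sketch below restates that rule as a pure function so the resulting names are easy to check. `kallisto_binary_filename` is a hypothetical helper that takes the flags as arguments instead of reading module globals; it is not part of `kb_python/config.py`.

```python
def kallisto_binary_filename(
    platform: str, k64: bool = False, optoff: bool = False
) -> str:
    """Hypothetical helper mirroring the filename logic in the diff above."""
    name = 'kallisto'
    if optoff:
        name = f'{name}_optoff'  # optoff suffix is applied first
    if k64:
        name = f'{name}_k64'  # k64 suffix is applied second
    # Windows binaries carry an .exe extension
    return f'{name}.exe' if platform == 'windows' else name
```

For example, `kallisto_binary_filename('windows', k64=True, optoff=True)` gives `kallisto_optoff_k64.exe`, so a bundled binary with both variants enabled would need exactly that name to be found.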

kb_python/count.py

+83 -8
@@ -105,6 +105,11 @@ def kallisto_bus(
     demultiplexed: bool = False,
     batch_barcodes: bool = False,
     numreads: int = None,
+    lr: bool = False,
+    lr_thresh: float = 0.8,
+    lr_error_rate: float = None,
+    union: bool = False,
+    no_jump: bool = False,
 ) -> Dict[str, str]:
     """Runs `kallisto bus`.

@@ -133,6 +138,11 @@ def kallisto_bus(
         demultiplexed: Whether FASTQs are demultiplexed, defaults to `False`
         batch_barcodes: Whether sample ID should be in barcode, defaults to `False`
         numreads: Maximum number of reads to process from supplied input
+        lr: Whether to use lr-kallisto in read mapping, defaults to `False`
+        lr_thresh: Sets the --threshold for lr-kallisto, defaults to `0.8`
+        lr_error_rate: Sets the --error-rate for lr-kallisto, defaults to `None`
+        union: Use set union for pseudoalignment, defaults to `False`
+        no_jump: Disable pseudoalignment "jumping", defaults to `False`

     Returns:
         Dictionary containing paths to generated files
@@ -194,6 +204,16 @@ def kallisto_bus(
         command += ['--rf-stranded']
     if inleaved:
         command += ['--inleaved']
+    if lr:
+        command += ['--long']
+    if lr and lr_thresh:
+        command += ['-r', str(lr_thresh)]
+    if lr and lr_error_rate:
+        command += ['-e', str(lr_error_rate)]
+    if union:
+        command += ['--union']
+    if no_jump:
+        command += ['--no-jump']
     if batch_barcodes:
         command += ['--batch-barcodes']
     if is_batch:
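
Note on the new `kallisto_bus` flags: `-r` and `-e` are only emitted when `lr` is set, and both are guarded by truthiness, so a value of `0` or `None` is silently skipped. The sketch below isolates just that flag assembly; `long_read_bus_flags` is a hypothetical helper written for illustration, and the real function appends to a larger `command` list.

```python
from typing import List, Optional


def long_read_bus_flags(
    lr: bool = False,
    lr_thresh: Optional[float] = 0.8,
    lr_error_rate: Optional[float] = None,
    union: bool = False,
    no_jump: bool = False,
) -> List[str]:
    """Hypothetical helper reproducing the flag logic added in this hunk."""
    flags: List[str] = []
    if lr:
        flags += ['--long']
    if lr and lr_thresh:  # skipped when lr_thresh is 0 or None
        flags += ['-r', str(lr_thresh)]
    if lr and lr_error_rate:  # skipped when lr_error_rate is 0 or None
        flags += ['-e', str(lr_error_rate)]
    if union:
        flags += ['--union']
    if no_jump:
        flags += ['--no-jump']
    return flags
```

With the defaults and `lr=True`, the result is `['--long', '-r', '0.8']`; passing `lr_thresh` without `lr` adds nothing.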
@@ -224,12 +244,14 @@ def kallisto_quant_tcc(
     matrix_to_files: bool = False,
     matrix_to_directories: bool = False,
     no_fragment: bool = False,
+    lr: bool = False,
+    lr_platform: str = 'ONT',
 ) -> Dict[str, str]:
     """Runs `kallisto quant-tcc`.

     Args:
         mtx_path: Path to counts matrix
-        saved_index_path: Path to index.saved
+        saved_index_path: Path to index
         ecmap_path: Path to ecmap
         t2g_path: Path to T2G
         out_dir: Output directory path
@@ -241,6 +263,8 @@ def kallisto_quant_tcc(
         matrix_to_files: Whether to write quant-tcc output to files, defaults to `False`
         matrix_to_directories: Whether to write quant-tcc output to directories, defaults to `False`
         no_fragment: Whether to disable quant-tcc effective length normalization, defaults to `False`
+        lr: Whether to use lr-kallisto in quantification, defaults to `False`
+        lr_platform: Sets the --platform for lr-kallisto, defaults to `ONT`

     Returns:
         Dictionary containing path to output files
@@ -255,6 +279,10 @@ def kallisto_quant_tcc(
     command += ['-e', ecmap_path]
     command += ['-g', t2g_path]
     command += ['-t', threads]
+    if lr:
+        command += ['--long']
+    if lr and lr_platform:
+        command += ['-P', lr_platform]
     if flens_path and not no_fragment:
         command += ['-f', flens_path]
     if l and not no_fragment:
@@ -1178,6 +1206,14 @@ def count(
     no_fragment: bool = False,
     numreads: int = None,
     store_num: bool = False,
+    lr: bool = False,
+    lr_thresh: float = 0.8,
+    lr_error_rate: float = None,
+    lr_platform: str = 'ONT',
+    union: bool = False,
+    no_jump: bool = False,
+    quant_umis: bool = False,
+    keep_flags: bool = False,
 ) -> Dict[str, Union[str, Dict[str, str]]]:
     """Generates count matrices for single-cell RNA seq.

@@ -1242,6 +1278,14 @@ def count(
         no_fragment: Whether to disable quant-tcc effective length normalization, defaults to `False`
         numreads: Maximum number of reads to process from supplied input
         store_num: Whether to store read numbers in BUS file, defaults to `False`
+        lr: Whether to use lr-kallisto in read mapping, defaults to `False`
+        lr_thresh: Sets the --threshold for lr-kallisto, defaults to `0.8`
+        lr_error_rate: Sets the --error-rate for lr-kallisto, defaults to `None`
+        lr_platform: Sets the --platform for lr-kallisto, defaults to `ONT`
+        union: Use set union for pseudoalignment, defaults to `False`
+        no_jump: Disable pseudoalignment "jumping", defaults to `False`
+        quant_umis: Whether to run quant-tcc when there are UMIs, defaults to `False`
+        keep_flags: Preserve flag column when sorting BUS file, defaults to `False`

     Returns:
         Dictionary containing paths to generated files
@@ -1292,7 +1336,12 @@ def count(
             demultiplexed=demultiplexed,
             batch_barcodes=batch_barcodes,
             numreads=numreads,
-            n=store_num
+            n=store_num,
+            lr=lr,
+            lr_thresh=lr_thresh,
+            lr_error_rate=lr_error_rate,
+            union=union,
+            no_jump=no_jump
         )
     else:
         logger.info(
@@ -1309,7 +1358,7 @@ def count(
         temp_dir=temp_dir,
         threads=threads,
         memory=memory,
-        store_num=store_num
+        store_num=store_num and not keep_flags
     )
     correct = True
     if whitelist_path and whitelist_path.upper() == "NONE":
@@ -1404,6 +1453,9 @@ def update_results_with_suffix(current_results, new_results, suffix):
         technology.upper() in ('BULK', 'SMARTSEQ2', 'SMARTSEQ3')
     ) or ignore_umis
     quant = cm and tcc
+    if quant_umis:
+        quant = True
+        no_fragment = True
     suffix_to_inspect_filename = {'': ''}
     if (technology.upper() == 'SMARTSEQ3'):
         suffix_to_inspect_filename = {
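
Note on `quant_umis`: the flag overrides the `cm and tcc` condition and also forces `no_fragment` to `True`, so effective length normalization is disabled whenever quantification is requested this way. The sketch below captures only that decision; `resolve_quant` is a hypothetical helper, and `cm` stands for the value computed just above this hunk.

```python
from typing import Tuple


def resolve_quant(
    cm: bool, tcc: bool, quant_umis: bool, no_fragment: bool
) -> Tuple[bool, bool]:
    """Hypothetical helper returning (quant, no_fragment) as decided in the hunk above."""
    quant = cm and tcc
    if quant_umis:
        # quant_umis forces quantification and disables length normalization
        quant = True
        no_fragment = True
    return quant, no_fragment
```

Without `quant_umis`, the caller's `no_fragment` value passes through unchanged.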
@@ -1518,6 +1570,8 @@ def update_results_with_suffix(current_results, new_results, suffix):
                 matrix_to_files=matrix_to_files,
                 matrix_to_directories=matrix_to_directories,
                 no_fragment=no_fragment,
+                lr=lr,
+                lr_platform=lr_platform,
             )
             update_results_with_suffix(
                 unfiltered_results, quant_result, suffix
@@ -1695,6 +1749,14 @@ def count_nac(
     batch_barcodes: bool = False,
     numreads: int = None,
     store_num: bool = False,
+    lr: bool = False,
+    lr_thresh: float = 0.8,
+    lr_error_rate: float = None,
+    lr_platform: str = 'ONT',
+    union: bool = False,
+    no_jump: bool = False,
+    quant_umis: bool = False,
+    keep_flags: bool = False,
 ) -> Dict[str, Union[Dict[str, str], str]]:
     """Generates RNA velocity matrices for single-cell RNA seq.

@@ -1756,6 +1818,14 @@ def count_nac(
         batch_barcodes: Whether sample ID should be in barcode, defaults to `False`
         numreads: Maximum number of reads to process from supplied input
         store_num: Whether to store read numbers in BUS file, defaults to `False`
+        lr: Whether to use lr-kallisto in read mapping, defaults to `False`
+        lr_thresh: Sets the --threshold for lr-kallisto, defaults to `0.8`
+        lr_error_rate: Sets the --error-rate for lr-kallisto, defaults to `None`
+        lr_platform: Sets the --platform for lr-kallisto, defaults to `ONT`
+        union: Use set union for pseudoalignment, defaults to `False`
+        no_jump: Disable pseudoalignment "jumping", defaults to `False`
+        quant_umis: Whether to run quant-tcc when there are UMIs, defaults to `False`
+        keep_flags: Preserve flag column when sorting BUS file, defaults to `False`

     Returns:
         Dictionary containing path to generated index
@@ -1803,7 +1873,12 @@ def count_nac(
             demultiplexed=demultiplexed,
             batch_barcodes=batch_barcodes,
             numreads=numreads,
-            n=store_num
+            n=store_num,
+            lr=lr,
+            lr_thresh=lr_thresh,
+            lr_error_rate=lr_error_rate,
+            union=union,
+            no_jump=no_jump
         )
     else:
         logger.info(
@@ -1820,7 +1895,7 @@ def count_nac(
         temp_dir=temp_dir,
         threads=threads,
         memory=memory,
-        store_num=store_num
+        store_num=store_num and not keep_flags
     )
     correct = True
     if whitelist_path and whitelist_path.upper() == "NONE":
@@ -2073,8 +2148,8 @@ def update_results_with_suffix(current_results, new_results, suffix):
                 if batch_barcodes else None for prefix in prefixes
             ],
             genes_paths=[
-                unfiltered_results[prefix][f'txnames{suffix}'] if tcc
-                else unfiltered_results[prefix].get(f'genes{suffix}')
+                unfiltered_results[prefix][f'ec{suffix}'] if tcc else
+                unfiltered_results[prefix].get(f'genes{suffix}')
                 for prefix in prefixes
             ],
             t2g_path=t2g_path,
@@ -2975,7 +3050,7 @@ def update_results_with_suffix(current_results, new_results, suffix):
                 for prefix in prefixes
             ],
             genes_paths=[
-                unfiltered_results[prefix][f'txnames{suffix}'] if tcc else
+                unfiltered_results[prefix][f'ec{suffix}'] if tcc else
                 unfiltered_results[prefix].get(f'genes{suffix}')
                 for prefix in prefixes
             ],
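
Note on the `genes_paths` change: in TCC mode the list now reads the `ec{suffix}` entry of each prefix's results instead of `txnames{suffix}`, while the non-TCC branch still falls back to `.get(f'genes{suffix}')`. The sketch below reproduces the comprehension on made-up data; `select_genes_paths`, the prefixes, and the file paths are all hypothetical.

```python
from typing import Dict, List, Optional


def select_genes_paths(
    unfiltered_results: Dict[str, Dict[str, str]],
    prefixes: List[str],
    suffix: str,
    tcc: bool,
) -> List[Optional[str]]:
    """Hypothetical helper mirroring the list comprehension in the hunks above."""
    return [
        # TCC mode indexes directly (KeyError if the 'ec' entry is missing);
        # gene mode uses .get and yields None for a missing entry.
        unfiltered_results[prefix][f'ec{suffix}'] if tcc else
        unfiltered_results[prefix].get(f'genes{suffix}')
        for prefix in prefixes
    ]


# Made-up results dictionary for illustration only
example_results = {
    'processed': {'ec': 'out/processed.ec.txt', 'genes': 'out/processed.genes.txt'},
    'unprocessed': {'ec': 'out/unprocessed.ec.txt'},
}
```

With `tcc=True` both prefixes resolve to their `ec` paths; with `tcc=False` the second prefix yields `None` because it has no `genes` entry.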
