Commit 1f3758c

Reduce package dependencies (#57)
- All dependencies are now listed under optional, except for numpy and biocutils.
1 parent b2b2d18 commit 1f3758c

9 files changed, +77 -50 lines changed

CHANGELOG.md

+4
@@ -1,5 +1,9 @@
 # Changelog

+## Version 0.7.0
+
+- All dependencies are now listed under optional, except for numpy and biocutils.
+
 ## Version 0.6.1

 - Fix name of the attribute that contains names of dimensions in matrices.

README.md

+32 -20
@@ -4,10 +4,12 @@

 # rds2py

-Parse and construct Python representations for datasets stored in RDS files. `rds2py` supports various base classes from R, and Bioconductor's `SummarizedExperiment` and `SingleCellExperiment` S4 classes. ***For more details, check out [rds2cpp library](https://github.com/LTLA/rds2cpp).***
+Parse and construct Python representations for datasets stored in RDS files. `rds2py` supports various base classes from R, and Bioconductor's `SummarizedExperiment` and `SingleCellExperiment` S4 classes. **_For more details, check out [rds2cpp library](https://github.com/LTLA/rds2cpp)._**

 ---
+
 **Version 0.5.0** brings major changes to the package,
+
 - Complete overhaul of the codebase using pybind11
 - Streamlined readers for R data types
 - Updated API for all classes and methods
@@ -18,7 +20,7 @@ Please refer to the [documentation](https://biocpy.github.io/rds2py/) for the la

 The package provides:

-- Efficient parsing of RDS files with *minimal* memory overhead
+- Efficient parsing of RDS files with _minimal_ memory overhead
 - Support for R's basic data types and complex S4 objects
   - Vectors (numeric, character, logical)
   - Factors
@@ -48,51 +50,61 @@ pip install rds2py
 pip install rds2py[optional]
 ```

+By default, the package does not install packages to convert python representations to BiocPy classes. Please consider installing all optional dependencies.
+
 ## Usage

 If you do not have an RDS object handy, feel free to download one from [single-cell-test-files](https://github.com/jkanche/random-test-files/releases).

-### Basic Usage
-
 ```python
 from rds2py import read_rds
 r_obj = read_rds("path/to/file.rds")
 ```

 The returned `r_obj` either returns an appropriate Python class if a parser is already implemented or returns the dictionary containing the data from the RDS file.

-## Write-your-own-reader
+In addition, the package provides the dictionary representation of the RDS file.
+
+```python
+from rds2py import parse_rds
+
+robject_dict = parse_rds("path/to/file.rds")
+print(robject_dict)
+```
+
+### Write-your-own-reader

-In addition, the package provides the dictionary representation of the RDS file, allowing users to write their own custom readers into appropriate Python representations.
+Reading RDS files as dictionary representations allows users to write their own custom readers into appropriate Python representations.

 ```python
 from rds2py import parse_rds

-data = parse_rds("path/to/file.rds")
-print(data)
+robject = parse_rds("path/to/file.rds")
+print(robject)
 ```

 if you know this RDS file contains an `GenomicRanges` object, you can use the built-in reader or write your own reader to convert this dictionary.

 ```python
 from rds2py.read_granges import read_genomic_ranges

-gr = read_genomic_ranges(data)
+gr = read_genomic_ranges(robject)
+print(gr)
 ```

 ## Type Conversion Reference

-| R Type | Python/NumPy Type |
-|--------|------------------|
-| numeric | numpy.ndarray (float64) |
-| integer | numpy.ndarray (int32) |
-| character | list of str |
-| logical | numpy.ndarray (bool) |
-| factor | list |
-| data.frame | BiocFrame |
-| matrix | numpy.ndarray or scipy.sparse matrix |
-| dgCMatrix | scipy.sparse.csc_matrix |
-| dgRMatrix | scipy.sparse.csr_matrix |
+| R Type     | Python/NumPy Type                    |
+| ---------- | ------------------------------------ |
+| numeric    | numpy.ndarray (float64)              |
+| integer    | numpy.ndarray (int32)                |
+| character  | list of str                          |
+| logical    | numpy.ndarray (bool)                 |
+| factor     | list                                 |
+| data.frame | BiocFrame                            |
+| matrix     | numpy.ndarray or scipy.sparse matrix |
+| dgCMatrix  | scipy.sparse.csc_matrix              |
+| dgRMatrix  | scipy.sparse.csr_matrix              |

 ## Developer Notes
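
The README note in this diff says the BiocPy converters are no longer installed by default. A minimal sketch of how a user might check which optional packages are importable before reading a file: the helper name `missing_optional` is hypothetical and not part of rds2py, and the package list is copied from the `optional` extra in `setup.cfg` on the assumption that each import name matches its distribution name.

```python
from importlib.util import find_spec

# Names copied from the `optional` extra in setup.cfg
# (assumed to match the importable module names).
OPTIONAL_PACKAGES = [
    "pandas",
    "hdf5array",
    "scipy",
    "biocframe",
    "genomicranges",
    "summarizedexperiment",
    "singlecellexperiment",
    "multiassayexperiment",
]


def missing_optional(packages=OPTIONAL_PACKAGES):
    """Return the names in `packages` that cannot be imported (hypothetical helper)."""
    return [name for name in packages if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional()
    if missing:
        print("Missing optional packages:", ", ".join(missing))
        print("Install them with: pip install rds2py[optional]")
    else:
        print("All optional packages are available.")
```

`find_spec` only locates a module without importing it, so the check stays cheap even for heavy packages.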

setup.cfg

+7 -8
@@ -50,13 +50,7 @@ python_requires = >=3.9
 install_requires =
     importlib-metadata; python_version<"3.8"
     numpy
-    scipy
-    biocframe
     biocutils>=0.1.5
-    genomicranges>=0.4.9
-    summarizedexperiment>=0.4.1
-    singlecellexperiment>=0.4.1
-    multiassayexperiment

 [options.packages.find]
 where = src
@@ -70,14 +64,19 @@ exclude =
 optional =
     pandas
     hdf5array
+    scipy
+    biocframe
+    genomicranges>=0.4.9
+    summarizedexperiment>=0.4.1
+    singlecellexperiment>=0.4.1
+    multiassayexperiment

 # Add here test requirements (semicolon/line-separated)
 testing =
     setuptools
     pytest
     pytest-cov
-    pandas
-    hdf5array
+    %(optional)s

 [options.entry_points]
 # Add here console scripts like:
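
The `%(optional)s` entry uses configparser-style interpolation, so `testing` reuses the `optional` list instead of duplicating it. A minimal sketch with the standard-library `configparser`, assuming setuptools resolves the reference the same way; the shortened package lists are illustrative only.

```python
import configparser

# Trimmed-down stand-in for the extras section of setup.cfg (illustrative only).
SETUP_CFG = """
[options.extras_require]
optional =
    pandas
    scipy
testing =
    pytest
    %(optional)s
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# The default BasicInterpolation expands %(optional)s from the same section.
testing = parser.get("options.extras_require", "testing").split()
print(testing)
```

Any package later added to `optional` is then picked up by `testing` without a second edit.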

src/rds2py/read_delayed_matrix.py

+3 -3
@@ -1,7 +1,5 @@
 """Functions and classes for parsing R delayed matrix objects from HDF5Array."""

-from hdf5array import Hdf5CompressedSparseMatrix
-
 from .generics import _dispatcher
 from .rdsutils import get_class

@@ -10,7 +8,7 @@
 __license__ = "MIT"


-def read_hdf5_sparse(robject: dict, **kwargs) -> Hdf5CompressedSparseMatrix:
+def read_hdf5_sparse(robject: dict, **kwargs):
     """Convert an R delayed sparse array (H5-backed).

     Args:
@@ -38,4 +36,6 @@ def read_hdf5_sparse(robject: dict, **kwargs) -> Hdf5CompressedSparseMatrix:
     fpath = list(_dispatcher(_seed_obj["attributes"]["filepath"], **kwargs))[0]
     group_name = list(_dispatcher(_seed_obj["attributes"]["group"], **kwargs))[0]

+    from hdf5array import Hdf5CompressedSparseMatrix
+
     return Hdf5CompressedSparseMatrix(path=fpath, group_name=group_name, shape=shape, by_column=by_column)

src/rds2py/read_granges.py

+8 -5
@@ -4,9 +4,6 @@
 equivalents, preserving all genomic coordinates and associated metadata.
 """

-from genomicranges import GenomicRanges, GenomicRangesList, SeqInfo
-from iranges import IRanges
-
 from .generics import _dispatcher
 from .rdsutils import get_class

@@ -15,7 +12,7 @@
 __license__ = "MIT"


-def read_genomic_ranges(robject: dict, **kwargs) -> GenomicRanges:
+def read_genomic_ranges(robject: dict, **kwargs):
     """Convert an R `GenomicRanges` object to a Python :py:class:`~genomicranges.GenomicRanges` object.

     Args:
@@ -29,6 +26,10 @@ def read_genomic_ranges(robject: dict, **kwargs) -> GenomicRanges:
         A Python `GenomicRanges` object containing genomic intervals
         with associated annotations.
     """
+
+    from genomicranges import GenomicRanges, SeqInfo
+    from iranges import IRanges
+
     _cls = get_class(robject)

     if _cls not in ["GenomicRanges", "GRanges"]:
@@ -74,7 +75,7 @@ def read_genomic_ranges(robject: dict, **kwargs) -> GenomicRanges:
     )


-def read_granges_list(robject: dict, **kwargs) -> GenomicRangesList:
+def read_granges_list(robject: dict, **kwargs):
     """Convert an R `GenomicRangesList` object to a Python :py:class:`~genomicranges.GenomicRangesList`.

     Args:
@@ -89,6 +90,8 @@ def read_granges_list(robject: dict, **kwargs) -> GenomicRangesList:
         `GenomicRanges` objects.
     """

+    from genomicranges import GenomicRangesList
+
     _cls = get_class(robject)

     if _cls not in ["CompressedGRangesList", "GRangesList"]:

src/rds2py/read_mae.py

+3 -3
@@ -4,8 +4,6 @@
 preserving the complex relationships between multiple experimental assays and sample metadata.
 """

-from multiassayexperiment import MultiAssayExperiment
-
 from .generics import _dispatcher
 from .rdsutils import get_class
 from .read_matrix import MatrixWrapper
@@ -43,7 +41,7 @@ def _sanitize_expts(expts, **kwargs):
     return res


-def read_multi_assay_experiment(robject: dict, **kwargs) -> MultiAssayExperiment:
+def read_multi_assay_experiment(robject: dict, **kwargs):
     """Convert an R `MultiAssayExperiment` to a Python :py:class:`~multiassayexperiment.MultiAssayExperiment` object.

     Args:
@@ -73,6 +71,8 @@ def read_multi_assay_experiment(robject: dict, **kwargs) -> MultiAssayExperiment
     # parse coldata
     robj_coldata = _dispatcher(robject["attributes"]["colData"], **kwargs)

+    from multiassayexperiment import MultiAssayExperiment
+
     return MultiAssayExperiment(
         experiments=_sanitize_expts(robj_expts),
         sample_map=robj_samplemap,

src/rds2py/read_matrix.py

+6 -5
@@ -8,7 +8,6 @@
 from typing import Literal

 from numpy import ndarray
-from scipy.sparse import csc_matrix, csr_matrix, spmatrix

 from .generics import _dispatcher
 from .rdsutils import get_class
@@ -37,7 +36,7 @@ def __init__(self, matrix, dimnames=None) -> None:
         self.dimnames = dimnames


-def _as_sparse_matrix(robject: dict, **kwargs) -> spmatrix:
+def _as_sparse_matrix(robject: dict, **kwargs):
     """Convert an R sparse matrix to a SciPy sparse matrix.

     Notes:
@@ -57,6 +56,8 @@ def _as_sparse_matrix(robject: dict, **kwargs) -> spmatrix:
         A SciPy sparse matrix or wrapped matrix if dimension names exist.
     """

+    from scipy.sparse import csc_matrix, csr_matrix
+
     _cls = get_class(robject)

     if _cls not in ["dgCMatrix", "dgRMatrix", "dgTMatrix"]:
@@ -145,7 +146,7 @@ def _as_dense_matrix(robject, order: Literal["C", "F"] = "F", **kwargs) -> ndarr
     return mat


-def read_dgcmatrix(robject: dict, **kwargs) -> spmatrix:
+def read_dgcmatrix(robject: dict, **kwargs):
     """Parse an R dgCMatrix (sparse column matrix).

     Args:
@@ -161,7 +162,7 @@ def read_dgcmatrix(robject: dict, **kwargs) -> spmatrix:
     return _as_sparse_matrix(robject, **kwargs)


-def read_dgrmatrix(robject: dict, **kwargs) -> spmatrix:
+def read_dgrmatrix(robject: dict, **kwargs):
     """Parse an R dgRMatrix (sparse row matrix).

     Args:
@@ -177,7 +178,7 @@ def read_dgrmatrix(robject: dict, **kwargs) -> spmatrix:
     return _as_sparse_matrix(robject, **kwargs)


-def read_dgtmatrix(robject: dict, **kwargs) -> spmatrix:
+def read_dgtmatrix(robject: dict, **kwargs):
     """Parse an R dgTMatrix (sparse triplet matrix)..

     Args:
src/rds2py/read_sce.py

+3 -3
@@ -5,8 +5,6 @@
 data including multiple assays, reduced dimensions, and alternative experiments.
 """

-from singlecellexperiment import SingleCellExperiment
-
 from .generics import _dispatcher
 from .rdsutils import get_class

@@ -30,7 +28,7 @@ def read_alts_summarized_experiment_by_column(robject: dict, **kwargs):
     return objs


-def read_single_cell_experiment(robject: dict, **kwargs) -> SingleCellExperiment:
+def read_single_cell_experiment(robject: dict, **kwargs):
     """Convert an R SingleCellExperiment to Python SingleCellExperiment.

     Args:
@@ -76,6 +74,8 @@ def read_single_cell_experiment(robject: dict, **kwargs) -> SingleCellExperiment
         # ignore colpairs for now, does anyone even use this ?
         # if col == "colPairs":

+    from singlecellexperiment import SingleCellExperiment
+
     return SingleCellExperiment(
         assays=_rse.assays,
         row_data=_rse.row_data,

src/rds2py/read_se.py

+11 -3
@@ -1,4 +1,8 @@
-from summarizedexperiment import RangedSummarizedExperiment, SummarizedExperiment
+"""Functions for parsing Bioconductor `SummarizedExperiment` objects.
+
+This module provides parsers for converting Bioconductor's `SummarizedExperiment`
+objects into their Python equivalents.
+"""

 from .generics import _dispatcher
 from .rdsutils import get_class
@@ -27,7 +31,7 @@ def _sanitize_assays(assays):
     return res


-def read_summarized_experiment(robject: dict, **kwargs) -> SummarizedExperiment:
+def read_summarized_experiment(robject: dict, **kwargs):
     """Convert an R SummarizedExperiment to Python
     :py:class:`~summarizedexperiment.SummarizedExperiment.SummarizedExperiment`.

@@ -68,14 +72,16 @@ def read_summarized_experiment(robject: dict, **kwargs) -> SummarizedExperiment:
     # parse rowdata
     robj_rowdata = _sanitize_empty_frame(_dispatcher(robject["attributes"]["elementMetadata"], **kwargs), assay_dims[0])

+    from summarizedexperiment import SummarizedExperiment
+
     return SummarizedExperiment(
         assays=_sanitize_assays(robj_asys),
         row_data=robj_rowdata,
         column_data=robj_coldata,
     )


-def read_ranged_summarized_experiment(robject: dict, **kwargs) -> RangedSummarizedExperiment:
+def read_ranged_summarized_experiment(robject: dict, **kwargs):
     """Convert an R RangedSummarizedExperiment to its Python equivalent.

     Args:
@@ -102,6 +108,8 @@ def read_ranged_summarized_experiment(robject: dict, **kwargs) -> RangedSummariz
     if "rowRanges" in robject["attributes"]:
         row_ranges_data = _dispatcher(robject["attributes"]["rowRanges"], **kwargs)

+    from summarizedexperiment import RangedSummarizedExperiment
+
     return RangedSummarizedExperiment(
         assays=_se.assays,
         row_data=_se.row_data,
