Data/raw images to quilt #162

Open
wants to merge 7 commits into
base: master
106 changes: 70 additions & 36 deletions data/README.md
@@ -1,47 +1,78 @@
# Data set for _Cell states beyond transcriptomics: integrating structural organization and gene expression in hiPSC-derived cardiomyocytes_

This data package contains the input data for all analyses in the manuscript [_Cell states beyond transcriptomics: integrating structural organization and gene expression in hiPSC-derived cardiomyocytes_](https://www.biorxiv.org/content/10.1101/2020.05.26.081083v1) in a compute-friendly form.
This data package contains the input data for all analyses in the manuscript [_Cell states beyond transcriptomics: integrating structural organization and gene expression in hiPSC-derived cardiomyocytes_](https://doi.org/10.1016/j.cels.2021.05.001) in a compute-friendly form.
Not all of these data were used in the manuscript, but all of the data used in the manuscript are included here.

In this manuscript, we used hiPSC-derived cardiomyocytes as a model system for studying the relationship between transcript abundance and cellular organization as shown below.
![fig1](resources/quilt_data_package_schematic_fig1.png)

## Overview
Notably, we provide 478 fields of view containing approximately 5000 segmented single cells in different stages of cardiomyogenesis, imaged in five channels:
Notably, we provide 2,911 fields of view (FOVs) containing segmented single cells in different stages of cardiomyogenesis. There are 1,215 FOVs
from RNA-FISH experiments (FISH; 12,941 cells) and 1,696 FOVs of live imaged cardiomyocytes (Live; 18,045 cells). The following channels were imaged:
- Brightfield
- Hoechst nuclear stain
- Endogenously GFP-tagged alpha-actinin-2 structure
- Two FISH probes per cell (eight probes overall)
- Two FISH probes per cell (FISH FOVs only; 18 probes overall)

Also included are
- expert annotations of these ~5000 segmented cells
- FISH images of cells without a GFP labeled structure (~30 probes)
- scRNA-seq (Split-seq) data collected on approximately 22,000 cells that underwent similar differentiation protocols as the cells we imaged
- expert scoring of sarcomere structure organization of 6,677 cells (5,755 scored cells from FISH; 922 scored cells from Live)
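As a quick consistency check on the counts quoted above, the per-modality numbers can be summed. This is only an arithmetic sketch; the variable names are illustrative and are not part of the data package.

```python
# Counts as quoted in the overview above, split by modality.
fovs = {"FISH": 1215, "Live": 1696}
cells = {"FISH": 12941, "Live": 18045}
scored_cells = {"FISH": 5755, "Live": 922}

total_fovs = sum(fovs.values())            # compare with the 2,911 FOVs quoted above
total_cells = sum(cells.values())          # total segmented cells across FISH and Live
total_scored = sum(scored_cells.values())  # compare with the 6,677 scored cells quoted above

print(total_fovs, total_cells, total_scored)
```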

## Organization
The data in this package is organized into separate data sets, reflecting different data of different types (scRNA-seq vs FISH / image data), and different downstream processing / feature derivation.

The data creation and processing pipeline is organized according to the following schematic:
![Data pipeline schematic](resources/Website_schematic_data_flow_20200310_v2.png)
The data in this package are organized into separate data sets, reflecting different data types (FISH/Live image data) and different downstream processing / feature derivation.

The data sets included in this package are:

### Raw 3D images:

```bash
raw_images
├──FISH
├──Live
```

### FISH 2D segmented cells
```bash
├──2d_segmented_fields_fish_1
├──2d_segmented_fields_fish_2
├──2d_segmented_fields_fish_3
├──2d_segmented_fields_fish_4
```

### FISH 2D FOVs used as input to cellprofiler
```bash
├──2d_autocontrasted_fields_and_single_cells_fish_1
├──2d_autocontrasted_fields_and_single_cells_fish_2
├──2d_autocontrasted_fields_and_single_cells_fish_3
├──2d_autocontrasted_fields_and_single_cells_fish_4
```


### Cellprofiler output
```bash
├──2d_autocontrasted_single_cell_features_fish_1
├──2d_autocontrasted_single_cell_features_fish_2
├──2d_autocontrasted_single_cell_features_fish_3
├──2d_autocontrasted_single_cell_features_fish_4
```

### Structure classifier
```bash
cardio_diff_manuscript
├── 2d_autocontrasted_fields_and_single_cells
├── 2d_autocontrasted_single_cell_features
├── 2d_nonstructure_fields
├── 2d_nonstructure_single_cell_features
├── 2d_nuclear_masks
├── 2d_segmented_fields
├── 3d_actn2_segmentation
├── automated_local_and_global_structure
├── manuscript_plots
├── probe_localization
├── probe_structure_classifier
├── scrnaseq_data
└── scrnaseq_raw
├──automated_local_and_global_structure_fish_1
├──automated_local_and_global_structure_fish_2
├──automated_local_and_global_structure_fish_3
├──automated_local_and_global_structure_fish_4
├──automated_local_and_global_structure_live
```

Notably absent from this release are the raw 3D images from which our 2D images are derived.
These will be included shortly.
### Cell features used to make manuscript figures
```bash
revised_manuscript_plots
├──data.csv
```
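The FISH directory names listed above follow a regular pattern: a descriptive prefix plus a numeric suffix from 1 to 4. A minimal sketch for enumerating them is shown below; the helper name `fish_dataset_names` and the `n_parts` parameter are hypothetical, and the meaning of the numeric suffix is not specified here.

```python
# Prefixes copied from the directory listings above (FISH data sets only).
FISH_PREFIXES = [
    "2d_segmented_fields_fish",
    "2d_autocontrasted_fields_and_single_cells_fish",
    "2d_autocontrasted_single_cell_features_fish",
    "automated_local_and_global_structure_fish",
]


def fish_dataset_names(prefixes=FISH_PREFIXES, n_parts=4):
    """Build '<prefix>_<i>' names for i in 1..n_parts (hypothetical helper)."""
    return [f"{prefix}_{i}" for prefix in prefixes for i in range(1, n_parts + 1)]


names = fish_dataset_names()
print(len(names))
print(names[0])
```

The resulting names can be used as top-level keys when selecting parts of the package, assuming the published package uses exactly the directory names shown in this README.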

The data creation and processing pipeline is organized according to the following schematic:
![Data pipeline schematic](resources/Website_schematic_data_flow_20200310_v2.png)


## Access
The data are programmatically accessible via `quilt`, and are also (somewhat) browsable via this web UI.
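A minimal sketch of programmatic access with the `quilt3` Python client follows. The package name and registry are taken from the default arguments of the distribution scripts in this pull request, so treating them as the public location is an assumption; the helper names `browse_package` and `fetch_subset` are hypothetical.

```python
def browse_package(
    name="aics/integrated_transcriptomics_structural_organization_hipsc_cm",
    registry="s3://allencell",
):
    """Open the package manifest without downloading the data itself."""
    # Imported lazily so this module can be loaded without quilt3 installed.
    import quilt3

    return quilt3.Package.browse(name, registry=registry)


def fetch_subset(pkg, key, dest):
    """Download one top-level entry (for example 'raw_images') into dest."""
    if key not in pkg.keys():
        raise KeyError(f"{key!r} is not a top-level entry of this package")
    # Indexing a package returns a sub-package or entry; both expose fetch(dest).
    pkg[key].fetch(dest)
    return dest
```

A call such as `fetch_subset(browse_package(), "raw_images", "./raw_images/")` would then download only that directory. Browsing first and fetching selectively avoids pulling the full package, which is likely to be large given the number of FOVs.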
@@ -76,16 +107,19 @@ Instructions for interacting with quilt packages in Python can be found [here](h

## Citation
```
@article {Gerbin2020.05.26.081083,
author = {Gerbin, Kaytlyn A and Grancharova, Tanya and Donovan-Maiye, Rory and Hendershott, Melissa C and Brown, Jackson and Dinh, Stephanie Q and Gehring, Jamie L and Hirano, Matthew and Johnson, Gregory R and Nath, Aditya and Nelson, Angelique and Roco, Charles M and Rosenberg, Alex B and Sluzewski, M Filip and Viana, Matheus P and Yan, Calysta and Zaunbrecher, Rebecca J and Cordes Metzler, Kimberly R and Menon, Vilas and Palecek, Sean P and Seelig, Georg and Gaudreault, Nathalie and Knijnenburg, Theo and Rafelski, Susanne M and Theriot, Julie A and Gunawardane, Ruwanthi N},
title = {Cell states beyond transcriptomics: integrating structural organization and gene expression in hiPSC-derived cardiomyocytes},
elocation-id = {2020.05.26.081083},
year = {2020},
doi = {10.1101/2020.05.26.081083},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2020/05/27/2020.05.26.081083},
eprint = {https://www.biorxiv.org/content/early/2020/05/27/2020.05.26.081083.full.pdf},
journal = {bioRxiv}
@article{Gerbin2021,
author = {Gerbin, K. A. and Grancharova, T. and Donovan-Maiye, R. M. and Hendershott, M. C. and Anderson, H. G. and Brown, J. M. and Chen, J. and Dinh, S. Q. and Gehring, J. L. and Johnson, G. R. and Lee, H. and Nath, A. and Nelson, A. M. and Sluzewski, M. F. and Viana, M. P. and Yan, C. and Zaunbrecher, R. J. and Cordes Metzler, K. R. and Gaudreault, N. and Knijnenburg, T. A. and Rafelski, S. M. and Theriot, J. A. and Gunawardane, R. N.},
title = {Cell states beyond transcriptomics: Integrating structural organization and gene expression in hiPSC-derived cardiomyocytes},
journal = {Cell Syst},
volume = {12},
number = {6},
pages = {670-687 e10},
ISSN = {2405-4720 (Electronic)
2405-4712 (Linking)},
DOI = {10.1016/j.cels.2021.05.001},
url = {https://doi.org/10.1016/j.cels.2021.05.001},
year = {2021},
type = {Journal Article}
}
```

@@ -0,0 +1,47 @@
#!/usr/bin/env python


from pathlib import Path
import quilt3
import fire


def distribute_actn2_pattern_classifier_model(
    pkg_dest="aics/integrated_transcriptomics_structural_organization_hipsc_cm",
    s3_bucket="s3://allencell",
    edit=True,
):

    # either edit package if it exists or create new
    if edit:
        p = quilt3.Package.browse(pkg_dest, registry=s3_bucket)
    else:
        p = quilt3.Package()

    # fetch internal package
    internal_package = quilt3.Package.browse(
        "matheus/assay_dev_actn2_classifier", "s3://allencell-internal-quilt"
    )
    internal_package.fetch(
        "/allen/aics/gene-editing/FISH/2019/assay_dev_classifier_model/"
    )
    data_dir = Path("/allen/aics/gene-editing/FISH/2019/assay_dev_classifier_model/")

    # copy contents
    for path in data_dir.iterdir():
        if path.is_dir():
            subdir = path.name
            for f in path.iterdir():
                f_name = f.name
                p.set(f"actn2_pattern_ml_classifier_model/{subdir}/{f_name}", f)
        else:
            f_name = path.name
            p.set(f"actn2_pattern_ml_classifier_model/{f_name}", path)

    # print(p["actn2_pattern_ml_classifier_model"])

    p.push(pkg_dest, s3_bucket, message="actn2_pattern_ml_classifier_model")


if __name__ == "__main__":
    fire.Fire(distribute_actn2_pattern_classifier_model)
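The copy loop in the script above maps local files to logical keys under a fixed prefix. The same mapping can be sketched as a pure function that is testable without `quilt3` or network access. Note that this is a variant, not the script's method: the hypothetical helper `logical_keys` uses `rglob` and so recurses to any depth, whereas the script handles only top-level files and one level of subdirectories.

```python
from pathlib import Path


def logical_keys(data_dir, prefix):
    """Map '<prefix>/<relative path>' to each file found under data_dir."""
    data_dir = Path(data_dir)
    mapping = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            # as_posix() keeps forward slashes in keys on every platform
            rel = path.relative_to(data_dir).as_posix()
            mapping[f"{prefix}/{rel}"] = path
    return mapping
```

Each `(key, path)` pair could then be passed to `p.set(key, path)` exactly as the script does, before calling `p.push(...)`.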
@@ -0,0 +1,47 @@
#!/usr/bin/env python


from pathlib import Path
import quilt3
import fire


def distribute_actn2_pattern_classifier_train(
    pkg_dest="aics/integrated_transcriptomics_structural_organization_hipsc_cm",
    s3_bucket="s3://allencell",
    edit=True,
):

    # either edit package if it exists or create new
    if edit:
        p = quilt3.Package.browse(pkg_dest, registry=s3_bucket)
    else:
        p = quilt3.Package()

    # fetch internal package
    internal_package = quilt3.Package.browse(
        "matheus/assay_dev_classifier_train", "s3://allencell-internal-quilt"
    )
    internal_package.fetch(
        "/allen/aics/gene-editing/FISH/2019/assay_dev_classifier_train/"
    )
    data_dir = Path("/allen/aics/gene-editing/FISH/2019/assay_dev_classifier_train/")

    # copy contents
    for path in data_dir.iterdir():
        if path.is_dir():
            subdir = path.name
            for f in path.iterdir():
                f_name = f.name
                p.set(f"actn2_pattern_ml_classifier_train/{subdir}/{f_name}", f)
        else:
            f_name = path.name
            p.set(f"actn2_pattern_ml_classifier_train/{f_name}", path)

    # print(p["actn2_pattern_ml_classifier_train"])

    p.push(pkg_dest, s3_bucket, message="actn2_pattern_ml_classifier_train")


if __name__ == "__main__":
    fire.Fire(distribute_actn2_pattern_classifier_train)
13 changes: 13 additions & 0 deletions data/raw_images/README.md
@@ -0,0 +1,13 @@
Raw 3i images

Images are either of live cells or of fixed cells (after HCR RNA-FISH).

Channels:
{
    "638": 0,
    "nuc": 1,
    "bf1": 2,
    "561": 3,
    "bf2": 4,
    "488": 5
}
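
The mapping above can be used to look up a channel by name rather than by a hard-coded index. The sketch below assumes the channel axis is the first axis of the loaded image, which this README does not state; `select_channel` is a hypothetical helper, and any indexable container (for example a NumPy array) would work in place of the list.

```python
import json

# Channel-name-to-index mapping, copied from the listing above.
CHANNELS = json.loads(
    '{"638": 0, "nuc": 1, "bf1": 2, "561": 3, "bf2": 4, "488": 5}'
)

# Reverse lookup: channel index to channel name.
INDEX_TO_NAME = {index: name for name, index in CHANNELS.items()}


def select_channel(image, name):
    """Return the data for one named channel (assumes channel-first layout)."""
    return image[CHANNELS[name]]


# Stand-in for a six-channel image; real data would be an array per channel.
image = [f"plane_{i}" for i in range(6)]
print(select_channel(image, "nuc"))
```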