RFC95: Generating Study data files #11482

Open
wants to merge 68 commits into
base: master
d3958d5
Enable HTTP streaming by eliminating unnecessary response copying.
forus Mar 28, 2025
e36e34f
Port clinical export
forus Mar 19, 2025
47c4bff
Enable genetic profile and MAF data export
forus Mar 26, 2025
e416fb9
Enable case list export
forus Mar 26, 2025
b5820f7
Port POC unit tests. Make them pass
forus Mar 26, 2025
f5f846a
Port Export integration smoke test
forus Mar 26, 2025
6d27801
Port import-export functionality test
forus Mar 26, 2025
9795ca0
Enable dynamic study export mode on CI for running tests
forus Mar 26, 2025
147c9fd
Fix import export study data differences
forus Mar 26, 2025
c7d07d4
Refactor the code
forus Mar 31, 2025
07d19eb
Fix order of columns for the clinical file
forus Mar 31, 2025
d212961
Refactor exporters layer
forus Apr 3, 2025
643f14c
Make test to pass
forus Apr 4, 2025
86cdbf0
Return 404 http status when no study exists
forus Apr 4, 2025
93b6f74
Make sure the rows are ordered by sample/patient IDs
forus Apr 4, 2025
5787b06
Check clinical attributes for duplicates
forus Apr 4, 2025
a795247
Clean code from done TODO comments
forus Apr 4, 2025
e952343
Run study export in low priority custom thread pool
forus Apr 4, 2025
b30768d
Specify the type of MAF exporter
forus Apr 4, 2025
828449e
Should we maybe corrupt file if download fails?
forus Apr 4, 2025
5e0855c
Remove unneeded comments
forus Apr 4, 2025
b45c0d7
Make sure input stream closes
forus Apr 9, 2025
e9535cb
Add support of MRNA Expression data type export
forus Apr 9, 2025
c90c35c
Add support for generic data types
forus Apr 11, 2025
1b99977
Add README to the package
forus Apr 11, 2025
e2c69c5
Add MRNA and Generic Assay data type to the test study
forus Apr 14, 2025
d06861b
Remove patient level generic assay data to fix the build
forus Apr 15, 2025
afe269f
Fix special attribute values not showing for the first line
forus Apr 15, 2025
e67414b
Skip generic properties with mismatching id
forus Apr 15, 2025
95308d8
Fix skipping rows in generic assay data
forus Apr 15, 2025
d44cbdc
Update import/export test study to new fixes
forus Apr 15, 2025
ab4f2b1
Fix metadata tests
forus Apr 16, 2025
46f372c
Fix unit tests
forus Apr 18, 2025
66eeafa
Improve test coverage for all exporters
forus Apr 18, 2025
525bc46
Expand MRNA export support to Z-SCORE and DISCRETE
forus Apr 18, 2025
427a8b0
Support export of cancer types
forus Apr 21, 2025
ca81c77
Do not export patient level generic assay data
forus Apr 21, 2025
3cc4bbe
Fix number of columns for cancer type file
forus Apr 22, 2025
fc7b399
Export clinical timeline aka events
forus Apr 22, 2025
112fe7e
Support protein level data export
forus Apr 23, 2025
187e684
Support more generic assay data types
forus Apr 23, 2025
539d237
Add support for exporting mutation uncalled data type
forus Apr 23, 2025
ad5f795
Add CNA continuous and log2 data export
forus Apr 23, 2025
d9005c7
Lower case p in phosphosite to mark it as such
forus Apr 23, 2025
621665e
Support methylation data type export
forus Apr 23, 2025
822d982
Support CNA discrete data type
forus Apr 24, 2025
b2a535e
Support export of CNA Segment data
forus Apr 24, 2025
9dc936d
Support Structural Variant Data Export
forus Apr 24, 2025
36ab43d
Support exporting gene panel matrix data
forus Apr 24, 2025
00634db
Remove pipe output as not reliable
forus Apr 24, 2025
ce3e352
Use forward only cursors where possible for memory optimisation
forus Apr 24, 2025
362e7e6
Corrupt zip file intentionally in case of exception
forus Apr 24, 2025
a4bb008
Warn about incorrect format for phosphoprotein, not crash
forus Apr 25, 2025
6a4c4d6
Provide a way to increase timeout for async requests
forus Apr 25, 2025
219a228
Fix sonar cube reported issues
forus Apr 25, 2025
1f3e3b4
Add README.txt file to the exported study data
forus Apr 25, 2025
c20cfdf
Write NA instead of blank string for absent gene panel
forus Apr 25, 2025
dbd19f0
Move code to fail zip streaming to the factory
forus Apr 29, 2025
24f81d1
Add possibility to export study under alternative study id
forus Apr 29, 2025
041d726
Enable filtering exported data by sample id
forus Apr 30, 2025
7c4a40b
Move pre authorization check to the service layer
forus May 1, 2025
5472cdf
Move check for presence of study into the service
forus May 1, 2025
c4ebc2f
Enable downloading virtual studies
forus May 1, 2025
d3d8e79
Use CI session service instead of default remote one
forus May 2, 2025
32f9ce0
Improve splitting gene name to hugo symbol and phosphosite
forus May 2, 2025
2b4c552
Override name, description, pmid and cancer type of virtual study
forus May 2, 2025
b3e8997
Test Virtual Study download that is defined with multiple Materialise…
forus May 2, 2025
525e62e
Refresh dynamic Virtual Studies before export
forus May 2, 2025
9 changes: 8 additions & 1 deletion .github/workflows/integration-test.yml
@@ -45,7 +45,9 @@ jobs:
cat $PORTAL_SOURCE_DIR/src/main/resources/application.properties | \
sed 's|spring.datasource.url=.*|spring.datasource.url=jdbc:mysql://cbioportal-database:3306/cbioportal?useSSL=false|' | \
sed 's|spring.datasource.username=.*|spring.datasource.username=cbio_user|' | \
- sed 's|spring.datasource.password=.*|spring.datasource.password=somepassword|' \
+ sed 's|spring.datasource.password=.*|spring.datasource.password=somepassword|' | \
+ sed 's|session.service.url=.*|session.service.url=http://cbioportal-session:5001/api/sessions/my_portal/|' | \
+ sed 's|dynamic_study_export_mode=.*|dynamic_study_export_mode=true|' \
> application.properties
- name: 'Copy cgds.sql file into Docker Compose'
run: cp ./cbioportal/src/main/resources/db-scripts/cgds.sql ./cbioportal-docker-compose/data/.
@@ -73,6 +75,11 @@ jobs:
working-directory: ./cbioportal-docker-compose
run: |
$PORTAL_SOURCE_DIR/test/integration/test_load_study.sh
- name: 'TEST - Import and Export of study_es_0_import_export'
if: steps.startup.conclusion == 'success'
working-directory: ./cbioportal-docker-compose
run: |
$PORTAL_SOURCE_DIR/test/integration/test_import_export.sh
- name: 'TEST - Add OncoKB annotations to study'
if: steps.startup.conclusion == 'success'
working-directory: ./cbioportal-docker-compose
4 changes: 4 additions & 0 deletions pom.xml
@@ -182,6 +182,10 @@
<artifactId>mybatis-spring-boot-starter</artifactId>
<version>${mybatis.starter.version}</version>
</dependency>
<dependency>
<groupId>com.zaxxer</groupId>
<artifactId>HikariCP</artifactId>
</dependency>
<dependency>
<groupId>org.redisson</groupId>
<artifactId>redisson</artifactId>
15 changes: 15 additions & 0 deletions src/main/java/org/cbioportal/application/file/README.md
@@ -0,0 +1,15 @@
# Study Data Export

This package contains the code for exporting study data from the database to files. The export process involves three steps:
1. Retrieving the study data from the database.
2. Transforming the data into a suitable format for export.
3. Writing the transformed data to a file.

The implementation keeps its dependencies on the rest of the codebase to a minimum, so that it stays lightweight and performant and can be moved to a separate web application if needed.
To reduce RAM usage during export, the code streams data on both the read and the write side. On the database side, it uses a cursor to read data in chunks; on the web controller side, it uses a streaming response to write data in chunks.
This allows the code to handle large datasets without running out of memory.
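
The streaming idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from this package: `StreamingRowWriter`, `RowCursor`, and `cursorOf` are hypothetical names, and an in-memory list stands in for a real database cursor. The point is that only one row is held in memory at a time and the cursor is always closed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper, not part of this package: writes rows one at a time. */
final class StreamingRowWriter {

    /** An iterator backed by a resource (for example a database cursor) that must be closed. */
    interface RowCursor extends Iterator<List<String>>, AutoCloseable {
        @Override
        void close();
    }

    /** Writes each row as a tab-separated line and returns the number of rows written. */
    static long writeAll(RowCursor cursor, Writer out) {
        long written = 0;
        try (cursor) { // the cursor is closed even if writing fails
            while (cursor.hasNext()) {
                out.write(String.join("\t", cursor.next()));
                out.write('\n');
                written++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    /** Wraps an in-memory list as a cursor; an assumption standing in for a real database cursor. */
    static RowCursor cursorOf(List<List<String>> rows) {
        Iterator<List<String>> it = rows.iterator();
        return new RowCursor() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public List<String> next() { return it.next(); }
            @Override public void close() { /* nothing to release for an in-memory list */ }
        };
    }
}
```

With a real cursor, `close()` would release the underlying database resources; the writer could be any `Writer` wrapping the HTTP response stream.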

## Usage

Set `dynamic_study_export_mode` to `true` in the application properties file to enable the dynamic study export mode.
This mode allows the user to export a study via the `/export/study/{studyId}.zip` link.
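
As a rough illustration of the link format, the sketch below builds the export URL for a given study. The class name `ExportLinks`, the base URL `http://localhost:8080`, and the URL-encoding of the study ID are assumptions made for this example; only the `/export/study/{studyId}.zip` path comes from this README.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of this package. */
final class ExportLinks {

    /** Builds the export link for a study, given the portal base URL (an assumption of this sketch). */
    static URI exportUri(String baseUrl, String studyId) {
        // Drop a trailing slash so the path is not doubled.
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        // URLEncoder targets form encoding, so spaces become '+'; convert them for use in a path.
        String encoded = URLEncoder.encode(studyId, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(base + "/export/study/" + encoded + ".zip");
    }
}
```

The resulting URI can be opened in a browser or passed to any HTTP client. If `dynamic_study_export_mode` is not enabled, the endpoint is not registered at all.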
367 changes: 367 additions & 0 deletions src/main/java/org/cbioportal/application/file/export/ExportConfig.java

Large diffs are not rendered by default.

@@ -0,0 +1,48 @@
package org.cbioportal.application.file.export;

import org.cbioportal.application.file.export.exporters.ExportDetails;
import org.cbioportal.application.file.export.services.CancerStudyMetadataService;
import org.cbioportal.application.file.export.services.ExportService;
import org.cbioportal.application.file.export.services.VirtualStudyAwareExportService;
import org.cbioportal.application.file.utils.ZipOutputStreamWriterFactory;
import org.cbioportal.legacy.utils.config.annotation.ConditionalOnProperty;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

import java.io.BufferedOutputStream;

@RestController
// TODO: find a way to declare the conditional-on-property check once, in the config only. See
// https://stackoverflow.com/questions/62355615/define-a-spring-restcontroller-via-java-configuration
@ConditionalOnProperty(name = "dynamic_study_export_mode", havingValue = "true")
public class ExportController {

private final VirtualStudyAwareExportService exportService;

public ExportController(VirtualStudyAwareExportService exportService) {
this.exportService = exportService;
}

@GetMapping("/export/study/{studyId}.zip")
public ResponseEntity<StreamingResponseBody> downloadStudyData(@PathVariable String studyId) throws Exception {
if (!exportService.isStudyExportable(studyId)) {
return ResponseEntity.notFound().build();
}

StreamingResponseBody stream = outputStream -> {
try (BufferedOutputStream bos = new BufferedOutputStream(outputStream);
ZipOutputStreamWriterFactory zipFactory = new ZipOutputStreamWriterFactory(bos)) {
exportService.exportData(zipFactory, new ExportDetails(studyId));
}
};

return ResponseEntity.ok()
.contentType(new MediaType("application", "zip"))
.header("Content-Disposition", "attachment; filename=\"" + studyId + ".zip\"")
.body(stream);
}
}
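
The controller above hands a zip-backed writer factory to the export service inside a try-with-resources block. The implementation of `ZipOutputStreamWriterFactory` is not shown in this part of the diff, so the following is only a sketch of the general pattern under that caveat: `SimpleZipWriter` is a hypothetical stand-in that writes one zip entry per exported file using `java.util.zip` and finishes the archive on `close()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical stand-in for a zip-backed writer factory; not the class used in this PR. */
final class SimpleZipWriter implements AutoCloseable {

    private final ZipOutputStream zip;

    SimpleZipWriter(OutputStream target) {
        this.zip = new ZipOutputStream(target, StandardCharsets.UTF_8);
    }

    /** Starts a new zip entry with the given name and writes the text into it. */
    void writeFile(String name, String content) {
        try {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Finishes the archive (writes the central directory) and closes the target stream. */
    @Override
    public void close() {
        try {
            zip.close();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Convenience for the example: builds a one-file archive in memory. */
    static byte[] zipOf(String name, String content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (SimpleZipWriter writer = new SimpleZipWriter(buffer)) {
            writer.writeFile(name, content);
        }
        return buffer.toByteArray();
    }

    /** Reads the entry names back, to check that the archive is well formed. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

Because the archive is only valid once `close()` has written the central directory, a failure part-way through a streamed response leaves the client with a truncated file.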
@@ -0,0 +1,46 @@
package org.cbioportal.application.file.export.exporters;

import org.cbioportal.application.file.export.services.CancerStudyMetadataService;
import org.cbioportal.application.file.model.CancerStudyMetadata;

import java.util.Optional;
import java.util.SequencedMap;

/**
* Exports metadata for a cancer study
*/
public class CancerStudyMetadataExporter extends MetadataExporter<CancerStudyMetadata> {

private final CancerStudyMetadataService cancerStudyMetadataService;

public CancerStudyMetadataExporter(CancerStudyMetadataService cancerStudyMetadataService) {
this.cancerStudyMetadataService = cancerStudyMetadataService;
}

@Override
public String getMetaFilename(CancerStudyMetadata metadata) {
return "meta_study.txt";
}

@Override
protected Optional<CancerStudyMetadata> getMetadata(String studyId) {
return Optional.ofNullable(cancerStudyMetadataService.getCancerStudyMetadata(studyId));
}

@Override
protected void updateMetadata(ExportDetails exportDetails, SequencedMap<String, String> metadataSeqMap) {
super.updateMetadata(exportDetails, metadataSeqMap);
// used primarily while downloading a Virtual Study
CancerStudyMetadata alternativeCancerStudyMetadata = new CancerStudyMetadata();
alternativeCancerStudyMetadata.setCancerStudyIdentifier(exportDetails.getExportWithStudyId());
alternativeCancerStudyMetadata.setName(exportDetails.getExportWithStudyName());
alternativeCancerStudyMetadata.setDescription(exportDetails.getExportAsStudyDescription());
alternativeCancerStudyMetadata.setPmid(exportDetails.getExportWithStudyPmid());
alternativeCancerStudyMetadata.setTypeOfCancer(exportDetails.getExportWithStudyTypeOfCancerId());
alternativeCancerStudyMetadata.toMetadataKeyValues().forEach((key, value) -> {
if (value != null && metadataSeqMap.containsKey(key)) {
metadataSeqMap.put(key, value);
}
});
}
}
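
The override rule in `updateMetadata` above (replace a value only when the override is non-null and the key already exists) can be shown in isolation. The sketch below is a simplified illustration: `MetadataOverrides` is a hypothetical name, and plain `Map` instances replace the `SequencedMap` and `CancerStudyMetadata` types used in the PR.

```java
import java.util.Map;

/** Hypothetical helper illustrating the override rule; not part of this PR. */
final class MetadataOverrides {

    /**
     * Replaces values of keys that are already present in {@code target}.
     * Null override values and keys unknown to {@code target} are skipped,
     * so the set and order of keys in {@code target} never change.
     */
    static void apply(Map<String, String> target, Map<String, String> overrides) {
        overrides.forEach((key, value) -> {
            if (value != null && target.containsKey(key)) {
                target.put(key, value);
            }
        });
    }
}
```

With an insertion-ordered target such as `LinkedHashMap`, re-putting an existing key keeps its position, so the order of lines in the exported metadata file stays stable when a study is exported under alternative details.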
@@ -0,0 +1,61 @@
package org.cbioportal.application.file.export.exporters;

import org.cbioportal.application.file.export.services.CancerStudyMetadataService;
import org.cbioportal.application.file.model.CancerType;
import org.cbioportal.application.file.model.ClinicalAttributesMetadata;
import org.cbioportal.application.file.model.Table;
import org.cbioportal.application.file.model.TableRow;
import org.cbioportal.application.file.utils.CloseableIterator;

import java.util.List;
import java.util.Optional;
import java.util.Set;

public class CancerTypeDataTypeExporter extends DataTypeExporter<ClinicalAttributesMetadata, Table> {

private final CancerStudyMetadataService cancerStudyMetadataService;

public CancerTypeDataTypeExporter(CancerStudyMetadataService cancerStudyMetadataService) {
this.cancerStudyMetadataService = cancerStudyMetadataService;
}

@Override
protected Optional<ClinicalAttributesMetadata> getMetadata(String studyId, Set<String> sampleIds) {
return Optional.of(new ClinicalAttributesMetadata(
studyId,
"CANCER_TYPE",
"CANCER_TYPE"
));
}

@Override
public String getDataFilename(ClinicalAttributesMetadata metadata) {
return "data_cancer_type.txt";
}

@Override
public String getMetaFilename(ClinicalAttributesMetadata metadata) {
return "meta_cancer_type.txt";
}

@Override
protected Table getData(String studyId, Set<String> sampleIds) {
List<CancerType> cancerTypes = cancerStudyMetadataService.getCancerTypeHierarchy(studyId);
var iterator = cancerTypes.iterator();
return new Table(new CloseableIterator<>() {
@Override
public void close() {
}

@Override
public boolean hasNext() {
return iterator.hasNext();
}

@Override
public TableRow next() {
return iterator.next();
}
});
}
}
@@ -0,0 +1,62 @@
package org.cbioportal.application.file.export.exporters;

import org.cbioportal.application.file.export.services.CaseListMetadataService;
import org.cbioportal.application.file.model.CaseListMetadata;
import org.cbioportal.application.file.utils.FileWriterFactory;

import java.util.List;
import java.util.Optional;
import java.util.SequencedMap;

/**
* Exports all case lists for a study
*/
public class CaseListsExporter implements Exporter {

private final CaseListMetadataService caseListMetadataService;

public CaseListsExporter(CaseListMetadataService caseListMetadataService) {
this.caseListMetadataService = caseListMetadataService;
}

@Override
public boolean exportData(FileWriterFactory fileWriterFactory, ExportDetails exportDetails) {
List<CaseListMetadata> caseLists = caseListMetadataService.getCaseListsMetadata(exportDetails.getStudyId(), exportDetails.getSampleIds());
boolean exported = false;
for (CaseListMetadata metadata : caseLists) {
exported |= new CaseListExporter(metadata).exportData(fileWriterFactory, exportDetails);
}
return exported;
}

/**
* Exports a case list metadata to a file
*/
public static class CaseListExporter extends MetadataExporter<CaseListMetadata> {

private final CaseListMetadata caseListMetadata;

public CaseListExporter(CaseListMetadata caseListMetadata) {
this.caseListMetadata = caseListMetadata;
}

@Override
public String getMetaFilename(CaseListMetadata metadata) {
return "case_lists/cases_" + metadata.getCaseListTypeStableId() + ".txt";
}

@Override
protected Optional<CaseListMetadata> getMetadata(String studyId) {
return Optional.of(caseListMetadata);
}


@Override
protected void updateMetadata(ExportDetails exportDetails, SequencedMap<String, String> metadataSeqMap) {
super.updateMetadata(exportDetails, metadataSeqMap);
if (exportDetails.getExportWithStudyId() != null) {
metadataSeqMap.put("stable_id", exportDetails.getExportWithStudyId() + "_" + caseListMetadata.getCaseListTypeStableId());
}
}
}
}
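
The naming conventions used by `CaseListExporter` above can be summarised in a small sketch. `CaseListNames` is a hypothetical helper written for illustration; the file name pattern and the `<studyId>_<caseListType>` stable ID pattern are taken from the code above, while the example values in the usage note are made up.

```java
/** Hypothetical helper summarising the case list naming rules; not part of this PR. */
final class CaseListNames {

    /** File name of a case list inside the exported archive. */
    static String fileName(String caseListTypeStableId) {
        return "case_lists/cases_" + caseListTypeStableId + ".txt";
    }

    /**
     * Stable ID written to the case list file. When the study is exported under an
     * alternative study ID, the stable ID is rebuilt from that ID; otherwise it is kept.
     */
    static String stableId(String originalStableId, String alternativeStudyId, String caseListTypeStableId) {
        if (alternativeStudyId == null) {
            return originalStableId;
        }
        return alternativeStudyId + "_" + caseListTypeStableId;
    }
}
```

For instance, exporting a case list of type `all` under the alternative study ID `my_virtual_study` would give the stable ID `my_virtual_study_all` in the file `case_lists/cases_all.txt`.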
@@ -0,0 +1,42 @@
package org.cbioportal.application.file.export.exporters;

import org.cbioportal.application.file.export.services.ClinicalAttributeDataService;
import org.cbioportal.application.file.model.ClinicalAttribute;
import org.cbioportal.application.file.model.ClinicalAttributeValue;
import org.cbioportal.application.file.model.ClinicalAttributesMetadata;
import org.cbioportal.application.file.model.ClinicalAttributesTable;
import org.cbioportal.application.file.utils.CloseableIterator;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
* Export metadata and data for clinical patient attributes.
*/
public class ClinicalPatientAttributesDataTypeExporter extends DataTypeExporter<ClinicalAttributesMetadata, ClinicalAttributesTable> {

private final ClinicalAttributeDataService clinicalDataAttributeDataService;

public ClinicalPatientAttributesDataTypeExporter(ClinicalAttributeDataService clinicalDataAttributeDataService) {
this.clinicalDataAttributeDataService = clinicalDataAttributeDataService;
}

@Override
protected Optional<ClinicalAttributesMetadata> getMetadata(String studyId, Set<String> sampleIds) {
if (!clinicalDataAttributeDataService.hasClinicalPatientAttributes(studyId, sampleIds)) {
return Optional.empty();
}
return Optional.of(new ClinicalAttributesMetadata(studyId, "CLINICAL", "PATIENT_ATTRIBUTES"));
}

@Override
protected ClinicalAttributesTable getData(String studyId, Set<String> sampleIds) {
List<ClinicalAttribute> clinicalPatientAttributes = new ArrayList<>();
clinicalPatientAttributes.add(ClinicalAttribute.PATIENT_ID);
clinicalPatientAttributes.addAll(clinicalDataAttributeDataService.getClinicalPatientAttributes(studyId));
CloseableIterator<ClinicalAttributeValue> clinicalPatientAttributeValues = clinicalDataAttributeDataService.getClinicalPatientAttributeValues(studyId, sampleIds);
return new ClinicalAttributesTable(clinicalPatientAttributes, clinicalPatientAttributeValues);
}
}
@@ -0,0 +1,43 @@
package org.cbioportal.application.file.export.exporters;

import org.cbioportal.application.file.export.services.ClinicalAttributeDataService;
import org.cbioportal.application.file.model.ClinicalAttribute;
import org.cbioportal.application.file.model.ClinicalAttributeValue;
import org.cbioportal.application.file.model.ClinicalAttributesMetadata;
import org.cbioportal.application.file.model.ClinicalAttributesTable;
import org.cbioportal.application.file.utils.CloseableIterator;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
* Export metadata and data for clinical sample attributes.
*/
public class ClinicalSampleAttributesDataTypeExporter extends DataTypeExporter<ClinicalAttributesMetadata, ClinicalAttributesTable> {

private final ClinicalAttributeDataService clinicalDataAttributeDataService;

public ClinicalSampleAttributesDataTypeExporter(ClinicalAttributeDataService clinicalDataAttributeDataService) {
this.clinicalDataAttributeDataService = clinicalDataAttributeDataService;
}

@Override
protected Optional<ClinicalAttributesMetadata> getMetadata(String studyId, Set<String> sampleIds) {
if (!clinicalDataAttributeDataService.hasClinicalSampleAttributes(studyId, sampleIds)) {
return Optional.empty();
}
return Optional.of(new ClinicalAttributesMetadata(studyId, "CLINICAL", "SAMPLE_ATTRIBUTES"));
}

@Override
protected ClinicalAttributesTable getData(String studyId, Set<String> sampleIds) {
List<ClinicalAttribute> clinicalSampleAttributes = new ArrayList<>();
clinicalSampleAttributes.add(ClinicalAttribute.PATIENT_ID);
clinicalSampleAttributes.add(ClinicalAttribute.SAMPLE_ID);
clinicalSampleAttributes.addAll(clinicalDataAttributeDataService.getClinicalSampleAttributes(studyId));
CloseableIterator<ClinicalAttributeValue> clinicalSampleAttributeValues = clinicalDataAttributeDataService.getClinicalSampleAttributeValues(studyId, sampleIds);
return new ClinicalAttributesTable(clinicalSampleAttributes, clinicalSampleAttributeValues);
}
}
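
Both clinical exporters pass a list of attributes (the header columns) and an iterator of attribute values to `ClinicalAttributesTable`, whose implementation is not shown in this part of the diff. One plausible way to turn such long-format values into rows is sketched below. It assumes the values arrive sorted by sample ID, which matches the commit "Make sure the rows are ordered by sample/patient IDs" but is not confirmed by the code shown here. `ClinicalPivot` and its `Value` class are hypothetical simplifications, and the sketch collects rows into a list for brevity rather than streaming them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of pivoting long-format clinical values into rows; not part of this PR. */
final class ClinicalPivot {

    /** One (sample, attribute, value) triple; a simplification of ClinicalAttributeValue. */
    static final class Value {
        final String sampleId;
        final String attributeId;
        final String value;

        Value(String sampleId, String attributeId, String value) {
            this.sampleId = sampleId;
            this.attributeId = attributeId;
            this.value = value;
        }
    }

    /**
     * Groups consecutive values with the same sample ID into one row.
     * The first column is the sample ID; attributes without a value become empty strings.
     * Assumes the input is sorted by sample ID.
     */
    static List<List<String>> toRows(List<String> attributeIds, Iterator<Value> sortedValues) {
        List<List<String>> rows = new ArrayList<>();
        String currentSample = null;
        String[] currentRow = null;
        while (sortedValues.hasNext()) {
            Value v = sortedValues.next();
            if (!v.sampleId.equals(currentSample)) {
                if (currentRow != null) {
                    rows.add(List.of(currentRow)); // the previous sample is complete
                }
                currentSample = v.sampleId;
                currentRow = new String[attributeIds.size() + 1];
                Arrays.fill(currentRow, "");
                currentRow[0] = currentSample;
            }
            int column = attributeIds.indexOf(v.attributeId);
            if (column >= 0 && v.value != null) {
                currentRow[column + 1] = v.value;
            }
        }
        if (currentRow != null) {
            rows.add(List.of(currentRow)); // flush the last sample
        }
        return rows;
    }
}
```

Because each row depends only on a run of consecutive values, the same loop can emit rows one by one instead of collecting them, which is what keeps memory use flat for large studies.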