Commit: PIPE-80-megamap-from-hic

ottojolanki authored Jul 27, 2022
2 parents 95d886b + 2c17b40 commit 0cd51d2
Showing 6 changed files with 109 additions and 112 deletions.
diploidify.wdl: 10 changes (5 additions, 5 deletions)
@@ -4,9 +4,9 @@ import "./hic.wdl"

workflow diploidify {
meta {
- version: "1.14.3"
- caper_docker: "encodedcc/hic-pipeline:1.14.3"
- caper_singularity: "docker://encodedcc/hic-pipeline:1.14.3"
+ version: "1.15.0"
+ caper_docker: "encodedcc/hic-pipeline:1.15.0"
+ caper_singularity: "docker://encodedcc/hic-pipeline:1.15.0"
}

input {
@@ -33,8 +33,8 @@ workflow diploidify {
Int? create_diploid_dhs_ram_gb
Int? create_diploid_dhs_disk_size_gb

- String docker = "encodedcc/hic-pipeline:1.14.3"
- String singularity = "docker://encodedcc/hic-pipeline:1.14.3"
+ String docker = "encodedcc/hic-pipeline:1.15.0"
+ String singularity = "docker://encodedcc/hic-pipeline:1.15.0"
}

RuntimeEnvironment runtime_environment = {
docker/hic-pipeline/Dockerfile: 12 changes (9 additions, 3 deletions)
@@ -103,8 +103,8 @@ RUN git clone https://github.com/aidenlab/EigenVector.git && \

RUN git clone https://github.com/ENCODE-DCC/kentUtils_bin_v381.git && \
cd kentUtils_bin_v381/bin && \
- chmod +x wigToBigWig bedGraphToBigWig && \
- mv wigToBigWig bedGraphToBigWig /usr/local/bin && \
+ chmod +x wigToBigWig bedGraphToBigWig bigWigMerge && \
+ mv wigToBigWig bedGraphToBigWig bigWigMerge /usr/local/bin && \
cd ../../ && \
rm -rf kentUtils_bin_v381

@@ -115,7 +115,7 @@ RUN git clone --branch encode https://github.com/theaidenlab/juicer.git && \
chmod +x CPU/* CPU/common/* misc/* && \
find -mindepth 1 -maxdepth 1 -type d -not -name "CPU" -not -name ".git" -not -name "misc" | xargs rm -rf

- # Install Juicer tools
+ # Install Juicer tools 2.13 and 2.16
RUN curl \
-L \
https://github.com/aidenlab/Juicebox/releases/download/v2.13.06/juicer_tools_2.13.06.jar \
@@ -125,6 +125,12 @@ RUN curl \
ln -s /opt/juicer/CPU/common/juicer_tools /opt/juicer/CPU/juicer_tools && \
ln -s /opt/juicer/CPU/common/juicer_tools.jar /opt/juicer/CPU/juicer_tools.jar

+ RUN curl \
+ -L \
+ https://github.com/aidenlab/Juicebox/releases/download/v2.16.00/juicer_tools_2.16.00.jar \
+ -o /opt/juicer/CPU/juicer_tools_2.16.00.jar && \
+ chmod 666 /opt/juicer/CPU/juicer_tools_2.16.00.jar

RUN curl \
-LO \
https://github.com/aidenlab/Juicebox/releases/download/v.2.14.00/feature_tools.jar
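Note that the 2.16 jar is not marked executable or symlinked onto the `PATH` the way the 2.13 jar is; the new `sum_hic_files` task in `megamap.wdl` below invokes it directly via `java -jar`. A quick smoke test of the built image, under two assumptions not stated in the commit (the published tag matches the version bump, and the jar prints a usage banner when run without arguments):

```bash
# Hypothetical check that the 2.16 jar landed in the image (not part of the commit)
docker run --rm encodedcc/hic-pipeline:1.15.0 \
  java -jar /opt/juicer/CPU/juicer_tools_2.16.00.jar
```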
docs/reference.md: 21 changes (6 additions, 15 deletions)
@@ -39,7 +39,7 @@

[genophase.wdl](../genophase.wdl) takes in BAM files and produces phased variant calls. It is based on [hic2gatk](https://github.com/aidenlab/hic2gatk) and the [3D-DNA pipeline](https://github.com/aidenlab/3d-dna/).

- [megamap.wdl](../megamap.wdl) takes in BAM files. It is based on [Juicer](https://github.com/aidenlab/juicer).
+ [megamap.wdl](../megamap.wdl) takes in `.hic` and bigWig files. It is based on [Juicer](https://github.com/aidenlab/juicer).

[diploidify.wdl](../diploidify.wdl) takes in BAM files and produces phased variant calls. It is based on [Juicer](https://github.com/aidenlab/juicer), [hic2gatk](https://github.com/aidenlab/hic2gatk), and the [3D-DNA pipeline](https://github.com/aidenlab/3d-dna/).

@@ -271,18 +271,17 @@ The two haplotype-specific *corrected* accessibility tracks are available under

## Megamap

- This is the workflow contained in [megamap.wdl](../megamap.wdl). It is very similar to the main `hic.wdl`. However, it starts from deduplicated bams from a previous run of the `hic.wdl` workflow. It also takes in the `.hic` files from the same previous runs as the bams in order to be able to calculate stats for the merged data.
+ This is the workflow contained in [megamap.wdl](../megamap.wdl). It is very similar to the main `hic.wdl`; however, it starts from the `.hic` files and raw DNA accessibility signal bigWig files produced by previous runs of the `hic.wdl` workflow.
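For reference, the two merge operations the workflow now performs reduce to a few commands; a minimal sketch using the same tools the new `megamap.wdl` tasks call (file names and the thread count are illustrative):

```bash
# Merge the per-run raw accessibility bigWigs (what the merge_bigwigs task does)
bigWigMerge run1.bigWig run2.bigWig combined.bedGraph
sort -k1,1 -k2,2n combined.bedGraph > combined.sorted.bedGraph
bedGraphToBigWig combined.sorted.bedGraph GRCh38.chrom.sizes merged.bw

# Sum the per-run .hic files into one contact map (what the sum_hic_files task does)
java -jar /opt/juicer/CPU/juicer_tools_2.16.00.jar sum \
  --threads 16 summed.hic run1.hic run2.hic
```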

### Inputs

- The required inputs are `megamap.assembly_name`, `megamap.bams`, `megamap.chrom_sizes`, and `megamap.hic_files`. An example input is below:
+ The required inputs are `megamap.bigwig_files`, `megamap.chrom_sizes`, and `megamap.hic_files`. An example input is below:

```json
{
"megamap.assembly_name": "GRCh38",
"megamap.bams": [
"https://www.encodeproject.org/files/ENCFF194KEQ/@@download/ENCFF194KEQ.bam",
"https://www.encodeproject.org/files/ENCFF169MUQ/@@download/ENCFF169MUQ.bam"
"megamap.bigwig_files": [
"https://www.encodeproject.org/files/ENCFF370ZZP/@@download/ENCFF370ZZP.bigWig",
"https://www.encodeproject.org/files/ENCFF169MUQ/@@download/ENCFF745TVA.bigWig"
],
"megamap.chrom_sizes": "https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz",
"megamap.hic_files": [
@@ -313,14 +312,6 @@ Use the WDL `make_restriction_site_locations.wdl` to generate the restriction site locations file
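For example, with `caper` installed (the inputs file name is illustrative):

```bash
# Generate the restriction site locations file for use as the pipeline's
# restriction_sites input
caper run make_restriction_site_locations.wdl -i restriction_site_inputs.json
```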

If a workflow failed, you will see a `Failed` status for that ID in `caper list`. To see what went wrong, run `caper debug WORKFLOW_ID`, replacing `WORKFLOW_ID` with the actual workflow ID given by `caper`; it will find the stdout/stderr of the failed tasks and print them to the terminal. Sometimes this information is not enough, and you may need to find the backend logs for the failed task, which capture issues that may have occurred before the task even started executing. The paths to these are available in the workflow metadata JSON, which you can get via `caper metadata WORKFLOW_ID`. The paths to the logs are then available under `.calls.[task_name].[shard_index].backendLogs`.
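A sketch of that lookup with `jq`, assuming the Google Cloud backend, where each shard records a `backendLogs.log` path:

```bash
# Dump the metadata, then list the backend log path for every task shard
caper metadata WORKFLOW_ID > metadata.json
jq -r '.calls | to_entries[] | .key as $task | .value[]
  | "\($task)[\(.shardIndex)]: \(.backendLogs.log // "no backend log")"' metadata.json
```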

- ## Failure during create_hic
-
- If you see `Killed` in the logs, try reducing the number of CPUs for `create_hic` task by setting `"hic.create_hic_num_cpus": 4` in your input JSON.
-
- ## Failure during `hic.normalize_assembly_name`
-
- If this task fails, it almost certainly means something is wrong with your installation. Check that `docker` is installed and running if running locally.

## Generic out of memory (OOM) issue

If you see `Killed` in the logs for the failed task, the machine ran out of RAM. Double the RAM for that failed task by updating the appropriate input (`hic.[task_name]_ram_gb`). Note that on Google Cloud the maximum allowed RAM is ~624 GB; requesting more than that will cause the workflow to fail.
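For example, if a `create_hic` shard was killed after running with 64 GB, the doubled request would look like this in the input JSON (the numbers are illustrative):

```json
{
  "hic.create_hic_ram_gb": 128
}
```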
genophase.wdl: 10 changes (5 additions, 5 deletions)
@@ -7,9 +7,9 @@ struct RuntimeEnvironment {

workflow genophase {
meta {
- version: "1.14.3"
- caper_docker: "encodedcc/hic-pipeline:1.14.3"
- caper_singularity: "docker://encodedcc/hic-pipeline:1.14.3"
+ version: "1.15.0"
+ caper_docker: "encodedcc/hic-pipeline:1.15.0"
+ caper_singularity: "docker://encodedcc/hic-pipeline:1.15.0"
croo_out_def: "https://raw.githubusercontent.com/ENCODE-DCC/hic-pipeline/dev/croo_out_def.json"
}

@@ -29,8 +29,8 @@ workflow genophase {
Int? concatenate_bams_disk_size_gb
Boolean no_phasing = false

- String docker = "encodedcc/hic-pipeline:1.14.3"
- String singularity = "docker://encodedcc/hic-pipeline:1.14.3"
+ String docker = "encodedcc/hic-pipeline:1.15.0"
+ String singularity = "docker://encodedcc/hic-pipeline:1.15.0"
}

RuntimeEnvironment runtime_environment = {
hic.wdl: 14 changes (7 additions, 7 deletions)
@@ -19,9 +19,9 @@ struct RuntimeEnvironment {

workflow hic {
meta {
- version: "1.14.3"
- caper_docker: "encodedcc/hic-pipeline:1.14.3"
- caper_singularity: "docker://encodedcc/hic-pipeline:1.14.3"
+ version: "1.15.0"
+ caper_docker: "encodedcc/hic-pipeline:1.15.0"
+ caper_singularity: "docker://encodedcc/hic-pipeline:1.15.0"
croo_out_def: "https://raw.githubusercontent.com/ENCODE-DCC/hic-pipeline/dev/croo_out_def.json"
description: "ENCODE Hi-C pipeline, see https://github.com/ENCODE-DCC/hic-pipeline for details."
}
@@ -77,10 +77,10 @@ workflow hic {
Int? create_accessibility_track_disk_size_gb
String assembly_name = "undefined"

- String docker = "encodedcc/hic-pipeline:1.14.3"
- String singularity = "docker://encodedcc/hic-pipeline:1.14.3"
- String delta_docker = "encodedcc/hic-pipeline:1.14.3_delta"
- String hiccups_docker = "encodedcc/hic-pipeline:1.14.3_hiccups"
+ String docker = "encodedcc/hic-pipeline:1.15.0"
+ String singularity = "docker://encodedcc/hic-pipeline:1.15.0"
+ String delta_docker = "encodedcc/hic-pipeline:1.15.0_delta"
+ String hiccups_docker = "encodedcc/hic-pipeline:1.15.0_hiccups"
}

RuntimeEnvironment runtime_environment = {
megamap.wdl: 154 changes (77 additions, 77 deletions)
@@ -4,17 +4,15 @@ import "./hic.wdl"

workflow megamap {
meta {
- version: "1.14.3"
- caper_docker: "encodedcc/hic-pipeline:1.14.3"
- caper_singularity: "docker://encodedcc/hic-pipeline:1.14.3"
+ version: "1.15.0"
+ caper_docker: "encodedcc/hic-pipeline:1.15.0"
+ caper_singularity: "docker://encodedcc/hic-pipeline:1.15.0"
}

input {
- Array[File] bams
+ Array[File] bigwig_files
Array[File] hic_files
- File? restriction_sites
File chrom_sizes
- String assembly_name = "undefined"

# Parameters
Int quality = 30
@@ -23,21 +21,15 @@
Boolean intact = true

# Resource parameters
- Int? create_hic_num_cpus
- Int? create_hic_ram_gb
- Int? create_hic_juicer_tools_heap_size_gb
- Int? create_hic_disk_size_gb
Int? add_norm_num_cpus
Int? add_norm_ram_gb
Int? add_norm_disk_size_gb
- Int? create_accessibility_track_ram_gb
- Int? create_accessibility_track_disk_size_gb

# Pipeline images
- String docker = "encodedcc/hic-pipeline:1.14.3"
- String singularity = "docker://encodedcc/hic-pipeline:1.14.3"
- String delta_docker = "encodedcc/hic-pipeline:1.14.3_delta"
- String hiccups_docker = "encodedcc/hic-pipeline:1.14.3_hiccups"
+ String docker = "encodedcc/hic-pipeline:1.15.0"
+ String singularity = "docker://encodedcc/hic-pipeline:1.15.0"
+ String delta_docker = "encodedcc/hic-pipeline:1.15.0_delta"
+ String hiccups_docker = "encodedcc/hic-pipeline:1.15.0_hiccups"
}

RuntimeEnvironment runtime_environment = {
Expand All @@ -59,27 +51,10 @@ workflow megamap {
Array[Int] delta_resolutions = if intact then [5000, 2000, 1000] else [5000, 10000]
Array[Int] create_hic_resolutions = if intact then create_hic_intact_resolutions else create_hic_in_situ_resolutions

- call hic.normalize_assembly_name as normalize_assembly_name { input:
- assembly_name = assembly_name,
- runtime_environment = runtime_environment,
- }
- call hic.merge as merge { input:
- bams = bams,
- runtime_environment = runtime_environment,
- }

- call hic.bam_to_pre as bam_to_pre { input:
- bam = merge.bam,
- quality = quality,
- runtime_environment = runtime_environment,
- }
- call hic.create_accessibility_track as accessibility { input:
- pre = bam_to_pre.pre,
+ call merge_bigwigs as accessibility { input:
+ bigwig_files = bigwig_files,
chrom_sizes = chrom_sizes,
- ram_gb = create_accessibility_track_ram_gb,
- disk_size_gb = create_accessibility_track_disk_size_gb,
runtime_environment = runtime_environment,
}
@@ -88,51 +63,13 @@
runtime_environment = runtime_environment,
}
- if (normalize_assembly_name.assembly_is_supported) {
- call hic.create_hic as create_hic { input:
- pre = bam_to_pre.pre,
- pre_index = bam_to_pre.index,
- restriction_sites = restriction_sites,
- quality = quality,
- stats = merge_stats_from_hic_files.merged_stats,
- stats_hists = merge_stats_from_hic_files.merged_stats_hists,
- resolutions = create_hic_resolutions,
- assembly_name = normalize_assembly_name.normalized_assembly_name,
- num_cpus = create_hic_num_cpus,
- ram_gb = create_hic_ram_gb,
- juicer_tools_heap_size_gb = create_hic_juicer_tools_heap_size_gb,
- disk_size_gb = create_hic_disk_size_gb,
- runtime_environment = runtime_environment,
- }
- }
- if (!normalize_assembly_name.assembly_is_supported) {
- call hic.create_hic as create_hic_with_chrom_sizes { input:
- pre = bam_to_pre.pre,
- pre_index = bam_to_pre.index,
- restriction_sites = restriction_sites,
- quality = quality,
- stats = merge_stats_from_hic_files.merged_stats,
- stats_hists = merge_stats_from_hic_files.merged_stats_hists,
- resolutions = create_hic_resolutions,
- assembly_name = assembly_name,
- num_cpus = create_hic_num_cpus,
- ram_gb = create_hic_ram_gb,
- juicer_tools_heap_size_gb = create_hic_juicer_tools_heap_size_gb,
- disk_size_gb = create_hic_disk_size_gb,
- chrsz = chrom_sizes,
- runtime_environment = runtime_environment,
- }
+ call sum_hic_files { input:
+ hic_files = hic_files,
+ runtime_environment = runtime_environment,
+ }
- File unnormalized_hic_file = select_first([
- if (defined(create_hic.output_hic))
- then create_hic.output_hic
- else create_hic_with_chrom_sizes.output_hic
- ])
call hic.add_norm as add_norm { input:
- hic = unnormalized_hic_file,
+ hic = sum_hic_files.summed_hic,
quality = quality,
num_cpus = add_norm_num_cpus,
ram_gb = add_norm_ram_gb,
@@ -251,3 +188,66 @@ task merge_stats_from_hic_files {
singularity: runtime_environment.singularity
}
}
+ task merge_bigwigs {
+     input {
+         Array[File] bigwig_files
+         File chrom_sizes
+         RuntimeEnvironment runtime_environment
+     }
+     command <<<
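+         # bigWigMerge sums the signal from all of the input bigWigs into one bedGraph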
+         bigWigMerge \
+             ~{sep=" " bigwig_files} \
+             combined.bedGraph
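+         # bedGraphToBigWig requires its input sorted by chromosome, then start position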
+         sort -k1,1 -k2,2n combined.bedGraph > combined.sorted.bedGraph
+         bedGraphToBigWig \
+             combined.sorted.bedGraph \
+             ~{chrom_sizes} \
+             merged.bw
+     >>>
+
+     output {
+         File merged_bigwig = "merged.bw"
+     }
+
+     runtime {
+         cpu: 4
+         memory: "32 GB"
+         disks: "local-disk 500 HDD"
+         docker: runtime_environment.docker
+         singularity: runtime_environment.singularity
+     }
+ }
+ task sum_hic_files {
+     input {
+         Array[File] hic_files
+         Int num_cpus = 16
+         Int ram_gb = 100
+         RuntimeEnvironment runtime_environment
+     }
+     command <<<
+         set -euo pipefail
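+         # Juicer Tools 2.16 "sum" merges the input .hic files into a single contact
+         # map; normalization vectors are added downstream by the hic.add_norm task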
+         java \
+             -jar \
+             /opt/juicer/CPU/juicer_tools_2.16.00.jar \
+             sum \
+             --threads ~{num_cpus} \
+             summed.hic \
+             ~{sep=" " hic_files}
+     >>>
+
+     output {
+         File summed_hic = "summed.hic"
+     }
+
+     runtime {
+         cpu: "~{num_cpus}"
+         disks: "local-disk 500 HDD"
+         memory: "~{ram_gb} GB"
+         docker: runtime_environment.docker
+         singularity: runtime_environment.singularity
+     }
+ }
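With the example inputs from `docs/reference.md` above saved to a file, the updated workflow is launched the same way as the others (a sketch; assumes a configured `caper`, and the input file name is illustrative):

```bash
caper run megamap.wdl -i megamap_input.json
```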
