Many processes perform file manipulation in bash scripts, which leads to temporary files being created. It would be more efficient and scalable to wrap this logic in a Python script.
For example, APPEND_CLUSTERS() has the following script block:
script:
"""
# Function to get the first address line from the files, handling gzipped files
get_address() {
    if [[ "\${1##*.}" == "gz" ]]; then
        zcat "\$1" | awk 'NR>1 {print \$2}' | head -n 1
    else
        awk 'NR>1 {print \$2}' "\$1" | head -n 1
    fi
}

# Check if two files have consistent delimiter splits in the address column
init_splits=\$(get_address "${initial_clusters}" | awk -F '${params.gm_delimiter}' '{print NF}')
add_splits=\$(get_address "${additional_clusters}" | awk -F '${params.gm_delimiter}' '{print NF}')
if [ "\$init_splits" != "\$add_splits" ]; then
    echo "Error: Address levels do not match between initial_clusters and --db_clusters."
    exit 1
fi

# Add a "source" column to differentiate the reference profiles and additional profiles
csvtk mutate2 -t -n source -e " 'ref' " ${initial_clusters} > reference_clusters_source.tsv
csvtk mutate2 -t -n source -e " 'db' " ${additional_clusters} > additional_clusters_source.tsv

# Combine profiles from both the reference and database into a single file
csvtk concat -t reference_clusters_source.tsv additional_clusters_source.tsv | csvtk sort -t -k id > combined_profiles.tsv

# Calculate the frequency of each sample_id across both sources
csvtk freq -t -f id combined_profiles.tsv > sample_counts.tsv

# For any sample_id that appears in both the reference and database, add a 'db_' prefix to the sample_id from the database
csvtk join -t -f id combined_profiles.tsv sample_counts.tsv | \
    csvtk mutate2 -t -n id -e '(\$source == "db" && \$frequency > 1) ? "db_" + \$id : \$id' | \
    csvtk cut -t -f id,address > reference_clusters.tsv
"""
}
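As a rough illustration of the proposal, the script block above could be sketched in Python along the following lines. This is only a sketch, not the project's implementation: the function names (`open_text`, `read_clusters`, `address_levels`, `append_clusters`) are hypothetical, it assumes both inputs are tab-separated files with `id` and `address` header columns (the columns the csvtk commands refer to), and it keeps rows in input order rather than sorting by `id`. Everything is held in memory, so no intermediate files are written.

```python
import csv
import gzip
from collections import Counter
from pathlib import Path


def open_text(path):
    """Open a plain or gzipped text file for reading (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def read_clusters(path):
    """Read (id, address) pairs from a TSV with 'id' and 'address' columns."""
    with open_text(path) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [(row["id"], row["address"]) for row in reader]


def address_levels(rows, delimiter):
    """Count delimiter-separated levels in the first address (0 if no rows)."""
    return len(rows[0][1].split(delimiter)) if rows else 0


def append_clusters(initial_path, additional_path, output_path, delimiter):
    """Combine reference and database clusters, prefixing duplicated db ids."""
    initial = read_clusters(initial_path)
    additional = read_clusters(additional_path)

    # Same consistency check as the bash version, on the first address only.
    if address_levels(initial, delimiter) != address_levels(additional, delimiter):
        raise ValueError(
            "Address levels do not match between initial_clusters and --db_clusters."
        )

    # Frequency of each id across both sources (mirrors `csvtk freq`).
    counts = Counter(sample_id for sample_id, _ in initial + additional)

    combined = list(initial)
    for sample_id, address in additional:
        # Database ids seen more than once overall get a 'db_' prefix.
        if counts[sample_id] > 1:
            sample_id = "db_" + sample_id
        combined.append((sample_id, address))

    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["id", "address"])
        writer.writerows(combined)
```

In the process, this might be invoked as something like `append_clusters(initial_clusters, additional_clusters, "reference_clusters.tsv", params.gm_delimiter)` from a small command-line wrapper, with the delimiter passed in from the pipeline parameter rather than assumed.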
This applies to the current release of gasnomenclature (0.3.0).