Adding gzip and VCF to the schema #89

Open
wants to merge 6 commits into
base: dev
stschiff
Member

@stschiff stschiff commented Jan 16, 2025

@nevrome I added the gzip-option to the README.md. How can I update the pdf? Can we perhaps add a little documentation for that? I think it's just some quarto command, right?

@TCLamnidis
Member

Is gzipping only allowed for geno and snp, but not ind?

I know fam files are not large enough to need it, but "in for a penny, in for a pound"?

@stschiff
Member Author

Yes, of course we can make this work as well. Right now, our implementation only supports gzip for the files that go through stream processing, namely the SNP, Geno and VCF files. I will create an issue on sequence-formats to allow gzip there as well.
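
For illustration only, here is a minimal Python sketch of the general idea of streaming a file that may or may not be gzip-compressed. It is not trident's or sequence-formats' implementation (those are not shown in this thread); the function name `stream_lines` and the rule of detecting compression from a `.gz` suffix are assumptions made for the example.

```python
import gzip
from pathlib import Path
from typing import Iterator


def stream_lines(path: str) -> Iterator[str]:
    """Yield lines from a plain-text or gzip-compressed file, one at a time."""
    # Assumption: compression is detected from the ".gz" suffix alone,
    # not from the file's magic bytes.
    opener = gzip.open if Path(path).suffix == ".gz" else open
    # Text mode ("rt") decompresses and decodes lazily, so the whole file
    # is never held in memory at once.
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Because the reader only ever holds one line at a time, the same loop can consume a large `.snp` or `.vcf` file whether or not it is compressed. Files that are read in full rather than streamed would need their own handling, which is the gap discussed above.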

@nevrome nevrome changed the base branch from master to dev January 17, 2025 12:15
@nevrome
Member

nevrome commented Jan 17, 2025

About building the .pdf: I think you just have to render this quarto document: https://github.com/poseidon-framework/poseidon-schema/blob/master/toPDFviaQuarto.qmd

@stschiff
Member Author

stschiff commented Jan 20, 2025

OK, I just tried that, but unfortunately I cannot run quarto render here because of failing R dependencies (specifically, the package systemfonts is required and fails to install on both my Intel iMac and my ARM64 MacBook).

@stschiff stschiff self-assigned this Jan 21, 2025
@TCLamnidis
Member

I just pushed the rendered PDF

@stschiff
Member Author

Thanks. I've just made two minor changes, and I've decided that I would also like to add VCF support to this schema update now. I have a PR in the queue to update trident for VCF writing support, so I think I will just keep this PR open until I've made that change, too. I will ping you, @TCLamnidis, to regenerate the PDF later.

@stschiff stschiff changed the title added gzip to the schema Adding gzip and VCF to the schema Jan 31, 2025
@stschiff
Member Author

stschiff commented Feb 5, 2025

OK, I've added VCF. Could you please re-render the PDF, @TCLamnidis?

@stschiff stschiff marked this pull request as draft February 5, 2025 20:22
@TCLamnidis
Member

@stschiff Done!

@stschiff stschiff marked this pull request as ready for review February 10, 2025 15:09
Member

@nevrome nevrome left a comment

  1. I wonder if we should add some more details about the respective file types. trident has this --inPlinkPopName flag, because .fam files can differ slightly. .bim files can have .s for the bases, which trident does not support (?). And for .vcf files trident only supports a certain subset of features, right? Maybe we should document/specify/enshrine some of these limitations here in the schema?

  2. I suggest we don't render the .pdf version in these feature branches. We can do that when everything is collected in the dev branch 👍

@stschiff
Member Author

Yes, I guess we could indeed say a bit more about these formats. It's not entirely trivial to document, though, as there are endless possibilities within these formats and I don't know in all cases what our limits are. I'll try to write something.

3 participants