Define JSON-Schema for the OBO YAML metadata format #663
Comments
We use kwalify for the PURL system. I was initially quite happy with it, but this nasty issue has undermined my confidence: OBOFoundry/purl.obolibrary.org#290 |
@jamesaoverton If I'm reading correctly, I think there may be an easy workaround (OBOFoundry/purl.obolibrary.org#290 (comment)) |
Hi @cmungall - I started a first draft of the actual JSON Schema based on @rvita's initial work on the ontology metadata. I wrote a quick Python script to implement it, but I wasn't sure how you wanted to configure this with the other tests that are already being run on Travis. Also, I wanted to get some feedback on validation of the |
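A minimal sketch of the kind of validation script described above, assuming the schema is stored in its own file and the PyYAML and jsonschema libraries are available; the actual script and registry layout may differ:

```python
# Minimal sketch: validate one ontology metadata YAML file against a JSON Schema.
# Paths are passed on the command line; this is illustrative, not the actual script.
import sys

import jsonschema
import yaml


def validate_entry(metadata_path, schema_path):
    """Return True if the metadata file conforms to the schema."""
    with open(schema_path) as f:
        schema = yaml.safe_load(f)  # the schema itself can be written as JSON or YAML
    with open(metadata_path) as f:
        metadata = yaml.safe_load(f)
    try:
        jsonschema.validate(instance=metadata, schema=schema)
    except jsonschema.ValidationError as err:
        print(f"{metadata_path}: {err.message}", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    sys.exit(0 if validate_entry(sys.argv[1], sys.argv[2]) else 1)
```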
I think using a Python lib for the JSON Schema validation is a good idea. We could change the license field to take a URI rather than an object as the parameter. Some of the initial modeling decisions were driven by what made things easy in the UI within the Jekyll framework, e.g. OBOFoundry.github.io/_layouts/ontology_detail.html (lines 125 to 143 at 27c66d3) and OBOFoundry.github.io/_includes/ontology_table.html (lines 25 to 27 at d774c1e). But this was probably a poor decision. We could make a one-time change and synchronize it with a UI change in the Jekyll templates. To avoid repetitive code we could have a YAML object in _config_header.yml that has a lookup between URIs and short labels and logos, and use that in the code above. |
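To illustrate the lookup idea, a sketch of the mapping shaped as a Python dict; in practice it would live as a YAML object in _config_header.yml for the Jekyll/Liquid templates, and the logo file names here are made up:

```python
# Illustrative lookup from license URIs to short labels and logos.
# In practice this would live as YAML in _config_header.yml; the logo
# file names are hypothetical.
LICENSE_LOOKUP = {
    "https://creativecommons.org/publicdomain/zero/1.0/": {
        "label": "CC0 1.0",
        "logo": "cc-zero.png",
    },
    "https://creativecommons.org/licenses/by/3.0/": {
        "label": "CC BY 3.0",
        "logo": "cc-by.png",
    },
    "https://creativecommons.org/licenses/by/4.0/": {
        "label": "CC BY 4.0",
        "logo": "cc-by.png",
    },
}


def license_display(uri):
    """Return the short label and logo for a license URI, if known."""
    return LICENSE_LOOKUP.get(uri, {"label": uri, "logo": None})
```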
I think that […]. There are some exceptions that we're discussing and have to make some decisions about. They will need URIs and labels for now. |
So it looks like we are baking the license choices into the JSON Schema. I can see some advantages to this, but it feels like we are conflating syntactic/structural schema conformance with principle conformance. Some reasons we may want to separate the two:
Is it possible to include simple logic in a JSON Schema? I would be in favor of restricting licenses at the schema level if it were possible to say something like […]. This also goes for other kinds of principle-conformance checks, e.g. checking that an ontology has documented users. Also, I just remembered that over a year ago I wrote this: […] to address the auto-review proposed in #288. This is largely supplanted by the JSON Schema in #710. However, it also checks the usage field and reports if the ontology has no documented usages. I feel this is something we want to check at the principle-conformance level rather than the structure-conformance level, so some additional dumb Python on top of the schema checking may be useful. |
I'm not set on the licenses in the schema - I like James's idea of enforcing which labels are accepted, though. JSON Schema includes if/then statements, so we could do something like that. I like that this solves the issue of non-conforming ontologies, while going forward we would enforce the CC-BY/CC0 license (as James said, enforce the accepted labels). |
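A rough sketch of how JSON Schema's if/then keywords (draft-07 and later) could be used for this, shown here as a Python dict checked with the jsonschema library; the activity_status field name and the exact license list are illustrative assumptions, not settled choices:

```python
# Sketch: conditionally restrict license URLs with JSON Schema if/then (draft-07+).
# Field names and the license list are illustrative, not the registry's final choices.
import jsonschema

LICENSE_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "activity_status": {"type": "string"},
        "license": {
            "type": "object",
            "properties": {
                "url": {"type": "string"},
                "label": {"type": "string"},
            },
        },
    },
    # If the entry is marked active, require one of the approved license URLs.
    "if": {
        "properties": {"activity_status": {"const": "active"}},
        "required": ["activity_status"],
    },
    "then": {
        "properties": {
            "license": {
                "properties": {
                    "url": {
                        "enum": [
                            "https://creativecommons.org/publicdomain/zero/1.0/",
                            "https://creativecommons.org/licenses/by/3.0/",
                            "https://creativecommons.org/licenses/by/4.0/",
                        ]
                    }
                },
                "required": ["url"],
            }
        },
        "required": ["license"],
    },
}

entry = {
    "activity_status": "active",
    "license": {"url": "https://creativecommons.org/licenses/by/4.0/",
                "label": "CC BY 4.0"},
}
jsonschema.validate(instance=entry, schema=LICENSE_SCHEMA)  # passes
```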
This may be of interest: |
As we've been reviewing how this test is working out, we realized that there are some ontologies that may not necessarily conform to all the checks, but that we want to "grandfather" in. One solution to this is to have different validation levels, as a tag in the metadata like […]. With the numbers, it would also be easy to add increasingly strict validation by adding another level. The full validation would be the current schema plus a required […]. This is kind of similar to what @cmungall was saying with the |
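A sketch of how that could work, with the level tag selecting which schema files apply; the field name validation_level, the numbering, and the schema file names are all hypothetical:

```python
# Sketch: pick which schemas to apply based on a level tag in the metadata.
# The "validation_level" field, level numbering, and schema file names are hypothetical.
import jsonschema
import yaml

SCHEMAS_BY_LEVEL = {
    1: ["schema/minimal.json"],                                          # grandfathered entries
    2: ["schema/minimal.json", "schema/standard.json"],
    3: ["schema/minimal.json", "schema/standard.json", "schema/strict.json"],
}


def validate_at_level(metadata):
    """Apply every schema required for the entry's validation level."""
    level = metadata.get("validation_level", 3)  # default to the strictest level
    for schema_path in SCHEMAS_BY_LEVEL[level]:
        with open(schema_path) as f:
            jsonschema.validate(instance=metadata, schema=yaml.safe_load(f))
```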
Do we plan to extend the schema files to cover the entire contents of the YAML? Currently there are mini-schemas for a subset of fields, which is a great start, but bad metadata can still sneak through, e.g. #1107. I can help with this; just checking this was the plan. |
My assumption was that our metadata is open-ended/open-world, like most semantic web stuff. So certain fields must conform to the schema, but we ignore stuff that we don't recognize. Only recognized fields will end up on the HTML pages, but I guess everything is in the RDF/Turtle/JSONLD versions. It should be easy enough to enforce a whitelist of fields, if we want. There are advantages to that. |
For the OBO-managed metadata I think being stricter is better. It's easy for us to extend the schema if we need new fields. FWIW, on other projects we have been moving towards stricter, closed ShEx schemas, as the open-ended nature bites us as we try to build robust software around our semantic web infrastructure, but ymmv.
|
Ok, that's fine with me. |
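For concreteness, JSON Schema can express the stricter, closed style with an explicit properties whitelist and additionalProperties: false; a minimal sketch with an abbreviated, illustrative field list:

```python
# Sketch of a closed schema: only whitelisted fields are allowed.
# The property list here is abbreviated and illustrative.
import jsonschema

CLOSED_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "title": {"type": "string"},
        "license": {"type": "object"},
        "tracker": {"type": "string", "format": "uri"},
        # ... every other recognized registry field would be listed here
    },
    "required": ["id", "title"],
    "additionalProperties": False,  # unknown fields become validation errors
}

# An entry with a misspelled field now fails instead of sneaking through.
try:
    jsonschema.validate({"id": "foo", "title": "Foo", "licence": {}}, CLOSED_SCHEMA)
except jsonschema.ValidationError as err:
    print(err.message)  # reports that 'licence' is not an allowed property
```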
On #789 @althonos noticed that I dropped the |
@althonos also pointed me to his Rust library for working with OBO registry entries. Hey @althonos, is this a complete list of all the fields used across all OBO registry entries? https://github.com/althonos/obofoundry.rs/blob/master/src/lib.rs#L140 |
Somewhat related to validation: I'd kinda like to lint the registry files for consistent formatting and key order. |
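A sketch of what such a lint could look like, checking that top-level keys follow one canonical order; the canonical order shown is made up, and the check relies on PyYAML plus Python 3.7+ preserving mapping order:

```python
# Sketch of a key-order lint for registry YAML files.
# CANONICAL_ORDER is illustrative; relies on dict insertion order (Python 3.7+).
import sys

import yaml

CANONICAL_ORDER = ["id", "title", "contact", "description", "license",
                   "homepage", "tracker", "products"]  # illustrative order


def lint_key_order(path):
    """Return True if the file's top-level keys follow the canonical order."""
    with open(path) as f:
        data = yaml.safe_load(f)
    actual = [k for k in data if k in CANONICAL_ORDER]
    expected = [k for k in CANONICAL_ORDER if k in data]
    if actual != expected:
        print(f"{path}: keys out of order: {actual} (expected {expected})")
        return False
    return True


if __name__ == "__main__":
    ok = all(lint_key_order(p) for p in sys.argv[1:])
    sys.exit(0 if ok else 1)
```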
Yes, I extracted that as-is from the
I have a CI setup which reruns on http://www.obofoundry.org/registry/ontologies.jsonld every day, so that I can experimentally check if something changed in the schema, but until #789 it was quite stable. |
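A rough sketch of that kind of drift check, assuming the published JSON-LD has a top-level "ontologies" list; it compares the fields actually in use against a known set (the known set here is abbreviated and illustrative):

```python
# Sketch: detect schema drift by diffing the fields used in the published
# registry against a known set. KNOWN_FIELDS is abbreviated and illustrative;
# assumes the JSON-LD has a top-level "ontologies" list.
import requests

REGISTRY_URL = "http://www.obofoundry.org/registry/ontologies.jsonld"
KNOWN_FIELDS = {"id", "title", "contact", "license", "tracker", "products"}


def unknown_fields():
    ontologies = requests.get(REGISTRY_URL, timeout=30).json().get("ontologies", [])
    seen = {key for entry in ontologies for key in entry}
    return seen - KNOWN_FIELDS


if __name__ == "__main__":
    for field in sorted(unknown_fields()):
        print("unrecognized field:", field)
```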
Thanks @althonos, that's useful information. I just regenerated the |
Adding to my comment above (#663 (comment)) and #1126 (comment): I'd like the JSON schema files to include some sort of 'applies_to' field that will let us figure out when each schema applies to an ontology. I'm torn between forcing a simple ordered list (foundry, active, inactive, orphaned, obsolete) and allowing some sort of mix-and-match. This is connected to long-term plans about using the dashboard information to classify ontologies. |
My inclination would be to keep foundry status as orthogonal to the other categories.
|
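To make the idea concrete, a sketch of an 'applies_to' block on each schema file, keeping foundry membership orthogonal to activity status as suggested in the reply above; every field name and category value here is hypothetical:

```python
# Sketch: an 'applies_to' block inside each schema file decides whether that
# schema is checked for a given ontology entry. Field names and category values
# are hypothetical. Unknown keywords like 'applies_to' are ignored by JSON
# Schema validators, so the block can live inside the schema file itself.
import jsonschema
import yaml


def schema_applies(schema, entry):
    """Check a schema's 'applies_to' block against an ontology entry."""
    applies_to = schema.get("applies_to", {})
    statuses = applies_to.get("activity_status", [])  # e.g. ["active", "inactive"]
    foundry_only = applies_to.get("foundry", False)   # kept orthogonal to status
    if statuses and entry.get("activity_status") not in statuses:
        return False
    if foundry_only and not entry.get("in_foundry", False):
        return False
    return True


def validate(entry, schema_paths):
    """Apply every schema whose 'applies_to' block matches the entry."""
    for path in schema_paths:
        with open(path) as f:
            schema = yaml.safe_load(f)
        if schema_applies(schema, entry):
            jsonschema.validate(instance=entry, schema=schema)
```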
just wondering if there is any change in status here |
I feel the essence of this ticket has been achieved and new issues should get new tickets. Closing this now. |
We should use JSON-Schema to define what is permissible for the yaml metadata files. We would run a schema validator in Travis-CI. This may still be augmented by procedural checks (e.g. check the tracker URL is not a 404) but the core structure can be verified.
If someone wants to speak up for any alternatives:
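As an illustration of the procedural-check idea mentioned above, a minimal sketch that verifies an entry's tracker URL does not return a 404, assuming the field is named tracker as in the registry metadata; the timeout and redirect handling are arbitrary choices:

```python
# Sketch of a procedural check alongside schema validation: the tracker URL
# should not resolve to a 404. Some servers mishandle HEAD requests, so a
# real check might fall back to GET.
import requests


def tracker_reachable(metadata):
    """Return True if the entry's tracker URL exists and is not a 404."""
    url = metadata.get("tracker")
    if not url:
        return False
    try:
        response = requests.head(url, allow_redirects=True, timeout=10)
    except requests.RequestException:
        return False
    return response.status_code != 404
```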