Damion Dooley edited this page Nov 16, 2017 · 16 revisions

The Genomic Epidemiology Entity Mart (GEEM) is a portal for examining and downloading ontology-driven specifications for standardized data components. The portal aims to give term reviewers and software developers ways to use the contents of an application ontology (a collection of terms and relations drawn from other ontologies that together model the operation of some domain) without needing training in ontology curation or querying. Ontology-driven standards benefit from features of open-source published OWL 2.0 ontologies, such as globally unique identifiers for terms, multilingual labels and definitions, and logical validation and reasoning over controlled vocabularies. A specification can be designed to satisfy the requirements of, for example, an environmental pH measurement, a person's age, or a more structured entity such as a contact address or a genomic sequence repository submission.

Underlying this design is the idea that ontology vocabularies should be able to act as the hub of a star network that connects domain-specific data silos. Rather than undertaking a separate peer-to-peer data conversion project for each pair of silos, a silo curator can develop a single converter for the hub vocabulary. GEEM works with the OBO Foundry (http://OBOFoundry.org/) family of ontologies expressed in OWL 2.0, enabling an ontology curator to create a specification for each set of numeric, categorical, or ordinal fields they wish to share in a more structured way. In an application ontology these specifications are organized under the Ontology for Biomedical Investigations (OBI) "data representational model" class. Popular open-source tools like Stanford's Protégé can be used to curate these specifications along with the ontologies they are composed of.
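The hub argument can be illustrated with a minimal Python sketch. Everything in it is hypothetical and is not GEEM code: the silo field names, the `HUB:` identifiers (placeholders, not real ontology term IDs), and the helper functions are assumptions made only for illustration. It compares how many one-way converters each approach needs and shows one record passing from one silo to another by way of the hub vocabulary.

```python
def peer_to_peer_converters(n_silos: int) -> int:
    """One-way converters needed if every silo translates directly to every other."""
    return n_silos * (n_silos - 1)


def hub_converters(n_silos: int) -> int:
    """One-way converters needed if every silo translates to and from a single hub."""
    return 2 * n_silos


# Hypothetical mappings from each silo's local field names to hub term identifiers.
# The "HUB:" identifiers are placeholders, not real ontology term IDs.
SILO_A_TO_HUB = {"ph": "HUB:0001", "age_years": "HUB:0002"}
SILO_B_TO_HUB = {"sample_pH": "HUB:0001", "patientAge": "HUB:0002"}


def to_hub(record: dict, mapping: dict) -> dict:
    """Re-key a silo record by hub term identifier, dropping unmapped fields."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def from_hub(hub_record: dict, mapping: dict) -> dict:
    """Re-key a hub record using the target silo's local field names."""
    inverse = {term: field for field, term in mapping.items()}
    return {inverse[term]: value for term, value in hub_record.items() if term in inverse}


def convert(record: dict, source_mapping: dict, target_mapping: dict) -> dict:
    """Convert between two silos without any silo-to-silo specific code."""
    return from_hub(to_hub(record, source_mapping), target_mapping)


if __name__ == "__main__":
    print(peer_to_peer_converters(5), hub_converters(5))
    print(convert({"ph": 7.2, "age_years": 34}, SILO_A_TO_HUB, SILO_B_TO_HUB))
```

With this arrangement, adding a new silo means writing one mapping to the hub, and none of the existing silos' mappings need to change.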

A pragmatic starting point for developing specifications is standards for data collection submissions. Our test cases so far involve a variety of genomic sequence curation and submission standards described in the Genomic Epidemiology Ontology (GenEpiO).

Ontology development introduces extra complexity over and above regular data dictionaries and object-oriented design schemas. GEEM aims to show the benefits of an ontology approach through tangible web forms and downloadable specifications in JSON, YAML and, soon, Microsoft Excel formats. However, much work remains to standardize broader domain vocabularies and relations (not to mention secure access) before the ultimate vision of global data silo integration and seamless querying can be fulfilled.
