The Shared Meaning Platform currently stores some intermediate RDF data in PostgreSQL. We should review the specifics of how this works to check that the approach still makes sense.
The initial proof of concept used Postgres to store all data; for simplicity, RDF data went into a table with one row per triple.
To reduce the number of rows, and to match (for the moment) the exact format used by the API, each graph is now a JSON field containing an array of all triples, in the format:
```
[
  {"subj": aSubject, "pred": aPredicate, "obj": anObj},
  ...
]
```
All values are plain strings, with no additional type markers. This means IRIs and literals are not distinguished.
This is not a standard RDF serialisation format, and does not expose any structure that would be useful for queries.
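Concretely, the stored shape amounts to the following (a TypeScript sketch of the format above, not code from the platform):

```typescript
// Sketch of the stored shape described above.
interface StoredTriple {
  subj: string; // subject IRI as a bare string
  pred: string; // predicate IRI as a bare string
  obj: string;  // object IRI *or* literal value; nothing says which
}

type StoredGraph = StoredTriple[];
```

Since `obj` is a bare string, `"https://example.org/x"` could equally be an IRI or a literal that happens to look like one.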
Graph databases aim to do two things well:
- Store millions of terms
- Resolve complex queries
We don't really have either problem yet: our graphs are split by project and will remain quite small, and for the immediate future querying makes most sense client-side.
The most common operation at present is loading the whole JSON document and sending it across the network. The only write operation replaces the entire graph. Multiple graphs are stored as separate documents.
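In code, those two operations are roughly the following (a sketch using node-postgres; the `project_graphs` table and its column names are assumptions, not the real schema):

```typescript
import { Pool } from "pg";

type StoredGraph = { subj: string; pred: string; obj: string }[];

const pool = new Pool();

// Read: load the whole graph document for a project.
async function loadGraph(projectId: string): Promise<StoredGraph> {
  const res = await pool.query(
    "SELECT graph FROM project_graphs WHERE project_id = $1",
    [projectId],
  );
  return res.rows[0].graph; // node-postgres parses jsonb columns to objects
}

// Write: replace the entire graph in one statement.
async function replaceGraph(projectId: string, graph: StoredGraph): Promise<void> {
  await pool.query(
    "UPDATE project_graphs SET graph = $2::jsonb WHERE project_id = $1",
    [projectId, JSON.stringify(graph)],
  );
}
```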
PostgreSQL JSON Types describes JSON shapes that enable pattern-based querying, which may become useful for some operations.
PostgreSQL offers `json` and `jsonb` types with slightly different characteristics. We currently use `jsonb`, as preserving the exact input text is unimportant for RDF.
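For example, `jsonb`'s containment operator `@>` can already express a crude triple-pattern lookup over the array-of-objects shape (a sketch, reusing the assumed `project_graphs` schema from above):

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Find projects whose graph contains at least one triple matching the
// given predicate and object. `@>` on a jsonb array checks that every
// element of the right-hand array is contained in some element of the
// left-hand one.
async function projectsMentioning(pred: string, obj: string): Promise<string[]> {
  const res = await pool.query(
    "SELECT project_id FROM project_graphs WHERE graph @> $1::jsonb",
    [JSON.stringify([{ pred, obj }])],
  );
  return res.rows.map((row) => row.project_id);
}
```

A GIN index on the `graph` column would make such containment queries cheap, should they ever matter.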
The main JSON serialisation for RDF is JSON-LD, which sits at the complex end of the scale: it supports local context binding and multiple representations of the same information, and it extends the RDF data model.
N3.js Parsing supports multiple formats, but incremental loading isn't particularly important for JSON (it's hard to do, and browser engines already parse it fast). The internal store indexes triples in objects keyed on subject and predicate for faster querying, as sketched below.
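The gist of that kind of nested index looks like this (a simplified sketch, not N3.js's actual implementation):

```typescript
// Triples indexed by subject, then predicate, for cheap lookups.
type Index = Map<string, Map<string, Set<string>>>;

function addTriple(index: Index, subj: string, pred: string, obj: string): void {
  let preds = index.get(subj);
  if (preds === undefined) {
    preds = new Map();
    index.set(subj, preds);
  }
  let objs = preds.get(pred);
  if (objs === undefined) {
    objs = new Set();
    preds.set(pred, objs);
  }
  objs.add(obj);
}

// Two map lookups instead of a scan over the whole triple array.
function objectsFor(index: Index, subj: string, pred: string): string[] {
  return [...(index.get(subj)?.get(pred) ?? [])];
}
```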
Would storing the JSON in a different shape make more sense right now?
It's easy enough to migrate the existing data and to make the client understand a new arrangement.
Given we're uninterested in blank nodes, an object keyed by subject IRI perhaps makes sense.
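That shape might look something like this (a sketch with hypothetical IRIs, mirroring the format above):

```json
{
  "https://example.org/project/1": {
    "https://example.org/def/title": ["An example project"],
    "https://example.org/def/related": ["https://example.org/project/2"]
  }
}
```

Predicate values are arrays because a subject can repeat a predicate with different objects.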
Alternatively, maybe this JSON business is a bad idea and a canonical form of N-Triples is just better.
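For reference, the same data in N-Triples is one line per triple, with IRIs in angle brackets and literals in quotes, so the two are syntactically distinct:

```turtle
<https://example.org/project/1> <https://example.org/def/title> "An example project" .
<https://example.org/project/1> <https://example.org/def/related> <https://example.org/project/2> .
```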
At some point the right way to manage this content will be through Git, per
Distributed Collaboration on RDF Datasets Using Git, though keeping HEAD
in the database for some client operations will likely continue to make sense.