pycottas is a library for working with compressed RDF files in the COTTAS format. COTTAS stores triples as a triple table in Apache Parquet. pycottas is built on top of DuckDB and provides an HDT-like interface.
- Compression and decompression of RDF files.
- Querying COTTAS files with triple patterns.
- RDFLib backend for querying COTTAS files with SPARQL.
- Support for RDF datasets (quads).
- Usable as a library or from the command line.
PyPI is the fastest way to install pycottas:
pip install pycottas
We recommend installing pycottas in a virtual environment.
import pycottas
from rdflib import Graph, URIRef
pycottas.rdf2cottas('my_file.ttl', 'my_file.cottas', index='spo')
res = pycottas.search('my_file.cottas', '?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o')
print(res)
pycottas.cottas2rdf('my_file.cottas', 'my_file.nt')
# COTTASDocument class for querying with triple patterns
cottas_doc = pycottas.COTTASDocument('my_file.cottas')
# It is possible to create a document from multiple COTTAS files matching a glob pattern
cottas_doc = pycottas.COTTASDocument('test/*.cottas')
# the triple pattern can be a string or a tuple
res = cottas_doc.search('?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o')
# limit and offset are optional
res = cottas_doc.search((None, URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), None), limit=10, offset=20)
print(res)
# COTTASStore class for querying with SPARQL
graph = Graph(store=pycottas.COTTASStore("my_file.cottas"))
res = graph.query("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s ?o WHERE {
?s rdf:type ?o .
} LIMIT 10""")
for row in res:
    print(row)
To run pycottas from the command line, see the documentation.
pycottas is available under the Apache License 2.0.