
Vertex Query Graph Filters

okram edited this page Apr 20, 2013 · 15 revisions

Blueprints provides the notion of a VertexQuery. In Blueprints, and in the graph databases that implement it natively (e.g. Titan), a vertex’s edges can be filtered at the database level before they are pulled into memory. If the data is organized on disk in a manner that respects edge indexes and sort orders, this technique can drastically reduce traversal times by limiting the search space of a traverser (e.g. Gremlin).
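The speedup described above comes from the on-disk sort order. The following Python sketch is a toy model, not Titan's storage layer. It assumes a hypothetical layout in which a vertex's edges are kept sorted by (label, weight), and it compares a full scan against a binary search that reads only the matching slice. The tuple shape and both function names are illustrative.

```python
from bisect import bisect_left, bisect_right
from math import inf
from typing import List, Tuple

# Hypothetical edge record: (label, weight, adjacent vertex id).
# The list is assumed to be stored sorted, mimicking an on-disk edge sort order.
Edge = Tuple[str, float, int]


def scan_filter(edges: List[Edge], label: str, min_weight: float) -> Tuple[List[Edge], int]:
    """No usable index: read every edge, then filter in memory."""
    matches = [e for e in edges if e[0] == label and e[1] > min_weight]
    return matches, len(edges)  # every edge was read


def indexed_filter(edges: List[Edge], label: str, min_weight: float) -> Tuple[List[Edge], int]:
    """Sorted layout: binary-search the bounds and read only the matching slice."""
    # First edge whose (label, weight) sorts strictly after (label, min_weight).
    start = bisect_right(edges, (label, min_weight, inf))
    # First edge whose label sorts after `label` (weights are assumed finite).
    end = bisect_left(edges, (label, inf, -1))
    matches = edges[start:end]
    return matches, len(matches)  # only the slice was read (plus O(log n) probes)
```

Both functions return the same edges; what differs is how many edges had to be read, which is the effect that an index-aware layout has on a traverser's search space.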

In Faunus, the same VertexQuery construct exists. However, in the context of Faunus, it is used to filter the input graph down to a subset of the full graph before the data is pulled into Hadoop. For graph sources that support push-down predicates, this allows the graph source to return, for each vertex, only those edges that satisfy the constraints of the query. The Faunus configuration property that specifies the vertex query constraint is faunus.graph.input.vertex-query-filter. A few examples are itemized below.

  • Only vertices and their properties (no edges): v.query().limit(0)
  • Only edges with a weight greater than 0.5: v.query().has('weight',0.5,Query.Compare.GREATER_THAN)
  • Only edges with label knows: v.query().labels('knows')
  • Only outgoing edges: v.query().direction(OUT)
  • Combinations of the above as specified by the VertexQuery API.
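To make the semantics of these filters concrete, here is a minimal Python model of a vertex query. `EdgeQuery`, `Edge`, and the `COMPARE` table are hypothetical stand-ins that only mimic the chaining style of `v.query()`; they are not the Blueprints API, and the real implementation may differ in details such as how repeated `labels(...)` calls combine.

```python
import operator
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical analogue of Query.Compare: name -> binary predicate.
COMPARE: Dict[str, Callable[[Any, Any], bool]] = {
    "EQUAL": operator.eq,
    "NOT_EQUAL": operator.ne,
    "GREATER_THAN": operator.gt,
    "GREATER_THAN_EQUAL": operator.ge,
    "LESS_THAN": operator.lt,
    "LESS_THAN_EQUAL": operator.le,
}


@dataclass
class Edge:
    label: str
    direction: str  # "OUT" or "IN", relative to the vertex being filtered
    props: Dict[str, Any] = field(default_factory=dict)


class EdgeQuery:
    """Toy vertex-query builder; each method returns self so calls can be chained."""

    def __init__(self) -> None:
        self._direction = "BOTH"
        self._labels: Tuple[str, ...] = ()
        self._conditions: List[Tuple[str, Any, Callable[[Any, Any], bool]]] = []
        self._limit: Optional[int] = None

    def direction(self, d: str) -> "EdgeQuery":
        self._direction = d
        return self

    def labels(self, *labels: str) -> "EdgeQuery":
        self._labels = labels  # in this sketch, a later call replaces earlier labels
        return self

    def has(self, key: str, value: Any, compare: str = "EQUAL") -> "EdgeQuery":
        self._conditions.append((key, value, COMPARE[compare]))
        return self

    def limit(self, n: int) -> "EdgeQuery":
        self._limit = n
        return self

    def matches(self, e: Edge) -> bool:
        if self._direction != "BOTH" and e.direction != self._direction:
            return False
        if self._labels and e.label not in self._labels:
            return False
        # An edge lacking the property key fails the condition.
        return all(key in e.props and cmp(e.props[key], value)
                   for key, value, cmp in self._conditions)

    def apply(self, edges: List[Edge]) -> List[Edge]:
        kept = [e for e in edges if self.matches(e)]
        return kept if self._limit is None else kept[: self._limit]
```

For example, `EdgeQuery().limit(0)` keeps no edges, and `EdgeQuery().labels("knows").direction("OUT")` mirrors a combined `v.query().labels('knows').direction(OUT)`, keeping only outgoing `knows` edges.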

For graph sources that do not support database-level filtering, Faunus processes each vertex (dropping edges as specified by the VertexQuery) before inserting it into the <NullWritable,FaunusVertex> Faunus stream.
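The fallback path can be sketched as a per-vertex transformation. This is an illustrative Python model, not Faunus code: `Vertex`, `Edge`, and `filter_then_emit` are hypothetical names, `keep_edge` stands for whatever predicate the configured VertexQuery encodes, and `None` plays the role of Hadoop's `NullWritable` key.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass
class Edge:
    label: str
    direction: str  # "OUT" or "IN"
    weight: float = 1.0


@dataclass
class Vertex:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)


def filter_then_emit(
    vertices: Iterable[Vertex], keep_edge: Callable[[Edge], bool]
) -> Iterator[Tuple[None, Vertex]]:
    """Fallback for sources without push-down: the source returned every edge,
    so drop the unwanted ones in memory, then emit (key, vertex) pairs.
    None stands in for Hadoop's NullWritable key."""
    for v in vertices:
        kept = [e for e in v.edges if keep_edge(e)]
        # Build a new vertex so the caller's input objects are not mutated;
        # properties are copied through unchanged.
        yield None, Vertex(v.id, dict(v.properties), kept)
```

Vertex properties pass through untouched and only edges are dropped, which matches the `limit(0)` case above of keeping vertices and their properties without edges. The cost difference from push-down is that every edge is still read from the source before being discarded.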

