Commit 158c624

Add a Neo4j Chunk reader (#135)

* Adds a neo4j chunk reader. One e2e test is failing, that's normal for now
* Update
* Add example
* Merge and fix
* Add more end-to-end examples
* Fix tests
* Update changelog and doc
* Merge
* Use constants everywhere for consistency
* Use the dynamic properties from the LexicalGraphConfig everywhere
* Cleaning
* Minor fixes in tests
* Improve description
* ruff

1 parent 99bf50e commit 158c624

File tree

14 files changed: +821 −6 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -6,6 +6,7 @@
 - Made `relations` and `potential_schema` optional in `SchemaBuilder`.
 - Added a check to prevent the use of deprecated Cypher syntax for Neo4j versions 5.23.0 and above.
 - Added a `LexicalGraphBuilder` component to enable the import of the lexical graph (document, chunks) without performing entity and relation extraction.
+- Added a `Neo4jChunkReader` component to be able to read chunk text from the database.
 
 ### Changed
 - Vector and Hybrid retrievers used with `return_properties` now also return the node labels (`nodeLabels`) and the node's element ID (`id`).
```

docs/source/api.rst

Lines changed: 8 additions & 0 deletions
```diff
@@ -58,6 +58,14 @@ LexicalGraphBuilder
     :members:
     :exclude-members: component_inputs, component_outputs
 
+
+Neo4jChunkReader
+================
+
+.. autoclass:: neo4j_graphrag.experimental.components.neo4j_reader.Neo4jChunkReader
+    :members:
+    :exclude-members: component_inputs, component_outputs
+
 SchemaBuilder
 =============
```

docs/source/user_guide_kg_builder.rst

Lines changed: 42 additions & 1 deletion
```diff
@@ -16,7 +16,7 @@ unstructured data.
 Pipeline Structure
 ******************
 
-A Knowledge Graph (KG) construction pipeline requires a few components:
+A Knowledge Graph (KG) construction pipeline requires a few components (some of the below components are optional):
 
 - **Document parser**: extract text from files (PDFs, ...).
 - **Document chunker**: split the text into smaller pieces of text, manageable by the LLM context window (token limit).
@@ -205,6 +205,47 @@ Example usage:
 See :ref:`kg-writer-section` to learn how to write the resulting nodes and relationships to Neo4j.
 
 
+Neo4j Chunk Reader
+==================
+
+The Neo4j chunk reader component is used to read text chunks from Neo4j. Text chunks can be created
+by the lexical graph builder or another process.
+
+.. code:: python
+
+    import neo4j
+    from neo4j_graphrag.experimental.components.neo4j_reader import Neo4jChunkReader
+    from neo4j_graphrag.experimental.components.types import LexicalGraphConfig
+
+    reader = Neo4jChunkReader(driver)
+    result = await reader.run()
+
+
+Configure node labels and relationship types
+--------------------------------------------
+
+Optionally, the document and chunk node labels can be configured using a `LexicalGraphConfig` object:
+
+.. code:: python
+
+    from neo4j_graphrag.experimental.components.neo4j_reader import Neo4jChunkReader
+    from neo4j_graphrag.experimental.components.types import LexicalGraphConfig, TextChunks
+
+    # optionally, define a LexicalGraphConfig object
+    # shown below with the default values
+    config = LexicalGraphConfig(
+        id_prefix="",  # used to prefix the chunk and document IDs
+        chunk_node_label="Chunk",
+        document_node_label="Document",
+        chunk_to_document_relationship_type="PART_OF_DOCUMENT",
+        next_chunk_relationship_type="NEXT_CHUNK",
+        node_to_chunk_relationship_type="PART_OF_CHUNK",
+        chunk_embedding_property="embeddings",
+    )
+    reader = Neo4jChunkReader(driver)
+    result = await reader.run(lexical_graph_config=config)
+
+
 Schema Builder
 ==============
```
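The reader's actual Cypher is internal to the library, but the idea behind the configurable labels above can be illustrated with a small sketch: label and property names from the config are interpolated into a query template. The `build_chunk_query` helper and the query text below are illustrative assumptions, not the library's implementation.

```python
# Hypothetical sketch: how configurable labels/properties could be turned
# into a Cypher query for reading chunks. NOT the library's actual code.

def build_chunk_query(
    chunk_node_label: str = "Chunk",          # default label, per LexicalGraphConfig
    index_property: str = "index",            # assumed name of the ordering property
    embedding_property: str = "embeddings",   # default embedding property
) -> str:
    # Return chunks ordered by their index, nulling out the (large)
    # embedding vector so it is not shipped back with every chunk.
    return (
        f"MATCH (c:`{chunk_node_label}`) "
        f"RETURN c {{.*, `{embedding_property}`: null}} AS chunk "
        f"ORDER BY c.`{index_property}`"
    )

query = build_chunk_query(chunk_node_label="TextPart")
```

With a custom `chunk_node_label`, the generated query matches `:TextPart` nodes instead of the default `:Chunk` label.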

examples/README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -92,6 +92,8 @@ are listed in [the last section of this file](#customize).
 - [End to end example with explicit components and text input](./customize/build_graph/pipeline/kg_builder_from_text.py)
 - [End to end example with explicit components and PDF input](./customize/build_graph/pipeline/kg_builder_from_pdf.py)
 - [Process multiple documents](./customize/build_graph/pipeline/kg_builder_two_documents_entity_resolution.py)
+- [Export lexical graph creation into another pipeline](./customize/build_graph/pipeline/text_to_lexical_graph_to_entity_graph_two_pipelines.py)
 
 #### Components
```

Lines changed: 21 additions & 0 deletions

```python
import asyncio

import neo4j
from neo4j_graphrag.experimental.components.neo4j_reader import Neo4jChunkReader
from neo4j_graphrag.experimental.components.types import LexicalGraphConfig, TextChunks


async def main(driver: neo4j.Driver) -> TextChunks:
    config = LexicalGraphConfig(  # only needed to overwrite the default values
        chunk_node_label="TextPart",
    )
    reader = Neo4jChunkReader(driver)
    result = await reader.run(lexical_graph_config=config)
    return result


if __name__ == "__main__":
    with neo4j.GraphDatabase.driver(
        "bolt://localhost:7687", auth=("neo4j", "password")
    ) as driver:
        print(asyncio.run(main(driver)))
```

examples/customize/build_graph/components/lexical_graph_builder/lexical_graph_builder.py

Lines changed: 0 additions & 1 deletion

```diff
@@ -10,7 +10,6 @@
 
 
 async def main() -> GraphResult:
-    """ """
     # optionally, define a LexicalGraphConfig object
     # shown below with default values
     config = LexicalGraphConfig(
```
Lines changed: 196 additions & 0 deletions

```python
"""In this example, we set up a single pipeline with two Neo4j writers:
one for creating the lexical graph (Document and Chunks)
and another for creating the entity graph (entities and relations derived from the text).
"""

from __future__ import annotations

import asyncio

import neo4j
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.experimental.components.embedder import TextChunkEmbedder
from neo4j_graphrag.experimental.components.entity_relation_extractor import (
    LLMEntityRelationExtractor,
)
from neo4j_graphrag.experimental.components.kg_writer import Neo4jWriter
from neo4j_graphrag.experimental.components.lexical_graph import LexicalGraphBuilder
from neo4j_graphrag.experimental.components.schema import (
    SchemaBuilder,
    SchemaEntity,
    SchemaProperty,
    SchemaRelation,
)
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.components.types import LexicalGraphConfig
from neo4j_graphrag.experimental.pipeline import Pipeline
from neo4j_graphrag.experimental.pipeline.pipeline import PipelineResult
from neo4j_graphrag.llm import LLMInterface, OpenAILLM


async def define_and_run_pipeline(
    neo4j_driver: neo4j.Driver,
    llm: LLMInterface,
    lexical_graph_config: LexicalGraphConfig,
    text: str,
) -> PipelineResult:
    """Define and run the pipeline with the following components:

    - Text Splitter: to split the text into manageable chunks of fixed size
    - Chunk Embedder: to embed the chunks' text
    - Lexical Graph Builder: to build the lexical graph, i.e. create the chunk
      nodes and the relationships between them
    - LG KG writer: save the lexical graph to Neo4j

    - Schema Builder: this component takes a list of entities, relationships and
      possible triplets as inputs, validates them and returns a schema ready to use
      for the rest of the pipeline
    - LLM Entity Relation Extractor: an LLM-based entity and relation extractor;
      based on the provided schema, the LLM will do its best to identify these
      entities and their relations within the provided text
    - EG KG writer: once entities and relations are extracted, they can be written
      to a Neo4j database
    """
    pipe = Pipeline()
    # define the components
    pipe.add_component(
        FixedSizeSplitter(chunk_size=200, chunk_overlap=50),
        "splitter",
    )
    pipe.add_component(TextChunkEmbedder(embedder=OpenAIEmbeddings()), "chunk_embedder")
    pipe.add_component(
        LexicalGraphBuilder(lexical_graph_config),
        "lexical_graph_builder",
    )
    pipe.add_component(Neo4jWriter(neo4j_driver), "lg_writer")
    pipe.add_component(SchemaBuilder(), "schema")
    pipe.add_component(
        LLMEntityRelationExtractor(
            llm=llm,
            create_lexical_graph=False,
        ),
        "extractor",
    )
    pipe.add_component(Neo4jWriter(neo4j_driver), "eg_writer")
    # define the execution order of components
    # and how the output of previous components must be used
    pipe.connect("splitter", "chunk_embedder", input_config={"text_chunks": "splitter"})
    pipe.connect(
        "chunk_embedder",
        "lexical_graph_builder",
        input_config={"text_chunks": "chunk_embedder"},
    )
    pipe.connect(
        "lexical_graph_builder",
        "lg_writer",
        input_config={
            "graph": "lexical_graph_builder.graph",
            "lexical_graph_config": "lexical_graph_builder.config",
        },
    )
    pipe.connect(
        "chunk_embedder", "extractor", input_config={"chunks": "chunk_embedder"}
    )
    pipe.connect("schema", "extractor", input_config={"schema": "schema"})
    pipe.connect(
        "extractor",
        "eg_writer",
        input_config={"graph": "extractor"},
    )
    # make sure the lexical graph is created before creating the entity graph:
    pipe.connect("lg_writer", "eg_writer", {})
    # user input:
    # the initial text
    # and the list of entities and relations we are looking for
    pipe_inputs = {
        "splitter": {
            "text": text,
        },
        "lexical_graph_builder": {
            "document_info": {
                # 'path' can be anything
                "path": "example/lexical_graph_from_text.py"
            },
        },
        "schema": {
            "entities": [
                SchemaEntity(
                    label="Person",
                    properties=[
                        SchemaProperty(name="name", type="STRING"),
                        SchemaProperty(name="place_of_birth", type="STRING"),
                        SchemaProperty(name="date_of_birth", type="DATE"),
                    ],
                ),
                SchemaEntity(
                    label="Organization",
                    properties=[
                        SchemaProperty(name="name", type="STRING"),
                        SchemaProperty(name="country", type="STRING"),
                    ],
                ),
                SchemaEntity(
                    label="Field",
                    properties=[
                        SchemaProperty(name="name", type="STRING"),
                    ],
                ),
            ],
            "relations": [
                SchemaRelation(
                    label="WORKED_ON",
                ),
                SchemaRelation(
                    label="WORKED_FOR",
                ),
            ],
            "potential_schema": [
                ("Person", "WORKED_ON", "Field"),
                ("Person", "WORKED_FOR", "Organization"),
            ],
        },
        "extractor": {
            "lexical_graph_config": lexical_graph_config,
        },
    }
    # run the pipeline
    return await pipe.run(pipe_inputs)


async def main(driver: neo4j.Driver) -> PipelineResult:
    # optional: define some custom node labels for the lexical graph:
    lexical_graph_config = LexicalGraphConfig(
        id_prefix="example",
        chunk_node_label="TextPart",
        document_node_label="Text",
    )
    text = """Albert Einstein was a German physicist born in 1879 who
    wrote many groundbreaking papers especially about general relativity
    and quantum mechanics. He worked for many different institutions, including
    the University of Bern in Switzerland and the University of Oxford."""
    llm = OpenAILLM(
        model_name="gpt-4o",
        model_params={
            "max_tokens": 1000,
            "response_format": {"type": "json_object"},
        },
    )
    res = await define_and_run_pipeline(
        driver,
        llm,
        lexical_graph_config,
        text,
    )
    await llm.async_client.close()
    return res


if __name__ == "__main__":
    with neo4j.GraphDatabase.driver(
        "bolt://localhost:7687", auth=("neo4j", "password")
    ) as driver:
        print(asyncio.run(main(driver)))
```
