CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental updates specialized for data indexing.
**CocoIndex** is a high-performance data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.
The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, an approach inspired by spreadsheets.

## Dataflow programming
Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. It follows the [Dataflow](https://en.wikipedia.org/wiki/Dataflow_programming) programming model: each transformation creates a new field based solely on input fields, with no hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

**In particular**, users don't explicitly mutate data by creating, updating, and deleting. Instead, they declare the transformation or formula that applies to a set of source data, and the framework takes care of data operations such as when to create, update, or delete.

```python
# import
data['content'] = flow_builder.add_source(...)

# transform (parenthesized so the chained calls parse across lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
```
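
To make the "new field solely based on input fields" idea concrete, here is a minimal, self-contained sketch in plain Python. It does not use the CocoIndex API: `Row`, `derive`, and `split_words` are hypothetical names invented for illustration, and the lineage bookkeeping is only an assumption about how observability could be modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Row:
    """A record whose fields are write-once (hypothetical, not a CocoIndex type)."""
    fields: Dict[str, object]
    # Maps each derived field to the input fields it was computed from.
    lineage: Dict[str, Tuple[str, ...]] = field(default_factory=dict)

    def derive(self, name: str, inputs: Tuple[str, ...], fn: Callable[..., object]) -> "Row":
        """Return a new Row with `name` computed purely from `inputs`."""
        if name in self.fields:
            raise ValueError(f"field {name!r} already exists; fields are write-once")
        value = fn(*(self.fields[i] for i in inputs))
        # Build a new Row instead of mutating this one, and record the lineage.
        return Row({**self.fields, name: value}, {**self.lineage, name: inputs})


def split_words(text: str) -> List[str]:
    return text.split()


row = Row({"content": "fresh data for AI"})
row = row.derive("words", ("content",), split_words)
row = row.derive("word_count", ("words",), len)

print(row.fields["word_count"])  # 4
print(row.lineage)  # {'words': ('content',), 'word_count': ('words',)}
```

Because every derived field records its inputs and existing fields are never overwritten, each intermediate value stays inspectable, which is the property the section describes.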

## Data Freshness
As a data framework, CocoIndex treats data freshness as a core concern. **Incremental processing** is one of the core values provided by CocoIndex.
With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

The framework takes care of:
- Change data capture.
- Figuring out exactly what needs to be updated, and updating only that without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
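
The idea of updating only what changed can be sketched as follows. This is a toy illustration, not how CocoIndex is implemented: `IncrementalIndex`, `sync`, and `fingerprint` are hypothetical names, and change detection here is a simple comparison of content hashes between snapshots rather than true change data capture.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(content: str) -> str:
    """Stable hash of the source content, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class IncrementalIndex:
    """Keeps a derived target in sync with a source, recomputing only changed keys."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.seen: Dict[str, str] = {}    # source key -> fingerprint last processed
        self.target: Dict[str, str] = {}  # source key -> derived value

    def sync(self, source: Dict[str, str]) -> Dict[str, List[str]]:
        changes: Dict[str, List[str]] = {"created": [], "updated": [], "deleted": []}
        for key, content in source.items():
            fp = fingerprint(content)
            if key not in self.seen:
                changes["created"].append(key)
            elif self.seen[key] != fp:
                changes["updated"].append(key)
            else:
                continue  # unchanged: skip the transformation entirely
            self.target[key] = self.transform(content)
            self.seen[key] = fp
        # Anything processed before but missing from the source is deleted.
        for key in list(self.seen):
            if key not in source:
                changes["deleted"].append(key)
                del self.seen[key]
                del self.target[key]
        return changes


index = IncrementalIndex(transform=str.upper)
print(index.sync({"a.md": "hello", "b.md": "world"}))
print(index.sync({"a.md": "hello", "b.md": "world!", "c.md": "new"}))
print(index.sync({"a.md": "hello"}))
```

On the second call only `b.md` and `c.md` are transformed; `a.md` is skipped because its fingerprint is unchanged. The user only declares `transform`, and the create, update, and delete decisions fall out of the comparison.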
## Quick Start:
32
-
If you're new to CocoIndex 🤗, we recommend checking out the 📖 [Documentation](https://cocoindex.io/docs) and ⚡ [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart). We also have a ▶️ [quick start video tutorial](https://youtu.be/gv5R8nOXsWU?si=9ioeKYkMEnYevTXT) for you to jump start.
64
+
If you're new to CocoIndex, we recommend checking out
- 🎬 [Quick Start Video Tutorial](https://youtu.be/gv5R8nOXsWU?si=9ioeKYkMEnYevTXT)
### Setup
1. Install CocoIndex Python library

```bash
pip install -U cocoindex
```
41
-
2.Setup Postgres with pgvector extension; or bring up a Postgres database using docker compose:
77
+
2. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. CocoIndex uses it for incremental processing.
43
-
- Make sure Docker Compose is installed: [docs](https://docs.docker.com/compose/install/)
44
-
- Start a Postgres SQL database for cocoindex using our docker compose config:
46
-
```bash
47
-
docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml) up -d
48
-
```
### Define data flow
50
-
### Start your first indexing flow!
51
-
Follow [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart) to define your first indexing flow.
52
-
A common indexing flow looks like:
82
+
Follow [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart) to define your first indexing flow. An example flow looks like:
@@ -105,8 +136,9 @@ Go to the [examples directory](examples) to try out with any of the examples, fo
|[Docs to Knowledge Graph](examples/docs_to_knowledge_graph)| Extract relationships from Markdown documents and build a knowledge graph |
|[Embeddings to Qdrant](examples/text_embedding_qdrant)| Index documents in a Qdrant collection for semantic search |
|[FastAPI Server with Docker](examples/fastapi_server_docker)| Run the semantic search server in a Dockerized FastAPI setup |
|[Product_Taxonomy_Knowledge_Graph](examples/product_taxonomy_knowledge_graph)| Build a knowledge graph for product recommendations |
109
-
More coming and stay tuned! If there's any specific examples you would like to see, please let us know in our [Discord community](https://discord.com/invite/zpA9S2DR7s) 🌱.
141
+
More coming and stay tuned 👀!
## 📖 Documentation
For detailed documentation, visit [CocoIndex Documentation](https://cocoindex.io/docs), including a [Quickstart guide](https://cocoindex.io/docs/getting_started/quickstart).
@@ -127,5 +159,8 @@ Join our community here:
- ▶️ [Subscribe to our YouTube channel](https://www.youtube.com/@cocoindex-io)
- 📜 [Read our blog posts](https://cocoindex.io/blogs/)

## Support us:
We are constantly improving, and more features and examples are coming soon. If you love this project, please give us a star ⭐ on the [GitHub repo](https://github.com/cocoindex-io/cocoindex) to stay tuned and help us grow.