Commit cbc1998

update project description (#454)
1 parent 30b0356 commit cbc1998

1 file changed

+56
-21
lines changed

README.md

Lines changed: 56 additions & 21 deletions
@@ -12,44 +12,74 @@
1212
[![PyPI version](https://img.shields.io/pypi/v/cocoindex?color=5B5BD6)](https://pypi.org/project/cocoindex/)
1313
[![PyPI - Downloads](https://img.shields.io/pypi/dm/cocoindex)](https://pypistats.org/packages/cocoindex)
1414

15-
<!-- [![Python](https://img.shields.io/badge/python-3.11%20to%203.13-5B5BD6?logo=python&logoColor=white)](https://www.python.org/) -->
1615
[![CI](https://github.com/cocoindex-io/cocoindex/actions/workflows/CI.yml/badge.svg?event=push&color=5B5BD6)](https://github.com/cocoindex-io/cocoindex/actions/workflows/CI.yml)
1716
[![release](https://github.com/cocoindex-io/cocoindex/actions/workflows/release.yml/badge.svg?event=push&color=5B5BD6)](https://github.com/cocoindex-io/cocoindex/actions/workflows/release.yml)
1817
[![Discord](https://img.shields.io/discord/1314801574169673738?logo=discord&color=5B5BD6&logoColor=white)](https://discord.com/invite/zpA9S2DR7s)
19-
<!--[![LinkedIn](https://img.shields.io/badge/LinkedIn-CocoIndex-5B5BD6?logo=linkedin&logoColor=white)](https://www.linkedin.com/company/cocoindex) -->
20-
<!--[![X (Twitter)](https://img.shields.io/twitter/follow/cocoindex_io)](https://twitter.com/intent/follow?screen_name=cocoindex_io) -->
21-
2218
</div>
2319

24-
CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental updates specialized for data indexing.
20+
**CocoIndex** is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI (creating embeddings, building knowledge graphs, or performing other data transformations) and to take real-time data pipelines beyond traditional SQL.
21+
22+
<p align="center">
23+
<img src="https://cocoindex.io/images/cocoindex-features.png" alt="CocoIndex Features" width="500">
24+
</p>
25+
26+
The philosophy, inspired by spreadsheets, is to have the framework handle source updates so that developers only need to define a series of data transformations.
27+
28+
## Dataflow programming
29+
Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the [Dataflow](https://en.wikipedia.org/wiki/Dataflow_programming) programming model. Each transformation creates a new field based solely on input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.
30+
31+
**In particular**, users don't explicitly mutate data by creating, updating, and deleting. Instead, they declare, for a set of source data, the transformation or formula to apply. The framework takes care of the data operations, such as when to create, update, or delete.
32+
33+
```python
34+
# import
35+
data['content'] = flow_builder.add_source(...)
36+
37+
# transform
38+
data['out'] = (data['content']
39+
    .transform(...)
40+
    .transform(...))
41+
42+
# collect data
43+
collector.collect(...)
44+
45+
# export to db, vector db, graph db ...
46+
collector.export(...)
47+
```
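
To make the "new field from input fields, no mutation, lineage included" idea concrete, here is a minimal, self-contained sketch in plain Python. It does not use the CocoIndex API; the `derive` helper and the field names are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Mapping


def derive(
    row: Mapping[str, Any],
    lineage: Mapping[str, tuple[str, ...]],
    name: str,
    fn: Callable[..., Any],
    *inputs: str,
) -> tuple[dict[str, Any], dict[str, tuple[str, ...]]]:
    """Return a NEW row and lineage where `name` is computed only from `inputs`.

    Hypothetical helper, not part of CocoIndex.
    """
    if name in row:
        # Existing fields are never overwritten: no value mutation.
        raise ValueError(f"field {name!r} already exists")
    value = fn(*(row[field] for field in inputs))
    # Copy instead of mutating, and record which fields the new one came from.
    return {**row, name: value}, {**lineage, name: inputs}


# A source field has no upstream inputs.
row, lineage = {"content": "Hello dataflow world"}, {"content": ()}
row, lineage = derive(row, lineage, "words", str.split, "content")
row, lineage = derive(row, lineage, "word_count", len, "words")

print(row["word_count"])      # value derived through two transformations
print(lineage["word_count"])  # the fields it was computed from
```

Because every field is a pure function of earlier fields, each intermediate value stays inspectable and the lineage map can be followed back to the source field.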
48+
49+
## Data Freshness
50+
As a data framework, CocoIndex takes data freshness to the next level. **Incremental processing** is one of the core values it provides.
51+
2552
<p align="center">
26-
<img src="https://cocoindex.io/images/venn.svg" alt="CocoIndex">
53+
<img src="https://github.com/user-attachments/assets/f4eb29b3-84ee-4fa0-a1e2-80eedeeabde6" alt="Incremental Processing" width="700">
2754
</p>
28-
With CocoIndex, users declare the transformation, CocoIndex creates & maintains an index, and keeps the derived index up to date based on source update, with minimal computation and changes.
55+
56+
The framework takes care of:
57+
- Change data capture.
58+
- Figuring out exactly what needs to be updated, and updating only that without recomputing everything.
59+
60+
This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
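
The two responsibilities listed above can be illustrated with a small sketch. It is a simplified model in plain Python, not how CocoIndex is implemented: it assumes content fingerprints are enough to detect changes, and the names `fingerprint`, `incremental_update`, `state`, and `target` are hypothetical.

```python
import hashlib
from typing import Callable


def fingerprint(content: str) -> str:
    """Stable hash of a source item, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(
    source: dict[str, str],
    state: dict[str, str],   # key -> fingerprint seen on the previous run
    target: dict[str, str],  # key -> transformed output (the "target store")
    transform: Callable[[str], str],
) -> dict[str, list[str]]:
    """Bring `target` in line with `source`, recomputing only changed items."""
    changes: dict[str, list[str]] = {"created": [], "updated": [], "deleted": []}
    for key, content in source.items():
        fp = fingerprint(content)
        if state.get(key) == fp:
            continue  # unchanged: skip the (possibly expensive) transform
        changes["updated" if key in state else "created"].append(key)
        target[key] = transform(content)
        state[key] = fp
    for key in list(state):
        if key not in source:  # item disappeared from the source
            changes["deleted"].append(key)
            del state[key]
            target.pop(key, None)
    return changes


calls: list[str] = []


def upper(text: str) -> str:
    calls.append(text)  # record how often the transform actually runs
    return text.upper()


state: dict[str, str] = {}
target: dict[str, str] = {}
first = incremental_update({"a": "x", "b": "y", "c": "z"}, state, target, upper)
second = incremental_update({"a": "x", "b": "changed", "d": "new"}, state, target, upper)
print(second)
print(len(calls))  # total transform invocations across both runs
```

On the second run only the changed and new items are transformed; the unchanged item `a` is skipped and the removed item `c` is dropped from the target.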
2961

3062

3163
## Quick Start:
32-
If you're new to CocoIndex 🤗, we recommend checking out the 📖 [Documentation](https://cocoindex.io/docs) and ⚡ [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart). We also have a ▶️ [quick start video tutorial](https://youtu.be/gv5R8nOXsWU?si=9ioeKYkMEnYevTXT) for you to jump start.
64+
If you're new to CocoIndex, we recommend checking out:
65+
- 📖 [Documentation](https://cocoindex.io/docs)
66+
- ⚡ [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart)
67+
- 🎬 [Quick Start Video Tutorial](https://youtu.be/gv5R8nOXsWU?si=9ioeKYkMEnYevTXT)
3368

3469
### Setup
70+
3571
1. Install CocoIndex Python library
3672

3773
```bash
3874
pip install -U cocoindex
3975
```
4076

41-
2. Setup Postgres with pgvector extension; or bring up a Postgres database using docker compose:
77+
2. [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one. CocoIndex uses it for incremental processing.
4278

43-
- Make sure Docker Compose is installed: [docs](https://docs.docker.com/compose/install/)
44-
- Start a Postgres SQL database for cocoindex using our docker compose config:
4579

46-
```bash
47-
docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml) up -d
48-
```
80+
### Define data flow
4981

50-
### Start your first indexing flow!
51-
Follow [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart) to define your first indexing flow.
52-
A common indexing flow looks like:
82+
Follow the [Quick Start Guide](https://cocoindex.io/docs/getting_started/quickstart) to define your first indexing flow. An example flow looks like this:
5383

5484
```python
5585
@cocoindex.flow_def(name="TextEmbedding")
@@ -90,10 +120,11 @@ def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind
90120
```
91121

92122
It defines an index flow like this:
93-
![Flow diagram](docs/docs/core/flow_example.svg)
94123

95-
### Play with existing example and demo
96-
Go to the [examples directory](examples) to try out with any of the examples, following instructions under specific example directory.
124+
<img width="363" alt="Data Flow" src="https://github.com/user-attachments/assets/2ea7be6d-3d94-42b1-b2bd-22515577e463" />
125+
126+
127+
## 🚀 Examples and demo
97128

98129
| Example | Description |
99130
|---------|-------------|
@@ -105,8 +136,9 @@ Go to the [examples directory](examples) to try out with any of the examples, fo
105136
| [Docs to Knowledge Graph](examples/docs_to_knowledge_graph) | Extract relationships from Markdown documents and build a knowledge graph |
106137
| [Embeddings to Qdrant](examples/text_embedding_qdrant) | Index documents in a Qdrant collection for semantic search |
107138
| [FastAPI Server with Docker](examples/fastapi_server_docker) | Run the semantic search server in a Dockerized FastAPI setup |
139+
| [Product Taxonomy Knowledge Graph](examples/product_taxonomy_knowledge_graph) | Build a knowledge graph for product recommendations |
108140

109-
More coming and stay tuned! If there's any specific examples you would like to see, please let us know in our [Discord community](https://discord.com/invite/zpA9S2DR7s) 🌱.
141+
More examples are coming. Stay tuned 👀!
110142

111143
## 📖 Documentation
112144
For detailed documentation, visit [CocoIndex Documentation](https://cocoindex.io/docs), including a [Quickstart guide](https://cocoindex.io/docs/getting_started/quickstart).
@@ -127,5 +159,8 @@ Join our community here:
127159
- ▶️ [Subscribe to our YouTube channel](https://www.youtube.com/@cocoindex-io)
128160
- 📜 [Read our blog posts](https://cocoindex.io/blogs/)
129161

162+
## Support us:
163+
We are constantly improving, and more features and examples are coming soon. If you love this project, please give us a star ⭐ on the GitHub repo [![GitHub](https://img.shields.io/github/stars/cocoindex-io/cocoindex?color=5B5BD6)](https://github.com/cocoindex-io/cocoindex) to stay tuned and help us grow.
164+
130165
## License
131166
CocoIndex is Apache 2.0 licensed.
