| 1 | +--- |
| 2 | +title: Query Support |
| 3 | +description: CocoIndex supports vector search and text search. |
| 4 | +--- |
| 5 | + |
| 6 | +import Tabs from '@theme/Tabs'; |
| 7 | +import TabItem from '@theme/TabItem'; |
| 8 | + |
| 9 | +# CocoIndex Query Support |
| 10 | + |
| 11 | +The main functionality of CocoIndex is indexing. |
| 12 | +The goal of indexing is to enable efficient querying against your data. |
| 13 | +You can use any libraries or frameworks of your choice to perform queries. |
| 14 | +At the same time, CocoIndex provides seamless integration between indexing and querying workflows. |
| 15 | +For example, you can share transformations between indexing and querying, and easily retrieve table names when using CocoIndex's default naming conventions. |
| 16 | + |
| 17 | +## Transform Flow |
| 18 | + |
| 19 | +Sometimes part of the transformation logic needs to be shared between indexing and querying. |
| 20 | +For example, when you build a vector index and query against it, the embedding computation must be identical at indexing time and at query time. |
| 21 | + |
| 22 | +In this case, you can: |
| 23 | + |
| 24 | +1. Extract a sub-flow with the shared transformation logic into a standalone function. |
| 25 | + * It takes one or more data slices as input. |
| 26 | + * It returns one data slice as output. |
| 27 | + * You need to annotate the data types of both inputs and outputs as the type parameter of `cocoindex.DataSlice[T]`. See [data types](./core/data_types.mdx) for more details about supported data types. |
| 28 | + |
| 29 | +2. When you're defining your indexing flow, you can call the function directly. |
| 30 | + The function body is executed, so the transformation logic becomes part of the indexing flow. |
| 31 | + |
| 32 | +3. At query time, you usually want to run the function directly on specific input data, instead of having it called as part of a long-lived indexing flow. |
| 33 | + To do this, declare the function as a *transform flow* by decorating it with `@cocoindex.transform_flow()`. |
| 34 | + This adds an `eval()` method to the function, so that you can call it directly with specific input data. |
| 35 | + |
| 36 | + |
| 37 | +<Tabs> |
| 38 | +<TabItem value="python" label="Python"> |
| 39 | + |
| 40 | +The [quickstart](getting_started/quickstart#step-41-extract-common-transformations) shows an example: |
| 41 | + |
| 42 | +```python |
| 43 | +@cocoindex.transform_flow() |
| 44 | +def text_to_embedding(text: cocoindex.DataSlice[str]) -> cocoindex.DataSlice[list[float]]: |
| 45 | + return text.transform( |
| 46 | + cocoindex.functions.SentenceTransformerEmbed( |
| 47 | + model="sentence-transformers/all-MiniLM-L6-v2")) |
| 48 | +``` |
| 49 | + |
| 50 | +When you're defining your indexing flow, you can directly call the function: |
| 51 | + |
| 52 | +```python |
| 53 | +with doc["chunks"].row() as chunk: |
| 54 | + chunk["embedding"] = text_to_embedding(chunk["text"]) |
| 55 | +``` |
| 56 | + |
| 57 | +Alternatively, call the `call()` method on the first argument and pass the transform flow to it, which makes operations chainable: |
| 58 | + |
| 59 | +```python |
| 60 | +with doc["chunks"].row() as chunk: |
| 61 | + chunk["embedding"] = chunk["text"].call(text_to_embedding) |
| 62 | +``` |
| 63 | + |
| 64 | +At any time, you can call the `eval()` method with a specific string, which returns a `list[float]`: |
| 65 | + |
| 66 | +```python |
| 67 | +print(text_to_embedding.eval("Hello, world!")) |
| 68 | +``` |
| 69 | + |
| 70 | +</TabItem> |
| 71 | +</Tabs> |
| 72 | + |
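To illustrate why the embedding step has to be shared, here is a minimal, self-contained sketch that does not use CocoIndex at all. `toy_embedding` is a hypothetical stand-in for a real embedding model, and the in-memory dictionary stands in for a vector index; both are assumptions made purely for illustration. The point is that the same function produces the vectors at indexing time and at query time, so the distances between them are meaningful.

```python
import math


def toy_embedding(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for an embedding model (not a CocoIndex API).

    Counts alphanumeric characters into `dim` buckets, then L2-normalizes.
    """
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalnum():
            vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance between two unit-length vectors: 1 - dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def build_index(docs: list[str]) -> dict[str, list[float]]:
    # "Indexing time": embed every document with the shared function.
    return {doc: toy_embedding(doc) for doc in docs}


def search(query: str, index: dict[str, list[float]], top_k: int = 2) -> list[tuple[str, float]]:
    # "Query time": embed the query with the SAME function, then rank by distance.
    query_vec = toy_embedding(query)
    scored = [(doc, cosine_distance(query_vec, vec)) for doc, vec in index.items()]
    return sorted(scored, key=lambda item: item[1])[:top_k]


index = build_index(["hello world", "vector search", "indexing pipeline"])
results = search("hello world", index)
print(results[0][0])  # prints "hello world"
```

If the query were embedded with a different function or model than the documents, the vectors would live in unrelated spaces and the ranking would be arbitrary, which is the problem a shared transform flow avoids.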
| 73 | +## Get Target Native Names |
| 74 | + |
| 75 | +In your indexing flow, when you export data to a target, you can specify the target name (e.g. a database table name, a collection name, the node label in property graph databases, etc.) explicitly, |
| 76 | +or for some backends you can also omit it and let CocoIndex generate a default name for you. |
| 77 | +In the latter case, CocoIndex provides a utility function `cocoindex.utils.get_target_storage_default_name()` that returns the default name. |
| 78 | +It takes the following arguments: |
| 79 | + |
| 80 | +* `flow` (type: `cocoindex.Flow`): The flow to get the default name for. |
| 81 | +* `target_name` (type: `str`): The export target name, as it appears in the `export()` call. |
| 82 | + |
| 83 | +For example: |
| 84 | + |
| 85 | +<Tabs> |
| 86 | +<TabItem value="python" label="Python"> |
| 87 | + |
| 88 | +```python |
| 89 | +table_name = cocoindex.utils.get_target_storage_default_name(text_embedding_flow, "doc_embeddings") |
| 90 | +query = f"SELECT filename, text FROM {table_name} ORDER BY embedding <=> %s::vector LIMIT 5" |
| 91 | +... |
| 92 | +``` |
| 93 | + |
| 94 | +</TabItem> |
| 95 | +</Tabs> |
| 96 | + |
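As a rough sketch of how the returned table name might be used at query time, the helpers below build a parameterized pgvector query and the text form of a query vector. The helper names (`to_vector_literal`, `build_search_query`) and the table name `my_table` are hypothetical; the column names `filename`, `text` and `embedding` are taken from the example above. Note that `<=>` is pgvector's cosine distance operator, so ascending order returns the closest matches first.

```python
import re


def to_vector_literal(vec: list[float]) -> str:
    """Format a vector in pgvector's text form, e.g. "[0.1,0.2]"."""
    return "[" + ",".join(str(float(v)) for v in vec) + "]"


def build_search_query(table_name: str, top_k: int = 5) -> str:
    """Build a top-k similarity query (hypothetical helper, not a CocoIndex API)."""
    # Table names cannot be passed as query parameters,
    # so validate the identifier before interpolating it.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return (
        f"SELECT filename, text, embedding <=> %s::vector AS distance "
        f"FROM {table_name} ORDER BY distance LIMIT {int(top_k)}"
    )


sql = build_search_query("my_table")  # "my_table" is a placeholder name
param = to_vector_literal([0.1, 0.2, 0.3])
print(sql)
print(param)
```

With a PostgreSQL driver such as psycopg, the query and parameter would typically be passed together, for example `cursor.execute(sql, (param,))`. In practice, `table_name` would come from `cocoindex.utils.get_target_storage_default_name()` and the vector from `text_to_embedding.eval()`.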