Add docs about Table and List type representations. #77

Merged
merged 1 commit into from
Mar 9, 2025
5 changes: 2 additions & 3 deletions docs/docs/core/basics.md
@@ -22,9 +22,8 @@ An indexing flow involves source data and transformed data (either as an interme
 Each piece of data has a **data type**, falling into one of the following categories:

 * Basic type.
-* Composite type
-  * Struct: a collection of **fields**, each with a name and a type.
-  * Table: a collection of **rows**, each of which is a struct with specified schema.
+* Struct type: a collection of **fields**, each with a name and a type.
+* Collection type: a collection of **rows**, each of which is a struct with specified schema. A collection type can be a table (which has a key field) or a list (ordered but without key field).

 An indexing flow always has a top-level struct, containing all data within and managed by the flow.

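The table/list split in the basics.md change can be mirrored in plain Python, without CocoIndex. This is a minimal sketch, assuming (as the data_types.mdx change in this PR states) that a table's key is its first field. `to_table` and `to_list` are hypothetical helpers, not CocoIndex API, and `Order` copies the dataclass added in data_types.mdx.

```python
# Hypothetical illustration of table vs. list semantics; does not use the CocoIndex API.
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Order:
    order_id: str  # first field: treated as the key under table semantics
    name: str
    price: float


def to_table(rows: list[Any]) -> dict[Any, Any]:
    """Table semantics (sketch): key each row by its first field, rejecting duplicates."""
    table: dict[Any, Any] = {}
    for row in rows:
        key = getattr(row, fields(row)[0].name)
        if key in table:
            raise ValueError(f"duplicate key: {key!r}")
        table[key] = row
    return table


def to_list(rows: list[Any]) -> list[Any]:
    """List semantics (sketch): no key field; rows are kept in their given order."""
    return list(rows)


orders = [Order("o1", "pen", 1.5), Order("o2", "ink", 3.0)]
print(list(to_table(orders)))  # ['o1', 'o2']
print(to_list(orders + orders)[2].order_id)  # o1 (repeated rows are fine in a list)
```

Looking up the key through `dataclasses.fields` keeps the helper generic over any dataclass row type. In CocoIndex itself the uniqueness check is enforced by the framework, not by user code like this.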
47 changes: 30 additions & 17 deletions docs/docs/core/data_types.mdx
@@ -17,37 +17,50 @@ This is the list of all basic types supported by CocoIndex:

 | Type | Type in Python | Original Type in Python |
 |------|-----------------|--------------------------|
-| `bytes` | `bytes` | `bytes` |
-| `str` | `str` | `str` |
-| `bool` | `bool` | `bool` |
-| `int64` | `int` | `int` |
-| `float32` | `cocoindex.typing.Float32` |`float` |
-| `float64` | `cocoindex.typing.Float64` |`float` |
-| `range` | `cocoindex.typing.Range` | `tuple[int, int]` |
-| `vector(*type* [, *N*])` |`Annotated[list[type], cocoindex.typing.Vector(dim=N)]` | `list[type]` |
-| `json` | `cocoindex.typing.Json` | Any type convertible to JSON by `json` package |
+| bytes | `bytes` | `bytes` |
+| str | `str` | `str` |
+| bool | `bool` | `bool` |
+| int64 | `int` | `int` |
+| float32 | `cocoindex.typing.Float32` |`float` |
+| float64 | `cocoindex.typing.Float64` |`float` |
+| range | `cocoindex.typing.Range` | `tuple[int, int]` |
+| vector[*type*, *N*?] |`Annotated[list[type], cocoindex.typing.Vector(dim=N)]` | `list[type]` |
+| json | `cocoindex.typing.Json` | Any type convertible to JSON by `json` package |

 For some types, CocoIndex Python SDK provides annotated types with finer granularity than Python's original type, e.g. `Float32` and `Float64` for `float`, and `vector` has dimension information.

 When defining [custom functions](/docs/core/custom_function), use the specific types as type annotations for arguments and return values.
 So CocoIndex will have information about the specific type.

-### Struct
+### Struct Type

 A struct has a bunch of fields, each with a name and a type.

-### Table
+In Python, a struct type is represented by a [dataclass](https://docs.python.org/3/library/dataclasses.html),
+and all fields must be annotated with a specific type. For example:

-A table has a collection of rows, each of which is a struct with specified schema.
+```python
+from dataclasses import dataclass

-The first field of a table is always the primary key.
+@dataclass
+class Order:
+    order_id: str
+    name: str
+    price: float
+```

-:::note
+### Collection Types

-CocoIndex will support functions taking struct and table types as arguments or returning composite types soon.
-We'll update this section with corresponding Python types by then.
+A collection type models a collection of rows, each of which is a struct with specific schema.

-:::
+We have two specific types of collection:
+
+| Type | Description |Type in Python | Original Type in Python |
+|------|-------------|---------------|-------------------------|
+| Table[*type*] | The first field is the key, and CocoIndex enforces its uniqueness | `cocoindex.typing.Table[type]` | `list[type]` |
+| List[*type*] | No key field; row order is preserved | `cocoindex.typing.List[type]` | `list[type]` |
+
+For example, we can use `cocoindex.typing.Table[Order]` to represent a table of orders, and the first field `order_id` will be taken as the key field.

 ## Types to Create Indexes

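The data_types.mdx change notes that annotated types carry more detail than Python's original types, e.g. `vector` keeps its dimension inside `Annotated[list[type], cocoindex.typing.Vector(dim=N)]`. The sketch below shows, with the standard `typing` module only, how such a dimension can be read back from a dataclass field's type hint. It is an assumption-laden illustration: `Vector` here is a locally defined stand-in, not `cocoindex.typing.Vector`, `Chunk` and `vector_dim` are hypothetical names, and this is not how CocoIndex inspects annotations internally.

```python
# Hypothetical sketch: Vector below is a local stand-in, NOT cocoindex.typing.Vector.
from dataclasses import dataclass
from typing import Annotated, Any, Optional, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Vector:
    """Stand-in marker that only records a dimension (None means unspecified)."""
    dim: Optional[int] = None


# Same shape as the table row: vector[type, N?] -> Annotated[list[type], Vector(dim=N)]
Embedding = Annotated[list[float], Vector(dim=3)]


@dataclass
class Chunk:  # hypothetical struct type
    chunk_id: str
    embedding: Embedding


def vector_dim(tp: Any) -> Optional[int]:
    """Return the dimension stored in an Annotated[..., Vector(...)] hint, else None."""
    if get_origin(tp) is Annotated:
        for meta in get_args(tp)[1:]:  # metadata items follow the underlying type
            if isinstance(meta, Vector):
                return meta.dim
    return None


# include_extras=True keeps the Annotated wrapper instead of stripping it to list[float]
hints = get_type_hints(Chunk, include_extras=True)
print(vector_dim(hints["embedding"]))  # 3
print(vector_dim(hints["chunk_id"]))  # None
```

Without `include_extras=True`, `get_type_hints` would return plain `list[float]` for `embedding` and the dimension would be lost, which is the same information gap between "Type in Python" and "Original Type in Python" that the table describes.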