Commit 6953931

Update the documents about SplitRecursively and transform arguments. (#125)
1 parent 59f692f commit 6953931

4 files changed: 14 additions, 11 deletions


README.md

Lines changed: 2 additions & 2 deletions
@@ -62,8 +62,8 @@ def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind
     with data_scope["documents"].row() as doc:
         # Split the document into chunks, put into `chunks` field
         doc["chunks"] = doc["content"].transform(
-            cocoindex.functions.SplitRecursively(
-                language="markdown", chunk_size=300, chunk_overlap=100))
+            cocoindex.functions.SplitRecursively(),
+            language="markdown", chunk_size=300, chunk_overlap=100)
 
         # Transform data of each chunk
         with doc["chunks"].row() as chunk:

docs/docs/core/flow_def.mdx

Lines changed: 7 additions & 1 deletion
@@ -122,14 +122,20 @@ A data slice has a certain data type, and it's the input for most operations.
 `transform()` method transforms the data slice by a function, which creates another data slice.
 A *function spec* needs to be provided for any transform operation, to describe the function and parameters related to the function.
 
+The function takes one or multiple data arguments.
+The first argument is the data slice to be transformed, and the `transform()` method is applied from it.
+Other arguments can be passed in as positional arguments or keyword arguments, after the function spec.
+
 <Tabs>
 <TabItem value="python" label="Python" default>
 
 ```python
 @cocoindex.flow_def(name="DemoFlow")
 def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
     ...
-    data_scope["field1"] = data_scope["documents"].transform(DemoFunctionSpec(...))
+    data_scope["field2"] = data_scope["field1"].transform(
+        DemoFunctionSpec(...),
+        arg1, arg2, ..., key0=kwarg0, key1=kwarg1, ...)
     ...
 ```
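
The calling convention described in this hunk can be illustrated with a small stand-alone sketch. It is not the CocoIndex implementation: `DataSlice`, `DemoFunctionSpec`, `demo_executor` and the `EXECUTORS` registry below are hypothetical toy stand-ins, written only to show how the slice becomes the first data argument while everything after the spec is forwarded as further arguments.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class DemoFunctionSpec:
    """Hypothetical spec: configuration fixed when the flow is defined."""
    separator: str = " "


def demo_executor(spec: DemoFunctionSpec, text: str, suffix: str, *, repeat: int = 1) -> str:
    """Hypothetical function body: `text` is the first data argument."""
    return spec.separator.join([text] * repeat) + suffix


# Toy registry mapping a spec type to the function that executes it.
EXECUTORS: Dict[type, Callable[..., Any]] = {DemoFunctionSpec: demo_executor}


class DataSlice:
    """Toy data slice wrapping a plain value (not the real CocoIndex class)."""

    def __init__(self, value: Any):
        self.value = value

    def transform(self, spec: Any, *args: Any, **kwargs: Any) -> "DataSlice":
        # The slice itself is passed as the first data argument; positional
        # and keyword arguments given after the spec are forwarded unchanged.
        executor = EXECUTORS[type(spec)]
        return DataSlice(executor(spec, self.value, *args, **kwargs))


field1 = DataSlice("abc")
field2 = field1.transform(DemoFunctionSpec(separator="-"), "!", repeat=2)
print(field2.value)  # abc-abc!
```

In this toy model the spec carries only settings that are fixed at definition time, while `suffix` and `repeat` travel with the call, which mirrors the split the commit documents between the function spec and the data arguments.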

docs/docs/getting_started/quickstart.md

Lines changed: 2 additions & 2 deletions
@@ -78,8 +78,8 @@ def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind
     with data_scope["documents"].row() as doc:
         # Split the document into chunks, put into `chunks` field
         doc["chunks"] = doc["content"].transform(
-            cocoindex.functions.SplitRecursively(
-                language="markdown", chunk_size=300, chunk_overlap=100))
+            cocoindex.functions.SplitRecursively(),
+            language="markdown", chunk_size=300, chunk_overlap=100)
 
         # Transform data of each chunk
         with doc["chunks"].row() as chunk:

docs/docs/ops/functions.md

Lines changed: 3 additions & 6 deletions
@@ -11,15 +11,12 @@ description: CocoIndex Built-in Functions
 It tries to split at higher-level boundaries. If each chunk is still too large, it tries at the next level of boundaries.
 For example, for a Markdown file, it identifies boundaries in this order: level-1 sections, level-2 sections, level-3 sections, paragraphs, sentences, etc.
 
-The spec takes the following fields:
-
-* `chunk_size` (type: `int`, required): The maximum size of each chunk, in bytes.
-* `chunk_overlap` (type: `int`, required): The maximum overlap size between adjacent chunks, in bytes.
-* `language` (type: `str`, optional): The language of the document. Currently it supports `markdown`, `python` and `javascript`. If unspecified, will treat it as plain text.
-
 Input data:
 
 * `text` (type: `str`, required): The text to split.
+* `chunk_size` (type: `int`, required): The maximum size of each chunk, in bytes.
+* `chunk_overlap` (type: `int`, optional): The maximum overlap size between adjacent chunks, in bytes.
+* `language` (type: `str`, optional): The language of the document. Currently it supports `markdown`, `python` and `javascript`. If unspecified, will treat it as plain text.
 
 Return type: `Table`, each row represents a chunk, with the following sub fields:
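
The "try the next level of boundaries" behaviour described in this hunk can be sketched roughly as follows. This is a simplified illustration and not the CocoIndex implementation: the `split_recursively` helper and the `BOUNDARIES` table are hypothetical, boundaries are approximated with plain string separators, sizes are measured in UTF-8 bytes, and `chunk_overlap` is deliberately left out to keep the sketch short.

```python
from typing import Dict, List, Optional

# Hypothetical boundary levels, highest level first (an assumption for this sketch).
BOUNDARIES: Dict[Optional[str], List[str]] = {
    "markdown": ["\n# ", "\n## ", "\n### ", "\n\n", ". "],
    None: ["\n\n", ". "],  # plain text
}


def byte_len(s: str) -> int:
    return len(s.encode("utf-8"))


def split_recursively(text: str, chunk_size: int,
                      language: Optional[str] = None, level: int = 0) -> List[str]:
    """Split `text` into chunks of at most `chunk_size` bytes.

    Assumes `chunk_size` is at least 4, so any single character fits.
    """
    if byte_len(text) <= chunk_size:
        return [text] if text else []
    levels = BOUNDARIES.get(language, BOUNDARIES[None])
    if level >= len(levels):
        # No boundaries left: fall back to cutting between characters.
        pieces = list(text)
    else:
        sep = levels[level]
        parts = text.split(sep)
        # Keep each separator attached to the piece that follows it.
        pieces = [parts[0]] + [sep + p for p in parts[1:]]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if byte_len(current + piece) <= chunk_size:
            current += piece  # greedily merge small neighbours
            continue
        if current:
            chunks.append(current)
        if byte_len(piece) <= chunk_size:
            current = piece
        else:
            # Still too large: retry this piece at the next boundary level.
            chunks.extend(split_recursively(piece, chunk_size, language, level + 1))
            current = ""
    if current:
        chunks.append(current)
    return chunks


doc = "# Title\n\nFirst paragraph. It has two sentences.\n\n## Section\n\nSecond paragraph here."
for chunk in split_recursively(doc, chunk_size=30, language="markdown"):
    print(repr(chunk))
```

Because separators stay attached to the following piece, concatenating the chunks reproduces the input exactly, which makes the sketch easy to check even though it ignores overlap.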
