Target optimal num docs per split when ingesting data #976

The CLI can ingest data from a file. Currently, it generally produces splits with num docs << 10M, which is our general optimal num docs. This is because a split is committed every 60 seconds (the default `commit_timeout_secs`). These undersized splits then have to be merged, which lowers the indexing speed.
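To make the effect of the 60-second timeout concrete, here is a back-of-the-envelope sketch. It assumes a simplified model (not Quickwit's actual indexer code): a split is committed as soon as it reaches the target num docs or when `commit_timeout_secs` elapses, whichever comes first. The ingest rate, the workload size, and the helper names `docs_per_split` and `num_splits` are hypothetical.

```python
# Simplified model (an assumption, not Quickwit's indexer implementation):
# a split is committed when it reaches the target num docs OR when the
# commit timeout elapses, whichever happens first.

TARGET_NUM_DOCS = 10_000_000  # the "optimal" num docs per split from the issue


def docs_per_split(ingest_rate: float, commit_timeout_secs: float,
                   target_num_docs: int = TARGET_NUM_DOCS) -> int:
    """Estimate the num docs in each split.

    ingest_rate: hypothetical sustained ingestion rate, in docs per second.
    """
    if ingest_rate <= 0 or commit_timeout_secs <= 0:
        raise ValueError("rate and timeout must be positive")
    docs_before_timeout = int(ingest_rate * commit_timeout_secs)
    return min(docs_before_timeout, target_num_docs)


def num_splits(total_docs: int, ingest_rate: float,
               commit_timeout_secs: float) -> int:
    """Number of splits produced for total_docs under the model above."""
    per_split = docs_per_split(ingest_rate, commit_timeout_secs)
    # Ceiling division: the last split may be only partially filled.
    return -(-total_docs // per_split)


if __name__ == "__main__":
    # Hypothetical workload: 100M docs ingested at 50k docs/s.
    total, rate = 100_000_000, 50_000
    print(docs_per_split(rate, 60), num_splits(total, rate, 60))  # 3000000 34
```

Under these assumptions, the default timeout yields 34 splits of at most 3M docs each instead of 10 splits of 10M docs, and every one of those small splits is a candidate for a later merge.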

Ideally, we should produce splits with 10M docs directly. This can be done by setting a high `commit_timeout_secs` by default.

I suggest setting the commit timeout to 3600 seconds to avoid merges.
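One way to sanity-check the 3600-second proposal is to compute the lowest ingest rate at which a split still fills up to 10M docs before the timeout fires. This sketch assumes a simplified model in which a split is committed at the target num docs or at the commit timeout, whichever comes first; the helper names `min_rate_for_full_splits` and `secs_to_fill_split` are hypothetical.

```python
# Simplified model (an assumption): a split is committed when it reaches
# the target num docs or when commit_timeout_secs elapses, whichever is first.

TARGET_NUM_DOCS = 10_000_000


def min_rate_for_full_splits(commit_timeout_secs: float,
                             target_num_docs: int = TARGET_NUM_DOCS) -> float:
    """Lowest sustained ingest rate (docs/s) at which a split reaches
    target_num_docs before the commit timeout fires."""
    return target_num_docs / commit_timeout_secs


def secs_to_fill_split(ingest_rate: float,
                       target_num_docs: int = TARGET_NUM_DOCS) -> float:
    """Seconds needed to accumulate target_num_docs at a given ingest rate."""
    return target_num_docs / ingest_rate


if __name__ == "__main__":
    for timeout in (60, 3600):
        print(f"commit_timeout_secs={timeout}: need >= "
              f"{min_rate_for_full_splits(timeout):,.0f} docs/s")
```

Under this model, a 60-second timeout only yields 10M-doc splits above roughly 166,667 docs/s, while 3600 seconds lowers that threshold to roughly 2,778 docs/s. At a hypothetical 50k docs/s, `secs_to_fill_split(50_000)` returns `200.0`, so splits would be committed on doc count well before the timeout.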


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Target optimal num docs by split when ingesting data #976

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Target optimal num docs by split when ingesting data #976

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions