Target optimal num docs by split when ingesting data #976

Open
@fmassot

We can ingest data from a file with the CLI. Currently, this generally produces splits with num docs << 10M, and 10M is our general optimal number of docs per split. This happens because we commit a split every 60 seconds (the default commit_timeout_secs). These small splits then have to be merged, which lowers the indexing speed.

Ideally, we would produce splits with 10M docs directly. This can be done by setting a high commit_timeout_secs by default.

I suggest setting the default commit timeout to 3600 seconds to avoid merges.
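
To make the arithmetic behind this concrete, here is a rough sketch of how the commit timeout bounds split size. It assumes a constant ingest rate and assumes that a split is committed either when the timeout fires or when it reaches the 10M docs target, whichever comes first. The function names and the 20,000 docs/s rate are hypothetical and only serve as an illustration, they are not measurements.

```python
import math

# 10M docs per split, the target mentioned in this issue.
TARGET_NUM_DOCS = 10_000_000


def docs_per_split(docs_per_sec: float, commit_timeout_secs: float,
                   target: int = TARGET_NUM_DOCS) -> int:
    """Estimate docs in one split, assuming a constant ingest rate.

    Assumption: a commit happens when the timeout fires or when the
    split reaches `target` docs, whichever comes first.
    """
    return min(target, int(docs_per_sec * commit_timeout_secs))


def min_timeout_for_target(docs_per_sec: float,
                           target: int = TARGET_NUM_DOCS) -> int:
    """Smallest timeout (in seconds) that lets a split reach `target` docs."""
    return math.ceil(target / docs_per_sec)


# Hypothetical ingest rate of 20,000 docs/s.
rate = 20_000
print(docs_per_split(rate, 60))      # split size with the 60 s default
print(docs_per_split(rate, 3600))    # split size with the proposed 3600 s
print(min_timeout_for_target(rate))  # timeout needed to reach the target
```

Under these assumptions, a 60 second timeout yields splits far below the target, while any timeout above the value returned by min_timeout_for_target lets the docs target, rather than the timer, trigger the commit.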
