Adding configuring chunking settings notebook for blog #417

Open: wants to merge 3 commits into base: main from

dan-rubinstein (Member) commented Mar 4, 2025

Adding a notebook to accompany the blog on configuring chunking settings that I'm currently writing. A draft of the blog can be seen here.

Note: I've run through the steps locally and with a serverless trial account, and confirmed that they succeed.

gitnotebooks bot commented Mar 4, 2025

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/elastic/elasticsearch-labs/pull/417

"# Install packages and connect with Elasticsearch Client\n",
"\n",
"To get started, we'll need to connect to our Elastic deployment using the Python client (version 8.12.0 or above).\n",
"Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.\n",
Member

Maybe change to "Because we're using an Elastic Serverless deployment, we'll use the Serverless Endpoint to identify our deployment."?

That way it matches the code below, `hosts=[ELASTIC_SERVERLESS_ENDPOINT]`.

Member Author

I meant to update this section when I switched to the serverless endpoint. I'll update it.

dan-rubinstein (Member Author) commented Mar 4, 2025

I spoke with @joseph.mcelroy, who asked me to switch to using the Cloud ID instead of the serverless endpoint. I'll update all of the endpoint information to use the Cloud ID.
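For context on the Cloud ID discussed in this thread: a Cloud ID is a compact encoding of the deployment's endpoint, which is why the Python client accepts `cloud_id=...` in place of `hosts=[...]`. The sketch below decodes one, assuming it follows the commonly documented `<deployment-name>:<base64("<host>[:<port>]$<es-uuid>$<kibana-uuid>")>` layout. It is illustrative only; in the notebook you would simply pass the Cloud ID to `Elasticsearch(cloud_id=..., api_key=...)` and let the client resolve it. The host and UUIDs in the usage lines are made up.

```python
import base64


def cloud_id_to_url(cloud_id: str) -> str:
    """Resolve an Elastic Cloud ID to an Elasticsearch HTTPS URL.

    Assumes the layout "<name>:<base64('<host>[:<port>]$<es-uuid>$<kibana-uuid>')>".
    """
    _name, sep, encoded = cloud_id.partition(":")
    if not sep or not encoded:
        raise ValueError("Cloud ID must look like '<name>:<base64 data>'")
    # Decoded payload is '$'-separated: host (optionally with port), ES UUID, Kibana UUID.
    parts = base64.b64decode(encoded).decode("utf-8").split("$")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("Cloud ID data must contain '<host>$<es-uuid>'")
    host, es_uuid = parts[0], parts[1]
    # An explicit port may be appended to the host; otherwise assume HTTPS on 443.
    port = "443"
    if ":" in host:
        host, _, port = host.rpartition(":")
    return f"https://{es_uuid}.{host}:{port}"


# Usage with a hypothetical Cloud ID built on the fly:
payload = base64.b64encode(b"us-east-1.aws.example.com$abc123$def456").decode()
print(cloud_id_to_url(f"my-deployment:{payload}"))
```

Running this prints `https://abc123.us-east-1.aws.example.com:443`.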

"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/document-chunking/configuring-chunking-settings-for-inference-endpoints.ipynb)\n",
"\n",
"\n",
"Learn how to configure [chunking settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-apis.html#infer-chunking-config) for [Inference API](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-apis.html) endpoints."
Member

Do we want to summarize some of the content of this wiki in an overview section that explains why users might want to change chunking strategies, sizes, and overlap? Or would that distract from the overall goal of teaching people how to call the API with the new fields?

Member Author

Since this notebook will be linked from the chunking blog, which explains the motivation behind the various chunking setting options, I think we can leave this as is to avoid duplicating that information. We could add a link to the blog once it is posted, but that will have to be a follow-up change.
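As a rough illustration of the fields this thread refers to (strategy, chunk size, overlap), here is a Python sketch that assembles a `chunking_settings` object and embeds it in a create-inference-endpoint request body. The limits it checks are assumptions taken from the Inference API chunking documentation of that period: strategies `word` and `sentence`, a `max_chunk_size` of 10 (word) or 20 (sentence) up to 300, a word `overlap` of at most half the chunk size, and a `sentence_overlap` of 0 or 1. The ELSER service values are example placeholders. Elasticsearch performs its own validation, so these client-side checks are only a convenience.

```python
import json
from typing import Optional

# Assumed limits (from the Inference API chunking docs at the time); verify for your version.
MAX_CHUNK_SIZE_LIMIT = 300
MIN_CHUNK_SIZE = {"word": 10, "sentence": 20}


def build_chunking_settings(strategy: str, max_chunk_size: int,
                            overlap: Optional[int] = None,
                            sentence_overlap: Optional[int] = None) -> dict:
    """Return a `chunking_settings` object after client-side sanity checks."""
    if strategy not in MIN_CHUNK_SIZE:
        raise ValueError(f"unknown strategy {strategy!r}")
    if not MIN_CHUNK_SIZE[strategy] <= max_chunk_size <= MAX_CHUNK_SIZE_LIMIT:
        raise ValueError(f"max_chunk_size out of range for {strategy!r}")
    settings = {"strategy": strategy, "max_chunk_size": max_chunk_size}
    if strategy == "word":
        if sentence_overlap is not None:
            raise ValueError("sentence_overlap only applies to the sentence strategy")
        if overlap is not None:
            # Word overlap may not exceed half of max_chunk_size.
            if not 0 <= overlap <= max_chunk_size // 2:
                raise ValueError("overlap must be between 0 and max_chunk_size / 2")
            settings["overlap"] = overlap
    else:
        if overlap is not None:
            raise ValueError("overlap only applies to the word strategy")
        if sentence_overlap is not None:
            if sentence_overlap not in (0, 1):
                raise ValueError("sentence_overlap must be 0 or 1")
            settings["sentence_overlap"] = sentence_overlap
    return settings


# Body for PUT _inference/sparse_embedding/<inference_id>; service values are examples only.
body = {
    "service": "elasticsearch",
    "service_settings": {"model_id": ".elser_model_2", "num_allocations": 1, "num_threads": 1},
    "chunking_settings": build_chunking_settings("sentence", 100, sentence_overlap=0),
}
print(json.dumps(body, indent=2))
```

Keeping the validation separate from the request body means the same helper can be reused for any service that accepts `chunking_settings`.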

dan-rubinstein requested a review from prwhelan March 4, 2025 23:08