# Azure AI Search Setup Guide

## Overview
The Azure AI Search feature helps improve the responses from your application by combining the power of large language models (LLMs) with extra context retrieved from an external data source. Simply put, when you ask a question, the agent first searches a set of relevant documents (stored as embeddings) and then uses this context to provide a more accurate and relevant response. If no relevant context is found, the agent returns the LLM response directly and informs the customer that there is no relevant information in the documents.

This AI Search feature is optional and is disabled by default. If you prefer to use it, simply set the environment variable `USE_AZURE_AI_SEARCH_SERVICE` to `true`. Doing so will also trigger the deployment of Azure AI Search resources.

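The retrieval-then-respond flow described above can be summarized in a few lines of Python. This is only an illustrative sketch: `search_documents` and `ask_llm` are hypothetical placeholders, not functions from the sample code, which wires this logic through its agent and the `SearchIndexManager` helper.

```python
# Illustrative sketch of the flow described above; `search_documents` and
# `ask_llm` are hypothetical placeholders, not part of the sample code.
def answer_question(question: str, search_documents, ask_llm) -> str:
    # 1. Retrieve relevant chunks from the Azure AI Search index.
    context_chunks = search_documents(question)

    if context_chunks:
        # 2. Ground the LLM response in the retrieved context.
        context = "\n".join(context_chunks)
        return ask_llm(f"Context:\n{context}\n\nQuestion: {question}")

    # 3. No relevant context found: return the plain LLM response and say so.
    return ask_llm(question) + "\n\nThere is no relevant information in the documents."
```
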
## How does Azure AI Search work?
In our provided example, the application includes a sample dataset containing information about Contoso products. This data was split into chunks of 10 sentences, and each chunk of text was transformed into a numerical representation called an embedding. These embeddings were created using OpenAI's `text-embedding-3-small` model with `dimensions=100`. The resulting embeddings file (`embeddings.csv`) is located in the `api/data` folder. The agent requires an index capable of both semantic and vector search, i.e. it can use an LLM to search for context in the text fields as well as search by embedding vector similarity. The index must also have a vectorizer configured; the vectorizer builds the query embedding when the agent performs a hybrid search. For search results to include a correct reference, the index must contain a field called "title" and, optionally, a "url" field that provides the link. The "url" field is not shown in our sample because the index was generated from the files located in `api/files`, which have no links available.

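For illustration, the snippet below shows the general idea behind the chunking step: sentences are grouped into 10-sentence chunks, and each chunk is then embedded. It uses a naive regex sentence split and is not the exact logic of the sample's helper code.

```python
import re

def chunk_text(text: str, sentences_per_chunk: int = 10) -> list[str]:
    """Group sentences into fixed-size chunks that are ready to be embedded."""
    # Naive split on sentence-ending punctuation; the real helper may use
    # a more robust sentence tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]

# Each chunk would then be embedded with text-embedding-3-small (dimensions=100)
# and written to embeddings.csv.
```
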
## If you want to use your own dataset
To create a custom embeddings file with your own data, you can use the provided helper class `SearchIndexManager`. Below is a straightforward way to build your own embeddings:
```python
from api.search_index_manager import SearchIndexManager

# Connect the helper to your Azure AI Search service and embedding deployment.
search_index_manager = SearchIndexManager(
    endpoint=your_search_endpoint,
    credential=your_credentials,
    index_name=your_index_name,
    dimensions=100,
    model=your_embedding_model,
    deployment_name=your_embedding_model,
    embedding_endpoint=your_search_endpoint_url,
    embed_api_key=embed_api_key,
    embedding_client=embedding_client
)

# Split the input documents into 10-sentence chunks and write their embeddings.
search_index_manager.build_embeddings_file(
    input_directory=input_directory,
    output_file=output_file_path,  # e.g. the path of the embeddings.csv to create
    sentences_per_embedding=10
)
```
- Make sure to replace `your_search_endpoint`, `your_credentials`, `your_index_name`, and `embedding_client` with your own Azure service details (one way to fill in these placeholders is sketched after this list).
- `your_embedding_model` is the model used to build the embeddings.
- `your_search_endpoint_url` is the URL of the embedding endpoint, which is used to create the vectorizer, and `embed_api_key` is the API key used to access it.
- Your input data should be placed in the folder specified by `input_directory`.
- The `sentences_per_embedding` parameter specifies the number of sentences used to construct each embedding. The larger this number, the broader the context identified during the similarity search.

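For example, `your_credentials` can often be a standard `azure.identity` credential such as `DefaultAzureCredential`. The snippet below is only a sketch under that assumption (the helper may also accept other credential types, and fully asynchronous code paths may need `azure.identity.aio`); the endpoint, index, and model values shown are the sample's defaults.

```python
# A common way to authenticate against Azure services; this assumes the helper
# accepts a standard azure.identity credential.
from azure.identity import DefaultAzureCredential

your_credentials = DefaultAzureCredential()

# The remaining placeholders are plain strings, for example:
your_search_endpoint = "https://<your-search-service>.search.windows.net"
your_index_name = "index_sample"
your_embedding_model = "text-embedding-3-small"
```
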
## Deploying the Application with AI index search enabled
To deploy your application using the AI index search feature, set the following environment variables locally.

In PowerShell:
```powershell
$env:USE_AZURE_AI_SEARCH_SERVICE="true"
$env:AZURE_AI_SEARCH_INDEX_NAME="index_sample"
$env:AZURE_AI_EMBED_DEPLOYMENT_NAME="text-embedding-3-small"
```

In bash:
```bash
export USE_AZURE_AI_SEARCH_SERVICE="true"
export AZURE_AI_SEARCH_INDEX_NAME="index_sample"
export AZURE_AI_EMBED_DEPLOYMENT_NAME="text-embedding-3-small"
```

In cmd:
```cmd
set USE_AZURE_AI_SEARCH_SERVICE=true
set AZURE_AI_SEARCH_INDEX_NAME=index_sample
set AZURE_AI_EMBED_DEPLOYMENT_NAME=text-embedding-3-small
```

| 63 | +- `USE_AZURE_AI_SEARCH_SERVICE`: Enables or disables (default) index search. |
| 64 | +- `AZURE_AI_SEARCH_INDEX_NAME`: The Azure Search Index the application will use. |
| 65 | +- `AZURE_AI_EMBED_DEPLOYMENT_NAME`: The Azure embedding deployment used to create embeddings. |
| 66 | + |
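The snippet below shows how such a flag is commonly read from the environment in Python; it is a generic illustration, not the application's exact configuration code.

```python
import os

# Generic illustration of reading the configuration; the application's own
# configuration code may differ. Defaults follow the documented behavior.
use_search = os.environ.get("USE_AZURE_AI_SEARCH_SERVICE", "false").lower() == "true"
index_name = os.environ.get("AZURE_AI_SEARCH_INDEX_NAME", "index_sample")
embed_deployment = os.environ.get("AZURE_AI_EMBED_DEPLOYMENT_NAME", "text-embedding-3-small")

if not use_search:
    print("Azure AI Search is disabled; responses will come from the LLM only.")
```
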
## Creating the Azure Search Index

To utilize index search, you must have an Azure AI Search index. By default, the application uses `index_sample` as the index name. You can create an index either by following the official Azure [instructions](https://learn.microsoft.com/azure/ai-services/agents/how-to/tools/azure-ai-search?tabs=azurecli%2Cpython&pivots=overview-azure-ai-search), or programmatically with the provided helper methods:
```python
# Create Azure Search Index (if it does not yet exist)
await search_index_manager.create_index(raise_on_error=True)

# Upload embeddings to the index
await search_index_manager.upload_documents(embeddings_path)
```
**Important:** If you have already created the index before deploying your application, the system will skip this step and directly use your existing Azure Search index. The `vector_index_dimensions` parameter is only required if the dimension information was not already provided when the `SearchIndexManager` object was constructed.
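
For example, if the `SearchIndexManager` was constructed without dimension information, the dimensionality can be supplied when the index is created. This is a sketch based on the note above, assuming `vector_index_dimensions` is passed to `create_index`:

```python
# Sketch based on the note above: pass the embedding dimensionality explicitly
# if it was not supplied when constructing SearchIndexManager.
await search_index_manager.create_index(
    vector_index_dimensions=100,  # must match the embedding model, e.g. dimensions=100
    raise_on_error=True,
)
```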