
Commit 88a14e0

Abby Hartman committed
Respond to some feedback, and added a screenshot to show comparison of evaluation runs
1 parent d6b746b commit 88a14e0

File tree

2 files changed (+41, -15 lines changed)


scenarios/evaluate/Supported_Evaluation_Metrics/Document_Retrieval_Evaluation/Document_Retrieval_Evaluation.ipynb

Lines changed: 41 additions & 15 deletions
@@ -5,7 +5,21 @@
    "id": "8f372fc3-d2ad-49a8-87da-e3b092b6b5ae",
    "metadata": {},
    "source": [
-   "# Document Retrieval Evaluation in Azure AI Foundry"
+   "# Document Retrieval Evaluation in Azure AI Foundry\n",
+   "\n",
+   "## Summary\n",
+   "This notebook sample demonstrates how to evaluate an Azure AI Search index using Azure AI Evaluation. The evaluator used in this example, `DocumentRetrievalEvaluator`, requires two inputs for calculating the evaluation metrics: a list of ground-truth labeled documents (sometimes referred to as \"qrels\") and a list of actual search results obtained from a search index. This sample walks through the steps of preparing data, gathering search results for different search configurations, running evaluation, and comparing the results of each run.\n",
+   "\n",
+   "### Explanation of Document Retrieval Metrics\n",
+   "The metrics generated in the output of the evaluator include:\n",
+   "* NDCG (Normalized Discounted Cumulative Gain), calculated for the top 3 documents retrieved from a search query. NDCG measures how well a document ranking compares to an ideal ranking given a list of ground-truth documents.\n",
+   "* XDCG, calculated for the top 3 documents retrieved from a search query. XDCG measures how objectively good the top K documents are, discounted by their position in the list.\n",
+   "* Fidelity, calculated over all documents retrieved from a search query. Fidelity measures how objectively good all of the retrieved documents are compared with all known good documents in the underlying data store.\n",
+   "* Top 1 relevance, which is the top relevance score for a given set of retrieved documents.\n",
+   "* Top 3 max relevance, which is the maximum relevance score among the top 3 documents for a given set of retrieved documents.\n",
+   "* Holes and holes ratio, which measure the number of retrieved documents for which a ground-truth label is missing, and the proportion of that count within the total number of retrieved documents, respectively.\n",
+   "\n",
+   "Note that some metrics, particularly NDCG, XDCG, and Fidelity, are sensitive to holes. Ideally the count of holes for a given evaluation should be zero; otherwise, results for these metrics may not be accurate. It is recommended to iteratively check results against the current known ground truth and fill holes to improve the accuracy of the evaluation metrics. This process is not covered explicitly in the sample, but it is important to mention."
    ]
   },
   {
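For readers skimming the diff, here is a minimal sketch of the two inputs described above for a single query. The field names (`document_id`, `query_relevance_label`, `relevance_score`) are assumptions taken from the Azure AI Evaluation documentation rather than from this notebook, so treat them as illustrative only.

```python
# Hypothetical input shapes for DocumentRetrievalEvaluator (field names are assumptions).

# Ground-truth labeled documents ("qrels") for one query
retrieval_ground_truth = [
    {"document_id": "doc-1", "query_relevance_label": 2},
    {"document_id": "doc-2", "query_relevance_label": 1},
    {"document_id": "doc-3", "query_relevance_label": 0},
]

# Actual results returned by the search index for the same query
retrieved_documents = [
    {"document_id": "doc-2", "relevance_score": 17.4},
    {"document_id": "doc-1", "relevance_score": 15.2},
    {"document_id": "doc-9", "relevance_score": 11.0},  # no ground-truth label -> counts as a "hole"
]
```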
@@ -23,10 +37,12 @@
    "source": [
    "### Prerequisites\n",
    "Before running this notebook, be sure you have fulfilled the following prerequisites:\n",
-   "* Create an Azure AI Search resource, with the \"Search Index Data Contributor\" role\n",
-   "* Create an Azure AI Foundry project\n",
-   "* Deploy a text embedding model in your Azure AI Foundry project if you would like to test vector-based search scenarios in this example, and give yourself \"Cognitive Services OpenAI User\" role for the Azure AI Services resource created for your project.\n",
-   "* `az` CLI is installed in the current environment, and you have run `az login` to gain access to your resources."
+   "* Create or get access to an [Azure Subscription](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/initial-subscriptions), and assign yourself the Owner or Contributor role for creating resources in this subscription.\n",
+   "* Install the `az` CLI in the current environment, and run `az login` to gain access to your resources.\n",
+   "* Read and understand the documentation covering [assigning RBAC roles between resources](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-cli) using the `az` CLI.\n",
+   "* Create an [Azure AI Search resource](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal), and assign yourself the \"Search Index Data Contributor\" role for the resource.\n",
+   "* Create an [Azure AI Foundry project](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/create-projects?tabs=ai-studio).\n",
+   "* [Optional] Deploy a text embedding model in the Azure AI Foundry project for testing vector-based search scenarios in this example, and assign yourself the \"Cognitive Services OpenAI User\" role for the Azure AI Services resource created for your project. For integrated vectorization support, your Azure AI Search resource also needs the \"Cognitive Services OpenAI User\" role on that Azure AI Services resource."
    ]
   },
   {
@@ -226,7 +242,7 @@
    "metadata": {},
    "source": [
    "### Create search configurations\n",
-   "In the next cell, we will set some additional configuration values for configuring document search using Azure AI Search. We can select from these configurations later on when we generate search results for evaluation."
+   "In the next cell, we will set some additional configuration values for document search using Azure AI Search. We can select from these configurations later when we generate search results for evaluation, and then compare the results of each run once the evaluations are finished to determine which configuration performs best for the index."
    ]
   },
   {
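The concrete configuration values live in a code cell that this diff does not touch. Purely as an illustration, a search configuration here could be as small as a named set of query options to apply when calling the index; the names and options below are hypothetical, not the notebook's actual values.

```python
# Hypothetical search configurations; names and options are illustrative only.
search_configurations = {
    "keyword-only": {"use_vector_search": False, "top": 50},
    "keyword-plus-vector": {"use_vector_search": True, "vector_field": "text_vector", "top": 50},
}
```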
@@ -306,7 +322,7 @@
    "metadata": {},
    "source": [
    "### Download the TREC-COVID (Beir) dataset\n",
-   "In this section we will download an open source dataset to perform evaluation on. We will use the TREC-COVID dataset from BeIR, which contains a corpus we can index into Azure AI Search, as well as a set of queries and qrels for evaluation.\n"
+   "In the next cell, we will download an open-source dataset to evaluate against. We will use the TREC-COVID dataset from BeIR, which contains a corpus we can index into Azure AI Search, as well as a set of queries to run through our search service and a set of ground-truth qrels for evaluation.\n"
    ]
   },
   {
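If you want to reproduce this step outside the notebook, one common way to fetch the dataset is the `beir` package. This is a sketch under the assumption that the notebook uses a comparable loader, not necessarily the same code:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unzip the TREC-COVID dataset from the public BeIR mirror
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/trec-covid.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: {doc_id: {"title": ..., "text": ...}}, queries: {query_id: query_text},
# qrels: {query_id: {doc_id: relevance_label}}; TREC-COVID labels are 0, 1, or 2
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
```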
@@ -351,7 +367,7 @@
    "metadata": {},
    "source": [
    "### Create an Azure AI Search index from the dataset corpus\n",
-   "In this section, we will create an Azure AI Search index using the BeIR TREC-COVID dataset. If integrated vectorization is enabled in the configuration settings, we will also add a vector field for our index."
+   "Next, we will create an Azure AI Search index using the BeIR TREC-COVID dataset downloaded in the previous cell. If integrated vectorization is enabled in the configuration settings, we will also add a vector field to our index."
    ]
   },
   {
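A minimal sketch of creating such an index with the `azure-search-documents` SDK follows. The endpoint, index name, and field names are assumptions, and the vector-field/integrated-vectorization setup used by the notebook is omitted here:

```python
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SimpleField,
    SearchableField,
    SearchFieldDataType,
)

# Placeholder endpoint and index name
endpoint = "https://<your-search-service>.search.windows.net"
index_client = SearchIndexClient(endpoint=endpoint, credential=DefaultAzureCredential())

# Keyword-searchable fields matching the BeIR corpus shape (id, title, text)
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SearchableField(name="text", type=SearchFieldDataType.String),
]
index_client.create_or_update_index(SearchIndex(name="trec-covid", fields=fields))
```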
@@ -488,7 +504,7 @@
    "metadata": {},
    "source": [
    "### Index the documents from the dataset corpus\n",
-   "In this section, we will upload the documents from the TREC-COVID dataset into the index we previously created."
+   "Once we have downloaded the data and created the index, we will ingest the documents from the local file into the index. If integrated vectorization is configured, we will also create embeddings for the input data to include in the ingestion payload."
    ]
   },
   {
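A hedged sketch of that ingestion step, assuming the `corpus` dictionary from the BeIR loader above and skipping the embedding/vectorization path:

```python
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="trec-covid",
    credential=DefaultAzureCredential(),
)

# Flatten the BeIR corpus into documents matching the index fields
documents = [
    {"id": doc_id, "title": doc.get("title", ""), "text": doc.get("text", "")}
    for doc_id, doc in corpus.items()
]

# Upload in batches to stay well under the service's per-request limits
batch_size = 1000
for i in range(0, len(documents), batch_size):
    search_client.upload_documents(documents=documents[i : i + batch_size])
```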
@@ -658,9 +674,9 @@
    "metadata": {},
    "source": [
    "### Upload dataset to Azure AI Foundry\n",
-   "To run an evaluation in the cloud, we need to run the function in the previous cell to generate the data for evaluation, and then upload the data files to the specified Azure AI Foundry project.\n",
+   "To run an evaluation in the cloud, we need to upload our evaluation data to the specified Azure AI Foundry project.\n",
    "\n",
-   "We will run the data preparation for each configuration specified earlier in the notebook, so we can compare evaluation runs to determine which configuration performs best."
+   "We will run the data preparation for each search configuration specified earlier in the notebook, so we can compare evaluation runs to determine which configuration performs best."
    ]
   },
   {
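The notebook's actual data-preparation helper is not part of this diff. As a rough sketch of what "data preparation for each search configuration" could look like, the snippet below pairs each query's search results with its qrels and writes one JSON line per query; the output field names (`retrieval_ground_truth`, `retrieved_documents`, and their sub-fields) are assumptions, and `search_client`, `queries`, and `qrels` come from the earlier sketches.

```python
import json

def prepare_eval_data(config_name: str, top_k: int = 50) -> str:
    """Hypothetical helper: build a JSONL evaluation file for one search configuration."""
    output_path = f"eval_data_{config_name}.jsonl"
    with open(output_path, "w") as f:
        for query_id, query_text in queries.items():
            results = search_client.search(search_text=query_text, top=top_k)
            retrieved = [
                {"document_id": r["id"], "relevance_score": r["@search.score"]}
                for r in results
            ]
            ground_truth = [
                {"document_id": doc_id, "query_relevance_label": label}
                for doc_id, label in qrels.get(query_id, {}).items()
            ]
            f.write(json.dumps({
                "retrieval_ground_truth": ground_truth,
                "retrieved_documents": retrieved,
            }) + "\n")
    return output_path
```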
@@ -695,7 +711,7 @@
    "metadata": {},
    "source": [
    "## Run document retrieval evaluation\n",
-   "In the following cell, we will configure and run the document retrieval evaluator for our dataset. The init params `groundtruth_label_min` and `groundtruth_label_max` help us to configure the qrels scaling for some metrics which depend on a count of labels, such as Fidelity. In this case, the TREC-COVID dataset groundtruth set has 0, 1, and 2 as possible labels, so we set the values of those init params accordingly."
+   "After our datasets are uploaded, we will configure and run the document retrieval evaluator for each uploaded dataset. The init params `groundtruth_label_min` and `groundtruth_label_max` configure the qrels scaling for metrics that depend on a count of labels, such as Fidelity. In this case, the TREC-COVID ground-truth set has 0, 1, and 2 as possible labels, so we set those init params accordingly."
    ]
   },
   {
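For local experimentation, a minimal sketch of configuring and invoking the evaluator is below. The init param names follow the notebook text quoted above; the call keyword arguments follow the Azure AI Evaluation documentation and should be treated as assumptions, since the notebook's code cell is not shown in this diff.

```python
from azure.ai.evaluation import DocumentRetrievalEvaluator

# TREC-COVID qrels use labels 0, 1, and 2, so scale the label range accordingly
doc_retrieval_evaluator = DocumentRetrievalEvaluator(
    groundtruth_label_min=0,
    groundtruth_label_max=2,
)

# Evaluate one query's results, using the input shapes sketched earlier in this page
scores = doc_retrieval_evaluator(
    retrieval_ground_truth=retrieval_ground_truth,
    retrieved_documents=retrieved_documents,
)
print(scores)  # expect NDCG@3, XDCG@3, fidelity, top-1/top-3 relevance, holes, etc.
```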
@@ -753,11 +769,21 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "id": "7c616e48-2fab-4493-977e-ec51c7e6951b",
    "metadata": {},
-   "outputs": [],
+   "source": [
+   "## Comparing results\n",
+   "\n",
+   "Once the evaluations are complete, you can compare the results by opening the \"Evaluations\" tab on the left side of the Azure AI Foundry project page, selecting the runs to compare, and then clicking the \"Compare\" button to see metric results side by side.\n",
+   "\n",
+   "![Azure AI Foundry project evaluations page](eval-results-select.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e9406ba6",
+   "metadata": {},
    "source": []
   }
  ],
