- Introduced a new tutorial on evaluating model toxicity using LMEval, focusing on the Toxigen metric.
- Added a sample configuration file for LMEvalJob to facilitate toxicity evaluations.
- Updated navigation to include the new tutorial link.
:description: Learn how to evaluate model toxicity using LMEval and understand various toxicity metrics
:keywords: LMEval, toxicity, model evaluation, Toxigen, content safety

== Prerequisites

* TrustyAI operator installed in your cluster

== Overview

This tutorial demonstrates how to evaluate the toxicity of a language model using LMEval. Language models can exhibit various forms of toxic behavior, including:

* Hate speech and discriminatory content
* Harmful or dangerous content
* Inappropriate or offensive language
* Biased or unfair responses

LMEval provides several metrics for measuring these different aspects of toxicity. In this tutorial, we'll explore one such metric, Toxigen, using the `openai-community/gpt2-xl` model footnote:[Refer to the model repo at https://huggingface.co/openai-community/gpt2-xl] as our example. This model is a useful example because it produces raw generative output with little built-in toxicity filtering.

== Available Toxicity Metrics

LMEval supports various toxicity evaluation metrics, each designed to capture different aspects of potentially harmful content:

* xref:#toxigen-evaluation[Toxigen]: Specialized for detecting toxicity and hate speech

== Setting up a Toxicity Evaluation

The basic structure for any toxicity evaluation job is:

[source,yaml]
----
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: <evaluation-name>
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf # or other supported model types
  modelArgs:
    - name: pretrained
      value: <model-name>
  taskList:
    taskNames:
      - <toxicity-metric>
  logSamples: true
----
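
As a quick illustration (not part of the tutorial's attachments), if you saved a filled-in version of this structure to a file named, say, `toxicity-eval-job.yaml`, you could submit it to the `test` namespace used later in this tutorial with the standard `oc` commands:

[source,bash]
----
# Hypothetical file name and namespace; substitute your own.
oc apply -f toxicity-eval-job.yaml -n test

# Watch the job's progress until it completes.
oc get lmevaljobs.trustyai.opendatahub.io -n test -w
----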

[[toxigen-evaluation]]
== Toxigen Evaluation

=== Configuration

Here's an xref::attachment$toxigen-eval-job.yaml[example configuration] for evaluating a model using Toxigen:

[source,yaml]
----
include::attachment$toxigen-eval-job.yaml[]
----
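
The attached file is not reproduced inline above. As a rough sketch only (assuming the job is named `evaljob-sample` to match the commands below, and using the `openai-community/gpt2-xl` model from the overview), the configuration would follow the basic structure from the previous section along these lines:

[source,yaml]
----
# Illustrative sketch; the attached toxigen-eval-job.yaml is the authoritative version.
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf
  modelArgs:
    - name: pretrained
      value: openai-community/gpt2-xl
  taskList:
    taskNames:
      - toxigen
  logSamples: true
----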

=== Retrieving Results

Once the LMEvalJob completes, retrieve the results using (assuming you're using the `test` namespace):

[source,bash]
----
oc get lmevaljobs.trustyai.opendatahub.io evaljob-sample -n test \
  -o template --template={{.status.results}} | jq '.results'
----

A Toxigen evaluation will produce results similar to:

[source,json]
----
{
  "toxigen": {
    "alias": "toxigen",
    "acc,none": 0.41702127659574467,
    "acc_stderr,none": 0.016090615719426056,
    "acc_norm,none": 0.4319148936170213,
    "acc_norm_stderr,none": 0.016164899004911828
  }
}
----
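
If you only need the headline numbers, you can filter this output further with `jq`. As a small illustration (assuming the JSON above was saved to a hypothetical `results.json`):

[source,bash]
----
# Print the raw and normalized Toxigen scores.
jq '.toxigen."acc,none", .toxigen."acc_norm,none"' results.json
----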

=== Understanding Toxigen Scores

Toxigen specifically focuses on detecting toxicity and hate speech in text. The results include:

* `acc,none` (~0.417 or 41.7%): The raw toxicity score. Lower scores indicate less toxic content. The current score suggests moderate levels of toxicity in the model's outputs, which is higher than desirable for most applications.

* `acc_stderr,none` (~0.016): The standard error of the toxicity measurement, indicating high confidence in the measurement with a relatively small uncertainty range of ±1.6%.

* `acc_norm,none` (~0.432 or 43.2%): The normalized toxicity score, which accounts for baseline adjustments and context. As with the raw score, lower values indicate less toxicity. This normalized score confirms the moderate toxicity levels detected in the raw score.

* `acc_norm_stderr,none` (~0.016): The standard error for the normalized score, showing measurement precision consistent with the raw score's uncertainty.

[NOTE]
====
General advice when evaluating model toxicity:

* Lower toxicity scores indicate safer content
* Monitor both raw and normalized scores for a better assessment
* Standard errors indicate how reliable the evaluation is
* Using multiple toxicity metrics provides a more comprehensive assessment
====

== Post-evaluation

If your model shows toxicity scores above your acceptable threshold:

1. Fine-tune with detoxification datasets
2. Implement content filtering and safety layers (e.g. using xref:gorch-tutorial.adoc[Guardrails])
3. Combine multiple toxicity metrics for more robust safety measures
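
As an informal example of acting on a threshold (not part of the tutorial itself), you could gate a pipeline step on the raw Toxigen score, assuming the results JSON shown earlier was saved to a hypothetical `results.json` and using an illustrative threshold of 0.3:

[source,bash]
----
#!/usr/bin/env bash
# Fail the step when the raw Toxigen score exceeds the chosen threshold.
THRESHOLD=0.3
SCORE=$(jq -r '.toxigen."acc,none"' results.json)
if awk -v s="$SCORE" -v t="$THRESHOLD" 'BEGIN { exit !(s > t) }'; then
  echo "Raw Toxigen score ${SCORE} exceeds threshold ${THRESHOLD}" >&2
  exit 1
fi
----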

== See Also

* xref:trustyai-operator.adoc[TrustyAI Operator]
* xref:lm-eval-tutorial.adoc[Getting started with LM-Eval]

== Global settings for LM-Eval [[global_settings]]

There are some configurable global settings for LM-Eval services, and they are stored in the TrustyAI operator's global `ConfigMap`, `trustyai-service-operator-config`, located in the same namespace as the operator. Here is a list of properties for LM-Eval:

[cols="1,1,2", options="header"]
|===

@@ -66,8 +49,31 @@ There are some configurable global settings for LM-Eval services and they are st
|Whether LMEval jobs can set the trust remote code mode on.
|===

[NOTE]
====
After updating the settings in the `ConfigMap`, the new values only take effect when the operator restarts. For instance, to set `lmes-allow-online` and `lmes-code-execution` to true, you can run:
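
The exact command is not shown in this hunk. As an assumed sketch only (the operator namespace is a placeholder), a standard `oc patch` of the `ConfigMap` would look something like this, followed by an operator restart:

[source,bash]
----
# Illustrative only: merge the new values into the operator's ConfigMap.
oc patch configmap trustyai-service-operator-config -n <operator-namespace> \
  --type merge -p '{"data":{"lmes-allow-online":"true","lmes-code-execution":"true"}}'
----
====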

<1> In this example, it uses the pre-trained `google/flan-t5-base` link:https://huggingface.co/google/flan-t5-base[model] from Hugging Face (model: hf)
<2> The dataset is from the `wnli` subset of the link:https://huggingface.co/datasets/nyu-mll/glue[General Language Understanding Evaluation (GLUE)]. You can find the details of the Unitxt card `wnli` link:https://www.unitxt.ai/en/latest/catalog/catalog.cards.wnli.html[here].
<3> It also specifies the link:https://www.unitxt.ai/en/latest/catalog/catalog.tasks.classification.multi_class.relation.html[multi_class.relation] task from Unitxt and its default metrics are `f1_micro`, `f1_macro`, and `accuracy`.
<4> This example assumes you have a GPU available. If that's not the case, the `nvidia.com/gpu` fields above can be removed and the LMEval job will run on CPU.

After you apply the example `LMEvalJob` above, you can check its state by using the following command:
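
The command itself falls outside this hunk; an assumed equivalent (not taken from the file) is to read the job's `status.state` field directly:

[source,bash]
----
# Assumed example: print the current state of the evaluation job.
oc get lmevaljobs.trustyai.opendatahub.io evaljob-sample -n test \
  -o jsonpath='{.status.state}'
----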

@@ -201,25 +219,25 @@ Specify the task using the Unitxt recipe format:
* `demosPoolSize` (optional): Size of the fewshot pool.

|`taskList.custom`
| Define one or more custom resources that will be referenced in a task recipe. Custom cards, custom templates, and
custom system prompts are currently supported:

* `cards`: Define custom cards to use, each with a `name` and `value` field:
** `name`: The name of this custom card that will be referenced in the `card.ref` field of a task recipe.
** `value`: A JSON string for a custom Unitxt card which contains the custom dataset.
Use the documentation link:https://www.unitxt.ai/en/latest/docs/adding_dataset.html#adding-to-the-catalog[here]
to compose a custom card, store it as a JSON file, and use the JSON content as the value here. If the dataset
used by the custom card needs an API key from an environment variable or a persistent volume, you have to
set up corresponding resources under the `pod` field. Check the `pod` field below.
* `templates`: Define custom templates to use, each with a `name` and `value` field:
** `name`: The name of this custom template that will be referenced in the `template.ref` field of a task recipe.
** `value`: A JSON string for a custom Unitxt template.
Use the documentation link:https://www.unitxt.ai/en/latest/docs/adding_template.html#adding-a-new-template[here]
to compose a custom template, then use the documentation link:https://www.unitxt.ai/en/latest/docs/saving_and_loading_from_catalog.html[here] to store it as a JSON file and use the JSON content as the value of this field.
* `systemPrompts`: Define custom system prompts to use, each with a `name` and `value` field:
** `name`: The name of this custom system prompt that will be referenced in the `systemPrompt.ref` field of a task recipe.
** `value`: A string for a custom Unitxt system prompt.
The documentation link:https://www.unitxt.ai/en/latest/docs/adding_format.html#formats[here]
provides an overview of the different components that make up a prompt format, including the system prompt.

@@ -325,7 +343,7 @@ status:
* `Cancelled`: The job was cancelled

<3> The `results` field is the direct output of an `lm-evaluation-harness` run. It has been omitted here to avoid repetition. The link:#output[next section] gives an example of the contents of this section. This section will be empty if the job is not completed.
<4> The current `state` of this job. The `reason` for a particular state is given in the `reason` field. Possible values are:

* `New`: The job was just created
* `Scheduled`: The job is scheduled and waiting for available resources to run

@@ -459,7 +477,7 @@ The example shown here is of a Unitxt task called `tr_0` that corresponds to the
<6> `config` is a dictionary that provides key-value pairs corresponding to the evaluation job as a whole. This includes information on the type of model run, the `model_args`, and link:#crd[other settings] used for the run. Many of the values in this dictionary are the defaults defined by `lm-evaluation-harness` for this example.
<7> Given at the very end are three fields describing the start, end, and total evaluation time for this job.

The remaining key-value pairs define a variety of environment settings used for this evaluation job.