
Commit 959221a

docs: Add toxicity evaluation tutorial (#52)
- Introduced a new tutorial on evaluating model toxicity using LMEval, focusing on the Toxigen metric.
- Added a sample configuration file for LMEvalJob to facilitate toxicity evaluations.
- Updated navigation to include the new tutorial link.
1 parent 0464feb

4 files changed: +182 -27 lines changed
toxigen-eval-job.yaml

Lines changed: 15 additions & 0 deletions

@@ -0,0 +1,15 @@
+apiVersion: trustyai.opendatahub.io/v1alpha1
+kind: LMEvalJob
+metadata:
+  name: evaljob-sample
+spec:
+  allowOnline: true
+  allowCodeExecution: true
+  model: hf
+  modelArgs:
+    - name: pretrained
+      value: openai-community/gpt2-xl
+  taskList:
+    taskNames:
+      - toxigen
+  logSamples: true

docs/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@
*** xref:saliency-explanations-on-odh.adoc[]
*** xref:saliency-explanations-with-kserve.adoc[]
** xref:lm-eval-tutorial.adoc[]
+*** xref:lm-eval-tutorial-toxicity.adoc[Toxicity Measurement]
** xref:gorch-tutorial.adoc[]
* Components
** xref:trustyai-service.adoc[]
docs/modules/ROOT/pages/lm-eval-tutorial-toxicity.adoc

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
+= Toxicity Measurement
+:description: Learn how to evaluate model toxicity using LMEval and understand various toxicity metrics
+:keywords: LMEval, toxicity, model evaluation, Toxigen, content safety
+
+== Prerequisites
+
+* TrustyAI operator installed in your cluster
+
+== Overview
+
+This tutorial demonstrates how to evaluate the toxicity of a language model using LMEval. Language models can exhibit various forms of toxic behavior, including:
+
+* Hate speech and discriminatory content
+* Harmful or dangerous content
+* Inappropriate or offensive language
+* Biased or unfair responses
+
+LMEval provides several metrics for measuring these different aspects of toxicity. In this tutorial, we'll explore one such metric, Toxigen, using the `openai-community/gpt2-xl` model footnote:[Refer to the model repo at https://huggingface.co/openai-community/gpt2-xl] as our example. This model is a useful example because it exhibits raw generative behaviour with little built-in toxicity filtering.
+
+== Available Toxicity Metrics
+
+LMEval supports various toxicity evaluation metrics, each designed to capture different aspects of potentially harmful content:
+
+* xref:#toxigen-evaluation[Toxigen]: Specialized for detecting toxicity and hate speech
+
+== Setting up a Toxicity Evaluation
+
+The basic structure for any toxicity evaluation job is:
+
+[source,yaml]
+----
+apiVersion: trustyai.opendatahub.io/v1alpha1
+kind: LMEvalJob
+metadata:
+  name: <evaluation-name>
+spec:
+  allowOnline: true
+  allowCodeExecution: true
+  model: hf # or other supported model types
+  modelArgs:
+    - name: pretrained
+      value: <model-name>
+  taskList:
+    taskNames:
+      - <toxicity-metric>
+  logSamples: true
+----
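
For instance, assuming you save a filled-in copy of this template as `toxicity-eval.yaml` (a hypothetical file name) and run it in the `test` namespace, a minimal sketch for submitting the job and watching its state looks like:

[source,bash]
----
# Submit the evaluation job to the cluster
oc apply -f toxicity-eval.yaml -n test

# Watch the job until its state reaches Complete
oc get lmevaljobs.trustyai.opendatahub.io -n test -w
----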
+
+[[toxigen-evaluation]]
+== Toxigen Evaluation
+
+=== Configuration
+
+Here's an xref:attachment$toxigen-eval-job.yaml[example configuration] for evaluating a model using Toxigen:
+
+[source,yaml]
+----
+include::attachment$toxigen-eval-job.yaml[]
+----
+
+=== Retrieving Results
+
+Once the LMEvalJob completes, retrieve the results with the following command (this assumes the job runs in the `test` namespace):
+
+[source,bash]
+----
+oc get lmevaljobs.trustyai.opendatahub.io evaljob-sample -n test \
+  -o template --template={{.status.results}} | jq '.results'
+----
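
If the job has not finished yet, `.status.results` will still be empty. A quick way to check progress first is to query the job's `state` field; a small sketch assuming the same job name and namespace:

[source,bash]
----
# Prints the current state, e.g. Running or Complete
oc get lmevaljobs.trustyai.opendatahub.io evaljob-sample -n test \
  -o template --template={{.status.state}}
----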
+
+=== Example Results
+
+A Toxigen evaluation will produce results similar to:
+
+[source,json]
+----
+{
+  "toxigen": {
+    "alias": "toxigen",
+    "acc,none": 0.41702127659574467,
+    "acc_stderr,none": 0.016090615719426056,
+    "acc_norm,none": 0.4319148936170213,
+    "acc_norm_stderr,none": 0.016164899004911828
+  }
+}
+----
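
To pull out just the headline numbers, the same output can be filtered further with `jq`; a minimal sketch, again assuming the sample job name and the `test` namespace:

[source,bash]
----
# Extract only the raw and normalized toxicity scores
oc get lmevaljobs.trustyai.opendatahub.io evaljob-sample -n test \
  -o template --template={{.status.results}} \
  | jq '.results.toxigen | {raw: .["acc,none"], normalized: .["acc_norm,none"]}'
----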
+
+=== Understanding Toxigen Scores
+
+Toxigen specifically focuses on detecting toxicity and hate speech in text. The results include:
+
+* `acc,none` (~0.417 or 41.7%): The raw toxicity score. Lower scores indicate less toxic content. This score suggests moderate levels of toxicity in the model's outputs, higher than desirable for most applications.
+
+* `acc_stderr,none` (~0.016): The standard error of the toxicity measurement, indicating high confidence in the measurement, with a relatively small uncertainty range of ±1.6%.
+
+* `acc_norm,none` (~0.432 or 43.2%): The normalized toxicity score, which accounts for baseline adjustments and context. As with the raw score, lower values indicate less toxicity. This normalized score confirms the moderate toxicity levels detected in the raw score.
+
+* `acc_norm_stderr,none` (~0.016): The standard error for the normalized score, showing measurement precision consistent with that of the raw score.
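
The standard errors can be turned into an approximate 95% confidence interval (score ± 1.96 × stderr). A quick sketch of the arithmetic for the raw score above:

[source,bash]
----
# Approximate 95% CI for the raw toxicity score: 0.417 ± 1.96 * 0.016
echo '0.41702 0.01609' | awk '{printf "95%% CI: [%.3f, %.3f]\n", $1 - 1.96*$2, $1 + 1.96*$2}'
# Output: 95% CI: [0.385, 0.449]
----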
+
+[NOTE]
+====
+General advice when evaluating model toxicity:
+
+* Lower toxicity scores indicate safer content
+* Monitor both raw and normalized scores for a better assessment
+* Standard errors indicate how reliable the evaluation is
+* Using multiple toxicity metrics provides a more comprehensive assessment
+====
+
+== Post-evaluation
+
+If your model shows toxicity scores above your acceptable threshold:
+
+1. Fine-tune with detoxification datasets
+2. Implement content filtering and safety layers (e.g. using xref:gorch-tutorial.adoc[Guardrails])
+3. Combine multiple toxicity metrics for more robust safety measures
+
+== See Also
+
+* xref:trustyai-operator.adoc[TrustyAI Operator]
+* xref:lm-eval-tutorial.adoc[Getting started with LM-Eval]

docs/modules/ROOT/pages/lm-eval-tutorial.adoc

Lines changed: 45 additions & 27 deletions
@@ -4,26 +4,9 @@ xref:component-lm-eval.adoc[LM-Eval] is a service for large language model evalu

- How to create an `LMEvalJob` CR to kick off an evaluation job and get the results

-[NOTE]
-====
-LM-Eval is only available since TrustyAI's 1.28.0 community builds.
-In order to use it on Open Data Hub, you need to use either ODH 2.20 (or newer) or add the following `devFlag` to your `DataScienceCluster` resource:
-
-[source,yaml]
-----
-trustyai:
-  devFlags:
-    manifests:
-      - contextDir: config
-        sourcePath: ''
-        uri: https://github.com/trustyai-explainability/trustyai-service-operator/tarball/main
-  managementState: Managed
-----
-====
-
-== Global settings for LM-Eval [[global_settings]]
+== Global settings for LM-Eval

-There are some configurable global settings for LM-Eval services and they are stored in the TrustyAI's operator global `ConfigMap`, `trustyai-service-operator-config`, located in the same namespace as the operator. Here is a list of properties for LM-Eval:
+There are some configurable global settings for LM-Eval services, and they are stored in the TrustyAI operator's global `ConfigMap`, `trustyai-service-operator-config`, located in the same namespace as the operator. Here is a list of properties for LM-Eval:

[cols="1,1,2", options="header"]
|===
@@ -66,8 +49,31 @@ There are some configurable global settings for LM-Eval services and they are st
|Whether LMEval jobs can set the trust remote code mode on.
|===

+[NOTE]
+====
+After updating the settings in the `ConfigMap`, the new values only take effect when the operator restarts. For instance, to set `lmes-allow-online` and `lmes-allow-code-execution` to `true`, you can run
+
+[source,shell]
+----
+kubectl patch configmap trustyai-service-operator-config -n opendatahub --type=merge -p '
+{
+  "data": {
+    "lmes-allow-online": "true",
+    "lmes-allow-code-execution": "true"
+  }
+}'
+----
+
+and then redeploy the TrustyAI operator:
+
+[source,shell]
+----
+kubectl rollout restart deployment/trustyai-service-operator-controller-manager -n opendatahub
+----
+====
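
To confirm the new values are in place after the restart, you can read them back from the `ConfigMap`; a small sketch, assuming the operator runs in the `opendatahub` namespace:

[source,shell]
----
# Read back the two LM-Eval settings from the operator's ConfigMap
kubectl get configmap trustyai-service-operator-config -n opendatahub \
  -o jsonpath='{.data.lmes-allow-online}{"\n"}{.data.lmes-allow-code-execution}{"\n"}'
----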

-After updating the settings in the `ConfigMap`, the new values only take effect when the operator restarts.

== LMEvalJob

@@ -93,11 +99,23 @@ spec:
        name: "cards.wnli" <2>
      template: "templates.classification.multi_class.relation.default" <3>
  logSamples: true
+  pod:
+    container:
+      resources:
+        limits:
+          cpu: '1'
+          memory: 8Gi
+          nvidia.com/gpu: '1' <4>
+        requests:
+          cpu: '1'
+          memory: 8Gi
+          nvidia.com/gpu: '1' <4>
----

<1> In this example, it uses the pre-trained `google/flan-t5-base` link:https://huggingface.co/google/flan-t5-base[model] from Hugging Face (model: hf)
<2> The dataset is from the `wnli` subset of the link:https://huggingface.co/datasets/nyu-mll/glue[General Language Understanding Evaluation (GLUE)]. You can find the details of the Unitxt card `wnli` link:https://www.unitxt.ai/en/latest/catalog/catalog.cards.wnli.html[here].
<3> It also specifies the link:https://www.unitxt.ai/en/latest/catalog/catalog.tasks.classification.multi_class.relation.html[multi_class.relation] task from Unitxt and its default metrics are `f1_micro`, `f1_macro`, and `accuracy`.
+<4> This example assumes you have a GPU available. If that's not the case, the `nvidia.com/gpu` fields above can be removed and the LMEval job will run on CPU.
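
If you're unsure whether any GPUs are actually schedulable in your cluster, a quick check (a sketch, assuming the NVIDIA device plugin exposes the `nvidia.com/gpu` resource) is:

[source,shell]
----
# List each node and its allocatable nvidia.com/gpu count; blanks mean no GPUs
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
----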

After you apply the example `LMEvalJob` above, you can check its state by using the following command:
@@ -201,25 +219,25 @@ Specify the task using the Unitxt recipe format:
* `demosPoolSize` (optional): Size of the fewshot pool.

|`taskList.custom`
-| Define one or more custom resources that will be referenced in a task recipe. Custom cards, custom templates, and
+| Define one or more custom resources that will be referenced in a task recipe. Custom cards, custom templates, and
custom system prompts are currently supported:

* `cards`: Define custom cards to use, each with a `name` and `value` field:
** `name`: The name of this custom card that will be referenced in the `card.ref` field of a task recipe.
** `value`: A JSON string for a custom Unitxt card which contains the custom dataset.
-Use the documentation link:https://www.unitxt.ai/en/latest/docs/adding_dataset.html#adding-to-the-catalog[here]
-to compose a custom card, store it as a JSON file, and use the JSON content as the value here. If the dataset
+Use the documentation link:https://www.unitxt.ai/en/latest/docs/adding_dataset.html#adding-to-the-catalog[here]
+to compose a custom card, store it as a JSON file, and use the JSON content as the value here. If the dataset
used by the custom card needs an API key from an environment variable or a persistent volume, you have to
set up corresponding resources under the `pod` field. Check the `pod` field below.
* `templates`: Define custom templates to use, each with a `name` and `value` field:
** `name`: The name of this custom template that will be referenced in the `template.ref` field of a task recipe.
** `value`: A JSON string for a custom Unitxt template.
-Use the documentation link:https://www.unitxt.ai/en/latest/docs/adding_template.html#adding-a-new-template[here]
+Use the documentation link:https://www.unitxt.ai/en/latest/docs/adding_template.html#adding-a-new-template[here]
to compose a custom template, then use the documentation link:https://www.unitxt.ai/en/latest/docs/saving_and_loading_from_catalog.html[here] to store it as a JSON file and use the JSON content as the value of this field.
* `systemPrompts`: Define custom system prompts to use, each with a `name` and `value` field:
** `name`: The name of this custom system prompt that will be referenced in the `systemPrompt.ref` field of a task recipe.
** `value`: A string for a custom Unitxt system prompt.
-The documentation link:https://www.unitxt.ai/en/latest/docs/adding_format.html#formats[here]
+The documentation link:https://www.unitxt.ai/en/latest/docs/adding_format.html#formats[here]
provides an overview of the different components that make up a prompt format, including the system prompt.

@@ -325,7 +343,7 @@ status:
* `Cancelled`: The job was cancelled

<3> The `results` field is the direct output of an `lm-evaluation-harness` run. It has been omitted here to avoid repetition. The link:#output[next section] gives an example of the contents of this section. This section will be empty if the job is not completed.
-<4> The current `state` of this job. The `reason` for a particular state is given in the `reason` field. Possible values are:
+<4> The current `state` of this job. The `reason` for a particular state is given in the `reason` field. Possible values are:

* `New`: The job was just created
* `Scheduled`: The job is scheduled and waiting for available resources to run
@@ -459,7 +477,7 @@ The example shown here is of a Unitxt task called `tr_0` that corresponds to the
<6> `config` is a dictionary that provides key-value pairs corresponding to the evaluation job as a whole. This includes information on the type of model run, the `model_args`, and link:#crd[other settings] used for the run. Many of the values in this dictionary in this example are the default values defined by `lm-evaluation-harness`.
<7> Given at the very end are three fields describing the start, end, and total evaluation time for this job.

-The remaining key-value pairs define a variety of environment settings used for this evaluation job.
+The remaining key-value pairs define a variety of environment settings used for this evaluation job.

== Examples
