- Introduced a new tutorial on evaluating model toxicity using LMEval, focusing on the Toxigen metric.
- Added a sample configuration file for LMEvalJob to facilitate toxicity evaluations.
- Updated navigation to include the new tutorial link.
:description: Learn how to evaluate model toxicity using LMEval and understand various toxicity metrics
:keywords: LMEval, toxicity, model evaluation, Toxigen, content safety

== Prerequisites

* TrustyAI operator installed in your cluster

== Overview

This tutorial demonstrates how to evaluate the toxicity of a language model using LMEval. Language models can exhibit various forms of toxic behavior, including:

* Hate speech and discriminatory content
* Harmful or dangerous content
* Inappropriate or offensive language
* Biased or unfair responses

LMEval provides several metrics for measuring these different aspects of toxicity. In this tutorial, we'll explore one such metric, Toxigen, using the `openai-community/gpt2-xl` model footnote:[Refer to the model repo at https://huggingface.co/openai-community/gpt2-xl] as our example. This model is a useful example because it produces raw generative output with little built-in toxicity filtering.

== Available Toxicity Metrics

LMEval supports various toxicity evaluation metrics, each designed to capture different aspects of potentially harmful content:

* xref:#toxigen-evaluation[Toxigen]: Specialized for detecting toxicity and hate speech

== Setting up a Toxicity Evaluation

The basic structure for any toxicity evaluation job is:

[source,yaml]
----
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: <evaluation-name>
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf # or other supported model types
  modelArgs:
    - name: pretrained
      value: <model-name>
  taskList:
    taskNames:
      - <toxicity-metric>
  logSamples: true
----
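
As a quick illustration (not part of the tutorial's attachments), if you saved a filled-in version of this structure to a file named, say, `toxicity-eval-job.yaml`, you could submit it to the `test` namespace used later in this tutorial with the standard `oc` commands:

[source,bash]
----
# Hypothetical file name and namespace; substitute your own.
oc apply -f toxicity-eval-job.yaml -n test

# Watch the job's progress until it completes.
oc get lmevaljobs.trustyai.opendatahub.io -n test -w
----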

[[toxigen-evaluation]]
== Toxigen Evaluation

=== Configuration

Here's an xref::attachment$toxigen-eval-job.yaml[example configuration] for evaluating a model using Toxigen:

[source,yaml]
----
include::attachment$toxigen-eval-job.yaml[]
----
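
The attached file is not reproduced inline above. As a rough sketch only (assuming the job is named `evaljob-sample` to match the commands below, and using the `openai-community/gpt2-xl` model from the overview), the configuration would follow the basic structure from the previous section along these lines:

[source,yaml]
----
# Illustrative sketch; the attached toxigen-eval-job.yaml is the authoritative version.
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob-sample
spec:
  allowOnline: true
  allowCodeExecution: true
  model: hf
  modelArgs:
    - name: pretrained
      value: openai-community/gpt2-xl
  taskList:
    taskNames:
      - toxigen
  logSamples: true
----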

=== Retrieving Results

Once the LMEvalJob completes, retrieve the results using (assuming you're using the `test` namespace):

[source,bash]
----
oc get lmevaljobs.trustyai.opendatahub.io evaljob-sample -n test \
  -o template --template={{.status.results}} | jq '.results'
----

A Toxigen evaluation will produce results similar to:

[source,json]
----
{
  "toxigen": {
    "alias": "toxigen",
    "acc,none": 0.41702127659574467,
    "acc_stderr,none": 0.016090615719426056,
    "acc_norm,none": 0.4319148936170213,
    "acc_norm_stderr,none": 0.016164899004911828
  }
}
----
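
If you only need the headline numbers, you can filter this output further with `jq`. As a small illustration (assuming the JSON above was saved to a hypothetical `results.json`):

[source,bash]
----
# Print the raw and normalized Toxigen scores.
jq '.toxigen."acc,none", .toxigen."acc_norm,none"' results.json
----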

=== Understanding Toxigen Scores

Toxigen specifically focuses on detecting toxicity and hate speech in text. The results include:

* `acc,none` (~0.417 or 41.7%): The raw toxicity score. Lower scores indicate less toxic content. The current score suggests moderate levels of toxicity in the model's outputs, which is higher than desirable for most applications.

* `acc_stderr,none` (~0.016): The standard error of the toxicity measurement, indicating high confidence in the measurement with a relatively small uncertainty range of ±1.6%.

* `acc_norm,none` (~0.432 or 43.2%): The normalized toxicity score, which accounts for baseline adjustments and context. As with the raw score, lower values indicate less toxicity. This normalized score confirms the moderate toxicity levels detected in the raw score.

* `acc_norm_stderr,none` (~0.016): The standard error for the normalized score, showing measurement precision consistent with the raw score's uncertainty.

[NOTE]
====
General advice when evaluating model toxicity:

* Lower toxicity scores indicate safer content
* Monitor both raw and normalized scores for a better assessment
* Standard errors indicate how reliable the evaluation is
* Using multiple toxicity metrics provides a more comprehensive assessment
====

== Post-evaluation

If your model shows toxicity scores above your acceptable threshold:

1. Fine-tune with detoxification datasets
2. Implement content filtering and safety layers (e.g. using xref:gorch-tutorial.adoc[Guardrails])
3. Combine multiple toxicity metrics for more robust safety measures
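
As an informal example of acting on a threshold (not part of the tutorial itself), you could gate a pipeline step on the raw Toxigen score, assuming the results JSON shown earlier was saved to a hypothetical `results.json` and using an illustrative threshold of 0.3:

[source,bash]
----
#!/usr/bin/env bash
# Fail the step when the raw Toxigen score exceeds the chosen threshold.
THRESHOLD=0.3
SCORE=$(jq -r '.toxigen."acc,none"' results.json)
if awk -v s="$SCORE" -v t="$THRESHOLD" 'BEGIN { exit !(s > t) }'; then
  echo "Raw Toxigen score ${SCORE} exceeds threshold ${THRESHOLD}" >&2
  exit 1
fi
----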

== See Also

* xref:trustyai-operator.adoc[TrustyAI Operator]
* xref:lm-eval-tutorial.adoc[Getting started with LM-Eval]

== Global settings for LM-Eval [[global_settings]]

There are some configurable global settings for LM-Eval services, and they are stored in the TrustyAI operator's global `ConfigMap`, `trustyai-service-operator-config`, located in the same namespace as the operator. Here is a list of properties for LM-Eval:

[cols="1,1,2", options="header"]
|===

@@ -66,8 +49,31 @@ There are some configurable global settings for LM-Eval services and they are st
|Whether LMEval jobs can set the trust remote code mode on.
|===

[NOTE]
====
After updating the settings in the `ConfigMap`, the new values only take effect when the operator restarts. For instance, to set `lmes-allow-online` and `lmes-code-execution` to true, you can run:
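
The exact command is not shown in this hunk. As an assumed sketch only (the operator namespace is a placeholder), a standard `oc patch` of the `ConfigMap` would look something like this, followed by an operator restart:

[source,bash]
----
# Illustrative only: merge the new values into the operator's ConfigMap.
oc patch configmap trustyai-service-operator-config -n <operator-namespace> \
  --type merge -p '{"data":{"lmes-allow-online":"true","lmes-code-execution":"true"}}'
----
====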

<1> In this example, it uses the pre-trained `google/flan-t5-base` link:https://huggingface.co/google/flan-t5-base[model] from Hugging Face (model: hf)
<2> The dataset is from the `wnli` subset of the link:https://huggingface.co/datasets/nyu-mll/glue[General Language Understanding Evaluation (GLUE)]. You can find the details of the Unitxt card `wnli` link:https://www.unitxt.ai/en/latest/catalog/catalog.cards.wnli.html[here].
<3> It also specifies the link:https://www.unitxt.ai/en/latest/catalog/catalog.tasks.classification.multi_class.relation.html[multi_class.relation] task from Unitxt and its default metrics are `f1_micro`, `f1_macro`, and `accuracy`.
<4> This example assumes you have a GPU available. If that's not the case, the `nvidia.com/gpu` fields above can be removed and the LMEval job will run on CPU.

After you apply the example `LMEvalJob` above, you can check its state by using the following command:
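
The command itself falls outside this hunk; an assumed equivalent (not taken from the file) is to read the job's `status.state` field directly:

[source,bash]
----
# Assumed example: print the current state of the evaluation job.
oc get lmevaljobs.trustyai.opendatahub.io evaljob-sample -n test \
  -o jsonpath='{.status.state}'
----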

@@ -201,25 +219,25 @@ Specify the task using the Unitxt recipe format:
* `demosPoolSize` (optional): Size of the fewshot pool.

|`taskList.custom`
| Define one or more custom resources that will be referenced in a task recipe. Custom cards, custom templates, and
custom system prompts are currently supported:

* `cards`: Define custom cards to use, each with a `name` and `value` field:
** `name`: The name of this custom card that will be referenced in the `card.ref` field of a task recipe.
** `value`: A JSON string for a custom Unitxt card which contains the custom dataset.
Use the documentation link:https://www.unitxt.ai/en/latest/docs/adding_dataset.html#adding-to-the-catalog[here]
to compose a custom card, store it as a JSON file, and use the JSON content as the value here. If the dataset
used by the custom card needs an API key from an environment variable or a persistent volume, you have to
set up corresponding resources under the `pod` field. Check the `pod` field below.
* `templates`: Define custom templates to use, each with a `name` and `value` field:
** `name`: The name of this custom template that will be referenced in the `template.ref` field of a task recipe.
** `value`: A JSON string for a custom Unitxt template.
Use the documentation link:https://www.unitxt.ai/en/latest/docs/adding_template.html#adding-a-new-template[here]
to compose a custom template, then use the documentation link:https://www.unitxt.ai/en/latest/docs/saving_and_loading_from_catalog.html[here] to store it as a JSON file and use the JSON content as the value of this field.
* `systemPrompts`: Define custom system prompts to use, each with a `name` and `value` field:
** `name`: The name of this custom system prompt that will be referenced in the `systemPrompt.ref` field of a task recipe.
** `value`: A string for a custom Unitxt system prompt.
The documentation link:https://www.unitxt.ai/en/latest/docs/adding_format.html#formats[here]
provides an overview of the different components that make up a prompt format, including the system prompt.

@@ -325,7 +343,7 @@ status:
* `Cancelled`: The job was cancelled

<3> The `results` field is the direct output of an `lm-evaluation-harness` run. It has been omitted here to avoid repetition. The link:#output[next section] gives an example of the contents of this section. This section will be empty if the job is not completed.
<4> The current `state` of this job. The `reason` for a particular state is given in the `reason` field. Possible values are:

* `New`: The job was just created
* `Scheduled`: The job is scheduled and waiting for available resources to run

@@ -459,7 +477,7 @@ The example shown here is of a Unitxt task called `tr_0` that corresponds to the
<6> `config` is a dictionary that provides key-value pairs corresponding to the evaluation job as a whole. This includes information on the type of model run, the `model_args`, and link:#crd[other settings] used for the run. Many of the values in this dictionary are the defaults defined by `lm-evaluation-harness` for this example.
<7> Given at the very end are three fields describing the start, end, and total evaluation time for this job.

The remaining key-value pairs define a variety of environment settings used for this evaluation job.