Commit 5bbed56

GenAiLab Updated Instructions (#1651)
* GenAiLab Updated Instructions

  Combined instructions from across multiple releases and ensured the most accurate information was available, removed information that was no longer relevant, and ensured the major feature releases found in releases 6.5.0-6.8.0 have been distributed throughout the help document.

* updates of the Gen AI Lab documentation

Co-authored-by: diatrambitas <JSL.Git2018>
1 parent a270f1b commit 5bbed56

5 files changed: +173 -98 lines changed

docs/en/alab/annotation_labs_releases/release_notes_6_2_0.md

Lines changed: 1 addition & 1 deletion
@@ -362,7 +362,7 @@ Users can confidently upload and import files, knowing that the system will enfo
362362

363363

364364
### There should be a way to "batch" clear predicted labels in a Section after pre-annotation
365-
Version 6.2 of Generative AI Lab introduces a significant enhancement improving the application's security and robustness by restricting the types of files that can be uploaded or imported. This change ensures that only supported and safe file types are processed, providing a more secure and efficient user experience, maintaining the platform's integrity and reliability while enhancing its security.
365+
Version 6.2 of Generative AI Lab introduces a tool to allow annotators to remove large sections of pre-annotated labels in bulk instead of removing them individually.
366366

367367
**Key Features of This Improvement:**
368368

docs/en/alab/de_identify.md

Lines changed: 49 additions & 74 deletions
@@ -6,7 +6,7 @@ seotitle: Generative AI Lab | John Snow Labs
66
title: De-Identification
77
permalink: /docs/en/alab/de_identify
88
key: docs-training
9-
modify_date: "2024-08-26"
9+
modify_date: "2024-12-03"
1010
use_language_switcher: "Python-Scala"
1111
show_nav: true
1212
sidebar:
@@ -15,19 +15,12 @@ sidebar:
1515

1616
<div class="h3-box" markdown="1">
1717

18-
## Introducing Support for De-Identification in Generative AI Lab 6.4
19-
We are happy to announce the release of Generative AI Lab 6.4, bringing exciting new features and enhancements. Leading this release is the support for de-identification projects, which enables users to anonymize documents containing sensitive data, such as PII (Personally Identifiable Information) and PHI (Protected Health Information). This ensures robust data privacy and compliance with privacy regulations while maintaining the utility of the data for further analysis and processing.
20-
21-
Additionally, version 6.4 enhances collaboration and quality control in the annotation process by allowing annotators to view completions submitted by reviewers. Annotators can now view and clone reviewed submissions, make corrections, or add comments directly on the annotated chunks, providing clear communication and improving overall annotation quality. The new release also simplifies the identification of differences between two completions by automatically highlighting discrepancies, streamlining the validation process.
22-
23-
Alongside these major updates, this release includes numerous improvements and bug fixes, making Generative AI Lab more efficient and user-friendly than ever.
24-
25-
</div><div class="h3-box" markdown="1">
26-
2718
## Support for De-identification
28-
Version 6.4 of the Generative AI Lab introduces a new de-identification feature, enabling users to anonymize documents containing sensitive information such as PII (Personally Identifiable Information) and PHI (Protected Health Information). This functionality is intended to protect data privacy and ensure compliance with privacy regulations while preserving the data’s usefulness for subsequent analysis and processing.
19+
Version 6.4 of Generative AI Lab introduces a new de-identification feature, enabling users to anonymize documents containing sensitive information such as PII (Personally Identifiable Information) and PHI (Protected Health Information). This functionality is intended to protect data privacy and ensure compliance with privacy regulations while preserving the data’s usefulness for subsequent analysis and processing.
20+
21+
Version 6.7 of Generative AI Lab extends the original feature with custom de-identification methods for each entity label and support for the newest John Snow Labs De-identification Pipeline.
2922

30-
**De-identification Projects:** When creating a new project in the Generative AI Lab, users can mark it as De-Identification specific. These projects allow the use of manually trained or pre-trained text-based NER models, together with prompts, rules, and custom labels created by the user for identifying sensitive data inside of tasks. Once the sensitive data is identified (either automatically or manually) and validated by human users, it can be exported for further processing.
23+
**De-identification Projects:** When creating a new project in Generative AI Lab, users can mark it as de-identification specific. These projects allow the use of manually trained or pre-trained text-based NER models, together with prompts, rules, and custom user-created labels, to identify sensitive data in tasks. Once the sensitive data is identified (either automatically or manually) and validated by human users, it can be exported for further processing.
3124
When creating de-identification projects, make sure you target only sensitive entities in your project configuration, and avoid annotating data you need for downstream processing, because all annotated entities are removed when the project tasks are exported as de-identified documents. The best practice is to reuse de-identification-specific models combined with custom prompts/rules.
3225

3326
**Exporting De-identified Documents:** The tasks of your project with PII/PHI labeled entities can be exported as de-identified documents. During the export process, labeled entities will be replaced by the label names, or special characters (such as "*"), or obfuscated and replaced with fake data. This ensures that sensitive information is removed and not available for downstream analysis.
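
The replacement options described above can be illustrated with a minimal Python sketch. This is not Generative AI Lab's implementation: the `Entity` class, the `deidentify` function, the strategy names (`label`, `mask`, `obfuscate`), and the `<LABEL>` placeholder format are all hypothetical and serve only to show the idea of swapping labeled character spans for a label name, a run of `*`, or fake data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entity:
    """A labeled span (hypothetical structure, not the product's export format)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def deidentify(text: str, entities: List[Entity], strategy: str = "label",
               fake_values: Optional[Dict[str, str]] = None) -> str:
    """Replace every labeled span according to the chosen strategy."""
    fake_values = fake_values or {}
    out = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        original = out[ent.start:ent.end]
        if strategy == "label":
            replacement = f"<{ent.label}>"
        elif strategy == "mask":
            replacement = "*" * len(original)
        elif strategy == "obfuscate":
            # Fall back to the label name when no fake value is supplied.
            replacement = fake_values.get(ent.label, f"<{ent.label}>")
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        out = out[:ent.start] + replacement + out[ent.end:]
    return out


text = "John Smith visited on 2024-01-05."
entities = [Entity(0, 10, "NAME"), Entity(22, 32, "DATE")]
print(deidentify(text, entities, "label"))  # <NAME> visited on <DATE>.
print(deidentify(text, entities, "mask"))
print(deidentify(text, entities, "obfuscate", {"NAME": "Jane Doe"}))
```

Processing spans from the end of the text backwards avoids recomputing offsets when a replacement is longer or shorter than the original span.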
@@ -47,110 +40,92 @@ Generative AI Lab supports four kinds of de-identification:
4740

4841
### Working with de-identification projects
4942

50-
**Step 1.** When creating a new project, after defining the project name and general settings, check the de-identification option at the bottom of the Project setup page, and select the type of anonymization you prefer.
43+
To create a De-identification project, in the first step of the Project Configuration wizard, select the `De-identification` template available under the `TEXT` tab.
5144

52-
![GenaiImage](/assets/images/annotation_lab/6.4.0/1.png)
45+
</div><div class="h3-box" markdown="1">
5346

54-
**Step 2.** Configure your project to reuse sensitive labels from existing NER Models, Rules, Prompts. It is also possible to create custom labels that can be used to manually annotate the entities you want to anonymize in your documents.
47+
### Creating a De-identification Project
48+
Users can use the de-identification feature if a valid license is available in the application:
49+
1. **Create a New Project**:
50+
During the project configuration, select **De-identification** as the project type.
51+
2. **Automatic Pipeline Download**:
52+
A default de-identification pipeline (`clinical_deidentification`) will automatically download if not previously available or it will use the default de-identification project template. All the downloaded pipelines are available on the **Pipeline** page.
53+
54+
![670image](/assets/images/annotation_lab/6.7.0/1.png)
5555

56-
When selecting pre-annotation resources for your project, ensure that no critical downstream data is inadvertently identified and removed. For instance, if you pre-annotate documents with models, rules, or prompts that identify diseases, those labels will be anonymized upon export, rendering them unavailable to document consumers.
56+
</div><div class="h3-box" markdown="1">
5757

58-
To mitigate this, employ pre-trained or custom de-identification models and augment them with rules and prompts tailored to your specific use cases (e.g., unique identifiers present in your documents). You can also selectively include specific labels from each model in your project configuration. For example, if age information is essential for your consumers, you can exclude this label from the project configuration to retain the data in your document.
58+
### New Pipeline Tab and Customization
59+
In the **Reuse Resource** page, a new **Pipelines Tab** is now available for de-identification projects. Here, all the downloaded de-identification pipelines are listed. Users can also use and apply pre-trained and trained models, rules, and zero-shot prompts.
5960

60-
![GenaiImage](/assets/images/annotation_lab/6.4.0/2.png)
6161

62-
**Step 3.** Pre-annotate your documents, then have your team review them for any overlooked sensitive data. Once your project is set up and tasks are imported, use the pre-annotation feature to automatically identify sensitive information.
63-
Incorporate a review process where your team checks the pre-annotations using the standard annotation workflow, making manual corrections or annotations to any sensitive segments as necessary. Ensure that all sensitive information is accurately labeled for effective de-identification.
62+
![670image](/assets/images/annotation_lab/6.7.0/2.png)
6463

65-
![GenaiImage](/assets/images/annotation_lab/6.4.0/3.png)
6664

67-
**Step 4.** Export De-identified Documents. After completing the labeling process, proceed to export the de-identified documents. Ensure the "Export with De-identification" option is selected on the export page to generate de-identified documents.
65+
In the **Customize Labels** page, users can first select the overall de-identification strategy to use. It is also possible to specify and use entity-level configurations.
6866

69-
During the export process, de-identification is executed based on the type of anonymization selected during project setup. This de-identification option can be updated at any time if necessary.
67+
![670image](/assets/images/annotation_lab/6.7.0/3.png)
7068

71-
![GenaiImage](/assets/images/annotation_lab/6.4.0/4.png)
69+
Users can also upload custom obfuscation configurations in JSON format via the Customize Labels page, enabling the seamless reuse of obfuscation rules across multiple projects.
7270

73-
**Step 5.** Import the de-identified tasks in a new project for further processing. These tasks, once exported, can be re-imported into any text-based project in case you need to extract additional data or in case you want to use them for model training/tuning.
71+
![670image](/assets/images/annotation_lab/6.7.0/4.gif)
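
To make the idea of a reusable JSON configuration with entity-level overrides more concrete, here is a small Python sketch. The JSON schema shown (a `default` strategy plus a `labels` map) is purely hypothetical and is not the file format Generative AI Lab expects; the strategy names and the `load_config` and `strategy_for` helpers are likewise assumptions made for illustration.

```python
import json

# Hypothetical configuration: one project-wide default plus per-label overrides.
HYPOTHETICAL_CONFIG = """
{
  "default": "mask",
  "labels": {
    "NAME": {"strategy": "obfuscate", "fake_values": ["Jane Doe", "Alex Roe"]},
    "DATE": {"strategy": "label"}
  }
}
"""

ALLOWED_STRATEGIES = {"mask", "label", "obfuscate"}


def load_config(raw: str) -> dict:
    """Parse the JSON text and reject unknown strategy names early."""
    config = json.loads(raw)
    default = config.get("default", "mask")
    if default not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown default strategy: {default}")
    labels = config.get("labels", {})
    for label, rule in labels.items():
        if rule.get("strategy") not in ALLOWED_STRATEGIES:
            raise ValueError(f"unknown strategy for label {label}")
    return {"default": default, "labels": labels}


def strategy_for(config: dict, label: str) -> str:
    """Return the label-specific strategy, or the project default if none is set."""
    rule = config["labels"].get(label)
    return rule["strategy"] if rule else config["default"]


config = load_config(HYPOTHETICAL_CONFIG)
print(strategy_for(config, "NAME"))  # label-specific override
print(strategy_for(config, "AGE"))   # falls back to the default
```

Keeping the rules in a standalone file is what makes them portable: the same JSON can be validated once and uploaded to several projects.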
7472

75-
![GenaiImage](/assets/images/annotation_lab/6.4.0/5.png)
73+
</div><div class="h3-box" markdown="1">
7674

77-
> **_HOW TO:_** De-identification projects can be easily identified without opening them. A small de-identification icon is displayed in the bottom left corner of the project card, clearly indicating the project's status.
75+
### De-identification Process
7876

77+
The de-identification process is similar to the existing pre-annotation workflow:
7978

80-
> **_LIMITATION:_** Projects must be designated as de-identification projects during their initial creation. It is not possible to convert existing projects or newly created non-de-identification projects into de-identification projects.
79+
1. **Import Tasks**
8180

82-
</div><div class="h3-box" markdown="1">
8381

84-
### Export of De-identified tasks
85-
**Completion Submission:** Pre-annotations alone are not sufficient for exporting de-identified data. Only starred completions are considered during the export of de-identified tasks. This means that each task intended for de-identified export must be validated by a human user, with at least one completion marked with a star by an annotator, reviewer, or manager.
82+
When tasks are imported, the `NonDeidentified` tag is automatically added to them. The tag helps users see which tasks have been de-identified and which are yet to be de-identified.
8683

87-
**Multiple Submissions:** In instances where multiple submissions exist from various annotators, the de-identification process will prioritize the starred completion from the highest priority user as specified on the Teams page. This ensures that de-identification is based on the most relevant and prioritized annotations.
84+
![670image](/assets/images/annotation_lab/6.7.0/5.gif)
8885

89-
This new de-identification feature significantly enhances data privacy by anonymizing sensitive document information. We are confident that this feature will empower users to handle sensitive data responsibly while maintaining the integrity and usability of their datasets.
86+
2. **Pre-annotate/De-identify**
9087

91-
</div><div class="h3-box" markdown="1">
9288

93-
## Support for De-identification Pipelines
94-
Version 6.7.0 updates the existing de-identification feature, which has been significantly expanded to give more control over how de-identification is applied, how different entities are treated, and how to integrate pre-trained de-identification pipelines, models, rules, and zero-shot prompts to help identify and anonymize sensitive data.
89+
Click the **De-identification (pre-annotate)** button to deploy the de-identification pipeline and pre-annotate your tasks. During the pre-annotation stage, there is a status indicator (the colored circle) next to each task that changes to either green, red, or grey, just like the pre-annotation status.
9590

96-
De-identification has now moved from the Project Details page to the Content Type page during Project Configuration, where it is a separate project type.
91+
![670image](/assets/images/annotation_lab/6.7.0/6.gif)
9792

98-
</div><div class="h3-box" markdown="1">
93+
3. **Labeling Page**
9994

100-
### Creating a De-identification Project:
101-
Users can use the de-identification feature if a valid license is available in the application:
102-
1. **Create a New Project**:
103-
During the project configuration, select **De-identification** as the project type.
104-
2. **Automatic Pipeline Download**:
105-
A default de-identification pipeline (`clinical_deidentification`) will automatically download if not previously available or it will use the default de-identification project template. All the downloaded pipelines are available on the **Pipeline** page.
106-
107-
![670image](/assets/images/annotation_lab/6.7.0/1.png)
10895

109-
### New Pipeline Tab and Customization:
110-
In the **Reuse Resource** page, a new **Pipelines Tab** is now available for de-identification projects. Here, all the downloaded de-identification pipelines are listed. Users can also use and apply pre-trained and trained models, rules, and zero-shot prompts.
96+
On the labeling page, users can either make corrections or accept the predictions made by the pipeline.
11197

112-
![670image](/assets/images/annotation_lab/6.7.0/2.png)
98+
![670image](/assets/images/annotation_lab/6.7.0/7.gif)
11399

114-
In the **Customize Labels** page, users can configure the type of de-identification. Apart from all the deidentification types that are already supported, in version 6.7.0, users can even configure **different de-identification types for different labels** as well.
100+
4. **Re-run De-identification**
115101

116-
![670image](/assets/images/annotation_lab/6.7.0/3.png)
117102

118-
Additionally, users can upload custom obfuscation files in JSON format on the Customize Labels page.
103+
After saving and submitting the tasks, users can click the de-identify button again to run the de-identification process. This changes the content of the tasks by applying the specified de-identification configurations to all automatic and manual annotations. The results can then be viewed on the labeling page: click the **De-identification View** button (located next to the Compare Completion button) to compare the de-identified tasks with the original version. All de-identified completions show **(De-identified)** next to the completion ID.
119104

120-
![670image](/assets/images/annotation_lab/6.7.0/4.gif)
105+
![670image](/assets/images/annotation_lab/6.7.0/8.gif)
121106

122-
</div><div class="h3-box" markdown="1">
107+
</div>
123108

124-
### De-identification Process:
125-
The de-identification process remains similar to the existing pre-annotation workflow:
109+
### Exporting De-identified Tasks
126110

127-
1. **Import Tasks**:
128-
Initially, tasks are imported, and the `NonDeidentified` tag is automatically added to the tasks. It helps users to know which tasks have been deidentified and which are yet to be de-identified.
129111

130-
![670image](/assets/images/annotation_lab/6.7.0/5.gif)
112+
Only de-identified completions submitted as **ground truth** are exported. Also, if a task has multiple ground truths from different users, the completion from the user with the **highest priority** will be exported.
131113

132-
3. **Pre-annotate/De-identify**:
133-
Click the **De-identification (pre-annotate)** button to deploy the de-identification pipeline and pre-annotate and de-identify tasks. Once the task is pre-annotated and de-identified, the de-identification status changes to either green, red, or grey, just like pre-annotation status.
114+
![670image](/assets/images/annotation_lab/6.7.0/9.gif)
134115

135-
![670image](/assets/images/annotation_lab/6.7.0/6.gif)
116+
These updates are built on top of the current structure, ensuring ease of use and a smooth transition without disrupting productivity.
136117

137-
5. **Labeling Page**:
138-
On the labeling page, users can either make corrections or accept the predictions made by the pipeline.
118+
> **_HOW TO:_** De-identification projects can be easily identified without opening them. A small de-identification icon is displayed in the bottom left corner of the project card, clearly indicating the project's status.
139119
140-
![670image](/assets/images/annotation_lab/6.7.0/7.gif)
141120

142-
7. **Re-run De-identification**:
143-
After saving or submitting the tasks, users can click the de-identify button again to run the process on either manually annotated completions or all completions and can view the de-identification in real-time from the labeling page. Users can click the **De-identification View** button (located next to the Compare Completion button), to view the de-identified tasks in real-time. All de-identified completions will show **(De-identified)** next to the completion ID.
121+
> **_LIMITATION:_** Projects must be designated as de-identification projects during their initial creation. It is not possible to convert existing projects or newly created non-de-identification projects into de-identification projects.
144122
145-
![670image](/assets/images/annotation_lab/6.7.0/8.gif)
146123

147-
</div><div class="h3-box" markdown="1">
124+
### Export of De-identified tasks
148125

149-
### Exporting De-identified Tasks:
150-
Only de-identified completions submitted as **ground truth** are exported. Also, if a task has multiple ground truths from different users, the completion from the user with the **highest priority** will be exported.
151126

152-
![670image](/assets/images/annotation_lab/6.7.0/9.gif)
127+
**Submitted Completions:** Pre-annotations alone are not sufficient for exporting de-identified data. Only starred completions are considered during the export of de-identified tasks. This means that each task intended for de-identified export must be validated by a human user, with at least one completion marked with a star by an annotator, reviewer, or manager.
153128

154-
These updates are built on top of the current structure, ensuring ease of use and a smooth transition without disrupting productivity.
129+
**Multiple Completions:** In cases where multiple submissions exist from various annotators, the de-identification process will prioritize the starred completion from the highest priority user as specified on the Teams page. This ensures that de-identification is based on the most relevant and prioritized annotations.
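
The selection rule described above can be sketched in a few lines of Python. This is an illustration only: the `Completion` class, the `pick_for_export` function, and the convention that a lower number means a higher priority are assumptions, not the application's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Completion:
    """A submitted completion (hypothetical structure)."""
    completion_id: int
    user: str
    starred: bool


def pick_for_export(completions: List[Completion],
                    user_priority: Dict[str, int]) -> Optional[Completion]:
    """Return the starred completion from the highest-priority user, if any.

    Assumption: a lower number in user_priority means a higher priority,
    and users missing from the mapping rank last.
    """
    starred = [c for c in completions if c.starred]
    if not starred:
        # No human-validated completion, so the task is not exported.
        return None
    return min(starred, key=lambda c: user_priority.get(c.user, float("inf")))


completions = [
    Completion(1, "annotator_a", starred=True),
    Completion(2, "reviewer_b", starred=True),
    Completion(3, "annotator_c", starred=False),
]
priority = {"reviewer_b": 1, "annotator_a": 2, "annotator_c": 3}
chosen = pick_for_export(completions, priority)
print(chosen.completion_id if chosen else "nothing to export")
```

Returning `None` when no completion is starred mirrors the requirement that pre-annotations alone are not enough for export.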
155130

156-
</div>
131+
This new de-identification feature significantly enhances data privacy by anonymizing sensitive document information. We are confident that this feature will empower users to handle sensitive data responsibly while maintaining the integrity and usability of their datasets.
