Commit 54cf06a

Models hub internal (#1694)
1 parent 696f9e3 commit 54cf06a

4 files changed, 749 additions and 0 deletions
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
---
layout: model
title: Clinical Deidentification Pipeline (Document Wise - Benchmark)
author: John Snow Labs
name: clinical_deidentification_docwise_benchmark
date: 2025-01-16
tags: [licensed, en, deidentification, deid, pipeline, clinical, docwise, benchmark]
task: [De-identification, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.5.1
spark_version: 3.4
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline de-identifies protected health information (PHI) in medical texts. It returns the text both masked with entity labels and obfuscated with surrogate values. The pipeline covers the `NAME`, `IDNUM`, `CONTACT`, `LOCATION`, `AGE`, and `DATE` entities.
**This pipeline is prepared for benchmarking with cloud providers.**

## Predicted Entities

`NAME`, `IDNUM`, `CONTACT`, `LOCATION`, `AGE`, `DATE`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_benchmark_en_5.5.1_3.4_1737046494582.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_benchmark_en_5.5.1_3.4_1737046494582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from sparknlp.pretrained import PretrainedPipeline

# Download and load the pretrained de-identification pipeline.
deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_benchmark", "en", "clinical/models")

# fullAnnotate returns a list with one result dict per input text.
deid_result = deid_pipeline.fullAnnotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")

# Print the masked text, then the obfuscated text, for the single input.
print(''.join([i.result for i in deid_result[0]['mask_entity']]))
print(''.join([i.result for i in deid_result[0]['obfuscated']]))
```

{:.jsl-block}
```python
from johnsnowlabs import nlp

# Download and load the pretrained de-identification pipeline.
deid_pipeline = nlp.PretrainedPipeline("clinical_deidentification_docwise_benchmark", "en", "clinical/models")

# fullAnnotate returns a list with one result dict per input text.
deid_result = deid_pipeline.fullAnnotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")

# Print the masked text, then the obfuscated text, for the single input.
print(''.join([i.result for i in deid_result[0]['mask_entity']]))
print(''.join([i.result for i in deid_result[0]['obfuscated']]))
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Download and load the pretrained de-identification pipeline.
val deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_benchmark", "en", "clinical/models")

// fullAnnotate on a single string returns a map from output column to annotations.
val deid_result = deid_pipeline.fullAnnotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")

// Print the masked text, then the obfuscated text.
println(deid_result("mask_entity").map(_.result).mkString(""))
println(deid_result("obfuscated").map(_.result).mkString(""))
```
</div>

## Results

```bash
Masked with entity labels
------------------------------
Name : <NAME>, Record date: <DATE>, # <IDNUM>.
Dr. <NAME>, ID: <IDNUM>, IP <IDNUM>.
He is a <AGE> male was admitted to the <LOCATION> for cystectomy on <DATE>.
Patient's VIN : <IDNUM>, SSN <IDNUM>, Driver's license <IDNUM>.
Phone <CONTACT>, <LOCATION>, <LOCATION>, E-MAIL: <CONTACT>.


Obfuscated
------------------------------
Name : Lawrnce Pretzel, Record date: 2093-01-24, # 486302.
Dr. Carolina Cid, ID: 5875955427, IP 089.708.009.79.
He is a 65-year-old male was admitted to the South Benjaminside for cystectomy on 01/24/93.
Patient's VIN : 0OZUO50MYTQ018397, SSN #888-11-3333, Driver's license YZ:Z881100W.
Phone (546) 920-7669, Traceyburgh, 1441 Eastlake Avenue, E-MAIL: UIEZD@OIMEH.KGI.

```

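The difference between the masked and the obfuscated output can be sketched with a toy example. This is a minimal illustration in plain Python, not the pipeline's implementation: the character spans, the `replace_spans` helper, and the surrogate values are all hypothetical and hand-written for this sketch.

```python
# Toy illustration only: spans and surrogates are hand-written assumptions,
# not output of the clinical_deidentification_docwise_benchmark pipeline.
text = "Dr. John Green, ID: 1231511863"

# Hypothetical entity spans as (start, end, label), end exclusive.
spans = [(4, 14, "NAME"), (20, 30, "IDNUM")]

# Hypothetical surrogate values used for obfuscation.
surrogates = {"NAME": "Jane Doe", "IDNUM": "0000000000"}


def replace_spans(text, spans, render):
    """Rebuild text, swapping each span for render(label)."""
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(render(label))       # replacement for the span
        cursor = end
    pieces.append(text[cursor:])           # untouched tail
    return "".join(pieces)


# Masking keeps the entity label; obfuscation substitutes a fake value.
masked = replace_spans(text, spans, lambda label: f"<{label}>")
obfuscated = replace_spans(text, spans, lambda label: surrogates[label])

print(masked)
print(obfuscated)
```

Masking keeps the entity type visible to the reader, while obfuscation keeps the text looking natural at the cost of hiding which parts were replaced.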
{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|clinical_deidentification_docwise_benchmark|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.5.1+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|2.5 GB|

## Included Models

- DocumentAssembler
- InternalDocumentSplitter
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- MedicalNerModel
- MedicalNerModel
- NerConverterInternalModel
- NerConverterInternalModel
- NerConverterInternalModel
- PretrainedZeroShotNER
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ContextualEntityRuler
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- ContextualParserModel
- RegexMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- ChunkMergeModel
- ChunkMergeModel
- LightDeIdentification
- LightDeIdentification
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
---
layout: model
title: Clinical Deidentification Pipeline (Document Wise - Benchmark)
author: John Snow Labs
name: clinical_deidentification_docwise_benchmark
date: 2025-01-16
tags: [licensed, en, deidentification, deid, pipeline, clinical, docwise, benchmark]
task: [De-identification, Pipeline Healthcare]
language: en
edition: Healthcare NLP 5.5.1
spark_version: 3.2
supported: true
annotator: PipelineModel
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This pipeline de-identifies protected health information (PHI) in medical texts. It returns the text both masked with entity labels and obfuscated with surrogate values. The pipeline covers the `NAME`, `IDNUM`, `CONTACT`, `LOCATION`, `AGE`, and `DATE` entities.
**This pipeline is prepared for benchmarking with cloud providers.**

## Predicted Entities

`NAME`, `IDNUM`, `CONTACT`, `LOCATION`, `AGE`, `DATE`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_benchmark_en_5.5.1_3.2_1737048679338.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/clinical_deidentification_docwise_benchmark_en_5.5.1_3.2_1737048679338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
from sparknlp.pretrained import PretrainedPipeline

# Download and load the pretrained de-identification pipeline.
deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_benchmark", "en", "clinical/models")

# fullAnnotate returns a list with one result dict per input text.
deid_result = deid_pipeline.fullAnnotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")

# Print the masked text, then the obfuscated text, for the single input.
print(''.join([i.result for i in deid_result[0]['mask_entity']]))
print(''.join([i.result for i in deid_result[0]['obfuscated']]))
```

{:.jsl-block}
```python
from johnsnowlabs import nlp

# Download and load the pretrained de-identification pipeline.
deid_pipeline = nlp.PretrainedPipeline("clinical_deidentification_docwise_benchmark", "en", "clinical/models")

# fullAnnotate returns a list with one result dict per input text.
deid_result = deid_pipeline.fullAnnotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")

# Print the masked text, then the obfuscated text, for the single input.
print(''.join([i.result for i in deid_result[0]['mask_entity']]))
print(''.join([i.result for i in deid_result[0]['obfuscated']]))
```
```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Download and load the pretrained de-identification pipeline.
val deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_benchmark", "en", "clinical/models")

// fullAnnotate on a single string returns a map from output column to annotations.
val deid_result = deid_pipeline.fullAnnotate("""Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B.
Phone (302) 786-5227, Keats Street, San Francisco, E-MAIL: smith@gmail.com.""")

// Print the masked text, then the obfuscated text.
println(deid_result("mask_entity").map(_.result).mkString(""))
println(deid_result("obfuscated").map(_.result).mkString(""))
```
</div>

## Results

```bash
Masked with entity labels
------------------------------
Name : <NAME>, Record date: <DATE>, # <IDNUM>.
Dr. <NAME>, ID: <IDNUM>, IP <IDNUM>.
He is a <AGE> male was admitted to the <LOCATION> for cystectomy on <DATE>.
Patient's VIN : <IDNUM>, SSN <IDNUM>, Driver's license <IDNUM>.
Phone <CONTACT>, <LOCATION>, <LOCATION>, E-MAIL: <CONTACT>.


Obfuscated
------------------------------
Name : Laray Platt, Record date: 2093-02-17, # 264180.
Dr. Tedd Favorite, ID: 1431511083, IP 534.253.554.24.
He is a 71-year-old male was admitted to the 900 Hospital Drive for cystectomy on 02/17/93.
Patient's VIN : 7HSNH27FRMJ785064, SSN #999-22-4444, Driver's license RS:S114433P.
Phone (546) 920-7669, 830 Kempsville Road, 624 N Second, E-MAIL: AOKFJ@UOSKN.QMO.

```

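In the obfuscated output of this run, both dates move by the same amount: `2093-01-13` becomes `2093-02-17` and `01/13/93` becomes `02/17/93`, which is 35 days in each case. The sketch below shows one way such a consistent shift could be computed while keeping each date's original format. It is only an illustration under that assumption, the `shift_date` helper is hypothetical, and nothing here states how the pipeline itself picks or applies the offset.

```python
from datetime import datetime, timedelta


def shift_date(value, fmt, days):
    """Parse value with fmt, add days, and render it back in the same format."""
    return (datetime.strptime(value, fmt) + timedelta(days=days)).strftime(fmt)


# Offset observed in the sample output above; an assumption for this sketch.
OFFSET_DAYS = 35

shifted_iso = shift_date("2093-01-13", "%Y-%m-%d", OFFSET_DAYS)
shifted_us = shift_date("01/13/93", "%m/%d/%y", OFFSET_DAYS)

print(shifted_iso)
print(shifted_us)
```

Using one offset for every date in a document preserves the intervals between events, which is often what matters clinically.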
{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|clinical_deidentification_docwise_benchmark|
|Type:|pipeline|
|Compatibility:|Healthcare NLP 5.5.1+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|2.5 GB|

## Included Models

- DocumentAssembler
- InternalDocumentSplitter
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- MedicalNerModel
- MedicalNerModel
- NerConverterInternalModel
- NerConverterInternalModel
- NerConverterInternalModel
- PretrainedZeroShotNER
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ContextualEntityRuler
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- ContextualParserModel
- RegexMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- ChunkMergeModel
- ChunkMergeModel
- LightDeIdentification
- LightDeIdentification
