Commit 0c69ce7

Models hub internal to Main (#1639)
1 parent 683fc1b commit 0c69ce7

53 files changed, +7888 -0 lines changed

docs/_posts/Cabir40/2024-10-18-clinical_deidentification_docwise_wip_en.md

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,7 @@ The pipeline can mask and obfuscate `LOCATION`, `CONTACT`, `PROFESSION`, `NAME`,
`LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`,
`SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities.

+
## Predicted Entities

`ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `City`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IDNUM`, `IP`, `LICENSE`, `LOCATION`, `LOCATION-OTHER`, `LOCATION_OTHER`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP`
@@ -142,3 +143,4 @@ The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469
- ChunkMergeModel
- LightDeIdentification
- LightDeIdentification
+

docs/_posts/Cabir40/2024-10-19-clinical_deidentification_docwise_wip_en.md

Lines changed: 1 addition & 0 deletions
@@ -142,3 +142,4 @@ The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469
- ChunkMergeModel
- LightDeIdentification
- LightDeIdentification
+

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
---
layout: model
title: JSL_MedMX_v1 (LLM - q16)
author: John Snow Labs
name: jsl_medmx_q16_v1
date: 2024-11-05
tags: [en, licensed, clinical, medical, llm, ner, tensorflowi medl, rag]
task: [Summarization, Question Answering, Named Entity Recognition]
language: en
edition: Healthcare NLP 5.5.1
spark_version: 3.0
supported: true
engine: tensorflow
annotator: MedicalLLM
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This LLM is trained to perform question answering (Q&A), summarization, retrieval-augmented generation (RAG), and chat.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
# Assumes an active Spark session `spark` started with a valid Healthcare NLP license
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import MedicalLLM

# Wrap the raw prompt text in a Spark NLP document annotation
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained LLM: one prompt per batch, at most 100 generated tokens,
# the model's chat template applied, and temperature 0 to minimize sampling randomness
medical_llm = MedicalLLM.pretrained("jsl_medmx_q16_v1", "en", "clinical/models")\
    .setInputCols("document")\
    .setOutputCol("completions")\
    .setBatchSize(1)\
    .setNPredict(100)\
    .setUseChatTemplate(True)\
    .setTemperature(0)

pipeline = Pipeline(stages=[
    document_assembler,
    medical_llm
])

prompt = """
A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus.

Which of the following is the best treatment for this patient?
A: Ampicillin
B: Ceftriaxone
C: Ciprofloxacin
D: Doxycycline
E: Nitrofurantoin
"""

data = spark.createDataFrame([[prompt]]).toDF("text")

results = pipeline.fit(data).transform(data)

results.select("completions").show(truncate=False)
```
```scala
// Assumes an active SparkSession `spark` with a valid Healthcare NLP license,
// and DocumentAssembler and MedicalLLM already imported
import org.apache.spark.ml.Pipeline
import spark.implicits._ // enables Seq(...).toDF

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val medical_llm = MedicalLLM.pretrained("jsl_medmx_q16_v1", "en", "clinical/models")
  .setInputCols("document")
  .setOutputCol("completions")
  .setBatchSize(1)
  .setNPredict(100)
  .setUseChatTemplate(true)
  .setTemperature(0)

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  medical_llm
))

val prompt = """
A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus.

Which of the following is the best treatment for this patient?
A: Ampicillin
B: Ceftriaxone
C: Ciprofloxacin
D: Doxycycline
E: Nitrofurantoin
"""

val data = Seq(prompt).toDF("text")

val results = pipeline.fit(data).transform(data)

results.select("completions").show(false)
```
</div>
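
The example prompt follows a fixed layout: a clinical vignette, a blank line, the question, and one `LETTER: option` line per answer choice. To run many such questions through the pipeline, a small helper can assemble prompts in that layout before they are passed to `spark.createDataFrame`. This is only a sketch: `build_mcq_prompt` is a hypothetical helper, not part of Healthcare NLP, and reusing this exact layout for other questions is an assumption carried over from the example.

```python
from typing import Dict


def build_mcq_prompt(vignette: str, question: str, options: Dict[str, str]) -> str:
    """Assemble a multiple-choice prompt laid out like the example prompt.

    Hypothetical helper for illustration; not a Healthcare NLP API.
    """
    # One "LETTER: option" line per choice, ordered by letter
    option_lines = "\n".join(f"{letter}: {text}" for letter, text in sorted(options.items()))
    # Leading and trailing newlines mirror the triple-quoted prompt in the example
    return f"\n{vignette.strip()}\n\n{question.strip()}\n{option_lines}\n"


prompt = build_mcq_prompt(
    "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.",
    "Which of the following is the best treatment for this patient?",
    {"A": "Ampicillin", "B": "Ceftriaxone", "C": "Ciprofloxacin", "D": "Doxycycline", "E": "Nitrofurantoin"},
)
```

The returned string can stand in for the hand-written `prompt`, for example via `spark.createDataFrame([[prompt]]).toDF("text")`.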

## Results

```bash
The correct answer is E: Nitrofurantoin.

The patient is presenting with symptoms of urinary tract infection (UTI), which is common during pregnancy. Nitrofurantoin is a first-line antibiotic for uncomplicated UTI during pregnancy. It is safe and effective in treating UTI during pregnancy and has been used for many years without any adverse effects on the fetus.
```
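
The completion is free text that names the chosen option before explaining it. When scoring the model on multiple-choice questions, the option letter can be pulled out with a regular expression. A minimal sketch, assuming the completion uses the "answer is X" phrasing seen in the result above; `extract_choice` is a hypothetical helper, and other phrasings would need additional patterns.

```python
import re
from typing import Optional

# Matches phrasing such as "The correct answer is E: Nitrofurantoin." or "The answer is (C)."
# Assumption: the model states its choice as "answer is <LETTER>" with an uppercase A-E.
_ANSWER_RE = re.compile(r"[Aa]nswer is\s*\(?([A-E])\b")


def extract_choice(completion: str) -> Optional[str]:
    """Return the option letter (A-E) named in the completion, or None if absent."""
    match = _ANSWER_RE.search(completion)
    return match.group(1) if match else None


choice = extract_choice("The correct answer is E: Nitrofurantoin.")
```

Returning `None` rather than guessing keeps unparseable completions visible when computing accuracy.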

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|jsl_medmx_q16_v1|
|Compatibility:|Healthcare NLP 5.5.1+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|50+ GB|

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
---
layout: model
title: Pretrained Zero-Shot Named Entity Recognition (zeroshot_ner_deid_subentity_merged_medium)
author: John Snow Labs
name: zeroshot_ner_deid_subentity_merged_medium
date: 2024-11-27
tags: [licensed, clinical, en, ner, deid, zeroshot]
task: Named Entity Recognition
language: en
edition: Healthcare NLP 5.5.0
spark_version: 3.0
supported: true
annotator: PretrainedZeroShotNER
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Zero-shot Named Entity Recognition (NER) identifies entities for labels supplied at inference time, without requiring annotated training data for those labels. By building on pretrained language models and contextual understanding, zero-shot NER extends entity recognition to new domains and languages.

The labels listed below are defaults, not a fixed set: the model accepts any set of entity labels passed to `setLabels`, so it can be adapted to specific use cases. For best results, use labels that are conceptually similar to the defaults.

## Predicted Entities

`DOCTOR`, `PATIENT`, `AGE`, `DATE`, `HOSPITAL`, `CITY`, `STREET`, `STATE`, `COUNTRY`, `PHONE`, `IDNUM`, `EMAIL`, `ZIP`, `ORGANIZATION`, `PROFESSION`, `USERNAME`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/zeroshot_ner_deid_subentity_merged_medium_en_5.5.0_3.0_1732701620086.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/zeroshot_ner_deid_subentity_merged_medium_en_5.5.0_3.0_1732701620086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}

```python
# Assumes an active Spark session `spark` started with a valid Healthcare NLP license
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer
from sparknlp_jsl.annotator import PretrainedZeroShotNER, NerConverterInternal

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Default labels; replace or extend them to fit your use case
labels = ['DOCTOR', 'PATIENT', 'AGE', 'DATE', 'HOSPITAL', 'CITY', 'STREET', 'STATE', 'COUNTRY', 'PHONE', 'IDNUM', 'EMAIL', 'ZIP', 'ORGANIZATION', 'PROFESSION', 'USERNAME']

# Entities scoring below the prediction threshold (0.5) are discarded
pretrained_zero_shot_ner = PretrainedZeroShotNER.pretrained("zeroshot_ner_deid_subentity_merged_medium", "en", "clinical/models")\
    .setInputCols("sentence", "token")\
    .setOutputCol("ner")\
    .setPredictionThreshold(0.5)\
    .setLabels(labels)

# Group token-level NER tags into entity chunks
ner_converter = NerConverterInternal()\
    .setInputCols("sentence", "token", "ner")\
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([
    document_assembler,
    sentence_detector,
    tokenizer,
    pretrained_zero_shot_ner,
    ner_converter
])

data = spark.createDataFrame([["""Dr. John Taylor, ID: 982345, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old."""]]).toDF("text")

result = pipeline.fit(data).transform(data)
```

{:.jsl-block}
```python
# Assumes an active Spark session `spark` started with a valid Healthcare NLP license
from johnsnowlabs import nlp, medical

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

labels = ['DOCTOR', 'PATIENT', 'AGE', 'DATE', 'HOSPITAL', 'CITY', 'STREET', 'STATE', 'COUNTRY', 'PHONE', 'IDNUM', 'EMAIL', 'ZIP', 'ORGANIZATION', 'PROFESSION', 'USERNAME']

pretrained_zero_shot_ner = medical.PretrainedZeroShotNER.pretrained("zeroshot_ner_deid_subentity_merged_medium", "en", "clinical/models")\
    .setInputCols("sentence", "token")\
    .setOutputCol("ner")\
    .setPredictionThreshold(0.5)\
    .setLabels(labels)

ner_converter = medical.NerConverterInternal()\
    .setInputCols("sentence", "token", "ner")\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline().setStages([
    document_assembler,
    sentence_detector,
    tokenizer,
    pretrained_zero_shot_ner,
    ner_converter
])

data = spark.createDataFrame([["""Dr. John Taylor, ID: 982345, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old."""]]).toDF("text")

result = pipeline.fit(data).transform(data)
```
```scala
// Assumes an active SparkSession `spark` with a valid Healthcare NLP license,
// and the Spark NLP / Healthcare NLP annotators below already imported
import org.apache.spark.ml.Pipeline
import spark.implicits._ // enables Seq(...).toDF

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_detector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val labels = Array("DOCTOR", "PATIENT", "AGE", "DATE", "HOSPITAL", "CITY", "STREET", "STATE", "COUNTRY", "PHONE", "IDNUM", "EMAIL", "ZIP", "ORGANIZATION", "PROFESSION", "USERNAME")

val pretrained_zero_shot_ner = PretrainedZeroShotNER.pretrained("zeroshot_ner_deid_subentity_merged_medium", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("ner")
  .setPredictionThreshold(0.5)
  .setLabels(labels)

val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_detector,
  tokenizer,
  pretrained_zero_shot_ner,
  ner_converter
))

val data = Seq("""Dr. John Taylor, ID: 982345, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old.""").toDF("text")

val result = pipeline.fit(data).transform(data)
```
</div>

## Results

```bash
+--------+-------------------+-----+---+----------+
|sentence|              chunk|begin|end| ner_label|
+--------+-------------------+-----+---+----------+
|       0|    Dr. John Taylor|    0| 14|    DOCTOR|
|       0|             982345|   21| 26|     IDNUM|
|       0|       cardiologist|   31| 42|PROFESSION|
|       0|St. Mary's Hospital|   47| 65|  HOSPITAL|
|       0|             Boston|   70| 75|      CITY|
|       0|         05/10/2023|   95|104|      DATE|
|       0|        45-year-old|  118|128|       AGE|
+--------+-------------------+-----+---+----------+
```
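
The `begin` and `end` columns are character offsets into the input text, and `end` is inclusive: `Dr. John Taylor` spans 0 to 14, which is `text[0:15]` in Python. Because this model targets de-identification, a natural next step is to replace each detected span with its label. The sketch below does that on plain Python strings, with the chunk tuples copied from the result table; `mask_chunks` is a hypothetical helper, not the Healthcare NLP de-identification API, and it assumes the chunks do not overlap.

```python
from typing import List, Tuple

# (chunk text, begin, end, label); `end` is inclusive, as in the result table
Chunk = Tuple[str, int, int, str]


def mask_chunks(text: str, chunks: List[Chunk]) -> str:
    """Replace each chunk with <LABEL>. Hypothetical helper; assumes non-overlapping chunks."""
    masked = text
    # Work right to left so replacing one span does not shift earlier offsets
    for chunk, begin, end, label in sorted(chunks, key=lambda c: c[1], reverse=True):
        # Inclusive end offset, so the Python slice ends at end + 1
        if text[begin:end + 1] != chunk:
            raise ValueError(f"offsets {begin}-{end} do not match {chunk!r}")
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


text = ("Dr. John Taylor, ID: 982345, a cardiologist at St. Mary's Hospital in Boston, "
        "was contacted on 05/10/2023 regarding a 45-year-old.")
chunks = [
    ("Dr. John Taylor", 0, 14, "DOCTOR"),
    ("982345", 21, 26, "IDNUM"),
    ("cardiologist", 31, 42, "PROFESSION"),
    ("St. Mary's Hospital", 47, 65, "HOSPITAL"),
    ("Boston", 70, 75, "CITY"),
    ("05/10/2023", 95, 104, "DATE"),
    ("45-year-old", 118, 128, "AGE"),
]
print(mask_chunks(text, chunks))
```

Checking each slice against the chunk text before replacing it catches off-by-one mistakes between inclusive and exclusive offsets early.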

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|zeroshot_ner_deid_subentity_merged_medium|
|Compatibility:|Healthcare NLP 5.5.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|706.7 MB|

docs/_posts/Meryem1425/2024-10-14-explain_clinical_doc_oncology_en.md

Lines changed: 1 addition & 0 deletions
@@ -67,6 +67,7 @@ It can also return the [ner_cancer_types_wip](https://nlp.johnsnowlabs.com/2024/
`Posology_Information-Frequency`, `Posology_Information-Route`, `Unspecific_Therapy-Dosage`, `Unspecific_Therapy-Duration`,
`Unspecific_Therapy-Frequency`, `Unspecific_Therapy-Route`, `Unspecific_Therapy-Metastais`, `Unspecific_Therapy-Cancer_Dx`

+
## Predicted Entities

`Adenopathy`, `Age`, `Alcohol`, `Anatomical_Site`, `BMI`, `Biomarker`, `Biomarker_Measurement`, `Biomarker_Quant`, `Biomarker_Result`, `Body_Site`, `CNS_Tumor_Type`, `CancerModifier`, `Cancer_Dx`, `Cancer_Score`, `Cancer_Surgery`, `Cancer_Therapy`, `Cancer_dx`, `Cancer_dx"`, `Carcinoma_Type`, `Chemotherapy`, `Communicable_Disease`, `Cycle_Count`, `Cycle_Day`, `Cycle_Number`, `Date`, `Death_Entity`, `Diabetes`, `Direction`, `Dosage`, `Duration`, `Frequency`, `Gender`, `Grade`, `Histological_Type`, `Hormonal_Therapy`, `Imaging_Test`, `Immunotherapy`, `Invasion`, `Leukemia_Type`, `Line_Of_Therapy`, `Lymph_Node`, `Lymph_Node_Modifier`, `Lymphoma_Type`, `Melanoma`, `Metastasis`, `Obesity`, `Oncogene`, `Oncological`, `Overweight`, `Pathology_Result`, `Pathology_Test`, `Performance_Status`, `Posology_Information`, `Predictive_Biomarkers`, `Prognostic_Biomarkers`, `Race_Ethnicity`, `Radiation_Dose`, `Radiotherapy`, `Relative_Date`, `Response_To_Treatment`, `Route`, `Sarcoma_Type`, `Site_Bone`, `Site_Brain`, `Site_Breast`, `Site_Liver`, `Site_Lung`, `Site_Lymph_Node`, `Site_Other_Body_Part`, `Size_Trend`, `Smoking_Status`, `Staging`, `Targeted_Therapy`, `Tumor_Description`, `Tumor_Finding`, `Tumor_Size`, `Unspecific_Therapy`, `Weight`

docs/_posts/Meryem1425/2024-10-18-clinical_deidentification_docwise_wip_en.md

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,7 @@ The pipeline can mask and obfuscate `LOCATION`, `CONTACT`, `PROFESSION`, `NAME`,
`LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`,
`SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities.

+
## Predicted Entities

`ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `City`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IDNUM`, `IP`, `LICENSE`, `LOCATION`, `LOCATION-OTHER`, `LOCATION_OTHER`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP`
@@ -142,3 +143,4 @@ The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469
- ChunkMergeModel
- LightDeIdentification
- LightDeIdentification
+
