JohnSnowLabs
diff --git a/‎docs/_posts/Cabir40/2024-10-18-clinical_deidentification_docwise_wip_en.md
Lines changed: 2 additions & 0 deletions b/‎docs/_posts/Cabir40/2024-10-18-clinical_deidentification_docwise_wip_en.md
diff --git a/‎docs/_posts/Cabir40/2024-10-19-clinical_deidentification_docwise_wip_en.md
Lines changed: 1 addition & 0 deletions b/‎docs/_posts/Cabir40/2024-10-19-clinical_deidentification_docwise_wip_en.md
diff --git a/‎docs/_posts/Cabir40/2024-11-05-jsl_medmx_q16_v1_en.md
Lines changed: 137 additions & 0 deletions b/‎docs/_posts/Cabir40/2024-11-05-jsl_medmx_q16_v1_en.md
diff --git a/‎docs/_posts/Cabir40/2024-11-27-zeroshot_ner_deid_subentity_merged_medium_en.md
Lines changed: 185 additions & 0 deletions b/‎docs/_posts/Cabir40/2024-11-27-zeroshot_ner_deid_subentity_merged_medium_en.md
diff --git a/‎docs/_posts/Meryem1425/2024-10-14-explain_clinical_doc_oncology_en.md
Lines changed: 1 addition & 0 deletions b/‎docs/_posts/Meryem1425/2024-10-14-explain_clinical_doc_oncology_en.md
diff --git a/‎docs/_posts/Meryem1425/2024-10-18-clinical_deidentification_docwise_wip_en.md
Lines changed: 2 additions & 0 deletions b/‎docs/_posts/Meryem1425/2024-10-18-clinical_deidentification_docwise_wip_en.md
@@ -23,6 +23,7 @@ The pipeline can mask and obfuscate `LOCATION`, `CONTACT`, `PROFESSION`, `NAME`,
 `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`,
 `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities.
 
+
 ## Predicted Entities
 
 `ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `City`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IDNUM`, `IP`, `LICENSE`, `LOCATION`, `LOCATION-OTHER`, `LOCATION_OTHER`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP`
@@ -142,3 +143,4 @@ The patient, Nathaneil Bakes, is 43 years old,  her Contact number: 308-657-8469
 - ChunkMergeModel
 - LightDeIdentification
 - LightDeIdentification
+
@@ -142,3 +142,4 @@ The patient, Nathaneil Bakes, is 43 years old,  her Contact number: 308-657-8469
 - ChunkMergeModel
 - LightDeIdentification
 - LightDeIdentification
+
@@ -0,0 +1,137 @@
+---
+layout: model
+title: JSL_MedMX_v1 (LLM - q16)
+author: John Snow Labs
+name: jsl_medmx_q16_v1
+date: 2024-11-05
+tags: [en, licensed, clinical, medical, llm, ner, tensorflow, rag]
+task: [Summarization, Question Answering, Named Entity Recognition]
+language: en
+edition: Healthcare NLP 5.5.1
+spark_version: 3.0
+supported: true
+engine: tensorflow
+annotator: MedicalLLM
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This LLM is trained for question answering (Q&A), summarization, retrieval-augmented generation (RAG), and chat.
+
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+  
+```python
+
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+
+medical_llm = MedicalLLM.pretrained("jsl_medmx_q16_v1", "en", "clinical/models")\
+    .setInputCols("document")\
+    .setOutputCol("completions")\
+    .setBatchSize(1)\
+    .setNPredict(100)\
+    .setUseChatTemplate(True)\
+    .setTemperature(0)
+
+
+pipeline = Pipeline(
+    stages = [
+        document_assembler,
+        medical_llm
+])
+
+prompt = """
+A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus.
+
+Which of the following is the best treatment for this patient?
+A: Ampicillin
+B: Ceftriaxone
+C: Ciprofloxacin
+D: Doxycycline
+E: Nitrofurantoin
+"""
+
+data = spark.createDataFrame([[prompt]]).toDF("text")
+
+results = pipeline.fit(data).transform(data)
+
+results.select("completions").show(truncate=False)
+
+```
+```scala
+
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val medical_llm = MedicalLLM.pretrained("jsl_medmx_q16_v1", "en", "clinical/models")
+    .setInputCols("document")
+    .setOutputCol("completions")
+    .setBatchSize(1)
+    .setNPredict(100)
+    .setUseChatTemplate(true)
+    .setTemperature(0)
+
+
+val pipeline = new Pipeline().setStages(Array(
+    document_assembler,
+    medical_llm
+))
+
+val prompt = """
+A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus.
+
+Which of the following is the best treatment for this patient?
+A: Ampicillin
+B: Ceftriaxone
+C: Ciprofloxacin
+D: Doxycycline
+E: Nitrofurantoin
+"""
+
+val data = Seq(prompt).toDF("text")
+
+val results = pipeline.fit(data).transform(data)
+
+results.select("completions").show(truncate = false)
+
+```
+</div>
+
+## Results
+
+```bash
+
+The correct answer is E: Nitrofurantoin.
+
+The patient is presenting with symptoms of urinary tract infection (UTI), which is common during pregnancy. Nitrofurantoin is a first-line antibiotic for uncomplicated UTI during pregnancy. It is safe and effective in treating UTI during pregnancy and has been used for many years without any adverse effects on the fetus.
+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|jsl_medmx_q16_v1|
+|Compatibility:|Healthcare NLP 5.5.1+|
+|License:|Licensed|
+|Edition:|Official|
+|Language:|en|
+|Size:|50+ GB|
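The prompt in the `jsl_medmx_q16_v1` example above follows a simple layout: a clinical vignette, a blank line, the question, then one lettered option per line. As a minimal sketch, assuming that layout is what you want to reproduce for other questions, a small helper can assemble such prompts. The name `build_mcq_prompt` is hypothetical and is not part of Healthcare NLP; the vignette is shortened here for brevity.

```python
from string import ascii_uppercase
from typing import Sequence


def build_mcq_prompt(vignette: str, question: str, options: Sequence[str]) -> str:
    """Lay out a vignette, a question, and lettered options as one prompt string.

    Hypothetical helper for illustration only; it mirrors the prompt layout
    used in the model card example and is not a library function.
    """
    if not options:
        raise ValueError("at least one answer option is required")
    if len(options) > len(ascii_uppercase):
        raise ValueError("too many options to letter A-Z")
    # Label the options A, B, C, ... in the order given.
    lettered = [f"{letter}: {text}" for letter, text in zip(ascii_uppercase, options)]
    # Vignette, blank line, question, then one option per line.
    return "\n".join([vignette.strip(), "", question.strip(), *lettered])


prompt = build_mcq_prompt(
    "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination.",
    "Which of the following is the best treatment for this patient?",
    ["Ampicillin", "Ceftriaxone", "Ciprofloxacin", "Doxycycline", "Nitrofurantoin"],
)
print(prompt)
```

The resulting string can be passed to `spark.createDataFrame([[prompt]]).toDF("text")` exactly as in the card's Python example.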
@@ -0,0 +1,185 @@
+---
+layout: model
+title: Pretrained Zero-Shot Named Entity Recognition (zeroshot_ner_deid_subentity_merged_medium)
+author: John Snow Labs
+name: zeroshot_ner_deid_subentity_merged_medium
+date: 2024-11-27
+tags: [licensed, clinical, en, ner, deid, zeroshot]
+task: Named Entity Recognition
+language: en
+edition: Healthcare NLP 5.5.0
+spark_version: 3.0
+supported: true
+annotator: PretrainedZeroShotNER
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Zero-shot Named Entity Recognition (NER) enables the identification of entities in text with minimal effort. By leveraging pre-trained language models and contextual understanding, zero-shot NER extends entity recognition capabilities to new domains and languages.
+ 
+The model card lists default labels as examples, but users are not limited to them. The model supports any set of entity labels, so it can be adapted to specific use cases. For best results, use labels that are conceptually similar to the provided defaults.
+
+## Predicted Entities
+
+`DOCTOR`, `PATIENT`, `AGE`, `DATE`, `HOSPITAL`, `CITY`, `STREET`, `STATE`, `COUNTRY`, `PHONE`, `IDNUM`, `EMAIL`, `ZIP`, `ORGANIZATION`, `PROFESSION`, `USERNAME`
+
+{:.btn-box}
+<button class="button button-orange" disabled>Live Demo</button>
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/zeroshot_ner_deid_subentity_merged_medium_en_5.5.0_3.0_1732701620086.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/zeroshot_ner_deid_subentity_merged_medium_en_5.5.0_3.0_1732701620086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+  
+```python
+document_assembler = DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+ 
+sentence_detector = SentenceDetector()\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+ 
+tokenizer = Tokenizer()\
+    .setInputCols(["sentence"])\
+    .setOutputCol("token")
+ 
+labels = ['DOCTOR', 'PATIENT', 'AGE', 'DATE', 'HOSPITAL', 'CITY', 'STREET', 'STATE', 'COUNTRY', 'PHONE', 'IDNUM', 'EMAIL', 'ZIP', 'ORGANIZATION', 'PROFESSION', 'USERNAME'] # You can change the entities
+ 
+pretrained_zero_shot_ner = PretrainedZeroShotNER().pretrained("zeroshot_ner_deid_subentity_merged_medium", "en", "clinical/models")\
+    .setInputCols("sentence", "token")\
+    .setOutputCol("ner")\
+    .setPredictionThreshold(0.5)\
+    .setLabels(labels)
+ 
+ner_converter = NerConverterInternal()\
+    .setInputCols("sentence", "token", "ner")\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline().setStages([
+    document_assembler,
+    sentence_detector,
+    tokenizer,
+    pretrained_zero_shot_ner,
+    ner_converter
+])
+
+data = spark.createDataFrame([["""Dr. John Taylor, ID: 982345, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old."""]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+
+{:.jsl-block}
+```python
+document_assembler = nlp.DocumentAssembler()\
+    .setInputCol("text")\
+    .setOutputCol("document")
+ 
+sentence_detector = nlp.SentenceDetector()\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+ 
+tokenizer = nlp.Tokenizer()\
+    .setInputCols(["sentence"])\
+    .setOutputCol("token")
+ 
+labels = ['DOCTOR', 'PATIENT', 'AGE', 'DATE', 'HOSPITAL', 'CITY', 'STREET', 'STATE', 'COUNTRY', 'PHONE', 'IDNUM', 'EMAIL', 'ZIP', 'ORGANIZATION', 'PROFESSION', 'USERNAME']
+ 
+pretrained_zero_shot_ner = medical.PretrainedZeroShotNER().pretrained("zeroshot_ner_deid_subentity_merged_medium", "en", "clinical/models")\
+    .setInputCols("sentence", "token")\
+    .setOutputCol("ner")\
+    .setPredictionThreshold(0.5)\
+    .setLabels(labels)
+ 
+ner_converter = medical.NerConverterInternal()\
+    .setInputCols("sentence", "token", "ner")\
+    .setOutputCol("ner_chunk")
+ 
+ 
+pipeline = nlp.Pipeline().setStages([
+    document_assembler,
+    sentence_detector,
+    tokenizer,
+    pretrained_zero_shot_ner,
+    ner_converter
+])
+
+data = spark.createDataFrame([["""Dr. John Taylor, ID: 982345, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old."""]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val document_assembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+ 
+val sentence_detector = new SentenceDetector()
+    .setInputCols("document")
+    .setOutputCol("sentence")
+ 
+val tokenizer = new Tokenizer()
+    .setInputCols("sentence")
+    .setOutputCol("token")
+ 
+val labels = Array("DOCTOR", "PATIENT", "AGE", "DATE", "HOSPITAL", "CITY", "STREET", "STATE", "COUNTRY", "PHONE", "IDNUM", "EMAIL", "ZIP", "ORGANIZATION", "PROFESSION", "USERNAME")
+ 
+val pretrained_zero_shot_ner = PretrainedZeroShotNER.pretrained("zeroshot_ner_deid_subentity_merged_medium", "en", "clinical/models")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+    .setPredictionThreshold(0.5)
+    .setLabels(labels)
+ 
+val ner_converter = new NerConverterInternal()
+    .setInputCols(Array("sentence", "token", "ner"))
+    .setOutputCol("ner_chunk")
+ 
+ 
+val pipeline = new Pipeline().setStages(Array(
+    document_assembler,
+    sentence_detector,
+    tokenizer,
+    pretrained_zero_shot_ner,
+    ner_converter
+))
+
+val data = Seq("""Dr. John Taylor, ID: 982345, a cardiologist at St. Mary's Hospital in Boston, was contacted on 05/10/2023 regarding a 45-year-old.""").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+</div>
+
+## Results
+
+```bash
++--------+-------------------+-----+---+----------+
+|sentence|              chunk|begin|end| ner_label|
++--------+-------------------+-----+---+----------+
+|       0|    Dr. John Taylor|    0| 14|    DOCTOR|
+|       0|             982345|   21| 26|     IDNUM|
+|       0|       cardiologist|   31| 42|PROFESSION|
+|       0|St. Mary's Hospital|   47| 65|  HOSPITAL|
+|       0|             Boston|   70| 75|      CITY|
+|       0|         05/10/2023|   95|104|      DATE|
+|       0|        45-year-old|  118|128|       AGE|
++--------+-------------------+-----+---+----------+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|zeroshot_ner_deid_subentity_merged_medium|
+|Compatibility:|Healthcare NLP 5.5.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Language:|en|
+|Size:|706.7 MB|
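The `begin` and `end` columns in the Results table of the `zeroshot_ner_deid_subentity_merged_medium` card above are inclusive character offsets into the input text, so `text[begin:end + 1]` recovers each chunk. The sketch below checks that against the rows copied from the table, then uses the same offsets to replace each chunk with its label. The `<LABEL>` masking format and the function names are illustrative assumptions; this is plain Python, not the output or API of the library's de-identification annotators.

```python
text = (
    "Dr. John Taylor, ID: 982345, a cardiologist at St. Mary's Hospital in Boston, "
    "was contacted on 05/10/2023 regarding a 45-year-old."
)

# (chunk, begin, end, ner_label) rows copied from the Results table.
entities = [
    ("Dr. John Taylor", 0, 14, "DOCTOR"),
    ("982345", 21, 26, "IDNUM"),
    ("cardiologist", 31, 42, "PROFESSION"),
    ("St. Mary's Hospital", 47, 65, "HOSPITAL"),
    ("Boston", 70, 75, "CITY"),
    ("05/10/2023", 95, 104, "DATE"),
    ("45-year-old", 118, 128, "AGE"),
]


def offsets_match(text, entities):
    """Return True if every chunk equals the inclusive slice text[begin:end + 1]."""
    return all(text[begin:end + 1] == chunk for chunk, begin, end, _ in entities)


def mask_entities(text, entities):
    """Replace each entity span with <LABEL> (illustrative format, an assumption)."""
    masked = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for _, begin, end, label in sorted(entities, key=lambda e: e[1], reverse=True):
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


print(offsets_match(text, entities))
print(mask_entities(text, entities))
```

Replacing from the last span to the first avoids recomputing offsets after each substitution, which matters because the replacement strings differ in length from the original chunks.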
@@ -67,6 +67,7 @@ It can also return the [ner_cancer_types_wip](https://nlp.johnsnowlabs.com/2024/
 `Posology_Information-Frequency`, `Posology_Information-Route`, `Unspecific_Therapy-Dosage`, `Unspecific_Therapy-Duration`,
 `Unspecific_Therapy-Frequency`, `Unspecific_Therapy-Route`, `Unspecific_Therapy-Metastais`, `Unspecific_Therapy-Cancer_Dx`
 
+
 ## Predicted Entities
 
 `Adenopathy`, `Age`, `Alcohol`, `Anatomical_Site`, `BMI`, `Biomarker`, `Biomarker_Measurement`, `Biomarker_Quant`, `Biomarker_Result`, `Body_Site`, `CNS_Tumor_Type`, `CancerModifier`, `Cancer_Dx`, `Cancer_Score`, `Cancer_Surgery`, `Cancer_Therapy`, `Cancer_dx`, `Cancer_dx"`, `Carcinoma_Type`, `Chemotherapy`, `Communicable_Disease`, `Cycle_Count`, `Cycle_Day`, `Cycle_Number`, `Date`, `Death_Entity`, `Diabetes`, `Direction`, `Dosage`, `Duration`, `Frequency`, `Gender`, `Grade`, `Histological_Type`, `Hormonal_Therapy`, `Imaging_Test`, `Immunotherapy`, `Invasion`, `Leukemia_Type`, `Line_Of_Therapy`, `Lymph_Node`, `Lymph_Node_Modifier`, `Lymphoma_Type`, `Melanoma`, `Metastasis`, `Obesity`, `Oncogene`, `Oncological`, `Overweight`, `Pathology_Result`, `Pathology_Test`, `Performance_Status`, `Posology_Information`, `Predictive_Biomarkers`, `Prognostic_Biomarkers`, `Race_Ethnicity`, `Radiation_Dose`, `Radiotherapy`, `Relative_Date`, `Response_To_Treatment`, `Route`, `Sarcoma_Type`, `Site_Bone`, `Site_Brain`, `Site_Breast`, `Site_Liver`, `Site_Lung`, `Site_Lymph_Node`, `Site_Other_Body_Part`, `Size_Trend`, `Smoking_Status`, `Staging`, `Targeted_Therapy`, `Tumor_Description`, `Tumor_Finding`, `Tumor_Size`, `Unspecific_Therapy`, `Weight`
 
@@ -23,6 +23,7 @@ The pipeline can mask and obfuscate `LOCATION`, `CONTACT`, `PROFESSION`, `NAME`,
 `LOCATION-OTHER`, `URL`, `DEVICE`, `CITY`, `ZIP`, `STATE`, `PATIENT`, `COUNTRY`, `STREET`, `PHONE`, `HOSPITAL`, `EMAIL`, `IDNUM`, `BIOID`, `FAX`, `LOCATION_OTHER`, `DLN`,
 `SSN`, `ACCOUNT`, `PLATE`, `VIN`, `LICENSE`, `IP` entities.
 
+
 ## Predicted Entities
 
 `ACCOUNT`, `AGE`, `BIOID`, `CITY`, `CONTACT`, `COUNTRY`, `City`, `DATE`, `DEVICE`, `DLN`, `DOCTOR`, `EMAIL`, `FAX`, `HEALTHPLAN`, `HOSPITAL`, `ID`, `IDNUM`, `IP`, `LICENSE`, `LOCATION`, `LOCATION-OTHER`, `LOCATION_OTHER`, `MEDICALRECORD`, `NAME`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PLATE`, `PROFESSION`, `SSN`, `STATE`, `STREET`, `URL`, `USERNAME`, `VIN`, `ZIP`
@@ -142,3 +143,4 @@ The patient, Nathaneil Bakes, is 43 years old,  her Contact number: 308-657-8469
 - ChunkMergeModel
 - LightDeIdentification
 - LightDeIdentification
+