Commit 42ab3d1

Added JSL-Med Benchmark (#1617)
* Added MedS benchmark
* added JSL-MedS benchmark

1 parent 6c8a516 · commit 42ab3d1

30 files changed: +1,543 −14 lines changed

docs/_posts/Cabir40/2024-07-12-jsl_meds_q16_v1_en.md

Lines changed: 63 additions & 13 deletions

@@ -13,12 +13,12 @@ supported: true
 annotator: LLMLoader
 article_header:
   type: cover
-use_language_switcher: "Python-Scala-Java"
-
-deploy:
-  sagemaker_link: https://aws.amazon.com/marketplace/pp/prodview-yrajldynampw4
-  snowflake_link: https://app.snowflake.com/marketplace/listing/GZTYZ4386LJ68/john-snow-labs-medical-text-summarization-and-qa
-  databricks_link:
+use_language_switcher: "Python-Scala-Java"
+
+deploy:
+  sagemaker_link: https://aws.amazon.com/marketplace/pp/prodview-yrajldynampw4
+  snowflake_link: https://app.snowflake.com/marketplace/listing/GZTYZ4386LJ68/john-snow-labs-medical-text-summarization-and-qa
+  databricks_link:

 ---

@@ -38,13 +38,13 @@ This LLM model is trained to perform Summarization and Q&A based on a given cont
 [Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q16_v1_en_5.4.0_3.0_1720040078717.zip){:.button.button-orange.button-orange-trans.arr.button-icon.hidden}
 [Copy S3 URI](s3://auxdata.johnsnowlabs.com/clinical/models/jsl_meds_q16_v1_en_5.4.0_3.0_1720040078717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

-{% if page.deploy %}
-## Available as Private API Endpoint
-
-{:.tac}
-{% include display_platform_information.html %}
-{% endif %}
-
+{% if page.deploy %}
+## Available as Private API Endpoint
+
+{:.tac}
+{% include display_platform_information.html %}
+{% endif %}
+
 ## How to use


@@ -116,3 +116,53 @@ val response = llmLoader.generate(prompt)



+## Benchmarking
+
+We have generated a total of 400 questions, 100 from each category. These questions were labeled and reviewed by 3 physician annotators. `%` indicates the preference rate
+
+```bash
+## Overall
+| Model      | Factuality % | Clinical Relevancy % | Conciseness % |
+|------------|--------------|----------------------|---------------|
+| JSL-MedS   | 0.24         | 0.25                 | 0.38          |
+| GPT4o      | 0.19         | 0.26                 | 0.27          |
+| Neutral    | 0.43         | 0.36                 | 0.18          |
+| None       | 0.14         | 0.13                 | 0.17          |
+| Total      | 1.00         | 1.00                 | 1.00          |
+
+## Summary
+| Model      | Factuality % | Clinical Relevancy % | Conciseness % |
+|------------|--------------|----------------------|---------------|
+| JSL-MedS   | 0.47         | 0.48                 | 0.42          |
+| GPT4o      | 0.25         | 0.25                 | 0.25          |
+| Neutral    | 0.22         | 0.22                 | 0.25          |
+| None       | 0.07         | 0.05                 | 0.08          |
+| Total      | 1.00         | 1.00                 | 1.00          |
+
+## QA
+| Model      | Factuality % | Clinical Relevancy % | Conciseness % |
+|------------|--------------|----------------------|---------------|
+| JSL-MedS   | 0.35         | 0.36                 | 0.42          |
+| GPT4o      | 0.24         | 0.24                 | 0.29          |
+| Neutral    | 0.33         | 0.33                 | 0.18          |
+| None       | 0.09         | 0.07                 | 0.11          |
+| Total      | 1.00         | 1.00                 | 1.00          |
+
+## BioMedical
+| Model      | Factuality % | Clinical Relevancy % | Conciseness % |
+|------------|--------------|----------------------|---------------|
+| JSL-MedS   | 0.33         | 0.24                 | 0.57          |
+| GPT4o      | 0.12         | 0.08                 | 0.16          |
+| Neutral    | 0.45         | 0.57                 | 0.16          |
+| None       | 0.10         | 0.10                 | 0.10          |
+| Total      | 1.00         | 1.00                 | 1.00          |
+
+## OpenEnded
+| Model      | Factuality % | Clinical Relevancy % | Conciseness % |
+|------------|--------------|----------------------|---------------|
+| JSL-MedS   | 0.35         | 0.30                 | 0.39          |
+| GPT4o      | 0.30         | 0.33                 | 0.41          |
+| Neutral    | 0.19         | 0.20                 | 0.02          |
+| None       | 0.17         | 0.17                 | 0.19          |
+| Total      | 1.00         | 1.00                 | 1.00          |
+```
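
Since every column's Total row is 1.00, the preference rates in these tables read as each option's share of annotator preferences for a given criterion. Below is a minimal sketch of such a tally, assuming one preference label per question per criterion; the vote list and the tallying code are illustrative assumptions, not the benchmark's actual scoring pipeline or data.

```python
# Minimal sketch of a preference-rate tally (assumed workflow, not the
# benchmark's actual scoring code). Each vote names the option preferred
# for one question on one criterion: "JSL-MedS", "GPT4o", "Neutral", or "None".
from collections import Counter

votes = [  # illustrative sample, not real benchmark data
    "JSL-MedS", "Neutral", "GPT4o", "JSL-MedS", "None",
    "Neutral", "JSL-MedS", "Neutral", "GPT4o", "Neutral",
]

counts = Counter(votes)
total = len(votes)

# Preference rate = votes for an option / total votes; the rates sum to 1.00,
# matching the "Total" row in the tables above.
for option in ("JSL-MedS", "GPT4o", "Neutral", "None"):
    print(f"{option:<10} {counts[option] / total:.2f}")
```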

docs/_posts/Cabir40/2024-07-12-jsl_meds_q4_v1_en.md

Lines changed: 50 additions & 0 deletions

@@ -103,3 +103,53 @@ val response = llmLoader.generate(prompt)

Appends the same `## Benchmarking` section and preference-rate tables (Overall, Summary, QA, BioMedical, OpenEnded) shown above for jsl_meds_q16_v1_en.md.

docs/_posts/Cabir40/2024-07-12-jsl_meds_q8_v1_en.md

Lines changed: 50 additions & 0 deletions

@@ -103,3 +103,53 @@ val response = llmLoader.generate(prompt)

Appends the same `## Benchmarking` section and preference-rate tables (Overall, Summary, QA, BioMedical, OpenEnded) shown above for jsl_meds_q16_v1_en.md.

docs/_posts/Cabir40/2024-07-12-jsl_medsner_zs_q16_v1_en.md

Lines changed: 50 additions & 0 deletions

@@ -127,3 +127,53 @@ val response = llmLoader.generate(prompt)

Appends the same `## Benchmarking` section and preference-rate tables (Overall, Summary, QA, BioMedical, OpenEnded) shown above for jsl_meds_q16_v1_en.md.

docs/_posts/Cabir40/2024-07-12-jsl_medsner_zs_q4_v1_en.md

Lines changed: 50 additions & 0 deletions

@@ -127,3 +127,53 @@ val response = llmLoader.generate(prompt)

Appends the same `## Benchmarking` section and preference-rate tables (Overall, Summary, QA, BioMedical, OpenEnded) shown above for jsl_meds_q16_v1_en.md.

docs/_posts/Cabir40/2024-07-12-jsl_medsner_zs_q8_v1_en.md

Lines changed: 50 additions & 0 deletions

@@ -127,3 +127,53 @@ val response = llmLoader.generate(prompt)

Appends the same `## Benchmarking` section and preference-rate tables (Overall, Summary, QA, BioMedical, OpenEnded) shown above for jsl_meds_q16_v1_en.md.
