PS: Pipelines with the same stages have different costs because of differences in the layers of their NER models and in the hardcoded regexes used by Deidentification.
We evaluated six John Snow Labs LLMs across the following task categories: MedMCQA, MedQA, MMLU Anatomy, MMLU Clinical Knowledge, MMLU College Biology, MMLU College Medicine, MMLU Medical Genetics, MMLU Professional Medicine, and PubMedQA.
Each model's performance was measured by accuracy, which reflects how well it handles medical reasoning, clinical knowledge, and biomedical question answering.
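
Accuracy here is, in the usual sense, the fraction of questions a model answers correctly within a task category. The snippet below is a minimal sketch of that per-category computation, not the evaluation harness actually used: the `(model, category, predicted, gold)` record layout, the exact-match scoring, the `accuracy_by_category` helper, and the `model-a` name are all hypothetical assumptions for illustration.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {(model, category): accuracy}, where accuracy = correct / total.

    `records` is an iterable of (model, category, predicted, gold) tuples.
    This record layout is a hypothetical assumption for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for model, category, predicted, gold in records:
        key = (model, category)
        total[key] += 1
        # Assumed exact-match scoring: an answer counts only if it equals the gold label.
        if predicted == gold:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy records with a placeholder model name, not real benchmark data.
toy_records = [
    ("model-a", "MedQA", "B", "B"),
    ("model-a", "MedQA", "C", "A"),
    ("model-a", "PubMedQA", "yes", "yes"),
]
print(accuracy_by_category(toy_records))
# {('model-a', 'MedQA'): 0.5, ('model-a', 'PubMedQA'): 1.0}
```

Scoring each category separately keeps a model's weaker categories visible instead of folding them into a single overall number.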
</div><div class="h3-box" markdown="1">
<div class="h3-box" markdown="1">
Neutral and "None" ratings across categories point to areas where both models could be further optimized.
This analysis underscores JSL-MedM's strength in producing concise, factual outputs, while GPT4o shows stronger contextual understanding in certain specialized tasks.