{%- capture title -%}
BertForAssertionClassification
{%- endcapture -%}

{%- capture model -%}
model
{%- endcapture -%}

{%- capture model_description -%}
BertForAssertionClassification predicts the assertion status of each extracted entity (for example, `present`, `absent`, or `possible`) by analyzing both the entity and its surrounding context.

This classifier uses pretrained BERT models fine-tuned on biomedical text (e.g., BioBERT). On top of the pooled output it adds a sequence classification head (a linear layer) that performs multi-class classification over the assertion labels.
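
The sketch below is a rough, self-contained illustration of what a classification head on a pooled output computes. It multiplies a pooled vector by a weight matrix, adds a bias, and returns the label with the highest softmax probability. All values are hypothetical toy values: the three-dimensional vector, the weights, and the label set (`present`, `absent`, `possible`, taken from the example output on this page). The real model uses BERT's pooled output and its own trained weights and labels.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label set for illustration; the pretrained model defines its own labels.
LABELS = ["present", "absent", "possible"]


def linear_head(pooled: Sequence[float], weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Compute one logit per label: logits = W x pooled + b."""
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pooled: Sequence[float], weights: Sequence[Sequence[float]],
             bias: Sequence[float],
             labels: Sequence[str] = LABELS) -> Tuple[str, List[float]]:
    """Return the most probable label and the full probability vector."""
    probs = softmax(linear_head(pooled, weights, bias))
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Toy 3-dimensional "pooled output" and a 3x3 weight matrix (one row per label).
pooled = [0.5, -1.0, 2.0]
weights = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.1]]
bias = [0.0, 0.0, 0.0]
label, probs = classify(pooled, weights, bias)
print(label)  # absent
```

In the trained model, the pooled vector has BERT's hidden size (768 for base-sized models), and the head's weights are learned during fine-tuning.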

**Key features:**

- Accepts DOCUMENT and CHUNK type inputs and produces ASSERTION type annotations.
- Marks each target entity with special tokens (e.g., `[entity]`) so that the model can focus on that entity within its context.
- Uses a transformer-based architecture (BERT for sequence classification) to predict the assertion status.
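
The annotator receives sentence (DOCUMENT) and CHUNK annotations separately, so each chunk has to be paired with the text that serves as its context. The sketch below shows one plausible way to do that. It assumes both annotations carry character offsets with an inclusive `end`, as the `begin`/`end` columns in the example output do. `Span` and `find_context` are hypothetical helpers, not part of the library. The actual annotator may pair chunks with sentences differently, for example through the chunk's sentence metadata.

```python
from typing import List, NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical stand-in for an annotation: inclusive character offsets plus text."""
    begin: int
    end: int
    text: str


def find_context(chunk: Span, sentences: List[Span]) -> Optional[Span]:
    """Return the first sentence whose offsets fully contain the chunk, or None."""
    for sentence in sentences:
        if sentence.begin <= chunk.begin and chunk.end <= sentence.end:
            return sentence
    return None


text = "Patient with severe fever. No cough was noted."
# Inclusive end offsets, so the text slice runs to end + 1.
sentences = [Span(0, 25, text[0:26]), Span(27, 45, text[27:46])]
chunk = Span(30, 34, text[30:35])  # "cough"
context = find_context(chunk, sentences)
print(context.text)  # No cough was noted.
```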

**Input Example:**

Before classification, the annotator preprocesses the input text by wrapping the target entity in special tokens. For the entity *severe fever*, the model input becomes:

`[CLS] Patient with [entity] severe fever [entity].`
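
Here is a minimal sketch of this marking step. It assumes the entity is given by inclusive character offsets into its sentence. `mark_entity` is a hypothetical helper rather than a library function. The `[CLS]` token is left out because BERT tokenization adds it afterwards.

```python
def mark_entity(sentence: str, begin: int, end: int, marker: str = "[entity]") -> str:
    """Wrap sentence[begin:end + 1] (inclusive end) in marker tokens."""
    if not (0 <= begin <= end < len(sentence)):
        raise ValueError("chunk offsets fall outside the sentence")
    entity = sentence[begin:end + 1]
    # Plain concatenation: text before the entity, opening marker, entity, closing marker, rest.
    return sentence[:begin] + marker + " " + entity + " " + marker + sentence[end + 1:]


print(mark_entity("Patient with severe fever.", 13, 24))
# Patient with [entity] severe fever [entity].
```

Placing the markers in the text itself, rather than passing offsets separately, presumably lets a standard sequence classification model see which span it is being asked about.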

Models from the HuggingFace 🤗 Transformers library are also compatible with
Spark NLP 🚀. To see which models are compatible and how to import them, see
[Import Transformers into Spark NLP 🚀](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669).

Parameters:

- `configProtoBytes`: ConfigProto from TensorFlow, serialized into a byte array.
- `classificationCaseSensitive`: Whether to use case-sensitive classification. Default is `True`.

{%- endcapture -%}

{%- capture model_input_anno -%}
DOCUMENT, CHUNK
{%- endcapture -%}

{%- capture model_output_anno -%}
ASSERTION
{%- endcapture -%}

{%- capture model_python_medical -%}
from johnsnowlabs import nlp, medical

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setCaseSensitive(False)

ner = medical.NerModel.pretrained("ner_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Keep only PROBLEM chunks so that assertion status is predicted for clinical problems
ner_converter = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["PROBLEM"])

# Classify the assertion status of each chunk, using its sentence as context
assertion_classifier = medical.BertForAssertionClassification.pretrained("assertion_bert_classification_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "ner_chunk"])\
    .setOutputCol("assertion_class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner,
    ner_converter,
    assertion_classifier
])

text = """
GENERAL: He is an elderly gentleman in no acute distress. He is sitting up in bed eating his breakfast. He is alert and oriented and answering questions appropriately.
HEENT: Sclerae showed mild arcus senilis in the right. Left was clear. Pupils are equally round and reactive to light. Extraocular movements are intact. Oropharynx is clear.
NECK: Supple. Trachea is midline. No jugular venous pressure distention is noted. No adenopathy in the cervical, supraclavicular, or axillary areas.
ABDOMEN: Soft and not tender. There may be some fullness in the left upper quadrant, although I do not appreciate a true spleen with inspiration.
EXTREMITIES: There is some edema, but no cyanosis and clubbing .
"""

# `spark` is an active Spark session (for example, one started with nlp.start())
data = spark.createDataFrame([[text]]).toDF("text")
result = pipeline.fit(data).transform(data)

# result
+--------------------------------------------------------------+-----+----+---------+----------------------+
|ner_chunk                                                     |begin|end |ner_label|assertion_class_result|
+--------------------------------------------------------------+-----+----+---------+----------------------+
|acute distress                                                |43   |56  |PROBLEM  |absent                |
|mild arcus senilis in the right                               |191  |221 |PROBLEM  |present               |
|jugular venous pressure distention                            |380  |413 |PROBLEM  |absent                |
|adenopathy in the cervical, supraclavicular, or axillary areas|428  |489 |PROBLEM  |absent                |
|tender                                                        |514  |519 |PROBLEM  |absent                |
|some fullness in the left upper quadrant                      |535  |574 |PROBLEM  |possible              |
|some edema                                                    |660  |669 |PROBLEM  |present               |
|cyanosis                                                      |679  |686 |PROBLEM  |absent                |
|clubbing                                                      |692  |699 |PROBLEM  |absent                |
+--------------------------------------------------------------+-----+----+---------+----------------------+

{%- endcapture -%}

{%- capture model_scala_medical -%}
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel
import com.johnsnowlabs.nlp.annotators.ner.{MedicalNerModel, NerConverterInternal}
import com.johnsnowlabs.nlp.annotators.assertion.BertForAssertionClassification
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val wordEmbeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")
  .setCaseSensitive(false)

val clinicalNer = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")

// Keep only PROBLEM chunks, as in the Python example
val nerConverter = new NerConverterInternal()
  .setInputCols("sentence", "token", "ner")
  .setOutputCol("ner_chunk")
  .setWhiteList("PROBLEM")

// Classify the assertion status of each chunk, using its sentence as context
val clinicalAssertion = BertForAssertionClassification.pretrained("assertion_bert_classification_clinical", "en", "clinical/models")
  .setInputCols("sentence", "ner_chunk")
  .setOutputCol("assertion_class")
  .setCaseSensitive(false)

val pipeline = new Pipeline().setStages(
  Array(
    documentAssembler,
    sentenceDetector,
    tokenizer,
    wordEmbeddings,
    clinicalNer,
    nerConverter,
    clinicalAssertion
  ))

// Triple quotes allow a multi-line string literal in Scala
val text = """
GENERAL: He is an elderly gentleman in no acute distress. He is sitting up in bed eating his breakfast. He is alert and oriented and answering questions appropriately.
HEENT: Sclerae showed mild arcus senilis in the right. Left was clear. Pupils are equally round and reactive to light. Extraocular movements are intact. Oropharynx is clear.
NECK: Supple. Trachea is midline. No jugular venous pressure distention is noted. No adenopathy in the cervical, supraclavicular, or axillary areas.
ABDOMEN: Soft and not tender. There may be some fullness in the left upper quadrant, although I do not appreciate a true spleen with inspiration.
EXTREMITIES: There is some edema, but no cyanosis and clubbing .
"""

val df = Seq(text).toDF("text")
val result = pipeline.fit(df).transform(df)

// result
+--------------------------------------------------------------+-----+----+---------+----------------------+
|ner_chunk                                                     |begin|end |ner_label|assertion_class_result|
+--------------------------------------------------------------+-----+----+---------+----------------------+
|acute distress                                                |43   |56  |PROBLEM  |absent                |
|mild arcus senilis in the right                               |191  |221 |PROBLEM  |present               |
|jugular venous pressure distention                            |380  |413 |PROBLEM  |absent                |
|adenopathy in the cervical, supraclavicular, or axillary areas|428  |489 |PROBLEM  |absent                |
|tender                                                        |514  |519 |PROBLEM  |absent                |
|some fullness in the left upper quadrant                      |535  |574 |PROBLEM  |possible              |
|some edema                                                    |660  |669 |PROBLEM  |present               |
|cyanosis                                                      |679  |686 |PROBLEM  |absent                |
|clubbing                                                      |692  |699 |PROBLEM  |absent                |
+--------------------------------------------------------------+-----+----+---------+----------------------+

{%- endcapture -%}

{%- capture model_api_link -%}
[BertForAssertionClassification](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/nlp/annotators/assertion/BertForAssertionClassification.html)
{%- endcapture -%}

{%- capture model_python_api_link -%}
[BertForAssertionClassification](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/assertion/bert_for_assertion_classification/index.html)
{%- endcapture -%}

{%- capture model_notebook_link -%}
[BertForAssertionClassification](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/2.4.BertForAssertionClassification.ipynb)
{%- endcapture -%}

{% include templates/licensed_approach_model_medical_fin_leg_template.md
title=title
model=model
model_description=model_description
model_input_anno=model_input_anno
model_output_anno=model_output_anno
model_python_medical=model_python_medical
model_scala_medical=model_scala_medical
model_api_link=model_api_link
model_python_api_link=model_python_api_link
model_notebook_link=model_notebook_link
%}