Commit 3e28540

helathcare (#1706)
1 parent 4da65ce commit 3e28540

7 files changed: +46 -36 lines changed

docs/en/jsl/nlu_for_healthcare.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -27,7 +27,7 @@ and the accompanying video below for an introduction to every healthcare domain.
2727
**Named entities** are sub-strings in a text that can be classified into catogires of a domain. For example, in the String
2828
`"Tesla is a great stock to invest in "` , the sub-string `"Tesla"` is a named entity, it can be classified with the label `company` by an ML algorithm.
2929
**Named entities** can easily be extracted by the various pre-trained Deep Learning based NER algorithms provided by NLU.
30-
NER models can be trained for many different domains and aquire expert domain knowledge in each of them. JSL provides a wide array of experts for various Medical, Helathcare and Clinical domains
30+
NER models can be trained for many different domains and aquire expert domain knowledge in each of them. JSL provides a wide array of experts for various Medical, Healthcare and Clinical domains
3131

3232
This algorithm is provided by **Spark NLP for Healthcare's** [MedicalNerModel](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators)
3333

@@ -70,7 +70,7 @@ Named Entities extracted by an NER model can be further classified into sub-clas
7070
All sentences have the entity `headache` which is of class `disease`.
7171
But there is a semantic difference on what the actual status of the disease mentioned in text is. In the first and third sentence, `Billy has no headache`, but in the second sentence `Billy actually has a sentence`.
7272
The `Entity Assertion` Algorithms provided by JSL solve this problem. The `disease` entity can be classified into `ABSENT` for the first case and into `PRESENT` for the second case. The third case can be classified into `PRESENT IN FAMILY`.
73-
This has immense implications for various data analytical approaches in the helathcare domain.
73+
This has immense implications for various data analytical approaches in the healthcare domain.
7474

7575
I.e. imagine you want you want to make a study about hearth attacks and survival rate of potential procedures. You can process all your digital patient notes with an Medical NER model and filter for documents that have the `Hearth Attack` entity.
7676
But your collected data will have wrong data entries because of the above mentioned Entity status problem. You cannot deduct that a document is talking about a patient having a hearth attack, unless you **assert** that the problem is actually there which is what the Resolutions algorithms do for you.
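The filtering problem described in this hunk can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call NLU or Spark NLP for Healthcare, the `Mention` record, the sample notes, and both helper functions are hypothetical, and the assertion labels are assumed to have been produced by some upstream model. The status strings `PRESENT`, `ABSENT`, and `PRESENT IN FAMILY` are the ones named in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One extracted entity mention (hypothetical record, not a library type)."""
    doc_id: str
    entity: str
    label: str
    assertion: str  # assumed to come from an upstream assertion model


# Hypothetical sample data: three notes that all mention the same entity.
mentions = [
    Mention("note-1", "heart attack", "disease", "ABSENT"),
    Mention("note-2", "heart attack", "disease", "PRESENT"),
    Mention("note-3", "heart attack", "disease", "PRESENT IN FAMILY"),
]


def documents_with_entity(items, entity):
    """Naive filter: keep every document that mentions the entity at all."""
    return sorted({m.doc_id for m in items if m.entity == entity})


def documents_with_present(items, entity):
    """Assertion-aware filter: keep only documents where the entity is PRESENT."""
    return sorted(
        {m.doc_id for m in items if m.entity == entity and m.assertion == "PRESENT"}
    )


naive = documents_with_entity(mentions, "heart attack")
asserted = documents_with_present(mentions, "heart attack")
print("entity only:", naive)
print("entity + assertion:", asserted)
```

The naive filter keeps all three notes, while the assertion-aware filter keeps only the note whose patient actually has the condition, which is the distinction the paragraph above is making.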

docs/en/jsl/release_notes.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3359,7 +3359,7 @@ for the first time by NLU, including ancient and exotic languages like `Ancient
33593359
On the healthcare NLP side, a new `ZeroShotRelationExtractionModel` is available, which can extract relations between
33603360
clinical entities in an unsupervised fashion, no training required!
33613361
Additionally, New French and Italian Deidentification models are available for clinical and healthcare domains.
3362-
Powerd by the fantastic [ Spark NLP for helathcare 3.5.0 release](https://nlp.johnsnowlabs.com/docs/en/spark_nlp_healthcare_versions/licensed_release_notes)
3362+
Powerd by the fantastic [ Spark NLP for healthcare 3.5.0 release](https://nlp.johnsnowlabs.com/docs/en/spark_nlp_healthcare_versions/licensed_release_notes)
33633363

33643364
</div><div class="h3-box" markdown="1">
33653365

@@ -4163,7 +4163,7 @@ Integrates the incredible [Spark NLP for Healthcare](https://nlp.johnsnowlabs.co
41634163

41644164
## NLU Version 3.3.0
41654165

4166-
#### 2000%+ Speedup on small data, 63 new models for 100+ Languages with 6 new supported Transformer classes including BERT, XLM-RoBERTa, alBERT, Longformer, XLnet based models, 48 NER profiling helathcare pipelines and much more in John Snow Labs NLU 3.3.0
4166+
#### 2000%+ Speedup on small data, 63 new models for 100+ Languages with 6 new supported Transformer classes including BERT, XLM-RoBERTa, alBERT, Longformer, XLnet based models, 48 NER profiling healthcare pipelines and much more in John Snow Labs NLU 3.3.0
41674167

41684168
We are incredibly excited to announce NLU 3.3.0 has been released!
41694169
It comes with a up to 2000%+ speedup on small datasets, 6 new Types of Deep Learning transformer models, including

docs/en/licensed_install.md

Lines changed: 19 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -90,7 +90,7 @@ The first step you need to carry out is installing johnsnowlabs library. This is
9090

9191
</div><div class="h3-box" markdown="1">
9292

93-
#### 2. Installing Enterprise NLP (Finance, Legal, Helathcare)
93+
#### 2. Installing Enterprise NLP (Finance, Legal, Healthcare)
9494

9595
Import `johnsnowlabs` and use our one-liner `nlp.install()` to install all the dependencies, downloading the jars (yes, Spark NLP runs on top of the Java Virtual Machine!), preparing the cluster environment variables, licenses, etc!
9696

@@ -473,7 +473,7 @@ Make sure the following prerequisites are set:
473473

474474
</div><div class="h3-box" markdown="1">
475475

476-
## Non-johnsnowlabs Helathcare NLP on Ubuntu
476+
## Non-johnsnowlabs Healthcare NLP on Ubuntu
477477
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
478478

479479
For installing John Snow Labs NLP libraries on an Ubuntu machine/VM please run the following command:
@@ -511,7 +511,7 @@ The install script downloads a couple of example notebooks that you can use to s
511511

512512
</div><div class="h3-box" markdown="1">
513513

514-
## Non-johnsnowlabs Helathcare NLP via Docker
514+
## Non-johnsnowlabs Healthcare NLP via Docker
515515
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
516516

517517
A docker image that contains all the required libraries for installing and running Enterprise Spark NLP libraries is also available. However, it does not contain the library itself, as it is licensed, and requires installation credentials.
@@ -576,10 +576,10 @@ curl -o sparknlp_keys.txt https://raw.githubusercontent.com/JohnSnowLabs/spark-n
576576
577577
</div><div class="h3-box" markdown="1">
578578
579-
## Non-johnsnowlabs Helathcare NLP on python
579+
## Non-johnsnowlabs Healthcare NLP on python
580580
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
581581
582-
You can install the Helathcare NLP by using:
582+
You can install the Healthcare NLP by using:
583583
584584
```bash
585585
pip install -q spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade
@@ -658,7 +658,7 @@ If you want to download the source files (jar and whl files) locally, you can fo
658658
# Install Spark NLP from PyPI
659659
pip install spark-nlp==${public_version}
660660
661-
#install Spark NLP helathcare
661+
#install Spark NLP Healthcare
662662
663663
pip install spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade
664664
@@ -674,7 +674,7 @@ spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:${public_version} --
674674
675675
</div><div class="h3-box" markdown="1">
676676
677-
## Non-johnsnowlabs Helathcare NLP for Scala
677+
## Non-johnsnowlabs Healthcare NLP for Scala
678678
> These instructions use non-johnsnowlabs installation syntax, since `johnsnowlabs` is a Python library.
679679
680680
#### Use Spark NLP in Spark shell
@@ -701,7 +701,7 @@ spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:${public-version} --j
701701
702702
</div><div class="h3-box" markdown="1">
703703
704-
## Non-johnsnowlabs Helathcare NLP in Sbt project
704+
## Non-johnsnowlabs Healthcare NLP in Sbt project
705705
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
706706
707707
1.Download the fat jar for Enterprise Spark NLP.
@@ -733,7 +733,7 @@ unmanagedJars in Compile += file("lib/sparknlp-jsl.jar")
733733
734734
</div><div class="h3-box" markdown="1">
735735
736-
## Non-johnsnowlabs Helathcare NLP on Colab
736+
## Non-johnsnowlabs Healthcare NLP on Colab
737737
738738
This is the way to run Clinical NLP in Google Colab if you don't use `johnsnowlabs` library.
739739

@@ -792,7 +792,7 @@ os.environ.update(license_keys)
792792

793793
</div><div class="h3-box" markdown="1">
794794

795-
## Non-johnsnowlabs Helathcare NLP on GCP Dataproc
795+
## Non-johnsnowlabs Healthcare NLP on GCP Dataproc
796796
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
797797

798798
- You can follow the steps here for [installation via IU](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/platforms/dataproc)
@@ -882,7 +882,7 @@ Or you can set `.master('yarn')`.
882882
883883
</div><div class="h3-box" markdown="1">
884884
885-
## Non-johnsnowlabs Helathcare NLP on AWS SageMaker
885+
## Non-johnsnowlabs Healthcare NLP on AWS SageMaker
886886
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
887887
888888
1. Access AWS Sagemaker in AWS.
@@ -923,7 +923,7 @@ spark = sparknlp_jsl.start(license_keys['SECRET'])
923923
924924
</div><div class="h3-box" markdown="1">
925925
926-
## Non-johnsnowlabs Helathcare NLP with Poetry
926+
## Non-johnsnowlabs Healthcare NLP with Poetry
927927
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
928928
929929
This is a sample `project.toml` file which you can use with `poetry install` to setup spark NLP + the Healthcare python library `spark-nlp-jsl`
@@ -954,7 +954,7 @@ build-backend = "poetry.core.masonry.api"
954954
955955
</div><div class="h3-box" markdown="1">
956956
957-
## Non-johnsnowlabs Helathcare NLP on AWS EMR
957+
## Non-johnsnowlabs Healthcare NLP on AWS EMR
958958
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
959959
960960
In this page we explain how to setup Spark-NLP + Spark-NLP Healthcare in AWS EMR, using the AWS console.
@@ -971,21 +971,21 @@ In this page we explain how to setup Spark-NLP + Spark-NLP Healthcare in AWS EMR
971971
- select required applications
972972
973973
974-
![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image.png "lit_shadow")
974+
![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image.png "lit_shadow")
975975
976976
- Specify EC2 instances for the cluster, as primary/master node and cores/workers
977977
- Specify the storage/ EBS volume
978978
979-
![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-1.png "lit_shadow")
979+
![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-1.png "lit_shadow")
980980
981981
- Choose Cluster scaling and provisioning
982982
- Choose Networking / VPC
983983
984-
![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-2.png "lit_shadow")
984+
![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-2.png "lit_shadow")
985985
986986
- Choose Security Groups/Firewall for primary/master node and cores/workers/slaves
987987
988-
![Non-johnsnowlabs Helathcare NLP on AWS EMR](/assets/images/emr/image-3.png "lit_shadow")
988+
![Non-johnsnowlabs Healthcare NLP on AWS EMR](/assets/images/emr/image-3.png "lit_shadow")
989989
990990
- If you have add steps , that will be executed after cluster is provisioned
991991
- Specify the S3 location for logs
@@ -1064,7 +1064,7 @@ You can change spark configuration according to your needs.
10641064

10651065
</div><div class="h3-box" markdown="1">
10661066

1067-
## Non-johnsnowlabs Helathcare NLP on Amazon Linux 2
1067+
## Non-johnsnowlabs Healthcare NLP on Amazon Linux 2
10681068
> These instructions use non-johnsnowlabs installation syntax. For simplified installation with `johnsnowlabs` library, check first section.
10691069

10701070
```bash
@@ -1091,7 +1091,7 @@ You can pick the index number (I am using java-8 as default - index 2):
10911091

10921092
</div><div class="h3-box" markdown="1">
10931093

1094-
![Non-johnsnowlabs Helathcare NLP on Amazon Linux 2](/assets/images/installation/amazon-linux.png "lit_shadow")
1094+
![Non-johnsnowlabs Healthcare NLP on Amazon Linux 2](/assets/images/installation/amazon-linux.png "lit_shadow")
10951095

10961096
</div><div class="h3-box" markdown="1">
10971097

docs/en/spark_nlp_healthcare_versions/licensed_release_notes.md

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -97,9 +97,6 @@ text = """he patient is a 42-year-old female and has diabetes mellitus with diab
9797
| scope_average | diabetes mellitus | E11.40 | nervous system disorder due to diabetes mellitus [type 2 diabetes mellitus with diabetic neuropathy, unspecified] |
9898

9999

100-
101-
102-
103100
</div><div class="h3-box" markdown="1">
104101

105102
#### De-identifying Sensitive Data in Relational Databases with a Few Lines of Codes

docs/en/spark_nlp_healthcare_versions/release_notes_5_5_2.md

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -97,9 +97,6 @@ text = """he patient is a 42-year-old female and has diabetes mellitus with diab
9797
| scope_average | diabetes mellitus | E11.40 | nervous system disorder due to diabetes mellitus [type 2 diabetes mellitus with diabetic neuropathy, unspecified] |
9898

9999

100-
101-
102-
103100
</div><div class="h3-box" markdown="1">
104101

105102
#### De-identifying Sensitive Data in Relational Databases with a Few Lines of Codes
@@ -395,8 +392,6 @@ Muc5AC, human epidermal growth factor receptor-2 (HER2), and Muc6; positive for
395392

396393
Please check the [ZeroShot Clinical NER](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/1.6.ZeroShot_Clinical_NER.ipynb) Notebook for more information
397394

398-
399-
400395
</div><div class="h3-box" markdown="1">
401396

402397
#### Introducing Clinical Document Analysis with One-Liner Pretrained Pipelines for Specific Clinical Tasks and Concepts
@@ -450,8 +445,6 @@ The patient, Nathaneil Bakes, is 43 years old, her Contact number: 308-657-8469
450445

451446
Please check the [Task Based Clinical Pretrained Pipelines](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.3.Task_Based_Clinical_Pretrained_Pipelines.ipynb) model for more information
452447

453-
454-
455448
</div><div class="h3-box" markdown="1">
456449

457450
#### Introducing 2 New Named Entity Recognition and an Assertion Models for Gene and Phenotype Features

docs/en/spark_ocr_versions/ocr_release_notes.md

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,16 +30,19 @@ Release date: 23-01-2024
3030
* New Dicom Pretrained Pipelines.
3131
* New VisualDocumentProcessor.
3232

33+
</div><div class="h3-box" markdown="1">
34+
3335
## New Obfuscation Features in ImageDrawRegions
3436
ImageDrawRegions' main purpose is to draw solid rectangles on top of regions that typically come from NER or some other similar model. Many times, it is interesting not to only draw solid rectangles on top of detected entities, but some other values, like obfuscated values. For example, with the purpose of protecting patient's privacy, you may want to replace a name with another name, or a date with a modified date.
3537

3638
This feature, together with the Deidentification transformer from Spark NLP for Healthcare can be combined to create a 'rendering aware' obfuscation pipeline capable of rendering obfuscated values back to the source location where the original entities were present. The replacement must be 'rendering aware' because not every example of an entity requires the same space on the page to be rendered. So for example, 'Bob Smith' would be a good replacement for 'Rod Adams', but not for 'Alessandro Rocatagliata', simply because they render differently on the page. Let's take a look at a quick example,
3739

38-
![image](/assets/images/ocr/obfuscation_impainting.png)
40+
![New Obfuscation Features in ImageDrawRegions](/assets/images/ocr/obfuscation_impainting.png)
3941

4042
to the left we see a portion of a document in which we want to apply obfuscation. We want to focus on the entities representing PHI, like patient name or phone number. On the right side, after applying the transformation, we have an image containing fake values.
4143
You can see that the PHI in the source document has been replaced by similar entities, and these entities not only are of a similar category, but are also of a similar length.
4244

45+
</div><div class="h3-box" markdown="1">
4346

4447
## New obfuscation features in DicomMetadataDeidentifier
4548
Now you can customize the way metadata is de-identified in DicomMetadataDeidentifier. Customization happens through a number of different actions you can apply to each tag, for example, replacing a specific tag with a literal, or shifting a date by a number of days randomly.
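As a rough illustration of what "shifting a date by a number of days" means for a DICOM date tag, here is a minimal sketch in plain Python. It is not the `DicomMetadataDeidentifier` API: the function name `shift_da_by_fixed_days` and the sample value are hypothetical, and the only outside fact it relies on is that DICOM `DA` values are written as `YYYYMMDD`.

```python
from datetime import datetime, timedelta

# DICOM DA (date) values are encoded as YYYYMMDD.
DICOM_DA_FORMAT = "%Y%m%d"


def shift_da_by_fixed_days(value: str, days: int) -> str:
    """Return the DA value moved forward by `days` (hypothetical helper)."""
    parsed = datetime.strptime(value, DICOM_DA_FORMAT)
    return (parsed + timedelta(days=days)).strftime(DICOM_DA_FORMAT)


# Hypothetical sample value; 112 is used only as an example offset.
shifted = shift_da_by_fixed_days("20240123", 112)
print(shifted)
```

A fixed offset keeps the intervals between dates in the same study intact, which is one reason a shift can be preferable to replacing every date with a literal.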
@@ -70,6 +73,7 @@ ShiftTimeByRandomNbOfSecs | DT | coherent
7073
replaceWithRandomName | PN, LO | coherent
7174
shiftDateByFixedNbOfDays | DA | 112
7275

76+
</div><div class="h3-box" markdown="1">
7377

7478
### New Dicom Pretrained Pipelines
7579
We are releasing three new Dicom Pretrained Pipelines:
@@ -79,6 +83,8 @@ We are releasing three new Dicom Pretrained Pipelines:
7983

8084
Check notebook [here](https://github.com/JohnSnowLabs/visual-nlp-workshop/blob/master/jupyter/Dicom/SparkOcrDicomPretrainedPipelines.ipynb) for examples on how to use this.
8185

86+
</div><div class="h3-box" markdown="1">
87+
8288
### New Visual Document Processor
8389
New VisualDocumentProcessor that produces OCR text and tables on a single pass!,
8490
In plugs and play into any Visual NLP pipeline, it receives images, and it returns texts and tables following the same existing schemas for these datatypes,
@@ -93,6 +99,8 @@ result = proc.transform(df)
9399

94100
Check this [sample notebook](https://github.com/JohnSnowLabs/visual-nlp-workshop/blob/master/jupyter/SparkOcrVisualDocumentProcessor.ipynb) for an example on how to use it.
95101

102+
</div><div class="h3-box" markdown="1">
103+
96104
### Other Dicom Changes
97105
* DicomDrawRegions support for setting compression quality, now you can pick different compression qualities for each of the different compression algorithms supported. The API receives an array with each element specifying the compression type like a key/value,
98106
Example,
@@ -101,6 +109,8 @@ DicomDrawRegions()\
101109
.setCompressionQuality(["8Bit=90","LSNearLossless=2"])
102110
```
103111

112+
</div><div class="h3-box" markdown="1">
113+
104114
### Enhancements & Bug Fixes
105115
* New parameter in SVS tool that specifies whether to rename output file or not,
106116
```
