
Commit 2b10e8d

Current fixing (#1200)
1 parent c79ec59 commit 2b10e8d

20 files changed (+100 −109 lines)

docs/_sass/custom.scss

Lines changed: 5 additions & 0 deletions
@@ -466,6 +466,11 @@ h2.h2_doc {
   }
 }
 
+img[title*="lit_shadow"] {
+  width:100%;
+  box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23);
+}
+
 .table-inner {
   overflow: auto;
 }
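For context on how the rule added above is consumed: kramdown (Jekyll's Markdown engine, which this docs site uses) emits a Markdown image's quoted title as the `title` attribute of the rendered `<img>` element, so `img[title*="lit_shadow"]` matches any image tagged that way — the same convention the other files in this commit switch to. A minimal sketch (the image path and caption are hypothetical):

```markdown
![Some caption](/assets/images/example.png "lit_shadow")
```

This renders to roughly `<img src="/assets/images/example.png" alt="Some caption" title="lit_shadow" />`, which picks up the full-width sizing and drop shadow from the stylesheet without any inline `style` attribute.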
File renamed without changes.

docs/en/benchmark.md

Lines changed: 21 additions & 23 deletions
@@ -345,8 +345,8 @@ In that case, try playing with various parameters in mapper or retrain/ augment
 - DataBricks Config: 8 CPU Core, 32GiB RAM (2 worker, Standard_DS3_v2)
 - AWS Config: 8 CPU Cores, 14GiB RAM (c6a.2xlarge)
 
-
-| partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
+{:.table-model-big.db}
+| Partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
 | --------- | ------------- | ------------- | --------------- | --------------- | -------------------------- | -------------------------- |
 | 4 | 23 sec | 11 sec | 4.36 mins | 3.02 mins | 2.40 mins | 1.58 mins |
 | 8 | 15 sec | 9 sec | 3.21 mins | 2.27 mins | 1.48 mins | 1.35 mins |
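The `{:.table-model-big.db}` lines added throughout this commit are kramdown block inline attribute lists (IALs): placed on their own line directly before a block element, they attach CSS classes — here `table-model-big` and `db` — to the generated HTML, replacing the need for inline styling. A minimal sketch with hypothetical table data:

```markdown
{:.table-model-big.db}
| Partition | Mapper timing |
| --------- | ------------- |
| 4         | 23 sec        |
```

kramdown renders this as `<table class="table-model-big db">…</table>`, which the site's stylesheet can then target.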
@@ -362,7 +362,8 @@ In that case, try playing with various parameters in mapper or retrain/ augment
 - DataBricks Config: 16 CPU Core,64GiB RAM (4 worker, Standard_DS3_v2)
 - AWS Config: 16 CPU Cores, 27GiB RAM (c6a.4xlarge)
 
-| partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
+{:.table-model-big.db}
+| Partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
 | --------- | ------------- | ------------- | --------------- | --------------- | -------------------------- | -------------------------- |
 | 4 | 32.5 sec | 11 sec | 4.19 mins | 2.53 mins | 2.58 mins | 1.48 mins |
 | 8 | 15.1 sec | 7 sec | 2.25 mins | 1.43 mins | 1.38 mins | 1.04 mins |
@@ -376,8 +377,8 @@ In that case, try playing with various parameters in mapper or retrain/ augment
 - DataBricks Config: 32 CPU Core, 128GiB RAM (8 worker, Standard_DS3_v2)
 - AWS Config: 32 CPU Cores, 58GiB RAM (c6a.8xlarge)
 
-
-| partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
+{:.table-model-big.db}
+| Partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
 | --------- | ------------- | ------------- | --------------- | --------------- | -------------------------- | -------------------------- |
 | 4 | 37.3 sec | 12 sec | 4.46 mins | 2.37 mins | 2.52 mins | 1.47 mins |
 | 8 | 26.7 sec | 7 sec | 2.46 mins | 1.39 mins | 1.37 mins | 1.04 mins |
@@ -427,7 +428,8 @@ resolver_pipeline = PipelineModel(
 
 ***Results Table***
 
-|partition|preprocessing|embeddings| resolver |onnx_embeddings|resolver_with_onnx_embeddings|
+{:.table-model-big.db}
+|Partition|preprocessing|embeddings| resolver |onnx_embeddings|resolver_with_onnx_embeddings|
 |--------:|------------:|---------:|------------:|--------------:|------------:|
 | 4 | 25 sec | 25 sec |7 min 46 sec | 9 sec |8 min 29 sec |
 | 8 | 21 sec | 25 sec |5 min 12 sec | 9 sec |4 min 53 sec |
@@ -439,16 +441,10 @@ resolver_pipeline = PipelineModel(
 | 512 | 24 sec | 27 sec |4 min 46 sec | 12 sec |4 min 22 sec |
 | 1024 | 29 sec | 30 sec |4 min 24 sec | 14 sec |4 min 29 sec |
 
-</div>
-
-
-
-
+</div><div class="h3-box" markdown="1">
 
 ## Deidentification Benchmarks
 
-<div class="h3-box" markdown="1">
-
 ### Deidentification Comparison Experiment on Clusters
 
 - **Dataset:** 1000 Clinical Texts from MTSamples, approx. 503 tokens and 6 chunks per text.
@@ -502,7 +498,8 @@ deid_pipeline = Pipeline().setStages([
 
 **Dataset:** 1000 Clinical Texts from MTSamples, approx. 503 tokens and 21 chunks per text.
 
-| partition | AWS <br> result timing | DataBricks <br> result timing | Colab <br> result timing |
+{:.table-model-big.db}
+| Partition | AWS <br> result timing | DataBricks <br> result timing | Colab <br> result timing |
 |----------:|-------------:|-------------:|-------------:|
 | 1024 | 1 min 3 sec | 1 min 55 sec | 5 min 45 sec |
 | 512 | 56 sec | 1 min 26 sec | 5 min 15 sec |
@@ -529,7 +526,7 @@ deid_pipeline = Pipeline().setStages([
 - **Instance Type:**
   - 8 CPU Cores 52GiB RAM (Colab Pro - High RAM)
 
-
+{:.table-model-big.db}
 |Deidentification Pipeline Name | Elapsed Time | Stages |
 |:------------------------------------------------|-----------------:|:-----------------|
 |[clinical_deidentification_subentity_optimized](https://nlp.johnsnowlabs.com/2024/03/14/clinical_deidentification_subentity_optimized_en.html)| 67 min 44 seconds| 1 NER, 1 Deidentification, 13 Rule-based NER, 1 clinical embedding, 2 chunk merger |
@@ -651,7 +648,7 @@ pipeline_base = Pipeline().setStages([
 
 <div class="h3-box" markdown="1">
 
-
+{:.table-model-big.db}
 | Partition | EMR <br> Base Pipeline | EMR <br> Optimized Pipeline | EC2 Instance <br> Base Pipeline | EC2 Instance <br> Optimized Pipeline | Databricks <br> Base Pipeline | Databricks <br> Optimized Pipeline |
 |-----------|--------------------|------------------------|----------------------------|---------------------------------|---------------|--------------------|
 | 1024 | 5 min 1 sec | 2 min 45 sec | 7 min 6 sec | **3 min 26 sec** | **10 min 10 sec** | **6 min 2 sec** |
@@ -716,7 +713,8 @@ The `sbiobertresolve_snomed_findings` model is used as the resolver model. The i
 
 ***Results Table***
 
-| partition | NER Timing |NER and Resolver Timing|
+{:.table-model-big.db}
+| Partition | NER Timing |NER and Resolver Timing|
 |----------:|:---------------|:----------------------|
 |4 | 24.7 seconds |1 minutes 8.5 seconds|
 |8 | 23.6 seconds |1 minutes 7.4 seconds|
@@ -860,7 +858,7 @@ This experiment consisted of training a Name Entity Recognition model (token-lev
 We used the Spark NLP class `MedicalNer` and it's method `Approach()` as described in the [documentation](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#medicalner).
 
 The pipeline looks as follows:
-![](/assets/images/CPUvsGPUbenchmarkpic4.png)
+![Benchmark on MedicalNerDLApproach](/assets/images/CPUvsGPUbenchmarkpic4.png)
 
 </div>
 <div class="h3-box" markdown="1">
@@ -925,15 +923,15 @@ CPU times: `~29 min`
 | 1024 | 2.5 |
 | 2048 | 2.5 |
 
-![](/assets/images/CPUvsGPUbenchmarkpic7.png)
+![Inference times](/assets/images/CPUvsGPUbenchmarkpic7.png)
 
 </div>
 <div class="h3-box" markdown="1">
 
 #### Performance metrics
 A macro F1-score of about `0.92` (`0.90` in micro) was achieved, with the following charts extracted from the `MedicalNerApproach()` logs:
 
-![](/assets/images/CPUvsGPUbenchmarkpic8.png)
+![Inference times](/assets/images/CPUvsGPUbenchmarkpic8.png)
 
 </div>
 <div class="h3-box" markdown="1">
@@ -950,11 +948,11 @@ You will experiment big GPU improvements in the following cases:
 
 ### MultiGPU Inference on Databricks
 In this part, we will give you an idea on how to choose appropriate hardware specifications for Databricks. Here is a few different hardwares, their prices, as well as their performance:
-![image](https://user-images.githubusercontent.com/25952802/158796429-78ec52b1-c036-4a9c-89c2-d3d1f395f71d.png)
+![MultiGPU Inference on Databricks](https://user-images.githubusercontent.com/25952802/158796429-78ec52b1-c036-4a9c-89c2-d3d1f395f71d.png)
 
 Apparently, GPU hardware is the cheapest among them although it performs the best. Let's see how overall performance looks like:
 
-![image](https://user-images.githubusercontent.com/25952802/158799106-8ee03a8b-8590-49ae-9657-b9663b915324.png)
+![MultiGPU Inference on Databricks](https://user-images.githubusercontent.com/25952802/158799106-8ee03a8b-8590-49ae-9657-b9663b915324.png)
 
 Figure above clearly shows us that GPU should be the first option of ours.
 
@@ -1151,4 +1149,4 @@ These findings unequivocally affirm Spark NLP's superiority for NER extraction t
 
 *SpaCy with pandas UDFs*: Development might be more straightforward since you're essentially working with Python functions. However, maintaining optimal performance with larger datasets and ensuring scalability can be tricky.
 
-</div>
+</div>

docs/en/display.md

Lines changed: 13 additions & 23 deletions
@@ -27,8 +27,7 @@ The ability to quickly visualize the entities/relations/assertion statuses, etc.
 
 The visualisation classes work with the outputs returned by both Pipeline.transform() function and LightPipeline.fullAnnotate().
 
-
-<br/>
+</div><div class="h3-box" markdown="1">
 
 ### Install Spark NLP Display
 
@@ -37,10 +36,11 @@ You can install the Spark NLP Display library via pip by using:
 ```bash
 pip install spark-nlp-display
 ```
-<br/>
 
 A complete guideline on how to use the Spark NLP Display library is available <a href="https://github.com/JohnSnowLabs/spark-nlp-display/blob/main/tutorials/Spark_NLP_Display.ipynb">here</a>.
 
+</div><div class="h3-box" markdown="1">
+
 ### Visualize a dependency tree
 
 For visualizing a dependency trees generated with <a href="https://nlp.johnsnowlabs.com/docs/en/annotators#dependency-parsers">DependencyParserApproach</a> you can use the following code.
@@ -57,14 +57,12 @@ dependency_vis.display(pipeline_result[0], #should be the results of a single ex
 dependency_type_col = 'dependency_type' #specify the dependency type column
 )
 ```
-<br/>
 
 The following image gives an example of html output that is obtained for a test sentence:
 
-<img class="image image--xl" src="/assets/images/dependency tree viz.png" style="width:70%; align:center; box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23);"/>
-
+![Visualize a dependency tree](/assets/images/dependency tree viz.png "lit_shadow")
 
-<br/>
+</div><div class="h3-box" markdown="1">
 
 ### Visualize extracted named entities
 
@@ -86,9 +84,9 @@ ner_vis.set_label_colors({'LOC':'#800080', 'PER':'#77b5fe'}) #set label colors b
 ```
 The following image gives an example of html output that is obtained for a couple of test sentences:
 
-<img class="image image--xl" src="/assets/images/ner viz.png" style="width:80%; align:center; box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23);"/>
+![Visualize a dependency tree](/assets/images/ner_viz.png "lit_shadow")
 
-<br/>
+</div><div class="h3-box" markdown="1">
 
 ### Visualize relations
 
@@ -108,12 +106,9 @@ re_vis.display(pipeline_result[0], #should be the results of a single example, n
 ```
 The following image gives an example of html output that is obtained for a couple of test sentences:
 
-<img class="image image--xl" src="/assets/images/relations viz.png" style="width:100%;align:center; box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23);"/>
-
-
+![Visualize relations](/assets/images/relations_viz.png "lit_shadow")
 
-
-<br/>
+</div><div class="h3-box" markdown="1">
 
 ### Visualize assertion status
 

@@ -137,12 +132,9 @@ assertion_vis.set_label_colors({'TREATMENT':'#008080', 'problem':'#800080'}) #se
137132
```
138133
The following image gives an example of html output that is obtained for a couple of test sentences:
139134

140-
<img class="image image--xl" src="/assets/images/assertion viz.png" style="width:80%;align:center; box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23);"/>
141-
142-
143-
135+
![Visualize assertion status](/assets/images/assertion_viz.png "lit_shadow")
144136

145-
<br/>
137+
</div><div class="h3-box" markdown="1">
146138

147139
### Visualize entity resolution
148140

@@ -168,8 +160,6 @@ er_vis.set_label_colors({'TREATMENT':'#800080', 'PROBLEM':'#77b5fe'}) #set label
168160

169161
The following image gives an example of html output that is obtained for a couple of test sentences:
170162

171-
<img class="image image--xl" src="/assets/images/resolution viz.png" style="width:100%;align:center; box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23);"/>
172-
163+
![Visualize entity resolution](/assets/images/resolution_viz.png "lit_shadow")
173164

174-
175-
</div><div class="h3-box" markdown="1">
165+
</div>

docs/en/image-1.png

-95.9 KB
Binary file not shown.

docs/en/image-2.png

-83.8 KB
Binary file not shown.

docs/en/image-3.png

-62.1 KB
Binary file not shown.

docs/en/image-4.png

-30 KB
Binary file not shown.

docs/en/image-5.png

-30.7 KB
Binary file not shown.

docs/en/image.png

-90.9 KB
Binary file not shown.
