Commit 2687133

Gen AI tutorial (#1821)

* posts
* Update tutorials.md

1 parent: 4721c35

4 files changed: +19 -16 lines changed
docs/en/alab/tutorials.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -15,6 +15,10 @@ sidebar:
 
 <div class="grid--container container-aside"><div class="grid">
 
+<div class="cell cell--12 cell--lg-6 cell--sm-12"><div class="video-item">{%- include extensions/youtube.html id='tBXM_2nTLwk' -%}<div class="video-descr">Generative AI Lab Installation on AWS. April, 2025</div></div></div>
+
+<div class="cell cell--12 cell--lg-6 cell--sm-12"><div class="video-item">{%- include extensions/youtube.html id='pG3Ft1DmiLY' -%}<div class="video-descr">Generative AI Lab Installation Tutorial. May, 2025</div></div></div>
+
 <div class="cell cell--12 cell--lg-6 cell--sm-12"><div class="video-item">{%- include extensions/youtube.html id='ycrJX_UMA6I' -%}<div class="video-descr">Programmatic labeling in Generative AI Lab. Suvrat Joshi - October, 2022</div></div></div>
 
 <div class="cell cell--12 cell--lg-6 cell--sm-12"><div class="video-item">{%- include extensions/youtube.html id='tzEwzT_HmXM' -%}<div class="video-descr">How to create a NER project in Generative AI Lab. Suvrat Joshi - September, 2022</div></div></div>
```

docs/en/benchmark.md

Lines changed: 12 additions & 12 deletions
```diff
@@ -107,7 +107,7 @@ nlpPipeline = Pipeline(stages=[
 - Worker: Standard_D4s_v3, 4 core, 16 GB memory, total worker number: 1
 - Input Data Count: 1000
 
-{:.table-model-big.db}
+{:.table-model-big}
 | action | partition | NER<br>timing | 2_NER<br>timing | 4_NER<br>timing | NER+RE<br>timing |
 |-----------------|----------:|----------------:|----------------:|-----------------:|-----------------:|
 | write_parquet | 4 | 4 min 47 sec | 8 min 37 sec | 19 min 34 sec | 7 min 20 sec |
```
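The `{:.table-model-big.db}` → `{:.table-model-big}` edits repeated throughout this file are kramdown block inline attribute lists (IALs): a `{:.class}` line attaches the given CSS class(es) to the markdown element that follows it, so this commit simply drops the `db` class from the rendered tables. A minimal sketch of the syntax (illustrative fragment, not part of the commit):

```markdown
{:.table-model-big}
| action        | partition |
|---------------|----------:|
| write_parquet | 4         |
```

With kramdown, the IAL line above causes the table to render as `<table class="table-model-big">…</table>`.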
```diff
@@ -132,7 +132,7 @@ nlpPipeline = Pipeline(stages=[
 - Worker: Standard_D4s_v3, 4 core, 16 GB memory, total worker number: 2
 - Input Data Count: 1000
 
-{:.table-model-big.db}
+{:.table-model-big}
 | action | partition | NER<br>timing | 2_NER<br>timing | 4_NER<br>timing | NER+RE<br>timing |
 |-----------------|----------:|----------------:|----------------:|-----------------:|-----------------:|
 | write_parquet | 4 | 3 min 28 sec | 6 min 9 sec | 13 min 46 sec | 5 min 32 sec |
```
```diff
@@ -156,7 +156,7 @@ nlpPipeline = Pipeline(stages=[
 - Worker: Standard_D4s_v3, 4 core, 16 GB memory, total worker number: 4
 - Input Data Count: 1000
 
-{:.table-model-big.db}
+{:.table-model-big}
 | action | partition | NER<br>timing | 2_NER<br>timing | 4_NER<br>timing | NER+RE<br>timing |
 |-----------------|----------:|----------------:|----------------:|-----------------:|-----------------:|
 | write_parquet | 4 | 3 min 13 sec | 5 min 35 sec | 12 min 8 sec | 4 min 57 sec |
```
```diff
@@ -179,7 +179,7 @@ nlpPipeline = Pipeline(stages=[
 - Worker: Standard_D4s_v3, 4 core, 16 GB memory, total worker number: 8
 - Input Data Count: 1000
 
-{:.table-model-big.db}
+{:.table-model-big}
 | action | partition | NER<br>timing | 2_NER<br>timing | 4_NER<br>timing | NER+RE<br>timing |
 |-----------------|----------:|----------------:|----------------:|-----------------:|-----------------:|
 | write_parquet | 4 | 3 min 24 sec | 5 min 24 sec | 16 min 50 sec | 8 min 17 sec |
```
```diff
@@ -203,7 +203,7 @@ nlpPipeline = Pipeline(stages=[
 - Worker: Standard_D4s_v2, 8 core, 28 GB memory, total worker number: 8
 - Input Data Count: 1000
 
-{:.table-model-big.db}
+{:.table-model-big}
 | action | partition | NER<br>timing | 2_NER<br>timing | 4_NER<br>timing | NER+RE<br>timing |
 |-----------------| ---------:|----------------:|----------------:|----------------:|-----------------:|
 | write_parquet | 4 | 1 min 36 sec | 3 min 1 sec | 6 min 32 sec | 3 min 12 sec |
```
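Not part of the commit, but for readers comparing the timing tables above, here is a small hypothetical helper that converts the `N min M sec` strings into seconds so worker configurations can be compared numerically:

```python
import re


def to_seconds(timing: str) -> float:
    """Parse benchmark timing strings like '4 min 47 sec', '4.36 mins', or '23 sec'."""
    minutes = re.search(r"(\d+(?:\.\d+)?)\s*min", timing)
    seconds = re.search(r"(\d+(?:\.\d+)?)\s*sec", timing)
    total = 0.0
    if minutes:
        total += float(minutes.group(1)) * 60
    if seconds:
        total += float(seconds.group(1))
    return total


# write_parquet NER timing: one D4s_v3 worker vs. eight D4s_v2 workers (tables above)
baseline = to_seconds("4 min 47 sec")  # 287.0
fastest = to_seconds("1 min 36 sec")   # 96.0
print(f"speedup: {baseline / fastest:.2f}x")
```

The worker counts and instance types in the comment come from the benchmark tables above; the helper itself is only a sketch for ad-hoc analysis.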
```diff
@@ -334,7 +334,7 @@ nlpPipeline = Pipeline(
 - report_5: ~35.23kb
 
 
-{:.table-model-big.db}
+{:.table-model-big}
 | |Spark NLP 4.0.0 (PySpark 3.1.2) |Spark NLP 4.2.1 (PySpark 3.3.1) |Spark NLP 4.2.1 (PySpark 3.1.2) |Spark NLP 4.2.2 (PySpark 3.1.2) |Spark NLP 4.2.2 (PySpark 3.3.1) |Spark NLP 4.2.3 (PySpark 3.3.1) |Spark NLP 4.2.3 (PySpark 3.1.2) |
 |:---------|-------------------------------:|-------------------------------:|-------------------------------:|-------------------------------:|-------------------------------:|-------------------------------:|-------------------------------:|
 | report_1 | 2.36066 | 3.33056 | 2.23723 | 2.27243 | 2.11513 | 2.19655 | 2.23915 |
```
```diff
@@ -436,7 +436,7 @@ In that case, try playing with various parameters in mapper or retrain/ augment
 - DataBricks Config: 8 CPU Core, 32GiB RAM (2 worker, Standard_DS3_v2)
 - AWS Config: 8 CPU Cores, 14GiB RAM (c6a.2xlarge)
 
-{:.table-model-big.db}
+{:.table-model-big}
 | Partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
 | --------- | ------------- | ------------- | --------------- | --------------- | -------------------------- | -------------------------- |
 | 4 | 23 sec | 11 sec | 4.36 mins | 3.02 mins | 2.40 mins | 1.58 mins |
```
```diff
@@ -453,7 +453,7 @@ In that case, try playing with various parameters in mapper or retrain/ augment
 - DataBricks Config: 16 CPU Core,64GiB RAM (4 worker, Standard_DS3_v2)
 - AWS Config: 16 CPU Cores, 27GiB RAM (c6a.4xlarge)
 
-{:.table-model-big.db}
+{:.table-model-big}
 | Partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
 | --------- | ------------- | ------------- | --------------- | --------------- | -------------------------- | -------------------------- |
 | 4 | 32.5 sec | 11 sec | 4.19 mins | 2.53 mins | 2.58 mins | 1.48 mins |
```
```diff
@@ -468,7 +468,7 @@ In that case, try playing with various parameters in mapper or retrain/ augment
 - DataBricks Config: 32 CPU Core, 128GiB RAM (8 worker, Standard_DS3_v2)
 - AWS Config: 32 CPU Cores, 58GiB RAM (c6a.8xlarge)
 
-{:.table-model-big.db}
+{:.table-model-big}
 | Partition | DataBricks <br> mapper timing | AWS <br> mapper timing | DataBricks <br> resolver timing | AWS <br> resolver timing | DataBricks <br> mapper and resolver timing | AWS <br> mapper and resolver timing |
 | --------- | ------------- | ------------- | --------------- | --------------- | -------------------------- | -------------------------- |
 | 4 | 37.3 sec | 12 sec | 4.46 mins | 2.37 mins | 2.52 mins | 1.47 mins |
```
```diff
@@ -516,7 +516,7 @@ resolver_pipeline = PipelineModel(
 
 ***Results Table***
 
-{:.table-model-big.db}
+{:.table-model-big}
 |Partition|preprocessing|embeddings| resolver |onnx_embeddings|resolver_with_onnx_embeddings|
 |--------:|------------:|---------:|------------:|--------------:|------------:|
 | 4 | 25 sec | 25 sec |7 min 46 sec | 9 sec |8 min 29 sec |
```
```diff
@@ -646,7 +646,7 @@ deid_pipeline = Pipeline().setStages([
 
 ## Processing Time by Partition Size
 
-
+{:.table-model-big.db}
 | Pipeline Name | 4 <br> partition | 8 <br> partition | 16 <br> partition | 32 <br> partition | 64 <br> partition | 100 <br> partition | 1000 <br> partition | Components |
 |--------------|---:|---:|----:|----:|----:|-----:|------:|------------|
 | [ner_deid_subentity_context_augmented](https://nlp.johnsnowlabs.com/2024/05/20/ner_deid_subentity_context_augmented_pipeline_en.html) | 183.57 sec | 129.89 sec | 96.08 sec | 84.43 sec | 75.41 sec | 67.59 sec | 50.10 sec | 1 NER, 1 Deidentification, 14 Rule-based NER, 1 clinical embedding, 2 chunk merger |
```
````diff
@@ -797,7 +797,7 @@ pipeline_base = Pipeline().setStages([
 ```
 </div><div class="h3-box" markdown="1">
 
-{:.table-model-big.db}
+{:.table-model-big}
 | Partition | EMR <br> Base Pipeline | EMR <br> Optimized Pipeline | EC2 Instance <br> Base Pipeline | EC2 Instance <br> Optimized Pipeline | Databricks <br> Base Pipeline | Databricks <br> Optimized Pipeline |
 |-----------|--------------------|------------------------|----------------------------|---------------------------------|---------------|--------------------|
 | 1024 | 5 min 1 sec | 2 min 45 sec | 7 min 6 sec | **3 min 26 sec** | **10 min 10 sec** | **6 min 2 sec** |
````

docs/en/benchmark_llm.md

Lines changed: 2 additions & 3 deletions
```diff
@@ -10,6 +10,7 @@ show_nav: true
 sidebar:
 nav: sparknlp-healthcare
 ---
+
 <div class="h3-box" markdown="1">
 
 ## Medical Benchmarks
```
```diff
@@ -36,8 +37,6 @@ Each model's performance was measured based on accuracy, reflecting how well it
 
 </div><div class="h3-box" markdown="1">
 
-<div class="h3-box" markdown="1">
-
 ## JSL-MedS
 
 ### Benchmarking
```
### Benchmarking
@@ -229,4 +228,4 @@ GPT4o demonstrates strength in Clinical Relevance, especially in Biomedical and
229228
Neutral and "None" ratings across categories highlight areas for further optimization for both models.
230229
This analysis underscores the strengths of JSL-MedM in producing concise and factual outputs, while GPT4o shows a stronger contextual understanding in certain specialized tasks.
231230

232-
</div>
231+
</div>

docs/en/utility_helper_modules.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -595,7 +595,7 @@ obfuscator_df.show(truncate=False)
 
 ```
 
-# result
+## Result
 
 ```
 +---------------+----------+---+--------------+--------------+------------------+--------------+
````
