Commit d8d4290

licensed_install emr part is updated (#1130)
1 parent 8d9daac commit d8d4290

File tree

7 files changed: +94 -35 lines changed

docs/en/image-1.png (95.9 KB)
docs/en/image-2.png (83.8 KB)
docs/en/image-3.png (62.1 KB)
docs/en/image-4.png (30 KB)
docs/en/image-5.png (30.7 KB)
docs/en/image.png (90.9 KB)

docs/en/licensed_install.md

Lines changed: 94 additions & 35 deletions
@@ -843,46 +843,105 @@ In this page we explain how to setup Spark-NLP + Spark-NLP Healthcare in AWS EMR
 </div><div class="h3-box" markdown="1">
 
 ### Steps
-1. You must go to the blue button "Create Cluster" on the UI. By doing that you will get directed to the "Create Cluster - Quick Options" page. Don't use the quick options, click on "Go to advanced options" instead.
-2. Now in Advanced Options, on Step 1, "Software and Steps", please pick the following selection in the checkboxes,
-![software config](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/platforms/emr/software_configs.png?raw=true)
-Also in the "Edit Software Settings" page, enter the following,
 
-```
-[{
-  "Classification": "spark-env",
-  "Configurations": [{
-    "Classification": "export",
-    "Properties": {
-      "PYSPARK_python": "/usr/bin/python3",
-      "AWS_ACCESS_KEY_ID": "XYXYXYXYXYXYXYXYXYXY",
-      "AWS_SECRET_ACCESS_KEY": "XYXYXYXYXYXYXYXYXYXY",
-      "SPARK_NLP_LICENSE": "XYXYXYXYXYXYXYXYXYXYXYXYXYXY"
-    }
-  }]
-},
-{
-  "Classification": "spark-defaults",
+1. Go to AWS services and select EMR.
+
+2. Press Create Cluster and start configuring (a scripted equivalent of these settings is sketched after this list):
+- Name your cluster
+- Select the EMR version
+- Select the required applications
+
+![cluster name, EMR release and applications](image.png)
+
+- Specify the EC2 instances for the cluster, as primary/master node and cores/workers
+- Specify the storage/EBS volume
+
+![EC2 instances and EBS storage](image-1.png)
+
+- Choose cluster scaling and provisioning
+- Choose networking/VPC
+
+![cluster scaling and networking](image-2.png)
+
+- Choose security groups/firewall for the primary/master node and the cores/workers/slaves
+
+![security groups](image-3.png)
+
+- If you add steps, they will be executed after the cluster is provisioned
+- Specify the S3 location for logs
+- Under the **Tags** section, add a `KEY: VALUE` pair with `for-use-with-amazon-emr-managed-policies` set to `true`
+
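The console choices listed above can also be scripted. The block below is a minimal, illustrative boto3 `run_job_flow` sketch, not the official setup: the cluster name, release label, instance types, subnet ID, bucket, and volume size are placeholder assumptions, and the bootstrap action and Spark configuration described in the next steps would be supplied through the `BootstrapActions` and `Configurations` parameters of the same call.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Minimal sketch of the console steps above; all names, types and sizes are placeholders.
response = emr.run_job_flow(
    Name="spark-nlp-healthcare-cluster",           # "Name your cluster"
    ReleaseLabel="emr-6.15.0",                     # "Select the EMR version" (example label)
    Applications=[{"Name": "Spark"}, {"Name": "Hadoop"}, {"Name": "JupyterEnterpriseGateway"}],
    LogUri="s3://my-bucket/emr-logs/",             # "Specify the S3 location for logs"
    Instances={
        "InstanceGroups": [
            {   # primary/master node
                "Name": "Primary",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
                "EbsConfiguration": {              # "Specify the storage/EBS volume"
                    "EbsBlockDeviceConfigs": [
                        {"VolumeSpecification": {"VolumeType": "gp2", "SizeInGB": 100},
                         "VolumesPerInstance": 1}
                    ]
                },
            },
            {   # cores/workers
                "Name": "Core",
                "InstanceRole": "CORE",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 2,
            },
        ],
        "Ec2SubnetId": "subnet-0123456789abcdef0",  # "Choose networking/VPC"
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    Tags=[{"Key": "for-use-with-amazon-emr-managed-policies", "Value": "true"}],
    # BootstrapActions=[...] and Configurations=[...] come from the next two steps.
)
print(response["JobFlowId"])
```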
+**Important**
+- Specify the Bootstrap Action
+
+[jsl_emr_bootstrap.sh](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/platforms/emr/jsl_emr_bootstrap.sh)
+
+Put this sample shell script in an S3 location and specify that location in the form:
+The bootstrap action installs spark-nlp, spark-nlp-jsl, and spark-ocr; this file is executed during cluster provisioning. The library versions and other credentials provided by John Snow Labs go in this file.
+
+![add bootstrap action](image-5.png)
+
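The bootstrap script has to live in S3 before the cluster is created. Below is a minimal boto3 upload sketch, assuming an illustrative bucket name and key.

```python
import boto3

s3 = boto3.client("s3")

# Upload the John Snow Labs bootstrap script to your own bucket (names are placeholders).
s3.upload_file(
    Filename="jsl_emr_bootstrap.sh",
    Bucket="my-bucket",
    Key="bootstrap/jsl_emr_bootstrap.sh",
)

# The resulting s3://my-bucket/bootstrap/jsl_emr_bootstrap.sh path is what you enter in the
# Bootstrap Actions form, or pass programmatically as
# BootstrapActions=[{"Name": "jsl", "ScriptBootstrapAction":
#                    {"Path": "s3://my-bucket/bootstrap/jsl_emr_bootstrap.sh"}}]
# in the run_job_flow sketch above.
```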
+**Important**
+- Specify the Configuration for Spark:
+Here is a sample configuration; you can copy/paste it into the Software settings tab or load it from S3.
+You can change the Spark configuration according to your needs.
+
+```
+[
+  {
+    "Classification": "spark-env",
+    "Configurations": [
+      {
+        "Classification": "export",
+        "Properties": {
+          "JSL_EMR": "1",
+          "PYSPARK_PYTHON": "/usr/bin/python3",
+          "SPARK_NLP_LICENSE": "XYXYXYXYXY"
+        }
+      }
+    ],
+    "Properties": {}
+  },
+  {
+    "Classification": "yarn-env",
+    "Configurations": [
+      {
+        "Classification": "export",
+        "Properties": {
+          "JSL_EMR": "1",
+          "SPARK_NLP_LICENSE": "XYXYXYXYXY"
+        }
+      }
+    ],
+    "Properties": {}
+  },
+  {
+    "Classification": "spark-defaults",
     "Properties": {
-      "spark.yarn.stagingDir": "hdfs:///tmp",
-      "spark.yarn.preserve.staging.files": "true",
+      "spark.driver.maxResultSize": "0",
+      "spark.driver.memory": "64G",
+      "spark.dynamicAllocation.enabled": "true",
+      "spark.executor.memory": "64G",
+      "spark.executorEnv.SPARK_NLP_LICENSE": "XYXYXYXYXY",
+      "spark.jsl.settings.aws.credentials.access_key_id": "XYXYXYXYXY",
+      "spark.jsl.settings.aws.credentials.secret_access_key": "XYXYXYXYXY",
+      "spark.jsl.settings.aws.region": "us-east-1",
+      "spark.jsl.settings.pretrained.credentials.access_key_id": "XYXYXYXYXY",
+      "spark.jsl.settings.pretrained.credentials.secret_access_key": "XYXYXYXYXY",
       "spark.kryoserializer.buffer.max": "2000M",
+      "spark.rpc.message.maxSize": "1024",
       "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
-      "spark.driver.maxResultSize": "0",
-      "spark.driver.memory": "32G"
+      "spark.yarn.appMasterEnv.SPARK_NLP_LICENSE": "XYXYXYXYXY",
+      "spark.yarn.preserve.staging.files": "true",
+      "spark.yarn.stagingDir": "hdfs:///tmp"
     }
-}]
-```
-Make sure that you replace all the secret information(marked here as XYXYXYXYXY) by the appropriate values that you received with your license.<br/>
-3. In "Step 2" choose the hardware and networking configuration you prefer, or just pick the defaults. Move to next step by clocking the "Next" blue button.<br/>
-4. Now you are in "Step 3", in which you assign a name to your cluster, and you can change the location of the cluster logs. If the location of the logs is OK for you, take note of the path so you can debug potential problems by using the logs.<br/>
-5. Still on "Step 3", go to the bottom of the page, and expand the "Bootstrap Actions" tab. We're gonna add an action to execute during bootstrap of the cluster. Select "Custom Action", then press on "Configure and add".<br/>
-You need to provide a path to a script on S3. The path needs to be public. Keep this in mind, no secret information can be contained there.<br/>
-The script we'll used for this setup is [emr_bootstrap.sh](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/platforms/emr/emr_bootstrap.sh) .
-<br/>
-This script will install Spark-NLP 3.1.0, and Spark-NLP Healthcare 3.1.1. You'll have to edit the script if you need different versions.<br/>
-After you entered the route to S3 in which you place the `emr_bootstrap.sh` file, and before clicking "add" in the dialog box, you must pass an additional parameter containing the SECRET value you received with your license. Just paste the secret on the "Optional arguments" field in that dialog box.<br/>
-6. There's not much additional setup you need to perform. So just start a notebook server, connect it to the cluster you just created(be patient, it takes a while), and test with the [NLP_EMR_Setup.ipynb](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/platforms/emr/NLP_EMR_Setup.ipynb) test notebook.<br/>
+  }
+]
+```
+
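As the earlier version of this page noted, every `XYXYXYXYXY` placeholder must be replaced with the values you received with your license. The sketch below is an optional, illustrative check, assuming the configuration above is saved locally as `emr_configurations.json` (a hypothetical file name) and that you created the cluster programmatically: it scans for leftover placeholders and then waits for the cluster to come up before you attach a notebook.

```python
import json

import boto3

# Load the Spark configuration shown above (the file name is an assumption).
with open("emr_configurations.json") as f:
    configurations = json.load(f)

# Fail early if any XYXYXYXYXY placeholder is still in place of real license values.
if "XYXYXYXYXY" in json.dumps(configurations):
    raise ValueError("Replace the XYXYXYXYXY placeholders with your John Snow Labs credentials.")

# `configurations` is what you would pass as Configurations=... in the run_job_flow sketch above.
# Then wait until the cluster is ready before attaching a notebook (it takes a while).
emr = boto3.client("emr", region_name="us-east-1")
cluster_id = "j-XXXXXXXXXXXXX"  # placeholder: the JobFlowId returned by run_job_flow
emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
```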
+- There is not much additional setup to perform. Start a notebook server, connect it to the cluster you just created (be patient, it takes a while), and test with the [jsl_test_notebook_for_emr.ipynb](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/platforms/emr/NLP_EMR_Setup.ipynb) test notebook.<br/>
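Once a notebook is attached to the running cluster, a quick sanity check along the following lines (assuming the bootstrap action completed and the notebook provides the `spark` session) confirms that the libraries installed by the bootstrap script are importable; the test notebook linked above covers a fuller workflow.

```python
# Run inside a notebook attached to the EMR cluster.
import sparknlp
import sparknlp_jsl

print("Spark NLP version:", sparknlp.version())
print("Spark NLP for Healthcare version:", sparknlp_jsl.version())
print("Spark version:", spark.version)  # `spark` is the session the EMR notebook provides
```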
 
 </div><div class="h3-box" markdown="1">
 