Taking your .NET for Apache Spark Application to Production
===
This how-to provides general instructions on how to take your .NET for Apache Spark application to production, that is, how to package it and submit it to run on a Spark cluster rather than on your development machine.

This documentation summarizes the most common scenarios you will encounter when running a .NET for Apache Spark application. You will also learn how to package your application and how to submit it with [spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html) and [Apache Livy](https://livy.incubator.apache.org/).

# Table of Contents
- [How to deploy your application when you have a single dependency](#how-to-deploy-your-application-when-you-have-a-single-dependency)
  - [Scenarios](#scenarios---single-dependency)
  - [Package your application](#package-your-application---single-dependency)
  - [Launch your application](#launch-your-application---single-dependency)
- [How to deploy your application when you have multiple dependencies](#how-to-deploy-your-application-when-you-have-multiple-dependencies)
  - [Scenarios](#scenarios---multiple-dependencies)
  - [Package your application](#package-your-application---multiple-dependencies)
  - [Launch your application](#launch-your-application---multiple-dependencies)
## How to deploy your application when you have a single dependency
### Scenarios - single dependency
A *single dependency* here means that your entire application, including any business logic (UDFs), compiles into a single assembly (e.g. mySparkApp.dll), so that assembly is the only file you need to ship to your Spark cluster.
#### Scenario 1. SparkSession code and business logic in the same Program.cs file
This is the simplest use case: the `SparkSession` code and the business logic (UDFs) are in the same Program.cs file and in the same project (e.g. mySparkApp.csproj).
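For illustration, here is a minimal sketch of what such a Program.cs might look like (the input path `people.json`, the `name` column, and the greeting UDF are made-up placeholders):

```csharp
// Program.cs (mySparkApp.csproj): SparkSession code and UDF in a single file,
// so the build produces a single dependency, mySparkApp.dll.
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create (or reuse) the session that DotnetRunner connects to.
            SparkSession spark = SparkSession
                .Builder()
                .AppName("mySparkApp")
                .GetOrCreate();

            DataFrame df = spark.Read().Json("people.json");

            // Business logic (UDF) defined inline, next to the session code.
            Func<Column, Column> greet = Udf<string, string>(name => $"Hello, {name}!");
            df.Select(greet(df["name"])).Show();

            spark.Stop();
        }
    }
}
```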
#### Scenario 2. SparkSession code and business logic in the same project, but different .cs files
This is the use case where the `SparkSession` code and the business logic (UDFs) are in different .cs files but in the same project (e.g. `SparkSession` code in Program.cs, business logic in BusinessLogic.cs, and both in mySparkApp.csproj).
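Since both files still compile into the one mySparkApp.dll, this remains a single dependency. A minimal sketch of the split (the class and method names are illustrative):

```csharp
// BusinessLogic.cs (same mySparkApp.csproj as Program.cs): the UDF body moves here.
namespace MySparkApp
{
    public static class BusinessLogic
    {
        public static string Greet(string name) => $"Hello, {name}!";
    }
}

// In Program.cs, the inline lambda from Scenario 1 becomes a method-group reference:
//   Func<Column, Column> greet = Udf<string, string>(BusinessLogic.Greet);
```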
### Package your application - single dependency
Follow the [Get Started](https://github.com/dotnet/spark/#get-started) guide to build your application for Scenario 1 and Scenario 2.
### Launch your application - single dependency
#### 1. Using spark-submit
Below is an example of running your app with `spark-submit` in Scenario 1 and Scenario 2.
```shell
%SPARK_HOME%\bin\spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master local \
--files bin\Debug\netcoreapp3.0\mySparkApp.dll \
bin\Debug\<dotnet version>\microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar \
dotnet bin\Debug\netcoreapp3.0\mySparkApp.dll <app arg 1> <app arg 2> ... <app arg n>
```
#### 2. Using Apache Livy
Below is an example of running your app with Apache Livy in Scenario 1 and Scenario 2.
```json
{
    "file": "adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "files": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.dll"],
    "args": ["dotnet", "adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.dll", "<app arg 1>", "<app arg 2>", "...", "<app arg n>"]
}
```
## How to deploy your application when you have multiple dependencies
### Scenarios - multiple dependencies
*Multiple dependencies* here means that, in addition to the main application assembly, your application depends on other assemblies (separate business-logic projects, NuGet packages, prebuilt DLLs) that must also be shipped to your Spark cluster.
#### Scenario 3. SparkSession code in one project that references another project containing the business logic
This is the use case where the `SparkSession` code is in one project (e.g. mySparkApp.csproj) and the business logic (UDFs) is in another project (e.g. businessLogic.csproj).
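A minimal sketch of the layout (the `BusinessLogicProject` namespace and the `dotnet add reference` invocation are illustrative):

```csharp
// BusinessLogic.cs in businessLogic.csproj: the UDF body lives in its own project.
namespace BusinessLogicProject
{
    public static class BusinessLogic
    {
        public static string Greet(string name) => $"Hello, {name}!";
    }
}

// mySparkApp.csproj references it, e.g.:
//   dotnet add mySparkApp.csproj reference ..\businessLogic\businessLogic.csproj
// and Program.cs uses it as in Scenario 2:
//   using BusinessLogicProject;
//   Func<Column, Column> greet = Udf<string, string>(BusinessLogic.Greet);
// The build now produces two assemblies, so businessLogic.dll has to travel with
// your app at launch time (see the --files entries in the launch examples below).
```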
#### Scenario 4. SparkSession code references a function from a NuGet package that has been installed in the csproj
This is the use case where the `SparkSession` code references a function from a NuGet package installed in the same project (e.g. mySparkApp.csproj).
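For illustration, here is a sketch using Newtonsoft.Json as the NuGet package (the package choice is arbitrary; any package function called from a UDF follows the same pattern):

```csharp
// Program.cs in mySparkApp.csproj, after e.g. `dotnet add package Newtonsoft.Json`.
using System;
using Microsoft.Spark.Sql;
using Newtonsoft.Json;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession.Builder().AppName("mySparkApp").GetOrCreate();
            DataFrame df = spark.Read().Json("people.json");

            // The UDF calls into the NuGet package, so the package assembly
            // (Newtonsoft.Json.dll here, nugetLibrary.dll in the launch examples)
            // must reach the workers as well.
            Func<Column, Column> toJson = Udf<string, string>(
                name => JsonConvert.SerializeObject(new { greeting = name }));
            df.Select(toJson(df["name"])).Show();

            spark.Stop();
        }
    }
}
```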
#### Scenario 5. SparkSession code references a function from a DLL on the user's machine
This is the use case where the `SparkSession` code references business logic (UDFs) from a prebuilt DLL on the user's machine (e.g. `SparkSession` code in mySparkApp.csproj referencing an existing businessLogic.dll that is not built as part of the solution).
#### Scenario 6. SparkSession code references functions and business logic from multiple projects/solutions that themselves depend on multiple NuGet packages
This is a more complex use case where the `SparkSession` code references business logic (UDFs) and functions from NuGet packages across multiple projects and/or solutions.
### Package your application - multiple dependencies
- Follow the [Get Started](https://github.com/dotnet/spark/#get-started) guide to build mySparkApp.csproj for Scenario 4 and Scenario 5 (and businessLogic.csproj as well for Scenario 3).
- See the detailed steps [here](https://github.com/dotnet/spark/tree/master/deployment#preparing-your-spark-net-app) on how to build, publish and zip your application for Scenario 6. After packaging your .NET for Apache Spark application, you will have a zip file (e.g. mySparkApp.zip) that contains all the dependencies.
### Launch your application - multiple dependencies
#### 1. Using spark-submit
- Below is an example of running your app with `spark-submit` in Scenario 3 and Scenario 5. For Scenario 4, additionally pass the NuGet package's assembly with `--files bin\Debug\netcoreapp3.0\nugetLibrary.dll`.
```shell
%SPARK_HOME%\bin\spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master local \
--files bin\Debug\netcoreapp3.0\businessLogic.dll \
bin\Debug\<dotnet version>\microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar \
dotnet bin\Debug\netcoreapp3.0\mySparkApp.dll <app arg 1> <app arg 2> ... <app arg n>
```
- Below is an example of running your app with `spark-submit` in Scenario 6.
```shell
# DOTNET_ASSEMBLY_SEARCH_PATHS tells the .NET workers where to probe for your
# UDF and library assemblies; both paths are combined in a single --conf.
spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=./udfs,./myLibraries.zip \
--archives hdfs://<path to your files>/businessLogics.zip#udfs,hdfs://<path to your files>/myLibraries.zip \
hdfs://<path to jar file>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar \
hdfs://<path to your files>/mySparkApp.zip mySparkApp <app arg 1> <app arg 2> ... <app arg n>
```
#### 2. Using Apache Livy
- Below is an example of running your app with Apache Livy in Scenario 3 and Scenario 5. For Scenario 4, additionally include the NuGet package's assembly with `"files": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/nugetLibrary.dll"]`.
```json
{
    "file": "adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "files": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/businessLogic.dll"],
    "args": ["dotnet", "adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.dll", "<app arg 1>", "<app arg 2>", "...", "<app arg n>"]
}
```
- Below is an example of running your app with Apache Livy in Scenario 6.
```json
{
    "file": "adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "conf": {"spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS": "./udfs,./myLibraries.zip"},
    "archives": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/businessLogics.zip#udfs", "adl://<cluster name>.azuredatalakestore.net/<some dir>/myLibraries.zip"],
    "args": ["adl://<cluster name>.azuredatalakestore.net/<some dir>/mySparkApp.zip", "mySparkApp", "<app arg 1>", "<app arg 2>", "...", "<app arg n>"]
}
```