diff --git a/docs/take-to-prod.md b/docs/take-to-prod.md
new file mode 100644
index 000000000..631e6647c
--- /dev/null
+++ b/docs/take-to-prod.md
@@ -0,0 +1,87 @@
Taking your .NET for Apache Spark Application to Production
===

This how-to provides general instructions on how to take your .NET for Apache Spark application to production.
It summarizes the most common scenarios you will encounter when running a .NET for Apache Spark application,
and shows how to package your application and submit it with [spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html) and [Apache Livy](https://livy.incubator.apache.org/).

# Table of Contents
- [How to deploy your application when you have a single dependency](#how-to-deploy-your-application-when-you-have-a-single-dependency)
  - [Scenarios](#scenarios)
  - [Package your application](#package-your-application)
  - [Launch your application](#launch-your-application)
- [How to deploy your application when you have multiple dependencies](#how-to-deploy-your-application-when-you-have-multiple-dependencies)
  - [Scenarios](#scenarios-1)
  - [Package your application](#package-your-application-1)
  - [Launch your application](#launch-your-application-1)

## How to deploy your application when you have a single dependency
### Scenarios
#### 1. SparkSession code and business logic in the same Program.cs file
The `SparkSession` code and business logic (UDFs) are contained in the same `Program.cs` file (see the sketch after these scenarios).
#### 2. SparkSession code and business logic in the same project, but in different .cs files
The `SparkSession` code and business logic (UDFs) are in different `.cs` files, both contained in the same project (e.g. the `SparkSession` code in Program.cs, the business logic in BusinessLogic.cs, and both in mySparkApp.csproj).
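For scenario 1, everything lives in a single file. Below is a minimal sketch of what such a `Program.cs` could look like; the app name, input path, and column name are illustrative only, not something this guide prescribes:

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // SparkSession code.
            SparkSession spark = SparkSession
                .Builder()
                .AppName("mySparkApp")
                .GetOrCreate();

            // Illustrative input: a JSON file path passed as the first app argument.
            DataFrame df = spark.Read().Json(args[0]);

            // Business logic (UDF) defined next to the SparkSession code.
            Func<Column, Column> toUpper = Udf<string, string>(s => s?.ToUpper());

            df.Select(toUpper(df["name"])).Show();

            spark.Stop();
        }
    }
}
```

In scenario 2, the `toUpper` logic would simply move into a separate BusinessLogic.cs file inside the same mySparkApp.csproj; packaging and submission stay the same.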
### Package your application
Please follow [Get Started](https://github.com/dotnet/spark/#get-started) to build your application.

### Launch your application
#### 1. Using spark-submit
Please make sure you have the [pre-requisites](https://github.com/dotnet/spark/blob/master/docs/getting-started/windows-instructions.md#pre-requisites) in place before running the following command.
```powershell
%SPARK_HOME%\bin\spark-submit `
--class org.apache.spark.deploy.dotnet.DotnetRunner `
--master yarn `
--deploy-mode cluster `
--files <some-path>\mySparkApp.dll `
<some-path>\microsoft-spark-<version>.jar `
dotnet <some-path>\mySparkApp.dll ...
```
#### 2. Using Apache Livy
```json
{
    "file": "adl://<cluster-name>.azuredatalakestore.net/<some-path>/microsoft-spark-<version>.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "files": ["adl://<cluster-name>.azuredatalakestore.net/<some-path>/mySparkApp.dll"],
    "args": ["dotnet", "adl://<cluster-name>.azuredatalakestore.net/<some-path>/mySparkApp.dll", "<app arg 1>", "<app arg 2>", "...", "<app arg n>"]
}
```

## How to deploy your application when you have multiple dependencies
### Scenarios
#### 1. SparkSession code in one project that references another project containing the business logic
The `SparkSession` code is in one project (e.g. mySparkApp.csproj) and the business logic (UDFs) in another (e.g. businessLogic.csproj).
#### 2. SparkSession code references a function from a NuGet package installed in the csproj
The `SparkSession` code references a function from a NuGet package.
#### 3. SparkSession code references a function from a DLL on the user's machine
The `SparkSession` code references business logic (UDFs) compiled into a DLL on the user's machine (e.g. the `SparkSession` code in mySparkApp.csproj and businessLogic.dll built on a different machine).
#### 4. SparkSession code references functions and business logic from multiple projects/solutions that themselves depend on multiple NuGet packages
This is the more complex case in which the `SparkSession` code references business logic (UDFs) and functions from NuGet packages across multiple projects and/or solutions.

### Package your application
Please see the detailed steps [here](https://github.com/dotnet/spark/tree/master/deployment#preparing-your-spark-net-app) on how to build, publish and zip your application. After packaging your .NET for Apache Spark application, you will have a single zip file (e.g. mySparkApp.zip) that contains all of its dependencies.

### Launch your application
#### 1. Using spark-submit
`DOTNET_ASSEMBLY_SEARCH_PATHS` takes a comma-separated list of paths, so both the extracted `udfs` folder and `myLibraries.zip` are passed in a single `--conf`:
```shell
spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=./udfs,./myLibraries.zip \
--archives hdfs://<some-path>/businessLogics.zip#udfs,hdfs://<some-path>/myLibraries.zip \
hdfs://<some-path>/microsoft-spark-<version>.jar \
hdfs://<some-path>/mySparkApp.zip mySparkApp ...
```
#### 2. Using Apache Livy
```json
{
    "file": "adl://<cluster-name>.azuredatalakestore.net/<some-path>/microsoft-spark-<version>.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "conf": {"spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS": "./udfs,./myLibraries.zip"},
    "archives": ["adl://<cluster-name>.azuredatalakestore.net/<some-path>/businessLogics.zip#udfs", "adl://<cluster-name>.azuredatalakestore.net/<some-path>/myLibraries.zip"],
    "args": ["adl://<cluster-name>.azuredatalakestore.net/<some-path>/mySparkApp.zip", "mySparkApp", "<app arg 1>", "<app arg 2>", "...", "<app arg n>"]
}
```
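Livy is a plain REST service: you submit the JSON payload above as an HTTP POST to the cluster's `/batches` endpoint. Below is a minimal sketch in C#, assuming an HDInsight-style Livy endpoint with basic authentication; the host name, credentials, and `submit.json` file name are hypothetical placeholders:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class LivySubmit
{
    static async Task Main()
    {
        // Read the batch payload shown above from a local file (hypothetical name).
        string payload = File.ReadAllText("submit.json");

        using var client = new HttpClient();

        // Basic auth as used by HDInsight; substitute whatever scheme your cluster uses.
        byte[] credentials = Encoding.ASCII.GetBytes("<user>:<password>");
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Basic", Convert.ToBase64String(credentials));

        // POST /batches creates the Livy batch session that runs the application.
        HttpResponseMessage response = await client.PostAsync(
            "https://<your-cluster>.azurehdinsight.net/livy/batches",
            new StringContent(payload, Encoding.UTF8, "application/json"));

        // Livy replies with the new batch, e.g. {"id":0,"state":"starting",...}.
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```

The `id` in the response can then be polled with `GET /batches/<id>/state` to see whether your application is still starting, running, or has finished.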