From 398810cb92075be913062f64fe5bbdbaf4bb21e0 Mon Sep 17 00:00:00 2001
From: elvaliuliuliu <47404285+elvaliuliuliu@users.noreply.github.com>
Date: Fri, 22 Nov 2019 17:49:29 -0800
Subject: [PATCH 1/2] init

---
 docs/take-to-prod.md | 85 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100644 docs/take-to-prod.md

diff --git a/docs/take-to-prod.md b/docs/take-to-prod.md
new file mode 100644
index 000000000..02336841d
--- /dev/null
+++ b/docs/take-to-prod.md
@@ -0,0 +1,85 @@
+Taking your .NET for Apache Spark Application to Production
+===
+
+# Table of Contents
+This how-to provides general instructions on how to take your .NET for Apache Spark application to production.
+In this documentation, we will summarize the most commonly asked scenarios when running a .NET for Apache Spark Application.
+You will also learn how to package your application and submit your application with [spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html) and [Apache Livy](https://livy.incubator.apache.org/).
+- [How to deploy your application when you have a single dependency](#how-to-deploy-your-application-when-you-have-a-single-dependency)
+  - [Scenarios](#scenarios)
+  - [Package your application](#package-your-application)
+  - [Launch your application](#launch-your-application)
+- [How to deploy your application when you have multiple dependencies](#how-to-deploy-your-application-when-you-have-multiple-dependencies)
+  - [Scenarios](#scenarios-1)
+  - [Package your application](#package-your-application-1)
+  - [Launch your application](#launch-your-application-1)
+
+## How to deploy your application when you have a single dependency
+### Scenarios
+#### 1. SparkSession code and business logic in the same Program.cs file
+This would be the simple use case when you have `SparkSession` code and business logic (UDFs) in the same Program.cs file and in the same project (e.g. mySparkApp.csproj).
+#### 2. SparkSession code and business logic in the same project, but different .cs files
+This would be the use case when you have `SparkSession` code and business logic (UDFs) in different .cs files but in the same project (e.g. SparkSession in Program.cs, business logic in BusinessLogic.cs and both are in mySparkApp.csproj).
+
+### Package your application
+Please follow [Get Started](https://github.com/dotnet/spark/#get-started) to build your application.
+
+### Launch your application
+#### 1. Using spark-submit
+Please make sure you have [pre-requisites](https://github.com/dotnet/spark/blob/master/docs/getting-started/windows-instructions.md#pre-requisites) to run the following command.
+```powershell
+%SPARK_HOME%\bin\spark-submit \
+--class org.apache.spark.deploy.dotnet.DotnetRunner \
+--master yarn \
+--deploy-mode cluster \
+--files \\mySparkApp.dll \
+\\microsoft-spark--.jar \
+dotnet \\mySparkApp.dll ...
+```
+#### 2. Using Apache Livy
+```shell
+{
+    "file": "adl://.azuredatalakestore.net//microsoft-spark--.jar",
+    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
+    "files": ["adl://.azuredatalakestore.net//mySparkApp.dll" ],
+    "args": ["dotnet","adl://.azuredatalakestore.net//mySparkApp.dll","",","...",""]
+}
+```
+
+## How to deploy your application when you have multiple dependencies
+### Scenarios
+#### 1. SparkSession code in one project that references another project including the business logic
+This would be the use case when you have `SparkSession` code in one project (e.g. mySparkApp.csproj) and business logic (UDFs) in another project (e.g. businessLogic.csproj).
+#### 2. SparkSession code references a function from a NuGet package that has been installed in the csproj
+This would be the use case when `SparkSession` code references a function from a NuGet package in the same project (e.g. mySparkApp.csproj).
+#### 3. SparkSession code references a function from a DLL on the user's machine
+This would be the use case when `SparkSession` code references business logic (UDFs) on the user's machine (e.g. `SparkSession` code in the mySparkApp.csproj and businessLogic.dll on a different machine).
+#### 4. SparkSession code references functions and business logic from multiple projects/solutions that themselves depend on multiple NuGet packages
+This would be a more complex use case when you have `SparkSession` code reference business logic (UDFs) and functions from NuGet packages in multiple projects and/or solutions.
+
+### Package your application
+Please see detailed steps [here](https://github.com/dotnet/spark/tree/master/deployment#preparing-your-spark-net-app) on how to build, publish and zip your application. After packaging your .NET for Apache Spark application, you will have a zip file (e.g. mySparkApp.zip) which has all the dependencies.
+
+### Launch your application
+#### 1. Using spark-submit
+```shell
+spark-submit \
+--class org.apache.spark.deploy.dotnet.DotnetRunner \
+--master yarn \
+--deploy-mode cluster \
+--conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=./udfs \
+--conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=./myLibraries.zip \
+--archives hdfs:///businessLogics.zip#udfs,hdfs:///myLibraries.zip \
+hdfs:///microsoft-spark--.jar \
+hdfs:///mySparkApp.zip mySparkApp ...
+```
+#### 2. Using Apache Livy
+```shell
+{
+    "file": "adl://.azuredatalakestore.net//microsoft-spark--.jar",
+    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
+    "conf": {"spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS": "./udfs, ./myLibraries.zip"},
+    "archives": ["adl://.azuredatalakestore.net//businessLogics.zip#udfs", "adl://.azuredatalakestore.net//myLibraries.zip"],
+    "args": ["adl://.azuredatalakestore.net//mySparkApp.zip","mySparkApp","",","...",""]
+}
+```

From 2e254e88eb64596153eb8066484074a466545077 Mon Sep 17 00:00:00 2001
From: elvaliuliuliu <47404285+elvaliuliuliu@users.noreply.github.com>
Date: Mon, 9 Dec 2019 17:21:06 -0800
Subject: [PATCH 2/2] resolve comments

---
 docs/take-to-prod.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/take-to-prod.md b/docs/take-to-prod.md
index 02336841d..631e6647c 100644
--- a/docs/take-to-prod.md
+++ b/docs/take-to-prod.md
@@ -17,9 +17,9 @@ You will also learn how to package your application and submit your application
 ## How to deploy your application when you have a single dependency
 ### Scenarios
 #### 1. SparkSession code and business logic in the same Program.cs file
-This would be the simple use case when you have `SparkSession` code and business logic (UDFs) in the same Program.cs file and in the same project (e.g. mySparkApp.csproj).
+The `SparkSession` code and business logic (UDFs) are contained in the same `Program.cs` file.
 #### 2. SparkSession code and business logic in the same project, but different .cs files
-This would be the use case when you have `SparkSession` code and business logic (UDFs) in different .cs files but in the same project (e.g. SparkSession in Program.cs, business logic in BusinessLogic.cs and both are in mySparkApp.csproj).
+The `SparkSession` code and business logic (UDFs) are in different `.cs` files but contained in the same project (e.g. SparkSession in Program.cs, business logic in BusinessLogic.cs and both are in mySparkApp.csproj).
 
 ### Package your application
 Please follow [Get Started](https://github.com/dotnet/spark/#get-started) to build your application.
@@ -35,6 +35,8 @@ Please make sure you have [pre-requisites](https://github.com/dotnet/spark/blob/
 --files \\mySparkApp.dll \
 \\microsoft-spark--.jar \
 dotnet \\mySparkApp.dll ...
+or
+
 ```
 #### 2. Using Apache Livy
 ```shell
@@ -49,11 +51,11 @@ dotnet \\mySparkApp.dll ..
 ## How to deploy your application when you have multiple dependencies
 ### Scenarios
 #### 1. SparkSession code in one project that references another project including the business logic
-This would be the use case when you have `SparkSession` code in one project (e.g. mySparkApp.csproj) and business logic (UDFs) in another project (e.g. businessLogic.csproj).
+The `SparkSession` code is in one project (e.g. mySparkApp.csproj) and the business logic (UDFs) is in another project (e.g. businessLogic.csproj).
 #### 2. SparkSession code references a function from a NuGet package that has been installed in the csproj
-This would be the use case when `SparkSession` code references a function from a NuGet package in the same project (e.g. mySparkApp.csproj).
+The `SparkSession` code references a function from a NuGet package.
 #### 3. SparkSession code references a function from a DLL on the user's machine
-This would be the use case when `SparkSession` code references business logic (UDFs) on the user's machine (e.g. `SparkSession` code in the mySparkApp.csproj and businessLogic.dll on a different machine).
+The `SparkSession` code references business logic (UDFs) on the user's machine (e.g. `SparkSession` code in the mySparkApp.csproj and businessLogic.dll on a different machine).
 #### 4. SparkSession code references functions and business logic from multiple projects/solutions that themselves depend on multiple NuGet packages
 This would be a more complex use case when you have `SparkSession` code reference business logic (UDFs) and functions from NuGet packages in multiple projects and/or solutions.