From b8982fc417fb31a3b26a20a18a54a4b5728b7230 Mon Sep 17 00:00:00 2001
From: elvaliuliuliu <47404285+elvaliuliuliu@users.noreply.github.com>
Date: Wed, 5 Feb 2020 01:41:04 -0800
Subject: [PATCH 1/3] Init

---
 docs/user-defined-functions-c#.md | 43 +++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 docs/user-defined-functions-c#.md

diff --git a/docs/user-defined-functions-c#.md b/docs/user-defined-functions-c#.md
new file mode 100644
index 000000000..e5e89fab6
--- /dev/null
+++ b/docs/user-defined-functions-c#.md
@@ -0,0 +1,43 @@
+# User-Defined Functions Guide
+This documentation contains user-defined Function (UDF) examples. It shows how to define UDFs and how to use UDFs with Row objects as examples.
+
+## Pre-requisites:
+Install Microsoft.Spark.Worker. When you want to execute a C# UDF, Spark needs to understand how to launch the .NET CLR to execute this UDF. Microsoft.Spark.Worker provides a collection of classes to Spark that enable this functionality. Please see more details at [how to install Microsoft.Spark.Worker](https://docs.microsoft.com/en-us/dotnet/spark/tutorials/get-started#5-install-net-for-apache-spark) and [how to deploy worker and UDF binaries](https://docs.microsoft.com/en-us/dotnet/spark/how-to-guides/deploy-worker-udf-binaries).
+
+## UDF that takes in Row objects
+
+```csharp
+// Create DataFrame which will also be used in the following examples.
+DataFrame df = spark.Range(0, 5).WithColumn("structId", Struct("id"));
+
+// Define UDF that takes in Row objects
+Func<Column, Column> udf1 = Udf<Row, int>(
+    row => row.GetAs<int>(0) + 100);
+
+// Use UDF with DataFrames
+df.Select(udf1(df["structId"])).Show();
+```
+
+## UDF that returns Row objects
+Please note that the UDF must return `GenericRow` objects, and the schema of the returned row must be passed as a `StructType`.
+
+```csharp
+// Define UDF that returns Row objects
+var schema = new StructType(new[]
+{
+    new StructField("col1", new IntegerType()),
+    new StructField("col2", new StringType())
+});
+Func<Column, Column> udf2 = Udf<int>(
+    id => new GenericRow(new object[] { 1, "abc" }), schema);
+
+// Use UDF with DataFrames
+df.Select(udf2(df["id"])).Show();
+```
+
+## Chained UDF with Row objects
+
+```csharp
+// Chained UDF with udf1 and udf2 defined above.
+df.Select(udf1(udf2(df["id"]))).Show();
+```

From c4bc40e6bccb2d04be8a111092a057a5146b6983 Mon Sep 17 00:00:00 2001
From: elvaliuliuliu <47404285+elvaliuliuliu@users.noreply.github.com>
Date: Wed, 5 Feb 2020 01:44:32 -0800
Subject: [PATCH 2/3] clean up

---
 docs/user-defined-functions-c#.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/user-defined-functions-c#.md b/docs/user-defined-functions-c#.md
index e5e89fab6..598abadc0 100644
--- a/docs/user-defined-functions-c#.md
+++ b/docs/user-defined-functions-c#.md
@@ -1,5 +1,5 @@
-# User-Defined Functions Guide
-This documentation contains user-defined Function (UDF) examples. It shows how to define UDFs and how to use UDFs with Row objects as examples.
+# User-Defined Functions - C#
+This documentation contains user-defined function (UDF) examples. It shows how to define UDFs and how to use UDFs with Row objects as examples.
 
 ## Pre-requisites:
 Install Microsoft.Spark.Worker. When you want to execute a C# UDF, Spark needs to understand how to launch the .NET CLR to execute this UDF. Microsoft.Spark.Worker provides a collection of classes to Spark that enable this functionality. Please see more details at [how to install Microsoft.Spark.Worker](https://docs.microsoft.com/en-us/dotnet/spark/tutorials/get-started#5-install-net-for-apache-spark) and [how to deploy worker and UDF binaries](https://docs.microsoft.com/en-us/dotnet/spark/how-to-guides/deploy-worker-udf-binaries).
@@ -38,6 +38,6 @@ df.Select(udf2(df["id"])).Show();
 ## Chained UDF with Row objects
 
 ```csharp
-// Chained UDF with udf1 and udf2 defined above.
+// Chained UDF using udf1 and udf2 defined above.
 df.Select(udf1(udf2(df["id"]))).Show();
 ```

From 4d5ece7eb27d538a484a9c0bc13b4e5ec1e2cd7b Mon Sep 17 00:00:00 2001
From: elvaliuliuliu <47404285+elvaliuliuliu@users.noreply.github.com>
Date: Wed, 5 Feb 2020 14:21:54 -0800
Subject: [PATCH 3/3] resolve comments

Co-Authored-By: Brigit Murtaugh
---
 docs/user-defined-functions-c#.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/user-defined-functions-c#.md b/docs/user-defined-functions-c#.md
index 598abadc0..dec37e2af 100644
--- a/docs/user-defined-functions-c#.md
+++ b/docs/user-defined-functions-c#.md
@@ -2,7 +2,9 @@
 This documentation contains user-defined function (UDF) examples. It shows how to define UDFs and how to use UDFs with Row objects as examples.
 
 ## Pre-requisites:
-Install Microsoft.Spark.Worker. When you want to execute a C# UDF, Spark needs to understand how to launch the .NET CLR to execute this UDF. Microsoft.Spark.Worker provides a collection of classes to Spark that enable this functionality. Please see more details at [how to install Microsoft.Spark.Worker](https://docs.microsoft.com/en-us/dotnet/spark/tutorials/get-started#5-install-net-for-apache-spark) and [how to deploy worker and UDF binaries](https://docs.microsoft.com/en-us/dotnet/spark/how-to-guides/deploy-worker-udf-binaries).
+When you want to execute a C# UDF, Spark needs to understand how to launch the .NET CLR to execute this UDF. Microsoft.Spark.Worker provides a collection of classes to Spark that enable this functionality. Thus, you need to [install the Microsoft.Spark.Worker](https://docs.microsoft.com/en-us/dotnet/spark/tutorials/get-started#5-install-net-for-apache-spark).
+
+Additionally, [you may need to configure certain environment variables and parameters](https://docs.microsoft.com/en-us/dotnet/spark/how-to-guides/deploy-worker-udf-binaries) to deploy worker and UDF binaries when submitting your Spark app.
 
 ## UDF that takes in Row objects
 