Replies: 1 comment 1 reply
- @msgurikar I'd highly recommend using full UNC paths (\\server\shareddata...). I'd also recommend downloading Process Monitor (Sysinternals) and watching the file I/O. It shows both successful file operations and failures, and you can use filters to see only the I/O for certain processes and exclude others.
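For example, the environment variables from the submission step below could point at the UNC path of the share that K:\ maps to instead of the mapped drive letter. This is a minimal sketch only; \\server\shareddata is a placeholder for the actual share.
:: Illustrative only: \\server\shareddata is a placeholder for whatever share
:: K:\ is mapped to. Spark worker processes may run in a session that does not
:: have the per-user drive mapping, so full UNC paths avoid that pitfall.
set DOTNET_WORKER_DIR=\\server\shareddata\Microsoft.Spark.Worker-1.0.0
set DOTNET_ASSEMBLY_SEARCH_PATHS=\\server\shareddata\app_binaries\Debug\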
- We have set up a development Windows cluster to run our .NET for Apache Spark jobs across different Windows machines.
On Windows Machine 1, we started the master and a worker:
spark-class2.cmd org.apache.spark.deploy.master.Master --host xx.xx.100.2
spark-class2.cmd org.apache.spark.deploy.worker.Worker spark://xx.xx.100.2:7077
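As a quick sanity check (assuming Spark's default standalone web UI port), the master UI should list the worker once it has registered:
:: Assumes the default master web UI port 8080; the registered worker should
:: appear under "Workers" on this page.
start http://xx.xx.100.2:8080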
From Windows Machine 2, we submitted the job:
set SPARK_HOME=%~dp0\spark-3.0.0-bin-hadoop2.7
set DOTNET_WORKER_DEBUG=1
set DOTNET_WORKER_DIR=K:\Microsoft.Spark.Worker-1.0.0
set DOTNET_ASSEMBLY_SEARCH_PATHS=K:/app_binaries/Debug/
set PATH=%SPARK_HOME%\bin;%DOTNET_WORKER_DIR%;%PATH%
Submit-Job.cmd --class org.apache.spark.deploy.dotnet.DotnetRunner --master spark://xx.xx.100.2:7077 --conf spark.driver.host=xx.xx.100.4 --files ./Debug.zip .\microsoft-spark-3-0_2.12-1.0.0.jar, .\Debug.zip .\app.exe
The K:\ network drive is accessible from both machines.
I can see the driver code executing and the UDF being called. Our UDF depends on other DLLs, and the point where the UDF calls into one of those DLLs throws an error; the worker stderr shows "Unable to load this .dll or its dependencies."
My question is: how do we pass the UDF's dependency DLLs to the workers that execute the UDF remotely?
I can see microsoft-spark-3-0_2.12-1.0.0.jar being copied to the spark-3.0.0-bin-hadoop2.7\work\app-20210409135645-0016\0 folder during the run, but where will .NET for Spark look for the UDF's dependency DLLs?
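For reference, a minimal sketch of how the dependency DLLs could be bundled into the Debug.zip that is already passed via --files; the K:\app_binaries\Debug path is taken from DOTNET_ASSEMBLY_SEARCH_PATHS above, and whether the workers probe the extracted zip for these DLLs is exactly what we are unsure about.
:: Sketch only: package app.exe together with all of its dependency DLLs from
:: the Debug output folder into the Debug.zip that spark-submit ships to the
:: workers via --files. Paths are assumptions based on the settings above.
dir K:\app_binaries\Debug\*.dll
powershell -Command "Compress-Archive -Path 'K:\app_binaries\Debug\*' -DestinationPath '.\Debug.zip' -Force"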
Thank you.