I did some research on the Python side of things. Python developers typically get a very verbose error message (a traceback) describing the Python exception that occurred while executing their UDF; see this example: https://stackoverflow.com/questions/59739846/pyspark-implement-helper-in-rdd-map. The parsing of those exceptions appears to happen in org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException, but our .NET UDFs don't seem to have anything that corresponds to this handler.

If the exception details won't be relayed back to the driver, nor appear in the driver logs, then development and debugging become challenging. But I am wondering what the impact will be in our production environment as well. We don't have log delivery enabled, and we rely only on the default files that are available in the Databricks workspace (driver log, standard output). It seems to me that failures in our UDFs won't produce any meaningful output in those default files, and that will require us to enable and manage log delivery in production. Is that what others are doing?

Please let me know if there are any other techniques I might use - both in development and in production - to capture errors from our UDFs.
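One workaround I've been considering (just a sketch, not a confirmed pattern) is to stop relying on the exception propagating at all, and instead catch it inside the UDF and return the flattened details as data. That way the error text lands in the result DataFrame itself and can be inspected at the driver, without digging through worker stderr. This assumes the Microsoft.Spark package is referenced; DoRealWork, the column names, and the "OK:"/"ERR:" prefixes are all placeholders I made up:

```csharp
// Sketch: wrap the UDF body so any exception is returned as data instead of thrown.
// The error text then appears in the result DataFrame and can be filtered/shown at
// the driver, rather than being buried in a worker's stderr log.
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class UdfErrorCaptureExample
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        DataFrame df = spark.Sql("SELECT id, CAST(id AS STRING) AS payload FROM range(10)");

        // The UDF returns either "OK:<value>" or "ERR:<exception details>".
        Func<Column, Column> safeUdf = Udf<string, string>(payload =>
        {
            try
            {
                return "OK:" + DoRealWork(payload); // placeholder for the real UDF logic
            }
            catch (Exception ex)
            {
                // ToString() includes the exception type, message, and stack trace.
                return "ERR:" + ex.ToString();
            }
        });

        DataFrame withResult = df.WithColumn("result", safeUdf(df["payload"]));

        // Surface any failures directly at the driver.
        withResult.Filter(withResult["result"].StartsWith("ERR:")).Show(20, 1000, false);
    }

    static string DoRealWork(string payload) => payload.ToUpperInvariant();
}
```

The obvious downside is that the "result" column has to carry both the success value and the error text, so it's more of a development/debugging aid than something I'd want baked into production schemas.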
I may be missing something obvious. Is there any way to get exception information to bubble out of a UDF to the driver?
The following is the most common message I get when failures are encountered in my UDFs.
... unfortunately this message is meaningless; to find the real underlying problem you have to open the browser, locate the worker, and inspect its stderr logs. That is a lot of effort when you need to do it all day long while writing UDF logic. It would be far better if there were some automatic way for the underlying exception to bubble out to the driver.
I set spark.task.maxFailures = 1, so typically there is only one error from each parallel worker.
Today the issue happens to be with my Microsoft.Data.SqlClient libraries. I found this in my stderr logs.
... but there is a wide assortment of exceptions that can come out of a UDF. Every time I bump into a new one I get the same incoherent message and have to start clicking around in the Spark console in the browser to investigate. After losing five minutes, I learn which exception was thrown and can get back to fixing the root cause and continuing with my work.
It would be nice to have a pattern we could follow to send exception details back to the driver, so that they can appear in the Visual Studio IDE. Am I missing something obvious, like perhaps needing to catch exceptions in the UDF and rethrow them in some way that is compatible/serializable?
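For example, something along these lines is what I have in mind. It's only a sketch: it catches inside the UDF and rethrows a plain System.Exception whose Message contains the full ex.ToString(), on the theory that even if the original exception type (e.g. a SqlException) can't be relayed, the flattened text might survive in whatever failure reason the worker reports. I don't know whether that text actually reaches the driver, and the wrapper/udfName bits are made-up names:

```csharp
// Sketch of a catch-and-rethrow wrapper: flatten the original exception (type,
// message, stack trace) into the Message of a plain System.Exception before
// rethrowing. Whether this text actually surfaces in the driver's error message
// depends on how the .NET worker relays task failures; treat it as an experiment.
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

static class UdfRethrowExample
{
    public static Func<Column, Column> Wrap(Func<string, string> body, string udfName)
    {
        return Udf<string, string>(input =>
        {
            try
            {
                return body(input);
            }
            catch (Exception ex)
            {
                // A plain Exception with a string message avoids any provider-specific
                // state (e.g. from Microsoft.Data.SqlClient) that may not serialize.
                throw new Exception($"UDF '{udfName}' failed on input '{input}': {ex}");
            }
        });
    }
}
```

If the flattened message still doesn't surface at the driver, the fallback would be returning the error text as data from the UDF so it at least becomes queryable.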
Below is the full stack from the worker, I believe. This is a JVM stack and, interestingly, there is no mention of .NET for Apache Spark. I'm assuming this stack indicates that the .NET side has already failed, and the Java side is trying to deserialize/reserialize something back to the driver. Is that so? Do Python developers experience the same unfriendly message when their UDFs fail? Or maybe their exceptions/errors are always serializable, unlike with .NET?