Python notebook times out #75

Closed
mobsy74 opened this issue May 24, 2018 · 3 comments

mobsy74 commented May 24, 2018

We use Python notebooks heavily in our workflows. However, the notebook often loses its connection and we are forced to restart the kernel, which discards all of the notebook's cached data.
This appears to happen right after the rabbitmq container logs a 60-second heartbeat timeout (log excerpt below):

rabbitmq_1           |
rabbitmq_1           | =ERROR REPORT==== 24-May-2018::16:57:24 ===
rabbitmq_1           | closing AMQP connection <0.1117.0> (10.255.3.11:34656 -> 10.255.3.9:5672):
rabbitmq_1           | missed heartbeats from client, timeout: 60s

Once this happens, any subsequent call to the Python kernel gets an error from the notebooks container indicating that the connection was closed (traceback below):

notebooks_1          | Exception in thread Thread-13:
notebooks_1          | Traceback (most recent call last):
notebooks_1          |   File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
notebooks_1          |     self.run()
notebooks_1          |   File "/usr/lib/python2.7/threading.py", line 763, in run
notebooks_1          |     self.__target(*self.__args, **self.__kwargs)
notebooks_1          |   File "/usr/local/share/jupyter/kernels/pyspark/socket_forwarder.py", line 58, in to_rabbit_forwarder
notebooks_1          |     self.to_rabbit_sender(message)
notebooks_1          |   File "/usr/local/share/jupyter/kernels/pyspark/forwarding_kernel.py", line 145, in sender
notebooks_1          |     self._send_zmq_forward_to_rabbit(stream_name, message)
notebooks_1          |   File "/usr/local/share/jupyter/kernels/pyspark/forwarding_kernel.py", line 125, in _send_zmq_forward_to_rabbit
notebooks_1          |     'body': [base64.b64encode(s) for s in message]
notebooks_1          |   File "/usr/local/share/jupyter/kernels/pyspark/rabbit_mq_client.py", line 110, in send
notebooks_1          |     message=json_message)
notebooks_1          |   File "/usr/local/share/jupyter/kernels/pyspark/rabbit_mq_client.py", line 43, in send
notebooks_1          |     body=message)
notebooks_1          |   File "/usr/local/lib/python2.7/dist-packages/pika/adapters/blocking_connection.py", line 2077, in basic_publish
notebooks_1          |     mandatory, immediate)
notebooks_1          |   File "/usr/local/lib/python2.7/dist-packages/pika/adapters/blocking_connection.py", line 2164, in publish
notebooks_1          |     self._flush_output()
notebooks_1          |   File "/usr/local/lib/python2.7/dist-packages/pika/adapters/blocking_connection.py", line 1250, in _flush_output
notebooks_1          |     *waiters)
notebooks_1          |   File "/usr/local/lib/python2.7/dist-packages/pika/adapters/blocking_connection.py", line 474, in _flush_output
notebooks_1          |     result.reason_text)
notebooks_1          | ConnectionClosed: (-1, "error(104, 'Connection reset by peer')")

I can reproduce this by creating a brand-new workflow, adding a Python notebook node, opening it, and then just waiting until the timeout error occurs (it takes about three minutes). At some point the notebooks container apparently stops sending heartbeats.
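
For context, pika's BlockingConnection only exchanges heartbeat frames while something is servicing the connection's I/O loop; if the thread that owns the connection sits idle or blocked, no heartbeats go out and the broker resets the socket, which matches the two logs above. A minimal sketch of the keep-alive pattern that avoids this (the hostname and the 10-second service interval are placeholders, not Seahorse's actual code):

import pika

# Placeholder connection parameters for illustration.
params = pika.ConnectionParameters(host='rabbitmq', port=5672)
connection = pika.BlockingConnection(params)

# The thread that owns a BlockingConnection must service it regularly;
# otherwise heartbeat frames are never sent or answered and the broker
# drops the connection after the heartbeat timeout (60s by default).
while connection.is_open:
    # process_data_events() pumps the I/O loop, which is what actually
    # dispatches heartbeats; time_limit bounds how long each call blocks.
    connection.process_data_events(time_limit=10)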

Initially I was running our fork of the code, but I confirmed that the same thing happens after checking out the master branch of deepsense-ai/seahorse and building from source.

mobsy74 commented May 25, 2018

Adding on to this: we have seen this issue on both macOS and Linux systems.

jaroslaw-osmanski commented

Turning heartbeats off, as in pull request #76, should fix that.

We're still searching for a long-term solution, but that will take more than a couple of weeks.
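
For anyone applying the workaround by hand before the fix lands, disabling heartbeats comes down to setting the heartbeat interval to 0 in the client's connection parameters. A sketch of what that looks like on the pika side (not necessarily the exact diff in #76; on older pika releases the keyword is heartbeat_interval rather than heartbeat):

import pika

# heartbeat=0 requests that heartbeats be disabled for this connection,
# so an idle or blocked client is never timed out by the broker.
# (On pika releases before 0.12 the keyword is heartbeat_interval.)
params = pika.ConnectionParameters(
    host='rabbitmq',  # placeholder hostname for illustration
    heartbeat=0,
)
connection = pika.BlockingConnection(params)

The trade-off is that with heartbeats off, a dead peer is only noticed by TCP keepalives or a failed publish, so half-open connections can linger.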

mobsy74 commented Jun 2, 2018

This workaround seems to have fixed the timeouts. Thanks.
