
Raw vs Jupyter Kernels

Rich Chiodo edited this page Jul 25, 2022 · 9 revisions

What is the difference between raw mode and jupyter mode for kernels? What reasons did we have to make this difference?

'Jupyter' Kernels

The Jupyter project provides a server that manages kernels and lets clients connect to them. Something like so:

image

In the Jupyter extension, we call these Jupyter kernels.

Essentially, you create a connection settings object that describes the URI for the server and pass it into the JupyterLab services API. The API then allows us to:

  • List existing kernels
  • List existing sessions
  • Start new kernels
  • Execute code on those kernels
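To make the list above concrete, here is a minimal sketch of the HTTP requests behind the first three operations, written against the Jupyter Server REST endpoints (`/api/kernels`, `/api/sessions`). This is not the JupyterLab services API the extension uses; the `build_request` helper, the base URL, and the token are assumptions for illustration, and the requests are only built, not sent. Executing code is not shown because it goes over a websocket (`/api/kernels/<kernel id>/channels`) rather than plain HTTP.

```python
import json
import urllib.request


def build_request(base_url, token, path, method="GET", body=None):
    """Build (but do not send) a request for a Jupyter server REST endpoint.

    `build_request` is a hypothetical helper for this sketch, not part of any
    Jupyter library.
    """
    url = base_url.rstrip("/") + path
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    # Token authentication header accepted by the Jupyter server.
    request.add_header("Authorization", f"token {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Assumed connection settings; a real server prints its own URL and token.
BASE_URL = "http://localhost:8888"
TOKEN = "example-token"

list_kernels = build_request(BASE_URL, TOKEN, "/api/kernels")
list_sessions = build_request(BASE_URL, TOKEN, "/api/sessions")
start_kernel = build_request(
    BASE_URL, TOKEN, "/api/kernels", method="POST", body={"name": "python3"}
)
```

Passing any of these to `urllib.request.urlopen` would perform the call against a running server.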

'Raw' Kernels

What then are raw kernels? This is the same diagram but with raw kernels instead:

image

Using raw kernels means talking directly to the kernel process instead of going through the Jupyter server.

How does raw work differently than Jupyter?

Given this kernelspec:

{
 "argv": ["python3", "-m", "ipykernel_launcher",
          "-f", "{connection_file}"],
 "display_name": "Python 3",
 "language": "python"
}

A raw kernel will 'launch' the kernel directly. This means:

  • Find open ports on the local machine (these are written to the file passed as the {connection_file} argument above)
  • Compute the necessary environment for the kernel process
  • Start the correct Python process with the args in the argv setting
  • Connect using the zeromq package to the ports opened
  • Use a patched version of the JupyterLab services API to connect to those ports. It is patched because the kernel expects serialized messages and the zeromq library doesn't serialize them on its own, so we have to do the serialization ourselves.
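The preparation steps above can be sketched in Python. This is only an illustration under stated assumptions, not the extension's implementation (which is TypeScript): the helper names are hypothetical, the header passed to `serialize` is abbreviated, and the sketch stops short of starting the kernel or opening zeromq sockets. The connection file keys and the `<IDS|MSG>` delimiter plus HMAC-SHA256 signature follow the Jupyter messaging documentation.

```python
import hashlib
import hmac
import json
import os
import socket
import tempfile
import uuid

PORT_NAMES = ["shell_port", "iopub_port", "stdin_port", "control_port", "hb_port"]
KERNELSPEC_ARGV = ["python3", "-m", "ipykernel_launcher", "-f", "{connection_file}"]


def find_free_ports(count):
    """Ask the OS for `count` distinct free TCP ports on the loopback interface."""
    sockets = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
            sockets.append(s)
        return [s.getsockname()[1] for s in sockets]
    finally:
        # The ports are released here, so another process could grab one
        # before the kernel binds it; a real launcher has to tolerate that.
        for s in sockets:
            s.close()


def write_connection_file(directory):
    """Write a kernel connection file and return (path, contents)."""
    info = dict(zip(PORT_NAMES, find_free_ports(len(PORT_NAMES))))
    info.update({
        "ip": "127.0.0.1",
        "transport": "tcp",
        "signature_scheme": "hmac-sha256",
        "key": uuid.uuid4().hex,
    })
    path = os.path.join(directory, f"kernel-{uuid.uuid4().hex}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(info, f)
    return path, info


def expand_argv(argv, connection_file):
    """Substitute the {connection_file} placeholder from the kernelspec."""
    return [arg.replace("{connection_file}", connection_file) for arg in argv]


def sign(key, parts):
    """HMAC-SHA256 hex digest over the serialized message parts."""
    digest = hmac.new(key.encode("utf-8"), digestmod=hashlib.sha256)
    for part in parts:
        digest.update(part)
    return digest.hexdigest()


def serialize(key, header, parent_header, metadata, content):
    """Return the frames sent on the wire: delimiter, signature, four JSON parts."""
    parts = [json.dumps(p).encode("utf-8")
             for p in (header, parent_header, metadata, content)]
    return [b"<IDS|MSG>", sign(key, parts).encode("ascii")] + parts


workdir = tempfile.mkdtemp()
connection_file, connection_info = write_connection_file(workdir)
argv = expand_argv(KERNELSPEC_ARGV, connection_file)
# Abbreviated header for illustration; real headers carry more fields.
frames = serialize(connection_info["key"],
                   {"msg_id": uuid.uuid4().hex, "msg_type": "kernel_info_request"},
                   {}, {}, {})
```

From here, `argv` would be handed to a process launcher and the frames sent as a multipart message on a zeromq socket connected to the shell port.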

Why use raw?

Why did we bother with this? Why not just use the JupyterLab services API and connect to a Jupyter server?

There are a number of reasons:

  1. Direct connection is faster. On Windows, we found starting a Python process takes up to 5 seconds. If we had to start Jupyter first, that doubles the time.
  2. Fewer things to install. Only IPython and ipykernel are necessary to get a Python kernel up and running. This is a lot less than installing Jupyter into a Python environment.
  3. Non-Python kernels. Some kernels don't need Python at all. We could support these kernels without having to install anything. There was a lot of customer feedback from other kernel owners about having to install Jupyter to get our extension to work.
  4. Raw kernels fail faster. Jupyter has its own logic to detect when a kernel goes down. By default it retries 5 times. This takes a lot longer than just having the process die as soon as we start it.
  5. Raw kernels die with more information. Jupyter (especially remote) does not return errors all the time when kernels fail to start. In the raw kernel case we have a lot more information in the stderr from the kernel starting up.
  6. Raw kernels don't require adding custom kernelspecs for Jupyter to find. Originally this was a problem because we thought we had to put them into the same directory as other kernelspecs, but now we can get Jupyter to find them elsewhere. But with raw we never need a kernelspec on disk.
  7. Raw kernels use less CPU and less memory. There's no middleman process in the way. That middleman can also start other processes on the side to handle other things it doesn't need to do in our scenario. With raw, there is only the kernel process.
  8. Raw kernels don't need write permissions for folders (kernelspecs and default notebook).
  9. Raw kernels don't write notebook checkpoint files (vscode-jupyter #6510).
  10. Raw kernels can use private ports. Jupyter opens public ports on the machine.
  11. Environment variables are easier to set up for raw kernels. Since the Jupyter server starts with an environment, all kernels share this environment unless the kernelspec has an override. This means that for a Jupyter kernel, the kernelspec has to be updated just before running the kernel. In the raw case, we just compute the environment in memory and pass it to the raw kernel.
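Points 5 and 11 can be illustrated with a small Python sketch. It assumes hypothetical helpers (`build_kernel_env`, `launch`) and uses a short Python one-liner as a stand-in for a kernel that fails at startup; the real extension is TypeScript and its logic is more involved. The idea is only that the environment is computed in memory and handed to the child process, and that the child's exit code and stderr are directly available when it dies.

```python
import os
import subprocess
import sys


def build_kernel_env(overrides):
    """Compute the kernel's environment in memory; nothing is written to disk."""
    env = dict(os.environ)  # start from a copy of our own environment
    env.update(overrides)   # per-kernel values win over inherited ones
    return env


def launch(argv, env):
    """Start the process directly, keeping its stdout and stderr for diagnostics."""
    return subprocess.Popen(
        argv, env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )


# Stand-in for a kernel: prints a variable from its environment, then fails.
child_code = (
    "import os, sys; "
    "print(os.environ['EXAMPLE_KERNEL_VAR']); "
    "sys.stderr.write('startup failed'); "
    "sys.exit(3)"
)
process = launch(
    [sys.executable, "-c", child_code],
    build_kernel_env({"EXAMPLE_KERNEL_VAR": "raw"}),
)
stdout, stderr = process.communicate(timeout=60)

if process.returncode != 0:
    # With a direct launch, the failure reason is available immediately.
    print(f"kernel exited with code {process.returncode}: {stderr}")
```

Because the overrides live only in the `env` dictionary passed to the child, the launching process and any other kernels keep their own environments unchanged.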