Issue 88 kubectl debug no connect (#99)
* Trying to cherry-pick lxcfs binaries was resulting in a borked image because of missing pieces/versions. Just installing it instead, that way the prereqs will be there.
* - Use set -x so that if there is a failure during script execution you can see where it occurred.
  - Adjust the paths passed to cp to match the current location of the lxcfs binaries.
* Add a verbosity command line option and some client side logging.
* Add some server side logging.
* Neglected to add support for the Verbosity setting in the plugin's config file.
* Improve error reporting if an unsupported container runtime is in use. Before, the client would just hang and there was no error in the agent log.
* Prep work for supporting containerd: move the call to NewRuntimeManager from NewServer to ServeDebug. That is, move construction to the point where we know which container runtime is in use but before we have invoked the kubelet API.
* Pass the verbosity setting from the client (plugin) to the server (agent).
* Prep work for containerd support: add a containerid member to RuntimeManager.
* Prep work for adding containerd support: move validation of the container id into NewRuntimeManager.
* Prep work for adding containerd support:
  - Rename DebugAttacher to DebugAttacherDocker.
  - Put in a check to make sure DebugAttacherDocker implements kubeletremote.Attacher (a sketch of this compile-time check follows the list).
* The definitions of RuntimeManager and DebugAttacherDocker were interlaced. Reorganized the file so all DebugAttacherDocker pieces come before all RuntimeManager pieces.
* Prep work for adding containerd support:
  - Pass the agent config to NewRuntimeManager instead of various members of it. Doing this to avoid adding several more parameters for containerd options.
  - Move GetAttacher after NewRuntimeManager.
* Neglected to commit these before.
* - More refactoring in preparation for getting containerd support in.
  - RuntimeManager can successfully create a containerd client.
* More progress on containerd support. Image download is complete.
* - Change MountNSEnter to use an int64 for holding the target pid instead of an int. Did this because an int64 can hold an int, which is what the Docker runtime uses for pid values, as well as an int32, which is what the containerd runtime uses for pid values.
  - Fill in the containerd implementation of ContainerInfo.
* Add a --registry-skip-tls-verify option. Not sure why, but pulls from Docker Hub started failing at some point between today and last Friday. That is, it was clear that the client was unhappy with the signer of Docker Hub's cert, but I don't know exactly why that became a problem. (A sketch of one way to wire this up follows the list.)
* - Make use of the registry skip TLS verify option in the RunTime and DebugAttachers.
  - Containerd creates a container now (so progress from before) but creation of the task within the container fails with 'User namespaces enabled, but no uid mappings found'. Not sure yet where the problem lies.
* - Comment out the setting of namespaces for now. Somehow it isn't working quite right; the container creation fails.
  - Adjust the container and task clean up calls. For example, if NewContainer returns a non-nil error plus a non-nil container, we still need to make sure the container gets deleted.
  - If the stderr that kubelet passes us is nil, pass stdout to containerd's NewTask instead; NewTask will fail if you pass nil for a stream.
  - Wasn't calling task.Start, so no process was actually getting started in our container.
  - Make sure the command line the user passed to kubectl debug is actually used (it needs to go into the spec opts).
  - Fix TTY issue: not only do you need to pass WithTerminal to NewTask, you also need to put WithTTY into the spec opts. (The containerd sketch after this list pulls these pieces together.)
* - ContainerRuntime.ContainerInfo, at least in the case of the containerd implementation, caches the results from the first call. Fixes the issue below, but we may need to rethink the caching since a targetContainerId is being passed in.
  - When using containerd, we are at least able to put the debug container into the network, ipc and pid namespaces of the target container. There is still some issue when trying to set the user namespace. (A namespace sketch follows the list.)
* These should have been added in the last commit.
* Add a comment that explains why I am punting on setting the user namespace in the Kubernetes case.
* Changes so that we can deploy to a cluster where the containerd v1 runtime is in use.
  - In start.sh, the agent container entry point, check the parent process's command line to see if it is using the containerd v1 runtime and, if so, set the env var KCTLDBG_CONTAINERDV1_SHIM.
  - In the agent runtime, check for the above env var and set the client option to use the v1 runtime if it is set.
  - Shorten the name of the debug container. The v1 runtime creates a unix socket whose path includes the container name; the unix socket path has a length limit of 106 chars and we were running into that.
* Containerd mounts for the agent container.
* Agentless support for the containerd container runtime.
* Debug containers were orphaned when the kubectl<->agent network connection timed out.
* Remove targetContainerInfo from ContainerdContainerRuntime, i.e. remove some state. I feel that ideally the container runtimes should be as stateless as possible.
* Changes to make it so that there can be more than one debug container per debuggee container.
  - Change the debug container name from dbg-[id of debuggee] to a GUID.
  - So that we can still identify debug containers, label them with the id of the debuggee, the name of the user that created the debug container (taken from the system where kubectl was executed), and the hostname where the request came from. (A labelling sketch follows the list.)
* - In the agent, when containerd is the container runtime, make sure that each container we create gets a uniquely named snapshot. Need this in order to allow multiple debug containers to run on the same node.
  - In the kubectl plugin, if port forwarding fails, assume it is because the port is already forwarded. This is a quick hack to allow multiple debug containers to be launched from the same workstation, e.g. if you want to run tcpdump in one and a traffic generator in another.
* For the containerd container runtime, handle TTY resize (a resize sketch follows the list).
* - start.sh is trying to run lxcfs from the host fs. However, it assumes lxcfs is in /usr/local/bin when it may be in /usr/bin. Make a check and then use the correct path.
  - We run lxcfs on the host file system. However, if the host is already running lxcfs there was a collision with pid files. So now we give the instance we start up a unique pid file.
* In order to be able to clean up any orphaned debug containers we want to have ctr in the agent image. However, when installing containerd to pick up ctr the image ended up being 300 MB. Setting up the same image with xenial instead of alpine resulted in an image of 160 MB. So xenial it is.
* Back to basing the agent image on alpine. Found I can copy the ctr executable from a build image. Now the image size is down to 68 MB from 160 MB.
* - When constructing the path /var/lib/lxc/lxcfs, do so in the host mount namespace.
  - Fix a problem with the exec of lxcfs: we weren't using -- when running under nsenter, and consequently nsenter was getting confused by the params meant for lxcfs.
* Remove obsolete comment.
* Move the message that logs the fact that Kubernetes passed a nil resize to a common spot where we have a verbosity level available.
* Add several verbosity checks for logging.
* Have print/log messages respect the log level.
* I found that having /var/lib/lxc/lxcfs mounted as part of the container definition meant that if I deleted and then redeployed the daemonset, the redeploy would fail because /var/lib/lxc/lxcfs on the host would be in a bad state. An ls would show:
      ls: cannot access '/var/lib/lxc/lxcfs': Transport endpoint is not connected
      total 8
      d????????? ?  ?    ?       ?            ? lxcfs
      drwxr-xr-x 45 root root 4096 Apr 25 02:35 ..
      drwxr-xr-x  3 root root 4096 Apr 25 14:35 .
  However, changing the container def so that it mounts /var/lib/lxc instead seems to work without issue.
* Don't need to get ctr from the host anymore; it is installed in the agent image instead.
* - Use the ctr installed in the image instead of the one mounted from the container host.
  - Fix the agent redeploy problem caused by mounting /var/lib/lxc/lxcfs instead of /var/lib/lxc.
* ContainerdContainerRuntime closes its connection to containerd after it has finished running the debug container (avoids a resource leak).
* Add doc for the registrySkipTLSVerify and verbosity settings.
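The DebugAttacherDocker rename above mentions a check that the type still implements kubeletremote.Attacher. Below is a minimal sketch of that compile-time assertion idiom; the Attacher interface and its method signature here are stand-ins, not the actual kubelet interface, and the type is reduced to a placeholder.

```go
package agent // hypothetical package name

import (
	"context"
	"io"
)

// Attacher is a stand-in for the kubeletremote.Attacher interface mentioned
// in the commit message; the real interface lives in the kubelet
// remotecommand package and its signature may differ from this sketch.
type Attacher interface {
	AttachContainer(ctx context.Context, container string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool) error
}

// DebugAttacherDocker is a placeholder for the renamed Docker attacher type.
type DebugAttacherDocker struct{}

func (d *DebugAttacherDocker) AttachContainer(ctx context.Context, container string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool) error {
	// The real implementation attaches to the target Docker container.
	return nil
}

// The blank-identifier assignment is the compile-time check: the build fails
// if DebugAttacherDocker ever stops satisfying Attacher.
var _ Attacher = (*DebugAttacherDocker)(nil)
```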
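Several of the containerd items above (unique snapshot names, WithTTY in the spec plus cio.WithTerminal on the task IO, the nil-stderr fallback, calling task.Start, and cleaning up even when creation returns a container alongside an error) fit together in one flow. The sketch below shows how they might combine using the containerd Go client; names such as runDebugContainer and the "k8s.io" namespace choice are assumptions, not the repository's actual code.

```go
package agent

import (
	"context"
	"fmt"
	"io"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
	"github.com/google/uuid"
)

func runDebugContainer(ctx context.Context, client *containerd.Client, image containerd.Image,
	command []string, stdin io.Reader, stdout, stderr io.Writer) error {

	ctx = namespaces.WithNamespace(ctx, "k8s.io")

	// A GUID-style name keeps the container id (and the v1 shim's socket
	// path) short and lets several debug containers coexist on one node.
	debugContainerID := uuid.New().String()

	container, err := client.NewContainer(ctx, debugContainerID,
		// Each container needs its own snapshot name, otherwise a second
		// debug container on the same node fails to start.
		containerd.WithNewSnapshot(debugContainerID+"-snapshot", image),
		containerd.WithNewSpec(
			oci.WithImageConfig(image),
			oci.WithProcessArgs(command...), // the command the user passed to kubectl debug
			oci.WithTTY,                     // TTY must be requested in the spec ...
		),
	)
	if container != nil {
		// Delete the container even if NewContainer also returned an error.
		defer container.Delete(ctx, containerd.WithSnapshotCleanup)
	}
	if err != nil {
		return fmt.Errorf("create container: %w", err)
	}

	// containerd rejects nil streams, so fall back to stdout when kubelet
	// hands us a nil stderr.
	if stderr == nil {
		stderr = stdout
	}

	task, err := container.NewTask(ctx,
		cio.NewCreator(cio.WithStreams(stdin, stdout, stderr), cio.WithTerminal)) // ... and on the task IO
	if task != nil {
		defer task.Delete(ctx)
	}
	if err != nil {
		return fmt.Errorf("create task: %w", err)
	}

	statusC, err := task.Wait(ctx)
	if err != nil {
		return fmt.Errorf("wait on task: %w", err)
	}

	// Without Start, no process actually runs inside the container.
	if err := task.Start(ctx); err != nil {
		return fmt.Errorf("start task: %w", err)
	}

	status := <-statusC
	_, _, err = status.Result()
	return err
}
```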
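A sketch of how the debug container could be placed into the target container's network, IPC, and PID namespaces from the target's init pid, as described above. The helper name and pid plumbing are illustrative; the user namespace is deliberately left untouched, matching the note about punting on it in the Kubernetes case.

```go
package agent

import (
	"fmt"

	"github.com/containerd/containerd/oci"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// targetNamespaceOpts returns SpecOpts that place the debug container in the
// network, IPC and PID namespaces of the target container, identified by the
// pid of its init process (an int64, so it can hold both Docker's int and
// containerd's int32 pid values).
func targetNamespaceOpts(targetPID int64) []oci.SpecOpts {
	ns := func(t specs.LinuxNamespaceType, proc string) oci.SpecOpts {
		return oci.WithLinuxNamespace(specs.LinuxNamespace{
			Type: t,
			Path: fmt.Sprintf("/proc/%d/ns/%s", targetPID, proc),
		})
	}
	return []oci.SpecOpts{
		ns(specs.NetworkNamespace, "net"),
		ns(specs.IPCNamespace, "ipc"),
		ns(specs.PIDNamespace, "pid"),
	}
}
```

These options would be appended to the spec opts passed to containerd.WithNewSpec when creating the debug container.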
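One plausible shape for the --registry-skip-tls-verify option when pulling through containerd is to hand client.Pull a resolver whose HTTP transport skips certificate verification. This is a sketch under that assumption, not the repository's implementation; note that the ResolverOptions.Client field is deprecated in newer containerd releases in favour of RegistryHosts.

```go
package agent

import (
	"crypto/tls"
	"net/http"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/remotes/docker"
)

// pullOpts builds the remote options for client.Pull. When skipTLSVerify is
// true it swaps in a resolver whose HTTP client does not verify the
// registry's certificate chain.
func pullOpts(skipTLSVerify bool) []containerd.RemoteOpt {
	opts := []containerd.RemoteOpt{containerd.WithPullUnpack}
	if skipTLSVerify {
		resolver := docker.NewResolver(docker.ResolverOptions{
			Client: &http.Client{
				Transport: &http.Transport{
					TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
				},
			},
		})
		opts = append(opts, containerd.WithResolver(resolver))
	}
	return opts
}
```

Usage would look like client.Pull(ctx, ref, pullOpts(cfg.RegistrySkipTLSVerify)...), where cfg.RegistrySkipTLSVerify is a hypothetical name for the setting documented above.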
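Because the debug containers are now GUID-named, the labelling item above is what keeps them traceable back to their debuggee. A sketch with hypothetical label keys; the user and hostname values are assumed to arrive from the kubectl plugin alongside the debug request rather than being computed in the agent.

```go
package agent

import "github.com/containerd/containerd"

// debugContainerLabels returns the containerd option that attaches the
// identifying labels described in the commit message: the id of the debuggee
// container, the user who created the debug container, and the host the
// request came from. The label keys are illustrative.
func debugContainerLabels(debuggeeID, requestUser, requestHost string) containerd.NewContainerOpts {
	return containerd.WithContainerLabels(map[string]string{
		"kctldbg.debuggee": debuggeeID,
		"kctldbg.user":     requestUser,
		"kctldbg.hostname": requestHost,
	})
}
```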
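TTY resize for the containerd runtime can be handled by draining the resize channel kubelet provides and forwarding each event to task.Resize. A sketch under that assumption; the verbosity-gated logging mirrors the logging items above, and the nil-channel case corresponds to the "Kubernetes passed nil resize" message.

```go
package agent

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"k8s.io/client-go/tools/remotecommand"
)

// handleResize forwards terminal size events from kubelet to the containerd
// task. It returns when the channel is closed or the context is cancelled.
func handleResize(ctx context.Context, task containerd.Task, resize <-chan remotecommand.TerminalSize, verbose bool) {
	if resize == nil {
		if verbose {
			log.Println("kubelet passed a nil resize channel; skipping TTY resize handling")
		}
		return
	}
	for {
		select {
		case size, ok := <-resize:
			if !ok {
				return
			}
			if err := task.Resize(ctx, uint32(size.Width), uint32(size.Height)); err != nil && verbose {
				log.Printf("tty resize failed: %v", err)
			}
		case <-ctx.Done():
			return
		}
	}
}
```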