Issue 88 kubectl debug no connect (#99)
* Trying to cherry-pick lxcfs binaries was resulting in a borked image because of missing pieces/versions. Just installing it instead, that way the prereqs will be there.
* - Use set -x so that if there is a failure during script execution you can see where it occurred.
  - Adjust the paths passed to cp to match the current location of the lxcfs binaries.
* Add a verbosity command line option and some client side logging.
* Add some server side logging.
* Neglected to add support for the Verbosity setting in the plugin's config file.
* Improve error reporting if an unsupported container runtime is in use. Before, the client would just hang and there was no error in the agent log.
* Prep work for supporting containerd: move the call to NewRuntimeManager from NewServer to ServeDebug. That is, move construction to the point where we know which container runtime is in use but before we have invoked the kubelet API.
* Pass the verbosity setting from the client (plugin) to the server (agent).
* Prep work for containerd support: add a containerid member to RuntimeManager.
* Prep work for adding containerd support: move validation of the container id into NewRuntimeManager.
* Prep work for adding containerd support:
  - Rename DebugAttacher to DebugAttacherDocker.
  - Put in a check to make sure DebugAttacherDocker implements kubeletremote.Attacher (a sketch of this compile-time check follows the list).
* The definitions of RuntimeManager and DebugAttacherDocker were interlaced. Reorganized the file so all DebugAttacherDocker pieces come before all RuntimeManager pieces.
* Prep work for adding containerd support:
  - Pass the agent config to NewRuntimeManager instead of various members of it. Doing this to avoid adding several more parameters for containerd options.
  - Move GetAttacher after NewRuntimeManager.
* Neglected to commit these before.
* - More refactoring in preparation for getting containerd support in.
  - RuntimeManager can successfully create a containerd client.
* More progress on containerd support. Image download is complete.
* - Change MountNSEnter to use an int64 for holding the target pid instead of an int. Did this because an int64 can hold an int, which is what the Docker runtime uses for pid values, as well as an int32, which is what the containerd runtime uses for pid values.
  - Fill in the containerd implementation of ContainerInfo.
* Add a --registry-skip-tls-verify option. Not sure why, but pulls from Docker Hub started failing at some point between today and last Friday. That is, it was clear that the client was unhappy with the signer of Docker Hub's cert, but I don't know exactly why that became a problem. (A sketch of one way to wire this up follows the list.)
* - Make use of the registry skip TLS verify option in the RunTime and DebugAttachers.
  - Containerd creates a container now (so progress from before) but creation of the task within the container fails with 'User namespaces enabled, but no uid mappings found'. Not sure yet where the problem lies.
* - Comment out the setting of namespaces for now. Somehow it isn't working quite right; the container creation fails.
  - Adjust the container and task clean up calls. For example, if NewContainer returns a non-nil error plus a non-nil container, we still need to make sure the container gets deleted.
  - If the stderr that kubelet passes us is nil, pass stdout to containerd's NewTask instead; NewTask will fail if you pass nil for a stream.
  - Wasn't calling task.Start, so no process was actually getting started in our container.
  - Make sure the command line the user passed to kubectl debug is actually used (it needs to go into the spec opts).
  - Fix TTY issue: not only do you need to pass WithTerminal to NewTask, you also need to put WithTTY into the spec opts. (The containerd sketch after this list pulls these pieces together.)
* - ContainerRuntime.ContainerInfo, at least in the case of the containerd implementation, caches the results from the first call. Fixes the issue below, but we may need to rethink the caching since a targetContainerId is being passed in.
  - When using containerd, we are at least able to put the debug container into the network, ipc and pid namespaces of the target container. There is still some issue when trying to set the user namespace. (A namespace sketch follows the list.)
* These should have been added in the last commit.
* Add a comment that explains why I am punting on setting the user namespace in the Kubernetes case.
* Changes so that we can deploy to a cluster where the containerd v1 runtime is in use.
  - In start.sh, the agent container entry point, check the parent process's command line to see if it is using the containerd v1 runtime and, if so, set the env var KCTLDBG_CONTAINERDV1_SHIM.
  - In the agent runtime, check for the above env var and set the client option to use the v1 runtime if it is set.
  - Shorten the name of the debug container. The v1 runtime creates a unix socket whose path includes the container name; the unix socket path has a length limit of 106 chars and we were running into that.
* Containerd mounts for the agent container.
* Agentless support for the containerd container runtime.
* Debug containers were orphaned when the kubectl<->agent network connection timed out.
* Remove targetContainerInfo from ContainerdContainerRuntime, i.e. remove some state. I feel that ideally the container runtimes should be as stateless as possible.
* Changes to make it so that there can be more than one debug container per debuggee container.
  - Change the debug container name from dbg-[id of debuggee] to a GUID.
  - So that we can still identify debug containers, label them with the id of the debuggee, the name of the user that created the debug container (taken from the system where kubectl was executed), and the hostname where the request came from. (A labelling sketch follows the list.)
* - In the agent, when containerd is the container runtime, make sure that each container we create gets a uniquely named snapshot. Need this in order to allow multiple debug containers to run on the same node.
  - In the kubectl plugin, if port forwarding fails, assume it is because the port is already forwarded. This is a quick hack to allow multiple debug containers to be launched from the same workstation, e.g. if you want to run tcpdump in one and a traffic generator in another.
* For the containerd container runtime, handle TTY resize (a resize sketch follows the list).
* - start.sh is trying to run lxcfs from the host fs. However, it assumes lxcfs is in /usr/local/bin when it may be in /usr/bin. Make a check and then use the correct path.
  - We run lxcfs on the host file system. However, if the host is already running lxcfs there was a collision with pid files. So now we give the instance we start up a unique pid file.
* In order to be able to clean up any orphaned debug containers we want to have ctr in the agent image. However, when installing containerd to pick up ctr the image ended up being 300 MB. Setting up the same image with xenial instead of alpine resulted in an image of 160 MB. So xenial it is.
* Back to basing the agent image on alpine. Found I can copy the ctr executable from a build image. Now the image size is down to 68 MB from 160 MB.
* - When constructing the path /var/lib/lxc/lxcfs, do so in the host mount namespace.
  - Fix a problem with the exec of lxcfs: we weren't using -- when running under nsenter, and consequently nsenter was getting confused by the params meant for lxcfs.
* Remove obsolete comment.
* Move the message that logs the fact that Kubernetes passed a nil resize to a common spot where we have a verbosity level available.
* Add several verbosity checks for logging.
* Have print/log messages respect the log level.
* I found that having /var/lib/lxc/lxcfs mounted as part of the container definition meant that if I deleted and then redeployed the daemonset, the redeploy would fail because /var/lib/lxc/lxcfs on the host would be in a bad state. An ls would show:
      ls: cannot access '/var/lib/lxc/lxcfs': Transport endpoint is not connected
      total 8
      d????????? ?  ?    ?       ?            ? lxcfs
      drwxr-xr-x 45 root root 4096 Apr 25 02:35 ..
      drwxr-xr-x  3 root root 4096 Apr 25 14:35 .
  However, changing the container def so that it mounts /var/lib/lxc instead seems to work without issue.
* Don't need to get ctr from the host anymore; it is installed in the agent image instead.
* - Use the ctr installed in the image instead of the one mounted from the container host.
  - Fix the agent redeploy problem caused by mounting /var/lib/lxc/lxcfs instead of /var/lib/lxc.
* ContainerdContainerRuntime closes its connection to containerd after it has finished running the debug container (avoids a resource leak).
* Add doc for the registrySkipTLSVerify and verbosity settings.
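The DebugAttacherDocker rename above mentions a check that the type still implements kubeletremote.Attacher. Below is a minimal sketch of that compile-time assertion idiom; the Attacher interface and its method signature here are stand-ins, not the actual kubelet interface, and the type is reduced to a placeholder.

```go
package agent // hypothetical package name

import (
	"context"
	"io"
)

// Attacher is a stand-in for the kubeletremote.Attacher interface mentioned
// in the commit message; the real interface lives in the kubelet
// remotecommand package and its signature may differ from this sketch.
type Attacher interface {
	AttachContainer(ctx context.Context, container string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool) error
}

// DebugAttacherDocker is a placeholder for the renamed Docker attacher type.
type DebugAttacherDocker struct{}

func (d *DebugAttacherDocker) AttachContainer(ctx context.Context, container string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool) error {
	// The real implementation attaches to the target Docker container.
	return nil
}

// The blank-identifier assignment is the compile-time check: the build fails
// if DebugAttacherDocker ever stops satisfying Attacher.
var _ Attacher = (*DebugAttacherDocker)(nil)
```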
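Several of the containerd items above (unique snapshot names, WithTTY in the spec plus cio.WithTerminal on the task IO, the nil-stderr fallback, calling task.Start, and cleaning up even when creation returns a container alongside an error) fit together in one flow. The sketch below shows how they might combine using the containerd Go client; names such as runDebugContainer and the "k8s.io" namespace choice are assumptions, not the repository's actual code.

```go
package agent

import (
	"context"
	"fmt"
	"io"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
	"github.com/google/uuid"
)

func runDebugContainer(ctx context.Context, client *containerd.Client, image containerd.Image,
	command []string, stdin io.Reader, stdout, stderr io.Writer) error {

	ctx = namespaces.WithNamespace(ctx, "k8s.io")

	// A GUID-style name keeps the container id (and the v1 shim's socket
	// path) short and lets several debug containers coexist on one node.
	debugContainerID := uuid.New().String()

	container, err := client.NewContainer(ctx, debugContainerID,
		// Each container needs its own snapshot name, otherwise a second
		// debug container on the same node fails to start.
		containerd.WithNewSnapshot(debugContainerID+"-snapshot", image),
		containerd.WithNewSpec(
			oci.WithImageConfig(image),
			oci.WithProcessArgs(command...), // the command the user passed to kubectl debug
			oci.WithTTY,                     // TTY must be requested in the spec ...
		),
	)
	if container != nil {
		// Delete the container even if NewContainer also returned an error.
		defer container.Delete(ctx, containerd.WithSnapshotCleanup)
	}
	if err != nil {
		return fmt.Errorf("create container: %w", err)
	}

	// containerd rejects nil streams, so fall back to stdout when kubelet
	// hands us a nil stderr.
	if stderr == nil {
		stderr = stdout
	}

	task, err := container.NewTask(ctx,
		cio.NewCreator(cio.WithStreams(stdin, stdout, stderr), cio.WithTerminal)) // ... and on the task IO
	if task != nil {
		defer task.Delete(ctx)
	}
	if err != nil {
		return fmt.Errorf("create task: %w", err)
	}

	statusC, err := task.Wait(ctx)
	if err != nil {
		return fmt.Errorf("wait on task: %w", err)
	}

	// Without Start, no process actually runs inside the container.
	if err := task.Start(ctx); err != nil {
		return fmt.Errorf("start task: %w", err)
	}

	status := <-statusC
	_, _, err = status.Result()
	return err
}
```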
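A sketch of how the debug container could be placed into the target container's network, IPC, and PID namespaces from the target's init pid, as described above. The helper name and pid plumbing are illustrative; the user namespace is deliberately left untouched, matching the note about punting on it in the Kubernetes case.

```go
package agent

import (
	"fmt"

	"github.com/containerd/containerd/oci"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// targetNamespaceOpts returns SpecOpts that place the debug container in the
// network, IPC and PID namespaces of the target container, identified by the
// pid of its init process (an int64, so it can hold both Docker's int and
// containerd's int32 pid values).
func targetNamespaceOpts(targetPID int64) []oci.SpecOpts {
	ns := func(t specs.LinuxNamespaceType, proc string) oci.SpecOpts {
		return oci.WithLinuxNamespace(specs.LinuxNamespace{
			Type: t,
			Path: fmt.Sprintf("/proc/%d/ns/%s", targetPID, proc),
		})
	}
	return []oci.SpecOpts{
		ns(specs.NetworkNamespace, "net"),
		ns(specs.IPCNamespace, "ipc"),
		ns(specs.PIDNamespace, "pid"),
	}
}
```

These options would be appended to the spec opts passed to containerd.WithNewSpec when creating the debug container.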
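One plausible shape for the --registry-skip-tls-verify option when pulling through containerd is to hand client.Pull a resolver whose HTTP transport skips certificate verification. This is a sketch under that assumption, not the repository's implementation; note that the ResolverOptions.Client field is deprecated in newer containerd releases in favour of RegistryHosts.

```go
package agent

import (
	"crypto/tls"
	"net/http"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/remotes/docker"
)

// pullOpts builds the remote options for client.Pull. When skipTLSVerify is
// true it swaps in a resolver whose HTTP client does not verify the
// registry's certificate chain.
func pullOpts(skipTLSVerify bool) []containerd.RemoteOpt {
	opts := []containerd.RemoteOpt{containerd.WithPullUnpack}
	if skipTLSVerify {
		resolver := docker.NewResolver(docker.ResolverOptions{
			Client: &http.Client{
				Transport: &http.Transport{
					TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
				},
			},
		})
		opts = append(opts, containerd.WithResolver(resolver))
	}
	return opts
}
```

Usage would look like client.Pull(ctx, ref, pullOpts(cfg.RegistrySkipTLSVerify)...), where cfg.RegistrySkipTLSVerify is a hypothetical name for the setting documented above.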
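Because the debug containers are now GUID-named, the labelling item above is what keeps them traceable back to their debuggee. A sketch with hypothetical label keys; the user and hostname values are assumed to arrive from the kubectl plugin alongside the debug request rather than being computed in the agent.

```go
package agent

import "github.com/containerd/containerd"

// debugContainerLabels returns the containerd option that attaches the
// identifying labels described in the commit message: the id of the debuggee
// container, the user who created the debug container, and the host the
// request came from. The label keys are illustrative.
func debugContainerLabels(debuggeeID, requestUser, requestHost string) containerd.NewContainerOpts {
	return containerd.WithContainerLabels(map[string]string{
		"kctldbg.debuggee": debuggeeID,
		"kctldbg.user":     requestUser,
		"kctldbg.hostname": requestHost,
	})
}
```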
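TTY resize for the containerd runtime can be handled by draining the resize channel kubelet provides and forwarding each event to task.Resize. A sketch under that assumption; the verbosity-gated logging mirrors the logging items above, and the nil-channel case corresponds to the "Kubernetes passed nil resize" message.

```go
package agent

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"k8s.io/client-go/tools/remotecommand"
)

// handleResize forwards terminal size events from kubelet to the containerd
// task. It returns when the channel is closed or the context is cancelled.
func handleResize(ctx context.Context, task containerd.Task, resize <-chan remotecommand.TerminalSize, verbose bool) {
	if resize == nil {
		if verbose {
			log.Println("kubelet passed a nil resize channel; skipping TTY resize handling")
		}
		return
	}
	for {
		select {
		case size, ok := <-resize:
			if !ok {
				return
			}
			if err := task.Resize(ctx, uint32(size.Width), uint32(size.Height)); err != nil && verbose {
				log.Printf("tty resize failed: %v", err)
			}
		case <-ctx.Done():
			return
		}
	}
}
```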