fix: don't crash in cache refresh/update with nil fsnotify watcher. #254
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Don't crash in update() if we fail to create an fsnotify watch. This can happen if we have too many open files. In this case we now record a failure for all configured spec directories, and in update() we always trigger a refresh. If the process is ever able to create new file descriptors again, the cache becomes functional, but in an 'always implicitly fully refreshed' mode instead of auto-refreshed mode.

It's not entirely clear what the best option is for dealing with a failed watch creation. Being out of file descriptors typically results in a cascading chain of errors which the process does not usually survive. This fix aims for a minimal footprint: on failed watch creation it does not render the cache fully unusable.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
be9b6f3
to
73b86d0
Compare
// (but with autoRefresh left on). One known case when this can happen is
// if we have too many open files. In that case we always return true and
// force a refresh.
if w.watcher == nil {
Is it possible to check it earlier, e.g. when w.watcher is about to be assigned nil? Or do we want to keep it nil in the hope that it will be changed at some point?
@@ -812,6 +814,102 @@ devices:
}
}

func TestTooManyOpenFiles(t *testing.T) {
	if runtime.GOOS != "linux" {
Would it make sense to move this test to cache_test_linux.go? Would it work on other *nixes or Darwin?
func triggerEmfile() (*emfile, error) {
	fdsize, err := getProcStatusFdSize()
	if err != nil {
		return nil, err
	}

	em := &emfile{}

	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &em.limit); err != nil {
		return nil, err
	}

	limit := em.limit
	limit.Cur = fdsize

	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &limit); err != nil {
		return nil, err
	}

	for i := uint64(0); i < fdsize; i++ {
		fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_DGRAM, 0)
		if err != nil {
			return em, nil
		}
		em.fds = append(em.fds, fd)
	}

	return nil, fmt.Errorf("failed to trigger EMFILE")
}

func (em *emfile) undo() error {
	if em == nil {
		return nil
	}

	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &em.limit); err != nil {
		return err
	}
	for _, fd := range em.fds {
		syscall.Close(fd)
	}

	return nil
}
It would be good to have some description/comments for this code.
	require.NotNil(t, cache)

	// try to trigger original crash with a nil fsnotify.Watcher
	_, _ = cache.InjectDevices(&oci.Spec{}, "vendor1.com/device=dev1")
nit:
-	_, _ = cache.InjectDevices(&oci.Spec{}, "vendor1.com/device=dev1")
+	devs := []string{"vendor1.com/device=dev1"}
+	devices, err := cache.InjectDevices(&oci.Spec{}, devs...)
+	require.Error(t, err)
+	require.EqualValues(t, devs, devices)
Would it make sense to test other cache APIs as well?
Don't crash if fsnotify.Watcher creation fails, for instance due to being out of file descriptors. See moby/buildkit#5767 for an example.
Fixes #253.