
[Feature] Sync cache invalidation between Magpie/Twitcher #189

Open
fmigneault opened this issue Aug 11, 2021 · 4 comments
Labels: enhancement (New feature or request)

Comments


fmigneault commented Aug 11, 2021

Description

Because permissions are applied in Magpie but resolved for request access by Twitcher, any already-active cache of older requests on the Twitcher side will not be immediately synchronized after a permission update on the Magpie side (the caches are not shared, and are therefore not invalidated on permission update).

Subsequent requests to the same resources (within the cache expiration delay) will hit the cached response produced by the access resolution of the previous requests. The new permission resolution will not take effect until the cache expires.
For example:

```
GET /twitcher/ows/proxy/thredds/<file-ref>
    => denied, response is now cached
PUT /magpie/users|groups/{id}/resources/<file-ref-id>/permissions
    => <file-ref> made 'allowed' for previous request user
GET /twitcher/ows/proxy/thredds/<file-ref>  (cached)
    => denied instead of allowed
... wait ~20s (caching delay) ...
GET /twitcher/ows/proxy/thredds/<file-ref>
    => allowed
```

Note that the effect goes both ways, i.e. removing access to a resource will not actually block it until the expiration delay is reached.
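The staleness window above can be illustrated with a minimal pure-Python TTL cache (illustrative only; Twitcher actually uses beaker, and the class/names here are mine):

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry, standing in for the
    real beaker cache to demonstrate the staleness window."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: drop the entry and miss
            return None
        return value


# Simulate the scenario, with a short TTL standing in for the ~20s delay:
acl_cache = TTLCache(ttl_seconds=0.05)
acl_cache.set("thredds-file-acl", "denied")   # first GET: denied, cached
# ... permission updated in Magpie here; the cache is NOT invalidated ...
print(acl_cache.get("thredds-file-acl"))      # still "denied" (stale)
time.sleep(0.06)                              # wait past the TTL
print(acl_cache.get("thredds-file-acl"))      # None: miss, ACL re-resolved
```

The second lookup only re-resolves because the entry aged out, which is exactly the behavior the invalidation methods below try to avoid waiting for.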

Edit:

For the invalidation to take effect on the Twitcher side, there are 3 methods:

  1. Explicitly set the Cache-Control: no-cache header on the next file-access request to force a cache reset.
    This works, but should be done only on the first call after the permission update; otherwise, all caching performance advantages are lost over many repeated accesses to the same resource.
  2. Share the Magpie/Twitcher caches via file references so they can invalidate each other.
    • To do this, we need a volume mounted by both images, and both must use cache.type = file with corresponding paths for cache.lock_dir and cache.data_dir in their INI configs.
    • More updates to Magpie/Twitcher would be needed to correctly invalidate caches of type file (only memory is tested, and entries are not hashed the same way for selective invalidation, e.g. invalidating only the ACL for resource X / user Y, etc.).
  3. (best) Employ the redis or mongodb extension with beaker to synchronize caches.
    https://beaker.readthedocs.io/en/latest/modules/redis.html
    https://beaker.readthedocs.io/en/latest/modules/mongodb.html
    Not only would this allow syncing or invalidating caches across Magpie/Twitcher, but also across individual workers of Magpie and Twitcher. At the moment, each worker holds its own in-memory cache depending on which requests it received, meaning cached/non-cached responses won't be the same (and won't expire at the same time) depending on which worker processes the request and when it last received one.
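For method 3, a shared backend would be declared in both Magpie's and Twitcher's INI files. A rough sketch, with option names approximated from the beaker docs linked above (to be verified against them, and the redis host/expiry values here are placeholders):

```ini
; Shared cache backend sketch -- the same block in both INI files so
; Magpie and Twitcher (and all their workers) read/write one cache.
cache.type = ext:redis
cache.url = redis://redis:6379/0
cache.expire = 20
```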

References


dbyrns commented Aug 30, 2021

For me the cache does not need to be shared or synchronized. It is pretty common on the web to have to wait a couple of seconds or minutes before a new authorization takes effect.
The only problematic scenario is when running tests that don't take that into account. So for me solution 1 is the right one, with one caveat: if a service (A) itself calls another service (B), even if the no-cache header is set when calling (A), there is no guarantee that (A) will set this header when calling (B). So the (B) request could still hit the stale cached value.
So tests need to be carefully crafted when updating permissions, either ensuring no waterfall calls occur when setting the no-cache header, or accounting for the time required before the cache expires.

fmigneault (Member, Author) commented

If service (A) calls a protected service (B), it should technically already forward at least the Set-Cookie header (or other cookie / auth containers depending on the implementation) in order to gain the same user access/identity; otherwise it assumes (B) is always public. It should also forward the Cache-Control header to be sure.

I don't think this is a very common nor problematic case though. Most probably, either the endpoint called on (B) by (A) is a static path, so that endpoint's permissions shouldn't change, or it is a "same resource" path (e.g. the matching WMS for a THREDDS file), so permissions should be changed similarly at about the same time to reflect the desired access. The caches of both resources should also reset around the same time if the last cached request successively called (A)->(B).
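The header-forwarding idea can be sketched as a small helper; the function name and the surrounding usage are hypothetical, not actual Twitcher/Magpie API:

```python
# Headers that must travel along an (A) -> (B) proxied call: the auth
# cookie so (B) resolves the same identity, and Cache-Control so a
# client-requested fresh permission check propagates end-to-end.
FORWARDED_HEADERS = ("Cookie", "Cache-Control")

def forward_headers(incoming_headers):
    """Select the incoming request headers that (A) should copy onto
    its outgoing request toward (B)."""
    return {
        name: incoming_headers[name]
        for name in FORWARDED_HEADERS
        if name in incoming_headers
    }

# Example: client forces fresh resolution; Host stays local to (A).
incoming = {
    "Cookie": "auth_tkt=abc123",
    "Cache-Control": "no-cache",
    "Host": "example.org",
}
print(forward_headers(incoming))
# {'Cookie': 'auth_tkt=abc123', 'Cache-Control': 'no-cache'}
```

Without such forwarding, the no-cache hint dies at (A) and (B) can still answer from its stale cache, which is exactly the waterfall caveat raised above.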

I also find that changing a permission on (A) from/to secure/public only has an impact in that kind of "same resource" case, considering the "initial" permissions below.

| A (initial) | B (initial) | Effect |
|---|---|---|
| secure | secure | Both need a permission update to become public, with about the same delay until the cache reset takes effect if permissions are applied on both. Otherwise, bad permission setup. |
| public | secure | (B) is always blocked to begin with; it doesn't matter that (A) becomes secured, because a token was always required anyway. |
| secure | public | (B) is assumed public and remains public. It will be callable publicly via (A) after the cache reset. Making (A) public doesn't change access to (B), which could be called directly. |
| public | public | (B) is assumed public and remains public; only (A) becomes blocked after the cache reset. Technically, nothing is wrong: (A) can still access (B), but publicly. To do things correctly, the permission on (B) should also be changed if blocked access should match. |


dbyrns commented Aug 31, 2021

It is indeed an edge case. I was just pointing out that one should keep it in mind when writing tests. The exact use case I had in mind, which you didn't mention, was more a WPS (as service A) trying to access data (THREDDS as service B) or a shapefile (GeoServer as service B), or Weaver (as service A) calling other WPS (as service B). I would be surprised if our WPS forwarded the Cache-Control header.

But again, I assume that WPS tests will not include permission changes and will rather use public data/shapes. It is really just a reminder of how deeply the cache can trick us.


tlvu commented Sep 1, 2021

> It is pretty common on the web to have to wait a couple of seconds or minutes before a new authorization takes effect.

I am with DavidB on this one. I think this is a very nice-to-have, but we do not have to resolve it right now.
