
[Feature] Sync cache invalidation between Magpie/Twitcher #189

Open
fmigneault opened this issue Aug 11, 2021 · 4 comments
Labels: enhancement (New feature or request)

Comments


fmigneault commented Aug 11, 2021

Description

Because permissions are applied in Magpie but resolved for request access by Twitcher, any already-active cache of older requests on the Twitcher side will not be immediately synchronized after a permission update on the Magpie side (the caches are not shared, and are therefore not invalidated on permission update).

Subsequent requests to the same resources (within the cache expiration delay) will hit the cached response produced by the access resolution of the previous requests. The new permission resolution will not take effect until the cache expires.
For example:

```
GET /twitcher/ows/proxy/thredds/<file-ref>
    => denied, response is now cached
PUT /magpie/users|groups/{id}/resources/<file-ref-id>/permissions
    => <file-ref> made 'allowed' for previous request user
GET /twitcher/ows/proxy/thredds/<file-ref>  (cached)
    => denied instead of allowed
... wait ~20s (caching delay) ...
GET /twitcher/ows/proxy/thredds/<file-ref>
    => allowed
```

Note that the effect goes both ways, i.e. removing access to a resource will not actually block it until the expiration delay is reached.
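The staleness window above can be illustrated with a minimal pure-Python TTL cache (illustrative only; Twitcher actually uses beaker, and the class/names here are mine):

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry, standing in for the
    real beaker cache to demonstrate the staleness window."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: drop the entry and miss
            return None
        return value


# Simulate the scenario, with a short TTL standing in for the ~20s delay:
acl_cache = TTLCache(ttl_seconds=0.05)
acl_cache.set("thredds-file-acl", "denied")   # first GET: denied, cached
# ... permission updated in Magpie here; the cache is NOT invalidated ...
print(acl_cache.get("thredds-file-acl"))      # still "denied" (stale)
time.sleep(0.06)                              # wait past the TTL
print(acl_cache.get("thredds-file-acl"))      # None: miss, ACL re-resolved
```

The second lookup only re-resolves because the entry aged out, which is exactly the behavior the invalidation methods below try to avoid waiting for.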

Edit:

For the invalidation to take effect on the Twitcher side, there are 3 methods:

  1. Explicitly set the Cache-Control: no-cache header on the next file-access request to force a cache reset.
    This works, but should be done only on the first call after the permission update; otherwise, all caching performance advantages are lost over many repeated accesses to the same resource.
  2. Share the Magpie/Twitcher caches via file references so they can invalidate each other.
    • To do this, we need a volume mounted by both images, and both must use cache.type = file with corresponding paths for cache.lock_dir and cache.data_dir in their INI configs.
    • More updates to Magpie/Twitcher would be needed to correctly invalidate caches of type file (only memory is tested, and entries are not hashed the same way for selective invalidation, e.g. invalidating only the ACL for resource X / user Y, etc.).
  3. (best) Employ the redis or mongodb extension with beaker to synchronize caches.
    https://beaker.readthedocs.io/en/latest/modules/redis.html
    https://beaker.readthedocs.io/en/latest/modules/mongodb.html
    Not only would this allow syncing or invalidating caches across Magpie/Twitcher, but also across individual workers of Magpie and Twitcher. At the moment, each worker holds its own in-memory cache depending on which requests it received, meaning cached/non-cached responses won't be the same (and won't expire at the same time) depending on which worker processes the request and when it last received one.
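For method 3, a shared backend would be declared in both Magpie's and Twitcher's INI files. A rough sketch, with option names approximated from the beaker docs linked above (to be verified against them, and the redis host/expiry values here are placeholders):

```ini
; Shared cache backend sketch -- the same block in both INI files so
; Magpie and Twitcher (and all their workers) read/write one cache.
cache.type = ext:redis
cache.url = redis://redis:6379/0
cache.expire = 20
```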

References


dbyrns commented Aug 30, 2021

For me the cache does not need to be shared or synchronized. It is pretty common on the web to have to wait a couple of seconds or minutes before a new authorization takes effect.
The only problematic scenario is when running tests that don't take that into account. So for me solution 1 is the right one, with one caveat: if a service (A) itself calls another service (B), even if the no-cache header is set when calling (A), there is no guarantee that (A) will set this header when calling (B). So the (B) request could still hit the stale cached value.
So tests need to be carefully crafted when updating permissions, either ensuring no waterfall calls occur when setting the no-cache header, or accounting for the time required before the cache expires.

fmigneault (Member, Author) commented

If service (A) calls a protected service (B), it should technically already forward at least the Set-Cookie header (or other cookie / auth containers depending on the implementation) in order to gain the same user access/identity; otherwise it assumes (B) is always public. It should also forward the Cache-Control header to be sure.

I don't think this is a very common nor problematic case though. Most probably, either the endpoint called on (B) by (A) is a static path, so that endpoint's permissions shouldn't change, or it is a "same resource" path (e.g. the matching WMS for a THREDDS file), so permissions should be changed similarly at about the same time to reflect the desired access. The caches of both resources should also reset around the same time if the last cached request successively called (A)->(B).
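The header-forwarding idea can be sketched as a small helper; the function name and the surrounding usage are hypothetical, not actual Twitcher/Magpie API:

```python
# Headers that must travel along an (A) -> (B) proxied call: the auth
# cookie so (B) resolves the same identity, and Cache-Control so a
# client-requested fresh permission check propagates end-to-end.
FORWARDED_HEADERS = ("Cookie", "Cache-Control")

def forward_headers(incoming_headers):
    """Select the incoming request headers that (A) should copy onto
    its outgoing request toward (B)."""
    return {
        name: incoming_headers[name]
        for name in FORWARDED_HEADERS
        if name in incoming_headers
    }

# Example: client forces fresh resolution; Host stays local to (A).
incoming = {
    "Cookie": "auth_tkt=abc123",
    "Cache-Control": "no-cache",
    "Host": "example.org",
}
print(forward_headers(incoming))
# {'Cookie': 'auth_tkt=abc123', 'Cache-Control': 'no-cache'}
```

Without such forwarding, the no-cache hint dies at (A) and (B) can still answer from its stale cache, which is exactly the waterfall caveat raised above.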

I also find that changing a permission on (A) from/to secure/public only has an impact in that kind of "same resource" case, considering the "initial" permissions below.

| A (initial) | B (initial) | Effect |
|---|---|---|
| secure | secure | Both need a permission update to become public, with about the same delay until the cache reset takes effect if permissions are applied on both. Otherwise, bad permission setup. |
| public | secure | (B) is always blocked to begin with; it doesn't matter that (A) becomes secured, because a token was always required anyway. |
| secure | public | (B) is assumed public and remains public. It will be callable publicly via (A) after the cache reset. Making (A) public doesn't change access to (B), which could be called directly. |
| public | public | (B) is assumed public and remains public; only (A) becomes blocked after the cache reset. Technically, nothing is wrong: (A) can still access (B), but publicly. To do things correctly, the permission on (B) should also be changed if blocked access should match. |


dbyrns commented Aug 31, 2021

It is indeed an edge case. I was just pointing out that one should keep it in mind when writing tests. The exact use case I had in mind, which you didn't mention, was more a WPS (as service A) trying to access data (THREDDS as service B) or a shapefile (GeoServer as service B), or Weaver (as service A) calling other WPS (as service B). I would be surprised if our WPS forwarded the Cache-Control header.

But again, I assume that WPS tests will not include permission changes and will rather use public data/shapes. It is really just a reminder of how deeply the cache can trick us.


tlvu commented Sep 1, 2021

> It is pretty common on the web to have to wait a couple of seconds or minutes before a new authorization takes effect.

I am with DavidB on this one. I think this is a very nice-to-have, but we do not have to resolve it right now.
