Skip to content

feat: ClusterState does not cache session contexts #1226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 1, 2025

Conversation

milenkovicm
Copy link
Contributor

@milenkovicm milenkovicm commented Apr 4, 2025

... as session context was not cleaned up.

Which issue does this PR close?

Closes #1220.

Rationale for this change

  • ClusterState was keeping HashMap of session_id -> SessionContext which was never cleaned up.
  • ClusterState will never emit events about accessed or removed SessionContexts.
  • There is a need to time out (and remove) contexts from scheduler, or do cluster wide cleanup when session context expires. There were two ways to do this:
    • bringing cache to ClusterState and emit events when cache expires or
    • externally listen for SessionContext events and react when session_id times out (keep cache out of ClusterState.
  • Client SessionContext is authoritative not the context kept scheduler, it will simplify client scheduler interaction if initial create_context is skipped.
  • ClusterState is stateless from connected client perspective

What changes are included in this PR?

In order to have expiring session contexts a LRU cache is needed, which I did not want to bring in. So this implementation proposes not to cache session contexts, rather to create them every time. This is trade off between correctness and keeping additional dependency in. It can be revisited later if needed.

Also, this change brings:

  • Client sets session context id, no need for extra create_context call.
  • get_context has been replaced with create_or_update_context
  • SessionContext notifications are published (SessionAccessed, SessionRemoved)
  • SessionAccessed, SessionRemoved added
  • SessionUpdate removed
  • change in ballista protocol to reflect create_or_update_context method change
  • session_id is required to be set
  • operation_id added to help with client-scheduler request mapping

Are there any user-facing changes?

  • there are changes in ballista protocol

To be done

  • add request id (uuid)

@milenkovicm milenkovicm requested a review from Copilot April 4, 2025 19:25
@milenkovicm milenkovicm marked this pull request as draft April 4, 2025 19:25
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

@milenkovicm milenkovicm changed the title feat: make scheduler session context stateless feat: ClusterState does not cache session contexts Apr 4, 2025
@milenkovicm milenkovicm marked this pull request as ready for review April 6, 2025 19:30
@milenkovicm milenkovicm marked this pull request as draft April 6, 2025 23:31
@milenkovicm milenkovicm force-pushed the bug_1220_session_events branch from 9902f25 to c9ea189 Compare April 13, 2025 15:17
@milenkovicm milenkovicm marked this pull request as ready for review April 17, 2025 11:38
@milenkovicm milenkovicm requested a review from andygrove April 17, 2025 11:39
@milenkovicm milenkovicm force-pushed the bug_1220_session_events branch from 9fcc709 to 01534b4 Compare April 18, 2025 16:30
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't currently have time to review large PRs like this unfortunately, but I skimmed through it and it looks reasonable.

@milenkovicm milenkovicm force-pushed the bug_1220_session_events branch from 29434b1 to cee1510 Compare May 18, 2025 15:58
@milenkovicm milenkovicm merged commit 30f3042 into apache:main Jun 1, 2025
23 checks passed
@milenkovicm milenkovicm deleted the bug_1220_session_events branch June 1, 2025 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JobStateEventStream does not emit events related to Session
2 participants