Skip to content

Commit a93b5fe

Browse files
committed
Merge branch 'main' into develop
2 parents b337942 + b3029ca commit a93b5fe

File tree

5 files changed

+58
-86
lines changed

5 files changed

+58
-86
lines changed

CHANGELOG.md

+23-2
Original file line numberDiff line numberDiff line change
@@ -1,23 +1,44 @@
11
# Changelog
22

33
## 3.5.0
4-
Not yet released.
4+
[Released](https://bitbucket.org/xnatdev/container-service/src/3.5.0/).
55

66
* **Improvement** [CS-946][] Prevent setting mutually distinct k8s PVC mounting options
7+
* **Bugfix** [CS-966][] Ensure tracking of container IDs in workflow tables in a Kubernetes environment
78
* **Bugfix** [CS-968][] Switch the docker API library we use from [docker-client][] to [docker-java][].
89
This should restore CS functionality on docker engine v25 and higher.
10+
11+
### A Note About Our Docker Library
12+
Originally we used the [spotify/docker-client][] library to wrap the docker remote API in java method calls. They stopped updating that and put out their final release [v6.1.1][] in 2016.
13+
14+
We switched the Container Service to use a fork of that client, [dmandalidis/docker-client][] in CS version 3.0.0. Given that this was a fork of the client we already used, it was a simple drop-in replacement with no changes needed.
15+
16+
But that library maintainer did continue to make changes. In 2023 they released a major version upgrade, [v7.0.0][], which dropped support for Java 8. That is the version of Java we use in XNAT (at time of writing) so this change meant we weren't able to update our version of this library. That was fine for a while...
17+
...Until version 25 of the docker engine, in which they made an API change which caused an error in the version we used of `docker-client`. The library (presumably) fixed their issue but we weren't able to use that fix because our version of the library was frozen by their decision to drop Java 8 support.
18+
19+
This forced us to switch our library from `docker-client` to [docker-java][]. This was not a drop-in replacement, and did require a migration. All the same docker API endpoints were supported in a 1:1 replacement—which took a little effort but was straightforward—except for one. The `docker-java` library did not support requesting `GenericResources` on a swarm service, which is the mechanism by which we allow commands to specify that they need a GPU. We opened a ticket reporting that lack of support (https://github.com/docker-java/docker-java/issues/2320), but at time of writing there has been no response. I created a fork (https://github.com/johnflavin/docker-java) and fixed the issue myself (https://github.com/docker-java/docker-java/pull/2327), but at time of writing that also has no response. I built a custom version of `docker-java` `3.4.0.1` and pushed that to the XNAT artifactory ([ext-release-local/com/github/docker-java][]).
20+
21+
Long story short, as of CS version `3.5.0` we depend on `docker-java` version `3.4.0.1` for our docker (and swarm) API support.
922

1023
[CS-946]: https://radiologics.atlassian.net/browse/CS-946
24+
[CS-966]: https://radiologics.atlassian.net/browse/CS-966
1125
[CS-968]: https://radiologics.atlassian.net/browse/CS-968
12-
[docker-client]: https://github.com/dmandalidis/docker-client
26+
[docker-client]: https://github.com/spotify/docker-client
27+
[spotify/docker-client]: https://github.com/spotify/docker-client
28+
[v6.1.1]: https://github.com/spotify/docker-client/releases/tag/v6.1.1
29+
[dmandalidis/docker-client]: https://github.com/dmandalidis/docker-client
30+
[v7.0.0]: https://github.com/dmandalidis/docker-client/tree/v7.0.0
1331
[docker-java]: https://github.com/docker-java/docker-java
32+
[ext-release-local/com/github/docker-java]: https://nrgxnat.jfrog.io/ui/repos/tree/General/ext-release-local/com/github/docker-java
1433

1534
## 3.4.3
1635
[Released](https://bitbucket.org/xnatdev/container-service/src/3.4.3/).
1736

37+
* **Improvement** [XNAT-7903][] Support shared data at project level
1838
* **Bugfix** [CS-945][] Fix race condition on overlapping container status events
1939

2040
[CS-945]: https://radiologics.atlassian.net/browse/CS-945
41+
[XNAT-7903]: https://radiologics.atlassian.net/browse/XNAT-7903
2142

2243
## 3.4.2
2344
[Released](https://bitbucket.org/xnatdev/container-service/src/3.4.2/).

build.gradle

+1-1
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ plugins {
1414
}
1515

1616
group "org.nrg.xnatx.plugins"
17-
version "3.5.0-SNAPSHOT"
17+
version "3.5.0"
1818

1919
sourceCompatibility = 1.8
2020
targetCompatibility = 1.8

src/main/java/org/nrg/containers/api/KubernetesInformerImpl.java

+8-2
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@
1818
import io.kubernetes.client.openapi.models.V1ObjectMeta;
1919
import io.kubernetes.client.openapi.models.V1Pod;
2020
import io.kubernetes.client.openapi.models.V1PodList;
21+
import io.kubernetes.client.openapi.models.V1PodSpec;
2122
import io.kubernetes.client.openapi.models.V1PodStatus;
2223
import io.kubernetes.client.util.CallGeneratorParams;
2324
import lombok.extern.slf4j.Slf4j;
@@ -223,6 +224,10 @@ private static KubernetesStatusChangeEvent createEvent(V1Pod pod) {
223224
timestamp = null;
224225
}
225226

227+
// Node
228+
final V1PodSpec podSpec = pod.getSpec();
229+
final String nodeName = podSpec == null ? null : podSpec.getNodeName();
230+
226231
return new KubernetesStatusChangeEvent(
227232
jobNameFromPodLabels(pod),
228233
objName(pod),
@@ -232,7 +237,8 @@ private static KubernetesStatusChangeEvent createEvent(V1Pod pod) {
232237
containerState,
233238
containerStateReason,
234239
exitCode,
235-
timestamp
240+
timestamp,
241+
nodeName
236242
);
237243
}
238244

@@ -276,7 +282,7 @@ public void onUpdate(V1Pod oldObj, V1Pod newObj) {
276282
return;
277283
}
278284

279-
log.debug("Pod {} updated", newEvent.podName());
285+
log.debug("Pod {} updated", newEvent.getPodName());
280286
triggerEvent(newEvent);
281287
}
282288

Original file line numberDiff line numberDiff line change
@@ -1,42 +1,24 @@
11
package org.nrg.containers.events.model;
22

3+
import lombok.Value;
34
import org.nrg.containers.model.kubernetes.KubernetesPodPhase;
45

56
import java.time.OffsetDateTime;
67
import java.util.Collections;
78
import java.util.Map;
8-
import java.util.Objects;
99

10+
@Value
1011
public class KubernetesStatusChangeEvent implements ContainerEvent {
11-
private final String jobName;
12-
private final String podName;
13-
private final String containerId;
14-
private final KubernetesPodPhase podPhase;
15-
private final String podPhaseReason;
16-
private final KubernetesContainerState containerState;
17-
private final String containerStateReason;
18-
private final Integer exitCode;
19-
private final OffsetDateTime timestamp;
20-
21-
public KubernetesStatusChangeEvent(String jobName,
22-
String podName,
23-
String containerId,
24-
KubernetesPodPhase podPhase,
25-
String podPhaseReason,
26-
KubernetesContainerState containerState,
27-
String containerStateReason,
28-
Integer exitCode,
29-
OffsetDateTime timestamp) {
30-
this.jobName = jobName;
31-
this.podName = podName;
32-
this.containerId = containerId;
33-
this.podPhase = podPhase;
34-
this.podPhaseReason = podPhaseReason;
35-
this.containerState = containerState;
36-
this.containerStateReason = containerStateReason;
37-
this.exitCode = exitCode;
38-
this.timestamp = timestamp;
39-
}
12+
String jobName;
13+
String podName;
14+
String containerId;
15+
KubernetesPodPhase podPhase;
16+
String podPhaseReason;
17+
KubernetesContainerState containerState;
18+
String containerStateReason;
19+
Integer exitCode;
20+
OffsetDateTime timestamp;
21+
String nodeId;
4022

4123
@Override
4224
public String backendId() {
@@ -82,48 +64,4 @@ public String details() {
8264
}
8365
return details;
8466
}
85-
86-
public String podName() {
87-
return podName;
88-
}
89-
90-
public String containerId() {
91-
return containerId;
92-
}
93-
94-
@Override
95-
public String toString() {
96-
return "KubernetesStatusChangeEvent{" +
97-
"jobName=\"" + jobName + "\"" +
98-
", podName=\"" + podName + "\"" +
99-
", containerId=\"" + containerId + "\"" +
100-
", podPhase=\"" + podPhase + "\"" +
101-
", podPhaseReason=\"" + podPhaseReason + "\"" +
102-
", containerState=\"" + containerState + "\"" +
103-
", containerStateReason=\"" + containerStateReason + "\"" +
104-
", exitCode=\"" + exitCode + "\"" +
105-
", timestamp=\"" + timestamp + "\"" +
106-
"}";
107-
}
108-
109-
@Override
110-
public boolean equals(Object o) {
111-
if (this == o) return true;
112-
if (o == null || getClass() != o.getClass()) return false;
113-
KubernetesStatusChangeEvent that = (KubernetesStatusChangeEvent) o;
114-
return Objects.equals(jobName, that.jobName) &&
115-
Objects.equals(podName, that.podName) &&
116-
Objects.equals(containerId, that.containerId) &&
117-
podPhase == that.podPhase &&
118-
Objects.equals(podPhaseReason, that.podPhaseReason) &&
119-
containerState == that.containerState &&
120-
Objects.equals(containerStateReason, that.containerStateReason) &&
121-
Objects.equals(exitCode, that.exitCode) &&
122-
Objects.equals(timestamp, that.timestamp);
123-
}
124-
125-
@Override
126-
public int hashCode() {
127-
return Objects.hash(jobName, podName, containerId, podPhase, podPhaseReason, containerState, containerStateReason, exitCode, timestamp);
128-
}
12967
}

src/main/java/org/nrg/containers/services/impl/ContainerServiceImpl.java

+14-7
Original file line numberDiff line numberDiff line change
@@ -861,22 +861,29 @@ public void processEvent(final ContainerEvent event) {
861861
KubernetesStatusChangeEvent kEvent = ((KubernetesStatusChangeEvent) event);
862862

863863
// Check if we need to set additional ids
864-
boolean shouldUpdatePodName = containerWithAddedEvent.podName() == null && kEvent.podName() != null;
865-
boolean shouldUpdateContainerId = containerWithAddedEvent.containerId() == null && kEvent.containerId() != null;
866-
if (shouldUpdatePodName || shouldUpdateContainerId) {
864+
boolean shouldUpdateNode = containerWithAddedEvent.nodeId() == null && kEvent.getNodeId() != null;
865+
boolean shouldUpdatePodName = containerWithAddedEvent.podName() == null && kEvent.getPodName() != null;
866+
boolean shouldUpdateContainerId = containerWithAddedEvent.containerId() == null && kEvent.getContainerId() != null;
867+
if (shouldUpdateNode || shouldUpdatePodName || shouldUpdateContainerId) {
867868
Container.Builder builder = containerWithAddedEvent.toBuilder();
868869

870+
if (shouldUpdateNode) {
871+
log.debug("Container {} for job {}: setting nodeId to node {}",
872+
container.databaseId(), container.jobName(), kEvent.getNodeId()
873+
);
874+
builder.nodeId(kEvent.getNodeId());
875+
}
869876
if (shouldUpdatePodName) {
870877
log.debug("Container {} for job {}: setting taskId to pod name {}",
871-
container.databaseId(), container.jobName(), kEvent.podName()
878+
container.databaseId(), container.jobName(), kEvent.getPodName()
872879
);
873-
builder.taskId(kEvent.podName());
880+
builder.taskId(kEvent.getPodName());
874881
}
875882
if (shouldUpdateContainerId) {
876883
log.debug("Container {} for job {}: setting containerId to container id {}",
877-
container.databaseId(), container.jobName(), kEvent.containerId()
884+
container.databaseId(), container.jobName(), kEvent.getContainerId()
878885
);
879-
builder.containerId(kEvent.containerId());
886+
builder.containerId(kEvent.getContainerId());
880887
}
881888
containerEntityService.update(fromPojo(builder.build()));
882889
containerWithAddedEvent = retrieve(container.databaseId());

0 commit comments

Comments
 (0)