test(scorecard): scorecard tests for recording management #698
Expected output when the test runs successfully:
--------------------------------------------------------------------------------
Image: quay.io/thvo/cryostat-operator-scorecard:2.5.0-20231224193927
Entrypoint: [cryostat-scorecard-tests cryostat-recording]
Labels:
"test":"cryostat-recording"
"suite":"cryostat"
Results:
Name: cryostat-recording
State: pass
Log:
deployment cryostat-recording is not yet found
deployment cryostat-recording is not yet found
deployment cryostat-recording is not yet found
deployment cryostat-recording is not yet found
deployment cryostat-recording is not yet found
deployment cryostat-recording is not yet found
deployment cryostat-recording is not yet found
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is not yet available
deployment cryostat-recording is available
application is ready at https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing
found a target: {ConnectUrl:service:jmx:rmi:///jndi/rmi://10-217-0-141.cryostat-operator-scorecard.pod:9091/jmxrmi Alias:cryostat-recording-5d4b67d9c9-cqnfp}
created stored credential with match expression: target.alias=="cryostat-recording-5d4b67d9c9-cqnfp"
created a recording: &{DownloadURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/recordings/scorecard_test_rec ReportURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/reports/scorecard_test_rec Id:1 Name:scorecard_test_rec StartTime:1703447359940 State:RUNNING Duration:0 Continuous:true ToDisk:true MaxSize:0 MaxAge:0}
current list of recordings: [{DownloadURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/recordings/scorecard_test_rec ReportURL:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/v1/targets/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/reports/scorecard_test_rec Id:1 Name:scorecard_test_rec StartTime:1703447359940 State:RUNNING Duration:0 Continuous:true ToDisk:true MaxSize:0 MaxAge:0}]
archived the recording scorecard_test_rec at: cryostat-recording-5d4b67d9c9-cqnfp_scorecard_test_rec_20231224T194950Z.jfr
current list of archives: [{Name:cryostat-recording-5d4b67d9c9-cqnfp_scorecard_test_rec_20231224T194950Z.jfr DownloadUrl:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/beta/recordings/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/cryostat-recording-5d4b67d9c9-cqnfp_scorecard_test_rec_20231224T194950Z.jfr ReportUrl:https://cryostat-recording-cryostat-operator-scorecard.apps-crc.testing:443/api/beta/reports/service:jmx:rmi:%2F%2F%2Fjndi%2Frmi:%2F%2F10-217-0-141.cryostat-operator-scorecard.pod:9091%2Fjmxrmi/cryostat-recording-5d4b67d9c9-cqnfp_scorecard_test_rec_20231224T194950Z.jfr Metadata:{Labels:map[template.name:ALL template.type:TARGET]} Size:4406873}]
generated report for the recording scorecard_test_rec: map[Allocations.class:map[evaluation:map[explanation:Frequently allocated types are good places to start when trying to reduce garbage collections. Look at where the most common types are being allocated to see if many instances are created along the same call path. Try to reduce the number of instances created by invoking the most commonly taken paths less. suggestions:[] summary:The most allocated type is likely ''byte[]'', most commonly allocated by: org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@b73f018f] name:Allocated Classes score:12.405895119368047 topic:heap] Allocations.thread:map[evaluation:map[explanation:Many allocations performed by the same thread might indicate a problem in a multi-threaded program. Look at the stack traces for the thread with the highest allocation rate. See if the allocation rate can be brought down, or balanced among the active threads. suggestions:[] summary:The most allocations were likely done by thread ''vert.x-worker-thread-12'' at: org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@102c1cf5,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@2bba42b1,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@35be4ede,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@f5d26b03,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@871fff19,org.openjdk.jmc.flightrecorder.internal.parser.v1.StructTypes$JfrMethod@d9595ce6] name:Threads Allocating score:8.34649245842964 topic:java_application] ApplicationHalts:map[evaluation:map[explanation:The highest ratio of application halts to execution time was 0.319 % during 12/24/2023, 7:49:19.000 PM – 7:50:19 PM. 24.3 % of the halts were for reasons other than GC. The halts ratio for the entire recording was 0.616 %. 24.3 % of the total halts were for reasons other than GC. 
suggestions:[] summary:Application efficiency was not highly affected by halts.] name:Application Halts score:1.5954072750000001 topic:java_application] BufferLost:map[evaluation:map[suggestions:[] summary:No Flight Recorder buffers were lost during the recording.] name:Lost Flight Recorder Buffers score:0 topic:recording] BytecodeVerification:map[evaluation:map[suggestions:[] summary:The application ran with bytecode verification enabled.] name:Bytecode Verification score:0 topic:jvm_information] ClassLeak:map[evaluation:map[suggestions:[]] name:Class Leak score:-1 topic:classloading] ClassLoading:map[evaluation:map[suggestions:[] summary:No significant time was spent loading new classes during this recording.] name:Class Loading Pressure score:0 topic:classloading] CodeCache:map[evaluation:map[suggestions:[]] name:Code Cache score:-1 topic:code_cache] CompareCpu:map[evaluation:map[explanation:The application performance can be affected when the machine is under heavy load and there are other processes that use CPU or other resources on the same computer. To profile representatively or get higher throughput, shut down other resource intensive processes running on the machine. suggestions:[] summary:An average CPU load of 15 % was caused by other processes for during 12/24/2023, 7:49:19.000 PM – 7:49:50 PM.] name:Competing CPU Ratio Usage score:6.225391176180512 topic:processes] CompressedOops:map[evaluation:map[suggestions:[] summary:The settings for Compressed Oops were OK.] name:Compressed Oops score:0 topic:gc_configuration] ContextSwitch:map[evaluation:map[suggestions:[] summary:The program did not context switch excessively during the recording.] name:Context Switches score:1 topic:lock_instances] DMSIncident:map[evaluation:map[suggestions:[]] name:DMS Incidents score:-1 topic:DMS] DebugNonSafepoints:map[evaluation:map[suggestions:[] summary:DebugNonSafepoints was implicitly enabled in the JVM version used to create this recording.] 
name:DebugNonSafepoints score:0 topic:jvm_information] DiscouragedVmOptions:map[evaluation:map[suggestions:[] summary:No problems were found with the VM options.] name:Discouraged VM Options score:0 topic:jvm_information] DumpReason:map[evaluation:map[suggestions:[]] name:Exceptional Dump Reason score:-1 topic:recording] DuplicateFlags:map[evaluation:map[suggestions:[] summary:There were no duplicate JVM flags on the command line.] name:Duplicated Flags score:0 topic:jvm_information] Errors:map[evaluation:map[explanation:3 errors were thrown in total. The most common error was ''java.lang.NoSuchMethodError'', which was thrown 3 times. Investigate the thrown errors to see if they can be avoided. Errors indicate that something went wrong with the code execution and should never be used for flow control. suggestions:[] summary:The program generated an average of 3 errors per minute during 12/24/2023, 7:49:20.000 PM – 7:50:20 PM.] name:Thrown Errors score:2.5 topic:exceptions] Exceptions:map[evaluation:map[explanation:Throwing exceptions is more expensive than normal code execution, which means that they should only be used for exceptional situations. Investigate the thrown exceptions to see if any of them can be avoided with a non-exceptional control flow. suggestions:[] summary:The program generated 7.21 exceptions per second during 12/24/2023, 7:49:20.000 PM – 7:49:50 PM.] name:Thrown Exceptions score:0.036032756557046075 topic:exceptions] Fatal Errors:map[evaluation:map[suggestions:[]] name:Fatal Errors score:-1 topic:jvm_information] FewSampledThreads:map[evaluation:map[suggestions:[]] name:Parallel Threads score:-1 topic:java_application] FileRead:map[evaluation:map[suggestions:[] summary:No long file read pauses were found in this recording (the longest was 6.508 ms).] 
name:File Read Peak Duration score:0.08134705 topic:file_io] FileWrite:map[evaluation:map[suggestions:[] summary:No long file write pauses were found in this recording (the longest was 333.179 μs).] name:File Write Peak Duration score:0 topic:file_io] FlightRecordingSupport:map[evaluation:map[suggestions:[] summary:The JVM version used for this recording has full Flight Recorder support.] name:Flight Recording Support score:0 topic:jvm_information] FullGc:map[evaluation:map[explanation:At least one Full, Stop-The-World Garbage Collection occurred during this recording. For the CMS and G1 collectors, Full GC events are a strong negative performance indicator. Tunable GC parameters can be used to allow the collector to operate in concurrent mode, avoiding Stop-The-World pauses and increasing GC and application performance. suggestions:[] summary:Full GC detected.] name:G1/CMS Full Collection score:75 topic:garbage_collection] GarbageCollectionInfoRule:map[evaluation:map[suggestions:[]] name:Garbage Collection Info score:0 topic:garbage_collection] GcFreedRatio:map[evaluation:map[suggestions:[] summary:Only 8 heap summary events were found, this rule requires at least 10 events to be able to calculate a relevant result. This likely means that only a few garbage collections occurred during the recording. Having few garbage collections is generally a good sign.] name:GC Freed Ratio score:0 topic:heap] GcLocker:map[evaluation:map[suggestions:[] summary:No GCs were affected by the GC Locker.] name:GCs Caused by GC Locker score:0 topic:garbage_collection] GcOptions:map[evaluation:map[suggestions:[] summary:No problems were found with the GC configuration.] name:GC Setup score:0 topic:jvm_information] GcPauseRatio:map[evaluation:map[explanation:The highest ratio between garbage collection pauses and execution time was 0.242 % during 12/24/2023, 7:49:19.000 PM – 7:50:19 PM. The garbage collection pause ratio of the entire recording was 0.466 %. 
solution:Pause times may be reduced by increasing the heap size or by trying to reduce allocation. suggestions:[] summary:Application efficiency was not highly affected by GC pauses.] name:GC Pauses score:1.207738575 topic:garbage_collection] GcStall:map[evaluation:map[suggestions:[] summary:No indications that the garbage collector could not keep up with the workload were detected.] name:GC Stall score:0 topic:garbage_collection] HeapContent:map[evaluation:map[explanation:If the heap usage needs to be reduced, then this would be a good place to start. suggestions:[] summary:Most of the heap was used by only a few classes.] name:Heap Content score:89.91250273799291 topic:heap] HeapDump:map[evaluation:map[suggestions:[]] name:Heap Dump score:-1 topic:heap] HeapInspectionGc:map[evaluation:map[explanation:Performing heap inspection garbage collections may be a problem since they usually take a lot of time. suggestions:[] summary:The JVM performed 4 heap inspection garbage collections.] name:GCs Caused by Heap Inspection score:59.77379177936154 topic:garbage_collection] HighGc:map[evaluation:map[explanation:The time spent performing garbage collection may be reduced by increasing the heap size or by trying to reduce allocation.
To improve rule accuracy and/or get more details for further investigation, it is recommended to enable the following event types: . suggestions:[] summary:The JVM was paused for 100 % during 12/24/2023, 7:49:20.006.000 PM – .053] name:GC Pressure score:10.273532100008776 topic:heap] HighJvmCpu:map[evaluation:map[explanation:The sampling period for the 'CPU Load' events was set to Every Chunk, which is too high for CPU load related rules to work. suggestions:[] summary:This recording has a high sampling period for 'CPU Load' events.] name:High JVM CPU Load score:25 topic:java_application] IncreasingLiveSet:map[evaluation:map[explanation:Perform a dump with the 'Trace Paths to GC Roots' option enabled to enable a more detailed analysis of the potential memory leak. suggestions:[] summary:The live set on the heap seems to increase with a speed of about 11.6 KiB per second during the recording.There is no particular class that seems to be leaking more than any other.] name:Heap Live Set Trend score:0.849074074074074 topic:memoryleak] IncreasingMetaSpaceLiveSet:map[evaluation:map[suggestions:[] summary:The class data does not seem to increase during the recording.] name:Metaspace Live Set Trend score:4.102465604874872 topic:garbage_collection] JavaBlocking:map[evaluation:map[explanation:The following regular expression was used to exclude threads from this rule: ''(.*weblogic\.socket\.Muxer.*)'' suggestions:[] summary:No excessive problems with lock contention found.] name:Java Blocking score:0.07353422487380579 topic:lock_instances] JfrPeriodicEventsFix:map[evaluation:map[suggestions:[] summary:The version of Java you are running is not affected by a performance issue related to periodic events.] name:JFR Periodic Events Fix score:0 topic:jvm_information] LongGcPause:map[evaluation:map[explanation: suggestions:[] summary:The longest GC pause was 47.079 ms.] 
name:GC Pause Peak Duration score:1.422363001557028 topic:garbage_collection] LowOnPhysicalMemory:map[evaluation:map[suggestions:[] summary:The system did not run low on physical memory during this recording.] name:Free Physical Memory score:0 topic:heap] ManagementAgent:map[evaluation:map[solution:See the [Java Monitoring and Management Guide](https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html) for more information about how to configure the management agent. suggestions:[]] name:Discouraged Management Agent Settings score:-1 topic:jvm_information] ManyRunningProcesses:map[evaluation:map[explanation:At 12/24/23, 7:49:50.446 PM, a total of 1 other processes were running on the host machine that this Flight Recording was made on. solution:If this is a server environment, it may be good to only run other critical processes on that machine. suggestions:[] summary:1 processes were running while this Flight Recording was made.] name:Competing Processes score:0.20309488837692125 topic:processes] MetaspaceOom:map[evaluation:map[suggestions:[]] name:Metaspace Out of Memory score:-1 topic:garbage_collection] MethodProfiling:map[evaluation:map[suggestions:[]] name:Method Profiling score:-1 topic:method_profiling] Options:map[evaluation:map[suggestions:[] summary:No undocumented, deprecated or non-recommended option flags were detected.] name:Command Line Options Check score:0 topic:jvm_information] OverAggressiveRecordingSetting:map[evaluation:map[explanation:Event types without threshold can lead to quite a lot of events being generated, possibly translating to higher overhead. If this was not intended, please check the settings in the template for future recordings. 
suggestions:[] summary:These following event types had no threshold: 'Java Monitor Blocked', 'Java Thread Park'] name:Discouraged Recording Settings score:25 topic:recording] PasswordsInArguments:map[evaluation:map[suggestions:[] summary:The recording does not seem to contain passwords in the application arguments.] name:Passwords in Java Arguments score:0 topic:jvm_information] PasswordsInEnvironment:map[evaluation:map[explanation:The following suspicious environment variables were found in this recording: CRYOSTAT_JDBC_PASSWORD, CRYOSTAT_JMX_CREDENTIALS_DB_PASSWORD. The following regular expression was used to exclude strings from this rule: ''(passworld|passwise)''. solution:If you wish to keep having passwords in your environment variables, but want to be able to share recordings without also sharing the passwords, please disable the ''Initial Environment Variable'' event. suggestions:[] summary:The environment variables in the recording may contain passwords.] name:Passwords in Environment Variables score:75 topic:environment_variables] PasswordsInSystemProperties:map[evaluation:map[explanation:The following suspicious system properties were found in this recording: javax.net.ssl.keyStorePassword,javax.net.ssl.trustStorePassword,com.sun.management.jmxremote.password.file. The following regular expression was used to exclude strings from this rule: ''(passworld|passwise)''. solution:If you wish to keep having passwords in your system properties, but want to be able to share recordings without also sharing the passwords, please disable the ''Initial System Property'' event. suggestions:[] summary:The system properties in the recording may contain passwords.] name:Passwords in System Properties score:75 topic:system_properties] PrimitiveToObjectConversion:map[evaluation:map[explanation:
The most common object type that primitives are converted into is ''java.lang.Long'', which causes 42.3 KiB to be allocated. The most common call site is ''void sun.rmi.server.UnicastServerRef.dispatch(java.rmi.Remote, java.rmi.server.RemoteCall):323''.
Conversion from primitives to the corresponding object types can either be done explicitly, or be caused by autoboxing. If a considerable amount of the total allocation is caused by such conversions, consider changing the application source code to avoid this behavior. Look at the allocation stack traces to see which parts of the code to change. This rule finds the calls to the valueOf method for any of the eight object types that have primitive counterparts. suggestions:[] summary:0.0733 % of the total allocation (56.4 MiB) is caused by conversion from primitive types to object types. The most common object type that primitives are converted into is ''java.lang.Long''.] name:Primitive To Object Conversion score:0.09158578477668697 topic:heap] ProcessStarted:map[evaluation:map[suggestions:[]] name:Process Started score:-1 topic:processes] SocketRead:map[evaluation:map[explanation:The longest recorded socket read took 26.735 s to read 5 B from the host at 10.217.4.1. Average time of recorded IO: 32.741 ms. Total time of recorded IO: 1 min 29 s. Total time of recorded IO for the host 10.217.4.1: 34.611 s. Note that there are some socket read patterns with high duration reads that we consider to be normal and are therefore excluded. Such patterns include JMX RMI communication and MQ series. suggestions:[] summary:There are long socket read pauses in this recording (the longest is 26.735 s).] name:Socket Read Peak Duration score:75 topic:socket_io] SocketWrite:map[evaluation:map[explanation:Note that there are some socket write patterns with high duration writes that we consider to be normal and are therefore excluded. Such patterns include JMX RMI communication. suggestions:[] summary:No long socket write pauses were found in this recording (the longest was 5.328 ms).] name:Socket Write Peak Duration score:0.4843197272727273 topic:socket_io] StackdepthSetting:map[evaluation:map[explanation:The Flight Recorder is configured with a maximum captured stack depth of 64. 
1.01 % of all traces were larger than this option, and were therefore truncated. If more detailed traces are required, increase the ''-XX:FlightRecorderOptions=stackdepth=<value>'' value.
Events of the following types have truncated stack traces: org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@37122a58,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@1bb29df1,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@44a7f5dd,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@69e48b16,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@4f823c98,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@40d06d8c,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@3b226156,org.openjdk.jmc.flightrecorder.rules.jdk.general.StackDepthSettingRule$StackDepthTruncationData@7653e11f suggestions:[] summary:Some stack traces were truncated in this recording.] name:Stackdepth Setting score:25 topic:jvm_information] StringDeduplication:map[evaluation:map[explanation:String deduplication is enabled using the JVM flag '-XX:+UseStringDeduplication'. This flag can be used together with the G1 garbage collector in JDK 8u20 or later, or with the Shenandoah garbage collector.
To validate if this gives a performance improvement for your application, create flight recordings both with and without string deduplication. For the run with string deduplication enabled, also enable statistics with '-XX:+PrintStringDeduplicationStatistics' for JDK 8 or '-Xlog:stringdedup*=debug' for JDK 9. Check if the heap live set decrease in the recording with string deduplication enabled is larger than the size of the string deduplication metadata table. The size of the metadata table is printed in the statistics output as 'Table/Memory Usage: XX MB'
You can read more about string deduplication in the java options documentation or in [JEP 192](https://openjdk.java.net/jeps/192). suggestions:[] summary:Approximately 1,544 % of the live set consists of the internal array type of strings (''byte[]'' for this JDK version).
The heap is around 1.26 % full. There is likely no big benefit from enabling string deduplication.] name:String Deduplication score:8.844070278634204 topic:heap] SystemGc:map[evaluation:map[suggestions:[] summary:No garbage collections were caused by System.gc().] name:GCs Caused by System.gc() score:0 topic:garbage_collection] TlabAllocationRatio:map[evaluation:map[solution:Allocating objects outside of Thread Local Allocation Buffers (TLABs) is more expensive than allocating inside TLABs. This may be acceptable if the individual allocations are intended to be larger than a reasonable TLAB. It may be possible to avoid this by decreasing the size of the individual allocations. There are some TLAB related JVM flags that you can experiment with, but it is usually better to let the JVM manage TLAB sizes automatically. suggestions:[] summary:The program allocated 11.8 % of the memory outside of TLABs.] name:TLAB Allocation Ratio score:15.934741099319458 topic:tlab] VMOperations:map[evaluation:map[suggestions:[] summary:No excessively long VM operations were found in this recording (the longest was 47.116 ms).] name:VMOperation Peak Duration score:1.17788785 topic:vm_operations] biasedLockingRevocation:map[evaluation:map[suggestions:[]] name:Biased Locking Revocation score:-1 topic:biased_locking] biasedLockingRevocationPause:map[evaluation:map[suggestions:[] summary:No revocation of biased locks found.] name:Biased Locking Revocation Pauses score:0 topic:vm_operations]]
stopped the recording: scorecard_test_rec
deleted the recording: scorecard_test_rec
current list of recordings: []
Hey @tthvo, really great work! Thank you!
I'll give this a more thorough look shortly. We'll probably need to add some kind of mitigation or identify the cause of the intermittent 500 errors (any ideas here @andrewazores?). This could cause our product builds to fail, for example.
I haven't seen that particular 500 [0] before, but it's saying that the JMX connection was unexpectedly closed. We have seen other similar ones in the past, ex.:
This could be a bug in how we treat the JMX connection, where it gets expired and evicted from the connection cache prematurely. It could also be some other network hiccup that causes the connection to drop, or something else that causes the target JVM to drop JMX connections. It could even be a bug somewhere in the JMC library that we use in [0]:
I was able to reproduce this in an OpenShift cluster. Here are the logs from the Cryostat pod when it occurred. Looks kind of like the cache eviction you described, but I'm not certain.
@andrewazores and I tried to get to the bottom of this. So far we've discovered that the issue doesn't seem to occur when using a custom target of |
Ah, that's interesting! I suppose another scorecard test with the same recording workflow, but acting on a custom target or a target of a different realm (e.g. jdp, custom, k8s), would be useful in catching these issues?
A custom target test would be good, although testing the built-in discovery is important too. That said, if we don't identify a cause for the problem we're seeing, then we may need to settle for using only a custom target until we do find the cause.
Signed-off-by: Thuan Vo <thuan.votann@gmail.com>
Hey @ebaron @andrewazores, the tests now run more reliably for me as well. @Ming also added the check for the EOF error and the header logs. Is there anything else in the meantime? I will see if I can help figure out the bugs above...hopefully :D
I haven't had a chance to exercise the new changes. Has anyone done so and seen that the EOFs occurred, were handled with a retry, and then succeeded? I.e., has anyone verified that this specific "fix" actually works?
It took me a while to hit the EOF error. The error seems to be caught, but it looks like the request body has already been closed, so the retry is failing. I will have a closer look...
With the latest commit, the test should work properly now. Full log for recording test: https://gist.github.com/tthvo/42ad8f29e2aa2c5731ce436f513601c8
The EOF handling seems to be working nicely! I did 30 runs and got the following:
- Hit the EOF once, which retried and was successful.
- 1 timeout, which affected other tests in the suite. This could be flaky infrastructure, so I'm not going to worry about it.
- 3 failures due to 503 errors when first trying to create target. I don't see anything wrong with the ready detection logic, so perhaps this is the container crashing and we're not seeing it. I'll file a new issue to increase the logging (especially on failure).
Excellent work and thank you for all your patience!
^^ I am so glad I could help with this :D I'm a bit caught off guard that the 503 still occurs tho... I will try to see if I can reproduce it...
@Mergifyio backport cryostat3
✅ Backports have been created
* test(scorecard): scorecard tests for recording management
  Signed-off-by: Thuan Vo <thuan.votann@gmail.com>
* fixup(scorecard): fix cr cleanup func
* test(scorecard): registry recording test to suite
* chore(scorecard): reorganize client def
* chore(scorecard): clean up common setup func
* chore(bundle): regenerate bundle with scorecard tag
* chore(bundle): correct image tag in bundle
* fix(bundle): add missing scorecard test config patch
* feat(scorecard): scaffold cryostat API client
* chore(scorecard): clean up API client
* test(scorecard): implement recording scorecard test
* fixup(scorecard): correctly add scorecard test via hack templates
* fix(client): ignore unverified tls certs and base64 oauth token
* chore(bundle): split cryostat tests to separate stage
* fix(scorecard): extend default transport instead of overwriting
* chore(scorecard): refactor client to support multi-part
* fixup(client): fix request verb
* fix(client): fix recording create form format
* fix(scorecard): create stored credentials for target JVM
* fix(scorecard): fix 502 status error
* chore(scorecard): simplify client def
* chore(scorecard): fetch recordings to ensure action is correctly performed
* test(scorecard): test generating report for a recording
* chore(scorecard): clean up
* test(scorecard): list archives in tests
* ci(scorecard): reconfigure ingress for kind
* ci(k8s): correct cluster name
* test(scorecard): use role instead of clusterrole for oauth rules
* test(scorecard): parse health response for additional checks
* chore(scorecard): add missing newline in logs
* chore(scorecard): check status code before parsing body in health check
* test(scorecard): add custom target discovery to recording scorecard test
* add EOF wait and resp headers
* add resp headers
* chore(client): configure all clients to send safe requests
* fix(clients): add missing content-type header
* fix(scorecard): add missing test name in help message
* chore(client): create new http requests when retrying
* chore(bundle): update scorecard image tags
---------
Signed-off-by: Thuan Vo <thuan.votann@gmail.com>
Co-authored-by: Ming Yu Wang <90855268+mwangggg@users.noreply.github.com>
Co-authored-by: Ming Wang <miwan@redhat.com>
(cherry picked from commit cfcbfc7)
# Conflicts:
# bundle/manifests/cryostat-operator.clusterserviceversion.yaml
* feat(discovery): options to configure discovery port names and numbers (backport #715) (#725) * feat(discovery): options to configure discovery port names and numbers (#715) Signed-off-by: Thuan Vo <thuan.votann@gmail.com> (cherry picked from commit a552021) * resolve conflict --------- Co-authored-by: Thuan Vo <thuan.votann@gmail.com> Co-authored-by: Andrew Azores <aazores@redhat.com> * Deploy cryostat 3.0 * Remove extraneous file * test adjustments * feat(discovery): options to configure discovery port names and numbers (#715) Signed-off-by: Thuan Vo <thuan.votann@gmail.com> * Fix typo in environment variable breaking reconciler test, fix missing SecurityContext * Fix conflict with cluster cryostat removal * ci(gh): add comment when /build_test is finished (#745) * add scorecard test/suite selection (#746) * test(scorecard): scorecard tests for recording management (#698) * test(scorecard): scorecard tests for recording management Signed-off-by: Thuan Vo <thuan.votann@gmail.com> * fixup(scorecard): fix cr cleanup func * test(scorecard): registry recording test to suite * chore(scorecard): reorganize client def * chore(scorecard): clean up common setup func * chore(bundle): regenerate bundle with scorecard tag * chore(bundle): correct image tag in bundle * fix(bundle): add missing scorecard test config patch * feat(scorecard): scaffold cryostat API client * chore(scorecard): clean up API client * test(scorecard): implement recording scorecard test * fixup(scorecard): correctly add scorecard test via hack templates * fix(client): ignore unverified tls certs and base64 oauth token * chore(bundle): split cryostat tests to separate stage * fix(scorecard): extend default transport instead of overwriting * chore(scorecard): refactor client to support multi-part * fixup(client): fix request verb * fix(client): fix recording create form format * fix(scorecard): create stored credentials for target JVM * fix(scorecard): fix 502 status error * chore(scorecard): simplify 
client def * chore(scorecard): fetch recordings to ensure action is correctly performed * test(scorecard): test generating report for a recording * chore(scorecard): clean up * test(scorecard): list archives in tests * ci(scorecard): reconfigure ingress for kind * ci(k8s): correct cluster name * test(scorecard): use role instead of clusterrole for oauth rules * test(scorecard): parse health response for additional checks * chore(scorecard): add missing newline in logs * chore(scorecard): check status code before parsing body in health check * test(scorecard): add custom target discovery to recording scorecard test * add EOF wait and resp headers * add resp headers * chore(client): configure all clients to send safe requests * fix(clients): add missing content-type header * fix(scorecard): add missing test name in help message * chore(client): create new http requests when retrying * chore(bundle): update scorecard image tags --------- Signed-off-by: Thuan Vo <thuan.votann@gmail.com> Co-authored-by: Ming Yu Wang <90855268+mwangggg@users.noreply.github.com> Co-authored-by: Ming Wang <miwan@redhat.com> * test(scorecard): scorecard test for Cryostat CR configuration changes (#739) * CR config scorecard * reformat * reviews * add kubectl license * test(scorecard): scorecard test for report generator (#753) * deploy reports sidecar * report scorecard test * update * rebase fix * query health * fix(build-ci): fix scorecard image tag returned as null (#760) Signed-off-by: Thuan Vo <thuan.votann@gmail.com> Co-authored-by: Elliott Baron <ebaron@redhat.com> * test(scorecard): add container logs to scorecard results (#758) * test(scorecard): add container logs to scorecard results * build(bundle): regenerate bundle with new scorecard tags * chore(scorecard): refactor to remove duplicate codes * add permission to publish comment when ci fails (#769) Co-authored-by: Elliott Baron <ebaron@redhat.com> * Update NewCoreContainer and associated tests * build(go): update Golang to 
1.21 (#777) * test(scorecard): logWorkloadEvent for cryostat-recording errors (#759) * logWorkLoadEvent for cryostat-recording errors * reviews * tr.LogChannel --------- Co-authored-by: Elliott Baron <ebaron@redhat.com> * test(scorecard): fix rebasing skipped commit (#780) * Merge pull request #8 from ebaron/scorecard-methods test(scorecard): use methods for more easily passing data * update bundle image * Review fixes * generate storage key, create expected Secret * fixup! generate storage key, create expected Secret * database secret handling corrections * combine database connection password and encryption key into one secret * correct storage secret key/access key * update datasource port number to not conflict with storage * precreate eventtemplates bucket * remove storage volume parameter overrides * use HTTP for Cryostat probe even when TLS is enabled - TLS will be done via auth proxy later * correct environment variable names for proxy awareness * Fix remaining merge conflict * Fix makefile * config cleanup and test fixup --------- Signed-off-by: Thuan Vo <thuan.votann@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Thuan Vo <thuan.votann@gmail.com> Co-authored-by: Andrew Azores <aazores@redhat.com> Co-authored-by: Ming Yu Wang <90855268+mwangggg@users.noreply.github.com> Co-authored-by: Ming Wang <miwan@redhat.com> Co-authored-by: Elliott Baron <ebaron@redhat.com>
Welcome to Cryostat! 👋
Before contributing, make sure you have:
- […] `main` branch
- […] `[chore, ci, docs, feat, fix, test]`
- […] `git commit -S -m "YOUR_COMMIT_MESSAGE"`
Fixes: #504
Description of the change:
- […] `ALL`.
- […] `ConfigMap` that represents the controller configurations to lower the number of worker processes. Seems like `1` works well.
- […] `/etc/hosts`: `${kind-ip-address} testing.cryostat`. This allows `testing.cryostat` to resolve to the kind container internal IP within its bridge network. Quite a neat workaround to avoid sending requests to the ingress controller service directly by patching the scorecard config file.

Notes
API requests can sometimes fail with status code 500 (`The client has been closed`). The most common case is when generating a report:
failed to generate report for the recording: API request failed with status code 500: org.openjdk.jmc.rjmx.services.jfr.FlightRecorderException: Could not clone the scorecard_test_rec recording caused by IOException: The client has been closed.
Motivation for the change:
See #504
How to manually test:
If using OpenShift (e.g. CRC), no further steps are needed.
If using Kubernetes (e.g. minikube, kind), obtain the IP address of the cluster and add an entry to `/etc/hosts`: `${ip-address} testing.cryostat`
- For minikube: `minikube ip`
- For kind: `docker inspect <kind-container-name> -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}'`
Then, build the images, push them to a registry and run the tests.
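The `/etc/hosts` step can be sketched as a small check that avoids adding a duplicate entry. This is an illustrative sketch only: `hostsEntry` and `hasHost` are hypothetical helpers, and `192.168.49.2` is a placeholder for whatever address `minikube ip` or `docker inspect` returns.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hostsEntry formats a single /etc/hosts line that maps host to ip.
func hostsEntry(ip, host string) string {
	return fmt.Sprintf("%s %s", ip, host)
}

// hasHost reports whether the given hosts-file content already maps host.
// Comments (everything after '#') are ignored, and every name after the
// address on a line is considered.
func hasHost(content, host string) bool {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		for _, name := range fields[1:] {
			if name == host {
				return true
			}
		}
	}
	return false
}

func main() {
	// Placeholder content; in practice this would be read from /etc/hosts.
	existing := "127.0.0.1 localhost\n"
	if !hasHost(existing, "testing.cryostat") {
		// Placeholder IP: substitute the address of your cluster.
		fmt.Println(hostsEntry("192.168.49.2", "testing.cryostat"))
	}
}
```

The printed line is what would be appended to `/etc/hosts`; writing to that file requires root privileges, so the sketch only prints it.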
Sample run
Below is a successful run. Retries might be needed, since the failures described in the Notes section above can occur.
https://github.com/tthvo/cryostat-operator/actions/runs/7316493026/job/19931285551