[Bug]: High (Prometheus) CPU usage for new unified metrics server endpoint [v1.12.0] #1642
Comments
Yeah, bro! The same problem. It drives me crazy. All of my Outline servers went to 100% CPU utilization. I already filed a ticket with Outline support. Luckily I had a backup of Outline Manager on my flash drive and it works well. Waiting for the solution. |
My brother version 1.14.0 |
Thanks for the report. We're looking into it. If you can share more details of what you're experiencing, please feel free to share them in here. |
Outline Manager was automatically updated today. I thought it was a server problem, but it wasn't. I rebooted the server and saw that it came back up. After opening Outline Manager, I saw that the CPU was at 100%. Please check and fix the problem. Thank you. |
Thanks @mohammad051. We introduced some new metrics in the Manager UI, the calculation of which I assume is the cause of this high CPU. For my understanding, how many access keys do your servers roughly have? |
Brother, each of my servers has between 30 and 60 active keys. |
I have the same issues on Amazon servers. The servers have 40-50 active access keys. |
I have a backup of the previous version of Outline Manager for Windows 1.15.2 |
An old Manager is a workaround, but we have completed a rollback of the server to v1.11.0. Your servers should pick up this change within the hour, when watchtower looks for a new image to pull. The continued CPU usage is surprising; it implies something is still doing work even though the Manager is not requesting anything. Can I ask whether you are also experiencing memory issues? |
After opening the Outline Manager menu, CPU usage climbs so quickly that we don't even have a chance to log in, so we can't tell whether memory is also involved. I went back to the old version of Outline Manager, but it immediately updates to the new version and the problems start again. How can I disable automatic updates to fix the problem? Please help me. |
It's not a Manager issue; it's a server issue, which we rolled back earlier. Are you saying this is still happening for servers running the rolled back v1.11.0 version? |
Thank you, brother. Thank you very much. |
Thank you for confirming @mohammad051 and I'm glad to hear that resolved the immediate outage. |
Thank you very much for your help. You helped me a lot. Thank you for your quick answers. Thank you for solving this problem in the shortest possible time. |
We have spent some more time on this and confirmed that Prometheus can cause increased CPU usage. d262f52 mitigates the issue, though we still need to establish the full root cause. If anyone who ran into this issue is able to do a test run with the new release candidate containing the hotfix, that would give us more confidence before releasing to a wider audience. New release candidate image:
|
I would like to say a big thank you. |
Dear @sbruens
Today the problem came back. I've attached a screenshot from Amazon AWS to illustrate it. I have 3 Outline servers. The problem occurs on 2 of them, both running Ubuntu 20.04. |
Thanks @BossyBigBoss for letting us know. Very helpful to know it may be an Ubuntu 20.04 issue. Is there anything else that differs between the servers beyond the Ubuntu version? Are you able to check which Docker image each of them is running? |
Hello. I can provide you with an Amazon server to run all the tests on. Thank you. |
Can you please tell me how to check that? On Ubuntu 24.04 the same result. |
Thanks @BossyBigBoss, ChatGPT was right: that was indeed the answer I was looking for. It confirms they are all on the same (latest) image. Are the specs of your Lightsail instances the same for all 3 servers? By that I mean CPU, RAM, etc. You should be able to find this information in the overview:
Thanks for the offer, but there's no need. However, I am having no luck reproducing the issue. I spun up some Lightsail instances yesterday with what I think is the smallest blueprint: 512 MB RAM, 2 vCPUs, 20 GB SSD. I used Ubuntu 20.04, but the CPU is not spiking the way you're experiencing, not even when I hit the endpoint directly, rapidly and repeatedly. Can I confirm that @BossyBigBoss and @mohammad051 are both only seeing this on AWS Ubuntu 20.04 machines? Some other debugging suggestions:
I know it's not necessarily a fix, but are you able to upgrade the instances to Ubuntu 24.04, as recommended by Amazon? Ubuntu 20.04 is coming up to end of support in May. |
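For anyone who wants to repeat the "hit the endpoint directly, rapidly and consistently" test on an affected server, here is a minimal Python sketch. It is not the maintainers' test procedure: probe is a hypothetical helper, and the fetch callable is whatever request you choose to send (the metrics endpoint path is not stated in this thread, so it is deliberately left out). The demo uses a fake fetch so that it runs offline.

```python
import time
from typing import Callable, Dict


def probe(fetch: Callable[[], object], requests: int = 50, pause_s: float = 0.0) -> Dict[str, float]:
    """Call fetch() `requests` times in a row and summarize latency in seconds.

    Hypothetical helper for comparing servers; it measures only how long each
    call takes on the client side, not CPU usage on the server.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        fetch()
        latencies.append(time.perf_counter() - start)
        if pause_s:
            time.sleep(pause_s)
    latencies.sort()
    return {
        "count": len(latencies),
        "min": latencies[0],
        # Upper median for even counts; good enough for a rough comparison.
        "median": latencies[len(latencies) // 2],
        "max": latencies[-1],
    }


if __name__ == "__main__":
    # Stand-in for a real request. ASSUMPTION: a real fetch would call your
    # server's management API URL, which usually needs an SSL context that
    # accepts the server's self-signed certificate.
    def fake_fetch() -> None:
        time.sleep(0.001)

    print(probe(fake_fetch, requests=20))
```

Running the same probe against an affected and an unaffected server, while watching CPU on the server side, would show whether latency grows as the requests pile up.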
@sbruens I am using Amazon servers with the following specs: 1 GB RAM, 2 vCPUs, 40 GB SSD. The htop command can't be used because the server is inaccessible due to CPU overload. |
@sbruens In the new version, Metrics analysis is performed within the server, which leads to high CPU load at this time. I think it would be better if the data were collected from the server by Outline Manager, and the analysis was performed on the system where Outline Manager is running, rather than on the server itself. Currently, this bug still exists and causes the main server to crash. Even after the CPU returns to normal, the server's performance remains very poor. Please make the server and Outline Manager updates optional so that when the system is in a stable state, it remains in that condition. The same issue has occurred with Outline clients as well. The previous versions must be completely removed and reinstalled. With the update, users experience frequent disconnections. |
Guys, how can I save users' access keys and import them into a new installation? |
We're still struggling to reproduce this issue, especially with the added cache layer #1643. #1646 is also in-flight to further reduce load, but if anyone is able to help debug this, it could help us understand why it's happening on some servers but not others. If additional reporting users can please add details about the servers where they are seeing this, it can help us understand what server setups are affected and if there is a specific pattern. Any logs or additional investigations on your server will also be valuable. @BossyBigBoss you can use the management API to export and import keys. Someone also documented an alternative way to do it in Jigsaw-Code/outline-apps#1905 which may be easier to do. |
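As a rough illustration of the management API route, here is a minimal Python sketch. It rests on assumptions that should be checked against the API spec of your server version: that the key listing has the shape {"accessKeys": [...]} with id, name, password, port and method fields, and that PUT /access-keys/{id} creates a key under a chosen id. plan_key_import is a hypothetical helper; it only builds the requests and sends nothing.

```python
from __future__ import annotations

import json
from typing import Any

# Fields copied from each exported key. ASSUMPTION: these names match the
# listing returned by the source server's management API.
COPIED_FIELDS = ("name", "password", "port", "method")


def plan_key_import(exported: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Turn an exported key listing into (HTTP method, path, JSON body) tuples.

    Hypothetical helper: it plans the requests for the destination server
    but does not send them.
    """
    planned = []
    for key in exported.get("accessKeys", []):
        # Skip empty values (for example an unnamed key) instead of sending them.
        body = {f: key[f] for f in COPIED_FIELDS if key.get(f) not in (None, "")}
        # ASSUMPTION: PUT /access-keys/{id} creates a key with the given id.
        path = f"/access-keys/{key['id']}"
        planned.append(("PUT", path, json.dumps(body, sort_keys=True)))
    return planned


if __name__ == "__main__":
    # Made-up sample data in the assumed export shape.
    sample = {
        "accessKeys": [
            {"id": "0", "name": "alice", "password": "example-password",
             "port": 12345, "method": "chacha20-ietf-poly1305"},
        ]
    }
    for method, path, body in plan_key_import(sample):
        print(method, path, body)
```

Keeping the password, port and method is what would let existing client configurations keep working after the move; per-key data limits are not covered by this sketch and would need to be set separately.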
Find the /opt/outline folder on the source server and transfer the entire folder to the destination server (scp -r /opt/outline root@x.x.x.x:/opt/outline), then restart the destination server. |
I have a suggestion. Since this high CPU usage is for metric calculations, I think we should write a script that deletes previous data except for the consumed volume and updates the shape of Prometheus data. This way, the issue will be resolved. The components of this script can be flexible, but in my opinion, writing it in Python would be better since it can run on all operating systems. I suspect that the new version of Prometheus is causing this issue, and updating the Outline server also updates Prometheus, which disrupts the data. |
Thank you very much, I will try. |
Application
Outline Manager
Describe the bug
Hello,
Today, Outline Manager was automatically updated to version 1.17.0.
All server resources, such as RAM and CPU, went to 100%.
When I reboot the server, it recovers, but when I open Outline Manager and try to open the management key, the resources fill up again and the server crashes.
How can I disable the automatic update of Outline Manager and use the previous version to work around the problem in the new version?
Steps to reproduce
1. Open the Outline Manager
What did you expect to happen?
No response
What actually happened?
No response
Outline Version
1.17.0
What operating system are you using?
Windows
Operating System Version
No response
Screenshots and Videos
No response