
System goes unresponsive for minutes, I suspect journalctl

We have a server application running on Ubuntu 22.04. The application logs are pretty spammy.

journalctl -S -1hour | wc --lines
121349
journalctl -S -1hour | wc --bytes
32382836
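To put those two numbers in perspective, here is a rough sketch that turns them into per-second rates. It uses only the figures from the wc output; the variable names are mine, and the byte count is the size of the rendered journalctl text, which is not necessarily what journald stores on disk.

```shell
# Figures copied from the two wc outputs (one hour of journal output).
lines=121349
bytes=32382836

# Integer division is enough for a rough estimate.
lines_per_sec=$(( lines / 3600 ))
bytes_per_sec=$(( bytes / 3600 ))
avg_line_bytes=$(( bytes / lines ))

echo "lines/s: $lines_per_sec"
echo "bytes/s: $bytes_per_sec"
echo "avg bytes/line: $avg_line_bytes"
```

That works out to roughly 33 lines and 9 KB of text per second, averaged over the hour. Bursts could be much higher than the average.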

Once in a while a server will become completely unresponsive for several minutes. The CPU will go to 100% and our applications are so unresponsive that other agents stop reporting any metrics.

When the incident is over, we notice that our application didn't crash, but it does log a bunch of timeout errors because it wasn't able to do anything for the last few minutes. It then just continues working.

I found this in /var/log/syslog

Feb 28 17:44:29 ip-10-11-0-205 kernel: [81118.252411] systemd[1]: Started ntp-systemd-netif.service.
Feb 28 17:44:53 ip-10-11-0-205 kernel: [81142.367869] systemd[1]: ntp-systemd-netif.service: Deactivated successfully.
Feb 28 17:45:13 ip-10-11-0-205 kernel: [81162.387816] systemd[1]: systemd-journald.service: State 'stop-watchdog' timed out. Killing.
Feb 28 17:45:14 ip-10-11-0-205 kernel: [81162.731840] systemd[1]: systemd-journald.service: Killing process 117 (systemd-journal) with signal SIGKILL.
Feb 28 17:45:17 ip-10-11-0-205 kernel: [81165.657264] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=9/KILL
Feb 28 17:45:17 ip-10-11-0-205 kernel: [81165.696972] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
Feb 28 17:45:18 ip-10-11-0-205 kernel: [81166.651435] systemd[1]: systemd-journald.service: Consumed 2min 5.531s CPU time.
Feb 28 17:45:25 ip-10-11-0-205 kernel: [81174.079307] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
Feb 28 17:45:25 ip-10-11-0-205 kernel: [81174.085466] systemd[1]: Stopped Journal Service.
Feb 28 17:45:25 ip-10-11-0-205 kernel: [81174.200482] systemd[1]: systemd-journald.service: Consumed 2min 5.531s CPU time.
Feb 28 17:45:29 ip-10-11-0-205 kernel: [81177.822409] systemd[1]: Starting Journal Service...
Feb 28 17:45:35 ip-10-11-0-205 kernel: [81183.821550] systemd-journald[240259]: File /var/log/journal/ec212477ed3f3049adade2e820950984/system.journal corrupted or uncleanly shut down, renaming and replacing.
Feb 28 17:45:38 ip-10-11-0-205 kernel: [81187.226010] systemd[1]: Started Journal Service.
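To pull only the journald-related events out of a busy syslog, a filter along these lines may help. This is a minimal sketch: `journald_events` is a hypothetical helper name, and the pattern is an assumption based on the message prefixes visible in the excerpt.

```shell
# Hypothetical helper: print only journald-related lines from a syslog-style file.
# Matches the unit name, the daemon's own messages, and the
# "Journal Service" start/stop notices from systemd.
journald_events() {
    grep -E 'systemd-journald|Journal Service' "$1"
}

# Usage on the affected host would be something like:
#   journald_events /var/log/syslog
```

Comparing the timestamps of the matches against the times the host became unresponsive would show whether each stall lines up with a watchdog kill like the one above.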

So it sounds like systemd-journald is causing the issue.

Questions:

  1. Is this just taking a long time to process my logs?
  2. Why does it need to hog the whole host?
  3. If yes, are there some settings I can change so this doesn't go crazy (e.g. trimming logs more frequently)?
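For reference on question 3, the kind of settings in question would live in a journald drop-in file. The sketch below writes one into a directory passed as an argument. `write_journald_dropin` and the file name `90-limits.conf` are hypothetical, and every value is an illustrative placeholder rather than a tuned recommendation. `SystemMaxUse`, `RateLimitIntervalSec`, and `RateLimitBurst` are documented journald.conf options, but nothing in the log excerpt establishes that changing them would prevent the stall.

```shell
# Hypothetical helper: write a journald drop-in into the directory given as $1.
# All values below are placeholders for illustration only.
write_journald_dropin() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/90-limits.conf" <<'EOF'
[Journal]
# Cap the disk space used by persistent journal files.
SystemMaxUse=500M
# Drop messages from a service that logs more than RateLimitBurst
# entries within RateLimitIntervalSec.
RateLimitIntervalSec=30s
RateLimitBurst=10000
EOF
}

# On a real host (as root) the target would be /etc/systemd/journald.conf.d,
# followed by: systemctl restart systemd-journald
```

The function takes the directory as a parameter so the file can be generated and inspected somewhere harmless before anything under /etc is touched.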
