Hello Ubuntu community,
I’m experiencing random shutdowns and reboots on my Lenovo ThinkPad P14s Gen 4 running Ubuntu 24.04 LTS, and I’m looking for help in diagnosing the issue. These reboots happen suddenly, without any freezing or warning, and I’ve been unable to reproduce the issue on demand.

System Specifications:
Model: Lenovo ThinkPad P14s Gen 4
Processor: Intel i7-1370P vPro (13th Generation)
RAM: 32 GB
Storage: 1 TB SSD
Graphics: NVIDIA RTX A500 4GB GDDR6, currently with the nouveau driver, but also tested with the proprietary nvidia 550 driver
Ubuntu Version: 24.04 LTS
Kernel Version: 6.8.0-45-generic

Problem:
The laptop randomly shuts down and reboots without freezing first. It occurs during both light and heavier usage, and I cannot replicate the problem consistently. There is almost no load on the system, since I mostly use it to connect to remote machines via Remmina. The fan is mostly silent, except at startup. Sometimes I can work for days without a crash, and sometimes it crashes multiple times per day, without a clear pattern.
The logs show quite a lot of errors. In particular, the message HANDLING IBECC MEMORY ERROR made me suspect a memory problem, but the memory tests I ran found no errors. A few logs are attached at the end; please let me know if you need any further information.

Steps I’ve Already Tried:
Memory Tests: I ran memtest86+ overnight for 10 passes. It reported no memory errors.
Stress Testing: I ran a CPU stress test (stress --cpu 20 --timeout 300) and a GPU stress test (FurMark_2.3.0.0_linux64), but the system remained stable and the issue did not reproduce.
Drivers and Display: I initially thought the issue might be related to MS Teams, and it does feel like it happens most often during MS Teams calls. It makes no difference whether I use Teams in a browser (with or without hardware acceleration) or via teams-for-linux / teams-for-linux --disableGpu. However, crashes also occur when Teams is not running or not in a call, for example when I only use the browser under very low load.
I now suspect the external screen connected via HDMI might be involved, though I haven’t confirmed it.
The problem occurs under both X11 and Wayland, and with both the open-source nouveau driver and different versions of the proprietary NVIDIA driver (including nvidia-driver-550). The shutdowns happen regardless of the combination.
System Updates: All packages and kernels are up to date. All firmware is up to date.
Log Review: I checked /var/log/syslog and /var/log/kern.log, but found nothing conclusive before the shutdowns.
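To make that log review repeatable, here is a rough sketch of how the lines just before each reboot could be pulled out of kern.log. It assumes that every boot starts with a line containing "Linux version", which is then used as the boot marker. The function name lines_before_boot is only illustrative and not part of any tool. After a hard power-off the last messages may never have reached the disk, so an empty or unremarkable result would not rule anything out.

```shell
# Print the last N log lines that precede each boot marker.
# Assumption: each boot begins with a "Linux version" line in the log.
# Usage: lines_before_boot N [FILE...]   (reads stdin if no FILE is given)
lines_before_boot() {
  n="$1"
  shift
  awk -v n="$n" '
    /Linux version/ {
      if (NR > 1) {
        print "--- lines before boot marker at line " NR " ---"
        start = NR - n
        if (start < 1) start = 1
        # The ring buffer still holds lines NR-n .. NR-1 at this point.
        for (i = start; i < NR; i++) print buf[i % n]
      }
    }
    # Store every line in a ring buffer of size n.
    { buf[NR % n] = $0 }
  ' "$@"
}
```

For example, lines_before_boot 20 /var/log/kern.log would show the 20 lines logged before each boot in that file.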
Certification: According to this page (https://ubuntu.com/certified/202306-31718), the Lenovo ThinkPad P14s Gen 4 is even Ubuntu-certified for 22.04 LTS. While I’m running 24.04 LTS, I assume the certification should apply without significant issues.

Looking for Help:
I’m looking for insights or troubleshooting suggestions, specifically:
Any potential causes for random shutdowns and reboots, particularly related to external displays or graphics.
Recommendations on further diagnostics or log files that could help identify the problem.
Any kernel parameters or configuration changes that could stabilize the system.

Thanks in advance for any help!
Best regards,
Ketos

x@dexdev:~$ sudo dmesg | grep -i error
[sudo] password for x: 
[ 1.533738] RAS: Correctable Errors collector initialized.
[ 5.862150] EDAC igen6 MC1: HANDLING IBECC MEMORY ERROR
[ 5.862153] EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR

x@dexdev:~$ grep -i "error\|fail" /var/log/kern.log
2024-07-24T06:39:35.891513+02:00 dexdev kernel: pci 0000:03:00.0: ROM [mem size 0x00080000 pref]: failed to assign
2024-07-24T06:39:35.891712+02:00 dexdev kernel: RAS: Correctable Errors collector initialized.
2024-07-24T06:39:35.891973+02:00 dexdev kernel: EDAC igen6 MC1: HANDLING IBECC MEMORY ERROR
2024-07-24T06:39:35.891975+02:00 dexdev kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
2024-07-24T06:39:35.892036+02:00 dexdev kernel: thermal thermal_zone8: failed to read out thermal zone (-61)
2024-07-24T06:39:35.892038+02:00 dexdev kernel: Bluetooth: hci0: Failed to send firmware data (-71)
2024-07-24T06:39:35.892038+02:00 dexdev kernel: Bluetooth: hci0: sending frame failed (-19)
2024-07-24T06:39:35.892038+02:00 dexdev kernel: Bluetooth: hci0: FW download error recovery failed (-19)
2024-07-24T06:39:35.892039+02:00 dexdev kernel: Bluetooth: hci0: sending frame failed (-19)
2024-07-24T06:39:35.892039+02:00 dexdev kernel: Bluetooth: hci0: Reading supported features failed (-19)
2024-07-24T06:39:35.892040+02:00 dexdev kernel: Bluetooth: hci0: Error reading debug features
2024-07-24T06:39:35.892040+02:00 dexdev kernel: Bluetooth: hci0: sending frame failed (-19)
2024-07-24T06:39:35.892040+02:00 dexdev kernel: Bluetooth: hci0: Failed to read MSFT supported features (-19)
2024-07-24T07:34:04.143778+02:00 dexdev kernel: pci 0000:03:00.0: ROM [mem size 0x00080000 pref]: failed to assign
2024-07-24T07:34:04.144218+02:00 dexdev kernel: RAS: Correctable Errors collector initialized.
2024-07-24T07:34:04.144566+02:00 dexdev kernel: EDAC igen6 MC1: HANDLING IBECC MEMORY ERROR
2024-07-24T07:34:04.144570+02:00 dexdev kernel: EDAC igen6 MC0: HANDLING IBECC MEMORY ERROR
2024-07-24T07:34:04.144706+02:00 dexdev kernel: thermal thermal_zone8: failed to read out thermal zone (-61)
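
In case it helps to judge whether the IBECC messages only show up in the boot-time batches above or also at other times, here is a small sketch that tallies them per second of log time. It assumes the ISO-style timestamps seen in the kern.log excerpt above (the first 19 characters of each line are date and time to the second). The name ibecc_by_second is made up for this example.

```shell
# Count "HANDLING IBECC MEMORY ERROR" lines per second of log time.
# Assumption: each line starts with a timestamp like 2024-07-24T06:39:35.891973+02:00
# Usage: ibecc_by_second [FILE...]   (reads stdin if no FILE is given)
ibecc_by_second() {
  awk '
    /HANDLING IBECC MEMORY ERROR/ { count[substr($1, 1, 19)]++ }
    END { for (t in count) print t, count[t] }
  ' "$@" | sort
}
```

Running ibecc_by_second /var/log/kern.log prints one line per timestamp with the number of matching messages, which can then be compared with the times of the reboots.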