I have a relatively new system. It worked for about a month without issues, but about a week ago it started to freeze completely. The freeze is so bad that I can't do anything, not even SysRq, and the only option is to power-cycle using the power button.
Characteristics:
- MB: ASUS TUF Gaming Z790-Plus WiFi
- CPU: Intel Core i9-14900F Desktop Processor 24 cores
- RAM: CORSAIR VENGEANCE DDR5 RAM 96GB (2x48GB) 5600MHz CL40 Intel XMP iCUE
- GPU: GIGABYTE GV-N1030D4-2GL GeForce GT 1030 Low Profile D4 2G
- Cooling: NZXT Kraken 240 - RL-KN-240-B1 - 240mm AIO CPU Liquid Cooler
- HDD: WD_BLACK 2TB SN850X NVMe Internal Gaming SSD Solid State Drive with Heatsink
My assumption is that I'm facing a physical component issue, but I'm not sure which one. I did look into other threads, like "Ubuntu 22.04.2 LTS freezes randomly and permanently" and "How do I diagnose my issue, when I'm not sure if it is a hardware or software issue?",
but either I was unable to follow the steps (most are for Ubuntu Desktop and may differ, as I don't have some of the components) or they didn't lead me anywhere.
The system seems to behave OK when going to GRUB -> Root (with network on): it doesn't crash at all and stays up for a long time without issues. So one culprit could be the video card, but I'm not sure yet. Also, if I keep the system off for about 20 minutes, it tends to stay up for another hour or so, but if it freezes and I try to reboot, it often freezes again instantly. It also froze with the Live Ubuntu disk running at "normal resolution". It did not freeze (at least not very quickly) with the "Safe graphics" option of "Try Ubuntu".
A few checks already:
- Motherboard - no visible issues (popped capacitors or damage)
- Motherboard - I updated the BIOS (using ASUS EZ Flash) - the issue still persists
- CPU - I did the "s-tui" test successfully
- CPU - sensors command gives around 35C for all cores
- CPU - temperatures on the CPU are around 30C; however, ASUS shows 54 for CPU Package and 40 for Core. The cooling unit has a digital LCD that displays the liquid temperature, which shows around 30C all the time
- Cleanup - apt update, upgrade, clean, autoremove
- RAM - I created a bootable MemTest86 drive and ran "Test Memory", which completed with no errors found across all 96 GB of RAM
- RAM - I also ran "memtester 6G 5" successfully
- HDD - nvme smart-log /dev/nvme0n1 - shows no visible signs of problems
- HDD - ASUS has a SMART test in the BIOS - ran successfully
- Video - I tried another video card, much older, still freezes.
- LOG - /var/crash - empty
- LOG - /var/log/syslog - not sure what to look for there...
- LOG - /var/log/dmesg - no idea what to look for
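Since I'm not sure what to look for in the logs, here is a minimal sketch of a filter that could be run over /var/log/syslog or a saved kernel log. The pattern list is my own assumption of messages that commonly accompany hardware trouble (machine checks, GPU errors, NVMe timeouts, lockups); it is not exhaustive, and the function name `suspicious_lines` is just illustrative.

```python
import re
import sys

# Assumed keyword list: common kernel messages that hint at hardware trouble.
# This is a starting point, not an authoritative or complete set.
PATTERNS = [
    r"mce:|machine check",      # CPU machine-check exceptions
    r"hardware error",
    r"NVRM: Xid",               # NVIDIA driver error reports
    r"GPU HANG",
    r"nvme .*(timeout|reset)",  # NVMe controller timeouts or resets
    r"soft lockup|hard lockup",
    r"thermal|throttl",
    r"Oops|BUG:|Call Trace",
]
SUSPICIOUS = re.compile("|".join(f"(?:{p})" for p in PATTERNS), re.IGNORECASE)


def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines matching any pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if SUSPICIOUS.search(line)
    ]


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log_file:
        for number, text in suspicious_lines(log_file):
            print(f"{number}: {text}")
```

Assuming the journal is persistent, the kernel messages from the boot that froze can be saved with `journalctl -k -b -1 > prev-boot.log` and then passed to the script as its argument. The lines nearest the end of that file are the most interesting ones, since they were written just before the freeze.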
One thing I noticed is that while in the BIOS and while working at the root command prompt it never freezes. But once I start Ubuntu in X mode, it randomly freezes without any warning...
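Because the freeze comes without warning, one option is to keep a log that records sensor readings right up to the hang. Below is a minimal sketch, assuming the kernel exposes temperatures under /sys/class/hwmon (values there are in millidegrees Celsius). The function names, the log format, and the 2-second interval are hypothetical choices of mine.

```python
import os
import sys
import time
from pathlib import Path


def read_temps(base="/sys/class/hwmon"):
    """Return {"<chip>/<sensor file>": degrees C} for every readable temp*_input."""
    temps = {}
    for sensor in sorted(Path(base).glob("hwmon*/temp*_input")):
        name_file = sensor.parent / "name"
        try:
            chip = name_file.read_text().strip() if name_file.exists() else sensor.parent.name
            # sysfs reports millidegrees Celsius, so divide by 1000
            temps[f"{chip}/{sensor.name}"] = int(sensor.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # some sensors cannot be read; skip them
    return temps


def log_once(out, base="/sys/class/hwmon"):
    """Append one timestamped line of readings and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    fields = " ".join(f"{key}={value:.1f}" for key, value in read_temps(base).items())
    out.write(f"{stamp} {fields}\n")
    out.flush()
    os.fsync(out.fileno())  # so the last line survives a hard freeze


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "a") as out:
        while True:
            log_once(out)
            time.sleep(2)  # assumed interval
```

Started from a terminal before the graphical session (for example `python3 templog.py ~/temps.log`, where the file names are placeholders), the last lines of the log after the next freeze would show whether any sensor was climbing just before the hang.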
Any other ideas?