Channel: Active questions tagged 22.04 - Ask Ubuntu

Second RTX 3090 GPU in a dual-GPU setup disappeared (shows up as ASPEED integrated graphics) after reinstalling Ubuntu 22.04 and rebooting


Problem Summary

After a system reboot, the second NVIDIA RTX 3090 GPU in one of our lab's H3C servers no longer appears in the output of nvidia-smi, lspci, or ls /dev, and Ubuntu does not seem to recognize it.

The Whole Story

We have a server built on an H3C R4900 G5 with two RTX 3090 cards installed for machine learning. It ran well until a network attack, after which the machine began to freeze occasionally and stop responding to any input. So, a few days ago, the server admin and I decided to reinstall the OS.

  1. We installed the system from a pre-made USB boot ISO. After the regular first-time setup (choosing the disk, language, etc.), the desktop displayed normally. We then installed ToDesk (remote desktop), OpenSSH, Docker, and Miniconda on the machine.

"Safe mode" was used instead of "Try or Install Ubuntu" because the normal option would get stuck on a black screen with a white cursor blinking in the top-left corner.

  2. Then the admin installed NVIDIA driver 550 and the NVIDIA Container Toolkit (to access the GPUs inside Docker containers). At this point everything worked perfectly: we could see both 3090s in nvidia-smi and run model training on them.

Note that the server had not been rebooted since the driver installation.

  3. After some days, I went to the server room to help solve a network problem. I hot-plugged a monitor's VGA connector, a USB mouse, and a USB keyboard into the back of the server (which has only one VGA port; the other VGA port is on the front), and then pressed the space key to wake up the monitor.

  4. Normally this would bring up the Ubuntu login screen. However, I only got a tiny white vertical line with a small arrow on top of it, blinking at the bottom right of the monitor, no matter how many times I pressed keys.

  5. So, after making sure no one was using the server at that time, I turned it off by pushing the power button and restarted it.

  6. After this reboot, the login screen showed up normally and I could log in to my account with no problem. But nvidia-smi now showed only one RTX 3090 card. lspci and ls /dev still show two GPUs detected, but GPU 2 is now recognized as "ASPEED Graphics Family".
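
To make the lspci observation in the last step easier to check, a small script along these lines could classify the display devices that `lspci -nn` reports. This is only a minimal sketch, not output from our server: the sample lines and bus addresses are made up for illustration, the names `classify_display_devices` and `read_lspci` are hypothetical, and it assumes the usual `lspci -nn` layout where each line ends with a bracketed `[vendor:device]` ID. The PCI vendor IDs used are 10de (NVIDIA) and 1a03 (ASPEED).

```python
import re
import subprocess

# PCI vendor IDs: 10de is NVIDIA, 1a03 is ASPEED (typically the BMC's onboard VGA).
VENDORS = {"10de": "nvidia", "1a03": "aspeed"}

# Assumed `lspci -nn` layout, e.g.:
# "17:00.0 VGA compatible controller [0300]: ... [10de:2204] (rev a1)"
LINE_RE = re.compile(
    r"^(?P<addr>\S+)\s+(?P<cls>[^\[]+?)\s+\[(?P<cls_id>[0-9a-f]{4})\]:"
    r".*\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)


def classify_display_devices(lspci_nn_output):
    """Return {vendor_name: [bus addresses]} for display-class (03xx) devices."""
    found = {}
    for line in lspci_nn_output.splitlines():
        m = LINE_RE.match(line.strip())
        # Skip lines that do not parse or are not display controllers.
        if not m or not m.group("cls_id").startswith("03"):
            continue
        name = VENDORS.get(m.group("vendor"), "other")
        found.setdefault(name, []).append(m.group("addr"))
    return found


def read_lspci():
    """Run `lspci -nn` on the local machine and return its text output."""
    return subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    ).stdout


# Illustrative sample only; addresses and revisions are invented.
SAMPLE = """\
03:00.0 VGA compatible controller [0300]: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000] (rev 41)
17:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
17:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
"""

print(classify_display_devices(SAMPLE))
```

On the server itself, `classify_display_devices(read_lspci())` would do the same for the live output. If only one address is listed under `nvidia`, that would suggest the second card is not being enumerated on the PCIe bus at all, and that the ASPEED entry is the board's own integrated graphics rather than the 3090 being misidentified.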

How can I solve this problem? Any help would be appreciated.

Hardware Information

OS: Ubuntu jammy 22.04 x86_64
Kernel: Linux 6.8.0-49-generic
Shell: bash 5.1.16
CPU: Intel(R) Xeon(R) Gold 5318Y (96) @ 3.40 GHz
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: ASPEED Technology, Inc. ASPEED Graphics Family
Memory: 5.76 GiB / 251.52 GiB (2%)
Disk (/): 95.10 GiB / 2.15 TiB (4%) - ext4, RAID 5
