I have four NVIDIA A100 PCIe 80GB GPUs. I'd like to use the first two for PCI passthrough on KVM/OpenStack and the other two for vGPU (MIG).
The first two should be bound to the VFIO driver (vfio-pci), and the other two should stay on the NVIDIA driver.
Is this possible?
Details:
# lspci -nnk | grep -A3 -i "3D Controller"
01:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:20b5] (rev a1)
	Subsystem: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:1533]
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
--
41:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:20b5] (rev a1)
	Subsystem: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:1533]
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
--
81:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:20b5] (rev a1)
	Subsystem: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:1533]
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
--
c4:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:20b5] (rev a1)
	Subsystem: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:1533]
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
~# nvidia-smi
Mon Jul 15 10:47:39 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:01:00.0 Off |                   On |
| N/A   39C    P0              43W / 300W |      0MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off | 00000000:41:00.0 Off |                   On |
| N/A   40C    P0              47W / 300W |      0MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe          Off | 00000000:81:00.0 Off |                   On |
| N/A   39C    P0              44W / 300W |     87MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100 80GB PCIe          Off | 00000000:C4:00.0 Off |                   On |
| N/A   39C    P0              45W / 300W |     87MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  2    7   0   0  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  2    8   0   1  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  2    9   0   2  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  2   11   0   3  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  2   12   0   4  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  2   13   0   5  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  2   14   0   6  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  3    7   0   0  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  3    8   0   1  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  3    9   0   2  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  3   11   0   3  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  3   12   0   4  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  3   13   0   5  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  3   14   0   6  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"

# uname -a
Linux gpu004 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.5.0-41-generic root=UUID=e8e591aa-3d0e-4e6f-aa30-43559c96fe7c ro systemd.unified_cgroup_hierarchy=0 iommu=pt intel_iommu=on default_hugepagesz=2M transparent_hugepage=never hugepages=225280 hugepagesz=2M vfio-pci.ids=10de:20b5:0000:01:00.0,10de:20b5:0000:41:00.0 vfio_iommu_type1.allow_unsafe_interrupts=1 modprobe.blacklist=nvidiafb,nouveau
Note: I also tried the plain form vfio-pci.ids=10de:20b5, both on the GRUB kernel command line and in the configuration files.
# cat /etc/initramfs-tools/modules
vfio
vfio_iommu_type1
vfio_virqfd
vfio_pci ids=10de:20b5:0000:01:00.0,10de:20b5:0000:41:00.0
vfio-pci

# cat /etc/modprobe.d/vfio.conf
options vfio_pci ids=10de:20b5:0000:01:00.0,10de:20b5:0000:41:00.0

# cat /etc/modprobe.d/blacklist-nvidia.conf
blacklist nouveau
blacklist nvidiafb

# cat blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
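
For context on the configuration above: the ids option of vfio-pci matches on vendor:device ID, and all four cards report the same ID (10de:20b5), so an ID match alone cannot separate the first two GPUs from the last two. A commonly used alternative is to select devices by PCI address through the sysfs driver_override attribute. The following is only a minimal sketch of that idea, not a verified fix for this host. It assumes root privileges, that the vfio-pci module is already loaded, and that nothing (for example a persistence daemon or an active MIG configuration) is holding the two GPUs open. SYSFS_ROOT and the function name bind_to_vfio are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: bind selected GPUs to vfio-pci by PCI address instead of by
# vendor:device ID. SYSFS_ROOT is a hypothetical override that makes dry
# runs against a scratch directory possible; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

bind_to_vfio() {
    addr="$1"
    dev="$SYSFS_ROOT/bus/pci/devices/$addr"
    if [ ! -d "$dev" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi
    # Make vfio-pci the only driver allowed to claim this device.
    echo vfio-pci > "$dev/driver_override"
    # Release the device from its current driver (e.g. nvidia), if any.
    if [ -e "$dev/driver" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi
    # Ask the kernel to re-probe the device so vfio-pci can pick it up.
    echo "$addr" > "$SYSFS_ROOT/bus/pci/drivers_probe"
}

# PCI addresses come from the command line; with no arguments nothing runs.
for addr in "$@"; do
    bind_to_vfio "$addr" || exit 1
done
```

Saved under a hypothetical name such as bind-vfio.sh, it would be invoked as `./bind-vfio.sh 0000:01:00.0 0000:41:00.0`, after which `lspci -nnk` should show which driver each card ended up with. The binding does not survive a reboot, so it would have to run early at boot (before the NVIDIA driver claims the first two cards) to be useful in practice.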