Why does my system freeze when I try to unbind the GPU from radeon?

Right now I’m trying to accomplish this: http://arseniyshestakov.com/2016/03/31/how-to-pass-gpu-to-vm-and-back-without-x-restart/

I’ve gotten everything on the host to work. DRI_PRIME is working correctly as shown below:

 $ DRI_PRIME=1 glxinfo | grep "renderer string"
 OpenGL renderer string: Gallium 0.4 on AMD HAWAII (DRM 2.43.0, LLVM 3.7.1)

 $ glxinfo | grep "renderer string"
 OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Server

The problem is that the system freezes completely when I try to move the GPU from radeon to vfio-pci using this script:

#!/bin/bash
set -x
echo "1002 67b1" > /sys/bus/pci/drivers/vfio-pci/new_id
echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo "0000:01:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
echo "1002 67b1" > /sys/bus/pci/drivers/vfio-pci/remove_id

echo "1002 aac8" > /sys/bus/pci/drivers/vfio-pci/new_id
echo "0000:01:00.1" > /sys/bus/pci/devices/0000:01:00.1/driver/unbind
echo "0000:01:00.1" > /sys/bus/pci/drivers/vfio-pci/bind
echo "1002 aac8" > /sys/bus/pci/drivers/vfio-pci/remove_id

set +x

It freezes on this line:

 echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind

The only thing I can do at that point is power off the system with the power button.
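For reference, the same new_id / unbind / bind / remove_id sequence can be wrapped in a function with a few sanity checks. This is only a sketch and does not address the freeze itself. The function name move_to_vfio and the SYSFS_ROOT variable are hypothetical, introduced here so the logic can be dry-run against a fake directory tree instead of the real /sys.

```bash
#!/bin/bash
# Sketch only: the same sysfs sequence as the script above, wrapped in a function.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# move_to_vfio <pci-address> <vendor-id> <device-id>
move_to_vfio() {
    local addr=$1 ids="$2 $3"
    local dev="$SYSFS_ROOT/bus/pci/devices/$addr"
    local vfio="$SYSFS_ROOT/bus/pci/drivers/vfio-pci"

    [ -d "$dev" ]  || { echo "no such device: $addr" >&2; return 1; }
    [ -d "$vfio" ] || { echo "vfio-pci driver directory not found" >&2; return 1; }

    # Teach vfio-pci about this vendor/device ID pair.
    echo "$ids" > "$vfio/new_id" || return 1

    # Unbind from the current driver, unless vfio-pci already claimed the device.
    if [ ! -e "$vfio/$addr" ] && [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind" || return 1
    fi

    # Bind to vfio-pci if it has not claimed the device yet.
    if [ ! -e "$vfio/$addr" ]; then
        echo "$addr" > "$vfio/bind" || return 1
    fi

    # Drop the dynamic ID again so vfio-pci does not grab matching devices later.
    echo "$ids" > "$vfio/remove_id"
}
```

Run as root, the two devices from the script would be handled with move_to_vfio 0000:01:00.0 1002 67b1 followed by move_to_vfio 0000:01:00.1 1002 aac8.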

I checked journalctl and noticed that these messages kept repeating once the system froze:

Apr 02 11:13:12 joey-arch-pc kernel: WARNING: CPU: 1 PID: 7293 at drivers/gpu/drm/radeon/radeon_gart.c:246 radeon_gart_unbind+0xca/0xe0 [radeon]()
Apr 02 11:13:12 joey-arch-pc kernel: trying to unbind memory from uninitialized GART !

Here is the rest of the messages from journalctl with call traces: http://pastebin.com/L0asXS16

I found a good number of similar bug reports via Google, but they were rather old. I also found a range of patches for similar issues, but since I am inexperienced with this sort of thing, I wasn’t sure which one would be best to use. I tried the ‘hotplug: Propagate the “ignore hotplug” setting to parent for bug #61891’ patch, but it didn’t work. I could try more patches, but it seemed more sensible to post here and see if someone has a solution before spending time blindly trying them.

Edit: I just realized that before the GART messages I get this message:

Apr 02 11:13:12 joey-arch-pc kernel: radeon 0000:01:00.0: Userspace still has active objects !
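The "Userspace still has active objects" message may mean that some process still has the card's DRM device node open when the unbind starts. A minimal sketch for checking this before unbinding is shown here. The helper name holders_of is hypothetical, and which node under /dev/dri belongs to the radeon card (for example /dev/dri/card1) is an assumption that has to be verified on the actual system.

```bash
#!/bin/bash
# Sketch only: print the PIDs of processes that hold a given file open,
# found by scanning the /proc/<pid>/fd symlinks.
# holders_of <path>
holders_of() {
    local target fd pid
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the open file; compare it to the target.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}      # strip the leading "/proc/"
            echo "${pid%%/*}"     # keep only the PID component
        fi
    done | sort -un
}
```

For example, holders_of /dev/dri/card1 run as root (so that other users' processes are visible) would list whatever still has that node open, assuming card1 is the radeon device. If the list is not empty, those processes are candidates to stop before attempting the unbind.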

Also, an update: I plan to try the latest kernel as well as the AMDGPU driver (its support for CI, i.e. Sea Islands, cards is still experimental) and see how that goes. Other than that, no progress has been made.


Attribution
Source: Link, Question Author: MonopolyMan, Answer Author: Community
