Since I needed to test meshes generated by PyTorch3D, and the PyTorch3D environment is awkward to set up on my local machine, I planned to do the rendering-related work inside a Docker container. A quick search for the keywords nvidia opengl docker suggested this should be straightforward.

Judging from several articles and the nvidia/opengl Dockerfile, the core steps are: install libglvnd0 and a few dependent libraries, configure the glvnd vendor JSON file, and set the nvidia docker environment variables. After going through all of that, however, with mesa-utils and glmark2 installed in the container, both glxinfo and glmark2 reported the vendor as:

OpenGL renderer string: llvmpipe (LLVM 11.0.0, 256 bits)

In other words, software rendering was still being used, even though nvidia-smi worked normally inside the container.
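
For reference, this is how the renderer was checked inside the container (glxinfo -B prints just the summary block; mesa-utils and glmark2 are assumed to be installed as mentioned above):

sudo apt install -y mesa-utils glmark2
glxinfo -B | grep -E "OpenGL (vendor|renderer) string"
# with hardware acceleration this should name the GPU vendor instead of llvmpipe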

Using the integrated GPU

Running glxinfo and glmark2 on the host, I found that the vendor there was not nvidia either, but Mesa Intel, i.e. the machine's integrated GPU. Fine, integrated graphics it is; if the container could at least use the iGPU, the work could continue.

Some testing showed that passing --device=/dev/dri:/dev/dri to docker run is all it takes. With that, glxinfo inside the container finally reported the integrated GPU as the vendor, and since the rendering workload was light, the code ran well enough.
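
A minimal sketch of such a run (the image name and the X11-related flags are only for illustration; --device=/dev/dri is the essential part):

docker run -it --rm \
    --device=/dev/dri:/dev/dri \
    --env DISPLAY="$DISPLAY" \
    --volume=/tmp/.X11-unix/:/tmp/.X11-unix/ \
    ubuntu:18.04 /bin/bash
# inside the container: apt update && apt install -y mesa-utils,
# then glxinfo -B should report the Intel iGPU instead of llvmpipe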

For background on Linux DRI, see Linux graphic subsystem(2)_DRI介绍.

Using the discrete GPU

That said, why did both the host and the container report the integrated GPU as the vendor rather than the Nvidia discrete GPU, even though nvidia-smi worked and CUDA code ran fine? With that question I searched for "why glxinfo not detect nvidia while nvidia-smi works" and found similar questions on the Nvidia forums, but none of them had a clear answer.

Finally, while checking the settings in nvidia-settings on the host, I noticed the three Profile options:

  • Nvidia (Performance mode)
  • Nvidia On-Demand
  • Intel (Power saving mode)

The third one is self-explanatory, but what is the difference between Nvidia Performance mode and On-Demand? Searching around turned up this thread:

Nvidia On-Demand : Ubuntu

there’s a good write up here: https://www.linuxuprising.com/2019/08/nvidia-43517-linux-beta-driver-adds.html

on-demand means the Ubuntu optimus tool now lets you have dynamic switching of nvidia, but output is limited to the laptop screen, which is ‘bumblebee mode’, or the normal ubuntu Nvidia mode, which turns on the card after you restart X; this mode uses nvidia to render everything, and external monitors work. But you can’t turn the card off without ending your X session.

This new on-demand mode is pretty clunky but if you want nvidia on battery and don’t use external monitors and find logging out and in inconvenient, it’s good.

And PSA: the gdm3 bug with optimus and modeset=1 is not fixed in 19.10, see https://bugs.launchpad.net/ubuntu/+source/gdm3/+bug/1716857

swap to lightdm or run gdm3 as root to fix it.

The key point:

on-demand means the Ubuntu optimus tool now lets you have dynamic switching of nvidia, but output is limited to the laptop screen

In other words, with an external monitor connected, doesn't On-Demand mean you are effectively stuck on the integrated GPU? I switched the profile to Performance, rebooted, and checked glxinfo on the host: the vendor had indeed changed to Nvidia.
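
On Ubuntu the same switch can also be made from the command line with prime-select (from the nvidia-prime package); a sketch, assuming the driver was installed the usual Ubuntu way:

prime-select query          # shows the active profile (e.g. nvidia / intel / on-demand)
sudo prime-select nvidia    # same as choosing "Nvidia (Performance mode)" in nvidia-settings
# reboot or restart the X session, then verify:
glxinfo -B | grep "OpenGL vendor"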

For more on On-Demand, see graphics - How NVIDIA On-Demand option works in NVIDIA X Server Settings? - Ask Ubuntu.

There are also people who explicitly advise against using On-Demand:

https://kfocus.org/wf/igpu.html

WE DO NOT RECOMMEND USING NVIDIA ON-DEMAND mode (also called Hybrid Graphics Mode) in most circumstances; it keeps both the dGPU and iGPU running, can make multi-display systems run slowly, and can use significantly more power than Intel Mode alone even when idle. For these reasons, we suggest using Intel Mode to conserve power and NVIDIA Performance Mode otherwise. One use-case that is an exception is when you are using the GPU for CUDA, OPTIX, or other compute purposes. In that instance, switching to On-Demand can provide additional VRAM and GPU power for those jobs IF YOU NEED IT. Just remember to switch back when you are done.

When using NVIDIA On-Demand, if you wish to launch an app using the NVIDIA GPU and Vulkan:

__NV_PRIME_RENDER_OFFLOAD=1 %appname%
# Example: __NV_PRIME_RENDER_OFFLOAD=1 vkcube

When using NVIDIA On-Demand, if you wish to launch an app using the NVIDIA GPU and OpenGL:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia %appname%
# Example:
sudo apt-get install mesa-utils   # Get glxgears
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears

Creating the container

After switching the profile, I created and ran a container, successfully ran OpenGL programs inside it, and displayed their windows on the host via X11 forwarding. The Dockerfile is as follows:

FROM ubuntu:18.04

ARG USER=docker
ARG UID=1000
ARG GID=1000

# create a new user with the same UID & GID but no password
RUN groupadd --gid ${GID} ${USER} && \
    useradd --create-home ${USER} --uid=${UID} --gid=${GID} --groups root && \
    passwd --delete ${USER}

# add the user to the sudo group and allow the sudo group to run sudo without a password
RUN apt update && \
    apt install -y sudo && \
    adduser ${USER} sudo && \
    echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

# configure OpenGL glvnd runtime for Nvidia
# see https://hub.docker.com/r/nvidia/opengl for details
RUN apt install -y --no-install-recommends libxau6 libxdmcp6 libxcb1 libxext6 libx11-6
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:+$NVIDIA_DRIVER_CAPABILITIES,}graphics,compat32,utility
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
# Required for non-glvnd setups.
ENV LD_LIBRARY_PATH /usr/lib/x86_64-linux-gnu:/usr/lib/i386-linux-gnu${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
RUN apt-get install -y --no-install-recommends libglvnd0 libgl1 libglx0 libegl1 libgles2
COPY 10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json

# set the default user used when entering the container
USER ${UID}:${GID}
WORKDIR /home/${USER}
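
The 10_nvidia.json copied above is the glvnd EGL vendor file; in the nvidia/opengl images it simply points glvnd at the NVIDIA EGL library, so a matching file can be created next to the Dockerfile like this:

cat > 10_nvidia.json << 'EOF'
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
EOF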

Build the image:

#!/bin/bash
set -e
USER_ID=$(id -u)
GROUP_ID=$(id -g)
docker build --build-arg USER="$USER" \
    --build-arg UID="$USER_ID" \
    --build-arg GID="$GROUP_ID" \
    --tag "opengl-docker" \
    --file ./Dockerfile \
    --progress=plain \
    .

Run the container:

#!/bin/bash
set -e
# --restart always: always restart the container if it stops (unless it is manually stopped by the user)
# --network host: use the same network as the host, so the container can connect to the host X server
# --env TERM=xterm-256color: support color output in bash
# --env DISPLAY=$DISPLAY: instructs X clients which X server to connect to
# --runtime=nvidia: enable all GPU access inside the docker. nvidia-docker2 must be installed to use this option
# --gpus all: alternative way to expose all GPUs to the container, same effect as --runtime=nvidia (not used below)
# --volume=/tmp/.X11-unix/:/tmp/.X11-unix/: allow access to the host's X socket
docker run -it \
    --name "opengl-runtime" \
    --restart always \
    --user "${USER}" \
    --workdir "${PWD}" \
    --network host \
    --env LANG=zh_CN.UTF-8 \
    --env TERM=xterm-256color \
    --env DISPLAY="${DISPLAY}" \
    --env QT_X11_NO_MITSHM=1 \
    --runtime=nvidia \
    --volume="$HOME":"$HOME" \
    --volume=/tmp/.X11-unix/:/tmp/.X11-unix/ \
    --detach \
    "opengl-docker" \
    /bin/bash
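
If X clients in the container are rejected with errors like "cannot open display", the host usually needs to allow local connections to its X server first; after that, the setup can be checked from inside the container (opengl-runtime is the container name used above):

xhost +local:                             # on the host: allow local X clients, including the container
docker exec -it opengl-runtime /bin/bash  # attach to the running container
# inside the container:
sudo apt update && sudo apt install -y mesa-utils
glxgears                                  # a window should appear on the host display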

References