Qiao

Rendering with the GPU inside a container

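I needed to test meshes generated by PyTorch3D, and the PyTorch3D environment is hard to set up on my local machine, so I decided to do the rendering work inside a Docker container. Searching for the keywords nvidia opengl docker suggested this would be easy. Judging from a few articles and the nvidia/opengl Dockerfile, the core steps are to install libglvnd0 and some dependency libraries, set up the glvnd vendor JSON file, and set the nvidia docker environment variables. But after doing all that and installing mesa-utils and glmark2 in the container, both glxinfo and glmark2 reported:

    OpenGL renderer string: llvmpipe (LLVM 11.0.0, 256 bits)

In other words, the container was still using software rendering, even though nvidia-smi worked fine inside it.

Using the integrated GPU

Running glxinfo and glmark2 on the host showed that the vendor there was not nvidia either, but MESA Intel, i.e. the machine's integrated GPU. Fair enough: if the container could at least use the integrated GPU, the work could continue. Testing showed that passing --device=/dev/dri:/dev/dri to docker run is enough. With that, glxinfo inside the container finally reported the integrated GPU as the vendor, and since the work does not demand much rendering performance, the code ran. For background on Linux DRI, see the article Linux graphic subsystem(2)_DRI介绍.

Using the discrete GPU

That said, why do both the host and the container report the integrated GPU as the vendor rather than the Nvidia discrete GPU, when nvidia-smi works and CUDA code runs? With that question in mind I searched for why glxinfo not detect nvidia while nvidia-smi works and found similar questions on the Nvidia forums, none with a clear answer. Finally, when I opened nvidia-settings on the host, I noticed three options under Profile:

    Nvidia (Performance mode)
    Nvidia On-Demand
    Intel (Power saving mode)

The third is self-explanatory, but what is the difference between Nvidia Performance mode and On-Demand? A search turned up this post:

Nvidia On - Demand : Ubuntu

    there’s a good write up here: https://www.linuxuprising.com/2019/08/nvidia-43517-linux-beta-driver-adds.html on-demand means the Ubuntu optimus tool now lets you have dynamic switching of nvidia, but output is limited to the laptop screen, which is ‘bumblebee mode’, or the normal ubuntu Nvidia mode, which turns on the card after you restart X; this mode uses nvidia to render everything, and external monitors work....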
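The check described above (reading the renderer line from glxinfo to see whether the container fell back to software rendering) can be automated. The following is a minimal sketch, not part of the original post: the function names are hypothetical, and the list of software renderer names (llvmpipe, softpipe, swrast) is an assumption about what Mesa typically reports. It parses text that has already been captured from glxinfo, so it does not need a GPU to run.

```python
import re

# Assumption: these substrings identify Mesa's CPU-based renderers.
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")


def renderer_from_glxinfo(text):
    """Return the value of the 'OpenGL renderer string' line, or None if absent."""
    match = re.search(r"^OpenGL renderer string:\s*(.+)$", text, re.MULTILINE)
    return match.group(1).strip() if match else None


def is_software_rendering(renderer):
    """True if the renderer name looks like one of the CPU-based renderers."""
    lowered = renderer.lower()
    return any(name in lowered for name in SOFTWARE_RENDERERS)


# The renderer line quoted in the post, used here as sample input.
sample = "OpenGL renderer string: llvmpipe (LLVM 11.0.0, 256 bits)\n"
renderer = renderer_from_glxinfo(sample)
print(renderer, "software" if is_software_rendering(renderer) else "hardware")
```

In practice the input text would come from running glxinfo inside the container (for example via subprocess.run with capture_output=True) rather than from a hard-coded sample.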

Use docker with Nvidia GPU in WSL2

GPU Support

First confirm that Docker Desktop is using the WSL2 backend and that your Windows and Nvidia driver versions are recent enough, then run wsl --update in a terminal with administrator privileges to update WSL. When that finishes, run the following in a terminal:

    docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

If the GPU is available, the output looks similar to:

    Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
    -fullscreen (run n-body simulation in fullscreen mode)
    -fp64 (use double precision floating point values for simulation)
    -hostmem (stores simulation data in host memory)
    -benchmark (run benchmark to measure performance)
    -numbodies=<N> (number of bodies (>= 1) to run in simulation)
    -device=<d> (where d=0,1,2.... for the CUDA device to use)
    -numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
    -compare (compares simulation results running once on the default GPU and once on the CPU)
    -cpu (run n-body simulation on the CPU)
    -tipsy=<file....

Deploying Hexo with GitHub Actions

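Since switching to Hexo for blogging, I have had to redo the NodeJS and Hexo setup every time I change computers. It is not difficult, but it kills the mood for writing, so I decided to set up GitHub CI: after I finish a post, a push automatically generates and deploys the Hexo site. That way I can focus on writing without being distracted by environment setup.

Set up the repositories

Prepare two repositories, one for the blog source and one for the static pages.

Blog source repository: any name, set to private.
Static pages repository: the name must follow the xxx.github.io format and the repository must be public; it holds the content generated by Hexo. See https://pages.github.com/

Set up the keys

To push content to the static pages repository, add an SSH key pair: the public key goes on the static pages repository and the private key goes on the source repository.

Generate the key pair:

    ssh-keygen -t ed25519 -C "your_email@example.com"

On the GitHub page of the static pages repository, add a new Deploy Key. Under Settings -> Deploy keys -> Add new, enter any Title and paste the contents of the newly generated public key as the value. Because the workflow pushes to this repository, check "Allow write access".

On the GitHub page of the source repository, go to Settings -> Secrets -> Actions and click New repository secret. Set Name to DEPLOY_KEY (the GitHub Actions workflow configured later looks the secret up by this name) and paste the contents of the newly generated private key as the Value.

Define the workflow

In the root of the source repository, create the file .github/workflows/hexo_deploy.yml with the following content:

    name: HEXO_DEPLOY
    on:
      push:
        branches:
          - master
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout source
            uses: actions/checkout@v2.5.0
            with:
              ref: master
              submodules: 'true'
          - name: Use Node.js
            uses: actions/setup-node@v3
            with:
              node-version: 18
          - name: Setup hexo
            env:
              ACTION_DEPLOY_KEY: ${{ secrets....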

Finding file hotspots with strace

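While tuning performance I ran into this problem: mechanical hard drives on domestic Chinese machines (Phytium + Kylin OS) are known to perform very poorly, so file reads and writes carry noticeable overhead. How can we trace a program's reads and writes and eliminate the unnecessary ones? That requires finding the file hotspots. filetop from the BPF Compiler Collection is a good choice for this job, but the BCC tools are not provided in the Kylin OS repositories, so I turned to strace instead.

Tracing system calls

Strictly speaking, strace cannot directly trace file reads and writes; it traces every system call that takes a file name as an argument. But whether a file is read and written frequently or merely has its status checked frequently, both are candidates for optimization, so I do not distinguish strictly between the two here.

Trace file-related system calls:

    $ strace -t -e trace=file -o strace.log COMMAND

    # --trace=file
    # Trace all system calls which take a file name as an argument. You can think of this as an abbreviation for -e trace=open,stat,chmod,unlink,... which is useful to seeing what files the process is referencing.

--trace= also accepts process, network, signal, desc, memory, and so on; see https://man7.org/linux/man-pages/man1/strace.1.html

Example

    $ strace -t -e trace=file -o strace.log fc-list
    $ cat strace.log
    18:07:24 execve("/home/tanqiao/program/hotspot/hotspot", ["hotspot"], 0x7ffc0b28a138 /* 80 vars */) = 0
    18:07:24 access("/etc/ld....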
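One possible way to turn a log like this into a hotspot ranking is to count how often each path appears. The Python sketch below is an illustration under stated assumptions, not the method from the post: it treats the first double-quoted string on each line as the file name, which fits calls such as access, stat, execve and openat(AT_FDCWD, "...") but may misattribute calls that take two paths (rename, for example). The helper names and the sample paths are hypothetical.

```python
import re
from collections import Counter

# First double-quoted string on the line, allowing backslash escapes inside it.
FIRST_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def count_paths(lines):
    """Count occurrences of the first quoted argument on each strace line."""
    counts = Counter()
    for line in lines:
        match = FIRST_QUOTED.search(line)
        if match:  # lines such as '+++ exited with 0 +++' have no path
            counts[match.group(1)] += 1
    return counts


def top_paths(lines, n=10):
    """Return the n most frequently referenced paths as (path, count) pairs."""
    return count_paths(lines).most_common(n)


# Hypothetical sample lines in the format produced by `strace -t -e trace=file`.
sample = [
    '18:07:24 access("/tmp/demo/a.conf", R_OK) = 0',
    '18:07:24 openat(AT_FDCWD, "/tmp/demo/a.conf", O_RDONLY) = 3',
    '18:07:24 stat("/tmp/demo/b.conf", 0x7ffd0000) = -1 ENOENT (No such file or directory)',
    '18:07:25 +++ exited with 0 +++',
]

for path, hits in top_paths(sample):
    print(hits, path)
```

For a real run, pass an open file instead of the sample list, for example `with open("strace.log") as f: top_paths(f)`. Paths that appear many times are the first places to look for redundant stat or open calls.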

Adding dynamic tracing points to a C++ dynamic library

List functions

To list all exported functions:

    $ perf probe -x libQt5CoreKso.so --funcs --filter '*'

If you don't add --filter '*', all functions whose names start with _ are filtered out by default.

To list all functions in their original (mangled) form:

    $ perf probe -x libQt5CoreKso.so --funcs --no-demangle --filter '*'

Combined with grep, you can find the desired function:

    $ perf probe -x libQt5CoreKso.so --funcs --no-demangle --filter '*' | grep setValue
    _ZN6kso_qt11QJsonObject10setValueAtEiRKNS_10QJsonValueE
    _ZN6kso_qt18QCommandLineOption12setValueNameERKNS_7QStringE
    _ZN6kso_qt24QVariantAnimationPrivate10setValueAtEdRKNS_8QVariantE
    _ZN6kso_qt9QSettings8setValueERKNS_7QStringERKNS_8QVariantE

Add tracing point

To add a function as a tracing point:...