Poor man’s profiler

Background

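On Linux, perf is arguably the first-choice tool for performance tuning: it requires no recompilation of the target program, can sample many kinds of events, and has relatively low overhead. It also has a clear limitation. perf records program behavior by sampling events (usually CPU events), so its results tend to reflect only one aspect of how the program performs.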

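In other words, if perf samples CPU events, the results describe only the program's on-CPU behavior, and off-CPU time has to be sampled separately. Off-CPU behavior can be analyzed indirectly by sampling sched:sched_stat_sleep, sched:sched_switch, and sched:sched_process_exit, but off-CPU bottlenecks come in many forms (I/O, thread synchronization, memory, and so on), and analyzing only specific events can easily miss some of them.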

This naturally raises a question: is there a tool that records the program's current call stack at fixed wall-clock intervals?

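Unfortunately, although perf can record the cpu-clock and task-clock events, the kernel does not provide a comparable wall-clock event. Other profiling tools such as gprof, gperftools, and Valgrind have the same limitation.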

poor man’s profiler

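The idea behind the so-called poor man’s profiler is very simple. Start the process under GDB (or attach to a running one), interrupt it at regular intervals, and inspect the current call stack. A function's cost is then roughly proportional to how often it appears in the sampled stacks.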

A simple implementation, poorman-profiler.sh:

#!/bin/bash
set -e

command=""
sample_number=100
sleep_time=0.01
output_file="poorman-profiler.log"
PID=0

# parse arguments
while getopts e:n:t:o:p: flag
do
	case "${flag}" in
		e) command=${OPTARG} ;;
		n) sample_number=${OPTARG} ;;
		t) sleep_time=${OPTARG} ;;
		o) output_file=${OPTARG} ;;
		p) PID=${OPTARG} ;;
		*) echo "unknown option ignored" >&2 ;;
	esac
done

# remove old log
if [ -f "$output_file" ] ; then
	rm -v "$output_file"
fi

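# run command in background if one was given
if [ -n "$command" ]; then
	${command} &
	PID=$!
fi

# nothing to profile without -e or -p
if [ "$PID" = 0 ]; then
	echo "either -e <command> or -p <pid> is required" >&2
	exit 1
fi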

# attach gdb periodically
for x in $(seq 1 "$sample_number"); 
do
	sleep "$sleep_time"
	# check if process is running
	if kill -s 0 $PID ; then
		gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $PID 2>/dev/null | tee -a "$output_file"
	else
		break
	fi
done

Usage:

sh poorman-profiler.sh -e "${EXEC}" -n 5 -t 1

It can also attach to a running process:

sh poorman-profiler.sh -p ${PID} -n 5 -t 1
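
Since a function's cost is taken to be proportional to how often it shows up in the sampled stacks, the log (poorman-profiler.log by default) can be summarized with a few lines of awk. The following is a minimal sketch, not part of the script above: count_functions is a hypothetical helper name, and it assumes the usual layout of GDB's `thread apply all bt` output, where each stack starts with a "Thread N (...)" line followed by "#N" frame lines.

```bash
#!/bin/bash
# Sketch: summarize a GDB backtrace log by how many sampled stacks contain
# each function. count_functions is a hypothetical helper, not part of GDB.
count_functions() {
	awk '
		# A "Thread N (...)" header starts a new sampled stack.
		/^Thread / { stacks++; split("", seen); next }
		# Frames look like "#1  0x... in func (args) ..." or "#0  func (args) ...".
		/^#[0-9]+ / {
			fn = ($3 == "in") ? $4 : $2
			# Count each function once per stack so recursion is not inflated.
			if (!(fn in seen)) { seen[fn] = 1; count[fn]++ }
		}
		END {
			for (fn in count) printf "%d/%d %s\n", count[fn], stacks, fn
		}
	' | sort -rn
}

# Usage: count_functions < poorman-profiler.log | head
```

Each output line reads "stacks containing the function / total stacks sampled", so functions near the top are the ones the program was most often inside, whether it was running or blocked.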

FlameGraph

Brendan Gregg's FlameGraph project provides scripts that generate a flame graph from the call stacks printed by GDB:

stackcollapse-gdb.pl poorman-profiler.log > out.folded # collapse the stacks
flamegraph.pl out.folded > gdb_flame.svg # render the flame graph

The generated flame graph embeds JavaScript, so it is best opened in a browser.
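
When no browser is at hand, the folded file itself can be inspected as text. The sketch below assumes the usual folded format, one line per distinct stack with semicolon-separated frames followed by a space and a sample count. top_stacks is a hypothetical helper name and not part of the FlameGraph project.

```bash
#!/bin/bash
# Sketch: print the N most frequently sampled stacks from a folded file.
# top_stacks is a hypothetical helper, not part of the FlameGraph project.
top_stacks() {
	local folded=$1 n=${2:-10}
	# Each folded line is "frame1;frame2;... count"; move the count to the front.
	awk '{ c = $NF; sub(/ [0-9]+$/, ""); print c "\t" $0 }' "$folded" |
		sort -rn | head -n "$n"
}

# Usage: top_stacks out.folded 5
```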
