(This project was formerly known as "perfrecord".)
This is a work in progress and not ready for public consumption.
samply
is a command line CPU profiler which uses the Firefox profiler as its UI.
At the moment it works on macOS and Linux. Windows support is planned.
Try it out now:
% cargo install samply
% samply record ./your-command your-arguments
This collects a profile of the ./your-command your-arguments
command and saves it to a file. Then it opens
your default browser, loads the profile in it, and runs a local webserver so that profiler.firefox.com
can symbolicate the profile and show source code and assembly code on demand.
The captured data is similar to that of the "CPU Profiler" in Instruments.
samply
is a sampling profiler that collects stack traces, per thread, at some sampling interval.
In the future it should support sampling based on wall-clock time ("All thread states") and CPU time.
samply
does not require sudo privileges for profiling (non-signed) processes that it launches itself.
samply record rustup check
generates this profile.
Profiling system-provided command line tools is not straightforward because of system-integrity protection.
Here's an example for profiling sleep
on an Intel machine:
cat /bin/sleep > /tmp/sleep; chmod +x /tmp/sleep
samply record /tmp/sleep 2
It produces this profile.
Profiling system tools on Apple Silicon machines is harder because of stricter signing requirements.
(The below was written before Linux support was added. These sections need updating.)
This is meant to be an alternative to the existing profilers on macOS:
- Instruments
- the
sample
command line tool - the dtrace scripts that people use to create flame graphs on macOS.
It is meant to overcome the following shortcomings:
sample
and the dtrace@[ustack()] = count();
script do not capture sample timestamps. They only capture aggregates. This makes it impossible to see the sequence of execution.- The Instruments command line tool does not allow specifying the sampling interval or to capture all-thread-states (wall-clock time-based) profiles. This means that you often have to initiate profiling from the UI, which can be cumbersome.
- Instruments is not open source.
- Instruments only profiles a single process (or all processes system-wide). It would be nice to have a profiler that can follow the process subtree of a command.
- Instruments is unusably slow when loading profiles of large binaries. For example, profiling a local Firefox build with debug information hangs the Instruments UI for ten minutes (!).
- Instruments has bugs, lots of them.
- It misses some features, such as certain call tree transforms, or Rust demangling.
The last two could be overcome by using Instruments just as a way to capture data, and then loading the .trace bundles in our own tool.
There are two main challenges here:
- Getting the
mach_task_self
of the launched child process into samply. - Obtaining stacks from the task.
We get the task by injecting a library into the launched process using DYLD_INSERT_LIBRARIES
.
The injected library establishes a mach connection to samply during its module constructor,
and sends its mach_task_self()
up to the samply process.
This makes use of code from the ipc-channel crate's
mach implementation.
We can only get the task of binaries that are not signed or have entitlements. Similar tools require you to use Xcode to create a build that has task_for_pid entitelments to work around this restriction.
Once samply has the mach_port_t
for the child task, it has complete control over it.
It can enumerate threads, pause them at will, and read process memory.
We use these primitives to walk the stack and enumerate shared libraries.
Stack unwinding uses the framehop
crate, which
emits high quality stacks on both x86_64 and arm64. It supports Apple's compact unwind
info format and DWARF CFI, and has heuristics for function prologues and epilogues. As
a result, stacks should always be available, even for binaries that were compiled without
frame pointers.