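TIL about LLVM XRay. The compiler sticks no-op sleds at the entry and exit of each instrumented function (by default only ones above a size threshold, or ones you mark explicitly), and a runtime library can patch the code in memory while the program is running to swap the no-ops for jumps into tracing code.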
https://llvm.org/docs/XRay.html
The idea of "dynamic instrumented profiling" where you can on-the-fly toggle profiling in specific functions is pretty cool.
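To make the toggling idea concrete, here's a toy sketch in C++. It is not how XRay is implemented: XRay rewrites the no-op sleds in the machine code itself (you build with clang's `-fxray-instrument` and the runtime exposes calls like `__xray_patch()` / `__xray_unpatch()` in `xray/xray_interface.h`), whereas this model just checks a per-function flag. Every name below (`Sled`, `patch_function`, `unpatch_function`, `g_trace`, `parse`, `render`) is made up for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Toy model only: a per-function atomic flag stands in for
// "has this function's sled been patched?". Real XRay patches code bytes.
enum class EventType { Entry, Exit };

struct Event {
    std::int32_t func_id;
    EventType type;
};

constexpr std::int32_t kMaxFuncs = 8;   // hypothetical fixed-size function table
std::atomic<bool> g_patched[kMaxFuncs]; // zero-initialized: every "sled" starts as a no-op
std::vector<Event> g_trace;             // recorded events (not thread-safe, demo only)

// RAII guard playing the role of the entry and exit sleds of one function.
class Sled {
public:
    explicit Sled(std::int32_t id)
        : id_(id), active_(g_patched[id].load(std::memory_order_relaxed)) {
        if (active_) g_trace.push_back({id_, EventType::Entry});
    }
    ~Sled() {
        // Only log an exit if the matching entry was logged.
        if (active_) g_trace.push_back({id_, EventType::Exit});
    }
private:
    std::int32_t id_;
    bool active_;
};

// Flip tracing on or off for a single function while the program runs.
void patch_function(std::int32_t id)   { g_patched[id].store(true); }
void unpatch_function(std::int32_t id) { g_patched[id].store(false); }

// Two example functions with ids 0 and 1.
int parse(int x)  { Sled sled(0); return x * 2; }
int render(int x) { Sled sled(1); return parse(x) + 1; }
```

Calling `render(3)` before any patching records nothing. After `patch_function(0)`, the same call records an entry and an exit event for `parse` only, and `unpatch_function(0)` turns it back off. The flag check here costs a load and a branch on every call; the appeal of the sled approach is that unpatched functions only pay for executing a few no-ops.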