Hi Zhiyao,

"zhiyao.ma.98--- via Devel" <devel@lists.tockos.org> writes:
> Update to include some additional observation from my end: Compiler optimization significantly affects the speed. Currently Tock's kernel uses "z" which is the slowest.
>
> On STM32F412G Discovery board @ 96 MHz, performing 10,000 ping-pong. When changing kernel compilation to use different optimization flags, the measured time and code size are shown below:

Thanks for posting this, this is really quite interesting. I did not expect the optimization level to make such a drastic difference to overall performance. Especially on the simpler microcontroller platforms, conventional wisdom would suggest that fewer and smaller instructions should correlate with better performance. I'd guess that the compiler chooses more efficient algorithms for certain primitives and performs more aggressive inlining at higher optimization levels, which may serve as a gateway to enabling other optimizations. I'm curious whether we could pinpoint these performance gains to a few subroutines.

I did a similar exploration a while back for Tock on RISC-V, which I posted to the old mailing list:

https://groups.google.com/g/tock-dev/c/FPTmNe4BAq0

This highlighted the `memcpy` intrinsic as being particularly inefficient at the `-Oz` optimization level. Supposedly this has been fixed in upstream Rust / LLVM for a couple of months or years now, but it might be good to verify that.

For RISC-V we can use the LiteX Sim target (compiling its HDL to a Verilated simulation) to generate an instruction trace with cycle-accurate information on the instructions executed by the CPU. I have a set of patches for this which I can dig up if you're curious. I don't know whether we have an equivalent for ARM Cortex-M (yet). I'm Ccing @Alex, who once had Tock running on Renode (emulating an STM ARM chip), which may also be able to generate such a trace.

-Leon
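
For anyone wanting to narrow the gains down to individual crates, here is a minimal sketch of how Cargo profile settings could be used for that. It assumes the release profile lives in the workspace Cargo.toml and that the kernel crate is named `kernel`; both are assumptions for illustration and should be checked against the actual tree. Only the `opt-level` key and the per-package override syntax are standard Cargo features.

```toml
# Sketch only: the profile layout and the crate name below are assumptions,
# not copied from the Tock tree.

[profile.release]
# Size-optimized baseline. Other accepted values: "s", 0, 1, 2, 3.
opt-level = "z"

# Hypothetical per-crate override: build only the `kernel` crate at a
# higher level while everything else stays at the baseline above.
[profile.release.package.kernel]
opt-level = 3
```

Rebuilding with the override moved from one crate to the next, and repeating the ping-pong measurement each time, should give a rough idea of which crate accounts for most of the difference. With LTO enabled the crate boundaries are less sharp, so this is at best a first approximation.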