Stack traces on ESP8266: a GDB server stub

By Marko Mikulicic

TL;DR: we’ve implemented a GDB server stub for the ESP8266 that allows you to get a full symbolic C stack trace and do limited source level debugging.

Friends often ask me what it's like to write software for embedded platforms; the usual focus is on the limited resources available (such as code size and working memory) and occasionally on the fact that you tend to be more exposed to the low-level side of the force.

However, in this day and age, when the perception of embedded platforms is distorted by handheld devices that could happily emulate my first personal computer and have spare cycles to play some tunes in the background, it’s often too easy to forget the everyday struggles that the average embedded developer faces.

At Cesanta, we are busy building an embedded web server called Mongoose, which includes a full TCP/IP network stack with drivers, as well as a TLS 1.3 stack. It lowers the bar for entering the magical and cursed world of embedded network programming. But while building it, we're even more exposed to the same problems that plague the field.

Today, I’d like to talk about tooling, and specifically about a piece of functionality provided by almost every programming environment; one that’s so easy to take for granted: stack traces.

I'm not a great fan of debuggers; personally, I'm a follower of the printf-debugging style. I've noticed that, in many cases, attaching a debugger is too much hassle to be of any practical gain.

In the past decade of my career, I wrote code and maintained infrastructure written in C/C++/Python/Java/Scala/Go, and I remember using a debugger only on a handful of occasions. Even then, it was mostly because it was the only way to figure out the chain of function calls that led to a particular failure.

Since most languages provide stack traces as a built-in feature, that usually happened only when debugging C/C++ programs.

Fast-forward to today, enter embedded development, pick ESP8266 as a platform, leave me without a debugger, and I feel stranded.

Suddenly this happens:

First of all: why do we have this problem? What’s the story of the ESP8266?

The ESP8266

ESP8266 is an incredibly cool piece of silicon; the coolest part being the price: you can get a fully functional board for $3.60 or less. No need for expensive devboards; you can use it out of the box with minimal tooling.

The product started its life as a WiFi module that you talk to via the AT command set over a serial port. Just plug it into your Arduino and you're ready to go.

But using it as a daughterboard is only the beginning of the story. The board itself comes with a 32-bit CPU, 32 KB of instruction RAM, 80 KB of data RAM and 512 KB of flash (more on some models). Soon enough, Espressif distributed an SDK that allowed you to run your own code directly on the device. Its capabilities dwarf those of the well-known AVRs, with comparable power consumption.

So how did a relatively unknown Chinese company pull that off? I cannot judge the quality of the ASIC and the RF part, which seem rather good to my untrained eye. The quality of the software SDK and its documentation, however, does raise quite a few eyebrows.

At the core of the ESP8266EX ASIC lies a very interesting Xtensa CPU designed by Tensilica, which was acquired by Cadence in 2013. Tensilica is best known for its configurable (parametric) IP cores.

They provide great tooling as well: a compiler, a debugger, a simulator and more.

The problem is that Espressif cannot ship that tooling with the SDK. Furthermore, even if you buy the Xtensa SDK from Cadence, the exact parameters used to define the ESP8266EX can only be inferred from a bunch of generated files. It's hard to even tell whether it's feasible to reverse engineer them within the time frame of an Xtensa SDK evaluation license. And even then, it's probably not worth the money unless the ESP8266 is all you care about.

While Espressif does use the Xtensa tooling internally to build their own binary blobs, users' only realistic choice is the GCC port. The particular instantiation of the CPU architecture used by the ESP8266 is dubbed the lx106.

The parametric nature of the Xtensa platform means that the actual features (including the instruction set) vary considerably between devices built on Xtensa cores. This makes it quite hard to reuse tools, or even to figure out how things are supposed to work.

The first striking characteristic of the lx106 is that it doesn't use the defining feature of the Xtensa instruction set: the register window. Code is built for the windowless CALL0 ABI instead, which has profound implications for the calling convention used on the ESP8266, and hence for the rest of the tooling.

There is an actively developed port of the GNU compiler toolchain maintained by Max Filippov (from Cadence) available at: https://github.com/jcmvbkbc/crosstool-NG/tree/lx106-g++. It’s still far from perfect, but there’s been a lot of progress recently. Kudos to Max for his devotion to the community!

The current options for debugging the ESP8266 are quite involved.

You can try the on-chip debugger (either via xt-ocd from Xtensa or via https://github.com/projectgus/openocd), but it requires JTAG cables and an ESP board with enough broken-out pins (i.e. it's not possible on an ESP-01).

The QEMU port is also still embryonic.

Stack walking

If all we want to do is to have a stack trace, one simple option would be to just do it from within the code, e.g. with something like libunwind. Unfortunately, there is no lx106 port of any library like that. Furthermore, given that most of the code is compiled without frame pointers, implementing it as a library with a reasonable size would be … challenging.

GDB solves the problem by actually analysing the code instruction by instruction, locating function prologues, undoing stack manipulations, etc. What if we could feed GDB our memory contents and let it do the hard work?

GDB server protocol

GDB supports remote debugging with a simple textual protocol. It can work over serial or network. You can find a simple description of the protocol at https://sourceware.org/gdb/onlinedocs/gdb/Packets....
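For reference, every packet on the wire is framed as $payload#cs, where cs is the sum of the payload bytes modulo 256 written as two hex digits, and the receiver acknowledges each packet with + (or - to ask for a retransmission). A minimal C sketch of the sending side; gdb_frame_packet is a hypothetical helper name, not something from the SDK or from our stub:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame `payload` as "$payload#cs" into `out` (NUL-terminated).
 * Returns the framed length, or 0 if `out` is too small.
 * Payloads containing '$', '#', '}' or '*' would need escaping,
 * which this sketch does not handle. */
size_t gdb_frame_packet(const char *payload, char *out, size_t out_size) {
  static const char hex[] = "0123456789abcdef";
  size_t len = strlen(payload);
  uint8_t sum = 0;
  size_t i;

  /* '$' + payload + '#' + two checksum digits + NUL */
  if (out_size < len + 5) return 0;

  out[0] = '$';
  for (i = 0; i < len; i++) {
    out[1 + i] = payload[i];
    sum += (uint8_t) payload[i]; /* wraps around modulo 256 */
  }
  out[1 + len] = '#';
  out[2 + len] = hex[sum >> 4];
  out[3 + len] = hex[sum & 0x0f];
  out[4 + len] = '\0';
  return len + 4;
}
```

For example, a stop reply reporting signal 5 (SIGTRAP) is framed as $S05#b8.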

There are two basic commands we have to support:

- ‘g’: dump the registers
- ‘m’: read a bunch of bytes from memory

So, all we have to do is get some code invoked when an exception occurs, figure out the state of the registers, and talk the GDB protocol over the serial port.
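The ‘m’ request has the form m<addr>,<length>, with both numbers in hex, and the reply carries the requested bytes as hex pairs. The sketch below parses the request and encodes the reply. To keep it self-contained and testable, it reads from a host buffer standing in for target memory, which is an assumption for illustration: a real stub dereferences the address directly, minding the ranges that are not byte-addressable. gdb_handle_m is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Answer an 'm' payload such as "m3fff5c28,4". "Memory" is the host
 * buffer `mem`, assumed to be mapped at target address `base`.
 * Writes the hex reply into `out` and returns 0, or returns -1 on a
 * malformed or out-of-range request (a real stub would then send an
 * error reply such as "E01"). */
int gdb_handle_m(const char *payload, const uint8_t *mem, uint32_t base,
                 size_t mem_len, char *out, size_t out_size) {
  static const char hex[] = "0123456789abcdef";
  const char *p;
  char *end;
  unsigned long addr, len, off, i;

  if (payload[0] != 'm') return -1;

  /* "m<addr>,<length>", both in hex */
  p = payload + 1;
  addr = strtoul(p, &end, 16);
  if (end == p || *end != ',') return -1;
  p = end + 1;
  len = strtoul(p, &end, 16);
  if (end == p || *end != '\0') return -1;

  /* reject ranges outside the buffer and replies that would not fit */
  if (addr < base) return -1;
  off = addr - base;
  if (off > mem_len || len > mem_len - off) return -1;
  if (out_size == 0 || len > (out_size - 1) / 2) return -1;

  /* each byte becomes two hex digits */
  for (i = 0; i < len; i++) {
    uint8_t b = mem[off + i];
    out[2 * i] = hex[b >> 4];
    out[2 * i + 1] = hex[b & 0x0f];
  }
  out[2 * len] = '\0';
  return 0;
}
```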

Getting control

I first tried to change the low-level exception vector on the Xtensa CPU directly, but with no luck.

Then I noticed that the linker script mentions a function called _xtos_set_exception_handler. XTOS is a very thin OS layer provided by the Xtensa SDK.

It turns out that _xtos_set_exception_handler lets you register a C function to be called when a given exception is triggered.

/* Route every fatal exception cause we care about to our C handler. */
ICACHE_FLASH_ATTR void gdb_init(void) {
  char causes[] = {EXCCAUSE_ILLEGAL, EXCCAUSE_INSTR_ERROR,
    EXCCAUSE_LOAD_STORE_ERROR, EXCCAUSE_DIVIDE_BY_ZERO,
    EXCCAUSE_UNALIGNED, EXCCAUSE_INSTR_PROHIBITED,
    EXCCAUSE_LOAD_PROHIBITED, EXCCAUSE_STORE_PROHIBITED};
  unsigned int i;
  for (i = 0; i < sizeof(causes) / sizeof(causes[0]); i++) {
    _xtos_set_exception_handler(causes[i], gdb_exception_handler);
  }
}

The low-level interrupt handler saves the state of the registers in a structure on the stack and invokes the C handler with the address of that structure as the first and only parameter.

Since the Xtensa is a parametric core, it's not easy to figure out exactly which documentation applies to which core. The Xtensa documentation is too generic, and while a lot can be learned from code available for other core instances, you're never sure whether it applies to the lx106.

I ended up so utterly confused that I just decided to try writing some patterns in the registers and see where they ended up. I managed to locate registers from a2 to a15, but a1 (stack pointer) appeared to be clobbered with the content of a0 (the return address).
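The pattern trick boils down to loading each register with a recognisable value before deliberately triggering an exception (that part needs inline assembly and is not shown), then searching the saved frame for those values. A sketch of the search step, assuming a made-up marker scheme where register aN holds 0xA5A500NN; the helper name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define MARKER_BASE 0xA5A50000u /* hypothetical marker: aN holds 0xA5A500NN */

/* For each register a0..a15, store in offsets[reg] the index of the word
 * in `frame` that holds its marker, or -1 if the marker is absent
 * (e.g. because the register was clobbered, as happened with a1). */
void locate_register_markers(const uint32_t *frame, size_t nwords,
                             int offsets[16]) {
  int reg;
  size_t i;
  for (reg = 0; reg < 16; reg++) {
    offsets[reg] = -1;
    for (i = 0; i < nwords; i++) {
      if (frame[i] == (MARKER_BASE | (uint32_t) reg)) {
        offsets[reg] = (int) i;
        break;
      }
    }
  }
}
```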

I later found two sources of information that back up my wild guesses and explain the missing a1 register:

The UserFrame structure definition: https://github.com/espressif/esp_iot_rtos_sdk/blob...

And the XTOS exception handler: https://github.com/qca/open-ath9k-htc-firmware/blo...

Putting it all together, we have:

struct xtos_saved_regs {
  uint32_t pc;    /* instruction causing the trap */
  uint32_t ps;
  uint32_t sar;
  uint32_t vpri;  /* current xtos virtual priority */
  uint32_t a0;    /* when __XTENSA_CALL0_ABI__ is true */
  uint32_t a[14]; /* a2 - a15 (a1 is not saved) */
};

The LITBASE register is missing, but the low-level exception handler doesn't seem to clobber it, so we can just give GDB its current value.

The key gotcha here is that, while the stack pointer is not saved, it can be inferred from the address of the struct xtos_saved_regs passed to the C exception handler: it's exactly 256 bytes below the original stack pointer.

From now on, we can just disable interrupts and wait for GDB queries:

/* The user should detach and let gdb do the talkin' */
ICACHE_FLASH_ATTR void gdb_server() {
  printf("waiting for gdb\n");
  /*
   * polling since we cannot wait for interrupts inside
   * an interrupt handler of unknown level.
   *
   * Interrupts disabled so that the user (or v7 prompt)
   * uart interrupt handler doesn't interfere.
   */
  xthal_set_intenable(0);
  for (;;) {
    int ch = gdb_read_uart();
    if (ch != -1) gdb_handle_char(ch);
  }
}
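gdb_handle_char itself isn't shown here; the sketch below is one plausible way it could assemble packets from the byte stream, one character at a time. The state machine, the buffer size and the names (struct gdb_rx, gdb_rx_feed) are assumptions for illustration. It accepts the $payload#cs framing and verifies the checksum:

```c
#include <stddef.h>
#include <stdint.h>

#define GDB_MAX_PACKET 256 /* hypothetical limit */

/* Receiver state for a byte-at-a-time packet parser. */
struct gdb_rx {
  enum { RX_IDLE, RX_PAYLOAD, RX_CSUM1, RX_CSUM2 } state;
  char buf[GDB_MAX_PACKET + 1]; /* NUL-terminated payload */
  size_t len;
  uint8_t sum;      /* running checksum of the payload */
  uint8_t expected; /* checksum sent by GDB */
};

static int hex_val(int ch) {
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  return -1;
}

/* Feed one received byte. Returns 1 when a packet with a valid checksum
 * is complete (payload in rx->buf; the stub would reply '+'), -1 on a
 * checksum or framing error (the stub would reply '-'), 0 otherwise.
 * Escaped or run-length encoded payloads are not handled. */
int gdb_rx_feed(struct gdb_rx *rx, int ch) {
  int v;
  switch (rx->state) {
    case RX_IDLE: /* skip acks and noise until a packet starts */
      if (ch == '$') {
        rx->state = RX_PAYLOAD;
        rx->len = 0;
        rx->sum = 0;
      }
      return 0;
    case RX_PAYLOAD:
      if (ch == '#') {
        rx->buf[rx->len] = '\0';
        rx->state = RX_CSUM1;
      } else if (rx->len < GDB_MAX_PACKET) {
        rx->buf[rx->len++] = (char) ch;
        rx->sum += (uint8_t) ch;
      } else {
        rx->state = RX_IDLE; /* packet too long */
        return -1;
      }
      return 0;
    case RX_CSUM1:
      if ((v = hex_val(ch)) < 0) {
        rx->state = RX_IDLE;
        return -1;
      }
      rx->expected = (uint8_t) (v << 4);
      rx->state = RX_CSUM2;
      return 0;
    case RX_CSUM2:
      rx->state = RX_IDLE;
      if ((v = hex_val(ch)) < 0) return -1;
      rx->expected |= (uint8_t) v;
      return rx->expected == rx->sum ? 1 : -1;
  }
  return 0;
}
```

Once gdb_rx_feed returns 1, the stub would dispatch on rx->buf[0] (‘g’, ‘m’, ...) and send back a framed reply.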

Talking to GDB

The next piece of the puzzle is figuring out the format of the reply to the ‘g’ command that GDB expects.

This depends on the actual GDB build. We need to use the lx106 port.

The register definitions can be found in gdb/regformats/reg-xtensa.dat in https://github.com/jcmvbkbc/crosstool-NG/blob/lx10...

From it we can derive:

struct regfile {
  uint32_t a[16];
  uint32_t pc;
  uint32_t sar;
  uint32_t litbase;
  uint32_t sr176;
  uint32_t sr208;
  uint32_t ps;
};
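Combining the two layouts, here is a sketch of how the saved frame could be turned into a ‘g’ reply: a0 and a2 to a15 come from the frame, a1 is reconstructed as the frame address plus 256, LITBASE is passed in as its current value, and each register is sent as eight hex digits in the lx106's little-endian byte order. The function names are hypothetical, reporting sr176/sr208 as zero is purely a placeholder assumption, and the saved-register array is sized to the 14 registers (a2 to a15) its comment describes:

```c
#include <stddef.h>
#include <stdint.h>

/* Saved-register frame as described above (a2..a15 are 14 words). */
struct xtos_saved_regs {
  uint32_t pc, ps, sar, vpri, a0;
  uint32_t a[14]; /* a2 - a15 */
};

/* Register file in the order the lx106 GDB expects for 'g'. */
struct regfile {
  uint32_t a[16];
  uint32_t pc, sar, litbase, sr176, sr208, ps;
};

/* `frame_addr` is the address of the saved frame (pass (uint32_t) regs on
 * the target); the interrupted stack pointer sits 256 bytes above it. */
void fill_regfile(const struct xtos_saved_regs *s, uint32_t frame_addr,
                  uint32_t litbase, struct regfile *r) {
  int i;
  r->a[0] = s->a0;
  r->a[1] = frame_addr + 256; /* reconstructed a1 */
  for (i = 0; i < 14; i++) r->a[2 + i] = s->a[i];
  r->pc = s->pc;
  r->sar = s->sar;
  r->litbase = litbase; /* not clobbered by the handler */
  r->sr176 = 0;         /* placeholder: not saved in the frame */
  r->sr208 = 0;         /* placeholder: not saved in the frame */
  r->ps = s->ps;
}

/* Encode all 22 registers as the 'g' reply payload, least significant
 * byte first. `out` needs 22 * 8 + 1 = 177 bytes. */
void gdb_encode_g(const struct regfile *r, char *out) {
  static const char hex[] = "0123456789abcdef";
  uint32_t words[22];
  int i, b;
  for (i = 0; i < 16; i++) words[i] = r->a[i];
  words[16] = r->pc;
  words[17] = r->sar;
  words[18] = r->litbase;
  words[19] = r->sr176;
  words[20] = r->sr208;
  words[21] = r->ps;
  for (i = 0; i < 22; i++) {
    for (b = 0; b < 4; b++) {
      uint8_t byte = (uint8_t) (words[i] >> (8 * b));
      *out++ = hex[byte >> 4];
      *out++ = hex[byte & 0x0f];
    }
  }
  *out = '\0';
}
```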

There are a couple of less interesting technicalities regarding the actual GDB protocol, and regarding safe memory access, since some address ranges are not byte-addressable, but that's basically it.
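On the ESP8266, IRAM and memory-mapped flash, for example, only tolerate aligned 32-bit loads; a byte load there raises a load/store error. One common way around this, sketched here as an illustration rather than as our stub's exact code, is to load the containing word and shift out the wanted byte (the shift gives the byte at that address on a little-endian CPU such as the lx106). The function name is hypothetical:

```c
#include <stdint.h>

/* Read one byte using only an aligned 32-bit load. */
uint8_t read_byte_word_aligned(uintptr_t addr) {
  const volatile uint32_t *word =
      (const volatile uint32_t *) (addr & ~(uintptr_t) 3);
  /* byte k of the word sits at bits 8k..8k+7 on little-endian CPUs */
  return (uint8_t) (*word >> (8 * (addr & 3)));
}
```

An ‘m’ handler would then call this for every requested byte instead of dereferencing a uint8_t pointer.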

Let’s see it in action:

#0 0x40242557 in crash (v7=<optimized out>, this_obj=18445899648779419648, args=18446462599806581592) at user/v7_esp.c:371
#1 0x4023c321 in i_eval_call (v7=v7@entry=0x3fff5c28, a=a@entry=0x3fff96f0, pos=pos@entry=0x3ffffe94, scope=<optimized out>,
 this_object=<error reading variable: can't compute CFA for this frame>, is_constructor=<optimized out>, is_constructor@entry=0) at user/v7.c:9977
#2 0x40239962 in i_eval_expr (v7=0x3fff5c28, v7@entry=<error reading variable: can't compute CFA for this frame>, a=0x3fff96f0,
 a@entry=<error reading variable: can't compute CFA for this frame>, pos=0x3ffffe94, pos@entry=<error reading variable: can't compute CFA for this frame>,
 scope=<optimized out>) at user/v7.c:9595
#3 0x4023bcf0 in i_eval_stmt (v7=<error reading variable: can't compute CFA for this frame>, a=<error reading variable: can't compute CFA for this frame>,
 pos=<error reading variable: can't compute CFA for this frame>, pos@entry=0x3ffffe94, scope=<optimized out>, brk=<optimized out>, brk@entry=0x3ffffe90) at user/v7.c:10487
#4 0x4023bd4a in i_eval_stmts (v7=<error reading variable: can't compute CFA for this frame>, a=<error reading variable: can't compute CFA for this frame>, pos=0x3ffffe94,
 pos@entry=<error reading variable: can't compute CFA for this frame>, end=15, scope=<optimized out>, brk=<error reading variable: can't compute CFA for this frame>)
 at user/v7.c:10053
#5 0x4023b104 in i_eval_stmt (v7=<optimized out>, a=a@entry=0x3fff96f0, pos=pos@entry=0x3ffffe94, scope=<optimized out>, brk=<optimized out>, brk@entry=0x3ffffe90)
 at user/v7.c:10088
#6 0x4024140a in v7_exec_with (v7=<optimized out>, res=res@entry=0x3fffff30, src=<optimized out>, w=<optimized out>) at user/v7.c:10607
#7 0x4024148a in v7_exec (v7=<optimized out>, res=res@entry=0x3fffff30, src=<optimized out>) at user/v7.c:10631
#8 0x402421c4 in process_js (cmd=<optimized out>) at user/v7_cmd.c:66
#9 0x4024234a in process_command (cmd=cmd@entry=0x3ffebc14 <recv_buf$3591> "crash()") at user/v7_cmd.c:128
#10 0x402423f7 in process_prompt_char (symb=<optimized out>) at user/v7_cmd.c:163
#11 0x40244a59 in rx_task (events=<optimized out>) at user/v7_uart.c:151
#12 0x40000f49 in ?? ()
#13 0x40000f49 in ?? ()

This is a stack trace of code compiled with -Og -g3; it still doesn't work with -Os. Note the “error reading variable: can't compute CFA for this frame” messages: the lx106 GDB probably still needs a bit of work (or I missed something in the GDB stub).

(UPDATE: the CFA issue has been fixed in gdb 7.9.1, available in the lx106-g++-1.21.0 branch of https://github.com/jcmvbkbc/crosstool-NG. Thanks to Angus for pointing out that the new gdb was worth trying. It doesn't fix the -Os issue, though.)

If I find some time I’ll continue working on it and add the ability to set breakpoints and resume execution, but for now it does solve our pressing problem of stack traces and I hope you can find it useful as well.

UPDATE: Since the publication of this post, Espressif has released their own gdb stub, which has a few more features; you might want to check it out: https://github.com/espressif/esp-gdbstub

Coredumps

TL;DR: we also implement coredumps.

The GDB stub requires interactive, bidirectional access to your device's serial port. This might not be practical in several scenarios, e.g. when you ship devices to other people, or when your UART0 is connected to another device and all you have is the transmit-only UART1 for logging.

When a crash happens and coredumps are enabled (the default), the firmware dumps its memory and the register file to the currently configured debug UART. Our Mongoose Flashing Tool can log your console output to a file, so you don't need to copy and paste the (longish) coredump output in order to use it.

We didn't want to spend time figuring out how to create a GDB-compatible ELF core file, so we reused the same idea and wrote a GDB stub that serves the contents of the coredump file. This allowed us to dump the memory in a very simple format: a JSON object with base64-encoded memory dumps:

---BEGIN CORE DUMP ---
{"arch": "ESP8266","REGS":{"addr": 1073651984, "data":
"hN0hQBD7/z8BAAAAAAAAAAD7/z+g3/8/YBsAAAAAAAAAAAAAHAAAAID6/z8AgP//bN0hQOGkIkC/kiJARAf/P4bdIUAgAAAAAAAAAAAAAAAAAAAAMAAAAA=="},"DRAM": {"addr": 1073643520, "data":
"AAAAAAAAAAAAAAAAAQEBAQABAAABAAAAbxkAAJJ3Mbp4AAAA/wAAAO+JJ0AAAAAAlYknQAQA
.....
AAAAAAAAAAAAAAAAAAAAAA=="}}
---- END CORE DUMP ----
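The firmware side of such a dump needs little more than a base64 encoder and a way to emit one {"addr": ..., "data": ...} region. A minimal sketch, not our actual implementation: the names are hypothetical, and it formats into a buffer with snprintf instead of writing straight to the UART so that it stays self-contained:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard base64 (RFC 4648) encoding of `len` bytes into `out`, which
 * must hold 4 * ((len + 2) / 3) + 1 bytes. Returns the output length. */
size_t base64_encode(const uint8_t *in, size_t len, char *out) {
  static const char tbl[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t i, o = 0;
  for (i = 0; i + 2 < len; i += 3) { /* full 3-byte groups */
    uint32_t v = ((uint32_t) in[i] << 16) | ((uint32_t) in[i + 1] << 8) |
                 in[i + 2];
    out[o++] = tbl[(v >> 18) & 63];
    out[o++] = tbl[(v >> 12) & 63];
    out[o++] = tbl[(v >> 6) & 63];
    out[o++] = tbl[v & 63];
  }
  if (i < len) { /* 1 or 2 trailing bytes, padded with '=' */
    uint32_t v = (uint32_t) in[i] << 16;
    if (i + 1 < len) v |= (uint32_t) in[i + 1] << 8;
    out[o++] = tbl[(v >> 18) & 63];
    out[o++] = tbl[(v >> 12) & 63];
    out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
    out[o++] = '=';
  }
  out[o] = '\0';
  return o;
}

/* Format one region like "DRAM": {"addr": 1073643520, "data": "..."}.
 * `b64` is scratch space of at least 4 * ((len + 2) / 3) + 1 bytes.
 * Returns what snprintf returns. */
int format_region(char *out, size_t out_size, const char *name,
                  uint32_t addr, const uint8_t *mem, size_t len, char *b64) {
  base64_encode(mem, len, b64);
  return snprintf(out, out_size, "\"%s\": {\"addr\": %lu, \"data\": \"%s\"}",
                  name, (unsigned long) addr, b64);
}
```

Decimal addresses keep the JSON trivially parseable; 1073643520 in the dump above is 0x3FFE8000, the start of DRAM.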

The coredump “serving” script takes a log file as input and scans it backwards until it locates a pair of “BEGIN CORE DUMP” / “END CORE DUMP” markers. This allows you to just keep logging the serial output and run the debugging script whenever your device crashes without having to worry about preparing the coredump manually.
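The backwards scan is simple enough to sketch. The actual script isn't shown in this post, so the C below, with its hypothetical helper names, only illustrates the idea: find the last END marker, then the closest BEGIN marker before it, and return the lines between the two marker lines:

```c
#include <stddef.h>
#include <string.h>

/* Last occurrence of `needle` lying entirely within hay[0..limit). */
static const char *find_last(const char *hay, size_t limit,
                             const char *needle) {
  size_t n = strlen(needle), i;
  if (n == 0 || n > limit) return NULL;
  for (i = limit - n + 1; i-- > 0;) {
    if (memcmp(hay + i, needle, n) == 0) return hay + i;
  }
  return NULL;
}

/* Locate the most recent complete core dump in a log. On success stores
 * the payload span (between the BEGIN and END marker lines) in *start and
 * *end and returns 0; returns -1 if there is no complete dump. */
int find_last_core_dump(const char *log, size_t len,
                        const char **start, const char **end) {
  const char *e = find_last(log, len, "END CORE DUMP");
  const char *b, *p;
  if (e == NULL) return -1;
  b = find_last(log, (size_t) (e - log), "BEGIN CORE DUMP");
  if (b == NULL) return -1;
  /* payload starts after the newline ending the BEGIN marker line */
  p = memchr(b, '\n', (size_t) (e - b));
  if (p == NULL) return -1;
  *start = p + 1;
  /* payload ends where the END marker line begins */
  p = e;
  while (p > *start && p[-1] != '\n') p--;
  *end = p;
  return 0;
}
```

Matching on the marker text rather than the surrounding dashes keeps the search tolerant of slightly different decorations on the two marker lines.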

Since our GDB stub implementation is mostly focused on postmortem debugging anyway, the only advantage of the interactive GDB stub is that you don't have to wait for the coredump to be transferred over serial.