Saturday, May 21, 2016

Adventures with NodeMCU

I've always wanted to build a retro-themed display for some weather data, and I've been thinking about how to do this for a few years. Recently, I started to assemble the hardware to make it a reality.

The essential piece of the system is an old-fashioned looking analog meter with a simple mechanism to choose the variable to be displayed (temperature, humidity, etc). I always wanted to be able to display two values on the same meter, so I needed a drive mechanism that could handle that. Eventually I found the VID28-05, which is a dual instrument stepper motor. These are designed for displays like car instrument panels, so they are made in large volumes and hence are economical! They can also be driven at 5 volts at low current.

The device that seemed to be suitable to drive these was the NodeMCU -- this is an ESP8266 based board that is very cheap but includes programming hardware and standard pin spacings. It is programmed in Lua -- which is great for prototyping.

The interface to the variable selector device was a cheap rotary encoder (as used in car stereo equipment) and I wrote a module for the NodeMCU to provide a sensible interface. In the course of doing this, I fixed a number of other issues with the base Lua firmware and ended up as a contributor to the nodemcu-firmware project.

One of the big issues with the ESP8266 chipset is that there is very limited RAM available and this is normally the limiting constraint on writing Lua code -- it all gets loaded into RAM at runtime and then interpreted.

It occurred to me that if this could be copied into the flash memory (of which there is a lot), and it could be executed directly, then this would enable much larger applications to be written. More importantly it would allow larger sets of standard libraries to be written and shared.

The base object in Lua for a piece of code is a 'function' which corresponds to the C structure 'Proto'.

typedef struct Proto {
  CommonHeader;
  TValue *k;  /* constants used by the function */
  Instruction *code;
  struct Proto **p;  /* functions defined inside the function */
  unsigned char *packedlineinfo;
  struct LocVar *locvars;  /* information about local variables */
  TString **upvalues;  /* upvalue names */
  TString  *source;
  int sizeupvalues;
  int sizek;  /* size of `k' */
  int sizecode;
  int sizep;  /* size of `p' */
  int sizelocvars;
  int linedefined;
  int lastlinedefined;
  GCObject *gclist;
  lu_byte nups;  /* number of upvalues */
  lu_byte numparams;
  lu_byte is_vararg;
  lu_byte maxstacksize;
} Proto;

It was fairly easy to copy the 'code' to flash and then replace the pointer to point at the readonly copy. Very quickly I discovered that, after writing to the flash directly, the memory mapped, readonly, view of the flash did not update. The documentation on the ESP8266 is pretty rudimentary. The chip is an Xtensa lx106 core with a number of custom peripherals designed by Espressif.

After some experimentation, it appears that if you read memory at +32k and +64k, then the original cached data is lost and so, if you access it again, then the data is fetched from the flash chip. I haven't done the experiments to see if the cache can be flushed with a single read.
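The workaround described above can be sketched in C roughly as follows. This is only an illustration: the helper name `reread_after_evict` and the two constants are made up for this example, and the 32k and 64k offsets come from the experiment described above, not from any Espressif documentation. On real hardware `addr` would point into the memory-mapped flash window.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets observed experimentally in the text; not documented values. */
#define EVICT_OFFSET_1 (32u * 1024u)
#define EVICT_OFFSET_2 (64u * 1024u)

/* Hypothetical helper: read the words 32k and 64k above `addr` so that the
 * cached copy of `addr` is displaced, then read `addr` again.
 * The caller must ensure that addr + 64k is still inside the mapped region. */
uint32_t reread_after_evict(const volatile uint32_t *addr)
{
    const volatile uint8_t *base = (const volatile uint8_t *)addr;
    uint32_t sink;

    sink  = *(const volatile uint32_t *)(base + EVICT_OFFSET_1);
    sink ^= *(const volatile uint32_t *)(base + EVICT_OFFSET_2);
    (void)sink;  /* the values are irrelevant; only the reads matter */

    return *addr;  /* on the ESP8266 this should now come from the flash chip */
}
```

The `volatile` qualifiers are there so that the compiler does not drop the two apparently useless reads.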

However, it turns out that just moving the code into flash doesn't get much memory back. A lot is consumed in strings (the constants, the local variable names, the upvalue names etc). There is a 16 byte Lua header for each string, and an 8 (or possibly 16) byte memory management overhead per block. This eats into the 48k of RAM that is available. So the next step was to move the strings (represented as TString) into flash. The code seemed fairly straightforward...
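To get a feel for how quickly this adds up, here is a small C sketch of the arithmetic. The 16 byte header and the 8 byte allocator overhead are the figures quoted above; the function name, the rounding of each block to a multiple of 4 bytes, and the use of the smaller overhead figure are assumptions made for this example.

```c
#include <stddef.h>

#define TSTRING_HEADER_BYTES 16u  /* Lua header per string (from the text) */
#define ALLOC_OVERHEAD_BYTES  8u  /* allocator overhead; possibly 16 */

/* Estimated RAM used by one string of `len` characters:
 * header + characters + terminating NUL, rounded up to a multiple of 4
 * (an assumption), plus the allocator overhead for the block. */
size_t string_ram_cost(size_t len)
{
    size_t payload = TSTRING_HEADER_BYTES + len + 1u;
    payload = (payload + 3u) & ~(size_t)3u;
    return payload + ALLOC_OVERHEAD_BYTES;
}
```

Under these assumptions a three character string such as "GET" costs 28 bytes of RAM, so short identifiers and constants are dominated by overhead rather than by their contents.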

However, it didn't work except in the simplest case. The platform would lock up until the watchdog expired and triggered a reset. I had my suspicions that the garbage collector might be trying to write to my flash strings, but this should cause an exception rather than a watchdog timeout.

After some time, I recalled that the NodeMCU code had a custom exception handler that handled exceptions on 8 or 16 bit loads from flash. Apparently, the glue logic to the flash chip can only handle 32 bit loads (although it isn't clear whether this is always true or only when there is a cache miss). It turns out that the exception handler also gets triggered when there is a store to the flash region. The exception handler detects that it is a store, and then (effectively) does a busy wait till the watchdog times out. The underlying SDK (from Espressif) tries to register interrupt handlers so that it can print out a nice message and save the exception parameters for the next reboot. It was a quick fix to make writes to the flash trigger an immediate crash.
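The core of what such a handler has to do for an 8 bit load can be sketched in C as below. This is not the NodeMCU handler itself, which has to work from the faulting instruction and registers; it only shows the arithmetic of turning a byte access into an aligned 32 bit load. The function name is made up, and little-endian byte order is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the byte at `offset` from a region that only supports aligned
 * 32 bit loads. Assumes little-endian byte numbering within each word. */
uint8_t load_byte_32(const uint32_t *region, size_t offset)
{
    uint32_t word  = region[offset / 4u];             /* aligned 32 bit load */
    unsigned shift = (unsigned)(offset % 4u) * 8u;    /* byte lane in the word */
    return (uint8_t)(word >> shift);
}
```

A 16 bit load works the same way with a 16 bit mask, provided the access does not straddle two words.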

This did help me track down a number of places in the garbage collector where it was trying to 'mark' my readonly TString objects. I fixed these.

I started out testing with the following code:

function validate(method)
   local httpMethods = {GET=true, HEAD=true, POST=true, PUT=true, DELETE=true, TRACE=true, OPTIONS=true, CONNECT=true, PATCH=true}
   return (httpMethods[method])
end

Once copying to flash no longer crashed the platform immediately, I tried to exercise the code above (after it was copied to flash).

> validate("GET")
nil

What??? After lots more investigation, it turns out that the table implementation in Lua relies on the fact that two strings with the same value ("GET") are represented as the same pointer. This is no longer true once the value inside the function is stored in flash, and the interactive prompt version is located in RAM. 

I fixed the rawequal function so that it would compare the values of strings (without any significant performance penalty). It then turned out that the table implementation also used another equality checking function, so I needed to fix that as well.
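The idea behind the equality fix can be sketched in C like this. `DemoString` is a simplified stand-in and not the real TString layout, and the function name is made up. The sketch also assumes that the flash copy and the RAM copy of a string carry a hash computed by the same function.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a Lua string object; not the real TString. */
typedef struct {
    unsigned int hash;   /* assumed identical for equal strings */
    size_t len;
    const char *data;
} DemoString;

/* Returns 1 if the two strings are equal, 0 otherwise. */
int demo_string_equal(const DemoString *a, const DemoString *b)
{
    if (a == b)
        return 1;                         /* same object: the usual fast path */
    if (a->hash != b->hash || a->len != b->len)
        return 0;                         /* cheap rejection, no byte compare */
    return memcmp(a->data, b->data, a->len) == 0;
}
```

The pointer comparison still handles the common case where both strings live in RAM, and the hash and length checks reject most unequal pairs, so the byte comparison should only run for a flash string and a RAM string that are very likely equal.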

It feels as though I am heading down a rabbit hole.

The current state is that the platform still triggers a watchdog timeout for complicated cases, but simple cases now work. It is a significant reduction in the amount of memory consumed by code. I am hopeful that I can get the code to work reliably. Then the task will be to clean it up and make sure that there is no penalty when this copy-to-flash mode is not compiled in.



2 comments:

  1. Sometimes the journey is more interesting than the destination.

  2. Someone said the other day: "The journey is its own reward. That's rubbish. The reward at the end is the reward. The journey should be rewarding."
