Monday, April 7, 2014

I fixed my bug!

For some time, the main GPS synchronized NTP server (and PTP master) has had unexplained 1 second jumps in the timebase. It jumps backwards for a second or two and then returns to normal. I kept adding debugging to the code (which would cause stack overflows as the vsnprintf stack frame is huge) to no avail.

Finally, I was staring at the code and saw the line:

   uint32_t tow = get32(packet) / 1000;

I'm using a uBlox 6 GPS receiver and running them in binary mode. This code was from the handling of the UBX-NAV-TIMEGPS message. The TimeOfWeek field is the number of milliseconds since the start of the week of the most recent navigation solution. I had missed this subtlety and had assumed that it was the time of the most recent PPS pulse. It turns out that sometimes the millisecond time of week does not fall on a second boundary but maybe a millisecond or so different. Given that the division does truncation and not rounding, there was my problem. Solution simple.

So, how much difference does it make? A lot. Also, I integrated the latest PPSI servo code, and a 12 hour run looks like:


The blue line is the frequency controlled by the PTP slave, while the red line is the frequency as determined by reference to another GPS receiver. This shows remarkable agreement with only a few excursions.


This shows that the offset of the PTP secondary was controlled well, with only a few excursions to as much as a microsecond. Note that the PTP master is running off a different GPS receiver (general purpose) than the PTP slave. Therefore some differences are to be expected.

Zooming in some more:


For most of the time, the offset was within maybe 400 nanoseconds of the GPS receiver. This seems remarkably good.

Thanks to Pietro for his work on the servo code: recent message.