Saturday, March 29, 2014

New servo -- almost a full day

I left the code running for a full day (or so) and this is the result:

The red line is the actual clock frequency offset as determined by the GPS PPS pulses. The blue line is what the PPSI servo is controlling. On average, it does a very good job. However, there are a bunch of excursions that run into the adjustment limit of 500,000 PPB (500 ppm).

This is the same chart, except with auto scaling:

There is clearly something horrible going on. I don't know what!

If I look more closely at these excursions, it appears that the clock is being stepped by some amount (up to 0.25 seconds), and that error is then recovered by adjusting the frequency. What is not clear (at the moment) is whether the fault is in the servo code or in the surrounding code that drives the clock.
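As a rough sanity check on those numbers, here is how long it takes to slew out a time error when the frequency adjustment is pinned at its limit. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the adjustment stays at the limit for the whole recovery, which a real servo would not quite do as it converges.

```python
def recovery_time_seconds(step_seconds, max_slew_ppb):
    """Seconds needed to remove a time error at a constant frequency offset.

    Hypothetical helper, not part of PPSI. 1 PPB of frequency offset
    corrects 1 nanosecond of time error per second.
    """
    step_ns = abs(step_seconds) * 1e9
    return step_ns / max_slew_ppb


# A 0.25 second error at the 500,000 PPB limit
print(recovery_time_seconds(0.25, 500_000))
```

That works out to roughly 500 seconds, or a bit over 8 minutes, per worst-case excursion.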

***UPDATE***

It turns out that there are a number of problems. One is that when the PTP master reboots, it sends out the wrong time for a while. Another is that sometimes something happens and the mean path delay to the master goes negative and gets stuck there.
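For reference, the standard PTP delay request-response calculation derives the mean path delay from four timestamps: t1 (master sends Sync), t2 (slave receives it), t3 (slave sends Delay_Req) and t4 (master receives it). The sketch below is not the PPSI code; the function names and the sample numbers are invented for illustration. It shows one way a negative value can arise, namely the slave clock being stepped between t2 and t3. Whether that is what is happening here is only a guess.

```python
def mean_path_delay(t1, t2, t3, t4):
    """Standard PTP mean path delay, all timestamps in nanoseconds."""
    return ((t2 - t1) + (t4 - t3)) / 2


def offset_from_master(t1, t2, t3, t4):
    """Slave clock offset from the master, in nanoseconds."""
    return (t2 - t1) - mean_path_delay(t1, t2, t3, t4)


def is_plausible(delay_ns):
    """A physical path delay cannot be negative (hypothetical sanity check)."""
    return delay_ns >= 0


# Normal exchange: the two one-way measurements are 100 ns and 50 ns
normal = mean_path_delay(1000, 1100, 1200, 1250)

# Same exchange, but the slave clock jumps forward 500 ns before t3
stepped = mean_path_delay(1000, 1100, 1700, 1250)

print(normal, is_plausible(normal))
print(stepped, is_plausible(stepped))
```

A check like `is_plausible` would at least let the servo discard a bad sample instead of filtering it in, which might help it avoid getting stuck.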

The PTP master reboots for reasons that I don't understand either. I have a watchdog set to reboot the board if the underlying processing stops. However, there is no easy way to figure out what went wrong in this case. I've added more debugging....