[LEAPSECS] Leap seconds ain't broken, but most implementations are broken

Thu Jan 5 09:34:20 EST 2017

On 2017-01-05 05:56 AM, Tony Finch wrote:
> Martin Burnicki <martin.burnicki at burnicki.net> wrote:
>> Please note that NTP servers not necessarily need to be providers for
>> leap second files. There are some well known sites which provide this
>> file, and the NTP software package from ntp.org comes with a script
>> which can be used to update the file automatically.
> I was thinking more that an NTP client or server would use its leapseconds
> file for validating LI bits from servers and for determining when they
> should leap.
>
> My thinking is that routine software patching and security updates happen
> often enough that maybe NTP can get leap second more reliably out-of-band
> instead of using in-band leap indicators from upstream servers.
> Lower-stratum devices could use their own leap second information to
> correct for operational or implementation errors upstream.
>
>> The potential approach with tzdist or special DNS allowed for a
>> distributed system, where the special DNS can only provide leap second
>> warning and the current TAI offset, while tzdist also provides the leap
>> second history, and a way to update time zone rules, so it could be
>> generally used to keep also conversion to local time correct.
> Oh, I forgot about the DNS publication scheme. That would also help a lot
> if it were implemented. And maybe better than relying on sufficiently
> frequent software updates.

The thing is, since 1972 no common, official way to obtain Leap Second
information automatically has been adopted. It's an obvious missing
link, missing for over four decades! I'd find this just incredible,
except that I've now come to accept this sort of frustration where
timekeeping is concerned. This has been discussed many times on LEAPSECS.

Ideally there would be one, and only one, TAI-UTC table, in some very
well specified form, residing somewhere in cyberspace, administered and
maintained by the IERS under official rules and regulations. There could
then be many APIs, via many technologies, to access the information.
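
To make "very well specified form" concrete, here is a minimal sketch that
reads a table laid out like the leap-seconds.list file used with NTP: data
lines hold an NTP timestamp (seconds since 1900-01-01) and the TAI-UTC value
in effect from that instant, and a "#@" line holds the expiration. The
function names and the two-entry SAMPLE are my own illustration, not any
official API or an official copy of the file.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count seconds from 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_datetime(seconds):
    """Convert an NTP-era seconds count to an aware UTC datetime."""
    return NTP_EPOCH + timedelta(seconds=seconds)


def parse_leap_table(text):
    """Return (entries, expires) from leap-seconds.list style text.

    entries is a sorted list of (effective_datetime, tai_minus_utc).
    expires is the datetime from the '#@' line, or None if absent.
    """
    entries = []
    expires = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#@"):
            expires = ntp_to_datetime(int(line[2:].split()[0]))
        elif line and not line.startswith("#"):
            fields = line.split()
            entries.append((ntp_to_datetime(int(fields[0])), int(fields[1])))
    entries.sort()
    return entries, expires


def tai_minus_utc(entries, when):
    """Return TAI-UTC in effect at 'when', or None if before the table."""
    offset = None
    for effective, value in entries:
        if effective <= when:
            offset = value
    return offset


# Illustrative excerpt only (hypothetical sample, not the full table).
SAMPLE = """\
#@ 3707596800
3644697600 36 # 1 Jul 2015
3692217600 37 # 1 Jan 2017
"""

if __name__ == "__main__":
    table, expiry = parse_leap_table(SAMPLE)
    now = datetime(2017, 1, 5, tzinfo=timezone.utc)
    print(tai_minus_utc(table, now), expiry.isoformat())
```

The point of the sketch is that a receiver needs three things from one
source: the history, the current offset, and the expiration.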

Today, there are many ways to get it, but they are all in different
forms, not always so well specified, and all require some human
intervention or oversight somewhere between the IERS announcements and
the distribution. None of them is particularly "fast", and in some
circumstances they impose too much overhead on the receiver. And "many
ways to get it" does not inspire confidence that every implementation
will end up with the same data.

I like the DNS publication scheme because you could imagine that IANA 
could take responsibility for maintaining it, especially if there were 
an official way to keep it automatically updated from a hypothetical 
official IERS source.

It would also be straightforward to have NTP servers provide the TAI-UTC
table, announcements, and expiration via the same IPC transport
mechanisms used by NTP. Again, hopefully, updated from a hypothetical
official IERS source.

I've had the thought that Block-chain would be a good way to do it. It 
would have all the purported anonymity, security, and persistence 
qualities of Block-chain. In such a scheme, only IERS could make updates 
to it and everybody else could read it.

It makes sense that the Leap Second Table be combined with time zones,
as it is in the Tz Database, because you really need all the local time
information together with Leap Seconds to achieve comprehensive
timekeeping. It occurs to me that the Tz Database could also be
maintained as a Block-chain.

The typical method used in NTP, GPS, and PTP, distributing only the
upcoming Leap Second announcement, has always seemed fragile and
incomplete to me. The lack of reliable automatic information from IERS
means some human intervention must occur in the announcements, and the
information is incomplete, communicating only the next upcoming Leap
Second. Many systems will need to go elsewhere to get the full table and
expiration, and this leads to possible mismatches between information
sources, as noted in this thread.
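
As a rough illustration of the mismatch problem, the sketch below compares an
in-band leap indicator with what a locally held table predicts. The LI values
0, 1, and 2 are the NTP meanings (no warning, insert, delete). Everything else
is an assumption of mine: the function names, the 28-day warning window, and
the three result strings.

```python
from datetime import datetime, timedelta, timezone

LI_NONE, LI_INSERT, LI_DELETE = 0, 1, 2

# Assumption: treat a leap as "announced" during the 28 days before it.
WARNING_WINDOW = timedelta(days=28)


def expected_li(entries, expires, now):
    """Predict the leap indicator from a local table.

    entries: list of (effective_datetime, tai_minus_utc).
    Returns None when the table is missing an expiration or has expired,
    because the table can then say nothing reliable about the future.
    """
    if expires is None or now >= expires:
        return None
    previous = None
    for effective, offset in sorted(entries):
        if effective > now:
            # First entry still in the future: is it close, and which sign?
            if previous is not None and effective - now <= WARNING_WINDOW:
                if offset - previous == 1:
                    return LI_INSERT
                if offset - previous == -1:
                    return LI_DELETE
            return LI_NONE
        previous = offset
    return LI_NONE


def check_announcement(upstream_li, entries, expires, now):
    """Classify an upstream LI as 'agree', 'mismatch', or 'unverifiable'."""
    expected = expected_li(entries, expires, now)
    if expected is None:
        return "unverifiable"
    return "agree" if upstream_li == expected else "mismatch"


if __name__ == "__main__":
    utc = timezone.utc
    table = [(datetime(2015, 7, 1, tzinfo=utc), 36),
             (datetime(2017, 1, 1, tzinfo=utc), 37)]
    expiry = datetime(2017, 6, 28, tzinfo=utc)
    now = datetime(2016, 12, 20, tzinfo=utc)
    print(check_announcement(LI_NONE, table, expiry, now))
```

The "unverifiable" case is the awkward one: once the table has expired, the
receiver is back to trusting the announcement alone.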

Meanwhile, all this is happening in an environment where the underlying
specifications are difficult to understand and in some respects possibly
controversial, and where the de facto standards of the Tz Database are
unofficial and don't match Microsoft's world view. I think we really
need to go back to the top and consolidate the specifications so we
agree in detail on what we're trying to accomplish.

-Brooks

>
>> Comparing to your example with DNS: If a root server has a software bug
>> which lets it deliver a wrong IP address, how should your local DNS
>> resolver detect this?
> My analogy was more along the lines of, when a root server IP address
> changes, the DNS server notices and logs that it has out of date hints.
> It's not a great analogy though, because if the DNS server has the wrong
> data about one root server, it can recover using the other 12, but if an
> NTP server has wrong data about the next leap second, it's screwed.
>
>>> I wonder if it would be better to set the leap indicator bits to NOSYNC if
>>> the configured leap seconds file has expired.
>> Sounds good at the first glance, but I think this would cause much bad
>> surprise if you have a company network and suddenly all NTP clients stop
>> to be synchronized.
> Well, at that point (end of June or December) they don't know if there
> should be a leap second or not, so they can't reliably tell the right
> time.
>
> Maybe they should fall back to relying on leap indicators from upstream,
> but they need some way to make it obvious they might fail.
>
>> The basic problem is more with a stratum-1 server which in many cases
>> gets its time only from a GPS receiver. If the GPS receiver provides
>> faulty leap second information then the NTP server can hardly detect
>> this. Even if it has a current leap second file this wouldn't work.
> I don't understand. If it has a current leap second file, it can use that
> to detect that its GPS receiver screwed up the leap second. It should then
> go NOSYNC.
>
>> For a pure client there should be no problem if the client has several
>> upstream servers configured. Before the leap second, the NTP daemon
>> accepts a leap second warning only if a majority of the configured
>> upstream servers provide this warning. However, the time from the faulty
>> server is still correct. Otherwise it wouldn't have been classified as
>> good candidate.
>>
>> When the leap second occurs then all upstream servers as well as our
>> server insert the leap second, but faulty servers don't. So the faulty
>> servers which haven't done the leap second are off by 1 s afterwards and
>> are *then* classified as false tickers.
>>
>> Of course this also doesn't work correctly if the *majority* of the
>> configured upstream servers get the leap second wrong, but in the past
>> we have seen that fortunately most public servers get it right.
> That seems fairly reasonable, but maybe you could make it more reliable
> using a leap seconds file to deal with cases when too many of the
> upstreams are wrong.
>
> Tony.
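
The majority rule Martin describes, with Tony's suggested fallback to a leap
seconds file, might be sketched as below. This is only an illustration of the
decision logic under my own assumptions: the function name and arguments are
hypothetical, and it is not how any particular NTP daemon implements it.

```python
def accept_leap_warning(upstream_flags, local_table_says=None):
    """Decide whether to arm a leap second.

    upstream_flags: list of booleans, one per configured upstream server,
        True if that server currently signals a leap second warning.
    local_table_says: True or False if a valid (unexpired) leap seconds
        file gives an answer, or None if no such file is available.
    """
    # Assumption: a valid local table overrides the upstream vote, which
    # covers the case where too many upstreams are wrong.
    if local_table_says is not None:
        return local_table_says
    # Otherwise require a strict majority of the upstream servers.
    return sum(upstream_flags) * 2 > len(upstream_flags)


if __name__ == "__main__":
    # Two of three upstreams warn and there is no local file: accept.
    print(accept_leap_warning([True, True, False]))
    # No upstream warns, but a valid local file says a leap is due: accept.
    print(accept_leap_warning([False, False, False], local_table_says=True))
```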