[LEAPSECS] How USB bugs are reported versus UTC
Warner Losh
imp at bsdimp.com
Fri Aug 23 13:17:47 EDT 2013
On Aug 23, 2013, at 8:25 AM, Rob Seaman wrote:
> On Aug 23, 2013, at 6:39 AM, Warner Losh <imp at bsdimp.com> wrote:
>
>> The fact that this random, seemingly tiny perturbation has caused kernel crashes and hangs is rightly called havoc...
>
> This is hyperbole; the word has meanings like pillage, plunder and devastation. The roughly contemporaneous storm in the mid-Atlantic states caused much more trouble. On the other hand, the Linux bug caused some sysadmins to work overtime. Unfortunate, not "devastating". An operations issue more significant than some, less significant than others.
The leap second bug was several times worse than the USB bug. The side effects were kernel crashes or tight loops. The crashes were easy to cope with, but the tight loops caused a large spike in the power draw of the data center, in some cases tripping breakers and taking entire racks offline. The energy had to be generated somewhere... In the jargon of the IT industry, this is considered havoc, even if the literal, non-technical usage means something else.
The USB bug means that you sometimes have to unplug and replug your keyboard on resume. It isn't something that randomly takes down an entire datacenter because of an obscure announcement made 6 months earlier...
> The point was that two similar situations are reported very differently. USB was standardized beginning about 20 years ago. Some projects failed to implement it correctly. This resulted in the headline:
>
> "Misinterpretation of standard probably causing USB disconnects on resume in Linux"
>
> UTC was standardized 40 years ago. Some projects failed to implement it correctly. The hyperbolic headline:
>
> "‘Leap Second’ Bug Wreaks Havoc Across Web"
>
> Even ignoring the hyperbole - much larger disruptions have occurred both before and since for other reasons without similar melodrama - the headline is simply wrong. It was a bug in Linux (or Linux-related software), not a bug in leap seconds. That bug was the result of the same sort of misinterpretation of a standard - a standard that has been in effect for at least twice as long as USB.
Yet, as we've pointed out numerous times, the important computer standards either ignore leap seconds completely or insulate users from the scary leap seconds. This makes it hard to get right systemically, because relatively few people take it seriously enough to test their code with leap seconds, to write their code correctly for leap seconds, etc. That's why such wide-ranging systemic effects can escape into the wild. People do not take the leap second standard seriously in the computer industry, while the USB standard is taken seriously...
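To make the "standards ignore it" point concrete, here is a minimal sketch in Python (my choice of language for illustration; nothing in this thread depends on it). It assumes only the standard library's calendar and datetime modules. POSIX-style time arithmetic treats every day as exactly 86400 seconds, so the leap second at the end of June 2012 collapses onto the second that follows it, and the datetime type refuses to represent it at all:

```python
import calendar
from datetime import datetime

# 2012-06-30 23:59:60 UTC was a real leap second; the next second is
# 2012-07-01 00:00:00 UTC.
leap_second = (2012, 6, 30, 23, 59, 60, 0, 0, 0)
next_second = (2012, 7, 1, 0, 0, 0, 0, 0, 0)

# calendar.timegm does plain days*86400 arithmetic, so both labels
# map to the same timestamp: the leap second has no value of its own.
leap_ts = calendar.timegm(leap_second)
next_ts = calendar.timegm(next_second)


def datetime_accepts_leap_second():
    """Return True if datetime can represent 23:59:60, else False."""
    try:
        datetime(2012, 6, 30, 23, 59, 60)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print("same timestamp:", leap_ts == next_ts)
    print("datetime accepts :60:", datetime_accepts_leap_second())
```

Code written against interfaces like these never sees a leap second unless somebody goes out of their way to test for one.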
So the root cause is the same (not implementing the standard correctly), but the effects differ by orders of magnitude in severity. Once the severity crossed a certain level, of course the newspapers were going to engage in a bit of hyperbole to sell copy. But the key point here is that it wasn't just pure hype because people love USB and hate UTC: The effects were measurably much worse in one instance than in the other.
If a driver of a car falls asleep, and then wakes up to scraping sounds as they peel a few layers of paint off the side of the car on a guardrail, this might warrant a mention in the local, small-town newspaper. If a driver falls asleep and plows into a parade, killing a couple of people and injuring dozens, then that will be covered in a much more sensationalistic way. The root cause in both cases is the same (falling asleep at the wheel), but since the effects are much more random and severe in the second case it gets much more attention.
Warner