strange NTP problem


Russell Coker

27 Nov 2012

12:02 p.m.

# ntpq
ntpq> lpeer
 remote          refid           st t when poll reach    delay   offset  jitter
==============================================================================
 foo             98.143.152.5     3 -  83d 1024     0   47.125    4.906   0.000
 10.1.2.3        116.66.160.39    3 -  83d 1024     0   50.747   -2.829   0.000
 resolv.internod 210.9.192.50     2 -  83d 1024     0   29.030   -3.233   0.000

The NTP server in my home stopped working, the above is what I saw when I queried it. Why would this happen?

I don't think it was non-functional for 83 days, but then I don't generally check it that often and the system had 98 days of uptime.

I've just rebooted it (so debugging the process state won't be possible). There was a new kernel to install.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
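For anyone reading that table later: in the ntpq peer listing, "reach" is an octal bitmask of the last eight polls and "when" is the time since the last packet was received, so reach 0 with when 83d means none of the three servers had answered for a long time. The following is only a rough sketch of how one might check for this from a script. The function names and the SAMPLE text are made up for illustration, and the parser assumes the ten-column layout shown above with no tally character glued to the remote name.

```python
# Hypothetical helper: flag peers that have not answered recent polls.
# Assumes the 10-column layout of ntpq's peer listing shown in the post.

UNITS = {"m": 60, "h": 3600, "d": 86400}

SAMPLE = """\
 remote          refid           st t when poll reach    delay   offset  jitter
==============================================================================
 foo             98.143.152.5     3 -  83d 1024     0   47.125    4.906   0.000
 10.1.2.3        116.66.160.39    3 -  83d 1024     0   50.747   -2.829   0.000
 resolv.internod 210.9.192.50     2 -  83d 1024     0   29.030   -3.233   0.000
"""


def when_to_seconds(when):
    """Convert the 'when' column (e.g. '512', '14m', '83d', '-') to seconds."""
    if when == "-":
        return None  # no packet received yet
    if when[-1] in UNITS:
        return int(when[:-1]) * UNITS[when[-1]]
    return int(when)


def parse_peers(text):
    """Return one dict per peer row, skipping the header and separator."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10 or fields[0] == "remote":
            continue
        remote, refid, st, _t, when, poll, reach, delay, offset, jitter = fields
        peers.append({
            "remote": remote,  # a tally char such as '*' is not stripped here
            "stratum": int(st),
            "when": when_to_seconds(when),
            "reach": int(reach, 8),  # octal shift register of the last 8 polls
            "offset_ms": float(offset),
        })
    return peers


def unreachable(peers):
    """Names of peers whose reach register is 0 (no recent poll answered)."""
    return [p["remote"] for p in peers if p["reach"] == 0]


if __name__ == "__main__":
    for name in unreachable(parse_peers(SAMPLE)):
        print(name, "has not answered any of the last 8 polls")
```

Something like this could be run from cron against the real ntpq output and send mail when every peer is unreachable, which would catch the problem well before "make" starts complaining.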


Michael Lindner

27 Nov

12:26 p.m.

There was something recently in the IT news about a problem with NTP servers, someone hacked the master server or something like that. NTP works a bit like DNS, so if the root server has a problem all the others follow.

Not sure if this was your problem, but the solution was something like (perhaps) choose a different NTP server and (definitely) restart NTP - does this help?

Might need to explicitly run it from the command line once to get it to sync if it's really out, using a "don't worry what diff" flag setting. Apologies for the vagueness, bit late for me to look it up properly :-D

Mike.

Russell Coker

12:48 p.m.

On Tue, 27 Nov 2012, Michael Lindner <michael@tropyx.com> wrote:

...

There was something recently in the IT news about a problem with NTP servers, someone hacked the master server or something like that. NTP works a bit like DNS, so if the root server has a problem all the others follow.

Thanks for the suggestion, but NTP doesn't work like DNS in that way. With DNS you have a list of root servers which have the addresses for Top Level Domain (TLD) servers such as "au" and "com"; those servers then have the addresses for domains under them. With NTP you just have an IP address of a server to talk to, and you shouldn't need anything else. If you specify multiple servers then ntpd chooses among the ones that agree with each other, generally preferring the lowest stratum (the lowest number of hops from a reference clock such as an atomic clock). I've idly considered hooking up a GPS receiver to my PC and making it a stratum 1 server.
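To illustrate the "you just have an IP address of a server to talk to" point, here is a minimal SNTP-style client sketch. It is not how ntpd itself works (ntpd polls repeatedly and filters the results); it sends a single 48-byte client packet to UDP port 123 and reads the stratum and transmit timestamp from the reply. The function names are invented for this example and the address in the usage note is only a placeholder.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800


def build_request():
    """48-byte client request: leap=0, version=3, mode=3 gives first byte 0x1b."""
    return b"\x1b" + 47 * b"\0"


def parse_reply(data):
    """Return (stratum, unix_time) from a server reply."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    stratum = data[1]
    # Transmit timestamp: 32-bit seconds and 32-bit fraction at offset 40.
    secs, frac = struct.unpack("!II", data[40:48])
    return stratum, secs - NTP_TO_UNIX + frac / 2**32


def query(server_ip, timeout=5.0):
    """Ask one server for the time; no root servers or hierarchy involved."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server_ip, 123))
        data, _addr = sock.recvfrom(512)
    return parse_reply(data)
```

Calling query("192.0.2.1") (substitute a real server address) would return that server's stratum and its idea of the current time, with no lookup through any other NTP server.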

...

Not sure if this was your problem, but the solution was something like (perhaps) choose a different NTP server and (definitely) restart NTP - does this help?

Restarting the ntpd made the problem go away. But it would be nice to know why it happened and how to prevent it from happening in future. It's not fun to have this problem become noticeable when "make" reports timestamp errors when building things from the NFS server.

...

Might need to explicitly run it from the command line once to get it to sync if it's really out, using a "don't worry what diff" flag setting. Apologies for the vagueness, bit late for me to look it up properly :-D

ntpdate is the way to directly set it from the command-line. But I prefer to just start it and let it do its thing. If the difference is less than 20 minutes then it can manage it by itself.

Peter Ross

11:06 p.m.

Hi Russell,

On Tue, 27 Nov 2012, Russell Coker wrote:

...

But I prefer to just start it [ntpd] and let it do its thing. If the difference is less than 20 minutes then it can manage it by itself.

Is your ntpd server inside a virtual machine?

In "virtualland" time is a problem. In short, there is only one timer in hardware, and the virtual machine may lose ticks if machines are busy with something else.

If "ntp time" and "system time" differ too much, ntpd stops synchronizing, and the gap widens.

I don't have the problem at the moment (mostly using FreeBSD jails sharing the same kernel) so I forgot details but I can dig it up if you need it. Otherwise it may give you the right direction, I hope.

Although, I have one VirtualBox left, and that was out of time a fortnight ago - most likely because I updated the kernel, and forgot to update the VirtualBox guest additions so they weren't loaded.

Regards
Peter

Russell Coker

11:57 p.m.

On Wed, 28 Nov 2012, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:

...

On Tue, 27 Nov 2012, Russell Coker wrote:

...
But I prefer to just start it [ntpd] and let it do its thing. If the difference is less than 20 minutes then it can manage it by itself.

Is your ntpd server inside a virtual machine?

No. Just a plain old P3 desktop system that's been running for years as a router.

...

In "virtualland" time is a problem. In short, there is only one timer in hardware, and the virtual machine may lose ticks if machines are busy with something else.

If "ntp time" and "system time" differ too much, ntpd stops synchronizing, and the gap widens.

That's about 20 minutes from memory; you need some severe problems to get to that stage. For my virtual servers I generally find that it's a few seconds of inaccuracy per day and I run ntpdate from a daily cron job to fix it.
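As a footnote to the "about 20 minutes" figure: the reference ntpd's default panic threshold is 1000 seconds (roughly 16.7 minutes) and its default step threshold is 0.128 seconds. The sketch below is a deliberately simplified model of what ntpd does with a measured offset. It ignores the -g option, the tinker settings and the waiting period before a step, and the function name is made up for illustration.

```python
# Simplified model of ntpd's reaction to a clock offset (defaults only).
STEP_THRESHOLD = 0.128    # seconds; below this the clock is slewed gradually
PANIC_THRESHOLD = 1000.0  # seconds; above this ntpd refuses to adjust


def action_for_offset(offset_seconds):
    """Return 'slew', 'step' or 'panic' for a given offset in seconds."""
    size = abs(offset_seconds)
    if size > PANIC_THRESHOLD:
        return "panic"  # too far out: ntpd gives up rather than trust it
    if size > STEP_THRESHOLD:
        return "step"   # jump the clock in one go
    return "slew"       # speed up or slow down the clock slightly


if __name__ == "__main__":
    for offset in (0.004906, -5.0, 20 * 60):
        print(offset, action_for_offset(offset))
```

Under this model a virtual machine drifting a few seconds per day stays well inside the range ntpd can correct on its own.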

trentbuck＠gmail.com

11:42 p.m.

Russell Coker writes:

...

I've idly considered hooking up a GPS receiver to my PC and making it a stratum 1 server.

If you do, I would appreciate a brief synopsis of how you bolted the bits together, and any hurdles you had to jump over. I was looking at doing the same for an airgapped site a couple of years ago, but management instead opted for telling the master server there to just lie about its stratum (so that the desktops would sync off it).

hannah commodore

10:16 p.m.

On 27/11/2012, at 23:26, Michael Lindner <michael@tropyx.com> wrote:

...

There was something recently in the IT news about a problem with NTP servers, someone hacked the master server or something like that.

No no no. It was a stratum 0 server that rebooted and its clock reset to the year 2000. It was sending this time out to higher stratum servers, but it all auto-corrected itself within a few minutes. IIRC it was a navy.mil server.

Michael Lindner

1 Dec

1:47 a.m.

Yep, something like that. Wasn't really interested.

Russell Coker

2:27 a.m.

On Sat, 1 Dec 2012, Michael Lindner <michael@tropyx.com> wrote:

...

yep, something like that. wasn't really interested.

The interesting thing about this is that ntpd is specifically designed to avoid problems in such situations. If a server is more than about 20 minutes out then ntpd will never sync to it, not even if there are no other servers. If the stratum 0 server in question happened to push the wrong time to stratum 1 servers then it would be an interesting situation and I'd like to know how that happened.

A server getting the wrong year isn't THAT uncommon; in fact it's what you expect in the case of a power outage combined with a motherboard battery failure and some other hardware issues. Not to mention a typo when manually setting the date (not that anyone should be manually setting a stratum 0 server).

Now if you use ntpdate from a cron job then you can have issues in this regard. I run some embedded systems that use ntpdate to save system resources and because they are expected to be in full operation soon after boot (sooner than ntpd can sync), but as they reboot at least once a day there is only the risk of one day's data being lost. I also run ntpdate from Xen DomUs, but as they get the date from the Dom0 there will probably be more serious problems if the Dom0 gets the time very wrong.
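One way to reduce the cron-job risk described above is to compare several servers before stepping the clock. The sketch below is a hypothetical guard, not something ntpdate is claimed to do in this form: it takes offsets (in seconds) already measured against several servers, and only returns a value when a strict majority sit close to the median. The function name and the one-second tolerance are assumptions chosen for illustration.

```python
from statistics import median


def agreed_offset(offsets, tolerance=1.0):
    """Return the median offset if a strict majority of servers agree with it.

    Returns None when there is no data or no majority, in which case the
    caller should leave the clock alone.
    """
    if not offsets:
        return None
    mid = median(offsets)
    agreeing = [o for o in offsets if abs(o - mid) <= tolerance]
    if len(agreeing) * 2 <= len(offsets):
        return None
    return mid


if __name__ == "__main__":
    # Two sane servers and one that thinks it is the year 2000.
    print(agreed_offset([0.01, -0.02, -400000000.0]))
```

With a single server there is nothing to cross-check against, so this guard only helps when at least three servers are queried.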

