
On 15 January 2012 16:09, Andrew Worsley <amworsley@gmail.com> wrote:
I am having periodic ntp synchronisation problems.
Apologies for digging up such an old thread, and thanks for reporting your eventual success to the list. This caught my interest whilst trawling the archive, and is unrelated to the recent leap second, but I've just got a few questions regarding the root-cause... <...>
Here's the output of commands when things are bad:
config(0)# check_ntp_peer -H 127.0.0.1 -w 1.0 -c 2.0
NTP WARNING: Server has the LI_ALARM bit set, Offset 0.210925 secs|offset=0.210925s;1.000000;2.000000;
LI_ALARM apparently means not in sync ???
This is actually the "leap indicator" field, also verified by the ntpq output below (via "leap_alarm" and "leap=11"). The field is designed to notify clients of an impending leap second, but the value 11 is its alarm state, meaning the server's clock is not synchronised; consequently most NTP clients will exclude that reference as a potential time source. I'm not even sure why this was set, as the previous leap second was on Dec 31 2008.
config(1)# ntpq -c rl
associd=0 status=c618 leap_alarm, sync_ntp, 1 event, no_sys_peer,
version="ntpd 4.2.6p2@1.2194-o Sun Oct 17 13:35:13 UTC 2010 (1)",
processor="x86_64", system="Linux/2.6.32-5-amd64", leap=11, stratum=3,
precision=-23, rootdelay=95.696, rootdisp=263.117, refid=192.189.54.33,
reftime=d2bccf2f.4854b34f Sun, Jan 15 2012 15:06:07.282,
clock=d2bcd1d6.e9ffea4c Sun, Jan 15 2012 15:17:26.914, peer=16519, tc=10,
mintc=3, offset=0.000, frequency=500.000, sys_jitter=35.804,
clk_jitter=0.000, clk_wander=91.828
As for the frequency error greater than 500PPM (~ 43.2 sec / day), that's an incredibly bad quality clock! (It wanders pretty badly too!)
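The 43.2 sec/day figure follows directly from the definition of parts per million. A minimal sketch of the arithmetic (the function name is purely illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def drift_seconds_per_day(ppm):
    """Seconds gained or lost per day by a clock that is off by `ppm` parts per million."""
    return ppm * 1e-6 * SECONDS_PER_DAY


# frequency=500.000 is the value reported by ntpq in this thread.
print(round(drift_seconds_per_day(500.0), 3))  # 43.2
```

Note that 500 ppm is also the "frequency tolerance" reported by ntpdc -c kerninfo in this thread, so the reported value may simply be pegged at that limit rather than being a measurement of the true error.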
It thinks it's in error by 16s???
config(0)# ntpdc -c kerninfo
<...>
ntptime gives the same info
config(0)# ntptime
ntp_gettime() returns code 5 (ERROR)
<...> No, it's just not synchronised!
Then mysteriously everything is okay:
config(0)# ntpdc -c kerninfo
pll offset: 0.00998 s
pll frequency: 500.000 ppm
maximum error: 1.6291 s
estimated error: 0.004771 s
status: 0001 pll
pll time constant: 10
precision: 1e-06 s
frequency tolerance: 500 ppm
My leap becomes none (no leap_alarm) and things are ok?
Things are not OK: seeing as the "no_sys_peer" flag is still set, you're still not synchronised... (albeit you are at least seeing an offset)
config(0)# ntpq -c rl
associd=0 status=0618 leap_none, sync_ntp, 1 event, no_sys_peer,
version="ntpd 4.2.6p2@1.2194-o Sun Oct 17 13:35:13 UTC 2010 (1)",
processor="x86_64", system="Linux/2.6.32-5-amd64", leap=00, stratum=3,
precision=-23, rootdelay=95.272, rootdisp=983.007, refid=192.189.54.33,
reftime=d2bcd33b.bbc10580 Sun, Jan 15 2012 15:23:23.733,
clock=d2bcd852.ec6be1b9 Sun, Jan 15 2012 15:45:06.923, peer=16519, tc=10,
mintc=3, offset=13.497, frequency=500.000, sys_jitter=7.251,
clk_jitter=4.772, clk_wander=151.809
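The two status words in this thread (c618 and 0618) can be unpacked by hand to see which part changed. The sketch below assumes the usual layout of the ntpd system status word (2 bits of leap indicator, 6 bits of sync source, 4 bits of event count, 4 bits of last event code). The name tables are deliberately partial and only cover the values that appear in the ntpq output above, plus the four leap states; anything else is printed as a number.

```python
# Partial name tables: an assumption based on the ntpq output in this thread,
# not a complete copy of ntpd's own tables.
LEAP_NAMES = {0: "leap_none", 1: "leap_add_sec", 2: "leap_del_sec", 3: "leap_alarm"}
SOURCE_NAMES = {6: "sync_ntp"}
EVENT_NAMES = {8: "no_sys_peer"}


def decode_system_status(word):
    """Split a 16-bit system status word into (leap, source, count, event)."""
    leap = (word >> 14) & 0x3     # top 2 bits: leap indicator
    source = (word >> 8) & 0x3F   # next 6 bits: synchronisation source
    count = (word >> 4) & 0xF     # next 4 bits: number of events
    event = word & 0xF            # low 4 bits: most recent event code
    return leap, source, count, event


def describe(word):
    """Render a status word roughly the way ntpq prints it."""
    leap, source, count, event = decode_system_status(word)
    plural = "event" if count == 1 else "events"
    return ", ".join([
        LEAP_NAMES.get(leap, f"leap={leap}"),
        SOURCE_NAMES.get(source, f"source={source}"),
        f"{count} {plural}",
        EVENT_NAMES.get(event, f"event={event}"),
    ])


for status in (0xC618, 0x0618):
    print(f"{status:04x}: {describe(status)}")
# c618: leap_alarm, sync_ntp, 1 event, no_sys_peer
# 0618: leap_none, sync_ntp, 1 event, no_sys_peer
```

Only the top two bits differ between the two words, so the leap indicator cleared while the last recorded event stayed at no_sys_peer.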
Aside from the bogus leap indicator, I'm curious as to what:
a) hardware you're using (i.e. cat /proc/cpuinfo), as a modern chipset's TSC should be immune from CPU throttling
b) kernel parameters you're passing at boot (if any), in particular any of: clock/clocksource/notsc, noapic/noalpci/acpi
c) pool of ntp servers you have configured, and what options, in particular any of: burst, iburst, minpoll, maxpoll?

On 18 January 2012 22:24, Andrew Worsley <amworsley@gmail.com> wrote:
My problem appears to be solved. It's been nearly 24 hours and ntp is latched very well - 2-3ms offset all day! <...>
Are you sure you're actually synchronised? Double-check the output of "ntpq -p"
I think actually adjtimex may take a while to cause an effect so I am not sure if I am waiting long enough or undoing the effect of the previous one.
Yup, for instance adjtime(2) on standard Linux will _slew_ 0.5ms per second <...>
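To get a feel for how long a slew takes at that rate, here is a rough sketch. It assumes a constant 0.5 ms/s correction and ignores any drift that accumulates while the slew is in progress; the function name is illustrative only.

```python
SLEW_RATE = 0.0005  # seconds of correction applied per elapsed second (0.5 ms/s)


def slew_duration(offset_seconds, rate=SLEW_RATE):
    """Elapsed seconds needed to slew away `offset_seconds` at a constant rate."""
    return abs(offset_seconds) / rate


# 0.210925 s is the offset reported by check_ntp_peer earlier in this thread.
secs = slew_duration(0.210925)
print(f"{secs:.0f} s, about {secs / 60:.0f} minutes")
```

A slew rate of 0.5 ms/s is the same thing as 500 ppm, so if the clock really were drifting by 500 ppm in the opposite direction, slewing alone could never catch up.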
It might have been even simpler if I had just followed the instructions: stopped ntpd, removed the drift file, and ran ntpdate every 10 mins:
ntpdate -s -b ntpserver
which will set the time instantly
Yup, will use settimeofday(2) to _step_ the clock instead <...>
Also the above link mentions some interesting issues about the clock source - it found the hpet clock source was 10x better than tsc. e.g. -
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
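Before switching, it may be worth checking which clock source is currently active and whether hpet is offered at all. A small sketch that reads the same sysfs files as the commands above (the helper name is made up for illustration, and the sysfs directory only exists on Linux):

```python
import os

SYSFS_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clock sources read from the given directory."""
    with open(os.path.join(base, "current_clocksource")) as f:
        current = f.read().strip()
    with open(os.path.join(base, "available_clocksource")) as f:
        available = f.read().split()
    return current, available


# Only attempt the read where the sysfs directory is actually present.
if os.path.isdir(SYSFS_DIR):
    current, available = read_clocksources()
    print("current:", current)
    print("available:", ", ".join(available))
    print("hpet offered:", "hpet" in available)
```

Reading these files needs no special privileges; writing to current_clocksource, as in the echo command above, normally requires root.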
I haven't tried that because I am very happy with things currently.
<...> I would recommend using the most accurate system clock available to you, and if that fails, perhaps increasing the ntp poll interval... otherwise regulating via adjtimex seems like an unnecessary kludge (pretty sure I've seen this discouraged somewhere too, probably NTP doco)

-- Joel Shea <jwshea@gmail.com>