Checking NTP Status
How to check the current NTP status
ntpstat
>> NODE: 20.20.20.222 <<
synchronised to local net at stratum 11
   time correct to within 10 ms
   polling server every 16 s
>> NODE: 20.20.20.223 <<
synchronised to NTP server (20.20.20.222) at stratum 12
   time correct to within 12 ms
   polling server every 16 s
>> NODE: 20.20.20.224 <<
synchronised to NTP server (20.20.20.222) at stratum 12
   time correct to within 11 ms
   polling server every 16 s
>> NODE: 20.20.20.226 <<
synchronised to NTP server (20.20.20.222) at stratum 12
   time correct to within 12 ms
   polling server every 16 s
>> NODE: 20.20.20.227 <<
synchronised to NTP server (20.20.20.222) at stratum 12
   time correct to within 12 ms
   polling server every 16 s
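For each node, the first line shows the IP of the NTP server the machine is synchronised to. The second line shows the bound on the time error. The third line shows how often the server is polled.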
root@n82-2:~# cat /etc/ntp.conf
...
server 20.20.20.222 burst iburst minpoll 4 maxpoll 4
...
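Why is the server polled every 16 seconds? The poll value means "how frequently to query server (in seconds)" and is given as a power of two: 4 means 2 to the power of 4, i.e. one query every 16 seconds.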
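The arithmetic can be sketched in a few lines of Python. The function name `poll_interval_seconds` is made up for this illustration; the only fact it relies on is that minpoll/maxpoll values are exponents of two.

```python
def poll_interval_seconds(poll_exponent: int) -> int:
    """Convert an ntp.conf minpoll/maxpoll value into seconds (2 ** value)."""
    return 2 ** poll_exponent


# minpoll 4 maxpoll 4 pins the interval to 16 seconds.
for exponent in (4, 6, 10):
    print(f"poll {exponent} -> every {poll_interval_seconds(exponent)} s")
```

With `minpoll` and `maxpoll` both set to 4, ntpd has no room to back off to longer intervals, so the interval stays at 16 seconds.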
Besides the normal case above, you may also see:
>> NODE: 20.20.20.222 <<
unsynchronised
   polling server every 8 s
>> NODE: 20.20.20.223 <<
unsynchronised
   polling server every 8 s
>> NODE: 20.20.20.224 <<
unsynchronised
   polling server every 8 s
This shows up right after ntpd has been restarted, while it has not yet synchronised.
ntpq -p
root@n82-1:/var/log# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 20.20.20.224    20.20.20.222    12 s   13   16  376    0.142   -0.010   0.202
 20.20.20.226    20.20.20.222    12 s    1   16  377    0.114   -0.003   0.076
 20.20.20.227    20.20.20.222    12 s    1   16  377    0.150   -0.021   0.020
 20.20.20.223    20.20.20.222    12 s    4   16  377    0.136   -0.013  18.163
*127.127.1.0     .LOCL.          10 l   10   16  377    0.000    0.000   0.000
root@n82-1:/var/log#
remote and refid: remote is the remote NTP server (or peer) this machine talks to, and refid is that remote's own upstream NTP server.
remote and refid : remote NTP server, and its NTP server
In our NTP deployment, if there is no external NTP server, we pick the ceph-mon leader node as the NTP server for the cluster. For this cluster:
root@n82-1:/var/log# ceph mon dump
dumped monmap epoch 5
epoch 5
fsid bd489dd6-57c1-4878-a279-739624997f24
last_changed 2022-06-09 15:48:42.559018
created 2022-06-09 15:47:51.033044
0: 20.20.20.222:6789/0 mon.mvdfp
1: 20.20.20.223:6789/0 mon.fciae
2: 20.20.20.224:6789/0 mon.vmlcw
3: 20.20.20.226:6789/0 mon.qcdnb
4: 20.20.20.227:6789/0 mon.abtdv
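Node 20.20.20.222 is the ceph-mon with the lowest IP, so under normal circumstances it is the NTP server for the whole cluster.
But then why, when ntpq -pn is run on 20.20.20.222, does the remote column list the IPs of the other storage nodes?
This is where the peer directive comes in.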
peer 20.20.20.224 burst iburst minpoll 4 maxpoll 4
The IPs given by peer and the IP given by server all show up in the remote column of ntpq -pn. Why, and what is the difference between the two?
server: ntpd service requests the time from another server
peer: ntpd service exchanges the time with a fellow peer
NTP servers are organised in a hierarchy, which is what these lines in the configuration file express:
server 127.127.1.0 burst iburst minpoll 4 maxpoll 4
fudge 127.127.1.0 stratum 10
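The lower the stratum value, the more authoritative the source. 127.127.1.0 means the local machine's own clock serves as the time source, and its stratum is usually set to 10. Now let's look at an ordinary node in the cluster: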
root@n82-2:~# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+20.20.20.224    20.20.20.222    12 s    8   16  376    0.163   -0.008   0.562
+20.20.20.226    20.20.20.222    12 s    7   16  377    0.172    0.046   0.074
+20.20.20.227    20.20.20.222    12 s    8   16  377    0.144    0.006   0.167
+20.20.20.222    LOCAL(0)        11 s    1   16  377    0.104   -0.018   0.122
*20.20.20.222    LOCAL(0)        11 u    6   16  377    0.090    0.014   0.584
We can see that 20.20.20.222 is at stratum 11, while the nodes that appear as peers are at stratum 12.
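The ntpq columns have the following meanings:
poll: the polling interval in seconds. As described above, minpoll 4 maxpoll 4 means one query every 16 seconds.
when: the time elapsed since the last packet was received from this remote.
reach: an octal value that records the outcome of the last 8 queries: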
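* 377 = 0b11111111: the last 8 queries all succeeded.
* 376 = 0b11111110: the most recent query failed, the 7 before it succeeded.
* 257 = 0b10101111: the 4 most recent queries succeeded; counting back from the most recent one, the fifth and the seventh failed.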
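The examples above can be checked with a small decoder. This is only a sketch: `decode_reach` is a name invented here, and it assumes the reach value is passed as the octal string exactly as ntpq prints it.

```python
from typing import List


def decode_reach(reach_octal: str) -> List[bool]:
    """Decode ntpq's octal reach value into 8 results, most recent query first."""
    register = int(reach_octal, 8) & 0xFF
    # Bit 0 is the most recent query, bit 7 the oldest one still remembered.
    return [bool((register >> bit) & 1) for bit in range(8)]


for value in ("377", "376", "257"):
    results = decode_reach(value)
    failed = [position + 1 for position, ok in enumerate(results) if not ok]
    print(value, "failed queries (1 = most recent):", failed)
```

The register is a shift register: every new query shifts the bits left by one and records its own result in the lowest bit, which is why the most recent query sits at the right-hand end of the binary form.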
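delay: network round trip time (in milliseconds). This is the estimated network delay between this machine and the NTP server (or peer). The nodes here are virtual machines, which is why the delay is above 0.1 ms.
offset: the time difference between this machine and the remote NTP server (or peer), in milliseconds as the ntpq documentation states.
jitter: a high jitter means that either the remote server is not stable and accurate enough, or the network conditions are poor.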
- jitter: difference of successive time values from server (high jitter could be due to an unstable clock or, more likely, poor network performance)
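To work with these columns in a script, one row of the billboard can be split into named fields. The sketch below assumes the line is copied verbatim from `ntpq -pn` output, so that the first character is the tally column (a space, `*`, `+` and so on); `parse_ntpq_row` is a hypothetical helper, not part of any NTP tooling.

```python
from typing import Dict, Union

COLUMNS = ("remote", "refid", "st", "t", "when", "poll",
           "reach", "delay", "offset", "jitter")


def parse_ntpq_row(line: str) -> Dict[str, Union[str, int, float]]:
    """Split one data row of `ntpq -pn` into named fields."""
    row: Dict[str, Union[str, int, float]] = {"tally": line[0].strip()}
    row.update(zip(COLUMNS, line[1:].split()))
    row["st"] = int(row["st"])
    # delay, offset and jitter are reported in milliseconds.
    for key in ("delay", "offset", "jitter"):
        row[key] = float(row[key])
    return row


line = "*20.20.20.222    LOCAL(0)        11 u    6   16  377    0.090    0.014   0.584"
print(parse_ntpq_row(line))
```

The when, poll and reach columns are left as strings on purpose: when may be `-` or carry a unit suffix, and reach is octal rather than decimal.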
How to verify that an NTP server is usable
We can verify this with ntpdate -d. It is a diagnostic mode and does not change the local time.
-d Enable the debugging mode, in which ntpdate will go through all the steps, but not adjust the local clock. Information useful for general debugging will also be printed.
If the NTP server is reachable over the network and responds to our queries, the output looks like this:
root@n82-1:/home/btadmin# ntpdate -d 20.20.20.223
25 Jun 15:24:50 ntpdate[919323]: ntpdate [email protected] Tue Jan 7 15:08:24 UTC 2020 (1)
Looking for host 20.20.20.223 and service ntp
host found : 20.20.20.223
transmit(20.20.20.223)
receive(20.20.20.223)
transmit(20.20.20.223)
receive(20.20.20.223)
transmit(20.20.20.223)
receive(20.20.20.223)
transmit(20.20.20.223)
receive(20.20.20.223)
server 20.20.20.223, port 123
stratum 11, precision -24, leap 00, trust 000
refid [20.20.20.223], delay 0.02576, dispersion 0.00005
transmitted 4, in filter 4
reference time: e66136bf.6e4bc565 Sat, Jun 25 2022 15:24:47.430
originate timestamp: e66136c9.1d79f7d3 Sat, Jun 25 2022 15:24:57.115
transmit timestamp: e66136c9.1d70de0a Sat, Jun 25 2022 15:24:57.115
filter delay: 0.02602 0.02638 0.02591 0.02576 0.00000 0.00000 0.00000 0.00000
filter offset: -0.00006 -0.00026 0.000019 -0.00001 0.000000 0.000000 0.000000 0.000000
delay 0.02576, dispersion 0.00005
offset -0.000011
25 Jun 15:24:57 ntpdate[919323]: adjust time server 20.20.20.223 offset -0.000011 sec
root@n82-1:/home/btadmin# echo $?
0
If the server is unreachable, or cannot respond to the query:
root@n82-1:/home/btadmin# ntpdate -d 20.20.20.222
25 Jun 15:28:20 ntpdate[931881]: ntpdate [email protected] Tue Jan 7 15:08:24 UTC 2020 (1)
Looking for host 20.20.20.222 and service ntp
host found : 20.20.20.222
transmit(20.20.20.222)
transmit(20.20.20.222)
transmit(20.20.20.222)
transmit(20.20.20.222)
transmit(20.20.20.222)
20.20.20.222: Server dropped: no data
server 20.20.20.222, port 123
stratum 0, precision 0, leap 00, trust 000
refid [20.20.20.222], delay 0.00000, dispersion 64.00000
transmitted 4, in filter 4
reference time: 00000000.00000000 Thu, Feb 7 2036 14:28:16.000
originate timestamp: 00000000.00000000 Thu, Feb 7 2036 14:28:16.000
transmit timestamp: e661379f.36a98426 Sat, Jun 25 2022 15:28:31.213
filter delay: 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
filter offset: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
delay 0.00000, dispersion 64.00000
offset 0.000000
25 Jun 15:28:33 ntpdate[931881]: no server suitable for synchronization found
root@n82-1:/home/btadmin# echo $?
1
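In a script, the exit code shown above (0 or 1) is usually enough. If the offset itself is of interest, the captured text can be scanned for the two closing messages seen in these runs. The following is a minimal sketch; `ntpdate_offset_seconds` is a hypothetical helper, and it assumes the output of `ntpdate -d` has already been captured as a string.

```python
import re
from typing import Optional

NO_SERVER = "no server suitable for synchronization found"
# ntpdate ends with "adjust time server ..." or "step time server ...".
OFFSET_RE = re.compile(r"(?:adjust|step) time server \S+ offset (-?\d+\.\d+) sec")


def ntpdate_offset_seconds(output: str) -> Optional[float]:
    """Return the offset in seconds from `ntpdate -d` output, or None if unusable."""
    if NO_SERVER in output:
        return None
    match = OFFSET_RE.search(output)
    return float(match.group(1)) if match else None


good = "25 Jun 15:24:57 ntpdate[919323]: adjust time server 20.20.20.223 offset -0.000011 sec"
bad = "25 Jun 15:28:33 ntpdate[931881]: no server suitable for synchronization found"
print(ntpdate_offset_seconds(good))
print(ntpdate_offset_seconds(bad))
```

Returning `None` for anything that is not a clear success keeps the caller from treating a dropped server as an offset of zero.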
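NTP and hwclock
The last topic is NTP and the hardware clock. How exactly are the hardware clock and the system clock related, and what role does NTP play between them?
Linux has two clocks: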
- system clock
- hardware clock
The hwclock manual contains the following passage:
Automatic Hardware Clock Synchronization by the Kernel

You should be aware of another way that the Hardware Clock is kept synchronized in some systems. The Linux kernel has a mode wherein it copies the System Time to the Hardware Clock every 11 minutes. This mode is a compile time option, so not all kernels will have this capability. This is a good mode to use when you are using something sophisticated like NTP to keep your System Clock synchronized. (NTP is a way to keep your System Time synchronized either to a time server somewhere on the network or to a radio clock hooked up to your system. See RFC 1305.)

If the kernel is compiled with the '11 minute mode' option it will be active when the kernel's clock discipline is in a synchronized state. When in this state, bit 6 (the bit that is set in the mask 0x0040) of the kernel's time_status variable is unset. This value is output as the 'status' line of the adjtimex --print or ntptime commands.

It takes an outside influence, like the NTP daemon ntpd(1), to put the kernel's clock discipline into a synchronized state, and therefore turn on '11 minute mode'. It can be turned off by running anything that sets the System Clock the old fashioned way, including hwclock --hctosys. However, if the NTP daemon is still running, it will turn '11 minute mode' back on again the next time it synchronizes the System Clock.

If your system runs with '11 minute mode' on, it may need to use either --hctosys or --systz in a startup script, especially if the Hardware Clock is configured to use the local timescale. Unless the kernel is informed of what timescale the Hardware Clock is using, it may clobber it with the wrong one. The kernel uses UTC by default.

The first userspace command to set the System Clock informs the kernel what timescale the Hardware Clock is using. This happens via the persistent_clock_is_local kernel variable. If --hctosys or --systz is the first, it will set this variable according to the adjtime file or the appropriate command-line argument. Note that when using this capability and the Hardware Clock timescale configuration is changed, then a reboot is required to notify the kernel.
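In short: once an NTP server is configured and the system clock is synchronised, the kernel adjusts the hardware clock every 11 minutes so that it follows the system clock. This is called '11 minute mode'. Whether it actually happens depends on a few conditions.
Two kernel configuration options are relevant: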
- CONFIG_GENERIC_CMOS_UPDATE
- CONFIG_RTC_SYSTOHC
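Whether a given kernel was built with these options can be read from its build configuration. The sketch below parses the text of such a file; the path `/boot/config-<release>` is an assumption that holds on many distributions but not all (some expose `/proc/config.gz` instead), and `enabled_options` is a name made up for this example.

```python
import os
import platform
from typing import Dict, Iterable

OPTIONS = ("CONFIG_GENERIC_CMOS_UPDATE", "CONFIG_RTC_SYSTOHC")


def enabled_options(config_text: str, wanted: Iterable[str]) -> Dict[str, bool]:
    """Report which of the wanted options are set to 'y' in a kernel config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return {name: name in enabled for name in wanted}


if __name__ == "__main__":
    path = f"/boot/config-{platform.release()}"  # assumed location
    if os.path.exists(path):
        with open(path) as config_file:
            print(enabled_options(config_file.read(), OPTIONS))
```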
In addition, keep an eye on the NTP state itself. The NTP sync status can be checked with timedatectl:
root@SEG-248-82:/home/btadmin# timedatectl
      Local time: Sat 2022-06-25 16:07:59 HKT
  Universal time: Sat 2022-06-25 08:07:59 UTC
        RTC time: Sat 2022-06-25 08:08:00
       Time zone: Asia/Hong_Kong (HKT, +0800)
 Network time on: yes
NTP synchronized: yes
 RTC in local TZ: no
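If NTP synchronisation is not working properly, the synchronisation from the system clock to the hardware clock is affected as well.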
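The condition quoted from the hwclock manual, bit 6 (mask 0x0040) of the kernel status word being unset, can be tested directly once the status value is known. This sketch assumes the number has been read off the 'status' line of `ntptime` or `adjtimex --print`; the sample values are illustrative, not taken from this cluster.

```python
STA_UNSYNC = 0x0040  # bit 6 of the kernel's time_status word


def kernel_clock_synchronised(status: int) -> bool:
    """True when bit 6 is clear, i.e. the clock discipline is synchronised."""
    return (status & STA_UNSYNC) == 0


# Illustrative status words, not measured on the nodes above.
for status in (0x2001, 0x0041):
    state = "synchronised" if kernel_clock_synchronised(status) else "unsynchronised"
    print(f"status 0x{status:04x}: {state}")
```

As long as the bit is set, '11 minute mode' stays off and the hardware clock is not updated from the system clock.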