[Users] oVirt Node keeps rebooting
Ayal Baron
abaron at redhat.com
Thu Nov 22 06:46:20 UTC 2012
What type of NICs do you have? (It's a shot in the dark, but there is a known issue with the bnx2x driver that causes random reboots, and some users have hit it.)
Can you attach the full vdsm.log and spm-lock.log?
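One way to check which driver each NIC is bound to is to read the sysfs driver links. This is a minimal sketch, assuming a Linux host where /sys/class/net/<iface>/device/driver is a symlink to the bound driver. The function name nic_drivers is hypothetical and is not part of vdsm or oVirt.

```python
import os


def nic_drivers(sysfs_net="/sys/class/net"):
    """Map each network interface to its kernel driver name.

    Interfaces without a device/driver symlink (loopback, bridges,
    other virtual devices) map to None.
    """
    drivers = {}
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        if os.path.islink(link):
            # The symlink target is the driver directory; its basename
            # is the driver name, e.g. "bnx2x".
            drivers[iface] = os.path.basename(os.path.realpath(link))
        else:
            drivers[iface] = None
    return drivers


if __name__ == "__main__":
    for iface, driver in nic_drivers().items():
        note = "  <-- bnx2x" if driver == "bnx2x" else ""
        print(f"{iface}: {driver or 'n/a'}{note}")
```

Any interface flagged as bnx2x would be worth mentioning in the reply, along with the logs.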
----- Original Message -----
>
> [2012-11-18 15:20:08] Protecting spm lock for vdsm pid 1343
> [2012-11-18 15:20:08] Trying to acquire lease -
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99
> lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> id=1000 lease_time_ms=60000 io_op_to_ms=10000
> [2012-11-18 15:20:28] Lease acquired
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1000
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases,
> TS=1353270008160373
> [2012-11-18 15:20:28] Protecting spm lock for vdsm pid 1343
> [2012-11-18 15:20:28] Started renewal process (pid=1912) for
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1000
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> [2012-11-18 15:20:30] Stopping lease for pool:
> f0071c9b-cbe2-4555-9ae0-279031764a99 pgrps: -1912
> User defined signal 1
> [2012-11-18 15:20:30] releasing lease
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1000
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> [2012-11-18 15:20:33] Protecting spm lock for vdsm pid 1343
> [2012-11-18 15:20:33] Trying to acquire lease -
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99
> lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> id=1 lease_time_ms=60000 io_op_to_ms=10000
> [2012-11-18 15:20:53] Lease acquired
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases,
> TS=1353270033749998
> [2012-11-18 15:20:53] Protecting spm lock for vdsm pid 1343
> [2012-11-18 15:20:53] Started renewal process (pid=2072) for
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> [2012-11-18 16:19:09] Protecting spm lock for vdsm pid 1343
> [2012-11-18 16:19:09] Trying to acquire lease -
> spUUID=8862496a-f326-46cf-8085-7ff982f985da
> lease_file=/rhev/data-center/mnt/10.10.0.200:_iso/8862496a-f326-46cf-8085-7ff982f985da/dom_md/leases
> id=1 lease_time_ms=5000 io_op_to_ms=1000
> [2012-11-18 16:19:11] Lease acquired
> spUUID=8862496a-f326-46cf-8085-7ff982f985da id=1
> lease_path=/rhev/data-center/mnt/10.10.0.200:_iso/8862496a-f326-46cf-8085-7ff982f985da/dom_md/leases,
> TS=1353273549413059
> [2012-11-18 16:19:11] Protecting spm lock for vdsm pid 1343
> [2012-11-18 16:19:11] Started renewal process (pid=25101) for
> spUUID=8862496a-f326-46cf-8085-7ff982f985da id=1
> lease_path=/rhev/data-center/mnt/10.10.0.200:_iso/8862496a-f326-46cf-8085-7ff982f985da/dom_md/leases
> [2012-11-18 16:19:13] Stopping lease for pool:
> 8862496a-f326-46cf-8085-7ff982f985da pgrps: -25101
> User defined signal 1
> [2012-11-18 16:19:13] releasing lease
> spUUID=8862496a-f326-46cf-8085-7ff982f985da id=1
> lease_path=/rhev/data-center/mnt/10.10.0.200:_iso/8862496a-f326-46cf-8085-7ff982f985da/dom_md/leases
> [2012-11-18 18:51:27] Protecting spm lock for vdsm pid 1495
> [2012-11-18 18:51:27] Trying to acquire lease -
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99
> lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> id=1 lease_time_ms=60000 io_op_to_ms=10000
> [2012-11-18 18:53:47] Lease acquired
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases,
> TS=1353282807712736
> [2012-11-18 18:53:47] Protecting spm lock for vdsm pid 1495
> [2012-11-18 18:53:47] Started renewal process (pid=2338) for
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> [2012-11-18 20:17:10] Protecting spm lock for vdsm pid 1492
> [2012-11-18 20:17:10] Trying to acquire lease -
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99
> lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> id=1 lease_time_ms=60000 io_op_to_ms=10000
> [2012-11-18 20:19:30] Lease acquired
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases,
> TS=1353287950366393
> [2012-11-18 20:19:30] Protecting spm lock for vdsm pid 1492
> [2012-11-18 20:19:30] Started renewal process (pid=2182) for
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> [2012-11-19 11:34:48] Protecting spm lock for vdsm pid 1516
> [2012-11-19 11:34:48] Trying to acquire lease -
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99
> lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> id=1 lease_time_ms=60000 io_op_to_ms=10000
> [2012-11-19 11:37:08] Lease acquired
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases,
> TS=1353343008455126
> [2012-11-19 11:37:08] Protecting spm lock for vdsm pid 1516
> [2012-11-19 11:37:08] Started renewal process (pid=2184) for
> spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1
> lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
>
>
> 2012/11/19 Itamar Heim < iheim at redhat.com >
>
>
>
> On 11/19/2012 11:19 PM, EricD wrote:
>
>
> I'm not sure I understand your question.
>
> I hope this answers it:
> I checked under the Host tab.
> I have 2 oVirt Nodes and they both have the SPM value under
> SpmStatus.
>
> I assume they are in different data centers?
> Is there anything in the vdsm logs, or
> in /var/log/vdsm/spm-lock.log on the rebooting node?
>
>
>
>
>
> 2012/11/19 Itamar Heim < iheim at redhat.com >
>
>
> On 11/19/2012 07:31 PM, EricD wrote:
>
> One of my oVirt Nodes keeps rebooting since I joined the node to
> oVirt.
>
>
> Here is what I see if I run top or iotop.
>
> There are a lot of:
>
> qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0
>
>
>
>
> I don't have that much activity on the other oVirt node.
>
> What do you suggest I verify?
>
>
> $ top
>
>   PID USER  PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
> 24369 qemu  20   0 7108m 2.1g   10m S 86.5 13.4 16:21.26 qemu-kvm
> 23852 qemu  20   0 5621m 805m   10m S  1.3  5.0  0:40.05 qemu-kvm
> 23617 qemu  20   0 5114m 320m   10m S  1.0  2.0  0:25.69 qemu-kvm
>  1516 vdsm  15  -5 1911m  39m  7736 S  0.7  0.2  0:21.12 vdsm
>  1127 root  20   0 1014m  14m  7716 S  0.3  0.1  0:03.30 libvirtd
> 16141 root  20   0 15380 1404   936 R  0.3  0.0  0:00.10 top
>     1 root  20   0 65680  27m  2052 S  0.0  0.2  0:01.34 systemd
>
> $ iotop
>
> Total DISK READ: 14.87 M/s | Total DISK WRITE: 8.79 K/s
>   TID PRIO USER  DISK READ DISK WRITE SWAPIN    IO> COMMAND
> 29276 be/4 qemu 156.34 K/s 0.00 B/s 0.00 % 3.70 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29118 be/4 qemu 375.21 K/s 0.00 B/s 0.00 % 2.63 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29322 be/4 qemu 203.24 K/s 0.00 B/s 0.00 % 2.62 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29263 be/4 qemu 250.14 K/s 0.00 B/s 0.00 % 2.58 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29273 be/4 qemu 250.14 K/s 0.00 B/s 0.00 % 2.40 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29262 be/4 qemu 312.68 K/s 0.00 B/s 0.00 % 2.15 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29274 be/4 qemu  70.35 K/s 0.00 B/s 0.00 % 1.91 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 28022 be/4 qemu 297.04 K/s 0.00 B/s 0.00 % 1.82 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29298 be/4 qemu 171.97 K/s 0.00 B/s 0.00 % 1.79 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29324 be/4 qemu 281.41 K/s 0.00 B/s 0.00 % 1.78 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29117 be/4 qemu 187.61 K/s 0.00 B/s 0.00 % 1.62 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 28024 be/4 qemu 379.12 K/s 0.00 B/s 0.00 % 1.49 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29129 be/4 qemu 175.88 K/s 0.00 B/s 0.00 % 1.31 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 25711 be/4 qemu 328.31 K/s 0.00 B/s 0.00 % 1.22 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29127 be/4 qemu 297.04 K/s 0.00 B/s 0.00 % 1.19 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29261 be/4 qemu 351.76 K/s 0.00 B/s 0.00 % 1.16 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29116 be/4 qemu 328.31 K/s 0.00 B/s 0.00 % 1.16 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29302 be/4 qemu 390.85 K/s 0.00 B/s 0.00 % 1.16 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29277 be/4 qemu 281.41 K/s 0.00 B/s 0.00 % 1.12 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29122 be/4 qemu  93.80 K/s 0.00 B/s 0.00 % 1.10 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29260 be/4 qemu 265.78 K/s 0.00 B/s 0.00 % 1.10 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29279 be/4 qemu 285.32 K/s 0.00 B/s 0.00 % 1.09 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29112 be/4 qemu 343.95 K/s 0.00 B/s 0.00 % 1.08 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29272 be/4 qemu 187.61 K/s 0.00 B/s 0.00 % 1.04 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29264 be/4 qemu 179.79 K/s 0.00 B/s 0.00 % 0.94 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29269 be/4 qemu 171.97 K/s 0.00 B/s 0.00 % 0.93 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29303 be/4 qemu 171.97 K/s 0.00 B/s 0.00 % 0.92 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29121 be/4 qemu 254.05 K/s 0.00 B/s 0.00 % 0.87 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29271 be/4 qemu 226.69 K/s 0.00 B/s 0.00 % 0.85 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29280 be/4 qemu 218.87 K/s 0.00 B/s 0.00 % 0.84 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 26280 be/4 qemu 250.14 K/s 0.00 B/s 0.00 % 0.74 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29321 be/4 qemu 156.34 K/s 7.82 K/s 0.00 % 0.70 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 26281 be/4 qemu 234.51 K/s 0.00 B/s 0.00 % 0.69 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29265 be/4 qemu 297.04 K/s 0.00 B/s 0.00 % 0.65 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29268 be/4 qemu 125.07 K/s 0.00 B/s 0.00 % 0.60 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29323 be/4 qemu 332.22 K/s 0.00 B/s 0.00 % 0.58 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29111 be/4 qemu 316.59 K/s 0.00 B/s 0.00 % 0.53 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> 29126 be/4 qemu 187.61 K/s 0.00 B/s 0.00 % 0.43 % qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
>
>
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
> Does the rebooting node happen to be the SPM node?
>
>
>
>
>
>