[Users] oVirt Node keeps rebooting

EricD desrce at gmail.com
Thu Nov 22 13:36:22 UTC 2012


vdsm.log is too big to attach (14 MB).
spm-lock.log is attached to this email.
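
For anyone reading the spm-lock.log excerpt quoted further down, here is a minimal sketch that measures how long each lease acquisition took. It assumes every event line starts with a "[YYYY-MM-DD HH:MM:SS]" prefix as in the quoted excerpt; the function name `acquire_durations` and the sample list are made up for illustration and are not part of vdsm.

```python
import re
from datetime import datetime

# "[YYYY-MM-DD HH:MM:SS] message" prefix, as seen in the quoted spm-lock.log excerpt.
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (.*)$")


def acquire_durations(lines):
    """Return (start_time, seconds) for each 'Trying to acquire lease' -> 'Lease acquired' pair."""
    results = []
    pending = None  # timestamp of the most recent unanswered acquire attempt
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # lines without a timestamp (e.g. "User defined signal 1") are skipped
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(2)
        if message.startswith("Trying to acquire lease"):
            pending = stamp
        elif message.startswith("Lease acquired") and pending is not None:
            results.append((pending, (stamp - pending).total_seconds()))
            pending = None
    return results


# Hypothetical sample: timestamps copied from the quoted log, messages shortened.
sample = [
    "[2012-11-18 15:20:08] Trying to acquire lease -",
    "[2012-11-18 15:20:28] Lease acquired",
    "[2012-11-18 18:51:27] Trying to acquire lease -",
    "[2012-11-18 18:53:47] Lease acquired",
]

if __name__ == "__main__":
    for start, seconds in acquire_durations(sample):
        print("{0}  acquired after {1:.0f} s".format(start, seconds))
```

On the timestamps in the quoted log, the two acquisitions at 15:20 took 20 s each, and the later ones (18:51, 20:17 and 11:34 the next day) took 140 s each.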

$ lspci | grep -i ether
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
03:00.0 Ethernet controller: D-Link System Inc RTL8139 Ethernet (rev 10)
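
To double-check which kernel driver each NIC is actually bound to (and so whether bnx2x is in play at all), something like the sketch below could be run on the node. It assumes the usual Linux sysfs layout, where /sys/class/net/<iface>/device/driver is a symlink to the driver directory; the function name `nic_drivers` is only illustrative. Neither card in the lspci output above is a Broadcom NetXtreme II, which is the hardware bnx2x drives, so I would not expect it to show up.

```python
import os


def nic_drivers(sys_net="/sys/class/net"):
    """Map each interface that has a backing device to its kernel driver name."""
    drivers = {}
    if not os.path.isdir(sys_net):
        return drivers  # not Linux, or sysfs is not mounted
    for iface in sorted(os.listdir(sys_net)):
        link = os.path.join(sys_net, iface, "device", "driver")
        # Virtual interfaces (lo, bridges, ...) have no device/driver link.
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__":
    for iface, driver in nic_drivers().items():
        note = "  <-- bnx2x in use" if driver == "bnx2x" else ""
        print("{0}: {1}{2}".format(iface, driver, note))
```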

2012/11/22 Ayal Baron <abaron at redhat.com>

> What type of NICs do you have? (It's a shot in the dark, but I know there
> is an issue with the bnx2x driver that causes random reboots, which some
> users have hit.)
> Can you attach the full vdsm.log and spm-lock.log?
>
> ----- Original Message -----
> >
> > [2012-11-18 15:20:08] Protecting spm lock for vdsm pid 1343
> > [2012-11-18 15:20:08] Trying to acquire lease - spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases id=1000 lease_time_ms=60000 io_op_to_ms=10000
> > [2012-11-18 15:20:28] Lease acquired spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1000 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases, TS=1353270008160373
> > [2012-11-18 15:20:28] Protecting spm lock for vdsm pid 1343
> > [2012-11-18 15:20:28] Started renewal process (pid=1912) for spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1000 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> > [2012-11-18 15:20:30] Stopping lease for pool: f0071c9b-cbe2-4555-9ae0-279031764a99 pgrps: -1912
> > User defined signal 1
> > [2012-11-18 15:20:30] releasing lease spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1000 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> > [2012-11-18 15:20:33] Protecting spm lock for vdsm pid 1343
> > [2012-11-18 15:20:33] Trying to acquire lease - spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases id=1 lease_time_ms=60000 io_op_to_ms=10000
> > [2012-11-18 15:20:53] Lease acquired spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases, TS=1353270033749998
> > [2012-11-18 15:20:53] Protecting spm lock for vdsm pid 1343
> > [2012-11-18 15:20:53] Started renewal process (pid=2072) for spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> > [2012-11-18 16:19:09] Protecting spm lock for vdsm pid 1343
> > [2012-11-18 16:19:09] Trying to acquire lease - spUUID=8862496a-f326-46cf-8085-7ff982f985da lease_file=/rhev/data-center/mnt/10.10.0.200:_iso/8862496a-f326-46cf-8085-7ff982f985da/dom_md/leases id=1 lease_time_ms=5000 io_op_to_ms=1000
> > [2012-11-18 16:19:11] Lease acquired spUUID=8862496a-f326-46cf-8085-7ff982f985da id=1 lease_path=/rhev/data-center/mnt/10.10.0.200:_iso/8862496a-f326-46cf-8085-7ff982f985da/dom_md/leases, TS=1353273549413059
> > [2012-11-18 16:19:11] Protecting spm lock for vdsm pid 1343
> > [2012-11-18 16:19:11] Started renewal process (pid=25101) for spUUID=8862496a-f326-46cf-8085-7ff982f985da id=1 lease_path=/rhev/data-center/mnt/10.10.0.200:_iso/8862496a-f326-46cf-8085-7ff982f985da/dom_md/leases
> > [2012-11-18 16:19:13] Stopping lease for pool: 8862496a-f326-46cf-8085-7ff982f985da pgrps: -25101
> > User defined signal 1
> > [2012-11-18 16:19:13] releasing lease spUUID=8862496a-f326-46cf-8085-7ff982f985da id=1 lease_path=/rhev/data-center/mnt/10.10.0.200:_iso/8862496a-f326-46cf-8085-7ff982f985da/dom_md/leases
> > [2012-11-18 18:51:27] Protecting spm lock for vdsm pid 1495
> > [2012-11-18 18:51:27] Trying to acquire lease - spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases id=1 lease_time_ms=60000 io_op_to_ms=10000
> > [2012-11-18 18:53:47] Lease acquired spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases, TS=1353282807712736
> > [2012-11-18 18:53:47] Protecting spm lock for vdsm pid 1495
> > [2012-11-18 18:53:47] Started renewal process (pid=2338) for spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> > [2012-11-18 20:17:10] Protecting spm lock for vdsm pid 1492
> > [2012-11-18 20:17:10] Trying to acquire lease - spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases id=1 lease_time_ms=60000 io_op_to_ms=10000
> > [2012-11-18 20:19:30] Lease acquired spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases, TS=1353287950366393
> > [2012-11-18 20:19:30] Protecting spm lock for vdsm pid 1492
> > [2012-11-18 20:19:30] Started renewal process (pid=2182) for spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> > [2012-11-19 11:34:48] Protecting spm lock for vdsm pid 1516
> > [2012-11-19 11:34:48] Trying to acquire lease - spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 lease_file=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases id=1 lease_time_ms=60000 io_op_to_ms=10000
> > [2012-11-19 11:37:08] Lease acquired spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases, TS=1353343008455126
> > [2012-11-19 11:37:08] Protecting spm lock for vdsm pid 1516
> > [2012-11-19 11:37:08] Started renewal process (pid=2184) for spUUID=f0071c9b-cbe2-4555-9ae0-279031764a99 id=1 lease_path=/rhev/data-center/mnt/_ovirt/f0071c9b-cbe2-4555-9ae0-279031764a99/dom_md/leases
> >
> >
> > 2012/11/19 Itamar Heim < iheim at redhat.com >
> >
> >
> >
> > On 11/19/2012 11:19 PM, EricD wrote:
> >
> >
> > I'm not sure I understand your question.
> >
> > I hope this answers it:
> > I checked under the Host tab.
> > I have 2 oVirt Nodes and they both have the SPM value under
> > SpmStatus.
> >
> > I assume they are in different data centers?
> > Anything in the vdsm logs?
> > Or in /var/log/vdsm/spm-lock.log on the rebooting node?
> >
> >
> >
> >
> >
> > 2012/11/19 Itamar Heim < iheim at redhat.com >
> >
> >
> > On 11/19/2012 07:31 PM, EricD wrote:
> >
> > One of my oVirt Nodes has kept rebooting since I joined it to
> > oVirt.
> >
> > Here is what I see when I run top or iotop.
> >
> > There are a lot of these:
> > qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0
> >
> > I don't have that much activity on the other oVirt node.
> >
> > What do you suggest I check?
> >
> >
> > $ top
> >
> >   PID USER  PR  NI  VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
> > 24369 qemu  20   0 7108m  2.1g   10m S 86.5 13.4  16:21.26 qemu-kvm
> > 23852 qemu  20   0 5621m  805m   10m S  1.3  5.0   0:40.05 qemu-kvm
> > 23617 qemu  20   0 5114m  320m   10m S  1.0  2.0   0:25.69 qemu-kvm
> >  1516 vdsm  15  -5 1911m   39m  7736 S  0.7  0.2   0:21.12 vdsm
> >  1127 root  20   0 1014m   14m  7716 S  0.3  0.1   0:03.30 libvirtd
> > 16141 root  20   0 15380  1404   936 R  0.3  0.0   0:00.10 top
> >     1 root  20   0 65680   27m  2052 S  0.0  0.2   0:01.34 systemd
> >
> > $ iotop
> >
> > Total DISK READ: 14.87 M/s | Total DISK WRITE: 8.79 K/s
> >   TID PRIO USER   DISK READ  DISK WRITE  SWAPIN     IO>  COMMAND
> > 29276 be/4 qemu  156.34 K/s   0.00 B/s  0.00 %  3.70 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29118 be/4 qemu  375.21 K/s   0.00 B/s  0.00 %  2.63 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29322 be/4 qemu  203.24 K/s   0.00 B/s  0.00 %  2.62 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29263 be/4 qemu  250.14 K/s   0.00 B/s  0.00 %  2.58 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29273 be/4 qemu  250.14 K/s   0.00 B/s  0.00 %  2.40 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29262 be/4 qemu  312.68 K/s   0.00 B/s  0.00 %  2.15 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29274 be/4 qemu   70.35 K/s   0.00 B/s  0.00 %  1.91 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 28022 be/4 qemu  297.04 K/s   0.00 B/s  0.00 %  1.82 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29298 be/4 qemu  171.97 K/s   0.00 B/s  0.00 %  1.79 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29324 be/4 qemu  281.41 K/s   0.00 B/s  0.00 %  1.78 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29117 be/4 qemu  187.61 K/s   0.00 B/s  0.00 %  1.62 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 28024 be/4 qemu  379.12 K/s   0.00 B/s  0.00 %  1.49 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29129 be/4 qemu  175.88 K/s   0.00 B/s  0.00 %  1.31 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 25711 be/4 qemu  328.31 K/s   0.00 B/s  0.00 %  1.22 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29127 be/4 qemu  297.04 K/s   0.00 B/s  0.00 %  1.19 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29261 be/4 qemu  351.76 K/s   0.00 B/s  0.00 %  1.16 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29116 be/4 qemu  328.31 K/s   0.00 B/s  0.00 %  1.16 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29302 be/4 qemu  390.85 K/s   0.00 B/s  0.00 %  1.16 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29277 be/4 qemu  281.41 K/s   0.00 B/s  0.00 %  1.12 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29122 be/4 qemu   93.80 K/s   0.00 B/s  0.00 %  1.10 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29260 be/4 qemu  265.78 K/s   0.00 B/s  0.00 %  1.10 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29279 be/4 qemu  285.32 K/s   0.00 B/s  0.00 %  1.09 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29112 be/4 qemu  343.95 K/s   0.00 B/s  0.00 %  1.08 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29272 be/4 qemu  187.61 K/s   0.00 B/s  0.00 %  1.04 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29264 be/4 qemu  179.79 K/s   0.00 B/s  0.00 %  0.94 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29269 be/4 qemu  171.97 K/s   0.00 B/s  0.00 %  0.93 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29303 be/4 qemu  171.97 K/s   0.00 B/s  0.00 %  0.92 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29121 be/4 qemu  254.05 K/s   0.00 B/s  0.00 %  0.87 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29271 be/4 qemu  226.69 K/s   0.00 B/s  0.00 %  0.85 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29280 be/4 qemu  218.87 K/s   0.00 B/s  0.00 %  0.84 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 26280 be/4 qemu  250.14 K/s   0.00 B/s  0.00 %  0.74 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29321 be/4 qemu  156.34 K/s   7.82 K/s  0.00 %  0.70 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 26281 be/4 qemu  234.51 K/s   0.00 B/s  0.00 %  0.69 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29265 be/4 qemu  297.04 K/s   0.00 B/s  0.00 %  0.65 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29268 be/4 qemu  125.07 K/s   0.00 B/s  0.00 %  0.60 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29323 be/4 qemu  332.22 K/s   0.00 B/s  0.00 %  0.58 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29111 be/4 qemu  316.59 K/s   0.00 B/s  0.00 %  0.53 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> > 29126 be/4 qemu  187.61 K/s   0.00 B/s  0.00 %  0.43 %  qemu-kvm -S -M pc-0.14 -cpu kvm64,+lahf_lm,+ssse~irtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
> >
> > Does the rebooting node happen to be the SPM node?
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spm-lock.log
Type: application/octet-stream
Size: 9090 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20121122/558050ae/attachment-0001.obj>

