[Users] Ovirt Node rebooting +/- every 10 days

EricD desrce at gmail.com
Fri Dec 21 12:46:08 UTC 2012


I have oVirt 3.1 with two nodes. One of them runs fine without
restarting, but the other keeps restarting: the longest uptime that
server reaches is about 10 days before it restarts. I think it might
be related to the disks.

FYI, the disks are two 1 TB disks in RAID-0, giving 2 TB.

# /var/log/messages
Dec 21 05:28:44 hypervisor01a ntpd[945]: 0.0.0.0 c61c 0c clock_step +17997.588918 s
Dec 21 05:28:44 hypervisor01a ntpd[945]: 0.0.0.0 c614 04 freq_mode
Dec 21 05:28:45 hypervisor01a kdump: No crashkernel parameter specified for running kernel
Dec 21 05:28:45 hypervisor01a kdumpctl[1366]: Starting kdump:
Dec 21 05:28:45 hypervisor01a kdump: failed to start up
Dec 21 05:28:45 hypervisor01a systemd[1]: kdump.service: main process exited, code=exited, status=1
Dec 21 05:28:45 hypervisor01a systemd[1]: Unit kdump.service entered failed state.
Dec 21 05:28:45 hypervisor01a systemd[1]: Startup finished in 888ms 157us (kernel) + 2s 521ms 289us (initrd) + 15s 577ms 672us (userspace) = 18s 987ms 118us.
Dec 21 05:28:45 hypervisor01a ntpd[945]: 0.0.0.0 c618 08 no_sys_peer
Dec 21 05:29:04 hypervisor01a vdsm TaskManager.Task ERROR Task=`5f51ff52-f9a4-4854-a41d-d5d33c872458`::Unexpected error
Dec 21 05:29:04 hypervisor01a vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Unknown pool id, pool not connected: ('dbb49db6-9a24-4395-a8bd-c9f222eaecab',)", 'code': 309}}
Dec 21 05:29:04 hypervisor01a vdsm TaskManager.Task ERROR Task=`7b0cf3b0-6d26-4421-a221-29f2ecaaeb1f`::Unexpected error
Dec 21 05:29:04 hypervisor01a vdsm Storage.Dispatcher.Protect ERROR {'status': {'message': "Unknown pool id, pool not connected: ('dbb49db6-9a24-4395-a8bd-c9f222eaecab',)", 'code': 309}}
Dec 21 05:29:04 hypervisor01a kernel: [   37.944421] ata1: hard resetting link
Dec 21 05:29:04 hypervisor01a kernel: [   38.247979] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 21 05:29:04 hypervisor01a kernel: [   38.248802] ata1.00: configured for UDMA/133
Dec 21 05:29:04 hypervisor01a kernel: [   38.248807] ata1: EH complete
Dec 21 05:29:04 hypervisor01a kernel: [   38.249013] ata2: hard resetting link
Dec 21 05:29:04 hypervisor01a kernel: [   38.553112] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 21 05:29:04 hypervisor01a kernel: [   38.553881] ata2.00: configured for UDMA/133
Dec 21 05:29:04 hypervisor01a kernel: [   38.553886] ata2: EH complete
Dec 21 05:29:04 hypervisor01a kernel: [   38.554064] ata3: hard resetting link
Dec 21 05:29:05 hypervisor01a kernel: [   38.858275] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 21 05:29:05 hypervisor01a kernel: [   38.861154] ata3.00: configured for UDMA/133
Dec 21 05:29:05 hypervisor01a kernel: [   38.861159] ata3: EH complete
Dec 21 05:29:05 hypervisor01a kernel: [   38.861352] ata4: hard resetting link
Dec 21 05:29:05 hypervisor01a kernel: [   39.165397] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 21 05:29:05 hypervisor01a kernel: [   39.168223] ata4.00: configured for UDMA/133
Dec 21 05:29:05 hypervisor01a kernel: [   39.168229] ata4: EH complete
Dec 21 05:29:05 hypervisor01a kernel: [   39.168421] ata5: hard resetting link
Dec 21 05:29:05 hypervisor01a kernel: [   39.472459] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec 21 05:29:05 hypervisor01a kernel: [   39.480040] ata5.00: configured for UDMA/100
Dec 21 05:29:05 hypervisor01a kernel: [   39.485478] ata5: EH complete
Dec 21 05:29:05 hypervisor01a kernel: [   39.485642] ata6: limiting SATA link speed to 1.5 Gbps
Dec 21 05:29:05 hypervisor01a kernel: [   39.485647] ata6: hard resetting link
Dec 21 05:29:06 hypervisor01a kernel: [   39.790610] ata6: SATA link down (SStatus 0 SControl 310)
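
To make the per-port SATA state in the excerpt above easier to compare across reboots, here is a minimal Python sketch that summarizes the "SATA link up/down" kernel messages. The sample lines are copied from the log above. The helper names (unwrap, link_states) and both regular expressions are my own assumptions for illustration, not part of any oVirt or kernel tool, and the script only reports what the log says; it does not diagnose the cause of the reboots.

```python
import re

# Sample lines copied from the log above; mail-style wrapping is kept on purpose.
SAMPLE = """\
Dec 21 05:29:04 hypervisor01a kernel: [   38.247979] ata1: SATA link up 3.0
Gbps (SStatus 123 SControl 300)
Dec 21 05:29:05 hypervisor01a kernel: [   39.472459] ata5: SATA link up 1.5
Gbps (SStatus 113 SControl 300)
Dec 21 05:29:06 hypervisor01a kernel: [   39.790610] ata6: SATA link down
(SStatus 0 SControl 310)
"""

# A syslog record starts with a timestamp such as "Dec 21 05:29:04 ".
STAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")
# Port name, link state, and the speed when the link is up.
LINK = re.compile(r"(ata\d+): SATA link (up|down)(?: ([\d.]+) Gbps)?")


def unwrap(text):
    """Rejoin continuation lines onto the record they were wrapped from."""
    records = []
    for line in text.splitlines():
        if STAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def link_states(text):
    """Map each ATA port to (state, speed in Gbps or None), last message wins."""
    states = {}
    for record in unwrap(text):
        match = LINK.search(record)
        if match:
            port, state, speed = match.groups()
            states[port] = (state, float(speed) if speed else None)
    return states


if __name__ == "__main__":
    for port, (state, speed) in sorted(link_states(SAMPLE).items()):
        print(port, state, speed)
```

Running it against the full /var/log/messages instead of SAMPLE would need the file read into a string first; the path and the idea of comparing the output between the two nodes are suggestions only.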

# RAID-0
mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Sun Nov 18 14:47:15 2012
     Raid Level : raid0
     Array Size : 1953524736 (1863.03 GiB 2000.41 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Nov 18 14:47:15 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : hypervisor01-a:0  (local to host hypervisor01-a)
           UUID : 9eb1324d:57eed46d:c23ae815:0666e238
         Events : 0

    Number   Major   Minor   RaidDevice State
       0     253        2        0      active sync   /dev/dm-2
       1     253        3        1      active sync   /dev/dm-3
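
As a sanity check on the "2 TB" figure, the Array Size line above can be converted by hand. The short Python sketch below assumes that mdadm reports Array Size in KiB (1024-byte blocks); under that assumption the arithmetic reproduces both values mdadm prints in parentheses. The variable names are illustrative only.

```python
# Array Size as printed by "mdadm --detail /dev/md127" above.
# Assumption: the unit is KiB (1024-byte blocks).
array_size_kib = 1953524736

size_bytes = array_size_kib * 1024
size_gib = size_bytes / 2**30   # binary gigabytes (GiB)
size_gb = size_bytes / 10**9    # decimal gigabytes (GB)

print(f"{size_bytes} bytes = {size_gib:.2f} GiB = {size_gb:.2f} GB")
```

So the two 1 TB members striped together give roughly 2000 GB in decimal units, which is the same capacity as about 1863 GiB in binary units.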
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vdsm_log.tgz
Type: application/x-gzip
Size: 3581139 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20121221/395bf1aa/attachment-0001.tgz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages
Type: application/octet-stream
Size: 121236 bytes
Desc: not available
URL: <http://lists.ovirt.org/pipermail/users/attachments/20121221/395bf1aa/attachment-0001.obj>

