[ovirt-users] ovirt-3.6 : Hosted-engine crashed and can't restart

Simone Tiraboschi stirabos at redhat.com
Thu Jul 21 07:47:22 UTC 2016


On Wed, Jul 20, 2016 at 5:01 PM, Alexis HAUSER
<alexis.hauser at telecom-bretagne.eu> wrote:
> After assigning an IP address to a VLAN network (it was using DHCP by default) that was on the same NIC as ovirtmgmt, my hosted-engine crashed and can't start again... I have no idea how to fix this.
> I had a similar issue some months ago, but with a different error. I tried restarting the HA agent, which seems to be linked to this error, and also restarted the host. I also tried removing the _DIRECT_IO_ lockfile on the engine storage, since that fixed my problem last time, but it didn't help...
>
> Any ideas? Do you think manually editing the logical networks on the host and reverting them to how they were before the crash could help?
>
> hosted-engine --vm-status
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 117, in <module>
>     if not status_checker.print_status():
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 60, in print_status
>     all_host_stats = ha_cli.get_all_host_stats()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 160, in get_all_host_stats
>     return self.get_all_stats(self.StatModes.HOST)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
>     self._configure_broker_conn(broker)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
>     dom_type=dom_type)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
>     .format(sd_type, options, e))
> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': 'e41807e5-ee68-40a2-a642-cc226ba0e82d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
>
>
> vdsClient -s 0 list
>
> 16450089-911e-4bad-a8b7-98e84a79ef3a
>         Status = Down
>         nicModel = rtl8139,pv
>         statusTime = 4295559350
>         exitMessage = Unable to get volume size for domain e41807e5-ee68-40a2-a642-cc226ba0e82d volume 053df3a6-db18-445a-8f75-61c630ab0003
>         emulatedMachine = rhel6.5.0
>         pid = 0
>         vmName = HostedEngine
>         devices = [{'index': '0', 'iface': 'virtio', 'format': 'raw', 'bootOrder': '1', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'volumeID': '053df3a6-db18-445a-8f75-61c630ab0003', 'imageID': 'b6daa50d-adad-46a5-8f5f-accfb155a1e1', 'readonly': 'false', 'domainID': 'e41807e5-ee68-40a2-a642-cc226ba0e82d', 'deviceId': 'b6daa50d-adad-46a5-8f5f-accfb155a1e1', 'poolID': '00000000-0000-0000-0000-000000000000', 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'type': 'disk'}, {'nicModel': 'pv', 'macAddr': '00:16:3e:1c:4b:81', 'linkActive': 'true', 'network': 'ovirtmgmt', 'deviceId': '0aeaea2f-a419-43cc-92d7-8422f6aa9223', 'address': 'None', 'device': 'bridge', 'type': 'interface'}, {'index': '2', 'iface': 'ide', 'readonly': 'true', 'deviceId': '8c3179ac-b322-4f5c-9449-c52e3665e0ae', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller', 'deviceId': '21db0c6e-071c-48ff-b905-95478b37c384', 'address': {'slot': '0x04', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}}, {'device': 'usb', 'type': 'controller', 'deviceId': 'c0384f68-d0c9-4ebb-a779-8dc9911ce2f8', 'address': {'slot': '0x01', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x2'}}, {'device': 'ide', 'type': 'controller', 'deviceId': 'd5a2dd13-138a-482b-9bc3-994b10ec4100', 'address': {'slot': '0x01', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x1'}}, {'device': 'virtio-serial', 'type': 'controller', 'deviceId': '9e695172-c9b0-47df-bc76-8170219dec28', 'address': {'slot': '0x05', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}}]
>         guestDiskMapping = {}
>         vmType = kvm
>         displaySecurePort = -1
>         exitReason = 1
>         memSize = 6000
>         displayPort = -1
>         clientIp =
>         spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
>         smp = 4
>         displayIp = 0
>         display = vnc
>         exitCode = 1
>
>
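
The exitMessage above says vdsm could not get the size of volume 053df3a6-db18-445a-8f75-61c630ab0003 in domain e41807e5-ee68-40a2-a642-cc226ba0e82d, which points at the storage rather than at the VM definition. One quick way to see whether that volume is reachable from the host at all is to look for the file under the vdsm mount root. This is a minimal Python 3 sketch, not an oVirt tool: it assumes the NFS storage domain is mounted under /rhev/data-center/mnt with the usual file-domain layout <sd_uuid>/images/<img_uuid>/<vol_uuid>, and the function names are hypothetical.

```python
import glob
import os
import re

# Matches the "domain <uuid> volume <uuid>" part of the vdsm exitMessage shown above.
EXIT_RE = re.compile(r"domain (?P<sd>[0-9a-f-]{36}) volume (?P<vol>[0-9a-f-]{36})")


def parse_exit_message(msg):
    """Return (sd_uuid, vol_uuid) from an 'Unable to get volume size' message, or None."""
    m = EXIT_RE.search(msg)
    return (m.group("sd"), m.group("vol")) if m else None


def find_volume(sd_uuid, img_uuid, vol_uuid, mnt_root="/rhev/data-center/mnt"):
    """List files matching the ASSUMED file-domain layout
    <mnt_root>/<any mount dir>/<sd_uuid>/images/<img_uuid>/<vol_uuid>."""
    pattern = os.path.join(glob.escape(mnt_root), "*", sd_uuid, "images", img_uuid, vol_uuid)
    return glob.glob(pattern)


if __name__ == "__main__":
    msg = ("Unable to get volume size for domain "
           "e41807e5-ee68-40a2-a642-cc226ba0e82d volume "
           "053df3a6-db18-445a-8f75-61c630ab0003")
    sd, vol = parse_exit_message(msg)
    # imageID taken from the HostedEngine disk entry in the devices list above.
    img = "b6daa50d-adad-46a5-8f5f-accfb155a1e1"
    hits = find_volume(sd, img, vol)
    for path in hits:
        print(path, os.path.getsize(path), "bytes")
    if not hits:
        print("volume not found: the storage domain is probably not mounted")
```

If nothing is found, the storage domain is most likely not mounted, which would be consistent with the "Connection to storage server failed" errors in the agent log.
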
> systemctl status ovirt-ha-agent.service -l
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>    Active: active (running) since Wed 2016-07-20 14:56:22 UTC; 2min 29s ago
>  Main PID: 20236 (ovirt-ha-agent)
>    CGroup: /system.slice/ovirt-ha-agent.service
>            └─20236 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>
> Jul 20 14:57:56 rhevserv ovirt-ha-agent[20236]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
> Jul 20 14:57:57 rhevserv ovirt-ha-agent[20236]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
> Jul 20 14:58:37 rhevserv ovirt-ha-agent[20236]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: 'Connection to storage server failed' - trying to restart agent
> Jul 20 14:58:37 rhevserv ovirt-ha-agent[20236]: ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Error: 'Connection to storage server failed' - trying to restart agent

^^^
The issue seems to be here: please make sure the host can correctly connect
to your storage server.
Can you please attach the vdsm logs?
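
Since the problem started right after changing the IP configuration of a VLAN on the same NIC as ovirtmgmt, it may be worth checking which local address the host now uses to reach the NFS server, and whether the NFS port answers at all. A minimal Python 3 sketch, assuming NFS over TCP on the standard port 2049; the script and function names are hypothetical, and a successful connect only shows that nfsd is reachable, not that the export can actually be mounted:

```python
import socket
import sys


def source_address_for(host, port=2049):
    """Return the local IP the routing table would use to reach host.

    Connecting a UDP socket sends no packets; it only makes the kernel
    pick a route, so getsockname() reveals the chosen source address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((host, port))
        return s.getsockname()[0]


def tcp_reachable(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main(argv):
    if len(argv) < 2:
        print("usage: check_storage.py NFS_SERVER")
        return
    server = argv[1]
    print("source address:", source_address_for(server))
    print("NFS port 2049 reachable:", tcp_reachable(server))


if __name__ == "__main__":
    main(sys.argv)
```

If the printed source address belongs to the newly configured VLAN rather than to ovirtmgmt, the new static IP has probably changed the route to the storage server.
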

> Jul 20 14:58:42 rhevserv ovirt-ha-agent[20236]: WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '2'
> Jul 20 14:58:43 rhevserv ovirt-ha-agent[20236]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Found certificate common name: rhev.mydomain.com
> Jul 20 14:58:43 rhevserv ovirt-ha-agent[20236]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Initializing VDSM
> Jul 20 14:58:43 rhevserv ovirt-ha-agent[20236]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Connecting the storage
> Jul 20 14:58:43 rhevserv ovirt-ha-agent[20236]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
> Jul 20 14:58:44 rhevserv ovirt-ha-agent[20236]: INFO:ovirt_hosted_engine_ha.lib.storage_server.StorageServer:Connecting storage server
> _______________________________________________
> Users mailing list
> Users at ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users


