
On April 16, 2020 11:25:20 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Is this actually production ready? It seems to break at every step.
On Wed, Apr 15, 2020 at 5:45 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
    host-id=1
    score=3400
    vm_conf_refresh_time=523362 (Wed Apr 8 16:13:06 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

--== Host ovirt-node-01.phoelex.com (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt-node-00.phoelex.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 9c4a034b
local_conf_timestamp               : 523362
Host timestamp                     : 523608
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=523608 (Wed Apr 8 16:17:11 2020)
On April 15, 2020 5:59:46 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Thanks for your help, but I've decided to try and reinstall from scratch. This is taking too long.
On Wed, Apr 15, 2020 at 3:25 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:40:52 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Yes, but there are no zones set up, just ports 22, 6801 and 6900.
On Wed, Apr 15, 2020 at 12:37 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
On April 15, 2020 2:28:05 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Oh, this is painful. It seems to progress if you have both he_force_ipv4 set and run the deployment with the '--4' switch.

But then I get a failure when the ansible script checks for firewalld zones and doesn't get anything back. Should the deployment flow not be setting any zones it needs?

2020-04-15 10:57:25,439+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]

2020-04-15 10:57:26,641+0000 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 {u'stderr_lines': [], u'changed': True, u'end': u'2020-04-15 10:57:26.481202', u'_ansible_no_log': False, u'stdout': u'', u'cmd': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'start': u'2020-04-15 10:57:26.050203', u'delta': u'0:00:00.430999', u'stderr': u'', u'rc': 1, u'invocation': {u'module_args': {u'creates': None, u'executable': None, u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'set -euo pipefail && firewall-cmd --get-active-zones | grep -v "^\\s*interfaces"', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, u'stdout_lines': [], u'msg': u'non-zero return code'}

2020-04-15 10:57:26,741+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": true, "cmd": "set -euo pipefail && firewall-cmd --get-active-zones | grep -v \"^\\s*interfaces\"", "delta": "0:00:00.430999", "end": "2020-04-15 10:57:26.481202", "msg": "non-zero return code", "rc": 1, "start": "2020-04-15 10:57:26.050203", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

On Wed, Apr 15, 2020 at 10:23 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Ha, spoke too soon. It's now stuck in a loop and a google points me at
https://bugzilla.redhat.com/show_bug.cgi?id=1746585

However, forcing ipv4 doesn't seem to have fixed the loop.

On Wed, Apr 15, 2020 at 9:59 AM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, that seems to have fixed it, thanks. Is this a side effect of redeploying the HE over a first-time install? Nothing has changed in our setup and I didn't need to do this when I initially set up our nodes.

On Tue, Apr 14, 2020 at 6:55 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 6:17:17 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Hmmm, we're not using ipv6. Is that the issue?

On Tue, Apr 14, 2020 at 3:56 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 14, 2020 1:27:24 PM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Right, I've given up on recovering the HE so want to try to redeploy it.
There doesn't seem to be enough information to debug why the broker/agent won't start cleanly.

In running 'hosted-engine --deploy', I'm seeing the following error in the setup validation phase:

2020-04-14 09:46:08,922+0000 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:204 DIALOG:SEND Please provide the hostname of this host on the management network [ovirt-node-00.phoelex.com]:
2020-04-14 09:46:12,831+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getResolvedAddresses:432 getResolvedAddresses: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname._validateFQDNresolvability:289 ovirt-node-00.phoelex.com resolves to: set(['64:ff9b::c0a8:13d', '192.168.1.61'])
2020-04-14 09:46:12,832+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], executable='None', cwd='None', env=None
2020-04-14 09:46:12,871+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'], rc=0
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stdout:
ovirt-node-00.phoelex.com. 86400 IN A 192.168.1.61
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ['/usr/bin/dig', '+noall', '+answer', 'ovirt-node-00.phoelex.com', 'ANY'] stderr:
2020-04-14 09:46:12,872+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:813 execute: ('/usr/sbin/ip', 'addr'), executable='None', cwd='None', env=None
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.executeRaw:863 execute-result: ('/usr/sbin/ip', 'addr'), rc=0
2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:921 execute-output: ('/usr/sbin/ip', 'addr') stdout:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether ac:1f:6b:bc:32:6b brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e6:e2:80:93:8d brd ff:ff:ff:ff:ff:ff
5: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 8a:26:44:50:ee:4a brd ff:ff:ff:ff:ff:ff
21: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ac:1f:6b:bc:32:6a brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.61/24 brd 192.168.1.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
    inet6 fe80::ae1f:6bff:febc:326a/64 scope link
       valid_lft forever preferred_lft forever
22: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 3a:02:7b:7d:b3:2a brd ff:ff:ff:ff:ff:ff

2020-04-14 09:46:12,876+0000 DEBUG otopi.plugins.gr_he_common.network.bridge plugin.execute:926 execute-output: ('/usr/sbin/ip', 'addr') stderr:
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.getLocalAddresses:251 addresses: [u'192.168.1.61', u'fe80::ae1f:6bff:febc:326a']
2020-04-14 09:46:12,877+0000 DEBUG otopi.plugins.gr_he_common.network.bridge hostname.test_hostname:464 test_hostname exception

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 460, in test_hostname
    not_local_text,
  File "/usr/lib/python2.7/site-packages/ovirt_setup_lib/hostname.py", line 342, in _validateFQDNresolvability
    addresses=resolvedAddressesAsString
RuntimeError: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

2020-04-14 09:46:12,884+0000 ERROR otopi.plugins.gr_he_common.network.bridge dialog.queryEnvKey:120 Host name is not valid: ovirt-node-00.phoelex.com resolves to 64:ff9b::c0a8:13d 192.168.1.61 and not all of them can be mapped to non loopback devices on this host

The node I'm running on has an IP address of .61 and resolves correctly.

On Fri, Apr 10, 2020 at 12:55 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Where should I be checking if there are any files/folders not owned by vdsm:kvm? I checked on the mount the HA sits on and it's fine.

How would I go about checking that vdsm can access those images? If I run virsh, it lists them, and they were running yesterday even though the HA was down. I've since restarted both hosts, but the broker is still spitting out the same error (copied below). How do I find the reason the broker can't connect to the storage? The conf file is already at DEBUG verbosity:

[handler_logfile]
class=logging.handlers.TimedRotatingFileHandler
args=('/var/log/ovirt-hosted-engine-ha/broker.log', 'd', 1, 7)
level=DEBUG
formatter=long

And what are all these .prob-<num> files that are being created? There are over 250K of them now on the mount I'm using for the Data domain. They're all of 0 size and of the form:
/rhev/data-center/mnt/nas-01.phoelex.com:_volume2_vmstore/.prob-ffa867da-93db-4211-82df-b1b04a625ab9

@eevans: The volume I have the Data Domain on has TB's free. The HA is dead so I can't ssh in. No idea what started these errors, and the other VMs were still running happily, although they're on a different Data Domain.

Shareef.

MainThread::INFO::2020-04-10 07:45:00,408::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-10 07:45:00,408::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:01,577::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-10 07:45:02,692::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-10 07:45:05,175::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

On Thu, Apr 9, 2020 at 5:58 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 11:12:30 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
OK, let's go through this. I'm looking at the node that at least still has some VMs running. virsh also tells me that the HostedEngine VM is running, but it's unresponsive and I can't shut it down.

1. All storage domains exist and are mounted.

2. The ha_agent exists:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ls /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/
dom_md  ha_agent  images  master

3. There are two links:

[root@ovirt-node-01 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/nas-01.phoelex.com\:_volume2_vmstore/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ha_agent/
total 8
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.lockspace -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/ffb90b82-42fe-4253-85d5-aaec8c280aaf/90e68791-0c6f-406a-89ac-e0d86c631604
lrwxrwxrwx. 1 vdsm kvm 132 Apr  2 14:50 hosted-engine.metadata -> /var/run/vdsm/storage/a6cea67d-dbfb-45cf-a775-b4d0d47b26f2/2161aed0-7250-4c1d-b667-ac94f60af17e/6b818e33-f80a-48cc-a59c-bba641e027d4

4. The services exist, but all seem to have some sort of warning:

a) Apr 08 18:10:55 ovirt-node-01.phoelex.com sanlock[1728]: 2020-04-08 18:10:55 1744152 [36796]: s16 delta_renew long write time 10 sec

b) Mar 23 18:02:59 ovirt-node-01.phoelex.com supervdsmd[29409]: failed to load module nvdimm: libbd_nvdimm.so.2: cannot open shared object file: No such file or directory

c) Apr 09 08:05:13 ovirt-node-01.phoelex.com vdsm[4801]: ERROR failed to retrieve Hosted Engine HA score '[Errno 2] No such file or directory' Is the Hosted Engine setup finished?

d) Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: warning : qemuGetProcessInfo:1404 : cannot parse process status data
Apr 08 22:48:27 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 22:48:27.134+0000: 29309: error : virNetDevTapInterfaceStats:764 : internal error: /proc/net/dev: Interface not found
Apr 08 23:09:39 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-08 23:09:39.844+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error
Apr 09 01:05:26 ovirt-node-01.phoelex.com libvirtd[29307]: 2020-04-09 01:05:26.660+0000: 29307: error : virNetSocketReadWire:1806 : End of file while reading data: Input/output error

5 & 6. The broker log is continually printing this error:
MainThread::INFO::2020-04-09 08:07:31,438::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::55::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Running broker
MainThread::DEBUG::2020-04-09 08:07:31,438::broker::120::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_monitor) Starting monitor
MainThread::INFO::2020-04-09 08:07:31,438::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-09 08:07:31,439::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,440::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,441::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-09 08:07:31,442::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-09 08:07:31,443::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-09 08:07:31,444::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-09 08:07:31,444::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::DEBUG::2020-04-09 08:07:31,444::broker::128::ovirt_hosted_engine_ha.broker.broker.Broker::(_get_storage_broker) Starting storage broker
MainThread::DEBUG::2020-04-09 08:07:31,444::storage_backends::369::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting to VDSM
MainThread::DEBUG::2020-04-09 08:07:31,444::util::384::ovirt_hosted_engine_ha.lib.storage_backends::(__log_debug) Creating a new json-rpc connection to VDSM
Client localhost:54321::DEBUG::2020-04-09 08:07:31,453::concurrent::258::root::(run) START thread <Thread(Client localhost:54321, started daemon 139992488138496)> (func=<bound method Reactor.process_requests of <yajsonrpc.betterAsyncore.Reactor object at 0x7f528acabc90>>, args=(), kwargs={})
Client localhost:54321::DEBUG::2020-04-09 08:07:31,459::stompclient::138::yajsonrpc.protocols.stomp.AsyncClient::(_process_connected) Stomp connection established
MainThread::DEBUG::2020-04-09 08:07:31,467::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::INFO::2020-04-09 08:07:31,530::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-09 08:07:31,531::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:31,531::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:31,534::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,199::storage_server::158::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(_validate_pre_connected_path) Storage domain a6cea67d-dbfb-45cf-a775-b4d0d47b26f2 is not available
MainThread::INFO::2020-04-09 08:07:32,199::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::DEBUG::2020-04-09 08:07:32,199::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:32,814::storage_server::363::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) [{u'status': 0, u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}]
MainThread::INFO::2020-04-09 08:07:32,814::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::DEBUG::2020-04-09 08:07:32,815::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,129::storage_server::420::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Error refreshing storage domain: Command StorageDomain.getStats with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::DEBUG::2020-04-09 08:07:33,130::stompclient::294::jsonrpc.AsyncoreClient::(send) Sending response
MainThread::DEBUG::2020-04-09 08:07:33,795::storage_backends::208::ovirt_hosted_engine_ha.lib.storage_backends::(_get_sector_size) Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::WARNING::2020-04-09 08:07:33,795::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))

The UUID it is moaning about is indeed the one that the HA sits on and is the one I listed the contents of in step 2 above.

So why can't it see this domain?

Thanks, Shareef.

On Thu, Apr 9, 2020 at 6:12 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:

On April 9, 2020 1:51:05 AM GMT+03:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Don't know if this is useful or not, but I just tried to shut down and start another VM on one of the hosts and get the following error:

virsh # start scratch
error: Failed to start domain scratch
error: Network not found: no network with matching name 'vdsm-ovirtmgmt'

Is this not referring to the interface name, as the network is called 'ovirtmgmt'?

On Wed, Apr 8, 2020 at 11:35 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Hmmm, virsh tells me the HE is running, but it hasn't come up and the agent.log is full of the same errors.

On Wed, Apr 8, 2020 at 11:31 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Ah hah! OK, so I've managed to start it using virsh on the second host, but my first host is still dead.

First of all, what are these 56,317 .prob- files that get dumped to the NFS mounts?

Secondly, why doesn't the node mount the NFS directories at boot? Is that the issue with this particular node?

On Wed, Apr 8, 2020 at 11:12 PM <eevans@digitaldatatechs.com> wrote:
Did you try virsh list --inactive

Eric Evans
Digital Data Services LLC.
304.660.9080

From: Shareef Jalloq <shareef@jalloq.co.uk>
Sent: Wednesday, April 8, 2020 5:58 PM
To: Strahil Nikolov <hunter86_bg@yahoo.com>
Cc: Ovirt Users <users@ovirt.org>
Subject: [ovirt-users] Re: ovirt-engine unresponsive - how to rescue?

I've now shut down the VMs on one host and rebooted it, but the agent service doesn't start. If I run 'hosted-engine --vm-status' I get:

The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.

and indeed, if I list the mounts under /rhev/data-center/mnt, only one of the directories is mounted. I have 3 NFS mounts, one ISO Domain and two Data Domains. Only one Data Domain has mounted, and this has lots of .prob files in it. So why haven't the other NFS exports been mounted?

Manually mounting them doesn't seem to have helped much either. I can start the broker service, but the agent service says no. Same error as the one in my last email.

Shareef.

On Wed, Apr 8, 2020 at 9:57 PM Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Right, still down. I've run virsh and it doesn't know anything about the engine vm.

I've restarted the broker and agent services and I still get nothing in virsh->list.

In the logs under /var/log/ovirt-hosted-engine-ha I see lots of errors:

broker.log:
MainThread::INFO::2020-04-08 20:56:20,138::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:20,138::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-04-08 20:56:20,138::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,140::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,141::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-04-08 20:56:20,142::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-04-08 20:56:20,143::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-04-08 20:56:20,143::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-04-08 20:56:20,197::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-04-08 20:56:20,197::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,414::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-04-08 20:56:20,628::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-04-08 20:56:21,057::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Command StorageDomain.getInfo with args {'storagedomainID': 'a6cea67d-dbfb-45cf-a775-b4d0d47b26f2'} failed: (code=350, message=Error in storage domain action: (u'sdUUID=a6cea67d-dbfb-45cf-a775-b4d0d47b26f2',))
MainThread::INFO::2020-04-08 20:56:21,901::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-04-08 20:56:21,901::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors

agent.log:

MainThread::ERROR::2020-04-08 20:57:00,799::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:00,799::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2020-04-08 20:57:11,144::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.3.6 started
MainThread::INFO::2020-04-08 20:57:11,182::hosted_engine::234::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Found certificate common name: ovirt-node-01.phoelex.com
MainThread::INFO::2020-04-08 20:57:11,294::hosted_engine::543::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
MainThread::INFO::2020-04-08 20:57:11,296::brokerlink::80::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}
MainThread::ERROR::2020-04-08 20:57:11,296::hosted_engine::559::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2020-04-08 20:57:11,297::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 432, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 556, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 89, in start_monitor
    ).format(t=type, o=options, e=e)
RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'tcp_t_address': '', 'network_test': 'dns', 'tcp_t_port': '', 'addr': '192.168.1.99'}]
MainThread::ERROR::2020-04-08 20:57:11,297::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2020-04-08 20:57:11,297::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

On April 8, 2020 7:47:20 PM GMT+03:00, "Maton, Brett" <matonb@ltresources.co.uk> wrote:
On the host you tried to restart the engine on:

Add an alias to virsh (authenticates with virsh_auth.conf):

alias virsh='virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf'

Then run virsh:

virsh
virsh # list
 Id   Name           State
----------------------------------------------------
 xx   HostedEngine   Paused
 xx   **********     running
 ...
 xx   **********     running

HostedEngine should be in the list; try and resume the engine:

virsh # resume HostedEngine

On Wed, 8 Apr 2020 at 17:28, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
Thanks!

The status hangs due to, I guess, the VM being down....

[root@ovirt-node-01 ~]# hosted-engine --vm-start
VM exists and is down, cleaning up and restarting
VM in WaitForLaunch

but this doesn't seem to do anything. OK, after a while I get a status of it being barfed...

--== Host ovirt-node-00.phoelex.com (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt-node-01.phoelex.com
Host ID                            : 2
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score                              : 0
stopped                            : False
Local maintenance                  : False
crc32                              : 5045f2eb
local_conf_timestamp               : 1737037
Host timestamp                     : 1737283
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1737283 (Wed Apr 8 16:16:17 2020)
    host-id=2
    score=0
    vm_conf_refresh_time=1737037 (Wed Apr 8 16:12:11 2020)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False

On Wed, Apr 8, 2020 at 5:09 PM Maton, Brett <matonb@ltresources.co.uk> wrote:
First steps, on one of your hosts as root:

To get information:
hosted-engine --vm-status

To start the engine:
hosted-engine --vm-start

On Wed, 8 Apr 2020 at 17:00, Shareef Jalloq <shareef@jalloq.co.uk> wrote:
So my engine has gone down and I can't ssh into it either. If I try to log into the web-ui of the node it is running on, I get redirected because the node can't reach the engine.

What are my next steps?

Shareef.
On Wed, Apr 8, 2020 at 6:10 PM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
This has to be resolved:

Engine status : unknown stale-data

Run 'hosted-engine --vm-status' again. If it remains the same, restart ovirt-ha-broker.service & ovirt-ha-agent.service.

Verify that the engine's storage is available. Then monitor the broker & agent logs in /var/log/ovirt-hosted-engine-ha.

Best Regards,
Strahil Nikolov

Hi Shareef,

The flow of activation in oVirt is more complex than on a plain KVM host. Mounting of the domains happens during the activation of the node (the HostedEngine is activating everything needed).

Focus on the HostedEngine VM. Is it running properly?

If not, try:
1. Verify that the storage domain exists
2. Check if it has an 'ha_agent' directory
3. Check if the links are OK; if not, you can safely remove the links
4. Next, check the services are running:
A) sanlock
B) supervdsmd
C) vdsmd
D) libvirtd
5. Increase the log level for the broker and agent services:

cd /etc/ovirt-hosted-engine-ha
vim *-log.conf
systemctl restart ovirt-ha-broker ovirt-ha-agent

6. Check what they are complaining about. Keep in mind that the agent will keep throwing errors until the broker stops doing so (the agent depends on the broker), so the broker must be OK before proceeding with the agent log.

About the manual VM start, you need 2 things:

1. Define the VM network:

# cat vdsm-ovirtmgmt.xml
<network>
  <name>vdsm-ovirtmgmt</name>
  <uuid>8ded486e-e681-4754-af4b-5737c2b05405</uuid>
  <forward mode='bridge'/>
  <bridge name='ovirtmgmt'/>
</network>

[root@ovirt1 HostedEngine-RECOVERY]# virsh net-define vdsm-ovirtmgmt.xml
[root@ovirt1 HostedEngine-RECOVERY]# virsh net-start vdsm-ovirtmgmt

2. Get an xml definition, which can be found in the vdsm log. Every VM at start-up has its configuration printed out in the vdsm log on the host it starts on. Save it to a file and then:
A) virsh define myvm.xml
B) virsh start myvm

It seems there is/was a problem with your NFS shares.

Best Regards,
Strahil Nikolov

Hey Shareef,

Check if there are any files or folders not owned by vdsm:kvm. Something like this:

find . -not -user 36 -not -group 36 -print

Also check if vdsm can access the images in the '<vol-mount-point>/images' directories.

Best Regards,
Strahil Nikolov

And the IPv6 address '64:ff9b::c0a8:13d'? I don't see it in the log output.

Best Regards,
Strahil Nikolov

Based on your output, you got a PTR record for both IPv4 & IPv6 ... most probably that's the reason.

Set the IPv6 on the interface and try again.

Best Regards,
Strahil Nikolov
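For context on that address: 64:ff9b::/96 is the well-known NAT64 prefix, and the trailing c0a8:13d is simply 192.168.1.61 in hex, so the AAAA answer the installer saw was most likely synthesized by a DNS64 resolver rather than configured anywhere. A minimal sketch of how to see what the validator compares (assuming ovirtmgmt is the management device, as in the 'ip addr' output above):

dig +short A ovirt-node-00.phoelex.com      # expect 192.168.1.61
dig +short AAAA ovirt-node-00.phoelex.com   # a 64:ff9b::... answer points at DNS64 synthesis
ip -6 addr show dev ovirtmgmt               # the check fails because no interface carries that IPv6 address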
Do you have firewalld up and running on the host?
Best Regards, Strahil Nikolov
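That can be checked directly on the node; a quick sketch using standard firewalld commands:

systemctl is-active firewalld     # is the daemon running at all?
firewall-cmd --state              # should print "running"
firewall-cmd --get-active-zones   # this is exactly what the failing ansible task greps
firewall-cmd --list-all           # shows the zone's interfaces and open ports (22, 6801, 6900 above)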
I am guessing, but your interface is not assigned to any zone, right? Just add the interface to the default zone (usually 'public').
Best Regards, Strahil Nikolov
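In practice that looks something like the following sketch ('public' as the default zone and ovirtmgmt as the management device are assumptions; on a NetworkManager-managed connection the zone is usually set via nmcli instead):

firewall-cmd --permanent --zone=public --change-interface=ovirtmgmt
firewall-cmd --reload
firewall-cmd --get-active-zones   # should now list the zone with its interface, so the deploy-time grep succeeds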
Keep in mind that there are a lot of playbooks that can be used to deploy a HostedEngine environment via Ansible.
Keep in mind that if you plan to use oVirt in production, you need to know how to debug it (at least at a basic level).
Best Regards, Strahil Nikolov
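For what it's worth, the role the installer itself drives is visible in the failing task name above (ovirt.hosted_engine_setup) and is published on Ansible Galaxy, so a redeploy can also be scripted from your own playbook; a sketch, where my-he-deploy.yml is a hypothetical playbook wrapping that role:

ansible-galaxy install ovirt.hosted_engine_setup
ansible-playbook -i ovirt-node-00.phoelex.com, my-he-deploy.yml   # trailing comma = inline inventory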
It's really interesting that you mention that topic. The only ways I managed to break my engine were:

A) a bad SELinux rpm, which was solved via a reinstall of the package and a relabel
B) an interrupted patch, as I forgot to use screen

I think it is production ready, but it requires knowledge, as it is not as dummy-proof as VMware. Yet oVirt is way more flexible, allowing you to run your own scripts before/during/after a certain event (vdsm hooks).

Sadly, Ansible (which is what is used for the setup of gluster -> gdeploy, and for the engine) is quite dynamic, and sometimes something might break.

If you feel that oVirt breaks too often, just set your engine up on a separate physical or virtual (non-hosted) machine, but do not complain that a free open-source product is not production ready just because you don't know how to debug it. You can trial the downstream solutions from Red Hat & Oracle and you will notice the difference.

For me, oVirt is like Fedora compared to RHEL/OEL/CentOS, but this is just a personal opinion.

Best Regards,
Strahil Nikolov
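As an illustration of the vdsm hooks mentioned above, here is a minimal sketch (the hook directory layout and the _hook_domxml variable follow vdsm's hook convention, but verify against your version): any executable dropped into /usr/libexec/vdsm/hooks/before_vm_start/ runs before each VM starts.

#!/bin/sh
# /usr/libexec/vdsm/hooks/before_vm_start/50_log_start  (hypothetical name)
# Logs each VM start; the domain XML could also be edited in place
# via the file that $_hook_domxml points to.
logger "vdsm hook: about to start a VM, domxml at ${_hook_domxml}"
exit 0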