[ovirt-users] Testing self hosted engine in 3.6: hostname not resolved error

Fri Oct 23 14:42:38 UTC 2015

On Fri, Oct 23, 2015 at 3:57 PM, Gianluca Cecchi <gianluca.cecchi at gmail.com>
wrote:

> On Thu, Oct 22, 2015 at 5:38 PM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>>
>>> In case I want to setup a single host with self hosted engine, could I
>>> configure on hypervisor
>>> a) one NFS share for sh engine
>>> b) one NFS share for ISO DOMAIN
>>> c) a local filesystem, used to then create a local POSIX compliant
>>> FS storage domain
>>> and work this way as a replacement of all-in-one?
>>>
>>
>> Yes, but c is just a workaround; using another external NFS share would
>> help a lot if in the future you plan to add or migrate to a new server.
>>
>
> Why do you see this as a workaround, if I plan to have this for example as
> a personal devel infra with no other hypervisors?
> I am thinking of the better performance of going directly local instead of
> adding the overhead of the host mounting NFS from itself...
>

Just because you would be using as shared storage something that is not
really shared.

>
>>>> Put the host in global maintenance (otherwise the engine VM will be
>>>> restarted)
>>>> Shutdown the engine VM
>>>> Shutdown the host
>>>>
>>>>
>>>
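
For reference, the shutdown steps quoted above could be scripted roughly as
follows. This is only a sketch: --set-maintenance, --vm-shutdown and
--vm-status are hosted-engine options, but the function names and the
HOSTED_ENGINE and SHUTDOWN variables are made up for this example so that the
commands can be printed instead of executed.

```shell
#!/bin/sh
# Sketch only. HOSTED_ENGINE and SHUTDOWN are variables of this example,
# not oVirt settings; set them to "echo hosted-engine" / "echo shutdown"
# for a dry run that only prints the commands.
HOSTED_ENGINE="${HOSTED_ENGINE:-hosted-engine}"
SHUTDOWN="${SHUTDOWN:-shutdown}"

stop_engine_vm() {
    # 1. Global maintenance, so the HA agent does not restart the engine VM
    $HOSTED_ENGINE --set-maintenance --mode=global || return 1
    # 2. Ask the engine VM to shut down
    $HOSTED_ENGINE --vm-shutdown
}

power_off_host() {
    # 3. Power off the hypervisor
    $SHUTDOWN -h now
}

# On the host, as root:
#   stop_engine_vm
#   hosted-engine --vm-status    # repeat until the engine VM is down
#   power_off_host
```

The two functions are kept separate on purpose: the host should only be
powered off once the engine VM is really down.
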
> Please note that at some point I had to power off the hypervisor in the
> previous step, because it was stalled trying to stop two processes:
> "Watchdog Multiplexing Daemon"
> and
> "Shared Storage Lease Manager"
>
> https://drive.google.com/file/d/0BwoPbcrMv8mvTVoyNzhRNGpqN1U/view?usp=sharing
>
> It was apparently able to stop the "Watchdog Multiplexing Daemon" after
> some minutes
>
> https://drive.google.com/file/d/0BwoPbcrMv8mvZExNNkw5LVBiXzA/view?usp=sharing
>
> But there was no way to stop the Shared Storage Lease Manager; the screen
> above is from when I forced a power off yesterday, after setting global
> maintenance and cleanly shutting down the sh engine, when the shutdown of
> the hypervisor stalled.
>
>
>
>
>
>>
>>>>>
>>> OK. And to start everything again, is this correct:
>>>
>>> a) power on hypervisor
>>> b) hosted-engine --set-maintenance --mode=none
>>>
>>> Are other steps required?
>>>
>>>
>> No, that's correct
>>
>
>
> Today, after powering on the hypervisor and waiting about 6 minutes, I ran:
>
> [root@ovc71 ~]# ps -ef|grep qemu
> root      2104  1985  0 15:41 pts/0    00:00:00 grep --color=auto qemu
>
> --> as expected no VM in execution
>
> [root@ovc71 ~]# systemctl status vdsmd
> vdsmd.service - Virtual Desktop Server Manager
>    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
>    Active: active (running) since Fri 2015-10-23 15:34:46 CEST; 3min 25s
> ago
>   Process: 1666 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh
> --pre-start (code=exited, status=0/SUCCESS)
>  Main PID: 1745 (vdsm)
>    CGroup: /system.slice/vdsmd.service
>            ├─1745 /usr/bin/python /usr/share/vdsm/vdsm
>            └─1900 /usr/libexec/ioprocess --read-pipe-fd 56 --write-pipe-fd
> 55 --max-threads 10 --...
>
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client
> step 1
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5
> ask_user_info()
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client
> step 1
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5
> ask_user_info()
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5
> make_client_response()
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client
> step 2
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5
> parse_server_challenge()
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5
> ask_user_info()
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5
> make_client_response()
> Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client
> step 3
>
> --> I think it is expected that vdsmd starts anyway, even in global
> maintenance; is that correct?
>
> But then:
>
> [root@ovc71 ~]# hosted-engine --set-maintenance --mode=none
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py",
> line 73, in <module>
>     if not maintenance.set_mode(sys.argv[1]):
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py",
> line 61, in set_mode
>     value=m_global,
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
> line 259, in set_maintenance_mode
>     str(value))
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
> line 201, in set_global_md_flag
>     with broker.connection(self._retries, self._wait):
>   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
>     return self.gen.next()
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> line 99, in connection
>     self.connect(retries, wait)
>   File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
> line 78, in connect
>     raise BrokerConnectionError(error_msg)
> ovirt_hosted_engine_ha.lib.exceptions.BrokerConnectionError: Failed to
> connect to broker, the number of errors has exceeded the limit (1)
>
> What to do next?
>

Are ovirt-ha-agent and ovirt-ha-broker up and running?
Can you please try to restart them via systemd?
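
Something along these lines, as a minimal sketch. The service names and the
hosted-engine command are the ones already mentioned in this thread;
restarting the broker before the agent is an assumption of this sketch (the
traceback shows the client failing to reach the broker), and the function
names and the SYSTEMCTL and HOSTED_ENGINE variables are made up here so that
the commands can be printed instead of executed.

```shell
#!/bin/sh
# Sketch only. SYSTEMCTL and HOSTED_ENGINE are variables of this example,
# not oVirt settings; set them to "echo systemctl" / "echo hosted-engine"
# for a dry run that only prints the commands.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
HOSTED_ENGINE="${HOSTED_ENGINE:-hosted-engine}"

# Restart the HA services (broker first: an assumption of this sketch).
restart_ha_services() {
    for svc in ovirt-ha-broker ovirt-ha-agent; do
        $SYSTEMCTL restart "$svc" || return 1
    done
}

# Retry the command that raised BrokerConnectionError.
leave_global_maintenance() {
    $HOSTED_ENGINE --set-maintenance --mode=none
}

# On the host, as root:
#   systemctl status ovirt-ha-broker ovirt-ha-agent
#   restart_ha_services && leave_global_maintenance
```

If either service fails to restart, its status output and journal are the
next thing to look at before retrying the maintenance command.
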