[ovirt-users] Unable to get HE up after update

Simone Tiraboschi stirabos at redhat.com
Fri Oct 21 10:35:43 UTC 2016


On Fri, Oct 21, 2016 at 11:10 AM, Susinthiran Sithamparanathan <
chesusin at gmail.com> wrote:

> Hi,
>
> I ran that command from the engine (its hostname is now changed to
> susin.myftp.org -> 192.168.0.101) and got:
> [root@susin ~]# cat < /dev/tcp/susin/54321
> -bash: connect: Connection refused
> -bash: /dev/tcp/susin/54321: Connection refused
> [root@susin ~]# cat < /dev/tcp/susin.myftp.org/54321
> -bash: connect: Connection refused
> -bash: /dev/tcp/susin.myftp.org/54321: Connection refused
> [root@susin ~]# cat < /dev/tcp/192.168.0.101/54321
> -bash: connect: Connection refused
> -bash: /dev/tcp/192.168.0.101/54321: Connection refused
> [root@susin ~]#
>

The engine should be able to reach vdsm:
please fix this before touching anything else.
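
As a rough sketch of that check (the function name check_port and the
5 second timeout are only illustrative assumptions, nothing oVirt ships),
something like the following can be run from the engine VM. It opens a TCP
connection to the vdsm port 54321 used in the command above and reports
success or failure through its exit status. Note that the engine log quoted
below shows the engine connecting to ovirt01/192.168.0.100, so that is
presumably the address to test, rather than the engine's own name.

```shell
#!/bin/bash
# check_port HOST PORT
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# The name check_port and the 5 second timeout are illustrative assumptions.
check_port() {
    # The connect runs in a child bash so that /dev/tcp is available even if
    # the calling shell is plain sh. Host and port are passed as positional
    # parameters ($1, $2) to avoid quoting problems.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example usage (host name and address taken from the quoted engine log):
#
#   for target in ovirt01 192.168.0.100; do
#       if check_port "$target" 54321; then
#           echo "$target:54321 reachable"
#       else
#           echo "$target:54321 NOT reachable"
#       fi
#   done
```

A refused or timed-out connection only says that nothing answered on that
address and port; it does not say whether the cause is name resolution,
a firewall, or vdsm not running on the host.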


>
> Both the host and the engine are behind a NAT, and I've configured
> /etc/hosts correctly, so the names ping from both the engine and the host.
> The hostname of the engine is susin.myftp.org, so dig or host resolves it
> to my public IP, and pinging resolves correctly.
>
> But now I came across
> http://www.ovirt.org/documentation/how-to/networking/changing-engine-hostname/
> since I actually changed the hostname of the engine to be able to log in
> through the web UI.
> Especially the following "The bigger concern is with the engine's
> certificate. Currently, to the best of our knowledge, there is no component
> that actually checks this trust. But it's possible, that in some future
> version of one of the relevant tools - vdsm, libvirt, etc. - such a check
> will actually be made, and even prevent connections. If this happens, the
> engine might not be able to connect to the hosts, and the worst case is
> that they will have to be reinstalled, thus loosing all the configuration
> and data accumulated by then."
>
> tail -f /var/log/ovirt-engine/engine.log
> 2016-10-21 11:05:16,888 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
> (DefaultQuartzScheduler1) [] Command 'GetAllVmStatsVDSCommand(HostName =
> hosted_engine_1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
> hostId='826a8da5-74c1-4002-ab7b-e6e32be94fe6', vds='Host[hosted_engine_1,
> 826a8da5-74c1-4002-ab7b-e6e32be94fe6]'})' execution failed:
> VDSGenericException: VDSNetworkException: Vds timeout occured
> 2016-10-21 11:05:16,888 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher]
> (DefaultQuartzScheduler1) [] Failed to fetch vms info for host
> 'hosted_engine_1' - skipping VMs monitoring.
> 2016-10-21 11:05:16,918 ERROR [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler4) []
> Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM
> hosted_engine_1 command failed: Message timeout which can be caused by
> communication issues
> 2016-10-21 11:05:16,918 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler4) [] Command 'org.ovirt.engine.core.
> vdsbroker.vdsbroker.GetCapabilitiesVDSCommand' return value
> 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturnForXmlRpc@
> 1f2e4065'
> 2016-10-21 11:05:16,918 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler4) [] HostName = hosted_engine_1
> 2016-10-21 11:05:16,919 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler4) [] Command 'GetCapabilitiesVDSCommand(HostName
> = hosted_engine_1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
> hostId='826a8da5-74c1-4002-ab7b-e6e32be94fe6', vds='Host[hosted_engine_1,
> 826a8da5-74c1-4002-ab7b-e6e32be94fe6]'})' execution failed:
> VDSGenericException: VDSNetworkException: Message timeout which can be
> caused by communication issues
> 2016-10-21 11:05:16,919 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
> (DefaultQuartzScheduler4) [] Failure to refresh Vds runtime info:
> VDSGenericException: VDSNetworkException: Message timeout which can be
> caused by communication issues
> 2016-10-21 11:05:16,919 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
> (DefaultQuartzScheduler4) [] Exception: org.ovirt.engine.core.
> vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException:
> VDSNetworkException: Message timeout which can be caused by communication
> issues
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.
> proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.
> GetCapabilitiesVDSCommand.executeVdsBrokerCommand(
> GetCapabilitiesVDSCommand.java:16) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.
> executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.VDSCommandBase.
> executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
>     at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
> [dal.jar:]
>     at org.ovirt.engine.core.vdsbroker.ResourceManager.
> runVdsCommand(ResourceManager.java:451) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:653)
> [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.
> refreshVdsRunTimeInfo(HostMonitoring.java:121) [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refresh(HostMonitoring.java:85)
> [vdsbroker.jar:]
>     at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:238)
> [vdsbroker.jar:]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [rt.jar:1.8.0_102]
>     at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_102]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
>     at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
>     at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77)
> [scheduler.jar:]
>     at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51)
> [scheduler.jar:]
>     at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [rt.jar:1.8.0_102]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [rt.jar:1.8.0_102]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [rt.jar:1.8.0_102]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [rt.jar:1.8.0_102]
>     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_102]
>
> 2016-10-21 11:05:16,921 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager]
> (DefaultQuartzScheduler4) [] Failed to refresh VDS, network error,
> continuing, vds='hosted_engine_1'(826a8da5-74c1-4002-ab7b-e6e32be94fe6):
> VDSGenericException: VDSNetworkException: Message timeout which can be
> caused by communication issues
> 2016-10-21 11:05:16,921 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager]
> (org.ovirt.thread.pool-8-thread-1) [] Host 'hosted_engine_1' is not
> responding.
> 2016-10-21 11:05:16,993 WARN  [org.ovirt.engine.core.dal.
> dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-1)
> [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message:
> Host hosted_engine_1 is not responding. Host cannot be fenced automatically
> because power management for the host is disabled.
> 2016-10-21 11:05:19,943 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient]
> (SSL Stomp Reactor) [] Connecting to ovirt01/192.168.0.100
> 2016-10-21 11:05:19,960 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor]
> (SSL Stomp Reactor) [] Unable to process messages: General SSLEngine problem
> ..
> 2016-10-21 11:08:40,871 WARN  [org.ovirt.engine.core.bll.pm.
> VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-8-thread-6)
> [6d681722] Validation of action 'VdsNotRespondingTreatment' failed for user
> SYSTEM. Reasons: VAR__ACTION__RESTART,POWER_MANAGEMENT_ACTION_ON_ENTITY_
> ALREADY_IN_PROGRESS
> 2016-10-21 11:08:41,023 INFO  [org.ovirt.engine.core.bll.pm.
> VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-8-thread-5)
> [193e2e22] Running command: VdsNotRespondingTreatmentCommand internal:
> true. Entities affected :  ID: 826a8da5-74c1-4002-ab7b-e6e32be94fe6 Type:
> VDS
> 2016-10-21 11:08:41,088 INFO  [org.ovirt.engine.core.bll.pm.SshSoftFencingCommand]
> (org.ovirt.thread.pool-8-thread-5) [193e2e22] Running command:
> SshSoftFencingCommand internal: true. Entities affected :  ID:
> 826a8da5-74c1-4002-ab7b-e6e32be94fe6 Type: VDS
> 2016-10-21 11:08:41,116 INFO  [org.ovirt.engine.core.bll.pm.SshSoftFencingCommand]
> (org.ovirt.thread.pool-8-thread-5) [193e2e22] Opening SSH Soft Fencing
> session on host 'ovirt01'
> 2016-10-21 11:08:41,470 ERROR [org.ovirt.engine.core.bll.pm.SshSoftFencingCommand]
> (org.ovirt.thread.pool-8-thread-5) [193e2e22] SSH Soft Fencing command
> failed on host 'ovirt01': SSH authentication to 'root@ovirt01' failed.
> Please verify provided credentials. Make sure key is authorized at host
> Stdout:
> Stderr:
> 2016-10-21 11:08:41,483 INFO  [org.ovirt.engine.core.bll.pm.SshSoftFencingCommand]
> (org.ovirt.thread.pool-8-thread-5) [193e2e22] Lock freed to object
> 'EngineLock:{exclusiveLocks='[826a8da5-74c1-4002-ab7b-e6e32be94fe6=<VDS_FENCE,
> POWER_MANAGEMENT_ACTION_ON_ENTITY_ALREADY_IN_PROGRESS>]',
> sharedLocks='null'}'
> 2016-10-21 11:08:41,545 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
> (org.ovirt.thread.pool-8-thread-5) [193e2e22] Trying to release exclusive
> lock which does not exist, lock key: '826a8da5-74c1-4002-ab7b-
> e6e32be94fe6VDS_FENCE'
> 2016-10-21 11:08:41,547 INFO  [org.ovirt.engine.core.bll.pm.
> VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-8-thread-5)
> [193e2e22] Lock freed to object 'EngineLock:{exclusiveLocks='[
> 826a8da5-74c1-4002-ab7b-e6e32be94fe6=<VDS_FENCE,
> POWER_MANAGEMENT_ACTION_ON_ENTITY_ALREADY_IN_PROGRESS>]',
> sharedLocks='null'}'
> 2016-10-21 11:08:43,503 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient]
> (SSL Stomp Reactor) [] Connecting to ovirt01/192.168.0.100
> 2016-10-21 11:08:43,517 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor]
> (SSL Stomp Reactor) [] Unable to process messages: General SSLEngine problem
> 2016-10-21 11:08:55,446 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient]
> (SSL Stomp Reactor) [] Connecting to ovirt01/192.168.0.100
> 2016-10-21 11:08:55,461 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor]
> (SSL Stomp Reactor) [] Unable to process messages: General SSLEngine problem
>
>
>
> Since the engine is complaining about an SSL communication error, I suspect
> the problem is there.
>
> Is there still any way to save my VMs, or do I have to reinstall?
>
>
>
>
>
> On Thu, Oct 20, 2016 at 6:27 PM, Simone Tiraboschi <stirabos at redhat.com>
> wrote:
>
>>
>>
>> On Thu, Oct 20, 2016 at 6:16 PM, Susinthiran Sithamparanathan <
>> chesusin at gmail.com> wrote:
>>
>>> Hi,
>>> I'm still unable to get my system up with my VMs. Inside the web UI I can
>>> see a warning at the bottom: Host hosted_engine_1 is non responsive.
>>> When I try to activate the master data domain NFS01, I get: Error while
>>> executing action: Cannot activate Storage. There is no active Host in the
>>> Data Center.
>>> The Data Center tab shows "VDSM hosted_engine_1 command failed: Message
>>> timeout which can be caused by communication issues" at the bottom.
>>> I can't see why the host isn't active in the engine-vm.
>>> Any help appreciated. Thanks.
>>>
>>>
>> Can you please try
>>  cat < /dev/tcp/<yourhostaddress>/54321
>> from the engine VM?
>> If it's not able to connect, please check name resolution, addressing and
>> so on.
>>
>>
>>>
>>>
>>>
>>> On Mon, Oct 17, 2016 at 5:00 PM, Susinthiran Sithamparanathan <
>>> chesusin at gmail.com> wrote:
>>>
>>>> Now, after a long time, I got the login prompt.
>>>> What I see is that things are still down and I am unable to activate
>>>> anything. I see
>>>> [image: Inline image 1]
>>>> This host is in non responding state. Try to Activate it; If the
>>>> problem persists, switch Host to Maintenance mode and try to reinstall it.
>>>>
>>>>
>>>> On Mon, Oct 17, 2016 at 4:51 PM, Susinthiran Sithamparanathan <
>>>> chesusin at gmail.com> wrote:
>>>>
>>>>> Thanks. Savior at
>>>>> https://www.mail-archive.com/users@ovirt.org/msg33874.html
>>>>> When I logged into the web UI, I couldn't bring up storage,
>>>>> datacenter, or cluster; everything was down.
>>>>> I restarted the host, and now when I enter the admin portal, it spins
>>>>> forever. There seem to be some SSL communication issues:
>>>>> https://paste.fedoraproject.org/453944/76715737/
>>>>> Any hints are appreciated!
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 17, 2016 at 4:15 PM, Simone Tiraboschi <
>>>>> stirabos at redhat.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 17, 2016 at 3:48 PM, Susinthiran Sithamparanathan <
>>>>>> chesusin at gmail.com> wrote:
>>>>>>
>>>>>>> Got the engine up finally :)
>>>>>>> But now I'm met with "The client is not authorized to request an
>>>>>>> authorization. It's required to access the system using FQDN." It worked
>>>>>>> fine prior to the upgrade!
>>>>>>>
>>>>>>
>>>>>> This is a new feature of 4.0; you can no longer log in with the IP
>>>>>> address if the cert has been signed for an FQDN.
>>>>>>
>>>>>>
>>>>>>> Found https://bugzilla.redhat.com/show_bug.cgi?id=1351217, but I'm not
>>>>>>> sure if I have FQDN case issues.
>>>>>>> Any idea how to fix this?
>>>>>>>
>>>>>>> On Mon, Oct 17, 2016 at 3:33 PM, Susinthiran Sithamparanathan <
>>>>>>> chesusin at gmail.com> wrote:
>>>>>>>
>>>>>>>> I analyzed the log and found that the problem was in the creation
>>>>>>>> of the certs with openssl (missing distinguished name in config), and
>>>>>>>> /etc/pki/ovirt-engine/{cacert,openssl}.conf were empty!
>>>>>>>> That led me to:
>>>>>>>> yum provides /etc/pki/ovirt-engine/{cacert,openssl}.conf
>>>>>>>> yum remove ovirt-engine-backend
>>>>>>>> yum install ovirt-engine-backend ovirt-engine \
>>>>>>>>   ovirt-engine-dashboard ovirt-engine-setup ovirt-engine-tools \
>>>>>>>>   ovirt-engine-userportal ovirt-engine-webadmin-portal ovirt-engine-restapi
>>>>>>>>
>>>>>>>> Now I was able to successfully run engine-setup and exit
>>>>>>>> maintenance mode on the host. Let's see how things unfold within 30 min.
>>>>>>>> Will keep you updated!
>>>>>>>>
>>>>>>>> On Mon, Oct 17, 2016 at 3:14 PM, Susinthiran Sithamparanathan <
>>>>>>>> chesusin at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I tried that and ended up with:
>>>>>>>>> https://paste.fedoraproject.org/453892/71003314/ :(
>>>>>>>>> Log ovirt-engine-setup-20161017150800-drmayj.log uploaded to
>>>>>>>>> https://my.owndrive.com/index.php/s/3Dcyho9bqo7oZs8?path=%2Fengine-vm
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 17, 2016 at 11:54 AM, Simone Tiraboschi <
>>>>>>>>> stirabos at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 17, 2016 at 9:55 AM, Susinthiran Sithamparanathan <
>>>>>>>>>> chesusin at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi guys,
>>>>>>>>>>> let me know if there is anything else you need for further
>>>>>>>>>>> debugging purposes.
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can you please try reinstalling all the oVirt rpms on the engine
>>>>>>>>>> VM and re-executing engine-setup there?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 13, 2016 at 7:53 PM, Susinthiran Sithamparanathan <
>>>>>>>>>>> chesusin at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 13, 2016 at 3:23 PM, Yedidyah Bar David <
>>>>>>>>>>>> didi at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> OK. Can you please attach the output of:
>>>>>>>>>>>>>
>>>>>>>>>>>>> grep MANUAL /etc/ovirt-engine/engine.conf.d/*.conf
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [root@ovirt01 ~]# grep MANUAL /etc/ovirt-engine/engine.conf.d/*.conf
>>>>>>>>>>>> [root@ovirt01 ~]# ssh 192.168.0.101
>>>>>>>>>>>> root@192.168.0.101's password:
>>>>>>>>>>>> Last login: Thu Oct 13 19:50:02 2016 from ovirt01
>>>>>>>>>>>> [root@engine ~]# grep MANUAL /etc/ovirt-engine/engine.conf.d/*.conf
>>>>>>>>>>>> [root@engine ~]#
>>>>>>>>>>>>
>>>>>>>>>>>> I.e., grep found nothing for that search on either the host or the
>>>>>>>>>>>> engine-vm.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Susinthiran Sithamparanathan