On Fri, Oct 21, 2016 at 11:10 AM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
Hi,

I ran that command from the engine (its hostname is now changed to susin.myftp.org -> 192.168.0.101) and got:
[root@susin ~]# cat < /dev/tcp/susin/54321
-bash: connect: Connection refused
-bash: /dev/tcp/susin/54321: Connection refused
[root@susin ~]# cat < /dev/tcp/susin.myftp.org/54321
-bash: connect: Connection refused
-bash: /dev/tcp/susin.myftp.org/54321: Connection refused
[root@susin ~]# cat < /dev/tcp/192.168.0.101/54321
-bash: connect: Connection refused
-bash: /dev/tcp/192.168.0.101/54321: Connection refused
[root@susin ~]#

The engine should be able to reach vdsm; please fix this before touching anything else.
 

Both the host and the engine are behind a NAT, and I've configured /etc/hosts so that the engine and the host can ping each other by name. The engine's hostname is susin.myftp.org, so dig or host resolves it to my public IP, and ping resolves it correctly.
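The difference described above (dig and host query DNS directly and skip /etc/hosts, while ping goes through the system resolver) can be checked side by side. The following is only a minimal sketch; the function names resolve_nss, resolve_dns and compare_resolution are made up for illustration, and it assumes getent is available and that dig may or may not be installed.

```shell
# Resolve a name the way most applications do (NSS, which typically
# consults /etc/hosts before DNS, depending on /etc/nsswitch.conf).
resolve_nss() {
    getent hosts "$1" | awk '{ print $1 }'
}

# Resolve a name through DNS only; dig never reads /etc/hosts.
resolve_dns() {
    if command -v dig >/dev/null 2>&1; then
        dig +short "$1"
    else
        echo "dig is not installed" >&2
        return 1
    fi
}

# Print both answers so a mismatch (private vs. public IP) is obvious.
compare_resolution() {
    nss=$(resolve_nss "$1")
    dns=$(resolve_dns "$1" 2>/dev/null)
    echo "system resolver: ${nss:-<none>}"
    echo "DNS only:        ${dns:-<none>}"
}
```

Running compare_resolution susin.myftp.org on both the engine and the host would show whether each machine sees the private address (from /etc/hosts) or the public one (from DNS).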

But now I came across http://www.ovirt.org/documentation/how-to/networking/changing-engine-hostname/ since I actually changed the engine's hostname to be able to log in through the web UI.
In particular, the following: "The bigger concern is with the engine's certificate. Currently, to the best of our knowledge, there is no component that actually checks this trust. But it's possible, that in some future version of one of the relevant tools - vdsm, libvirt, etc. - such a check will actually be made, and even prevent connections. If this happens, the engine might not be able to connect to the hosts, and the worst case is that they will have to be reinstalled, thus loosing all the configuration and data accumulated by then."

tail -f /var/log/ovirt-engine/engine.log
2016-10-21 11:05:16,888 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler1) [] Command 'GetAllVmStatsVDSCommand(HostName = hosted_engine_1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='826a8da5-74c1-4002-ab7b-e6e32be94fe6', vds='Host[hosted_engine_1,826a8da5-74c1-4002-ab7b-e6e32be94fe6]'})' execution failed: VDSGenericException: VDSNetworkException: Vds timeout occured
2016-10-21 11:05:16,888 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] (DefaultQuartzScheduler1) [] Failed to fetch vms info for host 'hosted_engine_1' - skipping VMs monitoring.
2016-10-21 11:05:16,918 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler4) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM hosted_engine_1 command failed: Message timeout which can be caused by communication issues
2016-10-21 11:05:16,918 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler4) [] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand' return value 'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturnForXmlRpc@1f2e4065'
2016-10-21 11:05:16,918 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler4) [] HostName = hosted_engine_1
2016-10-21 11:05:16,919 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler4) [] Command 'GetCapabilitiesVDSCommand(HostName = hosted_engine_1, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='826a8da5-74c1-4002-ab7b-e6e32be94fe6', vds='Host[hosted_engine_1,826a8da5-74c1-4002-ab7b-e6e32be94fe6]'})' execution failed: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-10-21 11:05:16,919 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler4) [] Failure to refresh Vds runtime info: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-10-21 11:05:16,919 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler4) [] Exception: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
    at org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:188) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:16) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:110) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:73) [vdsbroker.jar:]
    at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
    at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:451) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:653) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:121) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring.refresh(HostMonitoring.java:85) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:238) [vdsbroker.jar:]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_102]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_102]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_102]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_102]
    at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
    at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_102]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_102]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_102]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_102]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_102]

2016-10-21 11:05:16,921 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler4) [] Failed to refresh VDS, network error, continuing, vds='hosted_engine_1'(826a8da5-74c1-4002-ab7b-e6e32be94fe6): VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
2016-10-21 11:05:16,921 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-8-thread-1) [] Host 'hosted_engine_1' is not responding.
2016-10-21 11:05:16,993 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-1) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host hosted_engine_1 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
2016-10-21 11:05:19,943 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt01/192.168.0.100
2016-10-21 11:05:19,960 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages: General SSLEngine problem
..
2016-10-21 11:08:40,871 WARN  [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-8-thread-6) [6d681722] Validation of action 'VdsNotRespondingTreatment' failed for user SYSTEM. Reasons: VAR__ACTION__RESTART,POWER_MANAGEMENT_ACTION_ON_ENTITY_ALREADY_IN_PROGRESS
2016-10-21 11:08:41,023 INFO  [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-8-thread-5) [193e2e22] Running command: VdsNotRespondingTreatmentCommand internal: true. Entities affected :  ID: 826a8da5-74c1-4002-ab7b-e6e32be94fe6 Type: VDS
2016-10-21 11:08:41,088 INFO  [org.ovirt.engine.core.bll.pm.SshSoftFencingCommand] (org.ovirt.thread.pool-8-thread-5) [193e2e22] Running command: SshSoftFencingCommand internal: true. Entities affected :  ID: 826a8da5-74c1-4002-ab7b-e6e32be94fe6 Type: VDS
2016-10-21 11:08:41,116 INFO  [org.ovirt.engine.core.bll.pm.SshSoftFencingCommand] (org.ovirt.thread.pool-8-thread-5) [193e2e22] Opening SSH Soft Fencing session on host 'ovirt01'
2016-10-21 11:08:41,470 ERROR [org.ovirt.engine.core.bll.pm.SshSoftFencingCommand] (org.ovirt.thread.pool-8-thread-5) [193e2e22] SSH Soft Fencing command failed on host 'ovirt01': SSH authentication to 'root@ovirt01' failed. Please verify provided credentials. Make sure key is authorized at host
Stdout:
Stderr:
2016-10-21 11:08:41,483 INFO  [org.ovirt.engine.core.bll.pm.SshSoftFencingCommand] (org.ovirt.thread.pool-8-thread-5) [193e2e22] Lock freed to object 'EngineLock:{exclusiveLocks='[826a8da5-74c1-4002-ab7b-e6e32be94fe6=<VDS_FENCE, POWER_MANAGEMENT_ACTION_ON_ENTITY_ALREADY_IN_PROGRESS>]', sharedLocks='null'}'
2016-10-21 11:08:41,545 WARN  [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (org.ovirt.thread.pool-8-thread-5) [193e2e22] Trying to release exclusive lock which does not exist, lock key: '826a8da5-74c1-4002-ab7b-e6e32be94fe6VDS_FENCE'
2016-10-21 11:08:41,547 INFO  [org.ovirt.engine.core.bll.pm.VdsNotRespondingTreatmentCommand] (org.ovirt.thread.pool-8-thread-5) [193e2e22] Lock freed to object 'EngineLock:{exclusiveLocks='[826a8da5-74c1-4002-ab7b-e6e32be94fe6=<VDS_FENCE, POWER_MANAGEMENT_ACTION_ON_ENTITY_ALREADY_IN_PROGRESS>]', sharedLocks='null'}'
2016-10-21 11:08:43,503 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt01/192.168.0.100
2016-10-21 11:08:43,517 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages: General SSLEngine problem
2016-10-21 11:08:55,446 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to ovirt01/192.168.0.100
2016-10-21 11:08:55,461 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) [] Unable to process messages: General SSLEngine problem



Since the engine is complaining about an SSL communication error, I suspect the problem is there.
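To see which component is producing most of the errors in a log like the one above, the ERROR lines can be counted per logger. This is just a rough sketch: summarize_errors is a hypothetical helper name, and it assumes the engine.log layout shown above (date, time, level, then the logger in square brackets).

```shell
# Count ERROR lines per logger, most frequent first.
# Fields in each log line: $1 date, $2 time, $3 level, $4 [logger].
# Stack-trace lines ("at ...") are skipped because their third
# field is not ERROR.
summarize_errors() {
    awk '$3 == "ERROR" { count[$4]++ }
         END { for (logger in count) print count[logger], logger }' "$1" |
        sort -rn
}
```

For example, summarize_errors /var/log/ovirt-engine/engine.log prints one line per logger with its error count, which makes a repeating error such as the SSL Stomp Reactor one stand out.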

Is there still any way to save my VMs, or do I have to reinstall?





On Thu, Oct 20, 2016 at 6:27 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:


On Thu, Oct 20, 2016 at 6:16 PM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
Hi,
I'm still unable to get my system up with my VMs. Inside the web UI I can see a warning at the bottom: Host hosted_engine_1 is non responsive.
When I try to activate the master data domain NFS01, I get: Error while executing action: Cannot activate Storage. There is no active Host in the Data Center.
The Data Center tab shows "VDSM hosted_engine_1 command failed: Message timeout which can be caused by communication issues" at the bottom.
I can't see why the host isn't active in the engine VM.
Any help appreciated. Thanks.


Can you please try 
 cat < /dev/tcp/<yourhostaddress>/54321
from the engine VM?
If it's not able to connect, please check name resolution, addressing and so on. 
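The same check can be wrapped in a small function so that it neither hangs on a silent peer nor waits forever on a filtered port. This is only a sketch: check_port is a made-up helper name, the 5-second limit is an arbitrary choice, and it assumes bash and the coreutils timeout command are installed.

```shell
# Try to open a TCP connection to host:port using bash's /dev/tcp.
# Returns 0 if the connection opens, non-zero if it is refused,
# unreachable, or does not complete within 5 seconds.
check_port() {
    host=$1
    port=$2
    # Inside bash -c, $0 is the host and $1 is the port.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Report the result in a human-readable form.
report_port() {
    if check_port "$1" "$2"; then
        echo "$1:$2 is reachable"
    else
        echo "$1:$2 is NOT reachable"
    fi
}
```

For instance, report_port 192.168.0.100 54321 run from the engine VM would say whether the vdsm port on the host accepts connections at all, independent of any SSL problem.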
 



On Mon, Oct 17, 2016 at 5:00 PM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
Now, after a long time, I got the login prompt.
What I see is that things are still down and I'm unable to activate anything. I see:
This host is in non responding state. Try to Activate it; If the problem persists, switch Host to Maintenance mode and try to reinstall it.


On Mon, Oct 17, 2016 at 4:51 PM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
When I logged into the web UI, I couldn't bring up storage, datacenter, or cluster; everything was down.
I restarted the host, and now when I enter the admin portal it spins forever. There seem to be some SSL communication issues: https://paste.fedoraproject.org/453944/76715737/
Any hints are appreciated!



On Mon, Oct 17, 2016 at 4:15 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:


On Mon, Oct 17, 2016 at 3:48 PM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
Got the engine up finally :)
But now I'm met with: "The client is not authorized to request an authorization. It's required to access the system using FQDN." It worked fine prior to the upgrade!

This is a new feature of 4.0; you can no longer log in with the IP address if the cert has been signed for an FQDN.
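To see which name a certificate was actually issued for, its subject and any subject alternative names can be printed with openssl. This is a hedged sketch only: cert_subject and cert_alt_names are illustrative helper names, and the certificate path mentioned afterwards is an assumption about a default engine install, not something confirmed in this thread.

```shell
# Print the subject line (which contains the CN) of a PEM certificate.
cert_subject() {
    openssl x509 -noout -subject -in "$1"
}

# Print the Subject Alternative Name entries, if the certificate has any.
# Returns non-zero when no SAN extension is present.
cert_alt_names() {
    openssl x509 -noout -text -in "$1" |
        grep -A1 'Subject Alternative Name'
}
```

Assuming the engine's web certificate lives at /etc/pki/ovirt-engine/certs/apache.cer, running cert_subject on that file would show the FQDN that has to be used in the browser.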
 
I found https://bugzilla.redhat.com/show_bug.cgi?id=1351217, but I'm not sure whether I have FQDN case issues.
Any idea how to fix this?

On Mon, Oct 17, 2016 at 3:33 PM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
I analyzed the log and found that the problem was in the creation of the certs with openssl (missing distinguished name in the config). And /etc/pki/ovirt-engine/{cacert,openssl}.conf were empty!
That led me to:
yum provides  /etc/pki/ovirt-engine/{cacert,openssl}.conf
yum remove  ovirt-engine-backend
yum install ovirt-engine-backend ovirt-engine  ovirt-engine-dashboard  ovirt-engine-setup ovirt-engine-tools  ovirt-engine-userportal ovirt-engine-webadmin-portal  ovirt-engine-restapi  ovirt-engine-dashboard

Now I was able to run engine-setup successfully and exit maintenance mode on the host. Let's see how things unfold within 30 minutes. Will keep you updated!









On Mon, Oct 17, 2016 at 3:14 PM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
Hi,
I tried that and ended up with: https://paste.fedoraproject.org/453892/71003314/ :(
Log ovirt-engine-setup-20161017150800-drmayj.log uploaded to https://my.owndrive.com/index.php/s/3Dcyho9bqo7oZs8?path=%2Fengine-vm



On Mon, Oct 17, 2016 at 11:54 AM, Simone Tiraboschi <stirabos@redhat.com> wrote:


On Mon, Oct 17, 2016 at 9:55 AM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
Hi guys,
let me know if there is anything else you need for further debugging.
Thanks!

Can you please try reinstalling all the oVirt rpms on the engine VM and re-executing engine-setup there?
 

On Thu, Oct 13, 2016 at 7:53 PM, Susinthiran Sithamparanathan <chesusin@gmail.com> wrote:
On Thu, Oct 13, 2016 at 3:23 PM, Yedidyah Bar David <didi@redhat.com> wrote:

OK. Can you please attach the output of:

grep MANUAL /etc/ovirt-engine/engine.conf.d/*.conf
 
 [root@ovirt01 ~]# grep MANUAL /etc/ovirt-engine/engine.conf.d/*.conf
[root@ovirt01 ~]# ssh 192.168.0.101
root@192.168.0.101's password:
Last login: Thu Oct 13 19:50:02 2016 from ovirt01
[root@engine ~]# grep MANUAL /etc/ovirt-engine/engine.conf.d/*.conf
[root@engine ~]#

I.e., grep found nothing for that search on either the host or the engine VM.


--

Susinthiran Sithamparanathan


