Testing self hosted engine in 3.6: hostname not resolved error

Hello,
playing with an environment where I have a CentOS 7.1 + updates server with the 3.6 repos configured, and no DNS resolution in place, running the command hosted-engine --deploy I am initially asked, during the setup prompts, whether I want to set a hostname for the SH engine and whether I want to set up /etc/hosts of the hypervisor and SH engine so that it will add them via cloud-init. That translates into these lines inside the log file:

2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitInstanceDomainName=str:'localdomain.local'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitInstanceHostName=str:'shengine.localdomain.local'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitRootPwd=str:'**FILTERED**'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitVMDNS=str:'192.168.122.1'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitVMETCHOSTS=bool:'True'

But then I get the error about the failure to resolve the SH engine hostname and the installation terminates, with these final lines in the log file:

2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV SYSTEM/reboot=bool:'False'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV SYSTEM/rebootAllow=bool:'True'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV SYSTEM/rebootDeferTime=int:'10'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:514 ENVIRONMENT DUMP - END
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:142 Stage pre-terminate METHOD otopi.plugins.otopi.dialog.cli.Plugin._pre_terminate
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:148 condition False
2015-10-21 15:45:29 INFO otopi.context context.runSequence:427 Stage: Termination
2015-10-21 15:45:29 DEBUG otopi.context context.runSequence:431 STAGE terminate
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:142 Stage terminate METHOD otopi.plugins.otopi.dialog.human.Plugin._terminate
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:142 Stage terminate METHOD otopi.plugins.otopi.dialog.machine.Plugin._terminate
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:148 condition False
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:142 Stage terminate METHOD otopi.plugins.otopi.core.log.Plugin._terminate

and the message about a generated answer file.
Why does it ask and then fail?
Can I re-run using the answer file? In that case do I have to pre-insert the SH engine hostname inside /etc/hosts of the hypervisor?
Thanks,
Gianluca

On Wed, Oct 21, 2015 at 5:00 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello, playing with an environment where I have a CentOS 7.1 + updates server with the 3.6 repos configured, and no DNS resolution in place, running the command hosted-engine --deploy
I am initially asked, during the setup prompts, whether I want to set a hostname for the SH engine and whether I want to set up /etc/hosts of the hypervisor
IIRC we only suggest to update /etc/hosts of the engine, not the host.
and SH engine so that it will add them via cloud-init. That translates into these lines inside the log file:
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitInstanceDomainName=str:'localdomain.local'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitInstanceHostName=str:'shengine.localdomain.local'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitRootPwd=str:'**FILTERED**'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitVMDNS=str:'192.168.122.1'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV OVEHOSTED_VM/cloudinitVMETCHOSTS=bool:'True'
But then I get the error about the failure to resolve the SH engine hostname and the installation terminates, with these final lines in the log file:
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV SYSTEM/reboot=bool:'False'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV SYSTEM/rebootAllow=bool:'True'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:510 ENV SYSTEM/rebootDeferTime=int:'10'
2015-10-21 15:45:29 DEBUG otopi.context context.dumpEnvironment:514 ENVIRONMENT DUMP - END
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:142 Stage pre-terminate METHOD otopi.plugins.otopi.dialog.cli.Plugin._pre_terminate
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:148 condition False
2015-10-21 15:45:29 INFO otopi.context context.runSequence:427 Stage: Termination
2015-10-21 15:45:29 DEBUG otopi.context context.runSequence:431 STAGE terminate
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:142 Stage terminate METHOD otopi.plugins.otopi.dialog.human.Plugin._terminate
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:142 Stage terminate METHOD otopi.plugins.otopi.dialog.machine.Plugin._terminate
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:148 condition False
2015-10-21 15:45:29 DEBUG otopi.context context._executeMethod:142 Stage terminate METHOD otopi.plugins.otopi.core.log.Plugin._terminate
and the message about a generated answer file.
Why does it ask and then fail?
Generally speaking, you are supposed to take care of all name resolution yourself. Changing /etc/hosts on the engine VM was added mainly to allow unattended setup of it. Note that you also have to make sure the engine can resolve the host, and that the hosts can resolve each other. See also [1] about that.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1188675
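For example, without DNS in place, entries along these lines in /etc/hosts on the host (and, if you skip the cloud-init option, also on the engine VM) should cover it; the addresses below are only placeholders matching the names used in this thread, not values taken from the actual setup:

192.168.122.10   ovc71.localdomain.local    ovc71      # the host (example address)
192.168.122.20   shengine.localdomain.local shengine   # the engine VM (example address)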
Can I re-run using the answer file? In that case do I have to pre-insert the SH engine hostname inside /etc/hosts of the hypervisor?
Yes. Best regards,
Thanks. Gianluca
-- Didi

On Wed, Oct 21, 2015 at 4:08 PM, Yedidyah Bar David <didi@redhat.com> wrote:
On Wed, Oct 21, 2015 at 5:00 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello, playing with an environment where I have a CentOS 7.1 + updates server with the 3.6 repos configured, and no DNS resolution in place, running the command hosted-engine --deploy
I am initially asked, during the setup prompts, whether I want to set a hostname for the SH engine and whether I want to set up /etc/hosts of the hypervisor
IIRC we only suggest to update /etc/hosts of the engine, not the host.
OK, I will cross-check the output... I ran it through screen and I'm not able to scroll back through its output...
Can I re-run using the answer file? In that case do I have to pre-insert the SH engine hostname inside /etc/hosts of the hypervisor?
Yes.
What is the proper command for using the answer file? Where should it have put the generated answer file? It seems there is nothing under /etc/ovirt-hosted-engine:

[root@ovc71 ovirt-hosted-engine-setup]# ll /etc/ovirt-hosted-engine
total 4
-rw-r--r--. 1 root root 222 Oct 15 10:37 10-appliance.conf
[root@ovc71 ovirt-hosted-engine-setup]# cat /etc/ovirt-hosted-engine/10-appliance.conf
description=The oVirt Engine Appliance image (OVA)
version=20151015.0-1.el7.centos
path=/usr/share/ovirt-engine-appliance/ovirt-engine-appliance-20151015.0-1.el7.centos.ova
sha1sum=010c974c81aa45560002b0e3dfdf56fc81e31eb4

Ah, OK. I found it under /var/lib/ovirt-hosted-engine-setup/answers/answers-20151021154529.conf
So the question about the syntax for using the answer file remains.
Thanks,
Gianluca

On Wed, Oct 21, 2015 at 4:43 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Oct 21, 2015 at 4:08 PM, Yedidyah Bar David <didi@redhat.com> wrote:
On Wed, Oct 21, 2015 at 5:00 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
Hello, playing with an environment where I have a CentOS 7.1 + updates server with the 3.6 repos configured, and no DNS resolution in place, running the command hosted-engine --deploy
I am initially asked, during the setup prompts, whether I want to set a hostname for the SH engine and whether I want to set up /etc/hosts of the hypervisor
IIRC we only suggest to update /etc/hosts of the engine, not the host.
OK, I will cross-check the output...
That question is:
  Add lines for the appliance itself and for this host to /etc/hosts on the engine VM?
  Note: ensuring that this host could resolve the engine VM hostname is still up to you

As Didi stated, the proper way to use this is with a properly working DHCP/DNS infrastructure where each host can resolve the others, and so on. Then hosted-engine setup will ask you for the engine VM MAC address and you have to create a reservation for it on your infrastructure.

For smaller/test deployments where a DHCP server is not present we also provide a static IP addressing option for the engine appliance: you can specify an IP address for your new VM and it will be configured in the appliance via cloud-init. Your host should then be able to resolve it, so if you don't have a registering DNS it's up to you to configure that via /etc/hosts on your host. Because the appliance deployment is fully automated, you don't have time to do the same on the appliance, so we provide a way to inject the host address into /etc/hosts on the appliance. It's still up to you to add other hosts there if you want to deploy additional ones.

So having a properly working DNS and DHCP server is definitely the better option.
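As a quick sanity check before starting the deploy, you can verify from the host that the engine VM name resolves to the address you chose (the hostnames here are the ones used in this thread):

getent hosts shengine.localdomain.local   # should print the static address you assigned to the engine VM
ping -c1 shengine.localdomain.local       # reachability check once the VM is up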
I ran it through screen and I'm not able to scroll back through its output...
1. Hit your screen prefix combination (C-a / control + A by default), then hit Escape.
2. Move up/down with the arrow keys to scroll the output buffer.
3. When you're done, hit Return twice to get back to the end of the scroll buffer.
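For reference, a minimal screen workflow for a deploy session could look like this (the session name is only an example):

screen -S hedeploy       # start a named session
hosted-engine --deploy   # run the setup inside it
# C-a then Escape enters scrollback (copy) mode; arrows / PgUp move around;
# Return twice leaves copy mode and returns to the live output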
Can I re-run using the answer file? In that case do I have to pre-insert the SH engine hostname inside /etc/hosts of the hypervisor?
Yes.
What is the proper command for using the answer file?
hosted-engine --deploy --config-append=/var/lib/ovirt-hosted-engine-setup/answers/answers-20151021154529.conf

Depending on where it stopped, you may have to manually destroy your previous empty appliance instance and clean up its storage before being able to retry.
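If a leftover engine VM from the failed attempt is still around, something along these lines should let you check and remove it before retrying (the VM name HostedEngine is an assumption based on the default used by the setup, and virsh may ask for the SASL credentials configured by VDSM):

virsh -r list --all          # read-only check for a leftover engine VM
virsh destroy HostedEngine   # stop it if it is still running
# then clean up the files the failed run left on the storage domain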
Where should it have put the generated answer file? It seems there is nothing under /etc/ovirt-hosted-engine
[root@ovc71 ovirt-hosted-engine-setup]# ll /etc/ovirt-hosted-engine
total 4
-rw-r--r--. 1 root root 222 Oct 15 10:37 10-appliance.conf
[root@ovc71 ovirt-hosted-engine-setup]# cat /etc/ovirt-hosted-engine/10-appliance.conf
description=The oVirt Engine Appliance image (OVA)
version=20151015.0-1.el7.centos
path=/usr/share/ovirt-engine-appliance/ovirt-engine-appliance-20151015.0-1.el7.centos.ova
sha1sum=010c974c81aa45560002b0e3dfdf56fc81e31eb4
Ah, OK. I found it under /var/lib/ovirt-hosted-engine-setup/answers/answers-20151021154529.conf
So the question about the syntax for using the answer file remains.
Thanks, Gianluca

On Wed, Oct 21, 2015 at 4:58 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
OK, I will cross-check the output...
That question is: Add lines for the appliance itself and for this host to /etc/hosts on the engine VM? Note: ensuring that this host could resolve the engine VM hostname is still up to you
You are right. I misunderstood the meaning of the message.
I ran it through screen and I'm not able to scroll back through its output...
1. Hit your screen prefix combination (C-a / control + A by default), then hit Escape.
2. Move up/down with the arrow keys to scroll the output buffer.
3. When you're done, hit Return twice to get back to the end of the scroll buffer.
OK, thank you very much! I had never used screen before and hadn't found this useful information.
Can I re-run using the answer file? In that case do I have to pre-insert the SH engine hostname inside /etc/hosts of the hypervisor?
Yes.
What is the proper command for using the answer file?
hosted-engine --deploy --config-append=/var/lib/ovirt-hosted-engine-setup/answers/answers-20151021154529.conf
OK. It went well from the engine point of view.

One note: I chose SPICE as the protocol for the SH engine and while I was able to connect to it as the root user, I wasn't able to as a normal user. I got:

[g.cecchi@ovc71 ~]$ remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"
** (remote-viewer:1633): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-1N7zvOm2Zf: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(remote-viewer:1633): GSpice-WARNING **: loading ca certs from /etc/pki/vdsm/libvirt-spice/ca-cert.pem failed
(/usr/bin/remote-viewer:1633): Spice-Warning **: ssl_verify.c:428:openssl_verify: Error in certificate chain verification: self signed certificate in certificate chain (num=19:depth1:/C=EN/L=Test/O=Test/CN=TestCA)
(remote-viewer:1633): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)

It seems that the hypervisor phase stalled in some way and then failed:

[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine replied: DB Up!Welcome to Health Status!
[ INFO ] Connecting to the Engine
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
..
[ INFO ] Still waiting for VDSM host to become operational...
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
[ INFO ] Saving hosted-engine configuration on the shared storage domain
[ INFO ] Shutting down the engine VM
[ INFO ] Enabling and starting HA services
Hosted Engine successfully set up
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20151021173522.conf'
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

During this waiting phase on the hypervisor I got:

[root@ovc71 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Wed 2015-10-21 17:10:55 CEST; 22min ago
 Main PID: 506 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─319 /usr/libexec/ioprocess --read-pipe-fd 29 --write-pipe-fd 23 --max-threads 10 --m...
           ├─506 /usr/bin/python /usr/share/vdsm/vdsm
           └─943 /usr/libexec/ioprocess --read-pipe-fd 57 --write-pipe-fd 56 --max-threads 10 --m...

Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 1
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 1
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 make_client_response()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 2
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 parse_server_challenge()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 make_client_response()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 3

Inside vdsm.log under /var/log/vdsm/ I saw many loops of this type:

Thread-160::DEBUG::2015-10-21 17:31:23,084::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:23,091::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.000233708 s, 1.3 MB/s\n'; <rc> = 0
Reactor thread::INFO::2015-10-21 17:31:30,028::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127.0.0.1:40382
Reactor thread::DEBUG::2015-10-21 17:31:30,034::protocoldetector::82::ProtocolDetector.Detector::(__init__) Using required_size=11
Reactor thread::INFO::2015-10-21 17:31:30,035::protocoldetector::118::ProtocolDetector.Detector::(handle_read) Detected protocol xml from 127.0.0.1:40382
Reactor thread::DEBUG::2015-10-21 17:31:30,035::bindingxmlrpc::1297::XmlDetector::(handle_socket) xml over http detected from ('127.0.0.1', 40382)
BindingXMLRPC::INFO::2015-10-21 17:31:30,036::xmlrpc::73::vds.XMLRPCServer::(handle_request) Starting request handler for 127.0.0.1:40382
Thread-284::INFO::2015-10-21 17:31:30,037::xmlrpc::84::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40382 started
Thread-284::INFO::2015-10-21 17:31:30,043::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40382 stopped
Thread-160::DEBUG::2015-10-21 17:31:33,095::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:33,104::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.00030368 s, 1.0 MB/s\n'; <rc> = 0
Thread-160::DEBUG::2015-10-21 17:31:43,109::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:43,116::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.00109833 s, 283 kB/s\n'; <rc> = 0
Reactor thread::INFO::2015-10-21 17:31:45,067::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127.0.0.1:40384
Reactor thread::DEBUG::2015-10-21 17:31:45,072::protocoldetector::82::ProtocolDetector.Detector::(__init__) Using required_size=11
Reactor thread::INFO::2015-10-21 17:31:45,073::protocoldetector::118::ProtocolDetector.Detector::(handle_read) Detected protocol xml from 127.0.0.1:40384
Reactor thread::DEBUG::2015-10-21 17:31:45,073::bindingxmlrpc::1297::XmlDetector::(handle_socket) xml over http detected from ('127.0.0.1', 40384)
BindingXMLRPC::INFO::2015-10-21 17:31:45,074::xmlrpc::73::vds.XMLRPCServer::(handle_request) Starting request handler for 127.0.0.1:40384
Thread-285::INFO::2015-10-21 17:31:45,074::xmlrpc::84::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40384 started
Thread-285::INFO::2015-10-21 17:31:45,080::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40384 stopped

Full vdsm.log in gzip format here:
https://drive.google.com/file/d/0BwoPbcrMv8mvbjRqQ1ZuTzhsUEU/view?usp=sharin...

Exactly what kind of check does the install run to verify the hypervisor is up?
Can I simply restart the hypervisor and see what happens? I remember similar things in all-in-one setups where the initial setup failed at the vdsm part but then a restart was OK...
Gianluca
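A note on the remote-viewer output above: "loading ca certs ... failed" usually means the CA file could not be read at all, and /etc/pki/vdsm/libvirt-spice/ca-cert.pem is typically not readable by unprivileged users, which would explain why it works as root. A quick check, assuming that is indeed the cause:

ls -l /etc/pki/vdsm/libvirt-spice/ca-cert.pem                                  # check read permissions
sudo install -m 644 /etc/pki/vdsm/libvirt-spice/ca-cert.pem /tmp/spice-ca.pem  # world-readable copy
remote-viewer --spice-ca-file=/tmp/spice-ca.pem 'spice://localhost?tls-port=5900' --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"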

On Wed, Oct 21, 2015 at 5:44 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Oct 21, 2015 at 4:58 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
OK, I will cross-check the output...
That question is: Add lines for the appliance itself and for this host to /etc/hosts on the engine VM? Note: ensuring that this host could resolve the engine VM hostname is still up to you
You are right. I misunderstood the meaning of the message.
I ran it through screen and I'm not able to scroll back through its output...
1. Hit your screen prefix combination (C-a / control + A by default), then hit Escape.
2. Move up/down with the arrow keys to scroll the output buffer.
3. When you're done, hit Return twice to get back to the end of the scroll buffer.
OK, thank you very much! I had never used screen before and hadn't found this useful information.
Can I re-run using the answer file? In that case do I have to pre-insert the SH engine hostname inside /etc/hosts of the hypervisor?
Yes.
What is the proper command for using the answer file?
hosted-engine --deploy --config-append=/var/lib/ovirt-hosted-engine-setup/answers/answers-20151021154529.conf
OK. It went well from the engine point of view.
One note: I chose SPICE as the protocol for the SH engine and while I was able to connect to it as the root user, I wasn't able to as a normal user. I got:
[g.cecchi@ovc71 ~]$ remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"
** (remote-viewer:1633): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-1N7zvOm2Zf: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(remote-viewer:1633): GSpice-WARNING **: loading ca certs from /etc/pki/vdsm/libvirt-spice/ca-cert.pem failed
(/usr/bin/remote-viewer:1633): Spice-Warning **: ssl_verify.c:428:openssl_verify: Error in certificate chain verification: self signed certificate in certificate chain (num=19:depth1:/C=EN/L=Test/O=Test/CN=TestCA)
(remote-viewer:1633): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)
It seems that the hypervisor phase was stalled in some way and then failed:
[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine replied: DB Up!Welcome to Health Status!
[ INFO ] Connecting to the Engine
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
..
[ INFO ] Still waiting for VDSM host to become operational...
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
[ INFO ] Saving hosted-engine configuration on the shared storage domain
[ INFO ] Shutting down the engine VM
[ INFO ] Enabling and starting HA services
Hosted Engine successfully set up
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20151021173522.conf'
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
During this waiting phase on the hypervisor I got:
[root@ovc71 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Wed 2015-10-21 17:10:55 CEST; 22min ago
 Main PID: 506 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─319 /usr/libexec/ioprocess --read-pipe-fd 29 --write-pipe-fd 23 --max-threads 10 --m...
           ├─506 /usr/bin/python /usr/share/vdsm/vdsm
           └─943 /usr/libexec/ioprocess --read-pipe-fd 57 --write-pipe-fd 56 --max-threads 10 --m...
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 1
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 1
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 make_client_response()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 2
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 parse_server_challenge()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 ask_user_info()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 make_client_response()
Oct 21 17:10:56 ovc71.localdomain.local python[506]: DIGEST-MD5 client step 3
Inside vdsm.log under /var/log/vdsm/ I saw many loops of this type:
Thread-160::DEBUG::2015-10-21 17:31:23,084::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:23,091::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.000233708 s, 1.3 MB/s\n'; <rc> = 0
Reactor thread::INFO::2015-10-21 17:31:30,028::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127.0.0.1:40382
Reactor thread::DEBUG::2015-10-21 17:31:30,034::protocoldetector::82::ProtocolDetector.Detector::(__init__) Using required_size=11
Reactor thread::INFO::2015-10-21 17:31:30,035::protocoldetector::118::ProtocolDetector.Detector::(handle_read) Detected protocol xml from 127.0.0.1:40382
Reactor thread::DEBUG::2015-10-21 17:31:30,035::bindingxmlrpc::1297::XmlDetector::(handle_socket) xml over http detected from ('127.0.0.1', 40382)
BindingXMLRPC::INFO::2015-10-21 17:31:30,036::xmlrpc::73::vds.XMLRPCServer::(handle_request) Starting request handler for 127.0.0.1:40382
Thread-284::INFO::2015-10-21 17:31:30,037::xmlrpc::84::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40382 started
Thread-284::INFO::2015-10-21 17:31:30,043::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40382 stopped
Thread-160::DEBUG::2015-10-21 17:31:33,095::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:33,104::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.00030368 s, 1.0 MB/s\n'; <rc> = 0
Thread-160::DEBUG::2015-10-21 17:31:43,109::fileSD::173::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN/f538db13-330f-4a73-bf6d-1f7e3af12370/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-160::DEBUG::2015-10-21 17:31:43,116::fileSD::173::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n311 bytes (311 B) copied, 0.00109833 s, 283 kB/s\n'; <rc> = 0
Reactor thread::INFO::2015-10-21 17:31:45,067::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127.0.0.1:40384
Reactor thread::DEBUG::2015-10-21 17:31:45,072::protocoldetector::82::ProtocolDetector.Detector::(__init__) Using required_size=11
Reactor thread::INFO::2015-10-21 17:31:45,073::protocoldetector::118::ProtocolDetector.Detector::(handle_read) Detected protocol xml from 127.0.0.1:40384
Reactor thread::DEBUG::2015-10-21 17:31:45,073::bindingxmlrpc::1297::XmlDetector::(handle_socket) xml over http detected from ('127.0.0.1', 40384)
BindingXMLRPC::INFO::2015-10-21 17:31:45,074::xmlrpc::73::vds.XMLRPCServer::(handle_request) Starting request handler for 127.0.0.1:40384
Thread-285::INFO::2015-10-21 17:31:45,074::xmlrpc::84::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40384 started
Thread-285::INFO::2015-10-21 17:31:45,080::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:40384 stopped
Full vdsm.log in gzip format here:
https://drive.google.com/file/d/0BwoPbcrMv8mvbjRqQ1ZuTzhsUEU/view?usp=sharin...
Exactly what kind of check does the install run to verify the hypervisor is up?
It polls the engine REST API, checking the host status for 10 minutes until it becomes 'up' or 'non_operational'. In your case it reached the 10 minute timeout. Please check the engine and host-deploy logs on the engine VM.
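You can watch the same status yourself while the setup waits; a sketch against the REST API, assuming the engine hostname used in this thread and the admin password you chose (PASSWORD is a placeholder):

curl -k -u admin@internal:PASSWORD https://shengine.localdomain.local/ovirt-engine/api/hosts
# the <status><state> element of the host is what matters: setup waits for 'up' or 'non_operational'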
Can I simply restart the hypervisor and see what happens? I remember similar things in all-in-one setups where the initial setup failed at the vdsm part but then a restart was OK...
It's better to understand what went wrong.
Gianluca

On Wed, Oct 21, 2015 at 6:15 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
It polls the engine REST API, checking the host status for 10 minutes until it becomes 'up' or 'non_operational'. In your case it reached the 10 minute timeout. Please check the engine and host-deploy logs on the engine VM.
OK. One note I didn't specify, in case it could influence my results: I'm working on a laptop with Fedora 21. The CentOS 7.1 hypervisor on which I'm working is a VM of this Fedora system, where I enabled nested virtualization.

host-deploy on the engine seems empty:

[root@shengine host-deploy]# pwd
/var/log/ovirt-engine/host-deploy
[root@shengine host-deploy]# ll
total 0
[root@shengine host-deploy]#

engine.log.gz of the engine can be found here:
https://drive.google.com/file/d/0BwoPbcrMv8mvU0tsbXhEeHc2TlE/view?usp=sharin...
The timestamp to check against is between 17:15 and 17:40 of yesterday.
Can I simply restart the hypervisor and see what happens? I remember similar things in all-in-one setups where the initial setup failed at the vdsm part but then a restart was OK...
It's better to understand what went wrong.
I couldn't agree more with you... ;-)
Anyway, yesterday I had to power off my laptop and so my environment, and today I powered on the hypervisor again. The engine VM automatically starts inside it and I can access the web admin portal, but the host of course shows as down. Datacenters and Clusters are not populated; I only have the "Default" ones.
See a screenshot of the portal with yesterday's and today's events for the host here:
https://drive.google.com/file/d/0BwoPbcrMv8mvZWJwdUhEXzJ3Wjg/view?usp=sharin...
BTW: I see from the events that yesterday's host-deploy log file is actually kept on the host itself under /tmp --> perhaps a better place on the host could be found? Anyway, here is the generated file ovirt-host-deploy-20151021172025-gbtn0q.log:
https://drive.google.com/file/d/0BwoPbcrMv8mvQnhDTWRUQnhiOUU/view?usp=sharin...
Gianluca

On Thu, Oct 22, 2015 at 11:50 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Oct 21, 2015 at 6:15 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
It polls the engine REST API, checking the host status for 10 minutes until it becomes 'up' or 'non_operational'. In your case it reached the 10 minute timeout. Please check the engine and host-deploy logs on the engine VM.
Ok.
What I see inside the ovirt-host-deploy log file:

2015-10-21 17:20:26 DEBUG otopi.plugins.otopi.packagers.dnfpackager dnfpackager._boot:178 Cannot initialize minidnf
Traceback (most recent call last):
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/otopi/packagers/dnfpackager.py", line 165, in _boot
    constants.PackEnv.DNF_DISABLED_PLUGINS
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/otopi/packagers/dnfpackager.py", line 75, in _getMiniDNF
    from otopi import minidnf
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/minidnf.py", line 9, in <module>
    import dnf
ImportError: No module named dnf
...
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ### Please input VDSM certificate chain that matches certificate request, top is issuer
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ###
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ### type '--=451b80dc-996f-432e-9e4f-2b29ef6d1141=--' in own line to mark end, '--=451b80dc-996f-ABORT-9e4f-2b29ef6d1141=--' aborts
2015-10-21 17:36:33 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/ovirt-host-common/vdsm/pki.py", line 319, in _misc
    '\n\nPlease input VDSM certificate chain that '
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/otopi/dialog/machine.py", line 207, in queryMultiString
    v = self._readline()
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/dialog.py", line 263, in _readline
    raise IOError(_('End of file'))
IOError: End of file
2015-10-21 17:36:33 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Misc configuration': End of file
2015-10-21 17:36:33 DEBUG otopi.transaction transaction.abort:134 aborting 'Yum Transaction'
2015-10-21 17:36:33 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:95 Yum Performing yum transaction rollback
Loaded plugins: fastestmirror, langpacks

And in engine.log:

2015-10-21 15:19:11,061 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 43) [] Started task scheduler org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl@7160a32a
2015-10-21 15:19:11,321 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 43) [] Started task scheduler org.ovirt.engine.core.utils.timer.DBSchedulerUtilQuartzImpl@55760e6a
2015-10-21 15:19:11,746 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 43) [] Start org.ovirt.engine.core.dal.utils.CacheManager@2a4c024b
2015-10-21 15:19:11,957 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,957 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,957 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,958 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,958 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,958 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,964 ERROR [org.ovirt.engine.core.dal.dbbroker.generic.DBConfigUtils] (ServerService Thread Pool -- 43) [] Error parsing option 'AutoRecoveryAllowedTypes' value: org.codehaus.jackson.JsonParseException: Unexpected character ('\' (code 92)): was expecting double-quote to start field name at [Source: java.io.StringReader@21b12337; line: 1, column: 3]
2015-10-21 15:19:11,969 INFO [org.ovirt.engine.core.utils.osinfo.OsInfoPreferencesLoader] (ServerService Thread Pool -- 43) [] Loading file '/etc/ovirt-engine/osinfo.conf.d/00-defaults.properties'
2015-10-21 15:19:12,322 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 43) [] Running ovirt-engine 3.6.0.1-1.el7.centos
2015-10-21 15:19:12,322 INFO [org.ovirt.engine.core.bll.CpuFlagsManagerHandler] (ServerService Thread Pool -- 43) [] Start initializing dictionaries
2015-10-21 15:19:12,328 INFO [org.ovirt.engine.core.bll.CpuFlagsManagerHandler] (ServerService Thread Pool -- 43) [] Finished initializing dictionaries
...
2015-10-21 15:35:08,852 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-61) [] Timer update runtime info failed. Exception::
org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:573) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:637) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:666) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:706) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:154) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.doExecute(PostgresDbEngineDialect.java:120) [dal.jar:]
        at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:181) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:147) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeReadList(SimpleJdbcCallsHandler.java:109) [dal.jar:]
        at org.ovirt.engine.core.dao.VdsDaoImpl.get(VdsDaoImpl.java:53) [dal.jar:]
        at org.ovirt.engine.core.dao.VdsDaoImpl.get(VdsDaoImpl.java:47) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCachedVds(VdsManager.java:278) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:208) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor133.invoke(Unknown Source) [:1.7.0_85]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_85]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_85]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
        at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:154)
        at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source) [:1.7.0_85]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_85]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_85]
        at org.jboss.weld.util.reflection.Reflections.invokeAndUnwrap(Reflections.java:414) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.bean.builtin.CallableMethodHandler.invoke(CallableMethodHandler.java:42) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.bean.proxy.EnterpriseTargetBeanInstance.invoke(EnterpriseTargetBeanInstance.java:56) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:100) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weldx.sql.CommonDataSource$DataSource$Wrapper$1587847696$Proxy$_$$_Weld$Proxy$.getConnection(Unknown Source)
        at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) [spring-jdbc.jar:3.1.1.RELEASE]
        ... 20 more
Caused by: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.getManagedConnection(AbstractConnectionManager.java:371)
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.getManagedConnection(TxConnectionManagerImpl.java:421)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:515)
        at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:146)
        ... 30 more
2015-10-21 15:35:11,887 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-65) [] Timer update runtime info failed. Exception:: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource

On Thu, Oct 22, 2015 at 2:00 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Oct 22, 2015 at 11:50 AM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Wed, Oct 21, 2015 at 6:15 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
It polls the engine REST API, checking the host status for 10 minutes until it becomes 'up' or 'non_operational'. In your case it reached the 10 minute timeout. Please check the engine and host-deploy logs on the engine VM.
Ok.
What I see inside the ovirt-host-deploy log file:
2015-10-21 17:20:26 DEBUG otopi.plugins.otopi.packagers.dnfpackager dnfpackager._boot:178 Cannot initialize minidnf
Traceback (most recent call last):
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/otopi/packagers/dnfpackager.py", line 165, in _boot
    constants.PackEnv.DNF_DISABLED_PLUGINS
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/otopi/packagers/dnfpackager.py", line 75, in _getMiniDNF
    from otopi import minidnf
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/minidnf.py", line 9, in <module>
    import dnf
ImportError: No module named dnf
...
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ### Please input VDSM certificate chain that matches certificate request, top is issuer
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ###
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ### type '--=451b80dc-996f-432e-9e4f-2b29ef6d1141=--' in own line to mark end, '--=451b80dc-996f-ABORT-9e4f-2b29ef6d1141=--' aborts
2015-10-21 17:36:33 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/ovirt-host-common/vdsm/pki.py", line 319, in _misc
    '\n\nPlease input VDSM certificate chain that '
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/otopi/dialog/machine.py", line 207, in queryMultiString
    v = self._readline()
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/dialog.py", line 263, in _readline
    raise IOError(_('End of file'))
IOError: End of file
2015-10-21 17:36:33 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Misc configuration': End of file
2015-10-21 17:36:33 DEBUG otopi.transaction transaction.abort:134 aborting 'Yum Transaction'
2015-10-21 17:36:33 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:95 Yum Performing yum transaction rollback
Loaded plugins: fastestmirror, langpacks
The issue seems to be there: we have an input request on host-deploy to have somebody explicitly trust the VDSM cert chain but of course, being an automated process, nobody will respond and so it failed. Did you manually change the engine cert or some other CA cert?
And in engine.log:
2015-10-21 15:19:11,061 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 43) [] Started task scheduler org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl@7160a32a
2015-10-21 15:19:11,321 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 43) [] Started task scheduler org.ovirt.engine.core.utils.timer.DBSchedulerUtilQuartzImpl@55760e6a
2015-10-21 15:19:11,746 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 43) [] Start org.ovirt.engine.core.dal.utils.CacheManager@2a4c024b
2015-10-21 15:19:11,957 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,957 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,957 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,958 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,958 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,958 WARN [org.ovirt.engine.core.utils.ConfigUtilsBase] (ServerService Thread Pool -- 43) [] Could not find enum value for option: 'MigrateDowntime'
2015-10-21 15:19:11,964 ERROR [org.ovirt.engine.core.dal.dbbroker.generic.DBConfigUtils] (ServerService Thread Pool -- 43) [] Error parsing option 'AutoRecoveryAllowedTypes' value: org.codehaus.jackson.JsonParseException: Unexpected character ('\' (code 92)): was expecting double-quote to start field name at [Source: java.io.StringReader@21b12337; line: 1, column: 3]
2015-10-21 15:19:11,969 INFO [org.ovirt.engine.core.utils.osinfo.OsInfoPreferencesLoader] (ServerService Thread Pool -- 43) [] Loading file '/etc/ovirt-engine/osinfo.conf.d/00-defaults.properties'
2015-10-21 15:19:12,322 INFO [org.ovirt.engine.core.bll.Backend] (ServerService Thread Pool -- 43) [] Running ovirt-engine 3.6.0.1-1.el7.centos
2015-10-21 15:19:12,322 INFO [org.ovirt.engine.core.bll.CpuFlagsManagerHandler] (ServerService Thread Pool -- 43) [] Start initializing dictionaries
2015-10-21 15:19:12,328 INFO [org.ovirt.engine.core.bll.CpuFlagsManagerHandler] (ServerService Thread Pool -- 43) [] Finished initializing dictionaries
...
2015-10-21 15:35:08,852 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-61) [] Timer update runtime info failed. Exception::
org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:80) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:573) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:637) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:666) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:706) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.executeCallInternal(PostgresDbEngineDialect.java:154) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.PostgresDbEngineDialect$PostgresSimpleJdbcCall.doExecute(PostgresDbEngineDialect.java:120) [dal.jar:]
        at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:181) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:147) [dal.jar:]
        at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeReadList(SimpleJdbcCallsHandler.java:109) [dal.jar:]
        at org.ovirt.engine.core.dao.VdsDaoImpl.get(VdsDaoImpl.java:53) [dal.jar:]
        at org.ovirt.engine.core.dao.VdsDaoImpl.get(VdsDaoImpl.java:47) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCachedVds(VdsManager.java:278) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:208) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor133.invoke(Unknown Source) [:1.7.0_85]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_85]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_85]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
        at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:154)
        at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source) [:1.7.0_85]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_85]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_85]
        at org.jboss.weld.util.reflection.Reflections.invokeAndUnwrap(Reflections.java:414) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.bean.builtin.CallableMethodHandler.invoke(CallableMethodHandler.java:42) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.bean.proxy.EnterpriseTargetBeanInstance.invoke(EnterpriseTargetBeanInstance.java:56) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:100) [weld-core-impl-2.2.6.Final.jar:2014-10-03 10:05]
        at org.jboss.weldx.sql.CommonDataSource$DataSource$Wrapper$1587847696$Proxy$_$$_Weld$Proxy$.getConnection(Unknown Source)
        at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111) [spring-jdbc.jar:3.1.1.RELEASE]
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77) [spring-jdbc.jar:3.1.1.RELEASE]
        ... 20 more
Caused by: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.getManagedConnection(AbstractConnectionManager.java:371)
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.getManagedConnection(TxConnectionManagerImpl.java:421)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:515)
        at org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:146)
        ... 30 more
2015-10-21 15:35:11,887 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-65) [] Timer update runtime info failed. Exception:: org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLException: javax.resource.ResourceException: IJ000451: The connection manager is shutdown: java:/ENGINEDataSource

On Thu, Oct 22, 2015 at 2:15 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ### Please input VDSM certificate chain that matches certificate request, top is issuer
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ###
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ### type '--=451b80dc-996f-432e-9e4f-2b29ef6d1141=--' in own line to mark end, '--=451b80dc-996f-ABORT-9e4f-2b29ef6d1141=--' aborts
2015-10-21 17:36:33 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/ovirt-host-common/vdsm/pki.py", line 319, in _misc
    '\n\nPlease input VDSM certificate chain that '
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/otopi/dialog/machine.py", line 207, in queryMultiString
    v = self._readline()
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/dialog.py", line 263, in _readline
    raise IOError(_('End of file'))
IOError: End of file
2015-10-21 17:36:33 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Misc configuration': End of file
2015-10-21 17:36:33 DEBUG otopi.transaction transaction.abort:134 aborting 'Yum Transaction'
2015-10-21 17:36:33 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:95 Yum Performing yum transaction rollback
Loaded plugins: fastestmirror, langpacks
The issue seems to be there: we have an input request on host-deploy to have somebody explicitly trust the VDSM cert chain but of course, being an automated process, nobody will respond and so it failed. Did you manually change the engine cert or some other CA cert?
No.
The only thing is that I first ran hosted-engine --deploy without putting the hostname of the engine inside /etc/hosts of the hypervisor and it failed (see my first mail of the thread), I think without doing anything (at least at the engine VM level; I don't know if it created a cert...), but generating an answer file.
And then I ran, as you suggested (with the warning you noted):
hosted-engine --deploy --config-append=answer_file

Inside the log of the first run (ovirt-hosted-engine-setup-20151021151938-j4hy5g.log) I see:

2015-10-21 15:20:13 DEBUG otopi.plugins.ovirt_hosted_engine_setup.pki.vdsmpki plugin.execute:936 execute-output: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/server-cert.pem') stdout:
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number: 1 (0x1)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=EN, L=Test, O=Test, CN=TestCA
        Validity
            Not Before: Oct 21 13:20:13 2015 GMT
            Not After : Oct 20 13:20:13 2018 GMT
        Subject: C=EN, L=Test, O=Test, CN=Test
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (1024 bit)
                Modulus:
                    00:bd:f8:d4:a0:87:9e:20:7f:71:12:8d:8e:90:e0:
...

Inside the run with the answer file (ovirt-hosted-engine-setup-20151021170822-p1iv3y.log) I see:

2015-10-21 17:08:22 DEBUG otopi.plugins.ovirt_hosted_engine_setup.pki.vdsmpki plugin.execute:936 execute-output: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/server-cert.pem') stdout:
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number: 1 (0x1)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=EN, L=Test, O=Test, CN=TestCA
        Validity
            Not Before: Oct 21 13:20:13 2015 GMT
            Not After : Oct 20 13:20:13 2018 GMT
        Subject: C=EN, L=Test, O=Test, CN=Test
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (1024 bit)
                Modulus:
                    00:bd:f8:d4:a0:87:9e:20:7f:71:12:8d:8e:90:e0:

Any particular file or section in the log files to cross-check? I can also start from scratch if needed... just to be sure I don't run into the same problem, it would be useful to find it beforehand...
Thanks,
Gianluca

On Thu, Oct 22, 2015 at 2:29 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Oct 22, 2015 at 2:15 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ### Please input VDSM certificate chain that matches certificate request, top is issuer
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ###
2015-10-21 17:36:33 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:219 DIALOG:SEND ### type '--=451b80dc-996f-432e-9e4f-2b29ef6d1141=--' in own line to mark end, '--=451b80dc-996f-ABORT-9e4f-2b29ef6d1141=--' aborts
2015-10-21 17:36:33 DEBUG otopi.context context._executeMethod:156 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/context.py", line 146, in _executeMethod
    method['method']()
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/ovirt-host-common/vdsm/pki.py", line 319, in _misc
    '\n\nPlease input VDSM certificate chain that '
  File "/tmp/ovirt-xP0lq4KMou/otopi-plugins/otopi/dialog/machine.py", line 207, in queryMultiString
    v = self._readline()
  File "/tmp/ovirt-xP0lq4KMou/pythonlib/otopi/dialog.py", line 263, in _readline
    raise IOError(_('End of file'))
IOError: End of file
2015-10-21 17:36:33 ERROR otopi.context context._executeMethod:165 Failed to execute stage 'Misc configuration': End of file
2015-10-21 17:36:33 DEBUG otopi.transaction transaction.abort:134 aborting 'Yum Transaction'
2015-10-21 17:36:33 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:95 Yum Performing yum transaction rollback
Loaded plugins: fastestmirror, langpacks
The issue seems to be there: we have an input request on host-deploy to have somebody explicitly trust the VDSM cert chain but of course, being an automated process, nobody will respond and so it failed. Did you manually change the engine cert or some other CA cert?
No.
The only thing is that I first ran hosted-engine --deploy without putting the hostname of the engine inside /etc/hosts of the hypervisor and it failed (see my first mail of the thread), I think without doing anything (at least at the engine VM level; I don't know if it created a cert...), but generating an answer file.
And then I ran, as you suggested (with the warning you noted):

hosted-engine --deploy --config-append=answer_file
Inside log of first run (ovirt-hosted-engine-setup-20151021151938-j4hy5g.log) I see
2015-10-21 15:20:13 DEBUG otopi.plugins.ovirt_hosted_engine_setup.pki.vdsmpki plugin.execute:936 execute-output: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/server-cert.pem') stdout:
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number: 1 (0x1)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=EN, L=Test, O=Test, CN=TestCA
        Validity
            Not Before: Oct 21 13:20:13 2015 GMT
            Not After : Oct 20 13:20:13 2018 GMT
        Subject: C=EN, L=Test, O=Test, CN=Test
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (1024 bit)
                Modulus:
                    00:bd:f8:d4:a0:87:9e:20:7f:71:12:8d:8e:90:e0:
...
Inside the run with the answer file (ovirt-hosted-engine-setup-20151021170822-p1iv3y.log) I see

2015-10-21 17:08:22 DEBUG otopi.plugins.ovirt_hosted_engine_setup.pki.vdsmpki plugin.execute:936 execute-output: ('/bin/openssl', 'x509', '-noout', '-text', '-in', '/etc/pki/vdsm/libvirt-spice/server-cert.pem') stdout:
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number: 1 (0x1)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=EN, L=Test, O=Test, CN=TestCA
        Validity
            Not Before: Oct 21 13:20:13 2015 GMT
            Not After : Oct 20 13:20:13 2018 GMT
        Subject: C=EN, L=Test, O=Test, CN=Test
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (1024 bit)
                Modulus:
                    00:bd:f8:d4:a0:87:9e:20:7f:71:12:8d:8e:90:e0:
Any particular file or section in log files to cross check? I can also start from scratch in case.... just to be sure that I don't get into same problem, so that it can be useful to find it before...
I suspect that host-deploy fails because you have in place a leftover VDSM cert from the previous attempt, which is still signed by the engine of that previous attempt and so fails to match this new engine: on the second attempt hosted-engine-setup deployed the engine appliance again, creating a new instance with different certs. You could try to run on the host:

/bin/rm /etc/vdsm/vdsm.conf
/bin/rm /etc/pki/vdsm/*/*.pem
/bin/rm /etc/pki/CA/cacert.pem
/bin/rm /etc/pki/libvirt/*.pem
/bin/rm /etc/pki/libvirt/private/*.pem
vdsm-tool configure --force
systemctl restart vdsmd

then try to redeploy the host from the web-ui. The hosted-engine configuration should be kept, so it should work. To be sure, simply reboot the host: if everything is fine the HA agent should restart your engine VM.
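Before wiping anything, a quick way to cross-check the suspected mismatch is to compare the issuer of the VDSM cert with the CA cert currently on the host (a sketch; /etc/pki/vdsm/certs/vdsmcert.pem is the usual VDSM cert location, adjust if your layout differs):

openssl x509 -noout -issuer -in /etc/pki/vdsm/certs/vdsmcert.pem    # who signed the VDSM cert
openssl x509 -noout -subject -in /etc/pki/CA/cacert.pem             # the CA copy the host currently has
openssl verify -CAfile /etc/pki/CA/cacert.pem /etc/pki/vdsm/certs/vdsmcert.pem   # chain check against that CA copy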
Thanks, Gianluca

On Thu, Oct 22, 2015 at 3:01 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
Any particular file or section in log files to cross check? I can also start from scratch in case.... just to be sure that I don't get into same problem, so that it can be useful to find it before...
I suspect that host-deploy fails because you have in place a leftover VDSM cert from the previous attempt, which is still signed by the engine of that previous attempt and so fails to match this new engine: on the second attempt hosted-engine-setup deployed the engine appliance again, creating a new instance with different certs.
I decided to restart clean and in fact all went well. Last lines of output of "hosted-engine --deploy":

...
[ INFO ] Connecting to the Engine
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] The VDSM Host is now operational
[ INFO ] Saving hosted-engine configuration on the shared storage domain
[ INFO ] Shutting down the engine VM
[ INFO ] Enabling and starting HA services
Hosted Engine successfully set up
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20151022160359.conf'
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

The engine is up, the admin web portal is accessible, and the host is reported as up.

I expected storage to be configured inside the admin web portal, but apparently I don't see anything already configured, and I also don't see the sh engine VM listed... is that correct?

Filesystem layout at this time on the hypervisor is this:

[root@ovc71 tmp]# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/centos-root               27G  2.7G   24G  11% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G  4.0K  3.9G   1% /dev/shm
tmpfs                                3.9G  8.7M  3.9G   1% /run
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                            497M  130M  368M  27% /boot
/dev/mapper/OVIRT_DOMAIN-ISO_DOMAIN  5.0G   33M  5.0G   1% /ISO_DOMAIN
/dev/mapper/OVIRT_DOMAIN-NFS_DOMAIN   45G  2.7G   43G   6% /NFS_DOMAIN
ovc71.localdomain.local:/NFS_DOMAIN   45G  2.7G   43G   6% /rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN
/dev/loop1                           2.0G  3.1M  1.9G   1% /rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmpUEso__Q

BTW: what is the 2GB file system on the loop device?

I configured the storage domain part as NFS, pointing to ovc71.localdomain.local:/NFS_DOMAIN

Reading the page at http://www.ovirt.org/Features/Self_Hosted_Engine it is not clear to me what to do next if I want, for example, to keep a single host with its sh engine as a replacement for what before was all-in-one... and start creating VMs....

Output of my web admin page:
https://drive.google.com/file/d/0BwoPbcrMv8mva3UyMTFDbHdsN3c/view?usp=sharin...

Also, I didn't restart my hypervisor yet. What should be the shutdown procedure? Simply run shutdown on the hypervisor, or 1) shutdown engine VM 2) shutdown hypervisor?

Gianluca

On Thu, Oct 22, 2015 at 4:28 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Oct 22, 2015 at 3:01 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
Any particular file or section in log files to cross check? I can also start from scratch in case.... just to be sure that I don't get into same problem, so that it can be useful to find it before...
I suspect that host-deploy fails because you have in place a leftover VDSM cert from the previous attempt, which is still signed by the engine of that previous attempt and so fails to match this new engine: on the second attempt hosted-engine-setup deployed the engine appliance again, creating a new instance with different certs.
I decided to restart clean and in fact all went well. Last lines of output of "hosted-engine --deploy"
...
[ INFO ] Connecting to the Engine
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] The VDSM Host is now operational
[ INFO ] Saving hosted-engine configuration on the shared storage domain
[ INFO ] Shutting down the engine VM
[ INFO ] Enabling and starting HA services
Hosted Engine successfully set up
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20151022160359.conf'
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
The engine is up, the admin web portal is accessible, and the host is reported as up.
I expected storage to be configured inside the admin web portal, but apparently I don't see anything already configured, and I also don't see the sh engine VM listed... is that correct?
No, we have an open bug on that: https://bugzilla.redhat.com/show_bug.cgi?id=1269768
You can try to manually import it in the meantime. But the hosted-engine storage domain can only contain the engine VM, so you still need to add a regular storage domain for other VMs.
Filesystem layout at this time on the hypervisor is this:

[root@ovc71 tmp]# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/centos-root               27G  2.7G   24G  11% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G  4.0K  3.9G   1% /dev/shm
tmpfs                                3.9G  8.7M  3.9G   1% /run
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                            497M  130M  368M  27% /boot
/dev/mapper/OVIRT_DOMAIN-ISO_DOMAIN  5.0G   33M  5.0G   1% /ISO_DOMAIN
/dev/mapper/OVIRT_DOMAIN-NFS_DOMAIN   45G  2.7G   43G   6% /NFS_DOMAIN
ovc71.localdomain.local:/NFS_DOMAIN   45G  2.7G   43G   6% /rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN
/dev/loop1                           2.0G  3.1M  1.9G   1% /rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmpUEso__Q
BTW: what is the 2Gb file system on loop device?
It was used by hosted-engine-setup as a fake storage pool to bootstrap the hosted-engine storage domain. It shouldn't be there at the end. Could you please attach hosted-engine-setup logs to let me check why it's still there?
I configured the storage domain part as NFS, pointing to ovc71.localdomain.local:/NFS_DOMAIN Reading page at http://www.ovirt.org/Features/Self_Hosted_Engine
it is not clear to me what to do next if I want, for example, to keep a single host with its sh engine as a replacement for what before was all-in-one... and start creating VMs....
You have to setup your first regular data domain for other VMs: you can add another NFS one.
Output of my web admin page:
https://drive.google.com/file/d/0BwoPbcrMv8mva3UyMTFDbHdsN3c/view?usp=sharin...
Also, I didn't restart my hypervisor yet. What should be the shutdown procedure? Simply run shutdown on the hypervisor, or 1) shutdown engine VM 2) shutdown hypervisor?
Put the host in global maintenance (otherwise the engine VM will be restarted)
Shutdown the engine VM
Shutdown the host
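In command form, the sequence above is roughly (a sketch, run on the host; the hosted-engine subcommands are the standard ones):

hosted-engine --set-maintenance --mode=global   # stop HA monitoring so the engine VM is not restarted
hosted-engine --vm-shutdown                     # cleanly shut down the engine VM (or ssh in and run shutdown)
shutdown -h now                                 # finally shut down the host itself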
Gianluca

On Thu, Oct 22, 2015 at 5:08 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
The engine is up, the admin web portal is accessible, and the host is reported as up.
I expected storage to be configured inside the admin web portal, but apparently I don't see anything already configured, and I also don't see the sh engine VM listed... is that correct?
No, we have an open bug on that: https://bugzilla.redhat.com/show_bug.cgi?id=1269768
You can try to manually import it in the mean time.
Ok, I'll try. So when the bug is solved, if I have a problem with the engine and for example I'm not able to connect to it through ssh and/or the web admin console, what other means would I have to connect to its console and check (e.g. if it is in kernel panic for any reason)? During setup I was offered to connect via remote-viewer with a temporary password; I imagine this way is not usable after install, correct?
But the hosted-engine storage domain can only contain the engine VM, so you still need to add a regular storage domain for other VMs.
Ah, ok. I thought that the initial storage domain would have become the first storage domain for general VM purposes too... So actually, if it is on a dedicated filesystem/device, it could be small in size, say 5-10GB if I use the appliance, correct?
BTW: what is the 2Gb file system on loop device?
It was used by hosted-engine-setup as a fake storage pool to bootstrap the hosted-engine storage domain. It shouldn't be there at the end. Could you please attach hosted-engine-setup logs to let me check why it's still there?
here it is: https://drive.google.com/file/d/0BwoPbcrMv8mvRFFKSmR0REN3Qkk/view?usp=sharin...
I configured the storage domain part as NFS, pointing to ovc71.localdomain.local:/NFS_DOMAIN Reading page at http://www.ovirt.org/Features/Self_Hosted_Engine
it is not clear to me what to do next if I want, for example, to keep a single host with its sh engine as a replacement for what before was all-in-one... and start creating VMs....
You have to setup your first regular data domain for other VMs: you can add another NFS one.
OK. In case I want to set up a single host with self hosted engine, could I configure on the hypervisor:
a) one NFS share for the sh engine
b) one NFS share for the ISO DOMAIN
c) a local filesystem to be used to then create a local POSIX compliant FS storage domain
and work this way as a replacement for all-in-one?
Put the host in global maintenance (otherwise the engine VM will be restarted)
Shutdown the engine VM
Shutdown the host
Ok. And for starting all again, is this correct:
a) power on the hypervisor
b) hosted-engine --set-maintenance --mode=none
other steps required?

On Thu, Oct 22, 2015 at 5:28 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Oct 22, 2015 at 5:08 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
The engine is up, the admin web portal is accessible, and the host is reported as up.
I expected storage to be configured inside the admin web portal, but apparently I don't see anything already configured, and I also don't see the sh engine VM listed... is that correct?
No, we have an open bug on that: https://bugzilla.redhat.com/show_bug.cgi?id=1269768
You can try to manually import it in the mean time.
Ok, I'll try. So when the bug is solved, if I have a problem with the engine and for example I'm not able to connect to it through ssh and/or the web admin console, what other means would I have to connect to its console and check (e.g. if it is in kernel panic for any reason)?
Ehm, if the engine VM is not responding you cannot use the engine to check it. But you have an HA agent that monitors it for you and can restart it if needed. You can also use ssh with the root password you set via cloud-init.
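For a health check from the host side, without going through the engine itself, the HA layer can also be queried directly; for example:

hosted-engine --vm-status   # engine VM state, host score and maintenance flags as seen by the HA agent

(The exact output format may vary between releases.)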
During setup I was offered to connect via remote-viewer with a temporary password; I imagine this way is not usable after install, correct?
You can use hosted-engine --add-console-password to set another temporary password.
But the hosted-engine storage domain can only contain the engine VM, so you still need to add a regular storage domain for other VMs.
Ah, ok. I thought that the initial storage domain would have become the first storage domain for general VM purposes too... So actually, if it is on a dedicated filesystem/device, it could be small in size, say 5-10GB if I use the appliance, correct?
20GB is the minimum recommended value, plus you need additional space for ancillary storage domain data structures, so I'd avoid allocating less than 25GB.
BTW: what is the 2Gb file system on loop device?
It was used by hosted-engine-setup as a fake storage pool to bootstrap the hosted-engine storage domain. It shouldn't be there at the end. Could you please attach hosted-engine-setup logs to let me check why it's still there?
here it is:
https://drive.google.com/file/d/0BwoPbcrMv8mvRFFKSmR0REN3Qkk/view?usp=sharin...
Thanks
I configured the storage domain part as NFS, pointing to ovc71.localdomain.local:/NFS_DOMAIN Reading page at http://www.ovirt.org/Features/Self_Hosted_Engine
it is not clear to me what to do next if I want, for example, to keep a single host with its sh engine as a replacement for what before was all-in-one... and start creating VMs....
You have to setup your first regular data domain for other VMs: you can add another NFS one.
OK. In case I want to set up a single host with self hosted engine, could I configure on the hypervisor:
a) one NFS share for the sh engine
b) one NFS share for the ISO DOMAIN
c) a local filesystem to be used to then create a local POSIX compliant FS storage domain
and work this way as a replacement for all-in-one?
Yes, but c) is just a workaround; using another external NFS share would help a lot if in the future you plan to add or to migrate to a new server.
Put the host in global maintenance (otherwise the engine VM will be restarted)
Shutdown the engine VM
Shutdown the host
Ok. And for starting all again, is this correct:
a) power on the hypervisor
b) hosted-engine --set-maintenance --mode=none
other steps required?
No, that's correct
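For reference, the confirmed restart sequence in command form (a sketch, run on the host after powering it back on):

hosted-engine --set-maintenance --mode=none   # leave global maintenance
hosted-engine --vm-status                     # watch the HA agent bring the engine VM back up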

On Thu, Oct 22, 2015 at 5:38 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
On Thu, Oct 22, 2015 at 5:28 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Oct 22, 2015 at 5:08 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
The engine is up, the admin web portal is accessible, and the host is reported as up.
I expected storage to be configured inside the admin web portal, but apparently I don't see anything already configured, and I also don't see the sh engine VM listed... is that correct?
No, we have an open bug on that: https://bugzilla.redhat.com/show_bug.cgi?id=1269768
You can try to manually import it in the mean time.
Ok, I'll try. So when the bug is solved, if I have a problem with the engine and for example I'm not able to connect to it through ssh and/or the web admin console, what other means would I have to connect to its console and check (e.g. if it is in kernel panic for any reason)?
Ehm, if the engine VM is not responding you cannot use the engine to check it. But you have an HA agent that monitors it for you and can restart it if needed.
You can also use ssh with the root password you set via cloud-init.
During setup I was offered to connect via remote-viewer with a temporary password; I imagine this way is not usable after install, correct?
You can use hosted-engine --add-console-password to set another temporary password.
But the hosted-engine storage domain can only contain the engine VM, so you still need to add a regular storage domain for other VMs.
Ah, ok. I thought that the initial storage domain would have become the first storage domain for general VM purposes too... So actually, if it is on a dedicated filesystem/device, it could be small in size, say 5-10GB if I use the appliance, correct?
20GB is the minimum recommended value, plus you need additional space for ancillary storage domain data structures, so I'd avoid allocating less than 25GB.
BTW: what is the 2Gb file system on loop device?
It was used by hosted-engine-setup as a fake storage pool to bootstrap the hosted-engine storage domain. It shouldn't be there at the end. Could you please attach hosted-engine-setup logs to let me check why it's still there?
here it is:
https://drive.google.com/file/d/0BwoPbcrMv8mvRFFKSmR0REN3Qkk/view?usp=sharin...
Thanks
That ovirt-hosted-engine-setup attempt used /dev/loop0 and it got correctly detached at the end.

2015-10-22 15:51:28 DEBUG otopi.plugins.ovirt_hosted_engine_setup.storage.storage plugin.executeRaw:828 execute: ('/sbin/losetup', '--detach', u'/dev/loop0'), executable='None', cwd='None', env=None
2015-10-22 15:51:28 DEBUG otopi.plugins.ovirt_hosted_engine_setup.storage.storage plugin.executeRaw:878 execute-result: ('/sbin/losetup', '--detach', u'/dev/loop0'), rc=0

So that /dev/loop1 is probably just a leftover of one of your previous attempts. I think you can safely remove it.
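To check for and clean up such a leftover (standard util-linux commands; the device and mount point are the ones from the df output earlier in the thread, double-check the backing file before detaching):

losetup -a                    # list active loop devices with their backing files
umount /rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmpUEso__Q   # unmount the stale mount first
losetup --detach /dev/loop1   # then detach the leftover device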
I configured the storage domain part as NFS, pointing to ovc71.localdomain.local:/NFS_DOMAIN Reading page at http://www.ovirt.org/Features/Self_Hosted_Engine
it is not clear to me what to do next if I want, for example, to keep a single host with its sh engine as a replacement for what before was all-in-one... and start creating VMs....
You have to setup your first regular data domain for other VMs: you can add another NFS one.
OK. In case I want to set up a single host with self hosted engine, could I configure on the hypervisor:
a) one NFS share for the sh engine
b) one NFS share for the ISO DOMAIN
c) a local filesystem to be used to then create a local POSIX compliant FS storage domain
and work this way as a replacement for all-in-one?
Yes, but c) is just a workaround; using another external NFS share would help a lot if in the future you plan to add or to migrate to a new server.
Put the host in global maintenance (otherwise the engine VM will be restarted)
Shutdown the engine VM
Shutdown the host
Ok. And for starting all again, is this correct:
a) power on the hypervisor
b) hosted-engine --set-maintenance --mode=none
other steps required?
No, that's correct

On Thu, Oct 22, 2015 at 5:38 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
In case I want to set up a single host with self hosted engine, could I configure on the hypervisor:
a) one NFS share for the sh engine
b) one NFS share for the ISO DOMAIN
c) a local filesystem to be used to then create a local POSIX compliant FS storage domain
and work this way as a replacement for all-in-one?
Yes, but c) is just a workaround; using another external NFS share would help a lot if in the future you plan to add or to migrate to a new server.
Why do you see this as a workaround, if I plan to have this for example as a personal devel infra with no other hypervisors? I'm thinking of the better performance of going directly local instead of adding the overhead of NFS on top....

Put the host in global maintenance (otherwise the engine VM will be restarted)
Shutdown the engine VM
Shutdown the host

Please note that at some point I had to power off the hypervisor in the previous step, because it was stalled trying to stop two processes: "Watchdog Multiplexing Daemon" and "Shared Storage Lease Manager"

https://drive.google.com/file/d/0BwoPbcrMv8mvTVoyNzhRNGpqN1U/view?usp=sharin...

It was apparently able to stop the "Watchdog Multiplexing Daemon" after some minutes

https://drive.google.com/file/d/0BwoPbcrMv8mvZExNNkw5LVBiXzA/view?usp=sharin...

But there was no way for the Shared Storage Lease Manager, and the screen above is from when I forced a power off yesterday, after global maintenance and a correct shutdown of the sh engine, when the shutdown of the hypervisor stalled.
Ok. And for starting all again, is this correct:
a) power on the hypervisor
b) hosted-engine --set-maintenance --mode=none
other steps required?
No, that's correct
Today after powering on the hypervisor and waiting about 6 minutes I then ran:

[root@ovc71 ~]# ps -ef|grep qemu
root      2104  1985  0 15:41 pts/0    00:00:00 grep --color=auto qemu

--> as expected, no VM in execution

[root@ovc71 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Fri 2015-10-23 15:34:46 CEST; 3min 25s ago
  Process: 1666 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 1745 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─1745 /usr/bin/python /usr/share/vdsm/vdsm
           └─1900 /usr/libexec/ioprocess --read-pipe-fd 56 --write-pipe-fd 55 --max-threads 10 --...

Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client step 1
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 ask_user_info()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client step 1
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 ask_user_info()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 make_client_response()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client step 2
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 parse_server_challenge()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 ask_user_info()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 make_client_response()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client step 3

--> I think it is expected that vdsmd starts anyway, even in global maintenance, is that correct?

But then:

[root@ovc71 ~]# hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 201, in set_global_md_flag
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
ovirt_hosted_engine_ha.lib.exceptions.BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)

What to do next?

On Fri, Oct 23, 2015 at 3:57 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Thu, Oct 22, 2015 at 5:38 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
In case I want to set up a single host with self hosted engine, could I configure on the hypervisor:
a) one NFS share for the sh engine
b) one NFS share for the ISO DOMAIN
c) a local filesystem to be used to then create a local POSIX compliant FS storage domain
and work this way as a replacement for all-in-one?
Yes, but c) is just a workaround; using another external NFS share would help a lot if in the future you plan to add or to migrate to a new server.
Why do you see this as a workaround, if I plan to have this for example as a personal devel infra with no other hypervisors? I'm thinking of the better performance of going directly local instead of adding the overhead of NFS on top....
Just because you are using as shared storage something that is not really shared.
Put the host in global maintenance (otherwise the engine VM will be restarted) Shutdown the engine VM Shutdown the host
Please note that at some point I had to power off the hypervisor in the previous step, because it was stalled trying to stop two processes: "Watchdog Multiplexing Daemon" and "Shared Storage Lease Manager"
https://drive.google.com/file/d/0BwoPbcrMv8mvTVoyNzhRNGpqN1U/view?usp=sharin...
It was apparently able to stop the "Watchdog Multiplexing Daemon" after some minutes
https://drive.google.com/file/d/0BwoPbcrMv8mvZExNNkw5LVBiXzA/view?usp=sharin...
But there was no way for the Shared Storage Lease Manager, and the screen above is from when I forced a power off yesterday, after global maintenance and a correct shutdown of the sh engine, when the shutdown of the hypervisor stalled.
Ok. And for starting all again, is this correct:
a) power on the hypervisor
b) hosted-engine --set-maintenance --mode=none
other steps required?
No, that's correct
Today after powering on hypervisor and waiting about 6 minutes I then ran:
[root@ovc71 ~]# ps -ef|grep qemu
root      2104  1985  0 15:41 pts/0    00:00:00 grep --color=auto qemu
--> as expected no VM in execution
[root@ovc71 ~]# systemctl status vdsmd
vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
   Active: active (running) since Fri 2015-10-23 15:34:46 CEST; 3min 25s ago
  Process: 1666 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 1745 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─1745 /usr/bin/python /usr/share/vdsm/vdsm
           └─1900 /usr/libexec/ioprocess --read-pipe-fd 56 --write-pipe-fd 55 --max-threads 10 --...
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client step 1
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 ask_user_info()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client step 1
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 ask_user_info()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 make_client_response()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client step 2
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 parse_server_challenge()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 ask_user_info()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 make_client_response()
Oct 23 15:34:46 ovc71.localdomain.local python[1745]: DIGEST-MD5 client step 3
--> I think it is expected that vdsmd starts anyway, even in global maintenance, is it correct?
But then:
[root@ovc71 ~]# hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 201, in set_global_md_flag
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
ovirt_hosted_engine_ha.lib.exceptions.BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
What to do next?
Are ovirt-ha-agent and ovirt-ha-broker up and running? Can you please try to restart them via systemd?
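In command form, the checks and restarts being asked for (standard systemd commands; ovirt-ha-broker and ovirt-ha-agent are the services shipped by ovirt-hosted-engine-ha):

systemctl status ovirt-ha-broker ovirt-ha-agent    # are they up?
systemctl restart ovirt-ha-broker                  # the agent talks to the broker, so restart it first
systemctl restart ovirt-ha-agent
journalctl -b -u ovirt-ha-broker -u ovirt-ha-agent # errors since boot from both, if any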

On Fri, Oct 23, 2015 at 4:42 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
Are ovirt-ha-agent and ovirt-ha-broker up and running? Can you please try to restart them via systemd?
In the meantime I found inside the logs that they failed to start.
I found in the broker log the message:

Thread-1730::ERROR::2015-10-22 17:31:47,016::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=f53854cd-8767-4011-9564-36dc36e0a5d1'
Traceback (most recent call last):
...
BackendFailureException: path to storage domain f53854cd-8767-4011-9564-36dc36e0a5d1 not found in /rhev/data-center/mnt

so probably the NFS part was not in place yet when the broker attempted to start? I saw that actually I had now

[root@ovc71 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN
total 0
-rwxr-xr-x. 1 vdsm kvm  0 Oct 23 16:46 __DIRECT_IO_TEST__
drwxr-xr-x. 5 vdsm kvm 47 Oct 22 15:49 f53854cd-8767-4011-9564-36dc36e0a5d1

and I was able to run

systemctl start ovirt-ha-broker.service

and verify it correctly started, and the same for

systemctl start ovirt-ha-agent

After a couple of minutes the sh engine VM was powered on and I was able to access the web admin portal.

But if I try to connect to its console with

[root@ovc71 ovirt-hosted-engine-ha]# hosted-engine --add-console-password
Enter password:
code = 0
message = 'Done'

and then

# remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"

** (remote-viewer:7173): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-Gb5xXSKiKK: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(/usr/bin/remote-viewer:7173): Spice-Warning **: ssl_verify.c:492:openssl_verify: ssl: subject 'C=EN, L=Test, O=Test, CN=Test' verification failed
(/usr/bin/remote-viewer:7173): Spice-Warning **: ssl_verify.c:494:openssl_verify: ssl: verification failed
(remote-viewer:7173): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)

I get an error window with

Unable to connect to the graphic server spice://localhost?tls-port=5900

[root@ovc71 ovirt-hosted-engine-ha]# netstat -tan | grep 5900
tcp        0      0 0.0.0.0:5900            0.0.0.0:*               LISTEN

The qemu command line of the sh engine is:

qemu      4489     1 23 16:41 ?        00:02:35 /usr/libexec/qemu-kvm -name HostedEngine -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Nehalem -m 8192 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 9e654c4a-925c-48ba-9818-6908b7714d3a -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-1.1503.el7.centos.2.8,serial=97F39B57-FA7D-2A47-9E0E-304705DE227D,uuid=9e654c4a-925c-48ba-9818-6908b7714d3a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/HostedEngine.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2015-10-23T14:41:23,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/run/vdsm/storage/f53854cd-8767-4011-9564-36dc36e0a5d1/45ae3a4a-2190-4494-9419-b7c2af8a7aef/52b97c5b-96ae-4efc-b2e0-f56cde243384,if=none,id=drive-virtio-disk0,format=raw,serial=45ae3a4a-2190-4494-9419-b7c2af8a7aef,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:16:6a:b6,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/9e654c4a-925c-48ba-9818-6908b7714d3a.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/9e654c4a-925c-48ba-9818-6908b7714d3a.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -chardev socket,id=charchannel3,path=/var/lib/libvirt/qemu/channels/9e654c4a-925c-48ba-9818-6908b7714d3a.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=4,chardev=charchannel3,id=channel3,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -spice tls-port=5900,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -msg timestamp=on

On Fri, Oct 23, 2015 at 4:56 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Oct 23, 2015 at 4:42 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
Are ovirt-ha-agent and ovirt-ha-broker up and running? Can you please try to restart them via systemd?
In the meantime I found inside the logs that they failed to start.
I found in the broker log the message:

Thread-1730::ERROR::2015-10-22 17:31:47,016::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=nfs3 sd_uuid=f53854cd-8767-4011-9564-36dc36e0a5d1'
Traceback (most recent call last):
...
BackendFailureException: path to storage domain f53854cd-8767-4011-9564-36dc36e0a5d1 not found in /rhev/data-center/mnt
so probably the NFS part was not in place yet when the broker attempted to start? I saw that actually I had now
[root@ovc71 ovirt-hosted-engine-ha]# ll /rhev/data-center/mnt/ovc71.localdomain.local:_NFS__DOMAIN
total 0
-rwxr-xr-x. 1 vdsm kvm  0 Oct 23 16:46 __DIRECT_IO_TEST__
drwxr-xr-x. 5 vdsm kvm 47 Oct 22 15:49 f53854cd-8767-4011-9564-36dc36e0a5d1
and I was able to run
systemctl start ovirt-ha-broker.service

and verify it correctly started, and the same for

systemctl start ovirt-ha-agent
After a couple of minutes the sh engine VM was powered on and I was able to access the web admin portal.
OK, can you please try again the whole reboot procedure just to ensure that it was just a temporary NFS glitch?
But if I try to connect to its console with
[root@ovc71 ovirt-hosted-engine-ha]# hosted-engine --add-console-password Enter password: code = 0 message = 'Done'
and then

# remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=EN, L=Test, O=Test, CN=Test"
** (remote-viewer:7173): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-Gb5xXSKiKK: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(/usr/bin/remote-viewer:7173): Spice-Warning **: ssl_verify.c:492:openssl_verify: ssl: subject 'C=EN, L=Test, O=Test, CN=Test' verification failed
(/usr/bin/remote-viewer:7173): Spice-Warning **: ssl_verify.c:494:openssl_verify: ssl: verification failed
The issue is here: --spice-host-subject="C=EN, L=Test, O=Test, CN=Test". This was just the temporary subject used by hosted-engine-setup during the bootstrap sequence, when your engine was still to come. At the end that cert got replaced by the engine-CA-signed one, so you have to substitute that subject to match the one from your actual setup.
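When in doubt, one way to read the subject the SPICE server is actually presenting is to pull the certificate straight off the TLS port (a generic openssl technique; host and port are the ones from this setup):

echo | openssl s_client -connect ovc71.localdomain.local:5900 2>/dev/null | openssl x509 -noout -subject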
(remote-viewer:7173): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)
I get an error window with Unable to connect to the graphic server spice://localhost?tls-port=5900
[root@ovc71 ovirt-hosted-engine-ha]# netstat -tan | grep 5900
tcp        0      0 0.0.0.0:5900            0.0.0.0:*               LISTEN
the qemu command line of the sh engine is: qemu 4489 1 23 16:41 ? 00:02:35 /usr/libexec/qemu-kvm -name HostedEngine -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Nehalem -m 8192 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 9e654c4a-925c-48ba-9818-6908b7714d3a -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=7-1.1503.el7.centos.2.8,serial=97F39B57-FA7D-2A47-9E0E-304705DE227D,uuid=9e654c4a-925c-48ba-9818-6908b7714d3a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/HostedEngine.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2015-10-23T14:41:23,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/var/run/vdsm/storage/f53854cd-8767-4011-9564-36dc36e0a5d1/45ae3a4a-2190-4494-9419-b7c2af8a7aef/52b97c5b-96ae-4efc-b2e0-f56cde243384,if=none,id=drive-virtio-disk0,format=raw,serial=45ae3a4a-2190-4494-9419-b7c2af8a7aef,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:16:6a:b6,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/9e654c4a-925c-48ba-9818-6908b7714d3a.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/9e654c4a-925c-48ba-9818-6908b7714d3a.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -chardev socket,id=charchannel3,path=/var/lib/libvirt/qemu/channels/9e654c4a-925c-48ba-9818-6908b7714d3a.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=4,chardev=charchannel3,id=channel3,name=org.ovirt.hosted-engine-setup.0 -chardev pty,id=charconsole0 -device virtconsole,chardev=charconsole0,id=console0 -spice tls-port=5900,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -msg timestamp=on

On Fri, Oct 23, 2015 at 5:05 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
OK, can you please try again the whole reboot procedure just to ensure that it was just a temporary NFS glitch?
It seems reproducible. This time I was able to shut down the hypervisor without a manual power off. The only strange thing is that I ran

shutdown -h now

and actually the VM at some point (I was able to see that the watchdog stopped...) booted.... ?

Related lines in messages:

Oct 23 17:33:32 ovc71 systemd: Unmounting RPC Pipe File System...
Oct 23 17:33:32 ovc71 systemd: Stopping Session 11 of user root.
Oct 23 17:33:33 ovc71 systemd: Stopped Session 11 of user root.
Oct 23 17:33:33 ovc71 systemd: Stopping user-0.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice user-0.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping vdsm-dhclient.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice vdsm-dhclient.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping vdsm.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice vdsm.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping Sound Card.
Oct 23 17:33:33 ovc71 systemd: Stopped target Sound Card.
Oct 23 17:33:33 ovc71 systemd: Stopping LVM2 PV scan on device 8:2...
Oct 23 17:33:33 ovc71 systemd: Stopping LVM2 PV scan on device 8:16...
Oct 23 17:33:33 ovc71 systemd: Stopping Dump dmesg to /var/log/dmesg...
Oct 23 17:33:33 ovc71 systemd: Stopped Dump dmesg to /var/log/dmesg.
Oct 23 17:33:33 ovc71 systemd: Stopping Watchdog Multiplexing Daemon...
Oct 23 17:33:33 ovc71 systemd: Stopping Multi-User System.
Oct 23 17:33:33 ovc71 systemd: Stopped target Multi-User System.
Oct 23 17:33:33 ovc71 systemd: Stopping ABRT kernel log watcher...
Oct 23 17:33:33 ovc71 systemd: Stopping Command Scheduler...
Oct 23 17:33:33 ovc71 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="690" x-info="http://www.rsyslog.com"] exiting on signal 15.
Oct 23 17:36:24 ovc71 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="697" x-info="http://www.rsyslog.com"] start
Oct 23 17:36:21 ovc71 journal: Runtime journal is using 8.0M (max 500.0M, leaving 750.0M of free 4.8G, current limit 500.0M).
Oct 23 17:36:21 ovc71 kernel: Initializing cgroup subsys cpuset

Coming back up, with the oVirt processes I see:

[root@ovc71 ~]# systemctl status ovirt-ha-broker
ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled)
   Active: inactive (dead) since Fri 2015-10-23 17:36:25 CEST; 31s ago
  Process: 849 ExecStop=/usr/lib/systemd/systemd-ovirt-ha-broker stop (code=exited, status=0/SUCCESS)
  Process: 723 ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start (code=exited, status=0/SUCCESS)
 Main PID: 844 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/ovirt-ha-broker.service

Oct 23 17:36:24 ovc71.localdomain.local systemd-ovirt-ha-broker[723]: Starting ovirt-ha-broker: [...
Oct 23 17:36:24 ovc71.localdomain.local systemd[1]: Started oVirt Hosted Engine High Availabili...r.
Oct 23 17:36:25 ovc71.localdomain.local systemd-ovirt-ha-broker[849]: Stopping ovirt-ha-broker: [...
Hint: Some lines were ellipsized, use -l to show in full.

And

[root@ovc71 ~]# systemctl status nfs-server
nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled)
   Active: active (exited) since Fri 2015-10-23 17:36:27 CEST; 1min 9s ago
  Process: 1123 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
  Process: 1113 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 1123 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service

Oct 23 17:36:27 ovc71.localdomain.local systemd[1]: Starting NFS server and services...
Oct 23 17:36:27 ovc71.localdomain.local systemd[1]: Started NFS server and services.

So it seems that the broker tries to start and fails (17:36:25) before the NFS server start phase completes (17:36:27)...?

Again, if I then manually start ha-broker and ha-agent, they start OK and I'm able to become operational again with the sh engine up.

The systemd file for the broker is this:

[Unit]
Description=oVirt Hosted Engine High Availability Communications Broker

[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/ovirt-ha-broker
ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start
ExecStop=/usr/lib/systemd/systemd-ovirt-ha-broker stop

[Install]
WantedBy=multi-user.target

Probably inside the [Unit] section I should add After=nfs-server.service, but this should be true only for a sh engine configured with NFS.... so to be done at install/setup time? If you want I can set this change for my environment and verify...
The issue is here: --spice-host-subject="C=EN, L=Test, O=Test, CN=Test". This was just the temporary subject used by hosted-engine-setup during the bootstrap sequence, when your engine was still to come. At the end that cert got replaced by the engine-CA-signed one, so you have to substitute that subject to match the one from your actual setup.
Even using the correct certificate I have the problem.

On the hypervisor:

[root@ovc71 ~]# openssl x509 -in /etc/pki/vdsm/libvirt-spice/ca-cert.pem -text | grep Subject
        Subject: C=US, O=localdomain.local, CN=shengine.localdomain.local.75331
        Subject Public Key Info:
            X509v3 Subject Key Identifier:

On the engine:

[root@shengine ~]# openssl x509 -in /etc/pki/ovirt-engine/ca.pem -text | grep Subject
        Subject: C=US, O=localdomain.local, CN=shengine.localdomain.local.75331
        Subject Public Key Info:
            X509v3 Subject Key Identifier:

but

[root@ovc71 ~]# hosted-engine --add-console-password
Enter password:
code = 0
message = 'Done'

[root@ovc71 ~]# remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=US, O=localdomain.local, CN=shengine.localdomain.local.75331"

** (remote-viewer:4297): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-Gb5xXSKiKK: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(/usr/bin/remote-viewer:4297): Spice-Warning **: ssl_verify.c:492:openssl_verify: ssl: subject 'C=US, O=localdomain.local, CN=shengine.localdomain.local.75331' verification failed
(/usr/bin/remote-viewer:4297): Spice-Warning **: ssl_verify.c:494:openssl_verify: ssl: verification failed
(remote-viewer:4297): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)

and the remote-viewer window with

Unable to connect to the graphic server spice://localhost?tls-port=5900

On Fri, Oct 23, 2015 at 5:55 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
On Fri, Oct 23, 2015 at 5:05 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
OK, can you please try again the whole reboot procedure just to ensure that it was just a temporary NFS glitch?
It seems reproducible.
This time I was able to shut down the hypervisor without a manual power off. The only strange thing is that I ran
shutdown -h now
and actually the VM at some point (I was able to see that the watchdog stopped...) booted.... ?
Related lines in messages:

Oct 23 17:33:32 ovc71 systemd: Unmounting RPC Pipe File System...
Oct 23 17:33:32 ovc71 systemd: Stopping Session 11 of user root.
Oct 23 17:33:33 ovc71 systemd: Stopped Session 11 of user root.
Oct 23 17:33:33 ovc71 systemd: Stopping user-0.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice user-0.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping vdsm-dhclient.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice vdsm-dhclient.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping vdsm.slice.
Oct 23 17:33:33 ovc71 systemd: Removed slice vdsm.slice.
Oct 23 17:33:33 ovc71 systemd: Stopping Sound Card.
Oct 23 17:33:33 ovc71 systemd: Stopped target Sound Card.
Oct 23 17:33:33 ovc71 systemd: Stopping LVM2 PV scan on device 8:2...
Oct 23 17:33:33 ovc71 systemd: Stopping LVM2 PV scan on device 8:16...
Oct 23 17:33:33 ovc71 systemd: Stopping Dump dmesg to /var/log/dmesg...
Oct 23 17:33:33 ovc71 systemd: Stopped Dump dmesg to /var/log/dmesg.
Oct 23 17:33:33 ovc71 systemd: Stopping Watchdog Multiplexing Daemon...
Oct 23 17:33:33 ovc71 systemd: Stopping Multi-User System.
Oct 23 17:33:33 ovc71 systemd: Stopped target Multi-User System.
Oct 23 17:33:33 ovc71 systemd: Stopping ABRT kernel log watcher...
Oct 23 17:33:33 ovc71 systemd: Stopping Command Scheduler...
Oct 23 17:33:33 ovc71 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="690" x-info="http://www.rsyslog.com"] exiting on signal 15.
Oct 23 17:36:24 ovc71 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="697" x-info="http://www.rsyslog.com"] start
Oct 23 17:36:21 ovc71 journal: Runtime journal is using 8.0M (max 500.0M, leaving 750.0M of free 4.8G, current limit 500.0M).
Oct 23 17:36:21 ovc71 kernel: Initializing cgroup subsys cpuset
Coming back up, with the oVirt processes I see:
[root@ovc71 ~]# systemctl status ovirt-ha-broker
ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled)
   Active: inactive (dead) since Fri 2015-10-23 17:36:25 CEST; 31s ago
  Process: 849 ExecStop=/usr/lib/systemd/systemd-ovirt-ha-broker stop (code=exited, status=0/SUCCESS)
  Process: 723 ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start (code=exited, status=0/SUCCESS)
 Main PID: 844 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/ovirt-ha-broker.service
Oct 23 17:36:24 ovc71.localdomain.local systemd-ovirt-ha-broker[723]: Starting ovirt-ha-broker: [...
Oct 23 17:36:24 ovc71.localdomain.local systemd[1]: Started oVirt Hosted Engine High Availabili...r.
Oct 23 17:36:25 ovc71.localdomain.local systemd-ovirt-ha-broker[849]: Stopping ovirt-ha-broker: [...
Hint: Some lines were ellipsized, use -l to show in full.
And

[root@ovc71 ~]# systemctl status nfs-server
nfs-server.service - NFS server and services
   Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled)
   Active: active (exited) since Fri 2015-10-23 17:36:27 CEST; 1min 9s ago
  Process: 1123 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
  Process: 1113 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
 Main PID: 1123 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/nfs-server.service

Oct 23 17:36:27 ovc71.localdomain.local systemd[1]: Starting NFS server and services...
Oct 23 17:36:27 ovc71.localdomain.local systemd[1]: Started NFS server and services.
So it seems that the broker tries to start and fails (17:36:25) before NFS server start phase completes (17:36:27)...?
Again if I then manually start ha-broker and ha-agent, they start ok and I'm able to become operational again with the sh engine up
The systemd file for the broker is this:

[Unit]
Description=oVirt Hosted Engine High Availability Communications Broker

[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/ovirt-ha-broker
ExecStart=/usr/lib/systemd/systemd-ovirt-ha-broker start
ExecStop=/usr/lib/systemd/systemd-ovirt-ha-broker stop

[Install]
WantedBy=multi-user.target
Probably inside the [Unit] section I should add After=nfs-server.service
Ok, I understood. You are right: the broker was failing because the NFS storage was not ready, since it was served in loopback and there isn't any explicit service dependency on that. We are not imposing such a dependency because an NFS shared domain is generally expected to be served from an external system, while a loopback NFS is just a degenerate case. Simply fix it manually.
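One way to apply the manual fix without editing the packaged unit file is a systemd drop-in, which survives package updates (a sketch for this loopback-NFS case; the unit names are the ones shown above):

mkdir -p /etc/systemd/system/ovirt-ha-broker.service.d
cat > /etc/systemd/system/ovirt-ha-broker.service.d/nfs-loopback.conf <<'EOF'
[Unit]
# Only needed when the hosted-engine storage domain is served
# by this same host over loopback NFS.
Requires=nfs-server.service
After=nfs-server.service
EOF
systemctl daemon-reload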
but this should be true only for sh engine configured with NFS.... so to be done at install/setup time?
If you want I can set this change for my environment and verify...
The issue is here: --spice-host-subject="C=EN, L=Test, O=Test, CN=Test". This was just the temporary subject used by hosted-engine-setup during the bootstrap sequence, when your engine was still to come. At the end that cert got replaced by the engine-CA-signed one, so you have to substitute that subject to match the one from your actual setup.
Even using the correct certificate I have the problem. On the hypervisor:
[root@ovc71 ~]# openssl x509 -in /etc/pki/vdsm/libvirt-spice/ca-cert.pem -text | grep Subject
        Subject: C=US, O=localdomain.local, CN=shengine.localdomain.local.75331
        Subject Public Key Info:
            X509v3 Subject Key Identifier:
On the engine:

[root@shengine ~]# openssl x509 -in /etc/pki/ovirt-engine/ca.pem -text | grep Subject
        Subject: C=US, O=localdomain.local, CN=shengine.localdomain.local.75331
        Subject Public Key Info:
            X509v3 Subject Key Identifier:
but
[root@ovc71 ~]# hosted-engine --add-console-password
Enter password:
code = 0
message = 'Done'
[root@ovc71 ~]# remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://localhost?tls-port=5900 --spice-host-subject="C=US, O=localdomain.local, CN=shengine.localdomain.local.75331"
it should be:

remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://ovc71.localdomain.local?tls-port=5900 --spice-host-subject="C=US, O=localdomain.local, CN=ovc71.localdomain.local"
** (remote-viewer:4297): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-Gb5xXSKiKK: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(/usr/bin/remote-viewer:4297): Spice-Warning **: ssl_verify.c:492:openssl_verify: ssl: subject 'C=US, O=localdomain.local, CN=shengine.localdomain.local.75331' verification failed
(/usr/bin/remote-viewer:4297): Spice-Warning **: ssl_verify.c:494:openssl_verify: ssl: verification failed
(remote-viewer:4297): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)
and the remote-viewer window with
Unable to connect to the graphic server spice://localhost?tls-port=5900

On Fri, Oct 23, 2015 at 6:10 PM, Simone Tiraboschi <stirabos@redhat.com> wrote:
Probably inside the [Unit] section I should add After=nfs-server.service
Ok, I understood. You are right: the broker was failing because the NFS storage was not ready, since it was served in loopback and there isn't any explicit service dependency on that.
We are not imposing such a dependency because an NFS shared domain is generally expected to be served from an external system, while a loopback NFS is just a degenerate case. Simply fix it manually.
OK, understood. Done, and the fix works as expected.
it should be: remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://ovc71.localdomain.local?tls-port=5900 --spice-host-subject="C=US, O=localdomain.local, CN=ovc71.localdomain.local"
same error...

[root@ovc71 ~]# remote-viewer --spice-ca-file=/etc/pki/vdsm/libvirt-spice/ca-cert.pem spice://ovc71.localdomain.local?tls-port=5900 --spice-host-subject="C=US, O=localdomain.local, CN=ovc71.localdomain.local"

** (remote-viewer:4788): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-Gb5xXSKiKK: Connection refused
GLib-GIO-Message: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
(/usr/bin/remote-viewer:4788): Spice-Warning **: ssl_verify.c:492:openssl_verify: ssl: subject 'C=US, O=localdomain.local, CN=ovc71.localdomain.local' verification failed
(/usr/bin/remote-viewer:4788): Spice-Warning **: ssl_verify.c:494:openssl_verify: ssl: verification failed
(remote-viewer:4788): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)

even if I copy the /etc/pki/vdsm/libvirt-spice/ca-cert.pem from the hypervisor to my pc in /tmp and run:

[g.cecchi@ope46 ~]$ remote-viewer --spice-ca-file=/tmp/ca-cert.pem spice://ovc71.localdomain.local?tls-port=5900 --spice-host-subject="C=US, O=localdomain.local, CN=ovc71.localdomain.local"

(/usr/bin/remote-viewer:8915): Spice-Warning **: ssl_verify.c:493:openssl_verify: ssl: subject 'C=US, O=localdomain.local, CN=ovc71.localdomain.local' verification failed
(/usr/bin/remote-viewer:8915): Spice-Warning **: ssl_verify.c:495:openssl_verify: ssl: verification failed
(remote-viewer:8915): GSpice-WARNING **: main-1:0: SSL_connect: error:00000001:lib(0):func(0):reason(1)