Progress: I managed to get a host set up, and then a VM set up on the host,
but the VM won't start. I'm looking for some help getting the VM to
start. Details below...
I was able to work around the "VDSM host was found in a failed state" issue
(I think) by doing a host reinstall from the engine web GUI, after
specifying the root password for the host. That is, I switched to password
authentication instead of the SSH public key authentication that was
originally configured. After the host reinstall completed, I was left with
what appears to be an all-in-one setup, with the engine and a host running
on the same server. Maybe this would have worked automatically, without the
reinstall step, if I had oVirt SSH keys set up for root as expected?
Question: Any hints on how to debug the host install's SSH public key auth
not working? Which user is attempting to SSH from the engine to the host,
and with what private key, to complete the setup? I see an ovirt-engine
public key that was added to /root/.ssh/authorized_keys. Is this the public
key that's expected to work with the automated all-in-one host setup? If
so, where's the corresponding private key, so I might attempt to debug
further?
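For what it's worth, here's roughly how I'd try to verify the key pair by
hand. I'm assuming the engine's deploy key lives at
/etc/pki/ovirt-engine/keys/engine_id_rsa - I haven't confirmed that path,
so treat this as a sketch:
# ssh-keygen -y -f /etc/pki/ovirt-engine/keys/engine_id_rsa   # print public half; compare with authorized_keys
# cat /root/.ssh/authorized_keys
# ssh -v -i /etc/pki/ovirt-engine/keys/engine_id_rsa root@<host.fqdn> true   # verbose: shows which key is offered/accepted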
After adding a virtual local disk to the new host and setting up a new VM
through the web interface, I attempted to start the VM but got this
permission-denied error in engine.log:
2015-12-29 12:01:47,814 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-3) [] Correlation ID: null, Call Stack: null, Custom
Event ID: -1, Message: VM nycovirttest1 is down with error. Exit message:
internal error: process exited while connecting to monitor:
2015-12-29T17:01:45.424728Z qemu-kvm: -drive
file=/rhev/data-center/62bbbc20-00c6-46b3-8ac9-2677a4da69e4/954ebee1-521b-454d-b54c-db9516f36cff/images/e87e1633-7afc-4835-931e-c444e1cf72f5/f6dc9f53-b373-4c96-9015-25b316dcbf0a,if=none,id=drive-virtio-disk0,format=raw,serial=e87e1633-7afc-4835-931e-c444e1cf72f5,cache=none,werror=stop,rerror=stop,aio=threads:
Could not open
'/rhev/data-center/62bbbc20-00c6-46b3-8ac9-2677a4da69e4/954ebee1-521b-454d-b54c-db9516f36cff/images/e87e1633-7afc-4835-931e-c444e1cf72f5/f6dc9f53-b373-4c96-9015-25b316dcbf0a':
Permission denied
Question: How can I debug this VM startup permission-denied error further?
Is there a command line I can run, similar to what the oVirt engine is
attempting, that would give me more detail? Something I can strace?
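In case it helps, this is what I was planning to try next (assuming libvirt
names the per-VM log after the VM - that part is a guess):
# cat /var/log/libvirt/qemu/nycovirttest1.log   # qemu's own stderr, usually more detail than engine.log
# strace -f -p $(pidof libvirtd) -e trace=open,openat   # watch file opens while clicking Run in the GUI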
The vdsm user can read the above-referenced image just fine, so I think
it's something more than simple filesystem permissions:
$ id
uid=36(vdsm) gid=36(kvm) groups=36(kvm),107(qemu),179(sanlock)
$ file
/rhev/data-center/62bbbc20-00c6-46b3-8ac9-2677a4da69e4/954ebee1-521b-454d-b54c-db9516f36cff/images/e87e1633-7afc-4835-931e-c444e1cf72f5/f6dc9f53-b373-4c96-9015-25b316dcbf0a
/rhev/data-center/62bbbc20-00c6-46b3-8ac9-2677a4da69e4/954ebee1-521b-454d-b54c-db9516f36cff/images/e87e1633-7afc-4835-931e-c444e1cf72f5/f6dc9f53-b373-4c96-9015-25b316dcbf0a:
data
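One more data point I plan to gather: qemu-kvm itself normally runs as the
qemu user (not vdsm), and every parent directory in the path needs execute
permission, so maybe the difference shows up when testing as qemu or
walking the path. A sketch:
# IMG=/rhev/data-center/62bbbc20-00c6-46b3-8ac9-2677a4da69e4/954ebee1-521b-454d-b54c-db9516f36cff/images/e87e1633-7afc-4835-931e-c444e1cf72f5/f6dc9f53-b373-4c96-9015-25b316dcbf0a
# namei -l $IMG   # shows owner/mode of every path component
# sudo -u qemu head -c 1M $IMG >/dev/null && echo "qemu can read it"
# getenforce   # double-check SELinux really is off on this host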
Thanks for your help,
-Matthew
On Mon, Dec 28, 2015 at 12:46 PM, Matthew Bohnsack <bohnsack(a)gmail.com>
wrote:
Hello,
I am attempting to build a fairly basic proof-of-concept oVirt 3.6 machine
with the engine, host, and guests all on a single physical CentOS 7.2 box
with local storage (to start), but am running into issues where
engine-setup produces the following error:
[ ERROR ] The VDSM host was found in a failed state. Please check engine
and bootstrap installation logs.
The engine web console seems to be working fine after this - I can log in
and click around - but there are no host CPU/storage resources available,
as expected. From the logs shown below, there seems to be an issue with the
vdsm installation process being unable to contact the engine host during
installation. Any ideas what's going wrong and/or what I can do to debug
things further so I can move beyond this issue?
Thanks,
-Matthew
Steps I took to install and diagnose the problem:
1. Installed the system with our configuration management system.
2. Deleted pre-existing users with UID 36 (needed by the vdsm user) and UID
108 (needed by the ovirt user).
3. Ensured that /etc/sudoers contained the line "#includedir
/etc/sudoers.d" so that /etc/sudoers.d/50_vdsm will take effect.
4. Made directories for isos and images:
# mkdir /state/partition1/images/; chmod 777 /state/partition1/images/
# mkdir /state/partition1/iso/; chmod 777 /state/partition1/iso/
5. Ensured selinux was disabled and no firewall rules were installed.
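(Verified along these lines:)
# getenforce   # expect Disabled (or at least Permissive)
# iptables -L -n   # expect empty/ACCEPT-all chains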
6. Installed RPMs:
# yum -y install
http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
# yum -y install ovirt-engine ovirt-engine-setup-plugin-allinone
# rpm -qa | grep ovirt-release36
ovirt-release36-002-2.noarch
7. Installed engine with an all-in-one configuration (Configure VDSM on
this host? Yes):
# cat /root/ovirt-engine.ans
# action=setup
[environment:default]
OVESETUP_DIALOG/confirmSettings=bool:True
OVESETUP_CONFIG/applicationMode=str:virt
OVESETUP_CONFIG/remoteEngineSetupStyle=none:None
OVESETUP_CONFIG/sanWipeAfterDelete=bool:False
OVESETUP_CONFIG/storageIsLocal=bool:False
OVESETUP_CONFIG/firewallManager=none:None
OVESETUP_CONFIG/remoteEngineHostRootPassword=none:None
OVESETUP_CONFIG/firewallChangesReview=none:None
OVESETUP_CONFIG/updateFirewall=bool:False
OVESETUP_CONFIG/remoteEngineHostSshPort=none:None
OVESETUP_CONFIG/fqdn=<...host.fqdn...>
OVESETUP_CONFIG/storageType=none:None
OSETUP_RPMDISTRO/requireRollback=none:None
OSETUP_RPMDISTRO/enableUpgrade=none:None
OVESETUP_DB/secured=bool:False
OVESETUP_DB/host=str:localhost
OVESETUP_DB/user=str:engine
OVESETUP_DB/dumper=str:pg_custom
OVESETUP_DB/database=str:engine
OVESETUP_DB/fixDbViolations=none:None
OVESETUP_DB/port=int:5432
OVESETUP_DB/filter=none:None
OVESETUP_DB/restoreJobs=int:2
OVESETUP_DB/securedHostValidation=bool:False
OVESETUP_ENGINE_CORE/enable=bool:True
OVESETUP_CORE/engineStop=none:None
OVESETUP_SYSTEM/memCheckEnabled=bool:True
OVESETUP_SYSTEM/nfsConfigEnabled=bool:True
OVESETUP_PKI/organization=str:<...dn...>
OVESETUP_PKI/renew=none:None
OVESETUP_CONFIG/isoDomainName=str:ISO_DOMAIN
OVESETUP_CONFIG/engineHeapMax=str:7975M
OVESETUP_CONFIG/adminPassword=str:<...password...>
OVESETUP_CONFIG/isoDomainACL=str:*(rw)
OVESETUP_CONFIG/isoDomainMountPoint=str:/state/partition1/iso
OVESETUP_CONFIG/engineDbBackupDir=str:/var/lib/ovirt-engine/backups
OVESETUP_CONFIG/engineHeapMin=str:7975M
OVESETUP_AIO/configure=bool:True
OVESETUP_AIO/storageDomainName=str:local_storage
OVESETUP_AIO/storageDomainDir=str:/state/partition1/images/
OVESETUP_PROVISIONING/postgresProvisioningEnabled=bool:True
OVESETUP_APACHE/configureRootRedirection=bool:True
OVESETUP_APACHE/configureSsl=bool:True
OVESETUP_VMCONSOLE_PROXY_CONFIG/vmconsoleProxyConfig=bool:True
OVESETUP_ENGINE_CONFIG/fqdn=str:<...fqdn...>
OVESETUP_CONFIG/websocketProxyConfig=bool:True
# engine-setup --config-append=/root/ovirt-engine.ans
...
[ INFO ] Starting engine service
[ INFO ] Restarting httpd
[ INFO ] Waiting for VDSM host to become operational. This may take
several minutes...
[ ERROR ] The VDSM host was found in a failed state. Please check engine
and bootstrap installation logs.
[WARNING] Local storage domain not added because the VDSM host was not up.
Please add it manually.
[ INFO ] Restarting ovirt-vmconsole proxy service
[ INFO ] Stage: Clean up
Log file is located at
/var/log/ovirt-engine/setup/ovirt-engine-setup-20151228112813-iksxhe.log
[ INFO ] Generating answer file
'/var/lib/ovirt-engine/setup/answers/20151228113003-setup.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ INFO ] Execution of setup completed successfully
8. Examined /var/log/ovirt-engine/setup/ovirt-engine-setup-20151228112813-iksxhe.log
and found this error message which seems to indicate that the vdsm
installation process was unable to contact the engine:
2015-12-28 11:29:29 DEBUG otopi.plugins.otopi.services.systemd
plugin.execute:941 execute-output: ('/bin/systemctl', 'start',
'httpd.service') stderr:
2015-12-28 11:29:29 DEBUG otopi.context context._executeMethod:142 Stage
closeup METHOD
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi.Plugin._closeup
2015-12-28 11:29:29 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._closeup:211 Connecting to the Engine
2015-12-28 11:29:29 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitEngineUp:103 Waiting Engine API response
2015-12-28 11:29:29 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitEngineUp:133 Cannot connect to engine
Traceback (most recent call last):
File
"/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/all-in-one/vdsmi.py",
line 127, in _waitEngineUp
insecure=True,
File "/usr/lib/python2.7/site-packages/ovirtsdk/api.py", line 191, in
__init__
url=''
File
"/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line
115, in request
persistent_auth=self.__persistent_auth
File
"/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py",
line 79, in do_request
persistent_auth)
File
"/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py",
line 155, in __do_request
raise errors.RequestError(response_code, response_reason,
response_body)
RequestError: ^M
status: 503^M
reason: Service Unavailable^M
detail:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Unavailable</title>
</head><body>
<h1>Service Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
</body></html>
2015-12-28 11:29:36 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitEngineUp:133 Cannot connect to engine
Traceback (most recent call last):
File
"/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/all-in-one/vdsmi.py",
line 127, in _waitEngineUp
insecure=True,
File "/usr/lib/python2.7/site-packages/ovirtsdk/api.py", line 191, in
__init__
url=''
File
"/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line
115, in request
persistent_auth=self.__persistent_auth
File
"/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py",
line 79, in do_request
persistent_auth)
File
"/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py",
line 155, in __do_request
raise errors.RequestError(response_code, response_reason,
response_body)
RequestError: ^M
status: 404^M
reason: Not Found^M
detail:
...
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;
charset=utf-8" />
<link id="id-link-favicon" rel="shortcut icon"
href="/ovirt-engine/theme-resource/favicon" type="image/x-icon"
/>
<title>404 - Page not found</title>
...
</html>
2015-12-28 11:29:50 DEBUG otopi.ovirt_engine_setup.engine_common.database
database.execute:171 Database: 'None', Statement: '
select version, option_value
from vdc_options
where option_name = %(name)s
', args: {'name': 'SupportedClusterLevels'}
2015-12-28 11:29:50 DEBUG otopi.ovirt_engine_setup.engine_common.database
database.execute:221 Result: [{'version': 'general',
'option_value':
'3.0,3.1,3.2,3.3,3.4,3.5,3.6'}]
2015-12-28 11:29:50 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._closeup:225 engine SupportedClusterLevels
[3.0,3.1,3.2,3.3,3.4,3.5,3.6], PACKAGE_VERSION [3.6.1.3],
2015-12-28 11:29:50 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._getSupportedClusterLevels:181 Attempting to load the dsaversion vdsm
module
2015-12-28 11:29:50 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._closeup:236 VDSM SupportedClusterLevels [['3.4', '3.5',
'3.6']],
VDSM VERSION [4.17.13-0.el7.centos],
2015-12-28 11:29:50 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._closeup:259 Creating the local data center
2015-12-28 11:29:50 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._closeup:269 Creating the local cluster into the local data center
2015-12-28 11:29:52 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._closeup:284 Adding the local host to the local cluster
2015-12-28 11:29:55 INFO
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitVDSMHostUp:58 Waiting for VDSM host to become operational. This
may take several minutes...
2015-12-28 11:29:55 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitVDSMHostUp:87 VDSM host in installing state
2015-12-28 11:29:56 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitVDSMHostUp:87 VDSM host in installing state
2015-12-28 11:29:58 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitVDSMHostUp:87 VDSM host in installing state
2015-12-28 11:29:59 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitVDSMHostUp:87 VDSM host in installing state
2015-12-28 11:30:00 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitVDSMHostUp:87 VDSM host in installing state
2015-12-28 11:30:01 DEBUG
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitVDSMHostUp:87 VDSM host in installing state
2015-12-28 11:30:02 ERROR
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._waitVDSMHostUp:77 The VDSM host was found in a failed state. Please
check engine and bootstrap installation logs.
2015-12-28 11:30:02 WARNING
otopi.plugins.ovirt_engine_setup.ovirt_engine.all-in-one.vdsmi
vdsmi._closeup:306 Local storage domain not added because the VDSM host was
not up. Please add it manually.
2015-12-28 11:30:02 DEBUG otopi.context context._executeMethod:142 Stage
closeup METHOD
otopi.plugins.ovirt_engine_setup.vmconsole_proxy_helper.system.Plugin._closeup
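Side note on the 503/404 responses above: they appear to be transient
retries from _waitEngineUp while httpd and the engine were still starting,
since the setup goes on to create the data center and cluster afterwards.
If it were actually stuck there, I'd poll the engine's health servlet by
hand, something like this (assuming that endpoint behaves the same in 3.6):
# curl -s http://localhost/ovirt-engine/services/health   # reports engine/DB status once the engine is up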
9. Looked at the vdsm logs and examined service status:
# ls -l /var/log/vdsm/
total 8
drwxr-xr-x 2 vdsm kvm 6 Dec 9 03:24 backup
-rw-r--r-- 1 vdsm kvm 0 Dec 28 11:13 connectivity.log
-rw-r--r-- 1 vdsm kvm 0 Dec 28 11:13 mom.log
-rw-r--r-- 1 root root 2958 Dec 28 11:30 supervdsm.log
-rw-r--r-- 1 root root 1811 Dec 28 11:30 upgrade.log
-rw-r--r-- 1 vdsm kvm 0 Dec 28 11:13 vdsm.log
# cat /var/log/vdsm/upgrade.log
MainThread::DEBUG::2015-12-28
11:30:02,803::upgrade::90::upgrade::(apply_upgrade) Running upgrade
upgrade-unified-persistence
MainThread::DEBUG::2015-12-28
11:30:02,806::libvirtconnection::160::root::(get) trying to connect libvirt
MainThread::DEBUG::2015-12-28 11:30:02,813::utils::669::root::(execCmd)
/sbin/ip route show to 0.0.0.0/0 table main (cwd None)
MainThread::DEBUG::2015-12-28 11:30:02,826::utils::687::root::(execCmd)
SUCCESS: <err> = ''; <rc> = 0
MainThread::DEBUG::2015-12-28
11:30:02,826::unified_persistence::46::root::(run)
upgrade-unified-persistence upgrade persisting networks {} and bondings {}
MainThread::INFO::2015-12-28
11:30:02,827::netconfpersistence::179::root::(_clearDisk) Clearing
/var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
MainThread::DEBUG::2015-12-28
11:30:02,827::netconfpersistence::187::root::(_clearDisk) No existent
config to clear.
MainThread::INFO::2015-12-28
11:30:02,827::netconfpersistence::179::root::(_clearDisk) Clearing
/var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
MainThread::DEBUG::2015-12-28
11:30:02,827::netconfpersistence::187::root::(_clearDisk) No existent
config to clear.
MainThread::INFO::2015-12-28
11:30:02,827::netconfpersistence::129::root::(save) Saved new config
RunningConfig({}, {}) to /var/run/vdsm/netconf/nets/ and
/var/run/vdsm/netconf/bonds/
MainThread::DEBUG::2015-12-28 11:30:02,827::utils::669::root::(execCmd)
/usr/share/vdsm/vdsm-store-net-config unified (cwd None)
MainThread::DEBUG::2015-12-28 11:30:02,836::utils::687::root::(execCmd)
SUCCESS: <err> = 'cp: cannot stat
\xe2\x80\x98/var/run/vdsm/netconf\xe2\x80\x99: No such file or
directory\n'; <rc> = 0
MainThread::DEBUG::2015-12-28
11:30:02,837::upgrade::51::upgrade::(_upgrade_seal) Upgrade
upgrade-unified-persistence successfully performed
# cat /var/log/vdsm/supervdsm.log
MainThread::DEBUG::2015-12-28
11:30:02,415::supervdsmServer::539::SuperVdsm.Server::(main) Making sure
I'm root - SuperVdsm
MainThread::DEBUG::2015-12-28
11:30:02,415::supervdsmServer::548::SuperVdsm.Server::(main) Parsing cmd
args
MainThread::DEBUG::2015-12-28
11:30:02,415::supervdsmServer::551::SuperVdsm.Server::(main) Cleaning old
socket /var/run/vdsm/svdsm.sock
MainThread::DEBUG::2015-12-28
11:30:02,415::supervdsmServer::555::SuperVdsm.Server::(main) Setting up
keep alive thread
MainThread::DEBUG::2015-12-28
11:30:02,415::supervdsmServer::561::SuperVdsm.Server::(main) Creating
remote object manager
MainThread::DEBUG::2015-12-28
11:30:02,416::fileUtils::192::Storage.fileUtils::(chown) Changing owner for
/var/run/vdsm/svdsm.sock, to (36:36)
MainThread::DEBUG::2015-12-28
11:30:02,416::supervdsmServer::572::SuperVdsm.Server::(main) Started
serving super vdsm object
sourceRoute::DEBUG::2015-12-28
11:30:02,416::sourceroutethread::79::root::(_subscribeToInotifyLoop)
sourceRouteThread.subscribeToInotifyLoop started
restore-net::DEBUG::2015-12-28
11:30:03,080::libvirtconnection::160::root::(get) trying to connect libvirt
restore-net::INFO::2015-12-28
11:30:03,188::vdsm-restore-net-config::86::root::(_restore_sriov_numvfs)
SRIOV network device which is not persisted found at: 0000:01:00.1.
restore-net::INFO::2015-12-28
11:30:03,189::vdsm-restore-net-config::86::root::(_restore_sriov_numvfs)
SRIOV network device which is not persisted found at: 0000:01:00.0.
restore-net::INFO::2015-12-28
11:30:03,189::vdsm-restore-net-config::86::root::(_restore_sriov_numvfs)
SRIOV network device which is not persisted found at: 0000:01:00.3.
restore-net::INFO::2015-12-28
11:30:03,189::vdsm-restore-net-config::86::root::(_restore_sriov_numvfs)
SRIOV network device which is not persisted found at: 0000:01:00.2.
restore-net::INFO::2015-12-28
11:30:03,189::vdsm-restore-net-config::385::root::(restore) starting
network restoration.
restore-net::DEBUG::2015-12-28
11:30:03,189::vdsm-restore-net-config::183::root::(_remove_networks_in_running_config)
Not cleaning running configuration since it is empty.
restore-net::INFO::2015-12-28
11:30:03,205::netconfpersistence::179::root::(_clearDisk) Clearing
/var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
restore-net::DEBUG::2015-12-28
11:30:03,206::netconfpersistence::187::root::(_clearDisk) No existent
config to clear.
restore-net::INFO::2015-12-28
11:30:03,206::netconfpersistence::129::root::(save) Saved new config
RunningConfig({}, {}) to /var/run/vdsm/netconf/nets/ and
/var/run/vdsm/netconf/bonds/
restore-net::DEBUG::2015-12-28
11:30:03,207::vdsm-restore-net-config::329::root::(_wait_for_for_all_devices_up)
All devices are up.
restore-net::INFO::2015-12-28
11:30:03,214::netconfpersistence::71::root::(setBonding) Adding
bond0({'nics': [], 'options': ''})
restore-net::INFO::2015-12-28
11:30:03,214::vdsm-restore-net-config::396::root::(restore) restoration
completed successfully.
# systemctl status supervdsmd
● supervdsmd.service - Auxiliary vdsm service for running helper functions
as root
Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static;
vendor preset: enabled)
Active: active (running) since Mon 2015-12-28 11:30:02 EST; 1h 8min ago
Main PID: 81535 (supervdsmServer)
CGroup: /system.slice/supervdsmd.service
└─81535 /usr/bin/python /usr/share/vdsm/supervdsmServer
--sockfile /var/run/vdsm/svdsm.sock
Dec 28 11:30:02 hostname systemd[1]: Started Auxiliary vdsm service for
running helper functions as root.
Dec 28 11:30:02 hostname systemd[1]: Starting Auxiliary vdsm service for
running helper functions as root.
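supervdsmd looks healthy, but vdsm.log being empty makes me suspect vdsmd
itself never came up. Next I plan to check (a sketch; vdsClient is what I
believe 3.6 ships):
# systemctl status vdsmd   # is the main vdsm daemon running at all?
# journalctl -u vdsmd --since "11:25"   # any start-up failures around install time?
# vdsClient -s 0 getVdsCaps   # does vdsm answer on the local socket?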