Hi Adam,
Thanks for looking! The storage is fibre attached and I've verified with the SAN folks nothing went wonky during this window on their side.
Here is what I've got from vdsm.log during the window (and a bit surrounding it for context):
libvirtEventLoop::WARNING::2017-02-16 08:35:17,435::utils::140::root::(rmFile) File: /var/lib/libvirt/qemu/channels/ba806b93-b6fe-4873-99ec-55bb34c12e5f.com.redhat.rhevm.vdsm already removed
libvirtEventLoop::WARNING::2017-02-16 08:35:17,435::utils::140::root::(rmFile) File: /var/lib/libvirt/qemu/channels/ba806b93-b6fe-4873-99ec-55bb34c12e5f.org.qemu.guest_agent.0 already removed
periodic/2::WARNING::2017-02-16 08:35:18,144::periodic::295::virt.vm::(__call__) vmId=`ba806b93-b6fe-4873-99ec-55bb34c12e5f`::could not run on ba806b93-b6fe-4873-99ec-55bb34c12e5f: domain not connected
periodic/3::WARNING::2017-02-16 08:35:18,305::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
Thread-23021::ERROR::2017-02-16 09:28:33,096::task::866::Storage.TaskManager.Task::(_setError) Task=`ecab8086-261f-44b9-8123-eefb9bbf5b05`::Unexpected error
Thread-23021::ERROR::2017-02-16 09:28:33,097::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Storage domain is member of pool: 'domain=81f19871-4d91-4698-a97d-36452bfae281'", 'code': 900}}
Thread-23783::ERROR::2017-02-16 10:13:32,876::task::866::Storage.TaskManager.Task::(_setError) Task=`ff628204-6e41-4e5e-b83a-dad6ec94d0d3`::Unexpected error
Thread-23783::ERROR::2017-02-16 10:13:32,877::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Storage domain is member of pool: 'domain=81f19871-4d91-4698-a97d-36452bfae281'", 'code': 900}}
Thread-24542::ERROR::2017-02-16 10:58:32,578::task::866::Storage.TaskManager.Task::(_setError) Task=`f5111200-e980-46bb-bbc3-898ae312d556`::Unexpected error
Thread-24542::ERROR::2017-02-16 10:58:32,579::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Storage domain is member of pool: 'domain=81f19871-4d91-4698-a97d-36452bfae281'", 'code': 900}}
jsonrpc.Executor/4::ERROR::2017-02-16 11:28:24,049::sdc::139::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 13127103-3f59-418a-90f1-5b1ade8526b1
jsonrpc.Executor/4::ERROR::2017-02-16 11:28:24,049::sdc::156::Storage.StorageDomainCache::(_findUnfetchedDomain) looking for domain 13127103-3f59-418a-90f1-5b1ade8526b1
jsonrpc.Executor/4::ERROR::2017-02-16 11:28:24,305::sdc::145::Storage.StorageDomainCache::(_findDomain) domain 13127103-3f59-418a-90f1-5b1ade8526b1 not found
6e31bf97-458c-4a30-9df5-14f475db3339::ERROR::2017-02-16 11:29:19,402::image::205::Storage.Image::(getChain) There is no leaf in the image e17ebd7c-0763-42b2-b344-5ad7f9cf448e
6e31bf97-458c-4a30-9df5-14f475db3339::ERROR::2017-02-16 11:29:19,403::task::866::Storage.TaskManager.Task::(_setError) Task=`6e31bf97-458c-4a30-9df5-14f475db3339`::Unexpected error
79ed31a2-5ac7-4304-ab4d-d05f72694860::ERROR::2017-02-16 11:29:20,649::image::205::Storage.Image::(getChain) There is no leaf in the image b4c4b53e-3813-4959-a145-16f1dfcf1838
79ed31a2-5ac7-4304-ab4d-d05f72694860::ERROR::2017-02-16 11:29:20,650::task::866::Storage.TaskManager.Task::(_setError) Task=`79ed31a2-5ac7-4304-ab4d-d05f72694860`::Unexpected error
jsonrpc.Executor/5::ERROR::2017-02-16 11:30:17,063::image::205::Storage.Image::(getChain) There is no leaf in the image e17ebd7c-0763-42b2-b344-5ad7f9cf448e
jsonrpc.Executor/5::ERROR::2017-02-16 11:30:17,064::task::866::Storage.TaskManager.Task::(_setError) Task=`62f20e22-e850-44c8-8943-faa4ce71e973`::Unexpected error
jsonrpc.Executor/5::ERROR::2017-02-16 11:30:17,065::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Image is not a legal chain: ('e17ebd7c-0763-42b2-b344-5ad7f9cf448e',)", 'code': 262}}
jsonrpc.Executor/4::ERROR::2017-02-16 11:33:18,487::image::205::Storage.Image::(getChain) There is no leaf in the image e17ebd7c-0763-42b2-b344-5ad7f9cf448e
jsonrpc.Executor/4::ERROR::2017-02-16 11:33:18,488::task::866::Storage.TaskManager.Task::(_setError) Task=`e4d893f2-7be6-4f84-9ac6-58b5a5d1364e`::Unexpected error
jsonrpc.Executor/4::ERROR::2017-02-16 11:33:18,489::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Image is not a legal chain: ('e17ebd7c-0763-42b2-b344-5ad7f9cf448e',)", 'code': 262}}
3132106a-ce35-4b12-9a72-812e415eff7f::ERROR::2017-02-16 11:34:47,595::image::205::Storage.Image::(getChain) There is no leaf in the image e17ebd7c-0763-42b2-b344-5ad7f9cf448e
3132106a-ce35-4b12-9a72-812e415eff7f::ERROR::2017-02-16 11:34:47,596::task::866::Storage.TaskManager.Task::(_setError) Task=`3132106a-ce35-4b12-9a72-812e415eff7f`::Unexpected error
112fb772-a497-4788-829f-190d6d008d95::ERROR::2017-02-16 11:34:48,517::image::205::Storage.Image::(getChain) There is no leaf in the image b4c4b53e-3813-4959-a145-16f1dfcf1838
112fb772-a497-4788-829f-190d6d008d95::ERROR::2017-02-16 11:34:48,517::task::866::Storage.TaskManager.Task::(_setError) Task=`112fb772-a497-4788-829f-190d6d008d95`::Unexpected error
Thread-25336::ERROR::2017-02-16 11:43:32,726::task::866::Storage.TaskManager.Task::(_setError) Task=`fafb120e-e7c6-4d3e-b87a-8116484f1c1a`::Unexpected error
Thread-25336::ERROR::2017-02-16 11:43:32,727::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Storage domain is member of pool: 'domain=81f19871-4d91-4698-a97d-36452bfae281'", 'code': 900}}
jsonrpc.Executor/0::WARNING::2017-02-16 11:54:05,875::momIF::113::MOM::(getStatus) MOM not available.
jsonrpc.Executor/0::WARNING::2017-02-16 11:54:05,877::momIF::76::MOM::(getKsmStats) MOM not available, KSM stats will be missing.
ioprocess communication (10025)::ERROR::2017-02-16 11:54:05,890::__init__::176::IOProcessClient::(_communicate) IOProcess failure
ioprocess communication (10364)::ERROR::2017-02-16 11:54:05,892::__init__::176::IOProcessClient::(_communicate) IOProcess failure
ioprocess communication (23403)::ERROR::2017-02-16 11:54:05,892::__init__::176::IOProcessClient::(_communicate) IOProcess failure
ioprocess communication (31710)::ERROR::2017-02-16 11:54:05,999::__init__::176::IOProcessClient::(_communicate) IOProcess failure
ioprocess communication (31717)::ERROR::2017-02-16 11:54:05,999::__init__::176::IOProcessClient::(_communicate) IOProcess failure
ioprocess communication (31724)::ERROR::2017-02-16 11:54:06,000::__init__::176::IOProcessClient::(_communicate) IOProcess failure
Thread-16::ERROR::2017-02-16 11:54:21,657::monitor::387::Storage.Monitor::(_acquireHostId) Error acquiring host id 2 for domain 81f19871-4d91-4698-a97d-36452bfae281
jsonrpc.Executor/7::ERROR::2017-02-16 11:54:21,885::API::1871::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
jsonrpc.Executor/0::ERROR::2017-02-16 11:54:21,890::task::866::Storage.TaskManager.Task::(_setError) Task=`73ca0c58-3e86-47e8-80f2-31d97346f0a3`::Unexpected error
jsonrpc.Executor/0::ERROR::2017-02-16 11:54:21,892::dispatcher::79::Storage.Dispatcher::(wrapper) Secured object is not in safe state
Thread-16::ERROR::2017-02-16 11:54:31,673::monitor::387::Storage.Monitor::(_acquireHostId) Error acquiring host id 2 for domain 81f19871-4d91-4698-a97d-36452bfae281
jsonrpc.Executor/4::ERROR::2017-02-16 11:54:34,309::API::1871::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
jsonrpc.Executor/2::ERROR::2017-02-16 11:57:30,796::API::1871::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
jsonrpc.Executor/7::ERROR::2017-02-16 11:57:39,847::image::205::Storage.Image::(getChain) There is no leaf in the image e17ebd7c-0763-42b2-b344-5ad7f9cf448e
jsonrpc.Executor/7::ERROR::2017-02-16 11:57:39,848::task::866::Storage.TaskManager.Task::(_setError) Task=`e4ae2972-77d4-406a-ac71-b285953b76ae`::Unexpected error
jsonrpc.Executor/7::ERROR::2017-02-16 11:57:39,849::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Image is not a legal chain: ('e17ebd7c-0763-42b2-b344-5ad7f9cf448e',)", 'code': 262}}
jsonrpc.Executor/0::ERROR::2017-02-16 11:57:45,965::API::1871::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
jsonrpc.Executor/5::ERROR::2017-02-16 13:01:26,274::image::205::Storage.Image::(getChain) There is no leaf in the image e17ebd7c-0763-42b2-b344-5ad7f9cf448e
jsonrpc.Executor/5::ERROR::2017-02-16 13:01:26,275::task::866::Storage.TaskManager.Task::(_setError) Task=`2a214b3a-a50b-425a-ad99-bf5cc6be13ef`::Unexpected error
jsonrpc.Executor/5::ERROR::2017-02-16 13:01:26,276::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Image is not a legal chain: ('e17ebd7c-0763-42b2-b344-5ad7f9cf448e',)", 'code': 262}}
periodic/3::WARNING::2017-02-16 13:13:52,268::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/2::WARNING::2017-02-16 13:50:15,062::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/1::WARNING::2017-02-16 13:51:15,085::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/3::WARNING::2017-02-16 13:51:45,081::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/0::WARNING::2017-02-16 15:21:45,347::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/0::WARNING::2017-02-16 16:21:00,522::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/3::WARNING::2017-02-16 17:49:00,858::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/3::WARNING::2017-02-16 17:50:00,868::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/0::WARNING::2017-02-16 17:51:30,899::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
periodic/0::WARNING::2017-02-16 17:52:30,907::periodic::261::virt.periodic.VmDispatcher::(__call__) could not run <class 'virt.periodic.DriveWatermarkMonitor'> on ['ba806b93-b6fe-4873-99ec-55bb34c12e5f']
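In case it helps with triage, here is a rough Python sketch for collapsing an excerpt like the one above into counts per distinct error. It is not part of vdsm. The field layout (thread::LEVEL::timestamp::module::line::logger::(function) message) is an assumption inferred from the lines in this mail, and the names `LINE_RE`, `UUID_RE` and `summarize` are made up for illustration. It expects lines as they appear in the raw vdsm.log, not a mail-wrapped copy.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt in this mail:
# thread::LEVEL::timestamp::module::lineno::logger::(function) message
LINE_RE = re.compile(
    r"^(?P<thread>.+?)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>\w+)::(?P<lineno>\d+)::"
    r"(?P<logger>[\w.]+)::\((?P<func>\w+)\) (?P<msg>.*)$"
)

# Task, image and domain IDs differ per call; mask them so that
# otherwise identical messages fall into the same bucket.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def summarize(lines, level="ERROR"):
    """Count log lines of the given level per (logger, function, message)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") != level:
            continue  # skip other levels and lines that do not fit the layout
        message = UUID_RE.sub("<uuid>", match.group("msg"))
        counts[(match.group("logger"), match.group("func"), message)] += 1
    return counts


if __name__ == "__main__":
    # Path is the usual vdsm default; adjust if your host logs elsewhere.
    with open("/var/log/vdsm/vdsm.log") as handle:
        for (logger, func, message), n in summarize(handle).most_common():
            print(n, logger, func, message)
```

Masking the UUIDs loses the detail of which image or domain was involved, so this is only useful for a first overview; the original lines are still needed to chase a specific image chain.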
On 02/20/2017 08:45 AM, Adam Litke wrote:
Hi Pat. I'd like to help you investigate this issue further. Could you send a snippet of the vdsm.log on slam-vmnode-03 that covers the time period during this failure? Engine is reporting that vdsm has likely thrown an exception while acquiring locks associated with the VM disk you are exporting.
On Thu, Feb 16, 2017 at 12:40 PM, Pat Riehecky <riehecky@fnal.gov> wrote:
Any attempts to export my VM error out. Last night the disk images got 'unregistered' from oVirt and I had to rescan the storage domain to find them again. Now I'm just trying to get a backup of the VM.
The snapshots off of the old disks are still listed, but I don't know if the LVM slices are still real or if that is even what is wrong.
Steps I followed:
1. Halt VM
2. Click Export
3. Leave things unchecked and click OK
oVirt version:
ovirt-engine-4.0.3-1.el7.centos.noarch
ovirt-engine-backend-4.0.3-1.el7.centos.noarch
ovirt-engine-cli-3.6.9.2-1.el7.noarch
ovirt-engine-dashboard-1.0.3-1.el7.centos.noarch
ovirt-engine-dbscripts-4.0.3-1.el7.centos.noarch
ovirt-engine-dwh-4.0.2-1.el7.centos.noarch
ovirt-engine-dwh-setup-4.0.2-1.el7.centos.noarch
ovirt-engine-extension-aaa-jdbc-1.1.0-1.el7.noarch
ovirt-engine-extension-aaa-ldap-1.2.1-1.el7.noarch
ovirt-engine-extension-aaa-ldap-setup-1.2.1-1.el7.noarch
ovirt-engine-extensions-api-impl-4.0.3-1.el7.centos.noarch
ovirt-engine-lib-4.0.3-1.el7.centos.noarch
ovirt-engine-restapi-4.0.3-1.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7.noarch
ovirt-engine-setup-4.0.3-1.el7.centos.noarch
ovirt-engine-setup-base-4.0.3-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-4.0.3-1.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.0.3-1.el7.centos.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.0.3-1.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.0.3-1.el7.centos.noarch