[ovirt-users] Datacenter unresponsive: recovery procedure?

Andrea Ghelardi a.ghelardi at iontrading.com
Thu Apr 16 09:12:36 UTC 2015


(sorry: resending as I wasn’t part of the list, yet)



Hi,

this is my first post, so hello all and thank you for reading.

I have an issue with my production oVirt environment (3.5.1.1-1.el6).



My system consists of several datacenters.

Two of them are connected to an iSCSI SAN and were working fine, until I
had the bad idea of deleting a SAN volume from the SAN manager before
deleting the associated storage in oVirt. From that moment, the DC where
this storage was mounted became unresponsive: it cannot attach the master
storage (or any other).

I tried the following, but none of it recovered the situation:

1) manually destroying the offending storage (select -> destroy)

2) right-clicking on the master storage and activating it

3) re-initializing the datacenter using an NFS storage from the working
sister DC



All hosts are still running even though their status is "unknown".

All VMs are still running even though their status is "not responding".



I partly resolved the issue by manually restarting the host where the
datastore was originally mounted. This cleared the orphaned multipath.

However, the SPM still does not come up.

This is an extract of the log:

2015-04-16 03:51:48,069 WARN
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-14) [61a44b19] could not stop spm of pool
00000002-0002-0002-0002-00000000009c on vds
89254f23-8748-402a-afc9-08438dca0975 - reason:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: Message timeout which can be
caused by communication issues

2015-04-16 03:51:48,072 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-14) [61a44b19] FINISH, SpmStopVDSCommand,
log id: 4354cf46

2015-04-16 03:51:48,072 WARN
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-14) [61a44b19] spm stop on spm failed,
stopping spm selection!

2015-04-16 03:51:58,223 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] hostFromVds::selectedVds -
Brachetto, spmStatus Free, storage pool IRDC-INTEL

2015-04-16 03:51:58,225 ERROR
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM Init: could not find
reported vds or not up - pool:IRDC-INTEL vds_spm_id: 3

2015-04-16 03:51:58,239 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxyData]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] SPM selection - vds seems as
spm sovana

2015-04-16 03:51:58,252 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] START,
SpmStopVDSCommand(HostName = sovana, HostId =
89254f23-8748-402a-afc9-08438dca0975, storagePoolId =
00000002-0002-0002-0002-00000000009c), log id: 63a17687



storagePoolId = 00000002-0002-0002-0002-00000000009c is (was) hertz-dstore2,
which no longer exists on the SAN or in oVirt.

HostId 89254f23-8748-402a-afc9-08438dca0975 is the sovana server (current
SPM).
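
In case it helps to cross-check which host and pool the engine keeps
trying to stop, here is a minimal Python sketch that pulls the host name,
host ID and storage pool ID out of SpmStopVDSCommand START entries. The
regular expression is inferred only from the single entry in the extract
above, so it is an assumption about the log layout rather than a documented
format, and find_spm_stop_calls is just an illustrative name.

```python
import re

# Pattern inferred from the one SpmStopVDSCommand START entry quoted in this
# post; it is an assumption, not an official description of the log format.
SPM_STOP_RE = re.compile(
    r"SpmStopVDSCommand\(HostName = (?P<host>[^,]+), "
    r"HostId = (?P<host_id>[0-9a-f-]{36}), "
    r"storagePoolId = (?P<pool_id>[0-9a-f-]{36})\)"
)


def find_spm_stop_calls(log_text):
    """Return a list of (host, host_id, pool_id) tuples found in log_text."""
    # Mail clients wrap long log entries, so collapse whitespace runs first.
    flat = " ".join(log_text.split())
    return [
        (m["host"], m["host_id"], m["pool_id"])
        for m in SPM_STOP_RE.finditer(flat)
    ]


# Sample entry copied from the log extract in this post.
sample = """
2015-04-16 03:51:58,252 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand]
(DefaultQuartzScheduler_Worker-4) [4ca2d938] START,
SpmStopVDSCommand(HostName = sovana, HostId =
89254f23-8748-402a-afc9-08438dca0975, storagePoolId =
00000002-0002-0002-0002-00000000009c), log id: 63a17687
"""

for host, host_id, pool_id in find_spm_stop_calls(sample):
    print(host, host_id, pool_id)
```

Run against a full log, the same function would list every host/pool pair
the engine tried to stop, which makes it easy to see whether it is always
the same pair.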









I’m thinking about trying the following sequence:

1) put the hosted engine host into maintenance

2) shut down oVirt Manager

3) reboot the SPM server

4) restart oVirt Manager

5) take the hosted engine host out of maintenance





Any help or clue is very welcome, with cheers and beers.

Thank you!

Andrea