Hi Adam,
First of all, thanks to all of you for your time and help.
I'll try to explain once again for those who don't know the complete story.
We had a crash on Wednesday, March 11th. I don't yet know the cause. What I do know is
that, possibly after an electrical overload, the oVirt manager (which for the moment is
a VM, not a physical server) went down, although its host did not.
That would not have been so critical if we had not lost access to all the VMs of the
Data Center. My two hypervisor hosts were still up, so I suspected that something was
broken in the vdsm link.
I rebooted the manager, but the whole Data Center remained unresponsive and there was no
way to bring the DC back. I also rebooted the hosts and the SAN bay, without any
change.
I had a backup made with my new Acronis Backup Advanced solution, but unfortunately I also
hit an issue (unknown to Acronis) when restoring the VM. The Acronis team worked on the
problem from March 12th without finding a solution, until I found a workaround last
Friday, March 27th, and managed to restore the manager from the March 10th backup.
During those long days of waiting, I tried several things to recover my Data Center,
and I may have done some wrong things...
I first tried to update oVirt from 3.5.0 to 3.5.1.1, without success, so I decided to
build a new manager from scratch, using a clone of my oVirt manager, to see whether I
could use the "Import Domain" option. I tried it on two storage domain volumes:
VOL-UNC-NAS-01 and VOL-UNC-PROD-02.
The import worked, but no VMs were visible after the import.
I was already in touch with Maor Lipchuk, who was helping me, but I couldn't retrieve
any VMs.
On Friday 27th I found the workaround to restore my Acronis backup, and after some work
and a reinstallation of the hosts, I finally got my Data Center back, with the Storage
Domains UP (all of them at that time) and all the VMs.
When I tried to start the VMs, only 4 of them came UP: those stored on the volume
VOL-UNC-PROD-01, which I had not tried with the "Import Domain" option. I deduced that
the problem came from the two SDs VOL-UNC-NAS-01 and VOL-UNC-PROD-02.
I decided to put them into maintenance in order to reactivate or detach them, but since
then the two volumes have stayed in Maintenance mode, with no way to change their
state.
That is where we are now. I hope you will be able to find a solution.
The mystery remains, though: why did the Data Center go down? I can understand that the
manager had a problem, but why did all the VMs go down at the same time, with no way to
recover them?
For now, the most important thing is to recover my Data Center, but it will also be very
important to find the cause of the disaster, since it could compromise my Private Cloud
project in my enterprise.
I hope I have been clear and complete enough for all of you, but don't hesitate to
contact me if you have any questions.
Thanks a lot again for your help
Alain VONDRA
Chargé d'exploitation des Systèmes d'Information
Direction Administrative et Financière
+33 1 44 39 77 76
UNICEF France
3 rue Duguay Trouin 75006 PARIS
www.unicef.fr
-----Original Message-----
From: Adam Litke [mailto:alitke@redhat.com]
Sent: Wednesday, April 1, 2015 17:06
To: VONDRA Alain
Cc: Elad Ben Aharon; users(a)ovirt.org; Federico Simoncelli; Maor Lipchuk
Subject: Re: [ovirt-users] Storage domain not in pool issue
On 31/03/15 08:43 +0000, VONDRA Alain wrote:
Hi,
Here are the logs.
Thanks
Federico, Maor: tldr; Can you offer some advice for recovering this block SD after a DC
disaster?
Hi Alain,
After looking at your logs, it's clear that the metadata on the storage domain itself
says that the domain is attached to pool
c58a44b1-1c98-450e-97e1-3347eeb28f86 while engine thinks the domain is attached to pool
f422de63-8869-41ef-a782-8b0c9ee03c41.
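To make the mismatch concrete, here is a minimal Python sketch of the comparison. It is
only an illustration, not a vdsm or engine tool: it assumes the storage domain metadata
has already been dumped to plain KEY=VALUE text and that the pool reference sits under a
key named POOL_UUID. The function names and the sample text are hypothetical; only the
UUIDs come from this thread.

```python
def parse_metadata(text):
    """Parse KEY=VALUE lines into a dict, skipping lines without '='."""
    md = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        md[key.strip()] = value.strip()
    return md


def pool_matches(metadata_text, engine_pool_uuid, key="POOL_UUID"):
    """Return True if the pool recorded in the metadata is the engine's pool.

    The key name is an assumption; the value is treated as a
    comma-separated list and compared case-insensitively.
    """
    pools = parse_metadata(metadata_text).get(key, "")
    recorded = [p.strip().lower() for p in pools.split(",") if p.strip()]
    return engine_pool_uuid.lower() in recorded


# Illustrative sample built from the UUIDs quoted in this thread.
sample = """SDUUID=d7b9d7cc-f7d6-43c7-ae13-e720951657c9
POOL_UUID=c58a44b1-1c98-450e-97e1-3347eeb28f86
"""

# The engine expects the new pool, the metadata still names the old one.
print(pool_matches(sample, "f422de63-8869-41ef-a782-8b0c9ee03c41"))
```

The check is read-only by design: it only reports whether the two sides agree and does
not touch the storage.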
Can you please explain the process you used to recover from your datacenter disaster? My
guess is you:
1. Reinstalled the engine host with a blank oVirt DB
2. Created a new data center
3. Created a new master domain
4. Attached some storage domains which were not attached at the time
of your previous disaster
5. Tried to attach sd:d7b9d7cc-f7d6-43c7-ae13-e720951657c9 which was
attached to your old storage pool at the time of the disaster.
#5 failed because the metadata on the storage shows the old storage pool. At this point I
see two possible options to recover your storage. PLEASE DO NOT DO ANYTHING YET (until we
confirm what the best approach for recovery will be).
Option 1: Use the new import storage domain feature to import this domain into your new
datacenter.
Option 2: Modify the storage domain metadata to remove the reference to the old storage
pool.
I am adding some other oVirt storage experts to the thread in order to offer you the best
advice. Federico, Maor: can you offer some expert advice on this matter?
I did notice this wiki page which talks about clearing the storage pool metadata from an
export domain. Since this SD is iSCSI, it will be a bit more difficult to manually edit
the md but I'd guess someone has a script or some instructions on how to do it.
--
Adam Litke