[ovirt-users] Storage domain not in pool issue

VONDRA Alain AVONDRA at unicef.fr
Wed Apr 1 15:59:30 UTC 2015


Hi Adam,
First of all, thanks to all of you for your time and help.
I'll try to explain once again for those who don't know the complete story.
We had a crash on Wednesday, March 11th. I don't yet know the cause; the facts are that after what may have been an electrical overload, the oVirt manager, which for the moment is a VM rather than a physical server, went down, but its host did not.
It wouldn't have been so critical if we hadn't lost access to all the VMs of the Data Center. My two hypervisor hosts were still up, so I suspected that something was broken with the vdsm link.
I rebooted the manager, but the whole Data Center was unresponsive and there was no way to bring the DC back. I also rebooted the hosts and the SAN bay, without any change.

I had a backup made by my new AcronisBackupAdvanced solution, but unfortunately I also hit an issue (not known to Acronis) when restoring the VM. The Acronis team worked on the subject from March 12th without finding a solution, until I found a workaround last Friday, March 27th, and managed to restore the Manager from the March 10th backup.

So during these long days of waiting, I tried some solutions to recover my Data Center; maybe I did some wrong things...
I first tried to update oVirt from 3.5.0 to 3.5.1.1, without any success, so I decided to build a new manager from scratch, using a clone of my oVirt Manager, to see if I could use the "Import Domain" option. I tried it on two Storage Domain volumes: VOL-UNC-NAS-01 and VOL-UNC-PROD-02.
The import itself worked, but no VMs were visible afterwards.
I was already in touch with Maor Lipchuk, who was helping me, but I couldn't retrieve any VMs.

On Friday, March 27th, I found the workaround to restore my Acronis backup, and after some work and a reinstallation of the hosts, I finally got my Data Center back, with the Storage Domains UP (all of them at that time) and all the VMs listed.
When I tried to start the VMs, only 4 of them came UP: those stored on the volume VOL-UNC-PROD-01, on which I had not tried the "Import Domain" option. I deduced that the problem came from the two SDs VOL-UNC-NAS-01 and VOL-UNC-PROD-02.
I decided to put them in maintenance in order to activate or detach them, but since then the two volumes have stayed in Maintenance mode, with no way to change their state.

This is where we are now; I hope that you'll be able to find a solution.
Anyway, the mystery remains: why did the Data Center go down? I can understand that the Manager had a problem, but why did all the VMs go down at the same time, without a way to recover them?
Now, the most important thing is to recover my Data Center, but it will also be very important to find the cause of the disaster, as it could compromise my private cloud project in my company.

I hope I have been clear and complete enough for all of you, but don't hesitate to contact me if you have any questions.

Thanks a lot again for your help






Alain VONDRA
Information Systems Operations Officer
Administrative and Finance Department
+33 1 44 39 77 76
UNICEF France
3 rue Duguay Trouin  75006 PARIS
www.unicef.fr




-----Original Message-----
From: Adam Litke [mailto:alitke at redhat.com]
Sent: Wednesday, April 1, 2015 17:06
To: VONDRA Alain
Cc: Elad Ben Aharon; users at ovirt.org; Federico Simoncelli; Maor Lipchuk
Subject: Re: [ovirt-users] Storage domain not in pool issue

On 31/03/15 08:43 +0000, VONDRA Alain wrote:
>Hi,
>Here are the logs.
>Thanks

Federico, Maor: tldr; Can you offer some advice for recovering this block SD after a DC disaster?

Hi Alain,

After looking at your logs, it's clear that the metadata on the storage domain itself says that the domain is attached to pool
c58a44b1-1c98-450e-97e1-3347eeb28f86 while engine thinks the domain is attached to pool f422de63-8869-41ef-a782-8b0c9ee03c41.
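To make the mismatch concrete, here is a minimal read-only sketch in Python. It assumes the domain metadata has already been dumped as plain KEY=VALUE text and that the pool reference sits under a key named POOL_UUID; both the dump format and the key name are assumptions for illustration, and the function names are hypothetical, not part of vdsm or engine. It only compares values and changes nothing on the storage.

```python
def parse_metadata(text):
    """Parse KEY=VALUE lines into a dict, skipping blank or malformed lines."""
    md = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        md[key.strip()] = value.strip()
    return md


def pool_mismatch(metadata_text, engine_pool_uuid, key="POOL_UUID"):
    """Return (on_disk, engine) if the two pool IDs differ, else None.

    `key` is an assumed metadata key name, used here for illustration only.
    """
    on_disk = parse_metadata(metadata_text).get(key)
    if on_disk == engine_pool_uuid:
        return None
    return (on_disk, engine_pool_uuid)


# Hypothetical metadata dump; only the two pool UUIDs come from this thread.
sample = "POOL_UUID=c58a44b1-1c98-450e-97e1-3347eeb28f86\n"
engine_pool = "f422de63-8869-41ef-a782-8b0c9ee03c41"

print(pool_mismatch(sample, engine_pool))
```

A non-None result means the storage and the engine disagree about which pool owns the domain, which is the situation seen in the logs.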

Can you please explain the process you used to recover from your datacenter disaster?  My guess is you:
  1. Reinstalled the engine host with a blank oVirt DB
  2. Created a new data center
  3. Created a new master domain
  4. Attached some storage domains which were not attached at the time
     of your previous disaster
  5. Tried to attach sd:d7b9d7cc-f7d6-43c7-ae13-e720951657c9 which was
     attached to your old storage pool at the time of the disaster.

#5 failed because the metadata on the storage shows the old storage pool.  At this point I see two possible options to recover your storage.  PLEASE DO NOT DO ANYTHING YET (until we confirm what the best approach for recovery will be).

Option 1: Use the new import storage domain feature to import this domain into your new datacenter.

Option 2: Modify the storage domain metadata to remove the reference to the old storage pool.

I am adding some other oVirt storage experts to the thread in order to offer you the best advice.  Federico, Maor: can you offer some expert advice on this matter?

I did notice this wiki page which talks about clearing the storage pool metadata from an export domain.  Since this SD is iSCSI, it will be a bit more difficult to manually edit the metadata, but I'd guess someone has a script or some instructions on how to do it.

--
Adam Litke

