[ovirt-users] [SOLVED] Working but unstable storage domains

Paul Heinlein heinlein at madboa.com
Wed Jun 18 17:16:13 UTC 2014


On Tue, 10 Jun 2014, Paul Heinlein wrote:

> I'm running oVirt Engine 3.2.0-2.fc18 (which I know is out of date) 
> on a dedicated physical host; we have 12 hosts split between two 
> clusters and nine storage domains, all NFS.
>
> Late last week, a VM that in the scope of our clusters consumes a 
> lot of resources failed in migration. Since then the storage domains 
> have from the engine's point of view been going up and down (though 
> the underlying NFS exports are fine). Key symptoms from the oVirt 
> Manager:
>
>  * two of the storage domains are always marked as having type of
>    "Data (Master)" when historically only one was;
>
>  * the Manager reports "Storage Pool Manager runs on $host" then
>    "Sync Error on Master Domain..." then "Reconstruct Master Domain
>    ...completed" then "Data Center is being initialized" over and
>    over and over again.
>
> The Sync Error messages indicate "$pool is marked as Master in oVirt 
> Engine database but not on the Storage side. Please consult with 
> Support on how to fix this issue." Note that $pool changes between 
> the various domains that get marked as Data (Master).

For the record, here's what I ended up doing to solve the issue. It 
took quite a while to identify the moving parts, but the actual fix 
wasn't terribly hard.

The secondary goal was to keep all VMs running during the process, 
since we had some ongoing benchmarking we didn't want to interrupt.

I'll say ahead of time that I didn't like mucking around in postgres, 
but all my attempts to fix the issue without doing so came to naught.

1. Disable power management on all Hosts: necessary because Hosts
    would be rebooted if the Engine couldn't contact vdsmd after
    a relatively short period of time.

2. Shut down ovirt-engine.

3. Shut down vdsmd on all Hosts.

4. Query PostgreSQL for the current master version:

SELECT master_domain_version FROM storage_pool;

5. Determine which storage pool has a metadata file corresponding to
    that master version number; it will become the new master.

find /rhev/... -type f -name metadata -ls -exec grep ^MASTER {} \;
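With nine domains the raw find output is a bit noisy. A rough sketch of how the comparison in step 5 could be scripted, assuming the metadata files are plain KEY=value text with MASTER_VERSION and ROLE lines; list_master_versions is a made-up helper name, not part of oVirt or vdsm:

```shell
# Hypothetical helper: print "<master_version> <role> <path>" for every
# file named "metadata" below the directory given as $1, highest master
# version first. Assumes KEY=value lines; a missing key shows up as "?".
list_master_versions() {
    find "$1" -type f -name metadata | while IFS= read -r f; do
        version=$(sed -n 's/^MASTER_VERSION=//p' "$f" | head -n 1)
        role=$(sed -n 's/^ROLE=//p' "$f" | head -n 1)
        printf '%s %s %s\n' "${version:-?}" "${role:-?}" "$f"
    done | sort -n -r
}
```

Call it with the same mount root you would hand to find; the top line is the candidate whose version should match what the database reported in step 4.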

6. Back up all dom_md directories (with metadata files) and master
    directories (with *.ovf files).
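One way the backup in step 6 might be scripted is sketched below. It assumes the directories are literally named dom_md and master and that the destination lives outside the tree being copied; backup_domain_dirs is a hypothetical name, not an oVirt tool:

```shell
# Hypothetical helper: copy every dom_md and master directory found below
# $1 into $2, recreating the original path underneath $2 so that copies
# from different domains cannot collide.
# Assumption: $2 is not located inside $1.
backup_domain_dirs() {
    mkdir -p "$2" || return 1
    find "$1" -type d \( -name dom_md -o -name master \) -prune |
    while IFS= read -r dir; do
        target="$2/$(dirname "$dir")"
        mkdir -p "$target" && cp -a "$dir" "$target/"
    done
}
```

The -prune keeps find from descending into the directories it has already matched, so only metadata and OVF data get copied and the disk images are left alone.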

7. Run database queries to set the pool with the highest master
    version number to be Data (Master).

-- make all Data (master) domains into regular Data domains
UPDATE storage_domain_static
SET storage_domain_type = 1
WHERE storage_domain_type = 0;

-- promote one Data domain to be a Data (master)
-- (type 0 is Data (master); type 1 is regular Data, as in the query above)
UPDATE storage_domain_static
SET
   storage_domain_type = 0,
   _update_date = LOCALTIMESTAMP,
   last_time_used_as_master = extract(epoch from now())::bigint
WHERE
   storage_name = '$pool';

8. Edit metadata files in non-master pools to read "ROLE=Regular"
    and "MASTER_VERSION=0"; also remove all _SHA_CKSUM entries.
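The edit in step 8 can be done by hand, but a sketch like the following may save some typing. It assumes each metadata file has one KEY=value entry per line; demote_metadata is a hypothetical name, not a vdsm command:

```shell
# Hypothetical helper: rewrite the metadata file given as $1 so that it
# reads ROLE=Regular and MASTER_VERSION=0, and drop any _SHA_CKSUM line.
# The untouched original is kept next to it as "<file>.bak". Writing back
# through a redirect keeps the file's existing ownership and permissions.
demote_metadata() {
    cp -p "$1" "$1.bak" &&
    sed -e 's/^ROLE=.*/ROLE=Regular/' \
        -e 's/^MASTER_VERSION=.*/MASTER_VERSION=0/' \
        -e '/^_SHA_CKSUM/d' "$1.bak" > "$1"
}
```

Run it only against the metadata files of the non-master pools; the file belonging to the new master should be left as it is.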

9. Restart Engine.

10. Restart vdsmd on high-priority SPM Host.

11. Watch that the storage pools come up cleanly and stay up.

12. Restart vdsmd on all other Hosts.

13. Enjoy adult beverage.

-- 
Paul Heinlein
heinlein at madboa.com
45°38' N, 122°6' W

