Hi Nir
Yes, the metadata was corrupted, but the VMs were running OK. This master storage domain
increased its allocation significantly overnight, hit its space limit, and went
completely offline. The cluster stayed online and the VMs kept running, but the affected
storage domain went offline. I tried to extend the storage domain, but oVirt would not
allow me to expand the storage.
Due to time constraints, I had to restore the storage domain from a Compellent snapshot.
However, we need to prevent this from happening again when the master storage domain fills
up. Currently, we have the following parameters set on the 5 TB storage domain:
ID: 0e1f2a5d-a548-476c-94bd-3ab3fe239926
Size: 5119 GiB
Available: 2361 GiB
Used: 2758 GiB
Allocated: 3104 GiB
Over Allocation Ratio: 14%
Images: 13
Warning Low Space Indicator: 10% (511 GiB)
Critical Space Action Blocker: 5 GiB
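The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch in shell that uses only the numbers quoted in this message. The helper names are made up for illustration, this is not an oVirt tool, and treating "below the threshold" as a strict less-than comparison is an assumption.

```shell
#!/bin/bash
# Figures copied from the storage domain listing above (all in GiB).
size=5119
available=2361
used=2758
warn_pct=10      # Warning Low Space Indicator
critical_gib=5   # Critical Space Action Blocker

# Hypothetical helper: warning threshold in GiB (integer division).
warn_threshold_gib() { echo $(( $1 * $2 / 100 )); }

# Hypothetical helper: classify free space against both thresholds.
# Assumption: a threshold triggers when free space is strictly below it.
space_state() {
    local avail=$1 warn=$2 crit=$3
    if [ "$avail" -lt "$crit" ]; then
        echo critical
    elif [ "$avail" -lt "$warn" ]; then
        echo warning
    else
        echo ok
    fi
}

warn_gib=$(warn_threshold_gib "$size" "$warn_pct")
echo "used + available = $(( used + available )) GiB (size: $size GiB)"
echo "warning threshold = $warn_gib GiB"
echo "state now: $(space_state "$available" "$warn_gib" "$critical_gib")"
```

With these settings the critical blocker is 5 GiB out of 5119 GiB, roughly 0.1% of the domain, so there is very little headroom between the blocker and a completely full domain.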
Please advise what action we need to take to prevent this from occurring again
in the future.
Thanks
Aminur Rahman
aminur.rahman@iongroup.com
t +44 20 7398 0243
m +44 7825 780697
iongroup.com
From: Nir Soffer <nsoffer(a)redhat.com>
Sent: 10 June 2019 22:07
To: David Teigland <teigland(a)redhat.com>
Cc: Aminur Rahman <aminur.rahman(a)iongroup.com>; users <users(a)ovirt.org>
Subject: Re: [ovirt-users] Failed to activate Storage Domain --- ovirt 4.2
On Mon, Jun 10, 2019 at 11:22 PM David Teigland
<teigland@redhat.com> wrote:
On Mon, Jun 10, 2019 at 10:59:43PM +0300, Nir Soffer wrote:
> [root@uk1-ion-ovm-18 pvscan
> /dev/mapper/36000d310056978000000000000000014: Checksum error at offset
> 4397954425856
> Couldn't read volume group metadata from
> /dev/mapper/36000d310056978000000000000000014.
> Metadata location on /dev/mapper/36000d310056978000000000000000014 at
> 4397954425856 has invalid summary for VG.
> Failed to read metadata summary from
> /dev/mapper/36000d310056978000000000000000014
> Failed to scan VG from /dev/mapper/36000d310056978000000000000000014
This looks like corrupted vg metadata.
Yes, the second metadata area, at the end of the device is corrupted; the
first metadata area is probably ok. That version of lvm is not able to
continue by just using the one good copy.
Can we copy the first metadata area into the second metadata area?
Last week I pushed out major changes to LVM upstream to be able to handle
and repair most of these cases. So, one option is to build lvm from the
upstream master branch, and check if that can read and repair this
metadata.
This sounds pretty risky for production.
David, we keep 2 metadata copies on the first PV. Can we use one of the
copies on the PV to restore the metadata to the last good state?
pvcreate with --restorefile and --uuid, and with the right backup metadata
What would be the right backup metadata?
could probably correct things, but experiment with some temporary PVs
first.
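The pvcreate suggestion could be tried on a scratch device along the following lines. This is only a dry-run sketch that prints the commands instead of running them. The device path, PV UUID, backup file name and VG name are all placeholders, and which backup file is the right one is still the open question above.

```shell
#!/bin/bash
# Print (do not run) the commands for recreating a PV header from a
# metadata backup file and then restoring the VG metadata from it.
# All four arguments are placeholders supplied by the caller.
print_restore_commands() {
    local dev=$1 pv_uuid=$2 backup=$3 vg=$4
    echo "pvcreate --uuid $pv_uuid --restorefile $backup $dev"
    echo "vgcfgrestore --file $backup $vg"
}

# Example with made-up values for a temporary loop device.
print_restore_commands /dev/loop0 PLACEHOLDER-PV-UUID backup.vg scratch_vg
```

Printing the commands first makes it easy to review them before anything touches a device, which fits the advice to experiment on temporary PVs.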
Aminur, can you copy and compress the metadata areas, and share them somewhere?
To copy the first metadata area, use:
dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md1 bs=128M count=1 skip=4096 iflag=skip_bytes
To copy the second metadata area, you need to know the size of the PV. On my setup with a
100G PV, I have 800 extents (128M each), and this works:
dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md2 bs=128M count=1 skip=799
gzip md1 md2
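The skip value for the second metadata area follows from the PV size. Here is a minimal sketch of that arithmetic, assuming 128 MiB extents as in the commands above. The function names are hypothetical, and the size in bytes would have to come from the real device (for example with blockdev --getsize64).

```shell
#!/bin/bash
extent_bytes=$(( 128 * 1024 * 1024 ))   # 128 MiB, matching bs=128M above

# Hypothetical helper: index of the last full 128 MiB block on a device
# of the given size in bytes. This is the value to pass to dd as skip=.
last_extent_index() { echo $(( $1 / extent_bytes - 1 )); }

# Hypothetical helper: which 128 MiB block contains a given byte offset.
block_index_of_offset() { echo $(( $1 / extent_bytes )); }

# A 100 GiB PV has 800 blocks of 128 MiB, so the last one is index 799.
last_extent_index $(( 100 * 1024 * 1024 * 1024 ))

# Byte offset reported in the pvscan checksum error.
block_index_of_offset 4397954425856
```

The second call gives 32767. If the affected PV is exactly 4 TiB (32768 blocks of 128 MiB), the checksum error would sit in the last block, which would correspond to skip=32767 on that device. That is an inference from the offset alone, so check the real PV size before copying anything.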
Nir