Failed to activate Storage Domain --- ovirt 4.2

Hi,

Has anyone experienced the following issue with a storage domain?

Failed to activate Storage Domain cLUN-R940-DC2-dstore01 -- VDSM command ActivateStorageDomainVDS failed: Storage domain does not exist: (u'1b0ef853-fd71-45ea-8165-cc6047a267bc',)

Currently the storage domain is Inactive, yet strangely the VMs are running as normal. We can't manage or extend the volume size of this storage domain. pvscan shows:

[root@uk1-ion-ovm-18]# pvscan
  /dev/mapper/36000d310056978000000000000000014: Checksum error at offset 4397954425856
  Couldn't read volume group metadata from /dev/mapper/36000d310056978000000000000000014.
  Metadata location on /dev/mapper/36000d310056978000000000000000014 at 4397954425856 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36000d310056978000000000000000014
  Failed to scan VG from /dev/mapper/36000d310056978000000000000000014

I have tried the following steps:
1. Restarted ovirt-engine.service.
2. Tried to restore the metadata using vgcfgrestore, but it failed with the following error:

[root@uk1-ion-ovm-19 backup]# vgcfgrestore 36000d310056978000000000000000014
  Volume group 36000d310056978000000000000000014 has active volume: .
  WARNING: Found 1 active volume(s) in volume group "36000d310056978000000000000000014".
  Restoring VG with active LVs, may cause mismatch with its metadata.
Do you really want to proceed with restore of volume group "36000d310056978000000000000000014", while 1 volume(s) are active? [y/n]: y
  /dev/mapper/36000d310056978000000000000000014: Checksum error at offset 4397954425856
  Couldn't read volume group metadata from /dev/mapper/36000d310056978000000000000000014.
  Metadata location on /dev/mapper/36000d310056978000000000000000014 at 4397954425856 has invalid summary for VG.
  Failed to read metadata summary from /dev/mapper/36000d310056978000000000000000014
  Failed to scan VG from /dev/mapper/36000d310056978000000000000000014
  /etc/lvm/backup/36000d310056978000000000000000014: stat failed: No such file or directory
  Couldn't read volume group metadata from file.
  Failed to read VG 36000d310056978000000000000000014 from /etc/lvm/backup/36000d310056978000000000000000014
  Restore failed.

Please let me know if anyone knows a possible resolution.

-Aminur
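Since vgcfgrestore complains that /etc/lvm/backup/36000d310056978000000000000000014 does not exist, a useful first step is to see which metadata backups LVM has kept locally; a minimal sketch, assuming the same VG name used in the commands above:

    # list the configuration backups and archives LVM keeps on this host
    ls -l /etc/lvm/backup/ /etc/lvm/archive/

    # list the backup/archive files LVM knows about for this VG
    vgcfgrestore --list 36000d310056978000000000000000014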

Hi Aminur,

Can you please send the engine and vdsm versions?

On Fri, Jun 7, 2019 at 5:03 PM <aminur.rahman@iongroup.com> wrote:
> Hi,
>
> Has anyone experienced the following issue with a storage domain?
>
> Failed to activate Storage Domain cLUN-R940-DC2-dstore01 -- VDSM command ActivateStorageDomainVDS failed: Storage domain does not exist: (u'1b0ef853-fd71-45ea-8165-cc6047a267bc',)
--
Regards,
Eyal Shenitzky
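For the versions Eyal is asking about, querying the packages should be enough; a minimal sketch, assuming standard RPM-based installs:

    # on the engine machine
    rpm -q ovirt-engine

    # on each hypervisor host
    rpm -q vdsm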

On Fri, Jun 7, 2019 at 5:03 PM <aminur.rahman@iongroup.com> wrote:
> Hi,
>
> Has anyone experienced the following issue with a storage domain?
>
> Failed to activate Storage Domain cLUN-R940-DC2-dstore01 -- VDSM command ActivateStorageDomainVDS failed: Storage domain does not exist: (u'1b0ef853-fd71-45ea-8165-cc6047a267bc',)
>
> Currently the storage domain is Inactive, yet strangely the VMs are running as normal. We can't manage or extend the volume size of this storage domain. pvscan shows:
>
> [root@uk1-ion-ovm-18]# pvscan
>   /dev/mapper/36000d310056978000000000000000014: Checksum error at offset 4397954425856
>   Couldn't read volume group metadata from /dev/mapper/36000d310056978000000000000000014.
>   Metadata location on /dev/mapper/36000d310056978000000000000000014 at 4397954425856 has invalid summary for VG.
>   Failed to read metadata summary from /dev/mapper/36000d310056978000000000000000014
>   Failed to scan VG from /dev/mapper/36000d310056978000000000000000014

This looks like corrupted vg metadata.

> I have tried the following steps:
> 1. Restarted ovirt-engine.service.
> 2. Tried to restore the metadata using vgcfgrestore, but it failed with the following error:
>
> [root@uk1-ion-ovm-19 backup]# vgcfgrestore 36000d310056978000000000000000014
>   Volume group 36000d310056978000000000000000014 has active volume: .
>   WARNING: Found 1 active volume(s) in volume group "36000d310056978000000000000000014".
>   Restoring VG with active LVs, may cause mismatch with its metadata.
> Do you really want to proceed with restore of volume group "36000d310056978000000000000000014", while 1 volume(s) are active? [y/n]: y

This is not safe; you cannot fix the VG while it is being used by oVirt.

You need to migrate the running VMs to other storage, or shut down the VMs. Then deactivate this storage domain. Only then can you try to restore the VG.

>   /dev/mapper/36000d310056978000000000000000014: Checksum error at offset 4397954425856
>   Couldn't read volume group metadata from /dev/mapper/36000d310056978000000000000000014.
>   Metadata location on /dev/mapper/36000d310056978000000000000000014 at 4397954425856 has invalid summary for VG.
>   Failed to read metadata summary from /dev/mapper/36000d310056978000000000000000014
>   Failed to scan VG from /dev/mapper/36000d310056978000000000000000014
>   /etc/lvm/backup/36000d310056978000000000000000014: stat failed: No such file or directory

Looks like you don't have a backup on this host. You may have the most recent backup on another host.

>   Couldn't read volume group metadata from file.
>   Failed to read VG 36000d310056978000000000000000014 from /etc/lvm/backup/36000d310056978000000000000000014
>   Restore failed.
>
> Please let me know if anyone knows a possible resolution.

David, we keep 2 metadata copies on the first PV. Can we use one of the copies on the PV to restore the metadata to the last good state?

David, how do you suggest we proceed?

Nir
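To locate a usable backup, it is worth checking /etc/lvm/backup and /etc/lvm/archive on every host in the cluster, since the most recent copy may live on whichever host last changed the VG. A rough sketch of the sequence Nir describes, to be run on the SPM host only after the domain is out of use; the VG name is the one used earlier in the thread and the backup file path is a placeholder:

    # only after all VMs on the domain are migrated or shut down, and the
    # domain has been moved to Maintenance in the engine:

    # deactivate all logical volumes in the volume group
    vgchange -an 36000d310056978000000000000000014

    # restore from a known-good backup/archive file (placeholder path)
    vgcfgrestore -f /etc/lvm/backup/<most-recent-backup-file> 36000d310056978000000000000000014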

On Mon, Jun 10, 2019 at 10:59:43PM +0300, Nir Soffer wrote:
>> [root@uk1-ion-ovm-18]# pvscan
>>   /dev/mapper/36000d310056978000000000000000014: Checksum error at offset 4397954425856
>>   Couldn't read volume group metadata from /dev/mapper/36000d310056978000000000000000014.
>>   Metadata location on /dev/mapper/36000d310056978000000000000000014 at 4397954425856 has invalid summary for VG.
>>   Failed to read metadata summary from /dev/mapper/36000d310056978000000000000000014
>>   Failed to scan VG from /dev/mapper/36000d310056978000000000000000014
>
> This looks like corrupted vg metadata.

Yes, the second metadata area, at the end of the device, is corrupted; the first metadata area is probably OK. That version of lvm is not able to continue by just using the one good copy.

Last week I pushed out major changes to LVM upstream to be able to handle and repair most of these cases. So, one option is to build lvm from the upstream master branch, and check if that can read and repair this metadata.

> David, we keep 2 metadata copies on the first PV. Can we use one of the copies on the PV to restore the metadata to the last good state?

pvcreate with --restorefile and --uuid, and with the right backup metadata, could probably correct things, but experiment with some temporary PVs first.
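A sketch of the procedure David is pointing at, with placeholder names; as he says, rehearse it on scratch devices first, and only touch the real LUN once the storage domain is deactivated. The backup file and the PV UUID are assumptions: the UUID has to be the one recorded for this device in the pvN { id = "..." } stanza of the chosen /etc/lvm/archive or /etc/lvm/backup file:

    # rewrite the PV label using the UUID recorded in the backup file
    pvcreate --uuid "<pv-uuid-from-backup-file>" \
             --restorefile /etc/lvm/archive/<vg>_<NNNNN>.vg \
             /dev/mapper/36000d310056978000000000000000014

    # then restore the VG metadata itself from the same file
    vgcfgrestore -f /etc/lvm/archive/<vg>_<NNNNN>.vg <vg-name>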

On Mon, Jun 10, 2019 at 11:22 PM David Teigland <teigland@redhat.com> wrote:
> On Mon, Jun 10, 2019 at 10:59:43PM +0300, Nir Soffer wrote:
>>> [root@uk1-ion-ovm-18]# pvscan
>>>   /dev/mapper/36000d310056978000000000000000014: Checksum error at offset 4397954425856
>>>   Couldn't read volume group metadata from /dev/mapper/36000d310056978000000000000000014.
>>>   Metadata location on /dev/mapper/36000d310056978000000000000000014 at 4397954425856 has invalid summary for VG.
>>>   Failed to read metadata summary from /dev/mapper/36000d310056978000000000000000014
>>>   Failed to scan VG from /dev/mapper/36000d310056978000000000000000014
>>
>> This looks like corrupted vg metadata.
>
> Yes, the second metadata area, at the end of the device, is corrupted; the first metadata area is probably OK. That version of lvm is not able to continue by just using the one good copy.

Can we copy the first metadata area into the second metadata area?

> Last week I pushed out major changes to LVM upstream to be able to handle and repair most of these cases. So, one option is to build lvm from the upstream master branch, and check if that can read and repair this metadata.

This sounds pretty risky for production.

>> David, we keep 2 metadata copies on the first PV. Can we use one of the copies on the PV to restore the metadata to the last good state?
>
> pvcreate with --restorefile and --uuid, and with the right backup metadata

What would be the right backup metadata?

> could probably correct things, but experiment with some temporary PVs first.

Aminur, can you copy and compress the metadata areas, and share them somewhere?

To copy the first metadata area, use:

    dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md1 bs=128M count=1 skip=4096 iflag=skip_bytes

To copy the second metadata area, you need to know the size of the PV. On my setup with a 100G PV, I have 800 extents (128M each), and this works:

    dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md2 bs=128M count=1 skip=799

Then compress both:

    gzip md1 md2

Nir
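Nir's second dd assumes a 100G PV (800 extents of 128M, hence skip=799). For the real LUN the skip value has to be derived from the device size first; a minimal sketch, using the multipath device from the thread:

    # device size in bytes
    blockdev --getsize64 /dev/mapper/36000d310056978000000000000000014

    # with bs=128M, skip = (size_in_bytes / 134217728) - 1 reads the last 128M
    # when the size is an exact multiple of 128M, as in Nir's example; if it is
    # not, round down and use count=2 so the tail of the device is included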

Hi Nir,

Yes, the metadata was corrupted, but the VMs were running OK. This master storage domain increased its allocation significantly overnight, ran out of space, and went offline completely. The cluster was online and the VMs were running OK, but the affected storage domain went offline. I tried to increase the storage domain, but oVirt wasn't allowing the storage to be extended. Due to time constraints, I had to restore the storage domain using a Compellent snapshot.

However, we need to prevent this happening again when the master storage domain fills up. Currently, we have the following parameters set on the 5TB storage domain:

  ID: 0e1f2a5d-a548-476c-94bd-3ab3fe239926
  Size: 5119 GiB
  Available: 2361 GiB
  Used: 2758 GiB
  Allocated: 3104 GiB
  Over Allocation Ratio: 14%
  Images: 13
  Warning Low Space Indicator: 10% (511 GiB)
  Critical Space Action Blocker: 5 GiB

Please advise what actions we should take to prevent this from occurring again in the future.

Thanks,
Aminur Rahman
aminur.rahman@iongroup.com
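On the follow-up question about catching this earlier: the two per-domain thresholds listed above (Warning Low Space Indicator and Critical Space Action Blocker) can be raised in the Admin Portal via Storage > Domains > Manage Domain, and they should also be settable over the REST API. A hedged sketch of the API call, using the storage domain ID quoted above; the element names and the example values (15% warning, 100 GiB blocker) are assumptions to check against the API documentation for your engine version:

    # raise the warning threshold to 15% and the critical blocker to 100 GiB
    curl -k -u 'admin@internal:PASSWORD' -X PUT \
         -H 'Content-Type: application/xml' \
         -d '<storage_domain>
               <warning_low_space_indicator>15</warning_low_space_indicator>
               <critical_space_action_blocker>100</critical_space_action_blocker>
             </storage_domain>' \
         https://ENGINE_FQDN/ovirt-engine/api/storagedomains/0e1f2a5d-a548-476c-94bd-3ab3fe239926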
participants (5):
- Aminur Rahman
- aminur.rahman@iongroup.com
- David Teigland
- Eyal Shenitzky
- Nir Soffer