On Wed, Feb 22, 2017 at 9:32 AM, Nir Soffer <nsoffer(a)redhat.com> wrote:
On Wed, Feb 22, 2017 at 10:31 AM, Nir Soffer
<nsoffer(a)redhat.com> wrote:
>
> This means that sanlock could not initialize a lease in the new volume
> created for the snapshot.
>
> Can you attach sanlock.log?
Found it in your next message
OK.
Just to recap what happened from a physical point of view:
- apparently I had a disk array with no spare disks left, and the LUN
backing the storage domain was on this array.
So I was involved in moving the disks off the impacted storage domain and
then removing the storage domain itself, so that we could remove the
logical array on the storage.
This is a test storage system without support, so at the moment I had no
more spare disks for it
- in the meantime another disk in the array failed, causing data loss
because no spares were available at that time
- there was no evidence of errors at the VM OS level or at the storage
domain level
- but probably the 2 operations:
1) moving the disk
2) creating a snapshot of the VM containing the disk
could not complete due to this low-level problem.
It would be nice to find evidence of this. The storage domain didn't go
offline, BTW
- I got confirmation of the data loss this way:
The original disk of the VM was, inside the VM, a PV of a VG.
I added a disk (on another storage domain) to the VM, made it a PV and
added it to the original VG.
I tried a pvmove from the source disk to the new disk, but it reached
about 47% and then stopped/failed, pausing the VM.
I could start the VM again, but as soon as the pvmove continued, the VM
went back to the paused state.
So I powered off the VM and was able to detach/delete the corrupted disk
and then remove the storage domain (see the other thread I opened
yesterday).
I then managed to recover the now-corrupted VG and restore from backup
the data contained in the original filesystem; the in-guest LVM steps
were roughly as sketched below.
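For reference, a rough sketch of the in-guest LVM commands (device and VG
names here are just placeholders, not the real ones):

  pvcreate /dev/vdb                 # initialize the newly added disk as a PV
  vgextend datavg /dev/vdb          # add it to the existing VG
  pvmove /dev/vda /dev/vdb          # move extents off the failing PV
                                    # (this is the step that stalled at ~47%)
  vgreduce --removemissing datavg   # drop the bad PV from the VG afterwards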
So the original problem was a low-level storage error.
If it can help to narrow down oVirt's behavior in this scenario, I can
provide further logs from the VM OS or from the hosts/engine.
Let me know.
Some questions:
- how is the reaction of pausing the VM on I/O errors managed in a case
like this? Can I somehow configure it to keep the VM running and let it
see the errors, as on a real physical server, or not? (see the sketch
after these questions for what I mean)
- why didn't I get any message at the storage domain level, but only at
the VM disk level?
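To clarify the first question, what I have in mind is something like
libvirt's per-disk error policy. As a rough, hypothetical check on the
hypervisor host ("myvm" is just a placeholder, and I don't know how oVirt
actually sets this):

  virsh -r dumpxml myvm | grep error_policy
  # a driver line such as
  #   <driver name='qemu' type='raw' error_policy='stop'/>
  # would mean qemu pauses the guest on I/O errors, while
  # error_policy='report' should pass the error up to the guest OS.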
Thanks for the help so far
Gianluca