<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Feb 22, 2017 at 9:32 AM, Nir Soffer <span dir="ltr"><<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Wed, Feb 22, 2017 at 10:31 AM, Nir Soffer <<a href="mailto:nsoffer@redhat.com">nsoffer@redhat.com</a>> wrote:<br><br>
><br>
> This means that sanlock could not initialize a lease in the new volume created<br>
> for the snapshot.<br>
><br>
> Can you attach sanlock.log?<br>
<br>
</div></div>Found it in your next message<br>
<div class="HOEnZb"><div class="h5"><br></div></div></blockquote><div><br></div><div>OK.</div><div>Just to recap what happened from a physical point of view:</div><div><br></div><div>- apparently I had an array of disks with no more spare disks and on this array was the LUN composing the disk storage domain.</div><div>So I was in involved in moving disks of the impacted storage domain and then removal of storage domain itself, so that we can remove the logical array on storage</div><div>This is a test storage system without support so at the moment I had no more spare disks on it</div><div><br></div><div>- actually there was another disk problem with the array, generating loss of data because of no more spare available at that time</div><div><br></div><div>- No evidence of error at VM OS level and at storage domain level</div><div><br></div><div>- But probably the 2 operations:</div><div>1) move disk</div><div>2) create snapshot of the VM containing the disk</div><div>could not complete due to this low level problem</div><div><br></div><div>It should be nice to find an evidence to this. Storage domain didn't go offline BTW</div><div><br></div><div>- I got confirmation of the loss of data this way:</div><div>The original disk of the VM, inside the VM, was a PV of a VG</div><div>I added a disk (on another storage domain) to the VM, made it a PV and added to the original VG</div><div>Tried pvmove from source disk to new disk, but it reached about 47% and then stopped/failed, pausing the VM.</div><div>I could start again the VM but as soon as the pvmove continued, the VM came back to paused state.</div><div>So I powered off the VM and was able to detach/delete the corrupted disk and then remove the storage domain (see other thread opened yesterday)</div><div><br></div><div>I then managed to recover the now corrupted VG and restore from backup the data contained in original fs.</div><div><br></div><div>So the original problem was low level error of storage.</div><div>If can be of help to narrow down oVirt behavior in this case scenario I can provide further logs from VM OS or from hosts/engine.</div><div>Let me know.</div><div><br></div><div>Some questions:</div><div>- how is it managed the reaction of putting VM in paused mode due to I/O error as in this case? Can I in some way manage to keep VM on a ndlet it generate errors as in real physical server or not? </div><div>- Why I didn't get any message at storage domain level but only at VM disk level?</div><div><br></div><div>Thanks for the given help</div><div>Gianluca</div><div><br></div><div> </div></div></div></div>