=================================
=new findings/ it's working now =
=================================
I cloned the faulty system in order to play with it, and the cloned VM *boots
with no problem at all*. So there's clearly an issue with moving a disk that
has a snapshot from an NFS to an iSCSI storage domain. I have both VMs now,
and if anyone is interested I can keep troubleshooting.
If not, I would like to give a very special ***THANK YOU*** to the whole
Red Hat + oVirt team for their daily outstanding work and their excellent
products.
Thanks for your time!
JP
qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/36678881-9686-48b5-b39c-16fafece5c5a/1bbf1375-7469-426a-b68d-adbc3446d51e
image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/36678881-9686-48b5-b39c-16fafece5c5a/1bbf1375-7469-426a-b68d-adbc3446d51e
file format: raw
virtual size: 1.1T (1181116006400 bytes)
disk size: 0
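
In case it helps anyone comparing layers the same way, here is a rough Python sketch that parses the human-readable `qemu-img info --backing-chain` text and flags layers whose virtual size differs from their backing file. The helper names (`parse_chain`, `size_mismatches`) are made up for illustration, and the sample text is the output quoted further down in this thread with the paths shortened to the volume UUIDs. For anything serious, `qemu-img info --output=json` is probably a more robust thing to parse.

```python
import re

# Output quoted in this thread, trimmed to the relevant fields
# (paths shortened to volume UUIDs for readability).
SAMPLE = """\
image: 86a6fdb9-b9a4-4b78-8e3d-940f83cedc5a
file format: qcow2
virtual size: 1.1T (1181116006400 bytes)
disk size: 0
backing file: 52532d05-970e-4643-9774-96c31796062c

image: 52532d05-970e-4643-9774-96c31796062c
file format: raw
virtual size: 700G (751619276800 bytes)
disk size: 0
"""


def parse_chain(text):
    """Return one dict per image in the chain, top layer first."""
    images = []
    # Images in the chain are separated by a blank line.
    for block in re.split(r"\n\s*\n", text.strip()):
        info = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                info[key.strip()] = value.strip()
        size = re.search(r"\((\d+) bytes\)", info.get("virtual size", ""))
        images.append({
            "image": info.get("image"),
            "format": info.get("file format"),
            "virtual_bytes": int(size.group(1)) if size else None,
        })
    return images


def size_mismatches(chain):
    """Return (overlay, backing) pairs whose virtual sizes differ."""
    return [(top, base) for top, base in zip(chain, chain[1:])
            if top["virtual_bytes"] != base["virtual_bytes"]]


chain = parse_chain(SAMPLE)
for top, base in size_mismatches(chain):
    diff_gib = (top["virtual_bytes"] - base["virtual_bytes"]) / 2**30
    print(f"{top['format']} layer is {diff_gib:.0f} GiB larger "
          f"than its {base['format']} base")
```

On the sample above this reports the qcow2 layer as 400 GiB larger than the raw base, which matches the 700G vs 1.1T figures in the output.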
2018-05-14 14:58 GMT-03:00 Juan Pablo <pablo.localhost(a)gmail.com>:
I'm still wondering why oVirt moved an image and left it with errors but
reported the move as successful in the GUI. If you think it's related to
the snapshot, that's really strange, as it's the first time I've seen this
behavior; I never had an issue like this when moving images with snapshots.
On the other hand, if it moved the 700G image (and not the extra 400G), why
is it inconsistent? Shouldn't it be old but nevertheless in good shape? Is
there anything else to try?
thanks in advance!
regards,
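
On the size question: as Nir explains in the quoted reply below, a raw volume has no header, so the guest sees the size of the image itself, while a qcow2 volume carries its virtual size in the header. A minimal Python sketch of reading that field follows, assuming the standard qcow2 header layout (big-endian; magic, version, backing file offset, backing file name length, cluster bits, then the 8-byte size at offset 24). The function name is hypothetical and the header here is synthetic, built only to illustrate the layout; it is not read from the actual volume.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"

# First 32 bytes of a qcow2 header, big-endian:
# magic (4s), version (I), backing_file_offset (Q),
# backing_file_size (I), cluster_bits (I), size (Q).
HEADER = struct.Struct(">4sIQIIQ")


def qcow2_virtual_size(header_bytes):
    """Return the virtual disk size in bytes stored in a qcow2 header."""
    magic, _version, _boff, _blen, _cluster_bits, size = \
        HEADER.unpack_from(header_bytes)
    if magic != QCOW2_MAGIC:
        # A raw volume has no header at all; its size is just the
        # length of the file or device.
        raise ValueError("not a qcow2 image")
    return size


# Synthetic header for illustration: version 3 (shown by qemu-img as
# compat 1.1), 64 KiB clusters, and the 1.1T size reported in this thread.
fake_header = HEADER.pack(QCOW2_MAGIC, 3, 0, 0, 16, 1181116006400)
print(qcow2_virtual_size(fake_header) // 2**30, "GiB")
```

So a qcow2 layer created after the disk was extended would carry the new size in its own header, independently of how large the raw base underneath actually is.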
2018-05-14 12:01 GMT-03:00 Juan Pablo <pablo.localhost(a)gmail.com>:
> Hi Nir, thanks for the reply, here's the output:
>
> *(BASE)*
> [root@node02 ~]# qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
> file format: raw
> virtual size: 700G (751619276800 bytes)
> disk size: 0
>
>
>
> *(with Snapshot)*
> [root@node02 ~]# qemu-img info --backing-chain /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/86a6fdb9-b9a4-4b78-8e3d-940f83cedc5a
> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/86a6fdb9-b9a4-4b78-8e3d-940f83cedc5a
> file format: qcow2
> virtual size: 1.1T (1181116006400 bytes)
> disk size: 0
> cluster_size: 65536
> backing file: 52532d05-970e-4643-9774-96c31796062c (actual path: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c)
> backing file format: raw
> Format specific information:
> compat: 1.1
> lazy refcounts: false
> refcount bits: 16
> corrupt: false
>
> image: /rhev/data-center/mnt/blockSD/cec63cf0-9311-488d-b1fa-99c4405e8379/images/65ec515e-0aae-4fe6-a561-387929c7fb4d/52532d05-970e-4643-9774-96c31796062c
> file format: raw
> virtual size: 700G (751619276800 bytes)
> disk size: 0
>
> I really appreciate your time helping,
> regards,
>
>
> 2018-05-14 11:36 GMT-03:00 Nir Soffer <nsoffer(a)redhat.com>:
>
>> On Mon, May 14, 2018 at 5:19 PM Juan Pablo <pablo.localhost(a)gmail.com>
>> wrote:
>>
>>> OK, so I'm confirming that the image is wrong somehow:
>>> with no snapshot, the disk size reported inside the VM is 750G.
>>> with a snapshot, the disk size reported inside the VM is 1100G.
>>> Neither shows any partitions, so I guess oVirt migrated the structure
>>> of the 750G disk onto a 1100G disk. Any ideas on how to troubleshoot this
>>> and see if there's data to recover?
>>>
>>
>> Maybe you resized the disk after making a snapshot?
>>
>> If the base is raw, the size seen by the guest is the size of the image.
>>
>> The snapshot is qcow2, the size seen by the guest is the size saved in
>> the qcow2 header.
>>
>> Can you share the output of:
>>
>> qemu-img info --backing-chain /path/to/snapshot
>>
>> And:
>>
>> qemu-img info --backing-chain /path/to/base
>>
>> You can see the path in the vm xml, either in vdsm.log, or using virsh:
>>
>> virsh -r list
>> virsh -r dumpxml vm-id
>>
>> Nir
>>
>>
>>>
>>> regards,
>>>
>>>
>>> 2018-05-13 15:25 GMT-03:00 Juan Pablo <pablo.localhost(a)gmail.com>:
>>>
>>>> 2 clues:
>>>> -the original size of the disk was 750G and was extended a month ago
>>>> to 1100G. The System rebooted fine several times, and took the new size
>>>> with no problems.
>>>>
>>>> -I ran fdisk from a CentOS 7 rescue CD and '/dev/vda' reported 750G.
>>>> Then I took a snapshot of the disk to play with recovery tools, and now
>>>> fdisk reports 1100G... ¬¬
>>>>
>>>> So my guess is that the extend and the later migration to a different
>>>> storage domain caused the issue.
>>>> I'm currently running testdisk to see if there's any partition to
>>>> recover.
>>>>
>>>> regards,
>>>>
>>>> 2018-05-13 12:31 GMT-03:00 Juan Pablo <pablo.localhost(a)gmail.com>:
>>>>
>>>>> I removed the auto-snapshot and still no luck: no bootable disk
>>>>> found. =(
>>>>> Ideas?
>>>>>
>>>>>
>>>>> 2018-05-13 12:26 GMT-03:00 Juan Pablo <pablo.localhost(a)gmail.com>:
>>>>>
>>>>>> Benny, thanks for your reply.
>>>>>> OK, so the first step is removing the snapshot.
>>>>>> Then what do you suggest?
>>>>>>
>>>>>>
>>>>>> 2018-05-12 15:23 GMT-03:00 Nir Soffer <nsoffer(a)redhat.com>:
>>>>>>
>>>>>>> On Sat, 12 May 2018, 11:32 Benny Zlotnik, <bzlotnik(a)redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Using the auto-generated snapshot is generally a bad idea as it's
>>>>>>>> inconsistent,
>>>>>>>>
>>>>>>>
>>>>>>> What do you mean by inconsistent?
>>>>>>>
>>>>>>>
>>>>>>> you should remove it before moving further
>>>>>>>>
>>>>>>>> On Fri, May 11, 2018 at 7:25 PM, Juan Pablo <
>>>>>>>> pablo.localhost(a)gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I rebooted it with no luck, then I used the auto-generated
>>>>>>>>> snapshot, same result.
>>>>>>>>> attaching the logs in gdrive
>>>>>>>>>
>>>>>>>>> thanks in advance
>>>>>>>>>
>>>>>>>>> 2018-05-11 12:50 GMT-03:00 Benny Zlotnik <bzlotnik(a)redhat.com>:
>>>>>>>>>
>>>>>>>>>> I see here a failed attempt:
>>>>>>>>>> 2018-05-09 16:00:20,129-03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-67) [bd8eeb1d-f49a-4f91-a521-e0f31b4a7cbd] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>
>>>>>>>>>> Then another:
>>>>>>>>>> 2018-05-09 16:15:06,998-03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-34) [] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>
>>>>>>>>>> Here I see a successful attempt:
>>>>>>>>>> 2018-05-09 21:58:42,628-03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-50) [940b051c-8c63-4711-baf9-f3520bb2b825] EVENT_ID: USER_MOVED_DISK(2,008), User admin@internal-authz moving disk mail02-int_Disk1 to domain 2penLA.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then, in the last attempt I see the attempt was successful but
>>>>>>>>>> live merge failed:
>>>>>>>>>> 2018-05-11 03:37:59,509-03 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-2) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Failed to live merge, still in volume chain: [5d9d2958-96bc-49fa-9100-2f33a3ba737f, 52532d05-970e-4643-9774-96c31796062c]
>>>>>>>>>> 2018-05-11 03:38:01,495-03 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-51) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command 'LiveMigrateDisk' (id: '115fc375-6018-4d59-b9f2-51ee05ca49f8') waiting on child command id: '26bc52a4-4509-4577-b342-44a679bc628f' type:'RemoveSnapshot' to complete
>>>>>>>>>> 2018-05-11 03:38:01,501-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-51) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command id: '4936d196-a891-4484-9cf5-fceaafbf3364 failed child command status for step 'MERGE_STATUS'
>>>>>>>>>> 2018-05-11 03:38:01,501-03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-51) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command 'RemoveSnapshotSingleDiskLive' id: '4936d196-a891-4484-9cf5-fceaafbf3364' child commands '[8da5f261-7edd-4930-8d9d-d34f232d84b3, 1c320f4b-7296-43c4-a3e6-8a868e23fc35, a0e9e70c-cd65-4dfb-bd00-076c4e99556a]' executions were completed, status 'FAILED'
>>>>>>>>>> 2018-05-11 03:38:02,513-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Merging of snapshot '319e8bbb-9efe-4de4-a9a6-862e3deb891f' images '52532d05-970e-4643-9774-96c31796062c'..'5d9d2958-96bc-49fa-9100-2f33a3ba737f' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
>>>>>>>>>> 2018-05-11 03:38:02,519-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-2) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' with failure.
>>>>>>>>>> 2018-05-11 03:38:03,530-03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Command 'RemoveSnapshot' id: '26bc52a4-4509-4577-b342-44a679bc628f' child commands '[4936d196-a891-4484-9cf5-fceaafbf3364]' executions were completed, status 'FAILED'
>>>>>>>>>> 2018-05-11 03:38:04,548-03 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
>>>>>>>>>> 2018-05-11 03:38:04,557-03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-66) [d5b7fdf5-9c37-4c1f-8543-a7bc75c993a5] Lock freed to object 'EngineLock:{exclusiveLocks='[4808bb70-c9cc-4286-aa39-16b5798213ac=LIVE_STORAGE_MIGRATION]', sharedLocks=''}'
>>>>>>>>>>
>>>>>>>>>> I do not see the merge attempt in the vdsm.log, so please send
>>>>>>>>>> vdsm logs for node02.phy.eze.ampgn.com.ar from that time.
>>>>>>>>>>
>>>>>>>>>> Also, did you use the auto-generated snapshot to start the vm?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 11, 2018 at 6:11 PM, Juan Pablo <
>>>>>>>>>> pablo.localhost(a)gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> after the xfs_repair, it says: sorry I could not find valid
>>>>>>>>>>> secondary superblock
>>>>>>>>>>>
>>>>>>>>>>> 2018-05-11 12:09 GMT-03:00 Juan Pablo <
>>>>>>>>>>> pablo.localhost(a)gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> hi,
>>>>>>>>>>>> Alias:
>>>>>>>>>>>> mail02-int_Disk1
>>>>>>>>>>>> Description:
>>>>>>>>>>>> ID:
>>>>>>>>>>>> 65ec515e-0aae-4fe6-a561-387929c7fb4d
>>>>>>>>>>>> Alignment:
>>>>>>>>>>>> Unknown
>>>>>>>>>>>> Disk Profile:
>>>>>>>>>>>> Wipe After Delete:
>>>>>>>>>>>> No
>>>>>>>>>>>>
>>>>>>>>>>>> that one
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-05-11 11:12 GMT-03:00 Benny Zlotnik <bzlotnik(a)redhat.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> I looked at the logs and I see some disks have moved
>>>>>>>>>>>>> successfully and some failed. Which disk is causing the problems?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 11, 2018 at 5:02 PM, Juan Pablo <
>>>>>>>>>>>>> pablo.localhost(a)gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, just sent you via drive the files. attaching some extra
>>>>>>>>>>>>>> info, thanks thanks and thanks :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> from inside the migrated vm I had the following attached
>>>>>>>>>>>>>> dmesg output before rebooting
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> regards and thanks again for the help,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2018-05-11 10:45 GMT-03:00 Benny Zlotnik <bzlotnik(a)redhat.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dropbox or google drive I guess. Also, can you attach
>>>>>>>>>>>>>>> engine.log?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, May 11, 2018 at 4:43 PM, Juan Pablo <
>>>>>>>>>>>>>>> pablo.localhost(a)gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> vdsm is too big for gmail ...any other way I can share it
>>>>>>>>>>>>>>>> with you?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>>>> From: Juan Pablo <pablo.localhost(a)gmail.com>
>>>>>>>>>>>>>>>> Date: 2018-05-11 10:40 GMT-03:00
>>>>>>>>>>>>>>>> Subject: Re: [ovirt-users] strange issue: vm lost info on disk
>>>>>>>>>>>>>>>> To: Benny Zlotnik <bzlotnik(a)redhat.com>
>>>>>>>>>>>>>>>> Cc: users <Users(a)ovirt.org>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Benny, thanks for your reply! It was a live migration.
>>>>>>>>>>>>>>>> Sorry, it was from NFS to iSCSI, not the other way around. I have
>>>>>>>>>>>>>>>> rebooted the VM into rescue mode and fdisk does not detect any
>>>>>>>>>>>>>>>> partitions. I'm running xfs_repair with -n and it found a
>>>>>>>>>>>>>>>> corrupted primary superblock; it's still running... (so...
>>>>>>>>>>>>>>>> maybe there's info on the disk?)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> attaching logs, let me know if those are the ones.
>>>>>>>>>>>>>>>> thanks again!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2018-05-11 9:45 GMT-03:00 Benny Zlotnik <bzlotnik(a)redhat.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you provide the logs? engine and vdsm.
>>>>>>>>>>>>>>>>> Did you perform a live migration (the VM is running) or
>>>>>>>>>>>>>>>>> cold?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, May 11, 2018 at 2:49 PM, Juan Pablo <
>>>>>>>>>>>>>>>>> pablo.localhost(a)gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi! I'm struggling with an ongoing problem:
>>>>>>>>>>>>>>>>>> after migrating a VM's disk from an iSCSI domain to an
>>>>>>>>>>>>>>>>>> NFS one, with oVirt reporting the migration as successful,
>>>>>>>>>>>>>>>>>> I see there's no data 'inside' the VM's disk. We never had
>>>>>>>>>>>>>>>>>> issues like this with oVirt, so I'm puzzled about the root
>>>>>>>>>>>>>>>>>> cause and whether there's a chance of recovering the
>>>>>>>>>>>>>>>>>> information.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Can you please help me troubleshoot this one? I
>>>>>>>>>>>>>>>>>> would really appreciate it =)
>>>>>>>>>>>>>>>>>> Running oVirt 4.2.1 here!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> thanks in advance,
>>>>>>>>>>>>>>>>>> JP
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>> Users mailing list -- users(a)ovirt.org
>>>>>>>>>>>>>>>>>> To unsubscribe send an email to users-leave(a)ovirt.org
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>