In this specific case I even used virgin hardware originally.
Once I managed to kill the hosted-engine by downgrading the datacenter cluster to legacy,
I re-installed all gluster storage from the VDO level up. No traces of a file system
should be left with LVM and XFS on top, even if I didn't actually null the SSD (does
writing nulls to an SSD actually cost you an overwrite these days or is that translated
into a trim by the firmware?)
There was no difference in terms of faults between the virgin hardware and the re-install, so
stale Gluster extended file attributes etc. (your error theory, I believe) are not a factor.
Choosing between 'vmstore' and 'data' domains for the imports makes no
difference, and neither does full versus thin allocation. But actually I didn't just
see write errors from qemu-img, but also read-errors, which had me concerned about some
other corruption source. That was another motivation to start with a fresh source, which
meant a backup-domain instead of an export domain or OVAs.
The storage underneath the backup domain is NFS (Posix has a 4k issue and I'm not sure
I want to try moving Glusters between farms just yet), which is easy to detach at the
source and import at the target. If NFS is your default, oVirt can be so much easier, but
in that more 'professional' domain we use vSphere and actual SAN storage. The
attraction of oVirt for the lab use case critically depends on HCI and gluster.
The VMs were fine running from the backup domain (which incidentally must have lost its
backup attribute at the target, because otherwise it should have kept the VMs from
launching...), but once I tried moving their disks to the gluster, I got empty or unusable
disks again, or errors while moving.
The only way that I found to transfer gluster to gluster was to use disk uploads either
via the GUI or by Python, but that results in fully allocated images and is very slow at
50 MB/s even with Python. BTW sparsifying does nothing to those images, I guess because
sectors full of nulls aren't actually the same as a logically unused sector. At least
the VDO underneath should reduce some of the overhead.
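
To illustrate what I mean by nulls versus unused sectors, here is a minimal Python sketch at the plain file level. It assumes Linux, where st_blocks is counted in 512-byte units, and the helper names are made up for this example. It says nothing about what oVirt's sparsify actually does internally; it only shows that a file full of written zeros and a sparse file of the same logical size can occupy very different amounts of space.

```python
import os
import tempfile

SIZE = 4 * 1024 * 1024  # 4 MiB logical size for both test files


def allocated_bytes(path):
    """Bytes actually allocated on disk (st_blocks is in 512-byte units on Linux)."""
    return os.stat(path).st_blocks * 512


def make_zero_filled(path, size=SIZE):
    """Write real zero bytes, the way a fully allocated upload would."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())


def make_sparse(path, size=SIZE):
    """Create a hole of the same logical size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


def compare(directory):
    """Create both files in `directory` and report logical vs allocated size."""
    zero_path = os.path.join(directory, "zero_filled.img")
    sparse_path = os.path.join(directory, "sparse.img")
    make_zero_filled(zero_path)
    make_sparse(sparse_path)
    return {
        "zero_filled": {
            "logical": os.path.getsize(zero_path),
            "allocated": allocated_bytes(zero_path),
        },
        "sparse": {
            "logical": os.path.getsize(sparse_path),
            "allocated": allocated_bytes(sparse_path),
        },
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, sizes in compare(tmp).items():
            print(f"{name}: logical={sizes['logical']} allocated={sizes['allocated']}")
```

On a typical local file system I would expect the sparse file to show little or no allocation while the zero-filled one is allocated in full, although the exact numbers depend on the file system and on any compression or deduplication layer underneath.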