
This is needed to prevent any inconsistencies stemming from buffered writes or cached file data during live VM migration. Besides, for Gluster to truly honor direct-io behavior in qemu's 'cache=none' mode (which is what oVirt uses), one needs to turn on performance.strict-o-direct and disable remote-dio.

-Krutika

On Wed, Mar 27, 2019 at 12:24 PM Leo David <leoalex@gmail.com> wrote:
Hi,
I can confirm that after setting these two options, I haven't encountered disk corruption anymore. The downside is that, at least for me, it had a pretty big impact on performance: IOPS really went down when running fio tests inside the VMs.
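For context, an in-guest random-write test of the kind Leo describes might look like the sketch below; the job name and all sizes, depths and runtimes are illustrative assumptions, not the exact fio job he ran. With --direct=1 the guest issues O_DIRECT writes, which is the I/O path affected by performance.strict-o-direct on the Gluster side.

# run inside the VM, against a file on its own filesystem (example values)
# fio --name=randwrite-test --rw=randwrite --bs=4k --size=1G --ioengine=libaio --direct=1 --iodepth=32 --numjobs=1 --runtime=60 --time_based --group_reporting

Comparing the reported IOPS before and after flipping the two volume options gives a rough measure of the performance cost Leo mentions.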
On Wed, Mar 27, 2019, 07:03 Krutika Dhananjay <kdhananj@redhat.com> wrote:
Could you enable strict-o-direct and disable remote-dio on the src volume as well, restart the VMs on "old", and retry the migration?
# gluster volume set <VOLNAME> performance.strict-o-direct on
# gluster volume set <VOLNAME> network.remote-dio off
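As a quick sketch for verifying that both options took effect (using the same <VOLNAME> placeholder), the values can be queried back per volume:

# gluster volume get <VOLNAME> performance.strict-o-direct
# gluster volume get <VOLNAME> network.remote-dio

Each command prints the option name and its currently effective value for that volume.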
-Krutika
On Tue, Mar 26, 2019 at 10:32 PM Sander Hoentjen <sander@hoentjen.eu> wrote:
+Krutika Dhananjay and gluster ml
On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen <sander@hoentjen.eu> wrote:
Hello,
tl;dr We have disk corruption when doing live storage migration on oVirt 4.2 with gluster 3.12.15. Any idea why?
We have a 3-node oVirt cluster that is both compute and gluster-storage. The manager runs on separate hardware. We are running out of space on this volume, so we added another Gluster volume that is bigger, put a storage domain on it, and then migrated VMs to it with LSM. After some time, we noticed that (some of) the migrated VMs had corrupted filesystems. After moving everything back to the old domain with export-import where possible, and recovering from backups where needed, we set off to investigate this issue.
We are now at the point where we can reproduce this issue within a day. What we have found so far:
1) The corruption occurs at the very end of the replication step, most probably between START and FINISH of diskReplicateFinish, before the START merge step.
2) In the corrupted VM, at some place where data should be, the data is replaced by zeros. This can be file contents, a directory structure, or anything else.
3) The source gluster volume has different settings than the destination, mostly because the defaults were different at creation time (a sketch for reproducing this comparison follows after point 4):
Setting                        old (src)   new (dst)
cluster.op-version             30800       30800 (the same)
cluster.max-op-version         31202       31202 (the same)
cluster.metadata-self-heal     off         on
cluster.data-self-heal         off         on
cluster.entry-self-heal        off         on
performance.low-prio-threads   16          32
performance.strict-o-direct    off         on
network.ping-timeout           42          30
network.remote-dio             enable      off
transport.address-family       -           inet
performance.stat-prefetch      off         on
features.shard-block-size      512MB       64MB
cluster.shd-max-threads        1           8
cluster.shd-wait-qlength       1024        10000
cluster.locking-scheme         full        granular
cluster.granular-entry-heal    no          enable
4) To test, we migrate some VMs back and forth. The corruption does not occur every time. So far it has only occurred from old to new, but we don't have enough data points to be sure about that.
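One way to produce a full settings comparison like the table above (a sketch; <OLD_VOLNAME> and <NEW_VOLNAME> are placeholders for the actual volume names) is to dump every option of both volumes and diff the results:

# gluster volume get <OLD_VOLNAME> all > /tmp/old-vol-options.txt
# gluster volume get <NEW_VOLNAME> all > /tmp/new-vol-options.txt
# diff /tmp/old-vol-options.txt /tmp/new-vol-options.txt

Only the options whose effective values differ between the two volumes show up in the diff output.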
Does anybody have an idea what is causing the corruption? Is this the best list to ask, or should I ask on a Gluster list? I am not sure whether this is oVirt specific or Gluster specific, though.

On 26-03-19 14:23, Sahina Bose wrote:
Do you have logs from the old and new gluster volumes? Any errors in the new volume's fuse mount logs?
Around the time of the corruption I see this message:
The message "I [MSGID: 133017] [shard.c:4941:shard_seek] 0-ZoneA_Gluster1-shard: seek called on 7fabc273-3d8a-4a49-8906-b8ccbea4a49f. [Operation not supported]" repeated 231 times between [2019-03-26 13:14:22.297333] and [2019-03-26 13:15:42.912170]
I also see this message at other times, though, when I don't see any corruption occur.
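Regarding Sahina's question about the fuse mount logs, a sketch of where one might look (the exact filename is an assumption; on an oVirt host the gluster fuse client log usually lives under /var/log/glusterfs/ and is named after the storage-domain mount path):

# grep -E '\] (W|E) \[' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log

Lines tagged W or E around the migration window (13:14 to 13:16 in the excerpt above) would be the most interesting ones to share.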
--
Sander