Hello,
Here are additional log files as well as a tree of the problematic Gluster
storage domain. During this time I attempted to copy a virtual disk to
another domain, move a virtual disk to another domain, and run a VM where
the virtual hard disk would be used.
The copies/moves failed, and the VM went into paused state whenever the
virtual HDD was involved.
Please check these out.
Thank You For Your Help !
On Sat, Jun 20, 2020 at 9:54 AM C Williams <cwilliams3320(a)gmail.com> wrote:
Strahil,
I understand. Please keep me posted.
Thanks For The Help !
On Sat, Jun 20, 2020 at 4:36 AM Strahil Nikolov <hunter86_bg(a)yahoo.com>
wrote:
> Hey C Williams,
>
> sorry for the delay, but I couldn't find time to check your logs.
> Will try a little bit later.
>
> Best Regards,
> Strahil Nikolov
>
> On June 20, 2020 at 2:37:22 GMT+03:00, C Williams <cwilliams3320(a)gmail.com>
> wrote:
> >Hello,
> >
> >I wanted to follow up on this issue. Users are impacted.
> >
> >Thank You
> >
> >On Fri, Jun 19, 2020 at 9:20 AM C Williams <cwilliams3320(a)gmail.com>
> >wrote:
> >
> >> Hello,
> >>
> >> Here are the logs (some IPs have been changed).
> >>
> >> ov05 is the SPM
> >>
> >> Thank You For Your Help !
> >>
> >> On Thu, Jun 18, 2020 at 11:31 PM Strahil Nikolov
> ><hunter86_bg(a)yahoo.com>
> >> wrote:
> >>
> >>> Check in the Hosts tab which host is your current SPM (last column in
> >>> the Admin UI).
> >>> Then open the /var/log/vdsm/vdsm.log and repeat the operation.
> >>> Then provide the log from that host and the engine's log (on the
> >>> HostedEngine VM or on your standalone engine).
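> >>> For example, something like this (a minimal sketch; log paths assume a
> >>> default oVirt install):
> >>>
> >>> # On the SPM host, watch VDSM while repeating the failing operation:
> >>> tail -f /var/log/vdsm/vdsm.log
> >>> # On the HostedEngine VM (or standalone engine), watch the engine log:
> >>> tail -f /var/log/ovirt-engine/engine.log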
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>> On June 18, 2020 at 23:59:36 GMT+03:00, C Williams
> >>> <cwilliams3320(a)gmail.com>
> >>> wrote:
> >>> >Resending to eliminate email issues
> >>> >
> >>> >---------- Forwarded message ---------
> >>> >From: C Williams <cwilliams3320(a)gmail.com>
> >>> >Date: Thu, Jun 18, 2020 at 4:01 PM
> >>> >Subject: Re: [ovirt-users] Fwd: Issues with Gluster Domain
> >>> >To: Strahil Nikolov <hunter86_bg(a)yahoo.com>
> >>> >
> >>> >
> >>> >Here is output from mount
> >>> >
> >>> >192.168.24.12:/stor/import0 on
> >>> >/rhev/data-center/mnt/192.168.24.12:_stor_import0
> >>> >type nfs4
> >>> >(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.24.18,local_lock=none,addr=192.168.24.12)
> >>> >192.168.24.13:/stor/import1 on
> >>> >/rhev/data-center/mnt/192.168.24.13:_stor_import1
> >>> >type nfs4
> >>> >(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.24.18,local_lock=none,addr=192.168.24.13)
> >>> >192.168.24.13:/stor/iso1 on
> >>> >/rhev/data-center/mnt/192.168.24.13:_stor_iso1
> >>> >type nfs4
> >>> >(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.24.18,local_lock=none,addr=192.168.24.13)
> >>> >192.168.24.13:/stor/export0 on
> >>> >/rhev/data-center/mnt/192.168.24.13:_stor_export0
> >>> >type nfs4
> >>> >(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.24.18,local_lock=none,addr=192.168.24.13)
> >>> >192.168.24.15:/images on
> >>> >/rhev/data-center/mnt/glusterSD/192.168.24.15:_images
> >>> >type fuse.glusterfs
> >>> >(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> >>> >192.168.24.18:/images3 on
> >>> >/rhev/data-center/mnt/glusterSD/192.168.24.18:_images3
> >>> >type fuse.glusterfs
> >>> >(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> >>> >tmpfs on /run/user/0 type tmpfs
> >>> >(rw,nosuid,nodev,relatime,seclabel,size=13198392k,mode=700)
> >>> >[root@ov06 glusterfs]#
> >>> >
> >>> >Also here is a screenshot of the console
> >>> >
> >>> >[image: image.png]
> >>> >The other domains are up
> >>> >
> >>> >Import0 and Import1 are NFS. GLCL0 is Gluster. They are all running
> >>> >VMs.
> >>> >Thank You For Your Help !
> >>> >
> >>> >On Thu, Jun 18, 2020 at 3:51 PM Strahil Nikolov
> ><hunter86_bg(a)yahoo.com>
> >>> >wrote:
> >>> >
> >>> >> I don't see '/rhev/data-center/mnt/192.168.24.13:_stor_import1'
> >>> >> mounted at all.
> >>> >> What is the status of all storage domains ?
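> >>> >> A quick way to check (a sketch; adjust the IP/path to your domain):
> >>> >>
> >>> >> mount | grep rhev    # list everything mounted under /rhev
> >>> >> df -h /rhev/data-center/mnt/192.168.24.13:_stor_import1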
> >>> >>
> >>> >> Best Regards,
> >>> >> Strahil Nikolov
> >>> >>
> >>> >> On June 18, 2020 at 21:43:44 GMT+03:00, C Williams
> >>> >> <cwilliams3320(a)gmail.com>
> >>> >> wrote:
> >>> >> > Resending to deal with possible email issues
> >>> >> >
> >>> >> >---------- Forwarded message ---------
> >>> >> >From: C Williams <cwilliams3320(a)gmail.com>
> >>> >> >Date: Thu, Jun 18, 2020 at 2:07 PM
> >>> >> >Subject: Re: [ovirt-users] Issues with Gluster Domain
> >>> >> >To: Strahil Nikolov <hunter86_bg(a)yahoo.com>
> >>> >> >
> >>> >> >
> >>> >> >More
> >>> >> >
> >>> >> >[root@ov06 ~]# for i in $(gluster volume list); do echo $i; echo; gluster volume info $i; echo; echo; gluster volume status $i; echo; echo; echo; done
> >>> >> >images3
> >>> >> >
> >>> >> >
> >>> >> >Volume Name: images3
> >>> >> >Type: Replicate
> >>> >> >Volume ID: 0243d439-1b29-47d0-ab39-d61c2f15ae8b
> >>> >> >Status: Started
> >>> >> >Snapshot Count: 0
> >>> >> >Number of Bricks: 1 x 3 = 3
> >>> >> >Transport-type: tcp
> >>> >> >Bricks:
> >>> >> >Brick1: 192.168.24.18:/bricks/brick04/images3
> >>> >> >Brick2: 192.168.24.19:/bricks/brick05/images3
> >>> >> >Brick3: 192.168.24.20:/bricks/brick06/images3
> >>> >> >Options Reconfigured:
> >>> >> >performance.client-io-threads: on
> >>> >> >nfs.disable: on
> >>> >> >transport.address-family: inet
> >>> >> >user.cifs: off
> >>> >> >auth.allow: *
> >>> >> >performance.quick-read: off
> >>> >> >performance.read-ahead: off
> >>> >> >performance.io-cache: off
> >>> >> >performance.low-prio-threads: 32
> >>> >> >network.remote-dio: off
> >>> >> >cluster.eager-lock: enable
> >>> >> >cluster.quorum-type: auto
> >>> >> >cluster.server-quorum-type: server
> >>> >> >cluster.data-self-heal-algorithm: full
> >>> >> >cluster.locking-scheme: granular
> >>> >> >cluster.shd-max-threads: 8
> >>> >> >cluster.shd-wait-qlength: 10000
> >>> >> >features.shard: on
> >>> >> >cluster.choose-local: off
> >>> >> >client.event-threads: 4
> >>> >> >server.event-threads: 4
> >>> >> >storage.owner-uid: 36
> >>> >> >storage.owner-gid: 36
> >>> >> >performance.strict-o-direct: on
> >>> >> >network.ping-timeout: 30
> >>> >> >cluster.granular-entry-heal: enable
> >>> >> >
> >>> >> >
> >>> >> >Status of volume: images3
> >>> >> >Gluster process                              TCP Port  RDMA Port  Online  Pid
> >>> >> >------------------------------------------------------------------------------
> >>> >> >Brick 192.168.24.18:/bricks/brick04/images3  49152     0          Y       6666
> >>> >> >Brick 192.168.24.19:/bricks/brick05/images3  49152     0          Y       6779
> >>> >> >Brick 192.168.24.20:/bricks/brick06/images3  49152     0          Y       7227
> >>> >> >Self-heal Daemon on localhost                N/A       N/A        Y       6689
> >>> >> >Self-heal Daemon on ov07.ntc.srcle.com       N/A       N/A        Y       6802
> >>> >> >Self-heal Daemon on ov08.ntc.srcle.com       N/A       N/A        Y       7250
> >>> >> >
> >>> >> >Task Status of Volume images3
> >>> >> >------------------------------------------------------------------------------
> >>> >> >There are no active volume tasks
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> >[root@ov06 ~]# ls -l /rhev/data-center/mnt/glusterSD/
> >>> >> >total 16
> >>> >> >drwxr-xr-x. 5 vdsm kvm 8192 Jun 18 14:04 192.168.24.15:_images
> >>> >> >drwxr-xr-x. 5 vdsm kvm 8192 Jun 18 14:05 192.168.24.18:_images3
> >>> >> >[root@ov06 ~]#
> >>> >> >
> >>> >> >On Thu, Jun 18, 2020 at 2:03 PM C Williams
> ><cwilliams3320(a)gmail.com>
> >>> >> >wrote:
> >>> >> >
> >>> >> >> Strahil,
> >>> >> >>
> >>> >> >> Here you go -- Thank You For Your Help !
> >>> >> >>
> >>> >> >> BTW -- I can write a test file to Gluster and it replicates
> >>> >> >> properly. Could the problem be with the oVirt Storage Domain?
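> >>> >> >> The test was roughly this (a sketch from memory; brick paths as
> >>> >> >> in the volume info above):
> >>> >> >>
> >>> >> >> # Write through the FUSE mount on one host:
> >>> >> >> echo test > /rhev/data-center/mnt/glusterSD/192.168.24.18:_images3/test.txt
> >>> >> >> # Then confirm the file shows up on each replica brick:
> >>> >> >> ls -l /bricks/brick04/images3/test.txt   # on 192.168.24.18
> >>> >> >> ls -l /bricks/brick05/images3/test.txt   # on 192.168.24.19
> >>> >> >> ls -l /bricks/brick06/images3/test.txt   # on 192.168.24.20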
> >>> >> >>
> >>> >> >> [root@ov08 ~]# gluster pool list
> >>> >> >> UUID                                  Hostname            State
> >>> >> >> 5b40c659-d9ab-43c3-9af8-18b074ea0b83  ov06                Connected
> >>> >> >> 36ce5a00-6f65-4926-8438-696944ebadb5  ov07.ntc.srcle.com  Connected
> >>> >> >> c7e7abdb-a8f4-4842-924c-e227f0db1b29  localhost           Connected
> >>> >> >> [root@ov08 ~]# gluster volume list
> >>> >> >> images3
> >>> >> >>
> >>> >> >> On Thu, Jun 18, 2020 at 1:13 PM Strahil Nikolov
> >>> >> ><hunter86_bg(a)yahoo.com>
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> Log in to the oVirt cluster and provide the output of:
> >>> >> >>> gluster pool list
> >>> >> >>> gluster volume list
> >>> >> >>> for i in $(gluster volume list); do echo $i; echo; gluster volume info $i; echo; echo; gluster volume status $i; echo; echo; echo; done
> >>> >> >>>
> >>> >> >>> ls -l /rhev/data-center/mnt/glusterSD/
> >>> >> >>>
> >>> >> >>> Best Regards,
> >>> >> >>> Strahil Nikolov
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On June 18, 2020 at 19:17:46 GMT+03:00, C Williams
> >>> >> >>> <cwilliams3320(a)gmail.com>
> >>> >> >>> wrote:
> >>> >> >>> >Hello,
> >>> >> >>> >
> >>> >> >>> >I recently added 6 hosts to an existing oVirt compute/gluster
> >>> >> >>> >cluster.
> >>> >> >>> >
> >>> >> >>> >Prior to this attempted addition, my cluster had 3 hypervisor
> >>> >> >>> >hosts and 3 gluster bricks which made up a single gluster
> >>> >> >>> >volume (replica 3). I added the additional hosts, made a brick
> >>> >> >>> >on 3 of the new hosts, and attempted to make a new replica 3
> >>> >> >>> >volume. I had difficulty creating the new volume, so I decided
> >>> >> >>> >to make a new compute/gluster cluster for each set of 3 new
> >>> >> >>> >hosts.
> >>> >> >>> >
> >>> >> >>> >I removed the 6 new hosts from the existing oVirt
> >>> >> >>> >compute/gluster cluster, leaving the 3 original hosts in place
> >>> >> >>> >with their bricks. At that point my original bricks went down
> >>> >> >>> >and came back up, and the volume showed entries that needed
> >>> >> >>> >healing. I ran "gluster volume heal images3 full", etc., and
> >>> >> >>> >the volume now shows no unhealed entries. I also corrected
> >>> >> >>> >some peer errors.
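> >>> >> >>> >The checks I used were roughly these (a sketch; volume name as
> >>> >> >>> >above):
> >>> >> >>> >
> >>> >> >>> ># Any entries still pending heal or in split-brain?
> >>> >> >>> >gluster volume heal images3 info
> >>> >> >>> >gluster volume heal images3 info split-brain
> >>> >> >>> ># Peer state on each node:
> >>> >> >>> >gluster peer status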
> >>> >> >>> >
> >>> >> >>> >However, I am unable to copy disks, move disks to another
> >>> >> >>> >domain, export disks, etc. It appears that the engine cannot
> >>> >> >>> >locate disks properly, and I get storage I/O errors.
> >>> >> >>> >
> >>> >> >>> >I have detached and removed the oVirt Storage Domain, then
> >>> >> >>> >reimported the domain and imported 2 VMs. But the VM disks
> >>> >> >>> >exhibit the same behaviour and won't run from the hard disk.
> >>> >> >>> >
> >>> >> >>> >
> >>> >> >>> >I get errors such as this:
> >>> >> >>> >
> >>> >> >>> >VDSM ov05 command HSMGetAllTasksStatusesVDS failed: low level
> >>> >> >>> >Image copy failed: ("Command ['/usr/bin/qemu-img', 'convert',
> >>> >> >>> >'-p', '-t', 'none', '-T', 'none', '-f', 'raw',
> >>> >> >>> >u'/rhev/data-center/mnt/glusterSD/192.168.24.18:_images3/5fe3ad3f-2d21-404c-832e-4dc7318ca10d/images/3ea5afbd-0fe0-4c09-8d39-e556c66a8b3d/fe6eab63-3b22-4815-bfe6-4a0ade292510',
> >>> >> >>> >'-O', 'raw',
> >>> >> >>> >u'/rhev/data-center/mnt/192.168.24.13:_stor_import1/1ab89386-a2ba-448b-90ab-bc816f55a328/images/f707a218-9db7-4e23-8bbd-9b12972012b6/d6591ec5-3ede-443d-bd40-93119ca7c7d5']
> >>> >> >>> >failed with rc=1 out='' err=bytearray(b'qemu-img: error while
> >>> >> >>> >reading sector 135168: Transport endpoint is not connected\\n
> >>> >> >>> >qemu-img: error while reading sector 131072: Transport endpoint
> >>> >> >>> >is not connected\\nqemu-img: error while reading sector 139264:
> >>> >> >>> >Transport endpoint is not connected\\nqemu-img: error while
> >>> >> >>> >reading sector 143360: Transport endpoint is not connected\\n
> >>> >> >>> >qemu-img: error while reading sector 147456: Transport endpoint
> >>> >> >>> >is not connected\\nqemu-img: error while reading sector 155648:
> >>> >> >>> >Transport endpoint is not connected\\nqemu-img: error while
> >>> >> >>> >reading sector 151552: Transport endpoint is not connected\\n
> >>> >> >>> >qemu-img: error while reading sector 159744: Transport endpoint
> >>> >> >>> >is not connected\\n')",)
> >>> >> >>> >
> >>> >> >>> >oVirt version is 4.3.82-1.el7
> >>> >> >>> >OS CentOS Linux release 7.7.1908 (Core)
> >>> >> >>> >
> >>> >> >>> >The Gluster cluster had been working very well until this
> >>> >> >>> >incident.
> >>> >> >>> >
> >>> >> >>> >Please help.
> >>> >> >>> >
> >>> >> >>> >Thank You
> >>> >> >>> >
> >>> >> >>> >Charles Williams
> >>> >> >>>
> >>> >> >>
> >>> >>
> >>>
> >>
>