
Hi,

Thanks for the info. There has been some progress with the situation.

To keep the story short: we are in the process of changing our IP address range from 10.8.X.X to 10.16.X.X for all of the oVirt infrastructure. This implies a new DHCP server, new switches, etc. For now we went back to our old IP address range because we were not able to stabilize the system.

The last status with the new address range was that Gluster was fine and the hosted-engine storage domain was mounting OK. I suspect the DNS table was not properly updated, but I am not 100% sure. With the new range everything seems to be fine except that the hosted engine always fails the "liveliness check" after coming up. I was not able to solve this, so I went back to our previous DHCP server.

So I am not sure what is missing for the hosted engine to work with the new DHCP server. Is there any hardcoded configuration in the hosted engine that needs to be updated when changing the DHCP server (i.e. a new address with the same hostname, a new gateway, ...)?

More info on the test I did with the new DHCP server: all nodes have working name resolution, and I am able to SSH to the hosted engine.

Any suggestions would be appreciated, as I am out of ideas for now. Do I need to redo some sort of setup in the engine to take the new address range and gateway into account? There is also LDAP server access configured in the engine for username mapping.

Carl

On Sat, Jul 13, 2019 at 6:31 AM Strahil Nikolov <hunter86_bg@yahoo.com> wrote:
Can you mount the volume manually at another location? Also, have you made any changes to Gluster?
Please provide the output of "gluster volume info engine". I noticed the following in your logs: option 'parallel-readdir' is not recognized
Best Regards, Strahil Nikolov
On Friday, July 12, 2019, 22:30:41 GMT+3, carl langlois <crl.langlois@gmail.com> wrote:
Hi,
I am in a state where my system does not recover from a major failure. I have pinpointed the problem: the hosted-engine storage domain is not able to mount.
I have a GlusterFS volume containing the storage domain, but when it attempts to mount it at /rhev/data-center/mnt/glusterSD/ovhost1:_engine I get:
+------------------------------------------------------------------------------+
[2019-07-12 19:19:44.063608] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-engine-client-2: changing port to 49153 (from 0)
[2019-07-12 19:19:55.033725] I [fuse-bridge.c:4205:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2019-07-12 19:19:55.033748] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: switched to graph 0
[2019-07-12 19:19:55.033895] I [MSGID: 108006] [afr-common.c:5372:afr_local_init] 0-engine-replicate-0: no subvolumes up
[2019-07-12 19:19:55.033938] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2019-07-12 19:19:55.034041] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-07-12 19:19:55.034060] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
[2019-07-12 19:19:55.034095] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-07-12 19:19:55.034102] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
[2019-07-12 19:19:55.035596] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2019-07-12 19:19:55.035611] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 4: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
[2019-07-12 19:19:55.047957] I [fuse-bridge.c:5093:fuse_thread_proc] 0-fuse: initating unmount of /rhev/data-center/mnt/glusterSD/ovhost1:_engine
The message "I [MSGID: 108006] [afr-common.c:5372:afr_local_init] 0-engine-replicate-0: no subvolumes up" repeated 3 times between [2019-07-12 19:19:55.033895] and [2019-07-12 19:19:55.035588]
[2019-07-12 19:19:55.048138] W [glusterfsd.c:1375:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7f51cecb3e25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x5632143bd4b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x5632143bd32b] ) 0-: received signum (15), shutting down
[2019-07-12 19:19:55.048150] I [fuse-bridge.c:5852:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/ovhost1:_engine'.
[2019-07-12 19:19:55.048155] I [fuse-bridge.c:5857:fini] 0-fuse: Closing fuse connection to '/rhev/data-center/mnt/glusterSD/ovhost1:_engine'.
[2019-07-12 19:19:56.029923] I [MSGID: 100030] [glusterfsd.c:2511:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.12.11 (args: /usr/sbin/glusterfs --volfile-server=ovhost1 --volfile-server=ovhost2 --volfile-server=ovhost3 --volfile-id=/engine /rhev/data-center/mnt/glusterSD/ovhost1:_engine)
[2019-07-12 19:19:56.032209] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-07-12 19:19:56.037510] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-07-12 19:19:56.039618] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-07-12 19:19:56.039691] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-engine-readdir-ahead: option 'parallel-readdir' is not recognized
[2019-07-12 19:19:56.039739] I [MSGID: 114020] [client.c:2360:notify] 0-engine-client-0: parent translators are ready, attempting connect on transport
[2019-07-12 19:19:56.043324] I [MSGID: 114020] [client.c:2360:notify] 0-engine-client-1: parent translators are ready, attempting connect on transport
[2019-07-12 19:19:56.043481] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-engine-client-0: changing port to 49153 (from 0)
[2019-07-12 19:19:56.048539] I [MSGID: 114020] [client.c:2360:notify] 0-engine-client-2: parent translators are ready, attempting connect on transport
[2019-07-12 19:19:56.048952] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 0-engine-client-1: changing port to 49153 (from 0)
Final graph:
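
A client log like this one is easier to scan once the informational lines are filtered out and the warnings and errors are grouped by the function that emitted them. The following is only a minimal sketch and not something from the original thread: the regular expression is an assumption based purely on the line layout visible in this log, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the log lines quoted in this thread:
# [timestamp] LEVEL [MSGID: n (optional)] [file:line:function] domain: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<rest>.*)$"
)


def summarize(lines, levels=("E", "W")):
    """Count log entries per (level, source function) for the given levels."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") not in levels:
            continue  # skip unparseable lines and levels we do not care about
        # src looks like "fuse-bridge.c:4271:fuse_first_lookup"
        func = match.group("src").split(":")[-1]
        counts[(match.group("level"), func)] += 1
    return counts


# Example usage (the log path is whatever file the mount wrote to on your host):
# with open(path_to_client_log) as fh:
#     for (level, func), n in summarize(fh).most_common():
#         print(level, func, n)
```

Lines that do not match the assumed layout (such as the "repeated 3 times" summary line) are skipped rather than guessed at.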
Without this mount point the ha-agent does not start.
The volumes seem to be OK:
Status of volume: data
Gluster process                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovhost1:/gluster_bricks/data/data        49152     0          Y       7505
Brick ovhost2:/gluster_bricks/data/data        49152     0          Y       3640
Brick ovhost3:/gluster_bricks/data/data        49152     0          Y       6329
Self-heal Daemon on localhost                  N/A       N/A        Y       7712
Self-heal Daemon on ovhost2                    N/A       N/A        Y       4925
Self-heal Daemon on ovhost3                    N/A       N/A        Y       6501

Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: engine
Gluster process                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovhost1:/gluster_bricks/engine/engine    49153     0          Y       7514
Brick ovhost2:/gluster_bricks/engine/engine    49153     0          Y       3662
Brick ovhost3:/gluster_bricks/engine/engine    49153     0          Y       6339
Self-heal Daemon on localhost                  N/A       N/A        Y       7712
Self-heal Daemon on ovhost2                    N/A       N/A        Y       4925
Self-heal Daemon on ovhost3                    N/A       N/A        Y       6501

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: iso
Gluster process                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovhost1:/gluster_bricks/iso/iso          49154     0          Y       7523
Brick ovhost2:/gluster_bricks/iso/iso          49154     0          Y       3715
Brick ovhost3:/gluster_bricks/iso/iso          49154     0          Y       6349
Self-heal Daemon on localhost                  N/A       N/A        Y       7712
Self-heal Daemon on ovhost2                    N/A       N/A        Y       4925
Self-heal Daemon on ovhost3                    N/A       N/A        Y       6501

Task Status of Volume iso
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vmstore
Gluster process                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ovhost1:/gluster_bricks/vmstore/vmstore  49155     0          Y       7532
Brick ovhost2:/gluster_bricks/vmstore/vmstore  49155     0          Y       3739
Brick ovhost3:/gluster_bricks/vmstore/vmstore  49155     0          Y       6359
Self-heal Daemon on localhost                  N/A       N/A        Y       7712
Self-heal Daemon on ovhost2                    N/A       N/A        Y       4925
Self-heal Daemon on ovhost3                    N/A       N/A        Y       6501

Task Status of Volume vmstore
------------------------------------------------------------------------------
There are no active volume tasks
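
The client log reports "no subvolumes up" while the status output lists every engine brick as online, so one thing that may be worth ruling out is whether the brick ports are reachable from the host that performs the mount. Below is a rough sketch of such a check, not part of the original thread. The hostnames and port 49153 are taken from the engine volume status above; `port_open` and `check_bricks` are hypothetical helper names. A successful TCP connect only shows that the port is reachable, not that the Gluster handshake itself succeeds.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_bricks(hosts, port, timeout=3.0):
    """Map each host to whether its brick port accepted a TCP connection."""
    return {host: port_open(host, port, timeout) for host in hosts}


# Example usage, run from the host that fails to mount the engine volume:
# for host, ok in check_bricks(("ovhost1", "ovhost2", "ovhost3"), 49153).items():
#     print(host, "reachable" if ok else "NOT reachable")
```

Because name-resolution failures are also reported as unreachable, a host that resolves to a stale address after the DHCP/DNS change would show up here as well.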
I am not sure what to look for at this point. Any help would be really appreciated.
Thanks,
Carl