Local storage formatting
by Matt Simonsen
Hello,
I'm running oVirt with several data centers, some with NFS storage and
some with local storage.
I had problems in the past with a large pool and local storage. The
problem was that nodectl showed the pool as too full (I think >80%), but
it was only the images that made the pool "full" -- and this storage was
carefully set up such that there was no chance it would actually fill.
The LVs for oVirt itself were all under 20%, yet nodectl still reported
the pool was too full.
My solution so far has been to use our RAID card tools, so that sda is
the oVirt node install, and sdb is for images. There are probably other
good reasons for me to handle it this way, for example being able to use
different RAID levels, but I'm hoping someone can confirm my
partitioning below doesn't have some risk I'm not yet aware of.
I set up a new volume group for images, as below:
[root@node4-g8-h4 multipath]# pvs
  PV                                             VG              Fmt  Attr PSize    PFree
  /dev/mapper/3600508b1001c7e172160824d7b204c3b2 onn_node4-g8-h4 lvm2 a--  <119.00g <22.85g
  /dev/sdb1                                      data            lvm2 a--     1.13t <361.30g
[root@node4-g8-h4 multipath]# vgs
  VG              #PV #LV #SN Attr   VSize    VFree
  data              1   1   0 wz--n-    1.13t <361.30g
  onn_node4-g8-h4   1  13   0 wz--n- <119.00g  <22.85g
[root@node4-g8-h4 multipath]# lvs
  LV                                   VG              Attr       LSize   Pool   Origin                             Data%  Meta%  Move Log Cpy%Sync Convert
  images_main                          data            -wi-ao---- 800.00g
  home                                 onn_node4-g8-h4 Vwi-aotz--   1.00g pool00                                     4.79
  ovirt-node-ng-4.2.5.1-0.20180816.0   onn_node4-g8-h4 Vwi---tz-k  64.10g pool00 root
  ovirt-node-ng-4.2.5.1-0.20180816.0+1 onn_node4-g8-h4 Vwi---tz--  64.10g pool00 ovirt-node-ng-4.2.5.1-0.20180816.0
  ovirt-node-ng-4.2.6-0.20180903.0     onn_node4-g8-h4 Vri---tz-k  64.10g pool00
  ovirt-node-ng-4.2.6-0.20180903.0+1   onn_node4-g8-h4 Vwi-aotz--  64.10g pool00 ovirt-node-ng-4.2.6-0.20180903.0    4.83
  pool00                               onn_node4-g8-h4 twi-aotz--  91.10g                                            8.94   0.49
  root                                 onn_node4-g8-h4 Vwi---tz--  64.10g pool00
  swap                                 onn_node4-g8-h4 -wi-ao----   4.00g
  tmp                                  onn_node4-g8-h4 Vwi-aotz--   1.00g pool00                                     4.87
  var                                  onn_node4-g8-h4 Vwi-aotz--  15.00g pool00                                     3.31
  var_crash                            onn_node4-g8-h4 Vwi-aotz--  10.00g pool00                                     2.86
  var_log                              onn_node4-g8-h4 Vwi-aotz--   8.00g pool00                                     3.57
  var_log_audit                        onn_node4-g8-h4 Vwi-aotz--   2.00g pool00                                     4.89
The images_main LV is set up as "Block device for filesystems" with ext4.
Is there any reason I should consider a pool for thinly provisioned
volumes? I don't need to over-allocate storage, and it seems to me like a
fixed partition is ideal. Please confirm, or let me know if there's
anything else I should consider.
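For reference, a minimal sketch of the two approaches being compared, assuming a blank /dev/sdb1 as in the pvs output above (the thin-pool names are made up):
```
# Fixed (thick) LV, as used above:
pvcreate /dev/sdb1
vgcreate data /dev/sdb1
lvcreate -n images_main -L 800G data
mkfs.ext4 /dev/data/images_main

# Thin-pool alternative (only interesting if over-allocation is wanted):
# lvcreate --type thin-pool -n images_pool -L 800G data
# lvcreate --thinpool images_pool -n images_main -V 800G data
```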
Thanks
Matt
self-hosted engine lost password
by Rob Epping
Hello list,
I'm trying to update our ovirt self-hosted engine to the latest version
using the docs at
https://www.ovirt.org/documentation/upgrade-guide/chap-Updates_between_Minor_Releases/#updating-the-ovirt-self-hosted-engine-and-underlying-virtualization-hosts
For this I need to log on to the engine VM, but none of the passwords we
used seems to work.
I cannot seem to find information on how to reset passwords for the
hosted engine, so I'm stuck.
Am I doing something wrong, or is access to the self-hosted engine
required? If access is required, is there a way to reset the
password?
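For context, the update procedure in that guide has to be run as root inside the engine VM, which is why the login matters. Roughly (from memory, so treat this as a sketch):
```
# On one of the hosts:
hosted-engine --set-maintenance --mode=global
# Inside the engine VM (the step I'm stuck on):
engine-upgrade-check
yum update ovirt\*setup\*
engine-setup
yum update
# Back on a host, once the engine is updated:
hosted-engine --set-maintenance --mode=none
```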
THNX && GRTNX,
RobJE
Upgraded host, engine now won't boot
by Jim Kusznir
Hello:
I saw that there were updates to my ovirt-4.2 3 node hyperconverged system,
so I proceeded to apply them the usual way through the UI.
At one point, the hosted engine was migrated to one of the upgraded hosts,
and then went "unstable" on me. Now the hosted engine appears to have
crashed: it gets powered up, but it never boots to the point where it
responds to pings or allows logins. After a while, the hosted engine shows
a status (via the console "hosted-engine --vm-status" command) of "Powering Down".
It stays there for a long time.
I tried forcing a poweroff then powering it on, but again, it never gets up
to where it will respond to pings. --vm-status shows bad health, but up.
I tried running the hosted-engine --console command, but got:
[root@ovirt1 ~]# hosted-engine --console
The engine VM is running on this host
Connected to domain HostedEngine
Escape character is ^]
error: internal error: cannot find character device <null>
[root@ovirt1 ~]#
I tried to run the hosted-engine --upgrade-appliance command, but it hangs
at obtaining certificate (understandably, as the hosted-engine is not up).
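For reference, the host-side commands involved (standard hosted-engine CLI), plus the HA logs I assume hold the explanation for the state changes:
```
# On the host currently running the engine VM:
hosted-engine --set-maintenance --mode=global   # stop the HA agents from restarting/migrating it
hosted-engine --vm-poweroff                     # force off the stuck VM
hosted-engine --vm-start
hosted-engine --vm-status
# HA agent/broker logs:
less /var/log/ovirt-hosted-engine-ha/agent.log
less /var/log/ovirt-hosted-engine-ha/broker.log
```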
How do I recover from this? And what caused this?
--Jim
Wrong network threshold limit warnings on 4.2.5
by Florian Schmid
Good morning,
since we have upgraded to version 4.2.5, we get a lot of warnings about network interfaces exceeding the defined threshold limits.
For example:
Aug 31, 2018, 7:54:05 AM
Host xxx has network interface which exceeded the defined threshold [95%] (enp9s0.80: transmit rate[100%], receive rate [12%])
This is a 10 Gbit interface, and in our monitoring software, which collects network statistics every 10 s, the TX bandwidth was at most 150 Mbit/s at this time, so far from being 100%.
Could it be that the engine detected the wrong interface speed, or that there is a calculation error?
In the engine for this host, I have 10000 Mbps for all interfaces.
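Rough arithmetic for why I suspect the detected speed rather than the traffic (illustration only, not the engine's actual calculation):
```
# utilisation % = observed rate / link speed * 100
echo $(( 150 * 100 / 10000 ))   # 150 Mbit/s against a 10000 Mbps link -> 1 (%)
echo $(( 150 * 100 / 100 ))     # the same traffic against 100 Mbps    -> 150, which would show as 100%
```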
I have now checked all those warnings on our different hosts, and they happen every time we go over 100 Mbit, which is quite often...
Can I maybe disable these warnings, since we have this covered in our monitoring software anyway?
If you need any logs, please ask.
BR Florian Schmid
Bricks do not appear in oVirt Manager UI under Hosts unless manually created on nodes.
by payden.pringle+ovirtlist@gmail.com
I have a CentOS 7.5 system running the oVirt-Engine package as the manager for three oVirt Nodes (version 4.2). Two of the nodes have a 200GB disk and one has a 2GB disk. I am attempting to create a Gluster Volume that replicates between the two 200GB disks with the 2GB as Arbiter.
When I create the bricks by going to Hosts > host1 > Storage Devices > Create Brick (with the 200GB device selected), it succeeds, but no bricks appear under the Bricks tab for the Host. Doing the same for host2 and host3 results in the same problem.
If I run the following command on a host `gluster volume create test-volume replica 3 arbiter 1 192.168.5.{50,60,40}:/gluster-bricks/200GBbrick/test`, it creates the volume correctly and the bricks then appear as they should. However, then these Bricks are labeled as "Down" and when I go to Storage > Volumes, I get this error:
```
Uncaught exception occurred. Please try reloading the page. Details: (TypeError) : Cannot read property 'a' of null
Please have your administrator check the UI logs
```
The Volume is also shown as "Down". SELinux is enabled on my CentOS 7.5 system.
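For what it's worth, a minimal CLI check of the bricks that show as "Down" (using the test-volume created above) would be something like:
```
gluster volume status test-volume        # brick process / port / online state
gluster volume start test-volume force   # restart any brick processes that are not running
gluster volume heal test-volume info     # pending heal entries once all bricks are up
```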
I'm not really sure what the correct method is to create bricks and then a volume in Gluster through the oVirt Management UI, but I've followed the guides and this is where it has gotten me.
oVirt Node 4.2.3.1. to 4.2.5 upgrade trouble, log attached
by Matt Simonsen
Hello all,
I've emailed about similar trouble with an oVirt Node upgrade using the
ISO install. I've attached the /tmp/imgbased.log file in hopes it will
help give a clue as to what caused the trouble.
Since these use NFS storage I can rebuild, but would like to know,
ideally, what caused the upgrade to break.
Truthfully, following the install, I don't think I have done *that* much
to these systems, so I'm not sure what would have caused the problem.
I have done several successful upgrades in the past and most of my
standalone systems have been working great.
I've been really happy with oVirt, so kudos to the team.
Thanks for any help,
Matt
the issue of starting a VM
by hhz711@126.com
Hello,
I created a VM on node1, but when I power on the VM and try to boot from the CD-ROM with an ISO, it gets stuck.
The boot window only shows the info below:
"SeaBIOS (version xxxxxx)
Machine UUID xxxxxxxx"
Has anyone seen this before?
Gluster clients intermittently hang until first gluster server in a Replica 1 Arbiter 1 cluster is rebooted, server error: 0-management: Unlocking failed & client error: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2131 sent = <datestamp>. timeout = 1800
by Sam McLeod
We've got an odd problem where clients are blocked from writing to Gluster volumes until the first node of the Gluster cluster is rebooted.
I suspect I've either configured something incorrectly with the arbiter / replica configuration of the volumes, or there is some sort of bug in the gluster client-server connection that we're triggering.
I was wondering if anyone has seen this or could point me in the right direction?
Environment:
Topology: 3 node cluster, replica 2, arbiter 1 (third node is metadata only).
Version: Client and Servers both running 4.1.3, both on CentOS 7, kernel 4.18.x, (Xen) VMs with relatively fast networked SSD storage backing them, XFS.
Client: Native Gluster FUSE client mounting via the kubernetes provider
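(For completeness, the rough manual equivalent of what the kubernetes provider mounts — hostnames here are placeholders — is:)
```
# Placeholder hostnames; the kubernetes glusterfs provider does roughly this under the hood:
mount -t glusterfs gluster01.example.com:/staging_static /mnt/staging_static \
  -o backup-volfile-servers=gluster02.example.com:gluster03.example.com
```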
Problem:
Seemingly randomly some clients will be blocked / are unable to write to what should be a highly available gluster volume.
The client gluster logs show it failing to do new file operations across various volumes and all three nodes of the gluster.
The server gluster (or OS) logs do not show any warnings or errors.
The client recovers and is able to write to volumes again after the first node of the gluster cluster is rebooted.
Until the first node of the gluster cluster is rebooted, the client fails to write to the volume that is (or should be) available on the second node (a replica) and third node (an arbiter only node).
What 'fixes' the issue:
Although the clients (kubernetes hosts) connect to all 3 nodes of the Gluster cluster - restarting the first gluster node always unblocks the IO and allows the client to continue writing.
Stopping and starting the glusterd service on the gluster server is not enough to fix the issue, nor is restarting its networking.
This suggests to me that the volume is unavailable for writing for some reason and restarting the first node in the cluster either clears some sort of TCP sessions between the client-server or between the server-server replication.
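Before the reboot "fix", this is the state I assume is worth capturing (standard gluster CLI; the statedump should show whether inode locks are stuck on the first node):
```
# Run on the first gluster node before rebooting it:
gluster volume status staging_static            # brick processes, ports, online state
gluster volume status staging_static clients    # which clients each brick thinks are connected
gluster volume heal staging_static info         # pending heals / split-brain entries
gluster volume statedump staging_static         # lock tables dumped under /var/run/gluster/
```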
Expected behaviour:
If the first gluster node / server had failed or was blocked from performing operations for some reason (which it doesn't seem it is), I'd expect the clients to access data from the second gluster node and write metadata to the third gluster node as well, since it's an arbiter / metadata-only node.
If for some reason a gluster node was not able to serve connections to clients, I'd expect to see errors in the volume, glusterd or brick log files (there are none on the first gluster node).
If the first gluster node was for some reason blocking IO on a volume, I'd expect that node either to show as unhealthy or unavailable in the gluster peer status or gluster volume status.
Client gluster errors:
staging_static in this example is a volume name.
You can see the client trying to connect to the second and third nodes of the gluster cluster and failing (I'm unsure as to why).
The server side logs on the first gluster node do not show any errors or problems, but the second / third node show errors in the glusterd.log when trying to 'unlock' the 0-management volume on the first node.
On a gluster client (a kubernetes host using the kubernetes connector, which uses the native FUSE client), when it's blocked from writing but the gluster appears healthy (other than the errors mentioned later):
[2018-09-02 15:33:22.750874] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1cce sent = 2018-09-02 15:03:22.417773. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 15:33:22.750989] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-02 16:03:23.097905] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2e21 sent = 2018-09-02 15:33:22.765751. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-02 16:03:23.097988] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-02 16:33:23.439172] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1d4b sent = 2018-09-02 16:03:23.098133. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 16:33:23.439282] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-02 17:03:23.786858] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2ee7 sent = 2018-09-02 16:33:23.455171. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-02 17:03:23.786971] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-02 17:33:24.160607] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1dc8 sent = 2018-09-02 17:03:23.787120. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 17:33:24.160720] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-02 18:03:24.505092] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2faf sent = 2018-09-02 17:33:24.173153. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-02 18:03:24.505185] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-02 18:33:24.841248] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1e45 sent = 2018-09-02 18:03:24.505328. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 18:33:24.841311] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-02 19:03:25.204711] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x3074 sent = 2018-09-02 18:33:24.855372. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-02 19:03:25.204784] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-02 19:33:25.533545] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1ec2 sent = 2018-09-02 19:03:25.204977. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 19:33:25.533611] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-02 20:03:25.877020] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x3138 sent = 2018-09-02 19:33:25.545921. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-02 20:03:25.877098] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-02 20:33:26.217858] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1f3e sent = 2018-09-02 20:03:25.877264. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 20:33:26.217973] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-02 21:03:26.588237] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x31ff sent = 2018-09-02 20:33:26.233010. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-02 21:03:26.588316] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-02 21:33:26.912334] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x1fbb sent = 2018-09-02 21:03:26.588456. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 21:33:26.912449] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-02 22:03:37.258915] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x32c5 sent = 2018-09-02 21:33:32.091009. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-02 22:03:37.259000] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-02 22:33:37.615497] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2039 sent = 2018-09-02 22:03:37.259147. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 22:33:37.615574] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-02 23:03:37.940969] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x3386 sent = 2018-09-02 22:33:37.629655. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-02 23:03:37.941049] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-02 23:33:38.270998] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x20b5 sent = 2018-09-02 23:03:37.941199. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-02 23:33:38.271078] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-03 00:03:38.607186] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x3447 sent = 2018-09-02 23:33:38.285899. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-03 00:03:38.607263] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-03 00:33:38.934385] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x2131 sent = 2018-09-03 00:03:38.607410. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-03 00:33:38.934479] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
[2018-09-03 01:03:39.256842] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-1: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x350c sent = 2018-09-03 00:33:38.948570. timeout = 1800 for <ip of second gluster node>:49154
[2018-09-03 01:03:39.256972] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-1: remote operation failed [Transport endpoint is not connected]
[2018-09-03 01:33:39.614402] E [rpc-clnt.c:184:call_bail] 0-staging_static-client-2: bailing out frame type(GlusterFS 4.x v1) op(INODELK(29)) xid = 0x21ae sent = 2018-09-03 01:03:39.258166. timeout = 1800 for <ip of third gluster node>:49154
[2018-09-03 01:33:39.614483] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-staging_static-client-2: remote operation failed [Transport endpoint is not connected]
On the second gluster server:
We are seeing the following error in the glusterd.log file when the client is blocked from writing to the volume. I think this is probably the most important information about the error; it suggests a problem with the first node, but it doesn't explain the client behaviour:
[2018-09-02 08:31:03.902272] E [MSGID: 106115] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on <FQDN of the first gluster node>. Please check log file for details.
[2018-09-02 08:31:03.902477] E [MSGID: 106151] [glusterd-syncop.c:1640:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
Note in the above error:
1. I'm not sure which log to check (there doesn't seem to be a management brick / brick log)? (See the grep sketch after this list.)
2. If there's a problem with the first node, why isn't it rejected from the gluster / taken offline / the health of the peers or volume list degraded?
3. Why does the client fail to write to the volume rather than (I'm assuming) trying the second (or third I guess) node to write to the volume?
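Regarding (1), my assumption is that the 0-management messages go to glusterd's own log on the first node rather than to any brick log, e.g.:
```
# On the first gluster node (assumption: management lock messages end up in glusterd.log):
grep -iE 'unlock|mgmt_v3' /var/log/glusterfs/glusterd.log
```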
We are also seeing the following errors repeated a lot in the brick log (/var/log/glusterfs/bricks/mnt-gluster-storage-staging_static.log), both when the volumes are working and when there's an issue:
[2018-09-03 01:58:35.128923] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed
[2018-09-03 01:58:35.128957] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3d60, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)
[2018-09-03 01:58:35.128983] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed
[2018-09-03 01:58:35.129016] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3e2a, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)
[2018-09-03 01:58:35.129042] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed
[2018-09-03 01:58:35.129077] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3ef6, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)
[2018-09-03 01:58:35.129149] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed
[2018-09-03 01:58:35.129191] E [rpcsvc.c:1378:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3fc6, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 29) to rpc-transport (tcp.staging_static-server)
[2018-09-03 01:58:35.129219] E [server.c:137:server_submit_reply] (-->/usr/lib64/glusterfs/4.1.2/xlator/debug/io-stats.so(+0x1fd14) [0x7f8470319d14] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0x5f24a) [0x7f846bdde24a] -->/usr/lib64/glusterfs/4.1.2/xlator/protocol/server.so(+0xafce) [0x7f846bd89fce] ) 0-: Reply submission failed
Gluster volume information:
# gluster volume info staging_static
Volume Name: staging_static
Type: Replicate
Volume ID: 7f3b8e91-afea-4fc6-be83-3399a089b6f3
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: <first gluster node.fqdn>:/mnt/gluster-storage/staging_static
Brick2: <second gluster node.fqdn>:/mnt/gluster-storage/staging_static
Brick3: <third gluster node.fqdn>:/mnt/gluster-storage/staging_static (arbiter)
Options Reconfigured:
storage.fips-mode-rchecksum: true
cluster.self-heal-window-size: 16
cluster.shd-wait-qlength: 4096
cluster.shd-max-threads: 8
performance.cache-min-file-size: 2KB
performance.rda-cache-limit: 1GB
network.inode-lru-limit: 50000
server.outstanding-rpc-limit: 256
transport.listen-backlog: 2048
performance.write-behind-window-size: 512MB
performance.stat-prefetch: true
performance.io-thread-count: 16
performance.client-io-threads: true
performance.cache-size: 1GB
performance.cache-refresh-timeout: 60
performance.cache-invalidation: true
cluster.use-compound-fops: true
cluster.readdir-optimize: true
cluster.lookup-optimize: true
cluster.favorite-child-policy: size
cluster.eager-lock: true
client.event-threads: 4
nfs.disable: on
transport.address-family: inet
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
features.cache-invalidation-timeout: 300
features.cache-invalidation: true
network.ping-timeout: 15
performance.cache-max-file-size: 3MB
performance.md-cache-timeout: 300
server.event-threads: 4
Thanks in advance,
--
Sam McLeod (protoporpoise on IRC)
https://smcleod.net
https://twitter.com/s_mcleod
Words are my own opinions and do not necessarily represent those of my employer or partners.