
Anyone running one of the recent 3.6.x lines with gluster 3.7.13? I am looking to upgrade gluster from 3.7.11 to 3.7.13 for some bug fixes, but users on the gluster mailing list have told me that, due to some gluster changes (something to do with aio support being removed), I'd need to change the disk parameters to use writeback cache. I believe this could be done with custom parameters? But I believe the storage tests are done using dd, so would they fail with the current settings?

Last upgrade to 3.7.13 I had to roll back to 3.7.11 due to stability issues where the gluster storage domain would go into the down state and always show N/A as space available/used, even though the hosts still saw the storage and VMs were running on it on all 3 hosts.

Saw a lot of messages like these that went away once the gluster rollback finished:

[2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
[2016-07-09 15:27:49.555466] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted]
[2016-07-09 15:27:49.556574] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted]
[2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted)
[2016-07-09 15:27:59.612477] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted]
[2016-07-09 15:27:59.613700] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted]
[2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted)

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
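PS: For reference, the dd-based check I have in mind would look roughly like the direct-I/O read below. This is only a sketch; the exact path and flags the monitoring actually uses are my assumption, and the storage-domain path is a placeholder, but a plain dd with iflag=direct against a file on the FUSE mount should show whether direct I/O is what trips the "Operation not permitted" errors:

  # placeholder path; substitute the real storage-domain UUID found under /rhev/data-center/mnt/glusterSD/
  dd if=/rhev/data-center/mnt/glusterSD/HOST:_GLUSTER1/SD-UUID/dom_md/metadata of=/dev/null bs=4096 count=1 iflag=direct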

On Thu, Jul 21, 2016 at 11:28 AM, David Gossage <dgossage@carouselchecks.com> wrote:
Anyone running one of the recent 3.6.x lines with gluster 3.7.13? I am looking to upgrade gluster from 3.7.11 to 3.7.13 for some bug fixes, but users on the gluster mailing list have told me that, due to some gluster changes (something to do with aio support being removed), I'd need to change the disk parameters to use writeback cache.
Be sure both gluster server and client have the same version. Adding Sahina.
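A quick way to confirm that on each host (just a sketch of the obvious checks, nothing oVirt-specific assumed):

  # run on every brick server and on every hypervisor mounting the volume
  rpm -q glusterfs glusterfs-fuse glusterfs-api glusterfs-server
  gluster --version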
--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

Hey David,

I have the very same problem on my test cluster, despite running oVirt 4.0. If you access your volumes via NFS all is fine; the problem is FUSE. I stayed on 3.7.13 but have no solution yet, so for now I use NFS.

Frank
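PS: In case it helps, the switch on my side is nothing more exotic than mounting the volume over gluster's built-in NFSv3 server instead of FUSE. Volume name and mountpoint below are placeholders, not my actual setup:

  # gluster exports each volume as <host>:/<volname> over NFSv3
  mount -t nfs -o vers=3 gluster-host:/myvolume /mnt/gluster-nfs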

I get similar problems with oVirt 4.0.1 and hosted engine. After upgrading all my hosts to Gluster 3.7.13 (client and server), I get the following:

$ sudo hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
    all_stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted

If I only upgrade one host, then things will continue to work, but my nodes are constantly healing shards. My logs are also flooded with:

[2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
[2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]
[2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
[2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)

Scott

I'm setting up a test box I can mess with more thoroughly so I can submit something to bugzilla. Since my errors all popped up while I was trying to get oVirt and gluster functional again, rather than while carefully gathering logs and testing, my data is kinda sketchy.

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

What back-end storage do you run gluster on? XFS, ZFS, ext4, etc.?

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

Hi David,

My backend storage is ZFS.

I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test, but since I use hosted engine this would be a real pain. It's difficult to modify the storage domain type/path in hosted-engine.conf, and I don't want to go through the process of re-deploying the hosted engine.

Scott
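PS: For what it's worth, the settings I'd have to touch are, as far as I remember, something like the two lines below in /etc/ovirt-hosted-engine/hosted-engine.conf. The key names and values here are from memory and only meant as an illustration, not verified against a current install:

  # illustrative excerpt, values are placeholders
  domainType=glusterfs
  storage=gluster-host:/engine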

On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> wrote:
Hi David,
My backend storage is ZFS.
I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test, but since I use hosted engine this would be a real pain. It's difficult to modify the storage domain type/path in hosted-engine.conf, and I don't want to go through the process of re-deploying the hosted engine.
I found this: https://bugzilla.redhat.com/show_bug.cgi?id=1347553

Not sure if it's related. But I also have a ZFS backend, and another user on the gluster mailing list had issues with a ZFS backend too; she used Proxmox and got it working by changing the disk to writeback cache, I think it was.

I also use hosted engine, but I run my gluster volume for the HE on an LVM volume separate from ZFS, on XFS, and if I recall it did not have the issues my gluster on ZFS did. I'm wondering now if the issue was ZFS settings.

Hopefully I'll have a test machine up soon that I can play around with more.
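The ZFS side I'd check first is the handful of properties usually recommended for gluster bricks. This is just a sketch: the dataset name is a placeholder and I haven't confirmed that any of these are actually involved in this bug:

  # check current values on the brick dataset
  zfs get xattr,acltype,atime tank/gluster1
  # commonly recommended for gluster bricks (verify before changing production pools)
  zfs set xattr=sa tank/gluster1
  zfs set acltype=posixacl tank/gluster1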

You change the cache mode using a custom property per VM, I believe. I don't know if this would work for the hosted engine. I've already downgraded my system, but once you have the test machine up, perhaps you can try it. The custom property would be:

viodiskcache=writethrough

or

viodiskcache=writeback

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Virtual_Machine_Management_Guide/appe-Reference_Settings_in_Administration_Portal_and_User_Portal_Windows.html#Virtual_Machine_Custom_Properties_settings_explained

Scott
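PS: If the property isn't already defined, my understanding is it has to be whitelisted on the engine first. Rough sketch only; the exact regex and --cver value are from memory rather than verified:

  # on the engine host, then restart ovirt-engine
  engine-config -s "UserDefinedVMProperties=viodiskcache=^(none|writeback|writethrough)$" --cver=3.6
  systemctl restart ovirt-engine

After that you'd set viodiskcache=writeback under the VM's Custom Properties in the Admin Portal and power cycle the VM.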
On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> wrote:
Hi David,
My backend storage is ZFS.
I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test. But since I use hosted engine this would be a real pain. Its difficult to modify the storage domain type/path in the hosted-engine.conf. And I don't want to go through the process of re-deploying hosted engine.
I found this
https://bugzilla.redhat.com/show_bug.cgi?id=1347553
Not sure if related.
But I also have zfs backend, another user in gluster mailing list had issues and used zfs backend although she used proxmox and got it working by changing disk to writeback cache I think it was.
I also use hosted engine, but I run my gluster volume for HE actually on a LVM separate from zfs on xfs and if i recall it did not have the issues my gluster on zfs did. I'm wondering now if the issue was zfs settings.
Hopefully should have a test machone up soon I can play around with more.
Scott
On Thu, Jul 21, 2016 at 11:36 AM David Gossage < dgossage@carouselchecks.com> wrote:
What back end storage do you run gluster on? xfs/zfs/ext4 etc?
*David Gossage* *Carousel Checks Inc. | System Administrator* *Office* 708.613.2284
On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer@gmail.com> wrote:
I get similar problems with oVirt 4.0.1 and hosted engine. After upgrading all my hosts to Gluster 3.7.13 (client and server), I get the following:
$ sudo hosted-engine --set-maintenance --mode=none Traceback (most recent call last): File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module> if not maintenance.set_mode(sys.argv[1]): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode value=m_global, File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode str(value)) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag all_stats = broker.get_stats_from_storage(service) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage result = self._checked_communicate(request) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate .format(message or response)) ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
If I only upgrade one host, then things will continue to work but my nodes are constantly healing shards. My logs are also flooded with:
[2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=4 41f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted) The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226] The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178] The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666] [2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted] [2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted] [2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted) [2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted) [2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted) [2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted] [2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted) [2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted) [2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
Scott
On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein < f.rothenstein@bodden-kliniken.de> wrote:
Hey Devid,
I have the very same problem on my test-cluster, despite on running ovirt 4.0. If you access your volumes via NFS all is fine, problem is FUSE. I stayed on 3.7.13, but have no solution yet, now I use NFS.
Frank

You change the cache mode using a custom property per VM, I believe. I don't know if this would work for the hosted engine. I've already downgraded my system, but once you have the test machine up, perhaps you can try it. The custom property would be:
viodiskcache=writethrough
or
viodiskcache=writeback
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Virtual_Machine_Management_Guide/appe-Reference_Settings_in_Administration_Portal_and_User_Portal_Windows.html#Virtual_Machine_Custom_Properties_settings_explained
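For what it's worth, a rough sketch of how such a property usually gets exposed on the engine side, along the lines of that custom-properties document (the exact key list and cluster version here are assumptions, not something verified on this setup):

engine-config -s "UserDefinedVMProperties=viodiskcache=^(none|writeback|writethrough)$" --cver=3.6
# restart the engine so the new property shows up in the VM's Custom Properties tab
systemctl restart ovirt-engine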
Scott

As per my cluster the problem is on a higher level: you can't activate the domains on FUSE; sanlock can't acquire the lock due to the permission errors visible in the brick log.

Frank

Hi David,

Could you also share the brick logs from the affected volume? They're located at /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.

Also, could you share the volume configuration (output of `gluster volume info <VOL>`) for the affected volume(s) as it was at the time you actually saw this issue?

-Krutika

On Thu, Jul 21, 2016 at 11:23 PM, David Gossage <dgossage@carouselchecks.com> wrote:
On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> wrote:
Hi David,
My backend storage is ZFS.
I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test. But since I use hosted engine this would be a real pain. It's difficult to modify the storage domain type/path in hosted-engine.conf, and I don't want to go through the process of re-deploying hosted engine.
I found this
https://bugzilla.redhat.com/show_bug.cgi?id=1347553
Not sure if related.
But I also have a ZFS backend. Another user on the gluster mailing list had issues with a ZFS backend as well; she used Proxmox and got it working by changing the disk to writeback cache, I think it was.
I also use hosted engine, but I run my gluster volume for HE on a separate LVM volume with XFS rather than on ZFS, and if I recall it did not have the issues my gluster on ZFS did. I'm wondering now if the issue was ZFS settings.
Hopefully I should have a test machine up soon that I can play around with more.
Scott
On Thu, Jul 21, 2016 at 11:36 AM David Gossage < dgossage@carouselchecks.com> wrote:
What back end storage do you run gluster on? xfs/zfs/ext4 etc?
*David Gossage* *Carousel Checks Inc. | System Administrator* *Office* 708.613.2284
On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer@gmail.com> wrote:
I get similar problems with oVirt 4.0.1 and hosted engine. After upgrading all my hosts to Gluster 3.7.13 (client and server), I get the following:
$ sudo hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
    all_stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
If I only upgrade one host, then things will continue to work but my nodes are constantly healing shards. My logs are also flooded with:
[2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
[2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]
[2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
[2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
Scott
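As a side note, the constant shard heals mentioned above can be watched with gluster's own heal commands; the volume name 'data' is only inferred here from the 0-data-client-* prefixes in the log lines, so adjust as needed:

gluster volume heal data info
gluster volume heal data statistics heal-count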

Trimmed out the logs to just about when I was shutting down ovirt servers for updates which was 14:30 UTC 2016-07-09

Pre-update settings were

Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl3.gl.local:/gluster1/BRICK1/1
Options Reconfigured:
performance.readdir-ahead: on
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
server.allow-insecure: on
cluster.self-heal-window-size: 1024
cluster.background-self-heal-count: 16
performance.strict-write-ordering: off
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off

At the time of updates ccgl3 was offline from bad nic on server but had been so for about a week with no issues in volume

Shortly after update I added these settings to enable sharding but did not as of yet have any VM images sharded.
features.shard-block-size: 64MB
features.shard: on

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
Hi David,
Could you also share the brick logs from the affected volume? They're located at /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
Also, could you share the volume configuration (output of `gluster volume info <VOL>`) for the affected volume(s) AND at the time you actually saw this issue?
-Krutika
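For reference, the two sharding options listed above correspond to the standard volume-set commands below; this is just a sketch of the usual CLI form, and it is not known whether they were actually applied this way or through the oVirt UI:

gluster volume set GLUSTER1 features.shard on
gluster volume set GLUSTER1 features.shard-block-size 64MB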

Hi,

Thanks for the logs. So I have identified one issue from the logs for which the fix is this: http://review.gluster.org/#/c/14669/. Because of a bug in the code, ENOENT was getting converted to EPERM and being propagated up the stack, causing the reads to bail out early with 'Operation not permitted' errors.

I still need to find out two things:
i) why there was a readv() sent on a non-existent (ENOENT) file (this is important since some of the other users have not faced or reported this issue on gluster-users with 3.7.13)
ii) whether there's a way to work around this issue.

Do you mind sharing the steps needed to be executed to run into this issue? This is so that we can apply our patches, test and ensure they fix the problem.

-Krutika

On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
Hi,
Thanks for the logs. So I have identified one issue from the logs for which the fix is this: http://review.gluster.org/#/c/14669/. Because of a bug in the code, ENOENT was getting converted to EPERM and being propagated up the stack causing the reads to bail out early with 'Operation not permitted' errors. I still need to find out two things: i) why there was a readv() sent on a non-existent (ENOENT) file (this is important since some of the other users have not faced or reported this issue on gluster-users with 3.7.13) ii) need to see if there's a way to work around this issue.
Do you mind sharing the steps needed to be executed to run into this issue? This is so that we can apply our patches, test and ensure they fix the problem.
Well, after the upgrade of gluster all I did was start the ovirt hosts up, which launched and started their ha-agent and broker processes. I don't believe I started getting any errors till it mounted GLUSTER1. I had enabled sharding but had no sharded disk images yet. Not sure if the check for shards would have caused that.

Unfortunately I can't just update this cluster and try to see what caused it, as it has some VMs users expect to be available in a few hours. I can see if I can get my test setup to recreate it. I think I'll need to de-activate the data center so I can detach the storage that's on XFS and attach the one that's over ZFS with sharding enabled. My test is 3 bricks on the same local machine, with 3 different volumes, but I think I'm running into a sanlock issue or something, as it won't mount more than one volume that was created locally.
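For the record, a throwaway single-host replica 3 test volume of the kind described above can be put together roughly like this (hostname, brick paths and volume name are made up for illustration; 'force' is needed because all three bricks sit on the same server, and the owner/allow-insecure options mirror the GLUSTER1 settings shown earlier):

gluster volume create testvol replica 3 testhost:/bricks/test/b1 testhost:/bricks/test/b2 testhost:/bricks/test/b3 force
gluster volume set testvol storage.owner-uid 36
gluster volume set testvol storage.owner-gid 36
gluster volume set testvol server.allow-insecure on
gluster volume set testvol features.shard on
gluster volume start testvol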

Hi,
On 25 Jul 2016, at 12:34, David Gossage <dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote: Hi,
Thanks for the logs. So I have identified one issue from the logs for which the fix is this: http://review.gluster.org/#/c/14669/. Because of a bug in the code, ENOENT was getting converted to EPERM and being propagated up the stack, causing the reads to bail out early with 'Operation not permitted' errors. I still need to find out two things: i) why there was a readv() sent on a non-existent (ENOENT) file (this is important since some of the other users have not faced or reported this issue on gluster-users with 3.7.13), and ii) whether there's a way to work around this issue.
Do you mind sharing the steps needed to be executed to run into this issue? This is so that we can apply our patches, test and ensure they fix the problem.
Unfortunately I can't test this right away nor give exact steps, but here is my theory; please correct me if you see any mistakes. oVirt uses cache=none for VMs by default, which requires direct I/O, and it also uses dd with iflag=direct to check that storage supports direct I/O. Problems show up with GlusterFS when sharding is enabled and the bricks run on ZFS on Linux. Everything seems fine with GlusterFS 3.7.11; problems exist at least with 3.7.12 and 3.7.13, and there have been some posts saying that GlusterFS 3.8.x is also affected.

Steps to reproduce:
1. A sharded file is created with GlusterFS 3.7.11. Everything works OK.
2. GlusterFS is upgraded to 3.7.12+.
3. The sharded file can no longer be read or written with direct I/O enabled. (For example, oVirt checks the storage connection with the command "dd if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000".)

Please let me know if you need more information.

-samuli
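For anyone trying to reproduce this outside of oVirt, a rough equivalent of that check on a plain FUSE mount of the sharded volume might look like the sketch below. The mount point and file name are placeholders, and the 128MB size is only chosen so the file spans more than one 64MB shard:

MNT=/mnt/glustertest                               # FUSE mount of the sharded volume (placeholder)
# write a file large enough to span more than one 64MB shard
dd if=/dev/zero of=$MNT/directio-test bs=1M count=128
# read it back with direct I/O, the way the oVirt storage check does;
# per this thread, on 3.7.12+ with ZFS-backed bricks this read fails with "Operation not permitted"
dd if=$MNT/directio-test of=/dev/null bs=1M count=128 iflag=direct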
Well, after the gluster upgrade all I did was start the oVirt hosts back up, which launched their ha-agent and broker processes. I don't believe I started getting any errors until it mounted GLUSTER1. I had enabled sharding but had no sharded disk images yet, so I'm not sure whether the check for shards could have caused that. Unfortunately I can't just update this cluster again to see what caused it, as it has some VMs users expect to be available in a few hours.
I can see if I can get my test setup to recreate it. I think I'll need to deactivate the data center so I can detach the storage that's on XFS and attach the one that's on ZFS with sharding enabled. My test setup is 3 bricks on the same local machine, with 3 different volumes, but I think I'm running into a sanlock issue or something, as it won't mount more than one volume that was created locally.
-Krutika
On Fri, Jul 22, 2016 at 7:17 PM, David Gossage <dgossage@carouselchecks.com> wrote: Trimmed the logs down to just about when I was shutting down the oVirt servers for updates, which was 14:30 UTC 2016-07-09.
Pre-update settings were
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl3.gl.local:/gluster1/BRICK1/1
Options Reconfigured:
performance.readdir-ahead: on
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
server.allow-insecure: on
cluster.self-heal-window-size: 1024
cluster.background-self-heal-count: 16
performance.strict-write-ordering: off
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
At the time of the updates ccgl3 was offline due to a bad NIC on the server, but it had been that way for about a week with no issues in the volume.
Shortly after the update I added these settings to enable sharding, but did not as of yet have any VM images sharded:
features.shard-block-size: 64MB
features.shard: on
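For reference, enabling those two options on a live volume would look something like the following; the GLUSTER1 name is taken from the volume info above, and this is only a sketch of the commands rather than a log of what was actually run:

gluster volume set GLUSTER1 features.shard on
gluster volume set GLUSTER1 features.shard-block-size 64MB
# confirm the options show up under "Options Reconfigured"
gluster volume info GLUSTER1 | grep shard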
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284
On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote: Hi David,
Could you also share the brick logs from the affected volume? They're located at /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
Also, could you share the volume configuration (output of `gluster volume info <VOL>`) for the affected volume(s) AND at the time you actually saw this issue?
-Krutika
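A minimal sketch of collecting what is asked for above, assuming the GLUSTER1 volume and brick layout shown elsewhere in this thread; the brick log file name is just the brick directory path with the slashes replaced by hyphens, as described above:

gluster volume info GLUSTER1
# brick directory /gluster1/BRICK1/1 -> /var/log/glusterfs/bricks/gluster1-BRICK1-1.log
tail -n 500 /var/log/glusterfs/bricks/gluster1-BRICK1-1.log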
On Thu, Jul 21, 2016 at 11:23 PM, David Gossage <dgossage@carouselchecks.com> wrote: On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> wrote: Hi David,
My backend storage is ZFS.
I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test. But since I use hosted engine this would be a real pain. It's difficult to modify the storage domain type/path in hosted-engine.conf, and I don't want to go through the process of re-deploying the hosted engine.
I found this
https://bugzilla.redhat.com/show_bug.cgi?id=1347553
Not sure if related.
But I also have a ZFS backend. Another user on the gluster mailing list had issues with a ZFS backend as well, although she used Proxmox and got it working by changing the disk to writeback cache, I think it was.
I also use hosted engine, but I run my gluster volume for the HE on an LVM volume (XFS) separate from the ZFS storage, and if I recall it did not have the issues my gluster-on-ZFS volume did. I'm wondering now if the issue was ZFS settings.
Hopefully I should have a test machine up soon that I can play around with more.
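One quick way to narrow this down would be to try direct I/O on the brick filesystem itself, bypassing gluster entirely. The file name below is a placeholder and the brick path is the one from the GLUSTER1 volume info in this thread; if the ZFS-backed brick refuses the O_DIRECT read while the XFS-backed one accepts it, that would point at the backend rather than at oVirt or gluster:

# direct-I/O read of any existing file on the ZFS-backed brick (placeholder file name)
dd if=/gluster1/BRICK1/1/some-existing-file of=/dev/null iflag=direct bs=4k count=1
# repeat the same command against a file on the XFS-backed brick for comparison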
Scott
On Thu, Jul 21, 2016 at 11:36 AM David Gossage <dgossage@carouselchecks.com> wrote: What back end storage do you run gluster on? xfs/zfs/ext4 etc?
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284
On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer@gmail.com> wrote: I get similar problems with oVirt 4.0.1 and hosted engine. After upgrading all my hosts to Gluster 3.7.13 (client and server), I get the following:
$ sudo hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
    all_stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
If I only upgrade one host, then things will continue to work but my nodes are constantly healing shards. My logs are also flooded with:
[2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
[2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]
[2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
[2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
Scott
On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein <f.rothenstein@bodden-kliniken.de> wrote: Hey David,
I have the very same problem on my test cluster, even though I'm running oVirt 4.0. If you access your volumes via NFS all is fine; the problem is FUSE. I stayed on 3.7.13 but have no solution yet, so for now I use NFS.
Frank
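For anyone wanting to try the same workaround, mounting the volume over NFS instead of FUSE would look roughly like the sketch below. Gluster's built-in NFS server only speaks NFSv3, the GLUSTER1 volume shown earlier currently has nfs.disable: on so it would have to be re-enabled first, the mount point is a placeholder, and ccgl1.gl.local is just one of the gluster nodes from the volume info above:

gluster volume set GLUSTER1 nfs.disable off
mount -t nfs -o vers=3,nolock ccgl1.gl.local:/GLUSTER1 /mnt/gluster1-nfs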

OK, could you try the following:

i. Set network.remote-dio to off
   # gluster volume set <VOL> network.remote-dio off

ii. Set performance.strict-o-direct to on
   # gluster volume set <VOL> performance.strict-o-direct on

iii. Stop the affected VM(s) and start them again

and tell me if you notice any improvement?

-Krutika
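Spelled out for the volume discussed in this thread (GLUSTER1; substitute whatever volume actually backs the storage domain), that would be something like:

gluster volume set GLUSTER1 network.remote-dio off
gluster volume set GLUSTER1 performance.strict-o-direct on
# both options should now appear under "Options Reconfigured"
gluster volume info GLUSTER1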
On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
OK, could you try the following:
i. Set network.remote-dio to off # gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on # gluster volume set <VOL> performance.strict-o-direct on
iii. Stop the affected vm(s) and start again
and tell me if you notice any improvement?
The previous install I had the issue with is still on gluster 3.7.11.

My test install of oVirt 3.6.7 and gluster 3.7.13, with 3 bricks on a local disk, right now isn't allowing me to add the gluster storage at all. I keep getting some type of UI error:

2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException
at Unknown.ps( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.ts( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.vs( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.iJf( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.Xab( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.P8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.jQr( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.A8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.u8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.Eap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.p8n( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.Cao( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.Bap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.kRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.nRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.eVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.hVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.MTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.PTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.KJe( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.Izk( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.P3( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.g4( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown._t( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.du( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...)
If I add it from the storage tab it creates the storage domain but won't attach to a data center:

Error while executing action Attach Storage Domain: AcquireHostIdFailure

engine.log:

2016-07-25 13:04:45,186 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-90) [4e0e7cbd] Failed in 'CreateStoragePoolVDS' method
2016-07-25 13:04:45,211 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-90) [4e0e7cbd] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM local command failed: Cannot acquire host id: (u'5b8a4477-4d87-43a1-aa52-b664b1bd9e08', SanlockException(1, 'Sanlock lockspace add failure', 'Operation not permitted'))
2016-07-25 13:04:45,211 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-90) [4e0e7cbd] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand' return value 'StatusOnlyReturnForXmlRpc [status=StatusForXmlRpc [code=661, message=Cannot acquire host id: (u'5b8a4477-4d87-43a1-aa52-b664b1bd9e08', SanlockException(1, 'Sanlock lockspace add failure', 'Operation not permitted'))]]'
2016-07-25 13:04:45,211 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-90) [4e0e7cbd] HostName = local
2016-07-25 13:04:45,212 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-90) [4e0e7cbd] Command 'CreateStoragePoolVDSCommand(HostName = local, CreateStoragePoolVDSCommandParameters:{runAsync='true', hostId='b4d03420-3de8-45b8-a671-45bbe7c05e06', storagePoolId='7fe4f6ec-71aa-485b-8bba-958e493b66eb', storagePoolName='NewDefault', masterDomainId='5b8a4477-4d87-43a1-aa52-b664b1bd9e08', domainsIdList='[5b8a4477-4d87-43a1-aa52-b664b1bd9e08]', masterVersion='4'})' execution failed: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: (u'5b8a4477-4d87-43a1-aa52-b664b1bd9e08', SanlockException(1, 'Sanlock lockspace add failure', 'Operation not permitted')), code = 661
2016-07-25 13:04:45,212 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (default task-90) [4e0e7cbd] FINISH, CreateStoragePoolVDSCommand, log id: 2ed8b2b6
2016-07-25 13:04:45,212 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (default task-90) [4e0e7cbd] Command 'org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: (u'5b8a4477-4d87-43a1-aa52-b664b1bd9e08', SanlockException(1, 'Sanlock lockspace add failure', 'Operation not permitted')), code = 661 (Failed with error AcquireHostIdFailure and code 661)
2016-07-25 13:04:45,220 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-90) [4e0e7cbd] Correlation ID: 4f77f0e0, Job ID: 6aae65f2-ff61-4bec-a513-18b31828442b, Call Stack: null, Custom Event ID: -1, Message: Failed to attach Storage Domains to Data Center NewDefault. (User: admin@internal)
2016-07-25 13:04:45,228 INFO [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (default task-90) [4e0e7cbd] Lock freed to object 'EngineLock:{exclusiveLocks='[5b8a4477-4d87-43a1-aa52-b664b1bd9e08=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2016-07-25 13:04:45,229 INFO [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (default task-90) [4e0e7cbd] Command [id=d08f24d6-f0f9-4df8-aa34-3718ab44f454]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.StoragePool; snapshot: id=7fe4f6ec-71aa-485b-8bba-958e493b66eb.
2016-07-25 13:04:45,231 INFO [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (default task-90) [4e0e7cbd] Command [id=d08f24d6-f0f9-4df8-aa34-3718ab44f454]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: StoragePoolIsoMapId:{storagePoolId='7fe4f6ec-71aa-485b-8bba-958e493b66eb', storageId='5b8a4477-4d87-43a1-aa52-b664b1bd9e08'}.
2016-07-25 13:04:45,231 INFO [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (default task-90) [4e0e7cbd] Command [id=d08f24d6-f0f9-4df8-aa34-3718ab44f454]: Compensating DELETED_OR_UPDATED_ENTITY of org.ovirt.engine.core.common.businessentities.StorageDomainStatic; snapshot: id=5b8a4477-4d87-43a1-aa52-b664b1bd9e08.
2016-07-25 13:04:45,245 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-90) [4e0e7cbd] Correlation ID: 6cae9150, Job ID: 6aae65f2-ff61-4bec-a513-18b31828442b, Call Stack: null, Custom Event ID: -1, Message: Failed to attach Storage Domain newone to Data Center NewDefault. (User: admin@internal)
2016-07-25 13:04:45,253 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-90) [4e0e7cbd] Trying to release exclusive lock which does not exist, lock key: '5b8a4477-4d87-43a1-aa52-b664b1bd9e08STORAGE'
2016-07-25 13:04:45,253 INFO [org.ovirt.engine.core.bll.storage.AttachStorageDomainToPoolCommand] (default task-90) [4e0e7cbd] Lock freed to object 'EngineLock:{exclusiveLocks='[5b8a4477-4d87-43a1-aa52-b664b1bd9e08=<STORAGE, ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
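The sanlock 'Operation not permitted' above looks like the same direct-I/O symptom as the FUSE read errors earlier in this thread, since sanlock uses direct I/O on the domain's ids/lockspace file. A quick way to test that theory would be a direct read of that file on the gluster mount; the domain UUID comes from the engine.log above, but the glusterSD mount path below is only an assumption about where oVirt mounted the volume:

# adjust <host> and <volume> to the actual gluster mount under /rhev/data-center/mnt/glusterSD/
dd if=/rhev/data-center/mnt/glusterSD/<host>:_<volume>/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/dom_md/ids \
   iflag=direct bs=512 count=1 of=/dev/null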
-Krutika
On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen <samppah@neutraali.net> wrote:
Hi,
On 25 Jul 2016, at 12:34, David Gossage <dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay < kdhananj@redhat.com> wrote: Hi,
Thanks for the logs. So I have identified one issue from the logs for which the fix is this: http://review.gluster.org/#/c/14669/. Because of a bug in the code, ENOENT was getting converted to EPERM and being propagated up the stack causing the reads to bail out early with 'Operation not permitted' errors. I still need to find out two things: i) why there was a readv() sent on a non-existent (ENOENT) file (this is important since some of the other users have not faced or reported this issue on gluster-users with 3.7.13) ii) need to see if there's a way to work around this issue.
Do you mind sharing the steps needed to be executed to run into this issue? This is so that we can apply our patches, test and ensure they fix the problem.
Unfortunately I can’t test this right away nor give exact steps how to test this. This is just a theory but please correct me if you see some mistakes.
oVirt uses cache=none settings for VMs by default, which requires direct I/O. oVirt also uses dd with iflag=direct to check that storage has direct I/O enabled. Problems exist with GlusterFS with sharding enabled and bricks running on ZFS on Linux. Everything seems to be fine with GlusterFS 3.7.11 and problems exist at least with versions .12 and .13. There have been some posts saying that GlusterFS 3.8.x is also affected.
Steps to reproduce:
1. Sharded file is created with GlusterFS 3.7.11. Everything works ok.
2. GlusterFS is upgraded to 3.7.12+.
3. Sharded file cannot be read or written with direct I/O enabled. (I.e. oVirt checks the storage connection with the command "dd if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000”)
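For anyone wanting to poke at this by hand, a rough sketch of the same check (paths are placeholders; it assumes the volume is FUSE-mounted under /rhev/data-center/mnt/glusterSD/ and that a sharded image already exists on it):

  MNT=/rhev/data-center/mnt/glusterSD/server:_volume
  # a buffered read of the sharded file generally still works after the upgrade
  dd if=$MNT/path/to/sharded-image of=/dev/null bs=1M count=1
  # the same read with O_DIRECT is the access pattern oVirt relies on (cache=none),
  # and is what bails out with 'Operation not permitted' on 3.7.12+
  dd if=$MNT/path/to/sharded-image of=/dev/null bs=1M count=1 iflag=direct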
Please let me know if you need more information.
-samuli
Well, after the upgrade of gluster all I did was start the ovirt hosts up, which launched and started their ha-agent and broker processes. I don't believe I started getting any errors till it mounted GLUSTER1. I had enabled sharding but had no sharded disk images yet. Not sure if the check for shards would have caused that. Unfortunately I can't just update this cluster and try to see what caused it, as it has some VMs users expect to be available in a few hours.
I can see if I can get my test setup to recreate it. I think I'll need to de-activate the data center so I can detach the storage that's on xfs and attach the one that's over zfs with sharding enabled. My test is 3 bricks on the same local machine, with 3 different volumes, but I think I'm running into a sanlock issue or something, as it won't mount more than one volume that was created locally.
-Krutika
On Fri, Jul 22, 2016 at 7:17 PM, David Gossage < dgossage@carouselchecks.com> wrote: Trimmed out the logs to just about when I was shutting down ovirt servers for updates which was 14:30 UTC 2016-07-09
Pre-update settings were
Volume Name: GLUSTER1 Type: Replicate Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 Brick3: ccgl3.gl.local:/gluster1/BRICK1/1 Options Reconfigured: performance.readdir-ahead: on storage.owner-uid: 36 storage.owner-gid: 36 performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.stat-prefetch: off cluster.eager-lock: enable network.remote-dio: enable cluster.quorum-type: auto cluster.server-quorum-type: server server.allow-insecure: on cluster.self-heal-window-size: 1024 cluster.background-self-heal-count: 16 performance.strict-write-ordering: off nfs.disable: on nfs.addr-namelookup: off nfs.enable-ino32: off
At the time of the updates ccgl3 was offline due to a bad NIC on the server, but had been so for about a week with no issues in the volume.
Shortly after the update I added these settings to enable sharding, but did not yet have any sharded VM images:
features.shard-block-size: 64MB
features.shard: on
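For reference, those two options are set with ordinary volume-set commands, roughly (using the volume name from this thread):

  gluster volume set GLUSTER1 features.shard on
  gluster volume set GLUSTER1 features.shard-block-size 64MB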
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284
On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay < kdhananj@redhat.com> wrote: Hi David,
Could you also share the brick logs from the affected volume? They're located at /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
Also, could you share the volume configuration (output of `gluster volume info <VOL>`) for the affected volume(s) AND at the time you actually saw this issue?
-Krutika
On Thu, Jul 21, 2016 at 11:23 PM, David Gossage < dgossage@carouselchecks.com> wrote: On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> wrote: Hi David,
My backend storage is ZFS.
I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test. But since I use hosted engine this would be a real pain. It's difficult to modify the storage domain type/path in hosted-engine.conf, and I don't want to go through the process of re-deploying hosted engine.
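For anyone in the same spot, a harmless way to at least see what the hosted engine recorded for its storage domain (assuming the default config location; key names differ between releases, so this is only an inspection sketch, not a how-to for editing):

  grep -iE 'storage|domain|mnt' /etc/ovirt-hosted-engine/hosted-engine.conf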
I found this
https://bugzilla.redhat.com/show_bug.cgi?id=1347553
Not sure if related.
But I also have a zfs backend; another user on the gluster mailing list had issues with a zfs backend as well, although she used proxmox, and got it working by changing the disk to writeback cache, I think it was.
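For context, "writeback cache" here is the qemu disk cache mode; purely as an illustration of the difference at the hypervisor level (not an oVirt setting):

  # cache=none (the oVirt default) opens the image with O_DIRECT, which is the
  # path that breaks on sharded volumes backed by ZFS after 3.7.12:
  #   -drive file=/path/to/disk.img,if=virtio,cache=none
  # cache=writeback goes through the page cache instead, avoiding O_DIRECT:
  #   -drive file=/path/to/disk.img,if=virtio,cache=writeback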
I also use hosted engine, but I run my gluster volume for HE on a separate LVM (xfs) rather than on zfs, and if I recall it did not have the issues my gluster on zfs did. I'm wondering now if the issue was zfs settings.
Hopefully I should have a test machine up soon that I can play around with more.
Scott
On Thu, Jul 21, 2016 at 11:36 AM David Gossage < dgossage@carouselchecks.com> wrote: What back end storage do you run gluster on? xfs/zfs/ext4 etc?
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284
On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer@gmail.com> wrote: I get similar problems with oVirt 4.0.1 and hosted engine. After upgrading all my hosts to Gluster 3.7.13 (client and server), I get the following:
$ sudo hosted-engine --set-maintenance --mode=none Traceback (most recent call last): File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code exec code in run_globals File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module> if not maintenance.set_mode(sys.argv[1]): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode value=m_global, File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode str(value)) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag all_stats = broker.get_stats_from_storage(service) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage result = self._checked_communicate(request) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate .format(message or response)) ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
If I only upgrade one host, then things will continue to work but my nodes are constantly healing shards. My logs are also flooded with:
[2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=4 41f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted) The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226] The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178] The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666] [2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted] [2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted] [2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted) [2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted) [2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted) [2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted] [2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted) [2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted) [2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
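If it helps when comparing notes, the heal backlog described above can be watched with the standard heal command (volume name is a placeholder):

  gluster volume heal <VOL> info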
Scott
On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein < f.rothenstein@bodden-kliniken.de> wrote: Hey David,
I have the very same problem on my test cluster, albeit running ovirt 4.0. If you access your volumes via NFS all is fine; the problem is FUSE. I stayed on 3.7.13 but have no solution yet, so for now I use NFS.
Frank
On Thursday, 21.07.2016 at 04:28 -0500, David Gossage wrote:
Anyone running one of recent 3.6.x lines and gluster using 3.7.13? I am looking to upgrade gluster from 3.7.11->3.7.13 for some bug fixes, but have been told by users on gluster mail list due to some gluster changes I'd need to change the disk parameters to use writeback cache. Something to do with aio support being removed.
I believe this could be done with custom parameters? But I believe storage tests are done using dd and would they fail with current settings then? Last upgrade to 3.7.13 I had to rollback to 3.7.11 due to stability issues where gluster storage would go into down state and always show N/A as space available/used. Even if hosts saw storage still and VM's were running on it on all 3 hosts.
Saw a lot of messages like these that went away once gluster rollback
finished
[2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
[2016-07-09 15:27:49.555466] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted] [2016-07-09 15:27:49.556574] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted] [2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted) [2016-07-09 15:27:59.612477] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted] [2016-07-09 15:27:59.613700] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted] [2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted)
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284

On Mon, Jul 25, 2016 at 1:07 PM, David Gossage <dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 1:00 PM, David Gossage < dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
OK, could you try the following:
i. Set network.remote-dio to off # gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on # gluster volume set <VOL> performance.strict-o-direct on
iii. Stop the affected vm(s) and start again
and tell me if you notice any improvement?
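Put together, and with <VOL> standing in for the affected volume, the suggested change boils down to:

  gluster volume set <VOL> network.remote-dio off
  gluster volume set <VOL> performance.strict-o-direct on
  # then power the affected VMs off and back on so qemu reopens the images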
Not sure if it's helpful, but on the gluster mount it creates (even though it won't attach to a data center) I get this error in the brick log when running the following:

dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test oflag=direct count=100 bs=1M
dd: error writing ‘/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test’: Invalid argument
dd: closing output file ‘/rhev/data-center/mnt/glusterSD/192.168.71.10:_glustershard/5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test’: Invalid argument

[2016-07-25 18:20:19.393121] E [MSGID: 113039] [posix.c:2939:posix_open] 0-glustershard-posix: open on /gluster2/brick1/1/.glusterfs/02/f4/02f4783b-2799-46d9-b787-53e4ccd9a052, flags: 16385 [Invalid argument]
[2016-07-25 18:20:19.393204] E [MSGID: 115070] [server-rpc-fops.c:1568:server_open_cbk] 0-glustershard-server: 120: OPEN /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test (02f4783b-2799-46d9-b787-53e4ccd9a052) ==> (Invalid argument) [Invalid argument]

and /var/log/glusterfs/rhev-data-center-mnt-glusterSD-192.168.71.10\:_glustershard.log

[2016-07-25 18:20:19.393275] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-0: remote operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test (02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393270] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-1: remote operation failed. Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test (02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument]
[2016-07-25 18:20:19.393317] E [MSGID: 114031] [client-rpc-fops.c:466:client3_3_open_cbk] 0-glustershard-client-2: remote operation failed. 
Path: /5b8a4477-4d87-43a1-aa52-b664b1bd9e08/images/test (02f4783b-2799-46d9-b787-53e4ccd9a052) [Invalid argument] [2016-07-25 18:20:19.393357] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 117: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument) [2016-07-25 18:20:19.393389] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 118: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument) [2016-07-25 18:20:19.393611] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 119: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument) [2016-07-25 18:20:19.393708] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 120: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument) [2016-07-25 18:20:19.393771] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 121: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument) [2016-07-25 18:20:19.393840] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 122: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument) [2016-07-25 18:20:19.393914] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 123: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument) [2016-07-25 18:20:19.393982] W [fuse-bridge.c:2311:fuse_writev_cbk] 0-glusterfs-fuse: 124: WRITE => -1 gfid=02f4783b-2799-46d9-b787-53e4ccd9a052 fd=0x7f5fec0ba08c (Invalid argument) [2016-07-25 18:20:19.394045] W [fuse-bridge.c:709:fuse_truncate_cbk] 0-glusterfs-fuse: 125: FTRUNCATE() ERR => -1 (Invalid argument) [2016-07-25 18:20:19.394338] W [fuse-bridge.c:1290:fuse_err_cbk] 0-glusterfs-fuse: 126: FLUSH() ERR => -1 (Invalid argument)
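One way to narrow down whether the EINVAL comes from the backend filesystem itself rather than from gluster would be to try O_DIRECT straight against the brick path (sketch only; the brick path is taken from the log above, and if I recall correctly ZFS on Linux did not support O_DIRECT at this point, so an EINVAL here would point at ZFS rather than at the shard translator):

  dd if=/dev/zero of=/gluster2/brick1/1/odirect-test oflag=direct bs=1M count=10
  rm -f /gluster2/brick1/1/odirect-test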
The previous install I had the issue with is still on gluster 3.7.11.
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a local disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785 2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException at Unknown.ps( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.ts( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.vs( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.iJf( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Xab( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.jQr( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.A8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.u8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Eap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.p8n( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Cao( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Bap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.kRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.nRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.eVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.hVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.MTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.PTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.KJe( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Izk( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P3( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.g4( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown._t( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.du( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8... )

On Monday, July 25, 2016 01:00:58 PM David Gossage wrote:
On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj@redhat.com>
wrote:
OK, could you try the following:
i. Set network.remote-dio to off
# gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on
# gluster volume set <VOL> performance.strict-o-direct on
iii. Stop the affected vm(s) and start again
and tell me if you notice any improvement?
The previous install I had the issue with is still on gluster 3.7.11.
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a local disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
Yes that is definitely a UI error. To get a better stack trace can you install the debuginfo:

yum install ovirt-engine-webadmin-portal-debuginfo ovirt-engine-userportal-debuginfo

And recreate the exception, that should give a better stack trace.
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785 2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException at Unknown.ps( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@3837) at Unknown.ts( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@20) at Unknown.vs( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@18) at Unknown.iJf( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@19) at Unknown.Xab( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@48) at Unknown.P8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@4447) at Unknown.jQr( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@21) at Unknown.A8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@51) at Unknown.u8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@101) at Unknown.Eap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@10718) at Unknown.p8n( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@161) at Unknown.Cao( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@31) at Unknown.Bap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@10469) at Unknown.kRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@49) at Unknown.nRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@438) at Unknown.eVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@40) at Unknown.hVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@25827) at Unknown.MTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@25) at Unknown.PTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@24052) at Unknown.KJe( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@21125) at Unknown.Izk( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@10384) at Unknown.P3( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@137) at Unknown.g4( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@8271) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@65) at Unknown._t( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@29) at Unknown.du( 
https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@57) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@54 )

My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a local
disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
Yes that is definitely a UI error. To get a better stack trace can you install the debuginfo
yum install ovirt-engine-webadmin-portal-debuginfo ovirt-engine-userportal-debuginfo
And recreate the exception, that should give a better stack trace.
Do I need to restart engine?

I installed the packages and attempted to create the storage again from the data center "Guide Me", where I got the error last time, and I end up with another red banner about an uncaught exception and to reload the page. ui.log seems about the same:

2016-07-25 13:34:03,471 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-80) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785 2016-07-25 13:34:03,471 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-80) [] Uncaught exception: : java.lang.ClassCastException at Unknown.ps( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.ts( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.vs( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.iJf( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Xab( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.jQr( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.A8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.u8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Eap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.p8n( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Cao( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Bap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.kRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.nRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.eVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.hVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.MTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.PTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.KJe( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Izk( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P3( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.g4( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown._t( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.du( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8... )

On Monday, July 25, 2016 01:37:47 PM David Gossage wrote:
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a
local
disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
Yes that is definitely a UI error. To get a better stack trace can you install the debuginfo
yum install ovirt-engine-webadmin-portal-debuginfo ovirt-engine-userportal-debuginfo
And recreate the exception, that should give a better stack trace.
Do I need to restart engine?
Yes, you will need to restart engine before the log starts showing a better stack trace.
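For reference, restarting the engine service on a systemd-based engine host is just:

  systemctl restart ovirt-engine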
Installed packages and attempted to create storage again from the guide me of data center where I received last time and I end up with another red banner about uncaught exception and to reload page.
ui.log seems about same
2016-07-25 13:34:03,471 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-80) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785 2016-07-25 13:34:03,471 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-80) [] Uncaught exception: : java.lang.ClassCastException at Unknown.ps( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@3837) at Unknown.ts( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@20) at Unknown.vs( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@18) at Unknown.iJf( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@19) at Unknown.Xab( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@48) at Unknown.P8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@4447) at Unknown.jQr( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@21) at Unknown.A8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@51) at Unknown.u8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@101) at Unknown.Eap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@10718) at Unknown.p8n( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@161) at Unknown.Cao( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@31) at Unknown.Bap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@10469) at Unknown.kRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@49) at Unknown.nRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@438) at Unknown.eVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@40) at Unknown.hVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@25827) at Unknown.MTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@25) at Unknown.PTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@24052) at Unknown.KJe( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@21125) at Unknown.Izk( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@10384) at Unknown.P3( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@137) at Unknown.g4( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@8271) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@65) at Unknown._t( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@29) at Unknown.du( 
https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@57) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8 BE1C7FDD91EDAA785.cache.html@54 )

On Mon, Jul 25, 2016 at 1:39 PM, Alexander Wels <awels@redhat.com> wrote:
On Monday, July 25, 2016 01:37:47 PM David Gossage wrote:
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a
local
disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
Yes that is definitely a UI error. To get a better stack trace can you install the debuginfo
yum install ovirt-engine-webadmin-portal-debuginfo ovirt-engine-userportal-debuginfo
And recreate the exception, that should give a better stack trace.
Do I need to restart engine?
Yes, you will need to restart engine before the log starts showing a better stack trace.
Hopefully more informative. 2016-07-25 13:46:54,701 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785 2016-07-25 13:46:54,702 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException at java.lang.Throwable.fillInStackTrace(Throwable.java:114) [rt.jar:1.8.0_101] at java.lang.Exception.Exception(Exception.java:25) [rt.jar:1.8.0_101] at java.lang.RuntimeException.RuntimeException(RuntimeException.java:25) [rt.jar:1.8.0_101] at java.lang.ClassCastException.ClassCastException(ClassCastException.java:23) [rt.jar:1.8.0_101] at com.google.gwt.lang.Cast.dynamicCast(Cast.java:53) at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel.run(DataCenterGuideModel.java:1679) at org.ovirt.engine.ui.uicompat.Task.$run(Task.java:19) at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel.$saveSanStorage(DataCenterGuideModel.java:955) at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel.$postOnAddStorage(DataCenterGuideModel.java:667) at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel$9$1.onSuccess(DataCenterGuideModel.java:646) at org.ovirt.engine.ui.uicommonweb.dataprovider.AsyncDataProvider.$getConfigFromCache(AsyncDataProvider.java:2853) at org.ovirt.engine.ui.uicommonweb.dataprovider.AsyncDataProvider.$getStorageDomainMaxNameLength(AsyncDataProvider.java:2267) at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel$9.onSuccess(DataCenterGuideModel.java:629) at org.ovirt.engine.ui.frontend.Frontend$2.$onSuccess(Frontend.java:244) [frontend.jar:] at org.ovirt.engine.ui.frontend.Frontend$2.onSuccess(Frontend.java:244) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.$onSuccess(OperationProcessor.java:141) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.onSuccess(OperationProcessor.java:141) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$3$1.$onSuccess(GWTRPCCommunicationProvider.java:161) [frontend.jar:] at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$3$1.onSuccess(GWTRPCCommunicationProvider.java:161) [frontend.jar:] at com.google.gwt.rpc.client.impl.RpcCallbackAdapter.onResponseReceived(RpcCallbackAdapter.java:72) [gwt-servlet.jar:] at org.ovirt.engine.ui.common.gin.BaseSystemModule$1$1.onResponseReceived(BaseSystemModule.java:140) at com.google.gwt.http.client.Request.$fireOnResponseReceived(Request.java:237) [gwt-servlet.jar:] at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:] at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at com.google.gwt.core.client.impl.Impl.apply(Impl.java:296) [gwt-servlet.jar:] at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:335) [gwt-servlet.jar:] at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8... )

On Monday, July 25, 2016 01:49:32 PM David Gossage wrote:
On Mon, Jul 25, 2016 at 1:39 PM, Alexander Wels <awels@redhat.com> wrote:
On Monday, July 25, 2016 01:37:47 PM David Gossage wrote:
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a local disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
Yes that is definitely a UI error. To get a better stack trace can you install the debuginfo
yum install ovirt-engine-webadmin-portal-debuginfo ovirt-engine-userportal-debuginfo
And recreate the exception, that should give a better stack trace.
Do I need to restart engine?
Yes, you will need to restart engine before the log starts showing a better stack trace.
Hopefully more informative.
2016-07-25 13:46:54,701 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785
2016-07-25 13:46:54,702 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException
  at java.lang.Throwable.fillInStackTrace(Throwable.java:114) [rt.jar:1.8.0_101]
  at java.lang.Exception.Exception(Exception.java:25) [rt.jar:1.8.0_101]
  at java.lang.RuntimeException.RuntimeException(RuntimeException.java:25) [rt.jar:1.8.0_101]
  at java.lang.ClassCastException.ClassCastException(ClassCastException.java:23) [rt.jar:1.8.0_101]
  at com.google.gwt.lang.Cast.dynamicCast(Cast.java:53)
  at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel.run(DataCenterGuideModel.java:1679)
  at org.ovirt.engine.ui.uicompat.Task.$run(Task.java:19)
  at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel.$saveSanStorage(DataCenterGuideModel.java:955)
  at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel.$postOnAddStorage(DataCenterGuideModel.java:667)
  at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel$9$1.onSuccess(DataCenterGuideModel.java:646)
  at org.ovirt.engine.ui.uicommonweb.dataprovider.AsyncDataProvider.$getConfigFromCache(AsyncDataProvider.java:2853)
  at org.ovirt.engine.ui.uicommonweb.dataprovider.AsyncDataProvider.$getStorageDomainMaxNameLength(AsyncDataProvider.java:2267)
  at org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel$9.onSuccess(DataCenterGuideModel.java:629)
  at org.ovirt.engine.ui.frontend.Frontend$2.$onSuccess(Frontend.java:244) [frontend.jar:]
  at org.ovirt.engine.ui.frontend.Frontend$2.onSuccess(Frontend.java:244) [frontend.jar:]
  at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.$onSuccess(OperationProcessor.java:141) [frontend.jar:]
  at org.ovirt.engine.ui.frontend.communication.OperationProcessor$2.onSuccess(OperationProcessor.java:141) [frontend.jar:]
  at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$3$1.$onSuccess(GWTRPCCommunicationProvider.java:161) [frontend.jar:]
  at org.ovirt.engine.ui.frontend.communication.GWTRPCCommunicationProvider$3$1.onSuccess(GWTRPCCommunicationProvider.java:161) [frontend.jar:]
  at com.google.gwt.rpc.client.impl.RpcCallbackAdapter.onResponseReceived(RpcCallbackAdapter.java:72) [gwt-servlet.jar:]
  at org.ovirt.engine.ui.common.gin.BaseSystemModule$1$1.onResponseReceived(BaseSystemModule.java:140)
  at com.google.gwt.http.client.Request.$fireOnResponseReceived(Request.java:237) [gwt-servlet.jar:]
  at com.google.gwt.http.client.RequestBuilder$1.onReadyStateChange(RequestBuilder.java:409) [gwt-servlet.jar:]
  at Unknown.<anonymous>(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@65)
  at com.google.gwt.core.client.impl.Impl.apply(Impl.java:296) [gwt-servlet.jar:]
  at com.google.gwt.core.client.impl.Impl.entry0(Impl.java:335) [gwt-servlet.jar:]
  at Unknown.<anonymous>(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@54)
Looks like a bug in the code to me, there are a couple of casts in org.ovirt.engine.ui.uicommonweb.models.datacenters.DataCenterGuideModel.run(DataCenterGuideModel.java:1679) that could be the culprit, can you open a bugzilla against the webadmin?


Hi,
1. Could you please attach the glustershd logs from all three nodes?
2. Also, so far what we know is that the 'Operation not permitted' errors are on the main vm image itself and not its individual shards (ex deb61291-5176-4b81-8315-3f1cf8e3534d). Could you do the following: Get the inode number of .glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d (ls -li) from the first brick. I'll call this number INODE_NUMBER. Execute `find . -inum INODE_NUMBER` from the brick root on first brick to print the hard links against the file in the prev step and share the output.
3. Did you delete any vms at any point before or after the upgrade?
-Krutika
On Mon, Jul 25, 2016 at 11:30 PM, David Gossage <dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
OK, could you try the following:
i. Set network.remote-dio to off # gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on # gluster volume set <VOL> performance.strict-o-direct on
iii. Stop the affected vm(s) and start again
and tell me if you notice any improvement?
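Spelled out against the GLUSTER1 volume named later in this thread, that suggestion would look roughly like this (the volume name is just the example from this thread):
gluster volume set GLUSTER1 network.remote-dio off
gluster volume set GLUSTER1 performance.strict-o-direct on
gluster volume info GLUSTER1    # the two options should now show up under "Options Reconfigured"
# then power the affected VMs off and start them again so qemu reopens the disk images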
The previous install I had the issue with is still on gluster 3.7.11.
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a local disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785 2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException at Unknown.ps( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.ts( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.vs( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.iJf( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Xab( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.jQr( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.A8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.u8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Eap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.p8n( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Cao( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Bap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.kRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.nRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.eVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.hVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.MTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.PTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.KJe( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Izk( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P3( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.g4( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown._t( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.du( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8... )
-Krutika
On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen <samppah@neutraali.net> wrote:
Hi,
On 25 Jul 2016, at 12:34, David Gossage <dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay < kdhananj@redhat.com> wrote: Hi,
Thanks for the logs. So I have identified one issue from the logs for which the fix is this: http://review.gluster.org/#/c/14669/. Because of a bug in the code, ENOENT was getting converted to EPERM and being propagated up the stack causing the reads to bail out early with 'Operation not permitted' errors. I still need to find out two things: i) why there was a readv() sent on a non-existent (ENOENT) file (this is important since some of the other users have not faced or reported this issue on gluster-users with 3.7.13) ii) need to see if there's a way to work around this issue.
Do you mind sharing the steps needed to be executed to run into this issue? This is so that we can apply our patches, test and ensure they fix the problem.
Unfortunately I can’t test this right away nor give exact steps how to test this. This is just a theory but please correct me if you see some mistakes.
oVirt uses cache=none settings for VMs by default, which requires direct I/O. oVirt also uses dd with iflag=direct to check that storage has direct I/O enabled. Problems exist with GlusterFS with sharding enabled and bricks running on ZFS on Linux. Everything seems to be fine with GlusterFS 3.7.11, and problems exist at least with versions .12 and .13. There have been some posts saying that GlusterFS 3.8.x is also affected.
Steps to reproduce:
1. Sharded file is created with GlusterFS 3.7.11. Everything works ok.
2. GlusterFS is upgraded to 3.7.12+.
3. Sharded file cannot be read or written with direct I/O enabled. (I.e. the command oVirt uses to check the storage connection fails: "dd if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000")
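For reference, that check boils down to a single dd read with O_DIRECT against a file on the FUSE-mounted domain, e.g. (path as in the example above):
dd if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000
# on 3.7.12/3.7.13 with sharding enabled and ZFS bricks this reportedly fails with "Operation not permitted",
# while the same read without iflag=direct goes through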
Please let me know if you need more information.
-samuli
Well, after the upgrade of gluster all I did was start the ovirt hosts up, which launched and started their ha-agent and broker processes. I don't believe I started getting any errors until it mounted GLUSTER1. I had enabled sharding but had no sharded disk images yet. Not sure if the check for shards would have caused that. Unfortunately I can't just update this cluster and try to see what caused it, as it has some VMs users expect to be available in a few hours.
I can see if I can get my test setup to recreate it. I think I'll need to de-activate the data center so I can detach the storage that's on xfs and attach the one that's over zfs with sharding enabled. My test is 3 bricks on the same local machine, with 3 different volumes, but I think I'm running into a sanlock issue or something, as it won't mount more than one volume that was created locally.
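For anyone wanting to recreate that kind of single-host test volume, a rough sketch (hostname and brick paths are made up for illustration; force is needed because all bricks sit on one host):
gluster volume create TESTVOL replica 3 testhost:/bricks/b1 testhost:/bricks/b2 testhost:/bricks/b3 force
gluster volume set TESTVOL features.shard on
gluster volume start TESTVOL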
-Krutika
On Fri, Jul 22, 2016 at 7:17 PM, David Gossage < dgossage@carouselchecks.com> wrote: Trimmed out the logs to just about when I was shutting down ovirt servers for updates which was 14:30 UTC 2016-07-09
Pre-update settings were
Volume Name: GLUSTER1
Type: Replicate
Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
Brick3: ccgl3.gl.local:/gluster1/BRICK1/1
Options Reconfigured:
performance.readdir-ahead: on
storage.owner-uid: 36
storage.owner-gid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
server.allow-insecure: on
cluster.self-heal-window-size: 1024
cluster.background-self-heal-count: 16
performance.strict-write-ordering: off
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
At the time of the updates ccgl3 was offline from a bad NIC on the server, but it had been so for about a week with no issues in the volume.
Shortly after the update I added these settings to enable sharding, but did not yet have any VM images sharded:
features.shard-block-size: 64MB
features.shard: on
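Those two options would have been applied with the usual volume set commands, i.e. roughly:
gluster volume set GLUSTER1 features.shard on
gluster volume set GLUSTER1 features.shard-block-size 64MB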
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284
On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay < kdhananj@redhat.com> wrote: Hi David,
Could you also share the brick logs from the affected volume? They're located at /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
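Gathering those on each node is just a matter of tarring up that directory, e.g.:
tar czf $(hostname)-brick-logs.tar.gz /var/log/glusterfs/bricks/*.log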
Also, could you share the volume configuration (output of `gluster volume info <VOL>`) for the affected volume(s) AND at the time you actually saw this issue?
-Krutika
On Thu, Jul 21, 2016 at 11:23 PM, David Gossage < dgossage@carouselchecks.com> wrote: On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> wrote: Hi David,
My backend storage is ZFS.
I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test. But since I use hosted engine this would be a real pain. It's difficult to modify the storage domain type/path in hosted-engine.conf, and I don't want to go through the process of re-deploying hosted engine.
I found this
https://bugzilla.redhat.com/show_bug.cgi?id=1347553
Not sure if related.
But I also have a zfs backend. Another user on the gluster mailing list had issues and used a zfs backend as well, although she used Proxmox and got it working by changing the disks to writeback cache, I think it was.
I also use hosted engine, but I actually run my gluster volume for HE on an LVM volume with xfs, separate from zfs, and if I recall it did not have the issues my gluster on zfs did. I'm wondering now if the issue was zfs settings.
Hopefully I should have a test machine up soon that I can play around with more.
Scott
On Thu, Jul 21, 2016 at 11:36 AM David Gossage < dgossage@carouselchecks.com> wrote: What back end storage do you run gluster on? xfs/zfs/ext4 etc?
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284
On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer@gmail.com> wrote: I get similar problems with oVirt 4.0.1 and hosted engine. After upgrading all my hosts to Gluster 3.7.13 (client and server), I get the following:
$ sudo hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
    all_stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
If I only upgrade one host, then things will continue to work but my nodes are constantly healing shards. My logs are also flooded with:
[2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
[2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]
[2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
[2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
Scott
On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein <f.rothenstein@bodden-kliniken.de> wrote: Hey David,
I have the very same problem on my test cluster, despite running ovirt 4.0. If you access your volumes via NFS all is fine; the problem is FUSE. I stayed on 3.7.13 but have no solution yet, so for now I use NFS.
Frank
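For a comparison test along the lines Frank describes, the same volume can be mounted both ways (hostname and mount points below are placeholders; gluster's built-in NFS server speaks NFSv3 and needs nfs.disable off on the volume):
mount -t glusterfs ccgl1.gl.local:/GLUSTER1 /mnt/fuse-test
mount -t nfs -o vers=3,nolock ccgl1.gl.local:/GLUSTER1 /mnt/nfs-test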
Anyone running one of recent 3.6.x lines and gluster using 3.7.13? I am looking to upgrade gluster from 3.7.11->3.7.13 for some bug fixes, but have been told by users on gluster mail list due to some gluster changes I'd need to change the disk parameters to use writeback cache. Something to do with aio support being removed.
I believe this could be done with custom parameters? But I believe strage tests are done using dd and would they fail with current settings
Am Donnerstag, den 21.07.2016, 04:28 -0500 schrieb David Gossage: then? Last upgrade to 3.7.13 I had to rollback to 3.7.11 due to stability isues where gluster storage would go into down state and always show N/A as space available/used. Even if hosts saw storage still and VM's were running on it on all 3 hosts.
Saw a lot of messages like these that went away once gluster rollback
finished
[2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
[2016-07-09 15:27:49.555466] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted] [2016-07-09 15:27:49.556574] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted] [2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted) [2016-07-09 15:27:59.612477] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: remote operation failed [Operation not permitted] [2016-07-09 15:27:59.613700] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: remote operation failed [Operation not permitted] [2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d fd=0x7f5224002f68 (Operation not permitted)
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284

On Tue, Jul 26, 2016 at 4:37 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
Hi,
1. Could you please attach the glustershd logs from all three nodes?
Here are ccgl1 and ccgl2. As previously mentioned, the third node, ccgl3, was down from a bad NIC, so no relevant logs would be on that node.
2. Also, so far what we know is that the 'Operation not permitted' errors are on the main vm image itself and not its individual shards (ex deb61291-5176-4b81-8315-3f1cf8e3534d). Could you do the following: Get the inode number of .glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d (ls -li) from the first brick. I'll call this number INODE_NUMBER. Execute `find . -inum INODE_NUMBER` from the brick root on first brick to print the hard links against the file in the prev step and share the output.
[dgossage@ccgl1 ~]$ sudo ls -li /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
16407 -rw-r--r--. 2 36 36 466 Jun 5 16:52 /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
[dgossage@ccgl1 ~]$ cd /gluster1/BRICK1/1/
[dgossage@ccgl1 1]$ sudo find . -inum 16407
./7c73a8dd-a72e-4556-ac88-7f6813131e64/dom_md/metadata
./.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
3. Did you delete any vms at any point before or after the upgrade?
Immediately before or after, on the same day, I'm pretty sure I deleted nothing. During the week prior I deleted a few dev VMs that were never set up, and some the week after the upgrade as I was preparing to move disks off and back onto storage to get them sharded; I felt it would be easier to just recreate some disks that had no data yet rather than move them off and on later.

So far gluster 3.7.14 seems to have resolved the issues, at least on my test box. dd commands that failed previously now work with sharding on the zfs backend, and where before I couldn't even mount a new storage domain, it now mounts and I have a test vm being created. I still have to let the VM run for a few days and make sure no locking or freezing occurs, but it looks hopeful so far.
*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
On Tue, Jul 26, 2016 at 4:37 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
Hi,
1. Could you please attach the glustershd logs from all three nodes?
Here are ccgl1 and ccgl2. as previously mentioned ccgl3 third node was down from bad nic so no relevant logs would be on that node.
2. Also, so far what we know is that the 'Operation not permitted' errors are on the main vm image itself and not its individual shards (ex deb61291-5176-4b81-8315-3f1cf8e3534d). Could you do the following: Get the inode number of .glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d (ls -li) from the first brick. I'll call this number INODE_NUMBER. Execute `find . -inum INODE_NUMBER` from the brick root on first brick to print the hard links against the file in the prev step and share the output.
[dgossage@ccgl1 ~]$ sudo ls -li /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d 16407 -rw-r--r--. 2 36 36 466 Jun 5 16:52 /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d [dgossage@ccgl1 ~]$ cd /gluster1/BRICK1/1/ [dgossage@ccgl1 1]$ sudo find . -inum 16407 ./7c73a8dd-a72e-4556-ac88-7f6813131e64/dom_md/metadata ./.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
3. Did you delete any vms at any point before or after the upgrade?
Immediately before or after on same day pretty sure I deleted nothing. During week prior I deleted a few dev vm's that were never setup and some the week after upgrade as I was preparing for moving disks off and on storage to get them sharded and felt it would be easier to just recreate some disks that had no data yet rather than move them off and on later.
-Krutika
On Mon, Jul 25, 2016 at 11:30 PM, David Gossage < dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
OK, could you try the following:
i. Set network.remote-dio to off # gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on # gluster volume set <VOL> performance.strict-o-direct on
iii. Stop the affected vm(s) and start again
and tell me if you notice any improvement?
Previous instll I had issue with is still on gluster 3.7.11
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a locak disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785 2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException at Unknown.ps( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.ts( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.vs( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.iJf( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Xab( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.jQr( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.A8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.u8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Eap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.p8n( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Cao( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Bap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.kRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.nRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.eVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.hVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.MTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.PTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.KJe( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Izk( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P3( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.g4( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown._t( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.du( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8... )
-Krutika
On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen <samppah@neutraali.net
wrote:
Hi,
On 25 Jul 2016, at 12:34, David Gossage <dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay < kdhananj@redhat.com> wrote: Hi,
Thanks for the logs. So I have identified one issue from the logs for which the fix is this: http://review.gluster.org/#/c/14669/. Because of a bug in the code, ENOENT was getting converted to EPERM and being propagated up the stack causing the reads to bail out early with 'Operation not permitted' errors. I still need to find out two things: i) why there was a readv() sent on a non-existent (ENOENT) file (this is important since some of the other users have not faced or reported this issue on gluster-users with 3.7.13) ii) need to see if there's a way to work around this issue.
Do you mind sharing the steps needed to be executed to run into this issue? This is so that we can apply our patches, test and ensure they fix the problem.
Unfortunately I can’t test this right away nor give exact steps how to test this. This is just a theory but please correct me if you see some mistakes.
oVirt uses cache=none settings for VM’s by default which requires direct I/O. oVirt also uses dd with iflag=direct to check that storage has direct I/O enabled. Problems exist with GlusterFS with sharding enabled and bricks running on ZFS on Linux. Everything seems to be fine with GlusterFS 3.7.11 and problems exist at least with version .12 and .13. There has been some posts saying that GlusterFS 3.8.x is also affected.
Steps to reproduce: 1. Sharded file is created with GlusterFS 3.7.11. Everything works ok. 2. GlusterFS is upgraded to 3.7.12+ 3. Sharded file cannot be read or written with direct I/O enabled. (Ie. oVirt uses to check storage connection with command "dd if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000”)
Please let me know if you need more information.
-samuli
Well after upgrade of gluster all I did was start ovirt hosts up which launched and started their ha-agent and broker processes. I don't believe I started getting any errors till it mounted GLUSTER1. I had enabled sharding but had no sharded disk images yet. Not sure if the check for shards would have caused that. Unfortunately I can't just update this cluster and try and see what caused it as it has sme VM's users expect to be available in few hours.
I can see if I can get my test setup to recreate it. I think I'll need to de-activate data center so I can detach the storage thats on xfs and attach the one thats over zfs with sharding enabled. My test is 3 bricks on same local machine, with 3 different volumes but I think im running into sanlock issue or something as it won't mount more than one volume that was created locally.
-Krutika
On Fri, Jul 22, 2016 at 7:17 PM, David Gossage < dgossage@carouselchecks.com> wrote: Trimmed out the logs to just about when I was shutting down ovirt servers for updates which was 14:30 UTC 2016-07-09
Pre-update settings were
Volume Name: GLUSTER1 Type: Replicate Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 Brick3: ccgl3.gl.local:/gluster1/BRICK1/1 Options Reconfigured: performance.readdir-ahead: on storage.owner-uid: 36 storage.owner-gid: 36 performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.stat-prefetch: off cluster.eager-lock: enable network.remote-dio: enable cluster.quorum-type: auto cluster.server-quorum-type: server server.allow-insecure: on cluster.self-heal-window-size: 1024 cluster.background-self-heal-count: 16 performance.strict-write-ordering: off nfs.disable: on nfs.addr-namelookup: off nfs.enable-ino32: off
At the time of updates ccgl3 was offline from bad nic on server but had been so for about a week with no issues in volume
Shortly after update I added these settings to enable sharding but did not as of yet have any VM images sharded. features.shard-block-size: 64MB features.shard: on
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284
On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay < kdhananj@redhat.com> wrote: Hi David,
Could you also share the brick logs from the affected volume? They're located at /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log.
Also, could you share the volume configuration (output of `gluster volume info <VOL>`) for the affected volume(s) AND at the time you actually saw this issue?
-Krutika
On Thu, Jul 21, 2016 at 11:23 PM, David Gossage < dgossage@carouselchecks.com> wrote: On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> wrote: Hi David,
My backend storage is ZFS.
I thought about moving from FUSE to NFS mounts for my Gluster volumes to help test. But since I use hosted engine this would be a real pain. Its difficult to modify the storage domain type/path in the hosted-engine.conf. And I don't want to go through the process of re-deploying hosted engine.
I found this
https://bugzilla.redhat.com/show_bug.cgi?id=1347553
Not sure if related.
But I also have a zfs backend. Another user on the gluster mailing list had issues with a zfs backend as well, although she used proxmox and got it working by changing the disk to writeback cache, I think it was.
I also use hosted engine, but I actually run my gluster volume for HE on an LVM (xfs) separate from zfs, and if I recall it did not have the issues my gluster on zfs did. I'm wondering now if the issue was zfs settings.
Hopefully I should have a test machine up soon that I can play around with more.
Scott
On Thu, Jul 21, 2016 at 11:36 AM David Gossage < dgossage@carouselchecks.com> wrote: What back end storage do you run gluster on? xfs/zfs/ext4 etc?
David Gossage Carousel Checks Inc. | System Administrator Office 708.613.2284
On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer@gmail.com> wrote: I get similar problems with oVirt 4.0.1 and hosted engine. After upgrading all my hosts to Gluster 3.7.13 (client and server), I get the following:
$ sudo hosted-engine --set-maintenance --mode=none
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in <module>
    if not maintenance.set_mode(sys.argv[1]):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in set_mode
    value=m_global,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 259, in set_maintenance_mode
    str(value))
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 204, in set_global_md_flag
    all_stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: failed to read metadata: [Errno 1] Operation not permitted
If I only upgrade one host, then things will continue to work but my nodes are constantly healing shards. My logs are also flooded with:
[2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
[2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]
[2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
[2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
Scott
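The constant shard healing Scott mentions can be watched from any node with gluster's heal-info commands; a small sketch, assuming the volume is the 'data' volume that appears in his client logs:

# list entries still pending heal (shard files show up under /.shard)
$ gluster volume heal data info
# just the per-brick counts, if the installed gluster version supports it
$ gluster volume heal data statistics heal-count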
On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein < f.rothenstein@bodden-kliniken.de> wrote: Hey David,
I have the very same problem on my test cluster, despite running ovirt 4.0. If you access your volumes via NFS all is fine; the problem is FUSE. I stayed on 3.7.13, but have no solution yet, so for now I use NFS.
Frank

Had a chance to upgrade my cluster to Gluster 3.7.14 and can confirm it works for me too where 3.7.12/13 did not. I did find that you should NOT turn off network.remote-dio or turn on performance.strict-o-direct as suggested earlier in the thread. They will prevent dd (using direct flag) and other things from working properly. I'd leave those at network.remote-dio=enabled and performance.strict-o-direct=off. Hopefully we can see Gluster 3.7.14 moved out of testing repo soon.
Scott
On Tue, Aug 2, 2016 at 9:05 AM, David Gossage <dgossage@carouselchecks.com> wrote:
So far gluster 3.7.14 seems to have resolved the issues, at least on my test box. dd commands that failed previously now work with sharding on a zfs backend.
Where before I couldn't even mount a new storage domain, it now mounted and I have a test vm being created.
Still have to let the VM run for a few days and make sure no locking/freezing occurs, but it looks hopeful so far.
*David Gossage* *Carousel Checks Inc. | System Administrator* *Office* 708.613.2284
On Tue, Jul 26, 2016 at 8:15 AM, David Gossage < dgossage@carouselchecks.com> wrote:
On Tue, Jul 26, 2016 at 4:37 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
Hi,
1. Could you please attach the glustershd logs from all three nodes?
Here are ccgl1 and ccgl2. As previously mentioned, the third node ccgl3 was down from a bad NIC, so no relevant logs would be on that node.
2. Also, so far what we know is that the 'Operation not permitted' errors are on the main vm image itself and not its individual shards (ex deb61291-5176-4b81-8315-3f1cf8e3534d). Could you do the following: Get the inode number of .glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d (ls -li) from the first brick. I'll call this number INODE_NUMBER. Execute `find . -inum INODE_NUMBER` from the brick root on first brick to print the hard links against the file in the prev step and share the output.
[dgossage@ccgl1 ~]$ sudo ls -li /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
16407 -rw-r--r--. 2 36 36 466 Jun  5 16:52 /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
[dgossage@ccgl1 ~]$ cd /gluster1/BRICK1/1/
[dgossage@ccgl1 1]$ sudo find . -inum 16407
./7c73a8dd-a72e-4556-ac88-7f6813131e64/dom_md/metadata
./.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
3. Did you delete any vms at any point before or after the upgrade?
Immediately before or after, on the same day, I'm pretty sure I deleted nothing. During the week prior I deleted a few dev VMs that were never set up, and some more the week after the upgrade as I was preparing to move disks off and back onto storage to get them sharded; I felt it would be easier to just recreate some disks that had no data yet rather than move them off and on later.
-Krutika
On Mon, Jul 25, 2016 at 11:30 PM, David Gossage < dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay <kdhananj@redhat.com
wrote:
OK, could you try the following:
i. Set network.remote-dio to off # gluster volume set <VOL> network.remote-dio off
ii. Set performance.strict-o-direct to on # gluster volume set <VOL> performance.strict-o-direct on
iii. Stop the affected vm(s) and start again
and tell me if you notice any improvement?
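If those two options are only being flipped for this experiment, they can afterwards be put back to what the volume info earlier in the thread shows; a sketch using GLUSTER1 as the volume name:

# restore the value the volume was running with before the test
$ gluster volume set GLUSTER1 network.remote-dio enable
# drop the explicit strict-o-direct setting and fall back to the default (off)
$ gluster volume reset GLUSTER1 performance.strict-o-direct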
The previous install I had the issue with is still on gluster 3.7.11.
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a local disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException
  at Unknown.ps(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@3837)
  at Unknown.ts(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@20)
  at Unknown.vs(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@18)
  at Unknown.iJf(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@19)
  at Unknown.Xab(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@48)
  at Unknown.P8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@4447)
  at Unknown.jQr(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@21)
  at Unknown.A8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@51)
  at Unknown.u8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@101)
  at Unknown.Eap(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10718)
  at Unknown.p8n(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@161)
  at Unknown.Cao(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@31)
  at Unknown.Bap(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10469)
  at Unknown.kRn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@49)
  at Unknown.nRn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@438)
  at Unknown.eVn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@40)
  at Unknown.hVn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@25827)
  at Unknown.MTn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@25)
  at Unknown.PTn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@24052)
  at Unknown.KJe(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@21125)
  at Unknown.Izk(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10384)
  at Unknown.P3(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@137)
  at Unknown.g4(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@8271)
  at Unknown.<anonymous>(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@65)
  at Unknown._t(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@29)
  at Unknown.du(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@57)
  at Unknown.<anonymous>(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@54)
-Krutika
On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen < samppah@neutraali.net> wrote:
Hi,
> On 25 Jul 2016, at 12:34, David Gossage <dgossage@carouselchecks.com> wrote:
>
> On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay <kdhananj@redhat.com> wrote:
> Hi,
>
> Thanks for the logs. So I have identified one issue from the logs for which the fix is this: http://review.gluster.org/#/c/14669/. Because of a bug in the code, ENOENT was getting converted to EPERM and being propagated up the stack causing the reads to bail out early with 'Operation not permitted' errors.
> I still need to find out two things:
> i) why there was a readv() sent on a non-existent (ENOENT) file (this is important since some of the other users have not faced or reported this issue on gluster-users with 3.7.13)
> ii) need to see if there's a way to work around this issue.
>
> Do you mind sharing the steps needed to be executed to run into this issue? This is so that we can apply our patches, test and ensure they fix the problem.
Unfortunately I can’t test this right away nor give exact steps how to test this. This is just a theory but please correct me if you see some mistakes.
oVirt uses cache=none settings for VMs by default, which requires direct I/O. oVirt also uses dd with iflag=direct to check that storage has direct I/O enabled. Problems exist with GlusterFS with sharding enabled and bricks running on ZFS on Linux. Everything seems to be fine with GlusterFS 3.7.11 and problems exist at least with versions .12 and .13. There have been some posts saying that GlusterFS 3.8.x is also affected.
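Since affected files reportedly can be neither read nor written with direct I/O, the write side can be checked the same way with dd's oflag=direct; a minimal sketch against a hypothetical scratch file on the FUSE mount:

# buffered write through the FUSE mount -- works on both good and bad versions
$ dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/ccgl1:_GLUSTER1/scratch.img bs=1M count=1 conv=fsync
# O_DIRECT write, mirroring what a cache=none VM issues; this is the path that breaks on affected versions
$ dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/ccgl1:_GLUSTER1/scratch.img bs=1M count=1 oflag=direct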

On Sat, Aug 13, 2016 at 8:19 AM, Scott <romracer@gmail.com> wrote:
Had a chance to upgrade my cluster to Gluster 3.7.14 and can confirm it works for me too where 3.7.12/13 did not.
I did find that you should NOT turn off network.remote-dio or turn on performance.strict-o-direct as suggested earlier in the thread. They will prevent dd (using direct flag) and other things from working properly. I'd leave those at network.remote-dio=enabled and performance.strict-o-direct=off.
Those were actually just suggested during a testing phase while trying to trace down the issue. I don't think either of those two has ever been suggested as good practice, at least not for VM storage.
Hopefully we can see Gluster 3.7.14 moved out of testing repo soon.
Scott
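For anyone double-checking their own volumes, the current values of those two options can be confirmed before and after changes; a sketch (options that were never explicitly set will not show up in volume info, and `gluster volume get` is only available on newer gluster releases):

# show only the explicitly reconfigured options of interest
$ gluster volume info GLUSTER1 | grep -E 'remote-dio|strict-o-direct'
# on releases that have it, volume get also reports defaulted values
$ gluster volume get GLUSTER1 performance.strict-o-direct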

On Sat, Aug 13, 2016 at 11:00 AM, David Gossage <dgossage@carouselchecks.com
wrote:
On Sat, Aug 13, 2016 at 8:19 AM, Scott <romracer@gmail.com> wrote:
Had a chance to upgrade my cluster to Gluster 3.7.14 and can confirm it works for me too where 3.7.12/13 did not.
I did find that you should NOT turn off network.remote-dio or turn on performance.strict-o-direct as suggested earlier in the thread. They will prevent dd (using direct flag) and other things from working properly. I'd leave those at network.remote-dio=enabled and performance.strict-o-direct=off.
Those were actually just suggested during a testing phase trying to trace down the issue. Neither of those 2 I think have ever been suggested as good practice. At least not for VM storage.
Hopefully we can see Gluster 3.7.14 moved out of testing repo soon.
Is it still in the testing repo? I updated my production cluster, I think 2 weeks ago, from the default repo on centos7.
Scott
On Tue, Aug 2, 2016 at 9:05 AM, David Gossage < dgossage@carouselchecks.com> wrote:
So far gluster 3.7.14 seems to have resolved issues at least on my test box. dd commands that failed previously now work with sharding on zfs backend,
Where before I couldn't even mount a new storage domain it now mounted and I have a test vm being created.
Still have to let VM run for a few days and make sure no locking freezing occurs but looks hopeful so far.
*David Gossage* *Carousel Checks Inc. | System Administrator* *Office* 708.613.2284
On Tue, Jul 26, 2016 at 8:15 AM, David Gossage < dgossage@carouselchecks.com> wrote:
On Tue, Jul 26, 2016 at 4:37 AM, Krutika Dhananjay <kdhananj@redhat.com
wrote:
Hi,
1. Could you please attach the glustershd logs from all three nodes?
Here are ccgl1 and ccgl2. as previously mentioned ccgl3 third node was down from bad nic so no relevant logs would be on that node.
2. Also, so far what we know is that the 'Operation not permitted' errors are on the main vm image itself and not its individual shards (ex deb61291-5176-4b81-8315-3f1cf8e3534d). Could you do the following: Get the inode number of .glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d (ls -li) from the first brick. I'll call this number INODE_NUMBER. Execute `find . -inum INODE_NUMBER` from the brick root on first brick to print the hard links against the file in the prev step and share the output.
[dgossage@ccgl1 ~]$ sudo ls -li /gluster1/BRICK1/1/.glusterfs/ de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d 16407 -rw-r--r--. 2 36 36 466 Jun 5 16:52 /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315- 3f1cf8e3534d [dgossage@ccgl1 ~]$ cd /gluster1/BRICK1/1/ [dgossage@ccgl1 1]$ sudo find . -inum 16407 ./7c73a8dd-a72e-4556-ac88-7f6813131e64/dom_md/metadata ./.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
3. Did you delete any vms at any point before or after the upgrade?
Immediately before or after on same day pretty sure I deleted nothing. During week prior I deleted a few dev vm's that were never setup and some the week after upgrade as I was preparing for moving disks off and on storage to get them sharded and felt it would be easier to just recreate some disks that had no data yet rather than move them off and on later.
-Krutika
On Mon, Jul 25, 2016 at 11:30 PM, David Gossage < dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay < kdhananj@redhat.com> wrote:
> OK, could you try the following: > > i. Set network.remote-dio to off > # gluster volume set <VOL> network.remote-dio off > > ii. Set performance.strict-o-direct to on > # gluster volume set <VOL> performance.strict-o-direct on > > iii. Stop the affected vm(s) and start again > > and tell me if you notice any improvement? > > Previous instll I had issue with is still on gluster 3.7.11
My test install of ovirt 3.6.7 and gluster 3.7.13, with 3 bricks on a local disk, right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException
at Unknown.ps(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@3837)
at Unknown.ts(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@20)
at Unknown.vs(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@18)
at Unknown.iJf(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@19)
at Unknown.Xab(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@48)
at Unknown.P8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@4447)
at Unknown.jQr(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@21)
at Unknown.A8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@51)
at Unknown.u8o(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@101)
at Unknown.Eap(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10718)
at Unknown.p8n(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@161)
at Unknown.Cao(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@31)
at Unknown.Bap(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10469)
at Unknown.kRn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@49)
at Unknown.nRn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@438)
at Unknown.eVn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@40)
at Unknown.hVn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@25827)
at Unknown.MTn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@25)
at Unknown.PTn(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@24052)
at Unknown.KJe(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@21125)
at Unknown.Izk(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@10384)
at Unknown.P3(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@137)
at Unknown.g4(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@8271)
at Unknown.<anonymous>(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@65)
at Unknown._t(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@29)
at Unknown.du(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@57)
at Unknown.<anonymous>(https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8BE1C7FDD91EDAA785.cache.html@54)
> -Krutika > > On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen < > samppah@neutraali.net> wrote: > >> Hi, >> >> > On 25 Jul 2016, at 12:34, David Gossage < >> dgossage@carouselchecks.com> wrote: >> > >> > On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay < >> kdhananj@redhat.com> wrote: >> > Hi, >> > >> > Thanks for the logs. So I have identified one issue from the logs >> for which the fix is this: http://review.gluster.org/#/c/14669/. >> Because of a bug in the code, ENOENT was getting converted to EPERM and >> being propagated up the stack causing the reads to bail out early with >> 'Operation not permitted' errors. >> > I still need to find out two things: >> > i) why there was a readv() sent on a non-existent (ENOENT) file >> (this is important since some of the other users have not faced or reported >> this issue on gluster-users with 3.7.13) >> > ii) need to see if there's a way to work around this issue. >> > >> > Do you mind sharing the steps needed to be executed to run into >> this issue? This is so that we can apply our patches, test and ensure they >> fix the problem. >> >> >> Unfortunately I can’t test this right away nor give exact steps how >> to test this. This is just a theory but please correct me if you see some >> mistakes. >> >> oVirt uses cache=none settings for VM’s by default which requires >> direct I/O. oVirt also uses dd with iflag=direct to check that storage has >> direct I/O enabled. Problems exist with GlusterFS with sharding enabled and >> bricks running on ZFS on Linux. Everything seems to be fine with GlusterFS >> 3.7.11 and problems exist at least with version .12 and .13. There has been >> some posts saying that GlusterFS 3.8.x is also affected. >> >> Steps to reproduce: >> 1. Sharded file is created with GlusterFS 3.7.11. Everything works >> ok. >> 2. GlusterFS is upgraded to 3.7.12+ >> 3. Sharded file cannot be read or written with direct I/O enabled. >> (Ie. oVirt uses to check storage connection with command "dd >> if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox >> iflag=direct,fullblock count=1 bs=1024000”) >> >> Please let me know if you need more information. >> >> -samuli >> >> > Well after upgrade of gluster all I did was start ovirt hosts up >> which launched and started their ha-agent and broker processes. I don't >> believe I started getting any errors till it mounted GLUSTER1. I had >> enabled sharding but had no sharded disk images yet. Not sure if the check >> for shards would have caused that. Unfortunately I can't just update this >> cluster and try and see what caused it as it has sme VM's users expect to >> be available in few hours. >> > >> > I can see if I can get my test setup to recreate it. I think >> I'll need to de-activate data center so I can detach the storage thats on >> xfs and attach the one thats over zfs with sharding enabled. My test is 3 >> bricks on same local machine, with 3 different volumes but I think im >> running into sanlock issue or something as it won't mount more than one >> volume that was created locally. 
>> > >> > >> > -Krutika >> > >> > On Fri, Jul 22, 2016 at 7:17 PM, David Gossage < >> dgossage@carouselchecks.com> wrote: >> > Trimmed out the logs to just about when I was shutting down ovirt >> servers for updates which was 14:30 UTC 2016-07-09 >> > >> > Pre-update settings were >> > >> > Volume Name: GLUSTER1 >> > Type: Replicate >> > Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f >> > Status: Started >> > Number of Bricks: 1 x 3 = 3 >> > Transport-type: tcp >> > Bricks: >> > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >> > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >> > Brick3: ccgl3.gl.local:/gluster1/BRICK1/1 >> > Options Reconfigured: >> > performance.readdir-ahead: on >> > storage.owner-uid: 36 >> > storage.owner-gid: 36 >> > performance.quick-read: off >> > performance.read-ahead: off >> > performance.io-cache: off >> > performance.stat-prefetch: off >> > cluster.eager-lock: enable >> > network.remote-dio: enable >> > cluster.quorum-type: auto >> > cluster.server-quorum-type: server >> > server.allow-insecure: on >> > cluster.self-heal-window-size: 1024 >> > cluster.background-self-heal-count: 16 >> > performance.strict-write-ordering: off >> > nfs.disable: on >> > nfs.addr-namelookup: off >> > nfs.enable-ino32: off >> > >> > At the time of updates ccgl3 was offline from bad nic on server >> but had been so for about a week with no issues in volume >> > >> > Shortly after update I added these settings to enable sharding >> but did not as of yet have any VM images sharded. >> > features.shard-block-size: 64MB >> > features.shard: on >> > >> > >> > >> > >> > David Gossage >> > Carousel Checks Inc. | System Administrator >> > Office 708.613.2284 >> > >> > On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay < >> kdhananj@redhat.com> wrote: >> > Hi David, >> > >> > Could you also share the brick logs from the affected volume? >> They're located at /var/log/glusterfs/bricks/<hyp >> henated-path-to-the-brick-directory>.log. >> > >> > Also, could you share the volume configuration (output of >> `gluster volume info <VOL>`) for the affected volume(s) AND at the time you >> actually saw this issue? >> > >> > -Krutika >> > >> > >> > >> > >> > On Thu, Jul 21, 2016 at 11:23 PM, David Gossage < >> dgossage@carouselchecks.com> wrote: >> > On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> >> wrote: >> > Hi David, >> > >> > My backend storage is ZFS. >> > >> > I thought about moving from FUSE to NFS mounts for my Gluster >> volumes to help test. But since I use hosted engine this would be a real >> pain. Its difficult to modify the storage domain type/path in the >> hosted-engine.conf. And I don't want to go through the process of >> re-deploying hosted engine. >> > >> > >> > I found this >> > >> > https://bugzilla.redhat.com/show_bug.cgi?id=1347553 >> > >> > Not sure if related. >> > >> > But I also have zfs backend, another user in gluster mailing list >> had issues and used zfs backend although she used proxmox and got it >> working by changing disk to writeback cache I think it was. >> > >> > I also use hosted engine, but I run my gluster volume for HE >> actually on a LVM separate from zfs on xfs and if i recall it did not have >> the issues my gluster on zfs did. I'm wondering now if the issue was zfs >> settings. >> > >> > Hopefully should have a test machone up soon I can play around >> with more. >> > >> > Scott >> > >> > On Thu, Jul 21, 2016 at 11:36 AM David Gossage < >> dgossage@carouselchecks.com> wrote: >> > What back end storage do you run gluster on? xfs/zfs/ext4 etc? 
>> > >> > David Gossage >> > Carousel Checks Inc. | System Administrator >> > Office 708.613.2284 >> > >> > On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer@gmail.com> >> wrote: >> > I get similar problems with oVirt 4.0.1 and hosted engine. After >> upgrading all my hosts to Gluster 3.7.13 (client and server), I get the >> following: >> > >> > $ sudo hosted-engine --set-maintenance --mode=none >> > Traceback (most recent call last): >> > File "/usr/lib64/python2.7/runpy.py", line 162, in >> _run_module_as_main >> > "__main__", fname, loader, pkg_name) >> > File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code >> > exec code in run_globals >> > File "/usr/lib/python2.7/site-packa >> ges/ovirt_hosted_engine_setup/set_maintenance.py", line 73, in >> <module> >> > if not maintenance.set_mode(sys.argv[1]): >> > File "/usr/lib/python2.7/site-packa >> ges/ovirt_hosted_engine_setup/set_maintenance.py", line 61, in >> set_mode >> > value=m_global, >> > File "/usr/lib/python2.7/site-packa >> ges/ovirt_hosted_engine_ha/client/client.py", line 259, in >> set_maintenance_mode >> > str(value)) >> > File "/usr/lib/python2.7/site-packa >> ges/ovirt_hosted_engine_ha/client/client.py", line 204, in >> set_global_md_flag >> > all_stats = broker.get_stats_from_storage(service) >> > File "/usr/lib/python2.7/site-packa >> ges/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in >> get_stats_from_storage >> > result = self._checked_communicate(request) >> > File "/usr/lib/python2.7/site-packa >> ges/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in >> _checked_communicate >> > .format(message or response)) >> > ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request >> failed: failed to read metadata: [Errno 1] Operation not permitted >> > >> > If I only upgrade one host, then things will continue to work but >> my nodes are constantly healing shards. 
My logs are also flooded with: >> > >> > [2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 274714: READ => -1 gfid=4 >> > 41f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation >> not permitted) >> > The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] >> 0-data-client-0: remote operation failed [Operation not permitted]" >> repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 >> 13:15:04.132226] >> > The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] >> 0-data-client-1: remote operation failed [Operation not permitted]" >> repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 >> 13:15:14.137178] >> > The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] >> 0-data-client-2: remote operation failed [Operation not permitted]" >> repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 >> 13:15:14.137666] >> > [2016-07-21 13:15:24.134647] W [MSGID: 114031] >> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: >> remote operation failed [Operation not permitted] >> > [2016-07-21 13:15:24.134764] W [MSGID: 114031] >> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: >> remote operation failed [Operation not permitted] >> > [2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 >> fd=0x7f19bc0038f4 (Operation not permitted) >> > [2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 >> fd=0x7f19bc0041d0 (Operation not permitted) >> > [2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 >> fd=0x7f19bc0038f4 (Operation not permitted) >> > [2016-07-21 13:15:54.133582] W [MSGID: 114031] >> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: >> remote operation failed [Operation not permitted] >> > [2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 >> fd=0x7f19bc0036d8 (Operation not permitted) >> > [2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 >> fd=0x7f19bc0041d0 (Operation not permitted) >> > [2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 >> fd=0x7f19bc0036d8 (Operation not permitted) >> > >> > Scott >> > >> > >> > On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein < >> f.rothenstein@bodden-kliniken.de> wrote: >> > Hey Devid, >> > >> > I have the very same problem on my test-cluster, despite on >> running ovirt 4.0. >> > If you access your volumes via NFS all is fine, problem is FUSE. >> I stayed on 3.7.13, but have no solution yet, now I use NFS. >> > >> > Frank >> > >> > Am Donnerstag, den 21.07.2016, 04:28 -0500 schrieb David Gossage: >> >> Anyone running one of recent 3.6.x lines and gluster using >> 3.7.13? I am looking to upgrade gluster from 3.7.11->3.7.13 for some bug >> fixes, but have been told by users on gluster mail list due to some gluster >> changes I'd need to change the disk parameters to use writeback cache. >> Something to do with aio support being removed. 
>> >> >> >> I believe this could be done with custom parameters? But I >> believe strage tests are done using dd and would they fail with current >> settings then? Last upgrade to 3.7.13 I had to rollback to 3.7.11 due to >> stability isues where gluster storage would go into down state and always >> show N/A as space available/used. Even if hosts saw storage still and VM's >> were running on it on all 3 hosts. >> >> >> >> Saw a lot of messages like these that went away once gluster >> rollback finished >> >> >> >> [2016-07-09 15:27:46.935694] I [fuse-bridge.c:4083:fuse_init] >> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel >> 7.22 >> >> [2016-07-09 15:27:49.555466] W [MSGID: 114031] >> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: >> remote operation failed [Operation not permitted] >> >> [2016-07-09 15:27:49.556574] W [MSGID: 114031] >> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: >> remote operation failed [Operation not permitted] >> >> [2016-07-09 15:27:49.556659] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 80: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d >> fd=0x7f5224002f68 (Operation not permitted) >> >> [2016-07-09 15:27:59.612477] W [MSGID: 114031] >> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-1: >> remote operation failed [Operation not permitted] >> >> [2016-07-09 15:27:59.613700] W [MSGID: 114031] >> [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-0: >> remote operation failed [Operation not permitted] >> >> [2016-07-09 15:27:59.613781] W [fuse-bridge.c:2227:fuse_readv_cbk] >> 0-glusterfs-fuse: 168: READ => -1 gfid=deb61291-5176-4b81-8315-3f1cf8e3534d >> fd=0x7f5224002f68 (Operation not permitted) >> >> >> >> David Gossage >> >> Carousel Checks Inc. | System Administrator >> >> Office 708.613.2284 >> >> _______________________________________________ >> >> Users mailing list >> >> >> >> Users@ovirt.org >> >> http://lists.ovirt.org/mailman/listinfo/users >> > >> > >> > >> > >> > >> > ____________________________________________________________ >> __________________ >> > BODDEN-KLINIKEN Ribnitz-Damgarten GmbH >> > Sandhufe 2 >> > 18311 Ribnitz-Damgarten >> > >> > Telefon: 03821-700-0 >> > Fax: 03821-700-240 >> > >> > E-Mail: info@bodden-kliniken.de Internet: >> http://www.bodden-kliniken.de >> > >> > Sitz: Ribnitz-Damgarten, Amtsgericht: Stralsund, HRB 2919, >> Steuer-Nr.: 079/133/40188 >> > Aufsichtsratsvorsitzende: Carmen Schröter, Geschäftsführer: Dr. >> Falko Milski >> > >> > Der Inhalt dieser E-Mail ist ausschließlich für den bezeichneten >> Adressaten bestimmt. Wenn Sie nicht der vorge- >> > sehene Adressat dieser E-Mail oder dessen Vertreter sein sollten, >> beachten Sie bitte, dass jede Form der Veröf- >> > fentlichung, Vervielfältigung oder Weitergabe des Inhalts dieser >> E-Mail unzulässig ist. Wir bitten Sie, sofort den >> > Absender zu informieren und die E-Mail zu löschen. 
>> > >> > >> > Bodden-Kliniken Ribnitz-Damgarten GmbH 2016 >> > *** Virenfrei durch Kerio Mail Server und Sophos Antivirus *** >> > _______________________________________________ >> > Users mailing list >> > Users@ovirt.org >> > http://lists.ovirt.org/mailman/listinfo/users >> > >> > >> > _______________________________________________ >> > Users mailing list >> > Users@ovirt.org >> > http://lists.ovirt.org/mailman/listinfo/users >> > >> > >> > >> > >> > >> > _______________________________________________ >> > Users mailing list >> > Users@ovirt.org >> > http://lists.ovirt.org/mailman/listinfo/users >> >> >
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Nope, not in any official repo. I only use those suggested by oVirt, i.e.: http://centos.bhs.mirrors.ovh.net/ftp.centos.org/7/storage/x86_64/gluster-3.... No 3.7.14 there. Thanks though.

Scott

On Sat, Aug 13, 2016 at 11:23 AM David Gossage <dgossage@carouselchecks.com> wrote:

Sounds good, except they aren't even that ("suggestions during a testing phase"). They will flat out break the configuration, so they shouldn't be tests at all; they shouldn't be anything except the "don't do this." Thanks.

Scott

On Sat, Aug 13, 2016 at 11:01 AM David Gossage <dgossage@carouselchecks.com> wrote:
On Sat, Aug 13, 2016 at 8:19 AM, Scott <romracer@gmail.com> wrote:
Had a chance to upgrade my cluster to Gluster 3.7.14 and can confirm it works for me too where 3.7.12/13 did not.
I did find that you should NOT turn off network.remote-dio or turn on performance.strict-o-direct as suggested earlier in the thread. They will prevent dd (using direct flag) and other things from working properly. I'd leave those at network.remote-dio=enabled and performance.strict-o-direct=off.
Those were actually just suggested during a testing phase trying to trace down the issue. Neither of those 2 I think have ever been suggested as good practice. At least not for VM storage.
Hopefully we can see Gluster 3.7.14 moved out of testing repo soon.
Scott
On Tue, Aug 2, 2016 at 9:05 AM, David Gossage < dgossage@carouselchecks.com> wrote:
So far gluster 3.7.14 seems to have resolved issues at least on my test box. dd commands that failed previously now work with sharding on zfs backend,
Where before I couldn't even mount a new storage domain it now mounted and I have a test vm being created.
Still have to let VM run for a few days and make sure no locking freezing occurs but looks hopeful so far.
*David Gossage* *Carousel Checks Inc. | System Administrator* *Office* 708.613.2284
On Tue, Jul 26, 2016 at 8:15 AM, David Gossage < dgossage@carouselchecks.com> wrote:
On Tue, Jul 26, 2016 at 4:37 AM, Krutika Dhananjay <kdhananj@redhat.com
wrote:
Hi,
1. Could you please attach the glustershd logs from all three nodes?
Here are ccgl1 and ccgl2. as previously mentioned ccgl3 third node was down from bad nic so no relevant logs would be on that node.
2. Also, so far what we know is that the 'Operation not permitted' errors are on the main vm image itself and not its individual shards (ex deb61291-5176-4b81-8315-3f1cf8e3534d). Could you do the following: Get the inode number of .glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d (ls -li) from the first brick. I'll call this number INODE_NUMBER. Execute `find . -inum INODE_NUMBER` from the brick root on first brick to print the hard links against the file in the prev step and share the output.
[dgossage@ccgl1 ~]$ sudo ls -li /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d 16407 -rw-r--r--. 2 36 36 466 Jun 5 16:52 /gluster1/BRICK1/1/.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d [dgossage@ccgl1 ~]$ cd /gluster1/BRICK1/1/ [dgossage@ccgl1 1]$ sudo find . -inum 16407 ./7c73a8dd-a72e-4556-ac88-7f6813131e64/dom_md/metadata ./.glusterfs/de/b6/deb61291-5176-4b81-8315-3f1cf8e3534d
3. Did you delete any vms at any point before or after the upgrade?
Immediately before or after on same day pretty sure I deleted nothing. During week prior I deleted a few dev vm's that were never setup and some the week after upgrade as I was preparing for moving disks off and on storage to get them sharded and felt it would be easier to just recreate some disks that had no data yet rather than move them off and on later.
-Krutika
On Mon, Jul 25, 2016 at 11:30 PM, David Gossage < dgossage@carouselchecks.com> wrote:
On Mon, Jul 25, 2016 at 9:58 AM, Krutika Dhananjay < kdhananj@redhat.com> wrote:
> OK, could you try the following: > > i. Set network.remote-dio to off > # gluster volume set <VOL> network.remote-dio off > > ii. Set performance.strict-o-direct to on > # gluster volume set <VOL> performance.strict-o-direct on > > iii. Stop the affected vm(s) and start again > > and tell me if you notice any improvement? > > Previous instll I had issue with is still on gluster 3.7.11
My test install of ovirt 3.6.7 and gluster 3.7.13 with 3 bricks on a locak disk right now isn't allowing me to add the gluster storage at all.
Keep getting some type of UI error
2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Permutation name: 430985F23DFC1C8BE1C7FDD91EDAA785 2016-07-25 12:49:09,277 ERROR [org.ovirt.engine.ui.frontend.server.gwt.OvirtRemoteLoggingService] (default task-33) [] Uncaught exception: : java.lang.ClassCastException at Unknown.ps( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.ts( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.vs( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.iJf( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Xab( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.jQr( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.A8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.u8o( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Eap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.p8n( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Cao( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Bap( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.kRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.nRn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.eVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.hVn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.MTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.PTn( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.KJe( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.Izk( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.P3( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.g4( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown._t( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.du( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8...) at Unknown.<anonymous>( https://ccengine2.carouselchecks.local/ovirt-engine/webadmin/430985F23DFC1C8... )
> -Krutika > > On Mon, Jul 25, 2016 at 4:57 PM, Samuli Heinonen < > samppah@neutraali.net> wrote: > >> Hi, >> >> > On 25 Jul 2016, at 12:34, David Gossage < >> dgossage@carouselchecks.com> wrote: >> > >> > On Mon, Jul 25, 2016 at 1:01 AM, Krutika Dhananjay < >> kdhananj@redhat.com> wrote: >> > Hi, >> > >> > Thanks for the logs. So I have identified one issue from the logs >> for which the fix is this: http://review.gluster.org/#/c/14669/. >> Because of a bug in the code, ENOENT was getting converted to EPERM and >> being propagated up the stack causing the reads to bail out early with >> 'Operation not permitted' errors. >> > I still need to find out two things: >> > i) why there was a readv() sent on a non-existent (ENOENT) file >> (this is important since some of the other users have not faced or reported >> this issue on gluster-users with 3.7.13) >> > ii) need to see if there's a way to work around this issue. >> > >> > Do you mind sharing the steps needed to be executed to run into >> this issue? This is so that we can apply our patches, test and ensure they >> fix the problem. >> >> >> Unfortunately I can’t test this right away nor give exact steps how >> to test this. This is just a theory but please correct me if you see some >> mistakes. >> >> oVirt uses cache=none settings for VM’s by default which requires >> direct I/O. oVirt also uses dd with iflag=direct to check that storage has >> direct I/O enabled. Problems exist with GlusterFS with sharding enabled and >> bricks running on ZFS on Linux. Everything seems to be fine with GlusterFS >> 3.7.11 and problems exist at least with version .12 and .13. There has been >> some posts saying that GlusterFS 3.8.x is also affected. >> >> Steps to reproduce: >> 1. Sharded file is created with GlusterFS 3.7.11. Everything works >> ok. >> 2. GlusterFS is upgraded to 3.7.12+ >> 3. Sharded file cannot be read or written with direct I/O enabled. >> (Ie. oVirt uses to check storage connection with command "dd >> if=/rhev/data-center/00000001-0001-0001-0001-0000000002b6/mastersd/dom_md/inbox >> iflag=direct,fullblock count=1 bs=1024000”) >> >> Please let me know if you need more information. >> >> -samuli >> >> > Well after upgrade of gluster all I did was start ovirt hosts up >> which launched and started their ha-agent and broker processes. I don't >> believe I started getting any errors till it mounted GLUSTER1. I had >> enabled sharding but had no sharded disk images yet. Not sure if the check >> for shards would have caused that. Unfortunately I can't just update this >> cluster and try and see what caused it as it has sme VM's users expect to >> be available in few hours. >> > >> > I can see if I can get my test setup to recreate it. I think >> I'll need to de-activate data center so I can detach the storage thats on >> xfs and attach the one thats over zfs with sharding enabled. My test is 3 >> bricks on same local machine, with 3 different volumes but I think im >> running into sanlock issue or something as it won't mount more than one >> volume that was created locally. 
>> > >> > >> > -Krutika >> > >> > On Fri, Jul 22, 2016 at 7:17 PM, David Gossage < >> dgossage@carouselchecks.com> wrote: >> > Trimmed out the logs to just about when I was shutting down ovirt >> servers for updates which was 14:30 UTC 2016-07-09 >> > >> > Pre-update settings were >> > >> > Volume Name: GLUSTER1 >> > Type: Replicate >> > Volume ID: 167b8e57-28c3-447a-95cc-8410cbdf3f7f >> > Status: Started >> > Number of Bricks: 1 x 3 = 3 >> > Transport-type: tcp >> > Bricks: >> > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >> > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >> > Brick3: ccgl3.gl.local:/gluster1/BRICK1/1 >> > Options Reconfigured: >> > performance.readdir-ahead: on >> > storage.owner-uid: 36 >> > storage.owner-gid: 36 >> > performance.quick-read: off >> > performance.read-ahead: off >> > performance.io-cache: off >> > performance.stat-prefetch: off >> > cluster.eager-lock: enable >> > network.remote-dio: enable >> > cluster.quorum-type: auto >> > cluster.server-quorum-type: server >> > server.allow-insecure: on >> > cluster.self-heal-window-size: 1024 >> > cluster.background-self-heal-count: 16 >> > performance.strict-write-ordering: off >> > nfs.disable: on >> > nfs.addr-namelookup: off >> > nfs.enable-ino32: off >> > >> > At the time of updates ccgl3 was offline from bad nic on server >> but had been so for about a week with no issues in volume >> > >> > Shortly after update I added these settings to enable sharding >> but did not as of yet have any VM images sharded. >> > features.shard-block-size: 64MB >> > features.shard: on >> > >> > >> > >> > >> > David Gossage >> > Carousel Checks Inc. | System Administrator >> > Office 708.613.2284 >> > >> > On Fri, Jul 22, 2016 at 5:00 AM, Krutika Dhananjay < >> kdhananj@redhat.com> wrote: >> > Hi David, >> > >> > Could you also share the brick logs from the affected volume? >> They're located at >> /var/log/glusterfs/bricks/<hyphenated-path-to-the-brick-directory>.log. >> > >> > Also, could you share the volume configuration (output of >> `gluster volume info <VOL>`) for the affected volume(s) AND at the time you >> actually saw this issue? >> > >> > -Krutika >> > >> > >> > >> > >> > On Thu, Jul 21, 2016 at 11:23 PM, David Gossage < >> dgossage@carouselchecks.com> wrote: >> > On Thu, Jul 21, 2016 at 11:47 AM, Scott <romracer@gmail.com> >> wrote: >> > Hi David, >> > >> > My backend storage is ZFS. >> > >> > I thought about moving from FUSE to NFS mounts for my Gluster >> volumes to help test. But since I use hosted engine this would be a real >> pain. Its difficult to modify the storage domain type/path in the >> hosted-engine.conf. And I don't want to go through the process of >> re-deploying hosted engine. >> > >> > >> > I found this >> > >> > https://bugzilla.redhat.com/show_bug.cgi?id=1347553 >> > >> > Not sure if related. >> > >> > But I also have zfs backend, another user in gluster mailing list >> had issues and used zfs backend although she used proxmox and got it >> working by changing disk to writeback cache I think it was. >> > >> > I also use hosted engine, but I run my gluster volume for HE >> actually on a LVM separate from zfs on xfs and if i recall it did not have >> the issues my gluster on zfs did. I'm wondering now if the issue was zfs >> settings. >> > >> > Hopefully should have a test machone up soon I can play around >> with more. >> > >> > Scott >> > >> > On Thu, Jul 21, 2016 at 11:36 AM David Gossage < >> dgossage@carouselchecks.com> wrote: >> > What back end storage do you run gluster on? xfs/zfs/ext4 etc? 
>> > >> > David Gossage >> > Carousel Checks Inc. | System Administrator >> > Office 708.613.2284 >> > >> > On Thu, Jul 21, 2016 at 8:18 AM, Scott <romracer@gmail.com> >> wrote: >> > I get similar problems with oVirt 4.0.1 and hosted engine. After >> upgrading all my hosts to Gluster 3.7.13 (client and server), I get the >> following: >> > >> > $ sudo hosted-engine --set-maintenance --mode=none >> > Traceback (most recent call last): >> > File "/usr/lib64/python2.7/runpy.py", line 162, in >> _run_module_as_main >> > "__main__", fname, loader, pkg_name) >> > File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code >> > exec code in run_globals >> > File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", >> line 73, in <module> >> > if not maintenance.set_mode(sys.argv[1]): >> > File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_maintenance.py", >> line 61, in set_mode >> > value=m_global, >> > File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", >> line 259, in set_maintenance_mode >> > str(value)) >> > File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", >> line 204, in set_global_md_flag >> > all_stats = broker.get_stats_from_storage(service) >> > File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >> line 232, in get_stats_from_storage >> > result = self._checked_communicate(request) >> > File >> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", >> line 260, in _checked_communicate >> > .format(message or response)) >> > ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request >> failed: failed to read metadata: [Errno 1] Operation not permitted >> > >> > If I only upgrade one host, then things will continue to work but >> my nodes are constantly healing shards. 
My logs are also flooded with:

[2016-07-21 13:15:14.137734] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274714: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]" repeated 6 times between [2016-07-21 13:13:24.134985] and [2016-07-21 13:15:04.132226]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]" repeated 8 times between [2016-07-21 13:13:34.133116] and [2016-07-21 13:15:14.137178]
The message "W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]" repeated 7 times between [2016-07-21 13:13:24.135071] and [2016-07-21 13:15:14.137666]
[2016-07-21 13:15:24.134647] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-0: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134764] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-2: remote operation failed [Operation not permitted]
[2016-07-21 13:15:24.134793] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274741: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:34.135413] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274756: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:15:44.141062] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274818: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0038f4 (Operation not permitted)
[2016-07-21 13:15:54.133582] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-data-client-1: remote operation failed [Operation not permitted]
[2016-07-21 13:15:54.133629] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274853: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)
[2016-07-21 13:16:04.133666] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274879: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0041d0 (Operation not permitted)
[2016-07-21 13:16:14.134954] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 274894: READ => -1 gfid=441f2789-f6b1-4918-a280-1b9905a11429 fd=0x7f19bc0036d8 (Operation not permitted)

Scott

On Thu, Jul 21, 2016 at 6:57 AM Frank Rothenstein <f.rothenstein@bodden-kliniken.de> wrote:

Hey David,

I have the very same problem on my test cluster, even though it is running oVirt 4.0. If you access your volumes via NFS all is fine; the problem is FUSE. I stayed on 3.7.13 but have no solution yet, so for now I use NFS.

Frank
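Frank's FUSE-to-NFS workaround can be tried against a volume like GLUSTER1 without reconfiguring oVirt first, just to confirm that reads which fail through the FUSE mount succeed over NFS. A minimal sketch, assuming the volume and brick host names from the configuration above; note that this volume currently has nfs.disable: on, so Gluster's built-in NFSv3 server would have to be re-enabled, and /mnt/nfstest is only an example mount point:

# re-enable the built-in Gluster NFS server for this volume
gluster volume set GLUSTER1 nfs.disable off

# mount over NFSv3 (the built-in Gluster NFS server only speaks v3) and try a read
mkdir -p /mnt/nfstest
mount -t nfs -o vers=3,nolock ccgl1.gl.local:/GLUSTER1 /mnt/nfstest
ls -l /mnt/nfstest

The other workaround mentioned earlier in the thread, switching the VM disks from cache=none to writeback so that qemu no longer opens the images with O_DIRECT on the FUSE mount, would presumably be configured on the oVirt/qemu side rather than in gluster.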
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
participants (7)
- Alexander Wels
- David Gossage
- Frank Rothenstein
- Krutika Dhananjay
- Samuli Heinonen
- Sandro Bonazzola
- Scott