engine.log is looping with "Volume XXX contains a apparently corrupt brick(s)"

Hi,

Recently I built a small oVirt platform with 2 dedicated servers and GlusterFS to sync the VM storage.

The oVirt setup is simple:

ovirt01 : Host Agent (VDSM) + oVirt Engine
ovirt02 : Host Agent (VDSM)

Versions:

ovirt-release35-005-1.noarch
ovirt-engine-3.5.4.2-1.el7.centos.noarch
vdsm-4.16.26-0.el7.centos.x86_64
vdsm-gluster-4.16.26-0.el7.centos.noarch
glusterfs-server-3.7.4-2.el7.x86_64

The GlusterFS setup is simple as well: 2 bricks in replicate mode. It was done in the shell, not using the oVirt GUI, and then added in STORAGE as a new DOMAIN (Type: DATA, GlusterFS V3).

# gluster volume info

Volume Name: ovirt
Type: Replicate
Volume ID: 043d2d36-dc2c-4f75-9d28-96dbac25d07c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ovirt01:/gluster/ovirt
Brick2: ovirt02:/gluster/ovirt
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: true
auth.allow: IP_A, IP_B
network.ping-timeout: 10
storage.owner-uid: 36
storage.owner-gid: 36
server.allow-insecure: on

The data is reachable on the 2 nodes through a mount point that oVirt created when I configured the storage with the GUI:

localhost:/ovirt  306G  216G  78G  74%  /rhev/data-center/mnt/glusterSD/localhost:_ovirt

I created 7 VMs on this shared storage and everything is working fine. I can do live migration; it all works. But when I check /var/log/ovirt/engine.log on ovirt01, errors loop every 2 seconds:

2015-10-11 17:29:50,971 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-29) [34dbe5cf] START, GlusterVolumesListVDSCommand(HostName = ovirt02, HostId = 65a5bb5d-721f-4a4b-9e77-c4b9162c0aa6), log id: 41443b77
2015-10-11 17:29:50,998 WARN [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc] (DefaultQuartzScheduler_Worker-29) [34dbe5cf] Could not add brick ovirt02:/gluster/ovirt to volume 043d2d36-dc2c-4f75-9d28-96dbac25d07c - server uuid 3c340e59-334f-4aa6-ad61-af2acaf3cad6 not found in cluster fb976d4f-de13-449b-93e8-600fcb59d4e6
2015-10-11 17:29:50,999 INFO [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler_Worker-29) [34dbe5cf] FINISH, GlusterVolumesListVDSCommand, return: {043d2d36-dc2c-4f75-9d28-96dbac25d07c=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@200ae0d1}, log id: 41443b77
2015-10-11 17:29:51,001 WARN [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler_Worker-29) [34dbe5cf] Volume ovirt contains a apparently corrupt brick(s). Hence will not add it to engine at this point.

I played a lot with oVirt: at first it ran on a single node in a Local datacenter; then I added a second node, moved the first host to a new datacenter, migrated the VM images, and so on, with some pain at certain moments. Now everything looks fine, but I prefer to double check.

So, I want to know if there is a real issue with the oVirt/Gluster setup that I don't see. Any info is welcome, because I'm a bit worried to see these messages looping in the log.
Thanks in advance,

Regards,
Nico
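For reference, a 2-brick replicate volume with the options listed above would typically be created from the shell along these lines (a reconstruction based on the volume info shown, not the exact commands that were run; IP_A and IP_B are the same placeholders used above):

# gluster volume create ovirt replica 2 ovirt01:/gluster/ovirt ovirt02:/gluster/ovirt
# gluster volume set ovirt storage.owner-uid 36
# gluster volume set ovirt storage.owner-gid 36
# gluster volume set ovirt network.ping-timeout 10
# gluster volume set ovirt nfs.disable true
# gluster volume set ovirt auth.allow IP_A,IP_B
# gluster volume set ovirt server.allow-insecure on
# gluster volume start ovirt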

On Sun, Oct 11, 2015 at 6:43 PM, Nico <gluster@distran.org> wrote:
Recently, I built a small oVirt platform with 2 dedicated servers and GlusterFS to sync the VM storage. Bricks:
Brick1: ovirt01:/gluster/ovirt
Brick2: ovirt02:/gluster/ovirt
This looks like replica 2 - this is not supported. You can use either replica 1 (testing) or replica 3 (production).
But when I check /var/log/ovirt/engine.log on ovirt01, errors loop every 2 seconds:

To understand such an error we need to see the vdsm log.
Nir
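As a point of reference, the recommended replica 3 layout would be created roughly as follows (a sketch only; ovirt03 and its brick path are hypothetical, since the thread only has two hosts):

# gluster volume create ovirt replica 3 ovirt01:/gluster/ovirt ovirt02:/gluster/ovirt ovirt03:/gluster/ovirt
# gluster volume start ovirt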

On 2015-10-12 09:59, Nir Soffer wrote:
On Sun, Oct 11, 2015 at 6:43 PM, Nico <gluster@distran.org> wrote:
Recently, I built a small oVirt platform with 2 dedicated servers and GlusterFS to sync the VM storage. Bricks:
Brick1: ovirt01:/gluster/ovirt
Brick2: ovirt02:/gluster/ovirt
This looks like replica 2 - this is not supported.
You can use either replica 1 (testing) or replica 3 (production).
But when I check /var/log/ovirt/engine.log on ovirt01, errors loop every 2 seconds:

To understand such an error we need to see the vdsm log.
Nir
Yeah, it is replica 2, as I only have 2 dedicated servers.

Why are you saying it is not supported? Through the oVirt GUI it is possible to create a Gluster volume with 2 bricks in replicate mode; I tried that as well.

Here are the last entries of vdsm.log:

Thread-167405::DEBUG::2015-10-12 10:12:20,132::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
Thread-55245::DEBUG::2015-10-12 10:12:22,529::task::595::Storage.TaskManager.Task::(_updateState) Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::moving from state init -> state preparing
Thread-55245::INFO::2015-10-12 10:12:22,530::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='d44ee4b0-8d36-467a-9610-c682a618b698', spUUID='0ae7120a-430d-4534-9a7e-59c53fb2e804', imgUUID='3454b077-297b-4b89-b8ce-a77f6ec5d22e', volUUID='933da0b6-6a05-4e64-958a-e1c030cf5ddb', options=None)
Thread-55245::INFO::2015-10-12 10:12:22,535::logUtils::47::dispatcher::(wrapper) Run and protect: getVolumeSize, Return response: {'truesize': '158983839744', 'apparentsize': '161061273600'}
Thread-55245::DEBUG::2015-10-12 10:12:22,535::task::1191::Storage.TaskManager.Task::(prepare) Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::finished: {'truesize': '158983839744', 'apparentsize': '161061273600'}
Thread-55245::DEBUG::2015-10-12 10:12:22,535::task::595::Storage.TaskManager.Task::(_updateState) Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::moving from state preparing -> state finished
Thread-55245::DEBUG::2015-10-12 10:12:22,535::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-55245::DEBUG::2015-10-12 10:12:22,536::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-55245::DEBUG::2015-10-12 10:12:22,536::task::993::Storage.TaskManager.Task::(_decref) Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::ref 0 aborting False
Thread-55245::DEBUG::2015-10-12 10:12:22,545::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not found: Requested metadata element is not present
JsonRpc (StompReactor)::DEBUG::2015-10-12 10:12:23,138::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling message <StompFrame command='SEND'>
JsonRpcServer::DEBUG::2015-10-12 10:12:23,139::__init__::530::jsonrpc.JsonRpcServer::(serve_requests) Waiting for request
Thread-167406::DEBUG::2015-10-12 10:12:23,142::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
Thread-37810::DEBUG::2015-10-12 10:12:24,194::fileSD::262::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/ovirt01:_data_iso/5aec30fa-be8b-4f4e-832e-eafb6fa4a8e0/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-37810::DEBUG::2015-10-12 10:12:24,201::fileSD::262::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n317 bytes (317 B) copied, 0.000131729 s, 2.4 MB/s\n'; <rc> = 0
JsonRpc (StompReactor)::DEBUG::2015-10-12 10:12:26,148::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling message <StompFrame command='SEND'>
JsonRpcServer::DEBUG::2015-10-12 10:12:26,149::__init__::530::jsonrpc.JsonRpcServer::(serve_requests) Waiting for request
Thread-167407::DEBUG::2015-10-12 10:12:26,151::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
VM Channels Listener::DEBUG::2015-10-12 10:12:26,972::vmchannels::96::vds::(_handle_timeouts) Timeout on fileno 35.
Thread-30::DEBUG::2015-10-12 10:12:28,358::fileSD::262::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd if=/rhev/data-center/mnt/glusterSD/localhost:_ovirt/d44ee4b0-8d36-467a-9610-c682a618b698/dom_md/metadata iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-30::DEBUG::2015-10-12 10:12:28,451::fileSD::262::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n470 bytes (470 B) copied, 0.000152738 s, 3.1 MB/s\n'; <rc> = 0
JsonRpc (StompReactor)::DEBUG::2015-10-12 10:12:29,157::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling message <StompFrame command='SEND'>
JsonRpcServer::DEBUG::2015-10-12 10:12:29,252::__init__::530::jsonrpc.JsonRpcServer::(serve_requests) Waiting for request
Thread-167408::DEBUG::2015-10-12 10:12:29,254::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
JsonRpc (StompReactor)::DEBUG::2015-10-12 10:12:32,260::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling message <StompFrame command='SEND'>
JsonRpcServer::DEBUG::2015-10-12 10:12:32,262::__init__::530::jsonrpc.JsonRpcServer::(serve_requests) Waiting for request
Thread-167409::DEBUG::2015-10-12 10:12:32,264::task::595::Storage.TaskManager.Task::(_updateState) Task=`7d55817b-a5c4-4c27-b2d5-e892ba645476`::moving from state init -> state preparing
Thread-167409::INFO::2015-10-12 10:12:32,264::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-167409::INFO::2015-10-12 10:12:32,265::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'd44ee4b0-8d36-467a-9610-c682a618b698': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000152738', 'lastCheck': '3.6', 'valid': True}, u'5aec30fa-be8b-4f4e-832e-eafb6fa4a8e0': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000131729', 'lastCheck': '8.1', 'valid': True}}
Thread-167409::DEBUG::2015-10-12 10:12:32,265::task::1191::Storage.TaskManager.Task::(prepare) Task=`7d55817b-a5c4-4c27-b2d5-e892ba645476`::finished: {u'd44ee4b0-8d36-467a-9610-c682a618b698': {'code': 0, 'actual': True, 'version': 3, 'acquired': True, 'delay': '0.000152738', 'lastCheck': '3.6', 'valid': True}, u'5aec30fa-be8b-4f4e-832e-eafb6fa4a8e0': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000131729', 'lastCheck': '8.1', 'valid': True}}
Thread-167409::DEBUG::2015-10-12 10:12:32,265::task::595::Storage.TaskManager.Task::(_updateState) Task=`7d55817b-a5c4-4c27-b2d5-e892ba645476`::moving from state preparing -> state finished
Thread-167409::DEBUG::2015-10-12 10:12:32,265::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-167409::DEBUG::2015-10-12 10:12:32,265::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-167409::DEBUG::2015-10-12 10:12:32,265::task::993::Storage.TaskManager.Task::(_decref) Task=`7d55817b-a5c4-4c27-b2d5-e892ba645476`::ref 0 aborting False
Thread-167409::DEBUG::2015-10-12 10:12:32,268::stompReactor::163::yajsonrpc.StompServer::(send) Sending response
JsonRpc (StompReactor)::DEBUG::2015-10-12 10:12:32,275::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling message <StompFrame command='SEND'>
JsonRpcServer::DEBUG::2015-10-12 10:12:32,278::__init__::530::jsonrpc.JsonRpcServer::(serve_requests) Waiting for request
Thread-167410::DEBUG::2015-10-12 10:12:32,283::stompReactor::163::yajsonrpc.StompServer::(send) Sending response

On Mon, Oct 12, 2015 at 11:14 AM, Nico <gluster@distran.org> wrote:
On 2015-10-12 09:59, Nir Soffer wrote:
On Sun, Oct 11, 2015 at 6:43 PM, Nico <gluster@distran.org> wrote:
Recently, I built a small oVirt platform with 2 dedicated servers and GlusterFS to sync the VM storage. Bricks:
Brick1: ovirt01:/gluster/ovirt
Brick2: ovirt02:/gluster/ovirt
This looks like replica 2 - this is not supported.
You can use either replica 1 (testing) or replica 3 (production).
But when I check /var/log/ovirt/engine.log on ovirt01, errors loop every 2 seconds:
To understand such an error we need to see the vdsm log.
Nir
Yeah, it is replica 2, as I only have 2 dedicated servers.
Why are you saying it is not supported? Through the oVirt GUI it is possible to create a Gluster volume with 2 bricks in replicate mode; I tried that as well.
Yes, the engine will let you use such a volume in 3.5 - this is a bug. In 3.6 you will not be able to use such a setup.

Replica 2 fails in a very bad way when one brick is down: the application may get stale data, and this breaks sanlock. You will get stuck with an SPM that cannot be stopped, and other fun stuff.

You don't want to go in this direction, and we will not be able to support it.
Here are the last entries of vdsm.log:
We need the whole file. I suggest you file an oVirt bug and attach the full vdsm log file showing the timeframe of the error - probably from the time you created the GlusterFS domain.

Nir
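The failure mode described above is, at bottom, a quorum problem: with only two copies, the cluster cannot tell a dead peer from a network partition. As an aside (these are standard GlusterFS volume options, not commands taken from this thread), quorum enforcement on a replicate volume looks roughly like this; on a 2-brick volume it trades split-brain risk for reduced write availability when a brick goes down, which is why a third brick is the real fix:

# gluster volume set ovirt cluster.quorum-type auto
# gluster volume set ovirt cluster.server-quorum-type server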

On 2015-10-12 14:04, Nir Soffer wrote:
Yes, the engine will let you use such a volume in 3.5 - this is a bug. In 3.6 you will not be able to use such a setup.
Replica 2 fails in a very bad way when one brick is down: the application may get stale data, and this breaks sanlock. You will get stuck with an SPM that cannot be stopped, and other fun stuff.
You don't want to go in this direction, and we will not be able to support that.
Here are the last entries of vdsm.log:
We need the whole file.
I suggest you file an ovirt bug and attach the full vdsm log file showing the timeframe of the error. Probably from the time you created the glusterfs domain.
Nir
Please find the full logs there:

https://94.23.2.63/log_vdsm/vdsm.log
https://94.23.2.63/log_vdsm/
https://94.23.2.63/log_engine/

On 10/12/2015 06:43 PM, Nico wrote:
On 2015-10-12 14:04, Nir Soffer wrote:
Yes, the engine will let you use such a volume in 3.5 - this is a bug. In 3.6 you will not be able to use such a setup.
Replica 2 fails in a very bad way when one brick is down: the application may get stale data, and this breaks sanlock. You will get stuck with an SPM that cannot be stopped, and other fun stuff.
You don't want to go in this direction, and we will not be able to support that.
Here are the last entries of vdsm.log:
We need the whole file.
I suggest you file an ovirt bug and attach the full vdsm log file showing the timeframe of the error. Probably from the time you created the glusterfs domain.
Nir
Please find the full logs there:
The engine log looping with "Volume contains apparently corrupt bricks" happens when the engine tries to get information about the volumes from the gluster CLI and updates its database. These errors do not affect the functioning of the storage domain or the running virtual machines, but they do affect the monitoring/management of the gluster volume from oVirt.

Now, to identify the cause of the error: the logs indicate that gluster's server uuid has either not been updated in the engine or is different from what the engine has stored. It could be one of these scenarios:

1. Did you create the cluster with only the virt service enabled and later enable the gluster service? In this case, the gluster server uuid may not be updated. You will need to put the host into maintenance and then activate it to resolve this.

2. Did you re-install the gluster server nodes after adding them to oVirt? If this is the case, we need to investigate further how there's a mismatch.
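A quick way to check for the mismatch described here is to compare the glusterd UUID stored on each host with the server uuid the engine complains about in engine.log (3c340e59-... in the messages quoted earlier). A sketch, assuming shell access to both nodes:

# gluster system:: uuid get
# grep UUID /var/lib/glusterd/glusterd.info
# gluster peer status

The first two commands show the local glusterd UUID; the third lists the peer UUIDs as seen from this host.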

On 2015-10-12 14:04, Nir Soffer wrote:
On Mon, Oct 12, 2015 at 11:14 AM, Nico <gluster@distran.org> wrote:
Yes, the engine will let you use such a volume in 3.5 - this is a bug. In 3.6 you will not be able to use such a setup.
Replica 2 fails in a very bad way when one brick is down: the application may get stale data, and this breaks sanlock. You will get stuck with an SPM that cannot be stopped, and other fun stuff.
You don't want to go in this direction, and we will not be able to support that.
For the record, I already rebooted node1, and node2 took over the existing VMs from node1 and vice versa. GlusterFS worked fine and the oVirt application was still working; I guess that is because it was a soft reboot, which stops the services cleanly.

I hit another case where I broke the network on the 2 nodes simultaneously after a bad manipulation in the oVirt GUI, and I got a split brain. I kept the error from that moment:

[root@devnix-virt-master02 nets]# gluster volume heal ovirt info split-brain
Brick devnix-virt-master01:/gluster/ovirt/
/d44ee4b0-8d36-467a-9610-c682a618b698/dom_md/ids
Number of entries in split-brain: 1

Brick devnix-virt-master02:/gluster/ovirt/
/d44ee4b0-8d36-467a-9610-c682a618b698/dom_md/ids
Number of entries in split-brain: 1

The file had the same size on both nodes, so it was hard to select one. In the end I chose the younger one, and everything was back online after the heal. Is this the kind of issue you are talking about with 2 nodes?

For now I don't have the budget for a third server, so I'm a bit stuck and disappointed. I have a third machine, but it is for backups: it has a lot of storage but low CPU capabilities (no VT-x), so I can't use it as a hypervisor. I could maybe use it as a third brick, but is it possible to have this kind of configuration: 2 active nodes as hypervisors and a third one only for the gluster replica 3?

Cheers
Nico
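Two sketches related to the points above; neither is taken from this thread, and the host and brick names are placeholders. First, GlusterFS 3.7 can resolve a file in split-brain from the CLI by naming the brick to use as the source, which avoids picking a copy by age:

# gluster volume heal ovirt split-brain source-brick ovirt01:/gluster/ovirt /d44ee4b0-8d36-467a-9610-c682a618b698/dom_md/ids

Second, a machine that only carries storage can be added as a gluster-only peer and third brick to reach replica 3 (it does not need to be a hypervisor); depending on the GlusterFS version this may be possible on the existing volume via add-brick:

# gluster peer probe backuphost
# gluster volume add-brick ovirt replica 3 backuphost:/gluster/ovirt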
participants (3)
- Nico
- Nir Soffer
- Sahina Bose