<div dir="ltr"><div><div><div><div><div>I just discovered in the logs several troubles:<br></div>1) the rdma support was not installed from glusterfs (but the RDMA check box was selected)<br></div>2) somehow every second during the resync the connection was going down and up...<br></div>3)Due to 2) the hosts are restarging daemon glusterfs several times, with correct parameters and with no parameters.. they where giving conflict and one other other was overtaking.<br></div>Maybe the fault was due to the onboot enabled glusterfs service.<br><br></div>I can try to destroy whole cluster and reinstall from scratch to see if we can figure-out why the vol config files are disappears.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 2, 2017 at 5:34 AM, Ramesh Nachimuthu <span dir="ltr"><<a href="mailto:rnachimu@redhat.com" target="_blank">rnachimu@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

----- Original Message -----
> From: "Arman Khalatyan" <arm2arm@gmail.com>
> To: "Ramesh Nachimuthu" <rnachimu@redhat.com>
> Cc: "users" <users@ovirt.org>, "Sahina Bose" <sabose@redhat.com>
> Sent: Wednesday, March 1, 2017 11:22:32 PM
> Subject: Re: [ovirt-users] Gluster setup disappears any chance to recover?
>
</span><span class="">> ok I will answer by my self:<br>
> yes gluster daemon is managed by vdms:)<br>
> and to recover lost config simply one should add "force" keyword<br>
> gluster volume create GluReplica replica 3 arbiter 1 transport TCP,RDMA<br>
> 10.10.10.44:/zclei22/01/glu 10.10.10.42:/zclei21/01/glu<br>
> 10.10.10.41:/zclei26/01/glu<br>
> force<br>
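> (To confirm the recreated volume came back with the intended layout -- a
> quick check with the standard gluster CLI:
>   gluster volume info GluReplica
>   gluster volume status GluReplica
> )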
>
> Now everything is up and running!
> One annoying thing is the epel dependency of zfs conflicting with ovirt...
> every time one needs to enable and then disable epel.
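> (One way to avoid that toggling -- a sketch, assuming the epel repo is
> configured on the hosts but disabled by default -- is to enable it for a
> single transaction only:
>   yum install --enablerepo=epel zfs
> )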
>
>
Glusterd service will be started when you add/activate the host in oVirt. It will be configured to start after every reboot.
Volumes disappearing seems to be a serious issue. We have never seen such an issue with the XFS file system. Are you able to reproduce this issue consistently?
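(A quick way to verify that autostart state on a host -- a sketch, assuming
systemd-based hosts; the unit name may differ on other distributions:

  systemctl is-enabled glusterd   # "enabled" means it starts on every boot
  systemctl status glusterd       # shows whether it is currently running
)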

Regards,
Ramesh

>
> On Wed, Mar 1, 2017 at 5:33 PM, Arman Khalatyan <arm2arm@gmail.com> wrote:
>
> > OK, finally I got a single brick up and running, so I can access the data.
> > Now the question is: do we need to run the glusterd daemon on startup, or
> > is it managed by vdsmd?
> >
> >
> > On Wed, Mar 1, 2017 at 2:36 PM, Arman Khalatyan <arm2arm@gmail.com> wrote:
> >
> >> All folders under /var/lib/glusterd/vols/ are empty.
> >> In the history of one of the servers I found the command with which it
> >> was created:
> >>
> >> gluster volume create GluReplica replica 3 arbiter 1 transport TCP,RDMA
> >> 10.10.10.44:/zclei22/01/glu 10.10.10.42:/zclei21/01/glu
> >> 10.10.10.41:/zclei26/01/glu
> >>
> >> But executing this command, it claims that:
> >> volume create: GluReplica: failed: /zclei22/01/glu is already part of a
> >> volume
> >>
> >> Any chance to force it?
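> >> (The "already part of a volume" check is driven by extended attributes
> >> left on the brick root. Besides appending "force" to the create command,
> >> the usual cleanup -- a sketch, assuming you want to reuse the bricks and
> >> keep their data -- is to clear the stale metadata on each brick:
> >>   setfattr -x trusted.glusterfs.volume-id /zclei22/01/glu
> >>   setfattr -x trusted.gfid /zclei22/01/glu
> >>   rm -rf /zclei22/01/glu/.glusterfs
> >> and then re-run the create command.)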
> >>
> >>
> >> On Wed, Mar 1, 2017 at 12:13 PM, Ramesh Nachimuthu <rnachimu@redhat.com>
> >> wrote:
> >>
> >>>
> >>> ----- Original Message -----
> >>> > From: "Arman Khalatyan" <arm2arm@gmail.com>
> >>> > To: "users" <users@ovirt.org>
> >>> > Sent: Wednesday, March 1, 2017 3:10:38 PM
> >>> > Subject: Re: [ovirt-users] Gluster setup disappears any chance to recover?
> >>> >
> >>> > engine throws the following errors:
> >>> > 2017-03-01 10:39:59,608+01 WARN
> >>> > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> >>> > (DefaultQuartzScheduler6) [d7f7d83] EVENT_ID:
> >>> > GLUSTER_VOLUME_DELETED_FROM_CLI(4,027), Correlation ID: null, Call Stack:
> >>> > null, Custom Event ID: -1, Message: Detected deletion of volume GluReplica
> >>> > on cluster HaGLU, and deleted it from engine DB.
> >>> > 2017-03-01 10:39:59,610+01 ERROR
> >>> > [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler6)
> >>> > [d7f7d83] Error while removing volumes from database!:
> >>> > org.springframework.dao.DataIntegrityViolationException:
> >>> > CallableStatementCallback; SQL [{call deleteglustervolumesbyguids(?)}];
> >>> > ERROR: update or delete on table "gluster_volumes" violates foreign key
> >>> > constraint "fk_storage_connection_to_glustervolume" on table
> >>> > "storage_server_connections"
> >>> > Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced
> >>> > from table "storage_server_connections".
> >>> > Where: SQL statement "DELETE
> >>> > FROM gluster_volumes
> >>> > WHERE id IN (
> >>> > SELECT *
> >>> > FROM fnSplitterUuid(v_volume_ids)
> >>> > )"
> >>> > PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at
> >>> > SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR:
> >>> > update or delete on table "gluster_volumes" violates foreign key constraint
> >>> > "fk_storage_connection_to_glustervolume" on table
> >>> > "storage_server_connections"
> >>> > Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced
> >>> > from table "storage_server_connections".
> >>> > Where: SQL statement "DELETE
> >>> > FROM gluster_volumes
> >>> > WHERE id IN (
> >>> > SELECT *
> >>> > FROM fnSplitterUuid(v_volume_ids)
> >>> > )"
> >>> > PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at
> >>> > SQL statement
> >>> > at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:243) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1094) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1130) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:405) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:365) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:198) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:135) [dal.jar:]
> >>> > at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:130) [dal.jar:]
> >>> > at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:76) [dal.jar:]
> >>> > at org.ovirt.engine.core.dao.gluster.GlusterVolumeDaoImpl.removeAll(GlusterVolumeDaoImpl.java:233) [dal.jar:]
> >>> > at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.removeDeletedVolumes(GlusterSyncJob.java:521) [bll.jar:]
> >>> > at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshVolumeData(GlusterSyncJob.java:465) [bll.jar:]
> >>> > at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshClusterData(GlusterSyncJob.java:133) [bll.jar:]
> >>> > at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshLightWeightData(GlusterSyncJob.java:111) [bll.jar:]
> >>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
> >>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
> >>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
> >>> > at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
> >>> > at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
> >>> > at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
> >>> > at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
> >>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121]
> >>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
> >>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
> >>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]
> >>> > Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on
> >>> > table "gluster_volumes" violates foreign key constraint
> >>> > "fk_storage_connection_to_glustervolume" on table
> >>> > "storage_server_connections"
> >>> > Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced
> >>> > from table "storage_server_connections".
> >>> > Where: SQL statement "DELETE
> >>> > FROM gluster_volumes
> >>> > WHERE id IN (
> >>> > SELECT *
> >>> > FROM fnSplitterUuid(v_volume_ids)
> >>> > )"
> >>> > PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at
> >>> > SQL statement
> >>> > at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
> >>> > at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
> >>> > at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
> >>> > at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
> >>> > at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
> >>> > at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:410)
> >>> > at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.execute(CachedPreparedStatement.java:303)
> >>> > at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:442)
> >>> > at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1133) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1130) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1078) [spring-jdbc.jar:4.2.4.RELEASE]
> >>> > ... 24 more
> >>> >
> >>> >
> >>>
> >>> This is a side effect of the volume deletion on the gluster side. It
> >>> looks like you have storage domains created using those volumes.
> >>>
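> >>> (To see which storage connection still holds the reference -- a sketch
> >>> against the engine database; the table name and key come from the error
> >>> above, but the column name gluster_volume_id is an assumption:
> >>>   psql engine -c "SELECT id, connection FROM storage_server_connections WHERE gluster_volume_id = '3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d';"
> >>> )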
> >>> > On Wed, Mar 1, 2017 at 9:49 AM, Arman Khalatyan <arm2arm@gmail.com>
> >>> > wrote:
> >>> >
> >>> >
> >>> > Hi,
> >>> > I just tested a power cut on the test system:
> >>> >
> >>> > Cluster with 3 hosts; each host has a 4TB local disk with zfs on it and
> >>> > the folder /zhost/01/glu as a brick.
> >>> >
> >>> > Glusterfs was replicated to 3 disks with an arbiter. So far so good.
> >>> > The VM was up and running with a 50GB OS disk; dd was showing
> >>> > 70-100MB/s performance with the VM disk.
> >>> > I just simulated a disaster power cut: an ipmi power-cycle of all 3
> >>> > hosts at the same time.
> >>> > The result is that all hosts are green, up and running, but the bricks
> >>> > are down. In the processes I can see:
> >>> > ps aux | grep gluster
> >>> > root 16156 0.8 0.0 475360 16964 ? Ssl 08:47 0:00 /usr/sbin/glusterd -p
> >>> > /var/run/glusterd.pid --log-level INFO
> >>> >
> >>> > What happened with my volume setup??
> >>> > Is it possible to recover it??
> >>> > [root@clei21 ~]# gluster peer status
> >>> > Number of Peers: 2
> >>> >
> >>> > Hostname: clei22.cls
> >>> > Uuid: 96b52c7e-3526-44fd-af80-14a3073ebac2
> >>> > State: Peer in Cluster (Connected)
> >>> > Other names:
> >>> > 192.168.101.40
> >>> > 10.10.10.44
> >>> >
> >>> > Hostname: clei26.cls
> >>> > Uuid: c9fab907-5053-41a8-a1fa-d069f34e42dc
> >>> > State: Peer in Cluster (Connected)
> >>> > Other names:
> >>> > 10.10.10.41
> >>> > [root@clei21 ~]# gluster volume info
> >>> > No volumes present
> >>> > [root@clei21 ~]#
> >>>
> >>> I am not sure why all volumes are getting deleted after reboot. Do you
> >>> see any vol files under the directory /var/lib/glusterd/vols/? Also,
> >>> /var/log/glusterfs/cmd_history.log should have all the gluster commands
> >>> executed.
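> >>> (For example:
> >>>   ls -l /var/lib/glusterd/vols/
> >>>   grep -E 'volume (create|delete|stop)' /var/log/glusterfs/cmd_history.log
> >>> )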
> >>>
> >>> Regards,
> >>> Ramesh
> >>>
> >>> >
> >>> >
> >>> >
> >>> > _______________________________________________
> >>> > Users mailing list
> >>> > Users@ovirt.org
> >>> > http://lists.ovirt.org/mailman/listinfo/users
> >>> >
> >>>
> >>
> >>
> >
>