[ovirt-users] Gluster setup disappears any chance to recover?
Arman Khalatyan
arm2arm at gmail.com
Thu Mar 2 09:27:37 UTC 2017
Forgot to mention number 4): my fault was with glusterfs on zfs: the setup was
done with xattr=on, but one should set xattr=sa.
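For reference, switching an existing ZFS dataset to SA-based xattrs is a single property change (the dataset name below is only an example based on the brick paths in this thread; adjust it to your own pool/dataset):

zfs set xattr=sa zclei22/01
zfs get xattr zclei22/01

As far as I know, xattr=sa only applies to extended attributes written after the change, so it is best set before the bricks are populated.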
On Thu, Mar 2, 2017 at 10:08 AM, Arman Khalatyan <arm2arm at gmail.com> wrote:
> I just discovered several troubles in the logs:
> 1) the RDMA support was not installed for glusterfs (even though the RDMA checkbox was selected)
> 2) somehow, every second during the resync, the connection was going down and up...
> 3) due to 2), the hosts were restarting the glusterfs daemon several times, sometimes with the
> correct parameters and sometimes with no parameters; the instances conflicted and one kept
> overtaking the other.
> Maybe the fault was due to the glusterfs service being enabled on boot.
>
> I can try to destroy the whole cluster and reinstall from scratch to see if we
> can figure out why the vol config files disappear.
>
> On Thu, Mar 2, 2017 at 5:34 AM, Ramesh Nachimuthu <rnachimu at redhat.com>
> wrote:
>
>>
>>
>>
>>
>> ----- Original Message -----
>> > From: "Arman Khalatyan" <arm2arm at gmail.com>
>> > To: "Ramesh Nachimuthu" <rnachimu at redhat.com>
>> > Cc: "users" <users at ovirt.org>, "Sahina Bose" <sabose at redhat.com>
>> > Sent: Wednesday, March 1, 2017 11:22:32 PM
>> > Subject: Re: [ovirt-users] Gluster setup disappears any chance to recover?
>> >
>> > OK, I will answer it myself:
>> > yes, the gluster daemon is managed by vdsmd :)
>> > and to recover the lost config one simply has to add the "force" keyword:
>> > gluster volume create GluReplica replica 3 arbiter 1 transport TCP,RDMA 10.10.10.44:/zclei22/01/glu 10.10.10.42:/zclei21/01/glu 10.10.10.41:/zclei26/01/glu force
>> >
>> > Now everything is up and running!
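>> > To double-check the recovered volume, the standard status commands should now show all three bricks online:
>> >
>> > gluster volume info GluReplica
>> > gluster volume status GluReplica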
>> > One annoying thing is the EPEL dependency of zfs, which conflicts with oVirt...
>> > every time one needs to enable and then disable EPEL.
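>> > A possible workaround (assuming the repo id is simply "epel") is to keep the repo disabled in /etc/yum.repos.d/ and enable it only for the zfs transactions, e.g.:
>> >
>> > yum --enablerepo=epel install <package>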
>> >
>> >
>>
>> The glusterd service will be started when you add/activate the host in oVirt.
>> It will be configured to start after every reboot.
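>> To confirm this on a host, the usual systemd checks apply:
>>
>> systemctl is-enabled glusterd
>> systemctl status glusterd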
>> Volumes disappearing seems to be a serious issue. We have never seen such
>> an issue with the XFS file system. Are you able to reproduce this issue
>> consistently?
>>
>> Regards,
>> Ramesh
>>
>> >
>> > On Wed, Mar 1, 2017 at 5:33 PM, Arman Khalatyan <arm2arm at gmail.com>
>> wrote:
>> >
>> > > OK, finally I got a single brick up and running, so I can access the data.
>> > > Now the question is: do we need to run the glusterd daemon on startup,
>> > > or is it managed by vdsmd?
>> > >
>> > >
>> > > On Wed, Mar 1, 2017 at 2:36 PM, Arman Khalatyan <arm2arm at gmail.com>
>> wrote:
>> > >
>> > >> All the folders under /var/lib/glusterd/vols/ are empty.
>> > >> In the shell history of one of the servers I found the command that was
>> > >> used to create the volume:
>> > >>
>> > >> gluster volume create GluReplica replica 3 arbiter 1 transport TCP,RDMA 10.10.10.44:/zclei22/01/glu 10.10.10.42:/zclei21/01/glu 10.10.10.41:/zclei26/01/glu
>> > >>
>> > >> But when executing this command it claims:
>> > >> volume create: GluReplica: failed: /zclei22/01/glu is already part of a volume
>> > >>
>> > >> Any chance to force it?
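>> > >> (The "already part of a volume" message usually comes from leftover gluster xattrs on the brick root. They can be inspected with, for example:
>> > >>
>> > >> getfattr -d -m . -e hex /zclei22/01/glu
>> > >>
>> > >> which should list trusted.glusterfs.volume-id if the brick was part of a previous volume.)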
>> > >>
>> > >>
>> > >>
>> > >> On Wed, Mar 1, 2017 at 12:13 PM, Ramesh Nachimuthu <
>> rnachimu at redhat.com>
>> > >> wrote:
>> > >>
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> ----- Original Message -----
>> > >>> > From: "Arman Khalatyan" <arm2arm at gmail.com>
>> > >>> > To: "users" <users at ovirt.org>
>> > >>> > Sent: Wednesday, March 1, 2017 3:10:38 PM
>> > >>> > Subject: Re: [ovirt-users] Gluster setup disappears any chance to recover?
>> > >>> >
>> > >>> > The engine throws the following errors:
>> > >>> > 2017-03-01 10:39:59,608+01 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [d7f7d83] EVENT_ID: GLUSTER_VOLUME_DELETED_FROM_CLI(4,027), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Detected deletion of volume GluReplica on cluster HaGLU, and deleted it from engine DB.
>> > >>> > 2017-03-01 10:39:59,610+01 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler6) [d7f7d83] Error while removing volumes from database!: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call deleteglustervolumesbyguids(?)}]; ERROR: update or delete on table "gluster_volumes" violates foreign key constraint "fk_storage_connection_to_glustervolume" on table "storage_server_connections"
>> > >>> > Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced from table "storage_server_connections".
>> > >>> > Where: SQL statement "DELETE
>> > >>> > FROM gluster_volumes
>> > >>> > WHERE id IN (
>> > >>> > SELECT *
>> > >>> > FROM fnSplitterUuid(v_volume_ids)
>> > >>> > )"
>> > >>> > PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "gluster_volumes" violates foreign key constraint "fk_storage_connection_to_glustervolume" on table "storage_server_connections"
>> > >>> > Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced from table "storage_server_connections".
>> > >>> > Where: SQL statement "DELETE
>> > >>> > FROM gluster_volumes
>> > >>> > WHERE id IN (
>> > >>> > SELECT *
>> > >>> > FROM fnSplitterUuid(v_volume_ids)
>> > >>> > )"
>> > >>> > PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at SQL statement
>> > >>> > at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:243) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1094) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1130) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:405) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:365) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:198) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:135) [dal.jar:]
>> > >>> > at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:130) [dal.jar:]
>> > >>> > at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:76) [dal.jar:]
>> > >>> > at org.ovirt.engine.core.dao.gluster.GlusterVolumeDaoImpl.removeAll(GlusterVolumeDaoImpl.java:233) [dal.jar:]
>> > >>> > at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.removeDeletedVolumes(GlusterSyncJob.java:521) [bll.jar:]
>> > >>> > at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshVolumeData(GlusterSyncJob.java:465) [bll.jar:]
>> > >>> > at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshClusterData(GlusterSyncJob.java:133) [bll.jar:]
>> > >>> > at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshLightWeightData(GlusterSyncJob.java:111) [bll.jar:]
>> > >>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
>> > >>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
>> > >>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
>> > >>> > at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
>> > >>> > at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
>> > >>> > at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
>> > >>> > at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
>> > >>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121]
>> > >>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
>> > >>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
>> > >>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
>> > >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]
>> > >>> > Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "gluster_volumes" violates foreign key constraint "fk_storage_connection_to_glustervolume" on table "storage_server_connections"
>> > >>> > Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced from table "storage_server_connections".
>> > >>> > Where: SQL statement "DELETE
>> > >>> > FROM gluster_volumes
>> > >>> > WHERE id IN (
>> > >>> > SELECT *
>> > >>> > FROM fnSplitterUuid(v_volume_ids)
>> > >>> > )"
>> > >>> > PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at SQL statement
>> > >>> > at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
>> > >>> > at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
>> > >>> > at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
>> > >>> > at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
>> > >>> > at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
>> > >>> > at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:410)
>> > >>> > at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.execute(CachedPreparedStatement.java:303)
>> > >>> > at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:442)
>> > >>> > at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1133) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1130) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1078) [spring-jdbc.jar:4.2.4.RELEASE]
>> > >>> > ... 24 more
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>>
>> > >>> This is a side effect of the volume deletion on the gluster side. It looks
>> > >>> like you have storage domains created using those volumes.
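>> > >>> If you want to see which storage connection is still pointing at the deleted volume, the engine database can be checked with a read-only query (assuming the default database name "engine"; column names are from memory of the engine schema):
>> > >>>
>> > >>> su - postgres -c "psql engine -c \"SELECT id, connection FROM storage_server_connections;\""
>> > >>>
>> > >>> The row whose connection refers to the GluReplica volume is the one blocking the delete.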
>> > >>>
>> > >>> > On Wed, Mar 1, 2017 at 9:49 AM, Arman Khalatyan < arm2arm at gmail.com > wrote:
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > Hi,
>> > >>> > I just tested a power cut on the test system:
>> > >>> >
>> > >>> > Cluster with 3 hosts; each host has a 4TB local disk with ZFS on it and the /zhost/01/glu folder as a brick.
>> > >>> >
>> > >>> > GlusterFS was set up replicated across the 3 bricks with an arbiter. So far so good. A VM was up and running with a 50GB OS disk; dd was showing 70-100 MB/s performance on the VM disk.
>> > >>> > I then simulated a disaster power cut: an IPMI power-cycle of all 3 hosts at the same time.
>> > >>> > The result: all hosts come up green and running, but the bricks are down.
>> > >>> > In the process list I can see:
>> > >>> > ps aux | grep gluster
>> > >>> > root 16156 0.8 0.0 475360 16964 ? Ssl 08:47 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
>> > >>> >
>> > >>> > What happened to my volume setup??
>> > >>> > Is it possible to recover it??
>> > >>> > [root at clei21 ~]# gluster peer status
>> > >>> > Number of Peers: 2
>> > >>> >
>> > >>> > Hostname: clei22.cls
>> > >>> > Uuid: 96b52c7e-3526-44fd-af80-14a3073ebac2
>> > >>> > State: Peer in Cluster (Connected)
>> > >>> > Other names:
>> > >>> > 192.168.101.40
>> > >>> > 10.10.10.44
>> > >>> >
>> > >>> > Hostname: clei26.cls
>> > >>> > Uuid: c9fab907-5053-41a8-a1fa-d069f34e42dc
>> > >>> > State: Peer in Cluster (Connected)
>> > >>> > Other names:
>> > >>> > 10.10.10.41
>> > >>> > [root at clei21 ~]# gluster volume info
>> > >>> > No volumes present
>> > >>> > [root at clei21 ~]#
>> > >>>
>> > >>> I am not sure why all the volumes are getting deleted after a reboot. Do you see
>> > >>> any vol files under the directory /var/lib/glusterd/vols/? Also,
>> > >>> /var/log/glusterfs/cmd_history.log should have all the gluster commands
>> > >>> that were executed.
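>> > >>> For example, on each host:
>> > >>>
>> > >>> ls -l /var/lib/glusterd/vols/
>> > >>> tail -n 50 /var/log/glusterfs/cmd_history.log
>> > >>>
>> > >>> would show whether the vol files are really gone and which gluster commands were run last.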
>> > >>>
>> > >>> Regards,
>> > >>> Ramesh
>> > >>>
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > _______________________________________________
>> > >>> > Users mailing list
>> > >>> > Users at ovirt.org
>> > >>> > http://lists.ovirt.org/mailman/listinfo/users
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>
>