[Users] Is there a way to force remove a host?

Fri Sep 28 13:42:53 UTC 2012

On Friday 28 September 2012 01:00 PM, Itamar Heim wrote:
> On 09/25/2012 01:45 PM, Shireesh Anjal wrote:
>> On Tuesday 25 September 2012 04:04 PM, Itamar Heim wrote:
>>> On 09/25/2012 12:32 PM, Shireesh Anjal wrote:
>>>> On Tuesday 25 September 2012 01:42 PM, Itamar Heim wrote:
>>>>> On 09/25/2012 09:44 AM, Shireesh Anjal wrote:
>>>>>> On Tuesday 25 September 2012 03:25 AM, Itamar Heim wrote:
>>>>>>> On 09/24/2012 11:53 PM, Jason Brooks wrote:
>>>>>>>> On Mon 24 Sep 2012 01:24:44 PM PDT, Itamar Heim wrote:
>>>>>>>>> On 09/24/2012 08:49 PM, Dominic Kaiser wrote:
>>>>>>>>>> This conversation is fine, but if I want to force remove no matter
>>>>>>>>>> what, I should be able to from the GUI.  The nodes are no longer
>>>>>>>>>> available and I want to get rid of them, but ovirt does not let me.
>>>>>>>>>> I can delete from the database, but why not from the GUI?  I am sure
>>>>>>>>>> others may run into this problem as well.
>>>>>>>>>
>>>>>>>>> what happens to the status of the host when you right click on the
>>>>>>>>> host and specify you confirm it was shutdown?
>>>>>>>>
>>>>>>>> I'm having this same issue. Confirming the host is shut down doesn't
>>>>>>>> make a difference.
>>>>>>>>
>>>>>>>> I'm seeing lots of "Failed to GlusterHostRemoveVDS, error = Unexpected
>>>>>>>> exception" errors in my engine log that seem to correspond w/ the
>>>>>>>> failed remove host attempts.
>>>>>>>
>>>>>>> is cluster defined as gluster as well?
>>>>>>> what is the status of the host after you confirm shutdown?
>>>>>>> any error on log on this specific command?
>>>>>>>
>>>>>>> shireesh - not sure if relevant to this flow, but need to make sure
>>>>>>> removing a host from the engine isn't blocked on gluster needing to
>>>>>>> remove it from the gluster cluster if the host is not available any
>>>>>>> more, or last host in gluster cluster?
>>>>>>
>>>>>> Yes, currently the system tries the 'gluster peer detach <hostname>'
>>>>>> command when trying to remove a server, which fails if the server is
>>>>>> unavailable. This can be enhanced to show the error to user and then
>>>>>> allow 'force remove' which can use the 'gluster peer detach <hostname>
>>>>>> *force*' command that forcefully removes the server from the cluster,
>>>>>> even if it is not available or has bricks on it.
>>>>>
>>>>> what if it is the last server in the cluster?
>>>>> what if there is another server in the cluster but no communication to
>>>>> it as well?
>>>>
>>>> A quick look at code tells me that in case of virt, we don't allow
>>>> removing a host if it has VM(s) in it (even if the host is currently
>>>> not available) i.e. vdsDynamic.getvm_count() > 0. Please correct me if
>>>> I'm wrong. If that's correct, and if we want to keep it consistent for
>>>> gluster as well, then we should not allow removing a host if it has
>>>> gluster volume(s) in it. This is how it behaves in case of 'last server
>>>> in cluster' today.
>>>
>>> true, but user can fence the host or confirm shutdown manually, which
>>> will release all resources on it, then it can be removed.
>>
>> I see. In that case, we can just remove the validation and allow
>> removing the host irrespective of whether it contains volume(s) or not.
>> Since it's the only host in the cluster, this won't cause any harm.
>>
>>>
>>>>
>>>> In case of no up server available in the cluster, we can show the error
>>>> and provide a 'force' option that will just remove it from the engine DB
>>>> and will not attempt gluster peer detach.
>>>
>>> something like that.
>>> i assume the gluster storage will handle this somehow?
>>
>> What would you expect gluster storage to do in such a case? If all
>> servers are not accessible to a gluster client, the client can't
>> read/write from/to volumes of the cluster. Cluster management operations
>> in gluster (like removing a server from the cluster) are always done
>> from one of the servers of the cluster. So if no servers are available,
>> nothing can be done. Vijay can shed more light on this if required.
>>
>> Assuming that some of the servers come up at a later point in time, they
>> would continue to consider this (removed from engine) server as one of
>> the peers. This would create an inconsistency between actual gluster
>> configuration and the engine DB. This, however can be handled once we
>> have a feature to sync configuration with gluster (this is WIP). This
>> feature will automatically identify such servers, and allow the user to
>> either import them to engine, or remove (peer detach) from the gluster
>> cluster.
>
> why is that an issue though - worst case the server wouldn't appear in 
> the admin console[1] if it is alive, and if it is dead, it is 
> something the gluster cluster is supposed to deal with?

My only concern is that the management console should not be out of 
sync with the actual gluster configuration. However, as I said, we 
will soon have a mechanism to handle such cases.
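
To make the kind of inconsistency I mean concrete, here is a rough 
sketch only (not the engine code; the function name and the plain 
hostname lists are made up for illustration). It compares the servers 
the engine DB knows about with the peers gluster reports:

```python
def diff_servers(engine_hosts, gluster_peers):
    """Compare hostnames in the engine DB with peers reported by gluster.

    Both arguments are plain lists of hostnames (an assumption made for
    this sketch; the real sync feature is still work in progress).
    """
    engine = set(engine_hosts)
    gluster = set(gluster_peers)
    return {
        # Known to gluster but not to the engine: the user could either
        # import these into the engine or peer detach them.
        "only_in_gluster": sorted(gluster - engine),
        # Known to the engine but no longer a gluster peer.
        "only_in_engine": sorted(engine - gluster),
    }


if __name__ == "__main__":
    # "node3" was force-removed from the engine but is still a gluster peer.
    print(diff_servers(["node1", "node2"], ["node1", "node2", "node3"]))
```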

Also, we're thinking of a simpler approach: providing a 'force remove' 
checkbox on the remove host confirmation dialog (only if the host 
belongs to a gluster-enabled cluster). The user can then tick this 
checkbox when the normal remove flow doesn't work in the scenarios 
discussed above.
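
As a rough sketch of the flow discussed in this thread (an illustration 
under my own assumptions, not the actual engine implementation; the 
function name, the step tuples and the "up_servers" argument are 
hypothetical), the decision could look like this. It does not model the 
'last server in cluster' case separately:

```python
def plan_host_removal(hostname, up_servers, force=False):
    """Return the steps a remove-host action would take.

    up_servers: other servers of the cluster that are currently reachable
    (peer detach must be run from one of them).
    force: state of the proposed 'force remove' checkbox.
    """
    if not up_servers:
        if not force:
            # Nothing can run the detach; show the error to the user.
            return [("error", "no up server available in the cluster")]
        # Forced: skip gluster entirely and only clean up the engine DB.
        return [("remove_from_engine_db", hostname)]

    detach = ["gluster", "peer", "detach", hostname]
    if force:
        # 'force' detaches even if the server is unavailable or has bricks.
        detach.append("force")
    return [
        ("run_on", up_servers[0], detach),
        ("remove_from_engine_db", hostname),
    ]


if __name__ == "__main__":
    for step in plan_host_removal("node3", ["node1"], force=True):
        print(step)
```

The steps are returned rather than executed so that the logic can be 
checked without a gluster installation.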

>
> [1] though i assume the admin will continue to alert on its presence 
> for being out-of-sync on list of servers in cluster.

Yes - this feature is WIP.

>>>>>>>>>>
>>>>>>>>>> Dominic
>>>>>>>>>>
>>>>>>>>>> On Sep 22, 2012 4:19 PM, "Eli Mesika" <emesika at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>     ----- Original Message -----
>>>>>>>>>>      > From: "Douglas Landgraf" <dougsland at redhat.com>
>>>>>>>>>>      > To: "Dominic Kaiser" <dominic at bostonvineyard.org>
>>>>>>>>>>      > Cc: "Eli Mesika" <emesika at redhat.com>, users at ovirt.org,
>>>>>>>>>>      >     "Robert Middleswarth" <robert at middleswarth.net>
>>>>>>>>>>      > Sent: Friday, September 21, 2012 8:12:27 PM
>>>>>>>>>>      > Subject: Re: [Users] Is there a way to force remove a host?
>>>>>>>>>>      >
>>>>>>>>>>      > Hi Dominic,
>>>>>>>>>>      >
>>>>>>>>>>      > On 09/20/2012 12:11 PM, Dominic Kaiser wrote:
>>>>>>>>>>      > > Sorry I did not explain.
>>>>>>>>>>      > >
>>>>>>>>>>      > > I had tried to remove the host and had no luck
>>>>>>>>>>      > > troubleshooting it.  I then had removed it and used it
>>>>>>>>>>      > > for a storage unit, reinstalling fedora 17.  I foolishly
>>>>>>>>>>      > > thought that I could just remove the host manually.  It
>>>>>>>>>>      > > physically is not there. (My fault I know)  Is there a
>>>>>>>>>>      > > way that you know of to remove a host by brute force?
>>>>>>>>>>      > >
>>>>>>>>>>      > > dk
>>>>>>>>>>      >
>>>>>>>>>>      > Feel free to try the below script (not part of official
>>>>>>>>>>      > project) for brute force:
>>>>>>>>>>      >
>>>>>>>>>>      > (from the engine side)
>>>>>>>>>>      > # yum install python-psycopg2 -y
>>>>>>>>>>      > # wget https://raw.github.com/dougsland/misc-rhev/master/engine_force_remove_Host.py
>>>>>>>>>>      > # (edit the file and change the db password)
>>>>>>>>>>      > # python ./engine_force_remove_Host.py
>>>>>>>>>>
>>>>>>>>>>     Hi, I had a look at the Python script you provided.
>>>>>>>>>>     First, I must say that handling the database directly may
>>>>>>>>>>     leave the DB in an inconsistent state; therefore, if there is
>>>>>>>>>>     no other option, the database should be backed up prior to
>>>>>>>>>>     this operation.
>>>>>>>>>>     In addition, I do not like the execution of the SQL
>>>>>>>>>>     statements in the script.
>>>>>>>>>>     There is a SP called DeleteVds(v_vds_id UUID) and you should
>>>>>>>>>>     use that since it encapsulates all details.
>>>>>>>>>>     For example, your script does not handle permission clean-up
>>>>>>>>>>     as the SP does and therefore leaves garbage in the database.
>>>>>>>>>>     In addition, a failure in your script may leave the database
>>>>>>>>>>     in an inconsistent state, while the SP is executed in one
>>>>>>>>>>     transaction and will leave the DB consistent.
>>>>>>>>>>     So, in short, I would prefer in this case that the relevant
>>>>>>>>>>     SP do the clean-up, since this is the one that is used by the
>>>>>>>>>>     code and that ensures (at least I hope so) that all related
>>>>>>>>>>     entities are removed as well.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>      >
>>>>>>>>>>      > Thanks
>>>>>>>>>>      >
>>>>>>>>>>      > --
>>>>>>>>>>      > Cheers
>>>>>>>>>>      > Douglas
>>>>>>>>>>      >
>>>>>>>>>>      >
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Users mailing list
>>>>>>>>>> Users at ovirt.org
>>>>>>>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>>>>>>>
>>>>>>>> -- 
>>>>>>>>
>>>>>>>> @jasonbrooks