----- Original Message -----
From: "Gianluca Cecchi" <gianluca.cecchi(a)gmail.com>
To: "Omer Frenkel" <ofrenkel(a)redhat.com>
Cc: "users" <users(a)ovirt.org>
Sent: Tuesday, August 27, 2013 4:57:45 PM
Subject: Re: [Users] Clarify message: "Failed to connect Host to Storage Pool
Default"
On Tue, Aug 27, 2013 at 3:14 PM, Omer Frenkel wrote:
>
>
> ----- Original Message -----
>> From: "Gianluca Cecchi" <gianluca.cecchi(a)gmail.com>
>> To: "users" <users(a)ovirt.org>
>> Sent: Monday, August 26, 2013 6:44:39 PM
>> Subject: [Users] Clarify message: "Failed to connect Host to Storage Pool
>> Default"
>>
>> Hello,
>> after an induced failure of a whole site to test reaction and
>> restart, what would be the sequence of actions to take, from a physical
>> point of view and from a GUI point of view, after powering on the hw
>> components?
[snip]
>> I'm going to eventually send full logs, but I would like to ask if it
>> is possible to send clearer messages inside the gui, for example what
>> are the SDs that the host cannot access in case there are many of
>> them.
>
> afair, the SDs are not listed in the audit log since you might have 20
> domains or more, and it would not look good, so the full information is
> in the log, and the audit log just gives you general information about
> what is wrong.
OK. What about recording the first one (say SD1) and putting in the "audit
log" (does this term mean what is displayed in the web admin GUI?) something
like
"Host XXX cannot access at least storage domain SD1 attached to the
Data Center Default. See logfile (which one? put the path in message)
for full log. Setting Host state to Non-Operational."
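
The proposed message could be assembled along these lines. This is only a sketch of the suggestion, not oVirt engine code: the function name build_audit_message and its parameters are hypothetical, and the log file path is left as a placeholder because the thread does not say which one applies.

```python
def build_audit_message(host, data_center, unreachable_domains, log_path):
    """Build a short audit-log line naming only the first unreachable domain.

    All names here are hypothetical; this is not the oVirt engine's API.
    """
    if not unreachable_domains:
        raise ValueError("expected at least one unreachable storage domain")
    first = unreachable_domains[0]
    return (
        f"Host {host} cannot access at least storage domain {first} "
        f"attached to the Data Center {data_center}. "
        f"See {log_path} for full log. "
        "Setting Host state to Non-Operational."
    )


# Example with placeholder values; <logfile> stands for the unspecified path.
print(build_audit_message("XXX", "Default", ["SD1", "SD2"], "<logfile>"))
```

Only the first domain is named, so the line stays short even with 20 or more domains; the remaining ones would still be found in the full log.
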
Does this mean that if only one out of 20 SDs cannot be reconnected,
the whole DC is automatically impacted?
Questions:
1) suppose one out of 20 SDs cannot be reconnected (hw failure
caused by a power fault):
what are the steps to correct/acknowledge the failure and at least
start the VMs that do not depend on it, while one analyzes and
resolves the problem?
if only one (or a few) domains are problematic then the DC should be able to
recover to 'up' state, and only these domains will be in 'inactive' status.
VMs that do not depend on these domains should work ok.
this should happen automatically, no manual steps needed.
2) suppose that the particular faulty SD is the one that was the SD
Master before crash, does this mean I am forced to use some db
commands to switch it to an available SD or can I follow steps in 1)
(if there are...) and another SD will be automatically "elected" as
the new Master?
no, db commands are not needed: there is a mechanism to change the master
domain to some other available domain, assuming there is one.
this is the "reconstruct master" that you see in the logs.
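
As a rough illustration of the idea behind "reconstruct master" (not the engine's actual algorithm, which is not described in this thread): if the current master is unavailable, some other available domain takes over the role. The StorageDomain class, the available flag, and the rule of picking the first available domain are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageDomain:
    # Hypothetical model of a storage domain, for illustration only.
    name: str
    available: bool
    is_master: bool = False


def reconstruct_master(domains: List[StorageDomain]) -> Optional[StorageDomain]:
    """Return the master after re-election, or None if no domain is available.

    Assumption: the first available domain is chosen; the real selection
    rule is not stated in this thread.
    """
    current = next((d for d in domains if d.is_master), None)
    if current is not None and current.available:
        return current  # master is healthy, nothing to do
    candidate = next((d for d in domains if d.available), None)
    if candidate is None:
        return None  # no available domain can take over
    if current is not None:
        current.is_master = False
    candidate.is_master = True
    return candidate
```

With a faulty master SD1 and a healthy SD2, the sketch hands the master role to SD2; if no domain is available at all, it returns None, which matches the "assuming there is one" caveat above.
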
>
> sounds like error connecting to your storage
Yes, in my simulation I have an IBM DS6800 where I can formally reach
the SAN disks from the hosts, but the TUR command configured in multipath
fails (and, for example, the command "fdisk -l /dev/sdb", where sdb is one
disk on the SAN, exits with error "invalid parameter" due to an incorrect
DS6800 configuration).
Thanks in advance.
Gianluca