Re: [ovirt-users] SPM error

What did you do to try to fix the sanlock? Anything is better than nothing at this point
----- Original Message ----- From: "Ted Miller" <tmiller@hcjb.org> To: "Maurice James" <mjames@media-node.com> Sent: Friday, April 11, 2014 7:27:24 PM Subject: Re: [ovirt-users] SPM error
I did receive some help on one stage of rebuilding my sanlock, but there were too many other things wrong to get it started again. Only advice I have is -- look at your sanlock logs, and see if you can find anything there that is helpful.
On 4/11/2014 7:23 PM, Maurice James wrote:
Nooooooo.
-------- Original message -------- From: Ted Miller <tmiller@hcjb.org> Date:04/11/2014 7:08 PM (GMT-05:00) To: Maurice James <mjames@media-node.com> Subject: Re: [ovirt-users] SPM error
I didn't, really. I did something wrong along the way, and ended up having to rebuild the engine and hosts. (My problems were due to a glusterfs split-brain.) Ted Miller
On 4/11/2014 6:03 PM, Maurice James wrote:
How did you fix it?
-------- Original message -------- From: Ted Miller <tmiller@hcjb.org> Date:04/11/2014 6:00 PM (GMT-05:00) To: users@ovirt.org Subject: Re: [ovirt-users] SPM error
On 4/11/2014 2:05 PM, Maurice James wrote:
I have an error trying to bring the master DC back online. After several reboots, no luck. I took the other cluster members offline to try to troubleshoot. The remaining host is constantly in contention with itself for SPM
ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-40) [38d400ea] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
I'm no expert, but the last time I beat my head on that rock, something was wrong with my sanlock storage. YMMV Ted Miller Elkhart, IN, USA
-- "He is no fool who gives what he cannot keep, to gain what he cannot lose." - - Jim Elliot For more information about Jim Elliot and his unusual life, see http://www.christianliteratureandliving.com/march2003/carolyn.html. Ted Miller Design Engineer HCJB Global Technology Center, a ministry of Reach Beyond 2830 South 17th St Elkhart, IN 46517 574--970-4272 my desk 574--970-4252 receptionist

On 04/12/2014 03:40 PM, Maurice James wrote:
What did you do to try to fix the sanlock? Anything is better than nothing at this point
Maurice - which type of storage is this?

It is NFS storage. On Sat, 2014-04-12 at 19:23 +0300, Itamar Heim wrote:
Maurice - which type of storage is this?

I uploaded logs to this bug report https://bugzilla.redhat.com/show_bug.cgi?id=1086951 On Sat, 2014-04-12 at 19:23 +0300, Itamar Heim wrote:
Maurice - which type of storage is this?

Were you able to find out anything? Is there anything that I can check in the meanwhile? On Sat, 2014-04-12 at 19:23 +0300, Itamar Heim wrote:
Maurice - which type of storage is this?

----- Original Message -----
From: "Maurice \"Moe\" James" <mjames@media-node.com> To: "Itamar Heim" <iheim@redhat.com> Cc: users@ovirt.org Sent: Sunday, April 13, 2014 2:28:45 AM Subject: Re: [ovirt-users] SPM error
Were you able to find out anything? Is there anything that I can check in the meanwhile?
Hi Maurice, can you please attach the ovirt engine/vdsm logs? thanks, Liron
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

Log files are here On Mon, 2014-04-14 at 17:11 -0400, Liron Aravot wrote:
Hi Maurice, can you please attach the ovirt engine/vdsm logs? thanks, Liron

Sorry forgot to paste https://bugzilla.redhat.com/show_bug.cgi?id=1086951 On Mon, 2014-04-14 at 17:11 -0400, Liron Aravot wrote:
Hi Maurice, can you please attach the ovirt engine/vdsm logs? thanks, Liron

----- Original Message -----
From: "Maurice \"Moe\" James" <mjames@media-node.com> To: "Liron Aravot" <laravot@redhat.com> Cc: "Itamar Heim" <iheim@redhat.com>, users@ovirt.org Sent: Tuesday, April 15, 2014 3:14:16 AM Subject: Re: [ovirt-users] SPM error
Sorry forgot to paste https://bugzilla.redhat.com/show_bug.cgi?id=1086951
Hi Maurice, the issue is that the host doesn't have access to all of the storage domains, which causes the SPM start process to fail. There's a bug open for that issue - https://bugzilla.redhat.com/show_bug.cgi?id=1072900 - so it seems we'll be able to close the one you opened as a duplicate, but let's wait with that until your issue is solved.
From looking at the logs, it seems that the host has a problem accessing two storage domains:
3406665e-4adc-4fd4-aa1e-037547b29adb
f3b51811-4a7f-43af-8633-322b3db23c48
Can you verify that the host can access those domains? From the log it seems that the NFS paths for those are:
shtistg01.suprtekstic.com:/storage/infrastructure
shtistg01.suprtekstic.com:/storage/exports
Log snippet:
1.
Thread-14::DEBUG::2014-04-11 22:54:44,331::mount::226::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 ashtistg01.suprtekstic.com:/storage/exports /rhev/data-center/mnt/ashtistg01.suprtekstic.com:_storage_exports' (cwd None)
Thread-14::ERROR::2014-04-11 22:55:36,659::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 222, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Thread-14::ERROR::2014-04-11 22:55:36,705::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 320, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect
    raise e
MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
2.
Thread-14::ERROR::2014-04-11 22:56:29,307::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 222, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Thread-14::ERROR::2014-04-11 22:56:29,309::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 320, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect
    raise e
MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Regardless of that, there are sanlock errors in the log when trying to acquire the host id. I'd start by checking the connectivity issues to the storage domains above; later on you can check that sanlock is running and operational and/or attach the sanlock logs.
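The mount failure in the log is a name-resolution error ("Name or service not known"), so one way to verify access from the host is to check that the NFS server name resolves and that something answers on the NFS port. The following is only a minimal sketch of such a check, not anything taken from vdsm: the function name check_nfs_server is made up for illustration, and port 2049 is assumed as the standard NFS port. With nfsvers=3 (as in the mount options above) rpcbind and mountd are also involved, so a successful result here does not prove that a mount would succeed.

```python
import socket


def check_nfs_server(host, port=2049, timeout=5.0):
    """Return (addresses, reachable) for an NFS server name.

    addresses: sorted list of IP addresses the name resolves to
               (empty if resolution fails).
    reachable: True if a TCP connection to `port` succeeded on any address.
    Hypothetical helper for illustration; not part of vdsm or oVirt.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Same class of failure as mount.nfs "Failed to resolve server".
        print(f"{host}: name resolution failed: {exc}")
        return [], False

    # Each getaddrinfo entry ends with a sockaddr tuple; its first item is the IP.
    addresses = sorted({info[4][0] for info in infos})

    for addr in addresses:
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return addresses, True
        except OSError as exc:
            print(f"{host} ({addr}): cannot connect to port {port}: {exc}")

    return addresses, False
```

Run on the affected host, a call such as check_nfs_server("ashtistg01.suprtekstic.com") (the server name as it appears in the log snippet) would show whether the resolution failure is still present before retrying the mount.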

Hi Maurice, any updates on the above? thanks, Liron ----- Original Message -----
From: "Liron Aravot" <laravot@redhat.com> To: "Maurice \"Moe\" James" <mjames@media-node.com> Cc: users@ovirt.org Sent: Tuesday, April 15, 2014 11:53:40 AM Subject: Re: [ovirt-users] SPM error
----- Original Message -----
From: "Maurice \"Moe\" James" <mjames@media-node.com> To: "Liron Aravot" <laravot@redhat.com> Cc: "Itamar Heim" <iheim@redhat.com>, users@ovirt.org Sent: Tuesday, April 15, 2014 3:14:16 AM Subject: Re: [ovirt-users] SPM error
Sorry forgot to paste https://bugzilla.redhat.com/show_bug.cgi?id=1086951
Hi Maurice, the issue is that the host doesn't have access to all the storage domains which causes to the spm start process to fail. There's a bug open for that issue - https://bugzilla.redhat.com/show_bug.cgi?id=1072900 so seems we'll be able to close the one you opened as a duplicate but let's wait with that till your issue is solved. From looking in the logs, it seems like that host have problem accessing two storage domains - 3406665e-4adc-4fd4-aa1e-037547b29adb f3b51811-4a7f-43af-8633-322b3db23c48
Can you verify that the host can access those domains? from the log it seems like the nfs paths for those are: shtistg01.suprtekstic.com:/storage/infrastructure shtistg01.suprtekstic.com:/storage/exports
log snippet: 1. Thread-14::DEBUG::2014-04-11 22:54:44,331::mount::226::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retra ns=6,nfsvers=3 ashtistg01.suprtekstic.com:/storage/exports /rhev/data-center/mnt/ashtistg01.suprtekstic.com:_storage_exports' (cwd None) Thread-14::ERROR::2014-04-11 22:55:36,659::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Failed to resolve serv er ashtistg01.suprtekstic.com: Name or service not known\n') Traceback (most recent call last): File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect self._mount.mount(self.options, self._vfsType) File "/usr/share/vdsm/storage/mount.py", line 222, in mount return self._runcmd(cmd, timeout) File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd raise MountError(rc, ";".join((out, err))) MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n') Thread-14::ERROR::2014-04-11 22:55:36,705::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer Traceback (most recent call last): File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer conObj.connect() File "/usr/share/vdsm/storage/storageServer.py", line 320, in connect return self._mountCon.connect() File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect raise e MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
2. Thread-14::ERROR::2014-04-11 22:56:29,307::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Failed to resolve serv er ashtistg01.suprtekstic.com: Name or service not known\n') Traceback (most recent call last): File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect self._mount.mount(self.options, self._vfsType) File "/usr/share/vdsm/storage/mount.py", line 222, in mount return self._runcmd(cmd, timeout) File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd raise MountError(rc, ";".join((out, err))) MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n') Thread-14::ERROR::2014-04-11 22:56:29,309::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer Traceback (most recent call last): File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer conObj.connect() File "/usr/share/vdsm/storage/storageServer.py", line 320, in connect return self._mountCon.connect() File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect raise e MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Regardless of that, there are sanlock errors over the log when trying to acquire host-id over the log. I'd start with check the connectivity issues to the storage domains above, later on you can attach check that sanlock is running and operational and/or attach the sanlock logs.
On Mon, 2014-04-14 at 17:11 -0400, Liron Aravot wrote:
----- Original Message -----
From: "Maurice \"Moe\" James" <mjames@media-node.com> To: "Itamar Heim" <iheim@redhat.com> Cc: users@ovirt.org Sent: Sunday, April 13, 2014 2:28:45 AM Subject: Re: [ovirt-users] SPM error
Were you able to find out anything? Is there anything that I can check in the meanwhile?
Hi Muarice, can you please attach the ovirt engine/vdsm logs? thanks, Liron
On Sat, 2014-04-12 at 19:23 +0300, Itamar Heim wrote:
On 04/12/2014 03:40 PM, Maurice James wrote:
What did you do to try to fix the sanlock? Anything is better than nothing at this point
----- Original Message ----- From: "Ted Miller" <tmiller@hcjb.org> To: "Maurice James" <mjames@media-node.com> Sent: Friday, April 11, 2014 7:27:24 PM Subject: Re: [ovirt-users] SPM error
I did receive some help on one stage of rebuilding my sanlock, but there were too many other things wrong to get it started again. Only advice I have is -- look at your sanlock logs, and see if you can find anything there that is helpful.
On 4/11/2014 7:23 PM, Maurice James wrote: > Nooooooo. > > > Sent from my Galaxy S®III > > -------- Original message -------- > From: Ted Miller <tmiller@hcjb.org> > Date:04/11/2014 7:08 PM (GMT-05:00) > To: Maurice James <mjames@media-node.com> > Subject: Re: [ovirt-users] SPM error > > > > I didn't, really. I did something wrong along the way, and ended > up > having > to rebuild the engine and hosts. (My problems were due to a > glusterfs > split-brain.) > Ted Miller > > On 4/11/2014 6:03 PM, Maurice James wrote: >> How did you fix it? >> >> >> Sent from my Galaxy S®III >> >> -------- Original message -------- >> From: Ted Miller <tmiller@hcjb.org> >> Date:04/11/2014 6:00 PM (GMT-05:00) >> To: users@ovirt.org >> Subject: Re: [ovirt-users] SPM error >> >> >> >> On 4/11/2014 2:05 PM, Maurice James wrote: >>> I have an error trying to bring the master DC back online. After >>> several >>> reboots, no luck. I took the other cluster members offline to >>> try >>> to >>> troubleshoot. The remaining host is constantly in contention >>> with >>> itself >>> for SPM >>> >>> >>> ERROR >>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] >>> (DefaultQuartzScheduler_Worker-40) [38d400ea] >>> IrsBroker::Failed::GetStoragePoolInfoVDS due to: >>> IrsSpmStartFailedException: IRSGenericException: >>> IRSErrorException: >>> SpmStart failed >>> >> I'm no expert, but the last time I beat my head on that rock, >> something >> was >> wrong with my sanlock storage. YMMV >> Ted Miller >> Elkhart, IN, USA >>
Maurice - which type of storage is this?
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users

After a few sleepless nights, I went through my entire system again and found an interface on one of my hosts that had already been removed from the UI. Even after multiple "service network restart" runs, it still showed up when I ran "ip addr". I ended up forcibly removing it with rm -rf /etc/sysconfig/network-scripts/ifcfg-<interface>. After that I rebooted the node and the SPM came out of contention. I can't make sense of it, but it worked.
----- Original Message ----- From: "Liron Aravot" <laravot@redhat.com> To: "Maurice \"Moe\" James" <mjames@media-node.com> Cc: users@ovirt.org Sent: Wednesday, April 16, 2014 8:49:05 AM Subject: Re: [ovirt-users] SPM error
Hi Maurice, any updates on the above?
thanks, Liron
----- Original Message -----
From: "Liron Aravot" <laravot@redhat.com> To: "Maurice \"Moe\" James" <mjames@media-node.com> Cc: users@ovirt.org Sent: Tuesday, April 15, 2014 11:53:40 AM Subject: Re: [ovirt-users] SPM error
----- Original Message -----
From: "Maurice \"Moe\" James" <mjames@media-node.com> To: "Liron Aravot" <laravot@redhat.com> Cc: "Itamar Heim" <iheim@redhat.com>, users@ovirt.org Sent: Tuesday, April 15, 2014 3:14:16 AM Subject: Re: [ovirt-users] SPM error
Sorry forgot to paste https://bugzilla.redhat.com/show_bug.cgi?id=1086951
Hi Maurice, the issue is that the host doesn't have access to all of the storage domains, which causes the SPM start process to fail. There's a bug open for that issue - https://bugzilla.redhat.com/show_bug.cgi?id=1072900 - so it seems we'll be able to close the one you opened as a duplicate, but let's wait with that until your issue is solved. From looking at the logs, it seems that the host has a problem accessing two storage domains: 3406665e-4adc-4fd4-aa1e-037547b29adb and f3b51811-4a7f-43af-8633-322b3db23c48
Can you verify that the host can access those domains? From the log, it seems the NFS paths for those are:
shtistg01.suprtekstic.com:/storage/infrastructure
shtistg01.suprtekstic.com:/storage/exports
log snippet:
1.
Thread-14::DEBUG::2014-04-11 22:54:44,331::mount::226::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/mount -t nfs -o soft,nosharecache,timeo=600,retrans=6,nfsvers=3 ashtistg01.suprtekstic.com:/storage/exports /rhev/data-center/mnt/ashtistg01.suprtekstic.com:_storage_exports' (cwd None)
Thread-14::ERROR::2014-04-11 22:55:36,659::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 222, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Thread-14::ERROR::2014-04-11 22:55:36,705::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 320, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect
    raise e
MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
2.
Thread-14::ERROR::2014-04-11 22:56:29,307::storageServer::209::StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storageServer.py", line 207, in connect
    self._mount.mount(self.options, self._vfsType)
  File "/usr/share/vdsm/storage/mount.py", line 222, in mount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 238, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
Thread-14::ERROR::2014-04-11 22:56:29,309::hsm::2379::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2376, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 320, in connect
    return self._mountCon.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 215, in connect
    raise e
MountError: (32, ';mount.nfs: Failed to resolve server ashtistg01.suprtekstic.com: Name or service not known\n')
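For anyone wading through a long vdsm.log looking for records like the ones above, a small parser can help. This is only a sketch: it assumes every record uses the same "::"-separated layout seen in this snippet (thread, level, timestamp, module, line number, logger, message), and the names VdsmLogLine, parse_vdsm_line and mount_failures are made up for illustration; they are not part of vdsm.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class VdsmLogLine(NamedTuple):
    thread: str
    level: str
    timestamp: datetime
    module: str
    lineno: int
    logger: str
    message: str


def parse_vdsm_line(line: str) -> Optional[VdsmLogLine]:
    """Split one '::'-separated record; return None for any other line."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None  # e.g. traceback continuation lines
    thread, level, stamp, module, lineno, logger, message = parts
    try:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        number = int(lineno)
    except ValueError:
        return None  # seven fields, but not the layout assumed here
    return VdsmLogLine(thread, level, when, module, number, logger, message)


def mount_failures(lines):
    """Yield (timestamp, message) for ERROR records reporting a failed mount."""
    for line in lines:
        record = parse_vdsm_line(line)
        if record and record.level == "ERROR" and "Mount failed" in record.message:
            yield record.timestamp, record.message
```

Passing an open log file to mount_failures() gives one entry per failed mount attempt, which makes it easier to see whether the same server name keeps failing.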
Regardless of that, there are sanlock errors in the log when trying to acquire the host-id. I'd start by checking the connectivity issues to the storage domains above; later on you can check that sanlock is running and operational and/or attach the sanlock logs.
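The mount errors quoted above are name-resolution failures ("Name or service not known"), so one quick connectivity check is whether the host can resolve each NFS server name at all. The sketch below is an illustration only: split_nfs_path and unresolvable_servers are hypothetical helper names, it handles plain "server:/path" specs (not bracketed IPv6 literals), and a name that resolves says nothing about whether the export can actually be mounted.

```python
import socket


def split_nfs_path(spec):
    """Split 'server:/export/path' into (server, export)."""
    server, sep, export = spec.partition(":")
    if not sep or not server or not export.startswith("/"):
        raise ValueError("expected 'server:/path', got %r" % (spec,))
    return server, export


def unresolvable_servers(specs, resolve=socket.getaddrinfo):
    """Return the sorted server names from `specs` that fail name resolution."""
    failed = set()
    for spec in specs:
        server, _export = split_nfs_path(spec)
        try:
            resolve(server, 2049)  # 2049 is the standard NFS port
        except socket.gaierror:
            failed.add(server)
    return sorted(failed)
```

Run on the host itself with the NFS paths of the affected storage domains; any name it returns would need fixing in DNS or /etc/hosts before the mount can succeed. The resolve parameter exists only so the check can be exercised without a network.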
On Mon, 2014-04-14 at 17:11 -0400, Liron Aravot wrote:
----- Original Message -----
From: "Maurice \"Moe\" James" <mjames@media-node.com> To: "Itamar Heim" <iheim@redhat.com> Cc: users@ovirt.org Sent: Sunday, April 13, 2014 2:28:45 AM Subject: Re: [ovirt-users] SPM error
Were you able to find out anything? Is there anything that I can check in the meanwhile?
Hi Maurice, can you please attach the ovirt engine/vdsm logs? thanks, Liron
On Sat, 2014-04-12 at 19:23 +0300, Itamar Heim wrote:
On 04/12/2014 03:40 PM, Maurice James wrote:
What did you do to try to fix the sanlock? Anything is better than nothing at this point
Maurice - which type of storage is this?

----- Original Message -----
From: "Maurice James" <mjames@media-node.com> To: "Liron Aravot" <laravot@redhat.com> Cc: users@ovirt.org Sent: Wednesday, April 16, 2014 4:49:18 PM Subject: Re: [ovirt-users] SPM error
After a few sleepless nights, I went through my entire system again and found an interface on one of my hosts that had already been removed from the UI. Even after multiple "service network restart" runs, it still showed up when I ran "ip addr". I ended up forcibly removing it with rm -rf /etc/sysconfig/network-scripts/ifcfg-<interface>. After that I rebooted the node and the SPM came out of contention. I can't make sense of it, but it worked.
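A leftover ifcfg file like the one described here can be spotted without deleting anything. The sketch below is only an illustration: stale_ifcfg_files is a hypothetical helper, and the set of expected interface names is something you supply yourself (for example, the interfaces still shown for the host in the UI). It lists candidates and removes nothing.

```python
import os


def stale_ifcfg_files(config_dir, expected):
    """Return sorted paths of ifcfg-<name> files whose <name> is not expected."""
    stale = []
    for entry in os.listdir(config_dir):
        if not entry.startswith("ifcfg-"):
            continue  # ignore unrelated scripts in the same directory
        name = entry[len("ifcfg-"):]
        if name == "lo" or name in expected:
            continue  # loopback is always present; expected devices are fine
        stale.append(os.path.join(config_dir, entry))
    return sorted(stale)
```

On a CentOS 6 host the directory would be /etc/sysconfig/network-scripts, e.g. stale_ifcfg_files("/etc/sysconfig/network-scripts", {"eth0", "ovirtmgmt"}), where the interface names are placeholders. Review the result by hand before removing any file.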
OK, so we might have a bug here - what OS are you running? It seems the initial SPM issue is the one in the bug I provided earlier, so the bug you opened for it can be closed as a duplicate. Adding Federico to verify that there's no further sanlock issue there.

I'm running CentOS 6.5 (kernel 2.6.32-431.11.2.el6.x86_64)

VDSM:
vdsm-python-4.14.6-15.git746e2e9.el6.x86_64
vdsm-hook-isolatedprivatevlan-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-vmfex-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-sriov-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-checkimages-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-smbios-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-hostusb-4.14.6-15.git746e2e9.el6.noarch
vdsm-4.14.6-15.git746e2e9.el6.x86_64
vdsm-hook-faqemu-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-promisc-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-scratchpad-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-qos-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-numa-4.14.6-15.git746e2e9.el6.noarch
vdsm-python-zombiereaper-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-qemucmdline-4.14.6-15.git746e2e9.el6.noarch
vdsm-cli-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-fileinject-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-directlun-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-vmdisk-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-macspoof-4.14.6-15.git746e2e9.el6.noarch
vdsm-gluster-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-floppy-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-hugepages-4.14.6-15.git746e2e9.el6.noarch
vdsm-xmlrpc-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-pincpu-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-openstacknet-4.14.6-15.git746e2e9.el6.noarch

Libvirt:
libvirt-lock-sanlock-0.10.2-29.el6_5.7.x86_64
libvirt-client-0.10.2-29.el6_5.7.x86_64
libvirt-0.10.2-29.el6_5.7.x86_64
libvirt-python-0.10.2-29.el6_5.7.x86_64

Ovirt-engine:
ovirt-engine-dwh-setup-3.4.1-0.0.master.20140406181125.git4081b13.el6.noarch
ovirt-engine-dwh-3.4.1-0.0.master.20140406181125.git4081b13.el6.noarch
ovirt-engine-websocket-proxy-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-dbscripts-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-sdk-python-3.4.0.7-1.20140228.git19e14c5.el6.noarch
ovirt-engine-reports-3.4.1-0.0.master.20140410124141.gitbf81400.el6.noarch
ovirt-engine-setup-base-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-setup-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-backend-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-tools-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-reports-setup-3.4.1-0.0.master.20140410124141.gitbf81400.el6.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-setup-plugin-allinone-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-webadmin-portal-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-cli-3.4.0.6-1.20140227.gite87e2bc.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-userportal-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-lib-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch
ovirt-engine-restapi-3.4.1-0.0.master.20140413010852.git43746c6.el6.noarch

Qemu:
qemu-kvm-tools-0.12.1.2-2.415.el6_5.7.x86_64
qemu-kvm-0.12.1.2-2.415.el6_5.7.x86_64
vdsm-hook-faqemu-4.14.6-15.git746e2e9.el6.noarch
vdsm-hook-qemucmdline-4.14.6-15.git746e2e9.el6.noarch
qemu-img-0.12.1.2-2.415.el6_5.7.x86_64
qemu-guest-agent-0.12.1.2-2.415.el6_5.7.x86_64
gpxe-roms-qemu-0.9.7-6.10.el6.noarch

----- Original Message ----- From: "Liron Aravot" <laravot@redhat.com> To: "Maurice James" <mjames@media-node.com>, "fsimonce" <fsimonce@redhat.com> Cc: users@ovirt.org Sent: Wednesday, April 16, 2014 10:30:11 AM Subject: Re: [ovirt-users] SPM error
----- Original Message -----
From: "Maurice James" <mjames@media-node.com> To: "Liron Aravot" <laravot@redhat.com> Cc: users@ovirt.org Sent: Wednesday, April 16, 2014 4:49:18 PM Subject: Re: [ovirt-users] SPM error
After a few sleepless nights, I went through my entire system again and found an interface on one of my hosts that had already been removed from the UI. Even after multiple "service network restart" runs, it still showed up when I ran "ip addr". I ended up forcibly removing it with rm -rf /etc/sysconfig/network-scripts/ifcfg-<interface>. After that I rebooted the node and the SPM came out of contention. I can't make sense of it, but it worked.
OK, so we might have a bug here - what OS are you running? It seems the initial SPM issue is the one in the bug I provided earlier, so the bug you opened for it can be closed as a duplicate. Adding Federico to verify that there's no further sanlock issue there.

On 4/12/2014 12:23 PM, Itamar Heim wrote:
On 04/12/2014 03:40 PM, Maurice James wrote:
What did you do to try to fix the sanlock? Anything is better than nothing at this point
My thread is at http://lists.ovirt.org/pipermail/users/2014-January/020394.html
----- Original Message ----- From: "Ted Miller" <tmiller@hcjb.org> To: "Maurice James" <mjames@media-node.com> Sent: Friday, April 11, 2014 7:27:24 PM Subject: Re: [ovirt-users] SPM error
I did receive some help on one stage of rebuilding my sanlock, but there were too many other things wrong to get it started again. Only advice I have is -- look at your sanlock logs, and see if you can find anything there that is helpful.
On 4/11/2014 7:23 PM, Maurice James wrote:
Nooooooo.
Sent from my Galaxy S®III
-------- Original message -------- From: Ted Miller <tmiller@hcjb.org> Date:04/11/2014 7:08 PM (GMT-05:00) To: Maurice James <mjames@media-node.com> Subject: Re: [ovirt-users] SPM error
I didn't, really. I did something wrong along the way, and ended up having to rebuild the engine and hosts. (My problems were due to a glusterfs split-brain.) Ted Miller
On 4/11/2014 6:03 PM, Maurice James wrote:
How did you fix it?
Sent from my Galaxy S®III
-------- Original message -------- From: Ted Miller <tmiller@hcjb.org> Date:04/11/2014 6:00 PM (GMT-05:00) To: users@ovirt.org Subject: Re: [ovirt-users] SPM error
On 4/11/2014 2:05 PM, Maurice James wrote:
I have an error trying to bring the master DC back online. After several reboots, no luck. I took the other cluster members offline to try to troubleshoot. The remaining host is constantly in contention with itself for SPM
ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-40) [38d400ea] IrsBroker::Failed::GetStoragePoolInfoVDS due to: IrsSpmStartFailedException: IRSGenericException: IRSErrorException: SpmStart failed
I'm no expert, but the last time I beat my head on that rock, something was wrong with my sanlock storage. YMMV Ted Miller Elkhart, IN, USA
Maurice - which type of storage is this?
-- "He is no fool who gives what he cannot keep, to gain what he cannot lose." - - Jim Elliot For more information about Jim Elliot and his unusual life, see http://www.christianliteratureandliving.com/march2003/carolyn.html. Ted Miller Design Engineer HCJB Global Technology Center, a ministry of Reach Beyond 2830 South 17th St Elkhart, IN 46517 574--970-4272 my desk 574--970-4252 receptionist
participants (5)
- Itamar Heim
- Liron Aravot
- Maurice "Moe" James
- Maurice James
- Ted Miller