Re: [Users] Data Center stuck between "Non Responsive" and "Contending"
by Ted Miller
On 1/26/2014 4:00 PM, Itamar Heim wrote:
> On 01/26/2014 10:51 PM, Ted Miller wrote:
>>
>> On 1/26/2014 3:10 PM, Itamar Heim wrote:
>>> On 01/26/2014 10:08 PM, Ted Miller wrote:
>>>> My Data Center is down, and won't come back up.
>>>>
>>>> Data Center Status on the GUI flips between "Non Responsive" and
>>>> "Contending"
>>>>
>>>> Also noted:
>>>> Host sometimes seen flipping between "Low" and "Contending" in SPM
>>>> column.
>>>> Storage VM2 "Data (Master)" is in "Cross Data-Center Status" = Unknown
>>>> VM2 is "up" under "Volumes" tab
>>>>
>>>> Created another volume for VM storage. It shows up in the "Volumes" tab,
>>>> but when I try to add a "New Domain" in the Storage tab, it says that "There are
>>>> No Data Centers to which the Storage Domain can be attached".
>>>>
>>>> Setup:
>>>> 2 hosts w/ glusterfs storage
>>>> 1 engine
>>>> all 3 computers Centos 6.5, just updated
>>>> ovirt-engine 3.3.0.1-1.el6
>>>> ovirt-engine-lib 3.3.2-1.el6
>>>> ovirt-host-deploy.noarch 1.1.3-1.el6
>>>> glusterfs.x86_64 3.4.2-1.el6
>>>>
>>>> This loop seems to repeat in the ovirt-engine log (grep of the log showing
>>>> only the DefaultQuartzScheduler_Worker-79 thread):
>>>>
>>>> 2014-01-26 14:44:58,416 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Irs placed on server
>>>> 9a591103-83be-4ca9-b207-06929223b541 failed. Proceed Failover
>>>> 2014-01-26 14:44:58,511 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79) hostFromVds::selectedVds - office4a,
>>>> spmStatus Free, storage pool mill
>>>> 2014-01-26 14:44:58,550 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79) SpmStatus on vds
>>>> 127ed939-34af-41a8-87a0-e2f6174b1877: Free
>>>> 2014-01-26 14:44:58,571 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79) starting spm on vds office4a, storage
>>>> pool mill, prevId 2, LVER 15
>>>> 2014-01-26 14:44:58,579 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) START, SpmStartVDSCommand(HostName =
>>>> office4a, HostId = 127ed939-34af-41a8-87a0-e2f6174b1877, storagePoolId =
>>>> 536a864d-83aa-473a-a675-e38aafdd9071, prevId=2, prevLVER=15,
>>>> storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log
>>>> id: 74c38eb7
>>>> 2014-01-26 14:44:58,617 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) spmStart polling started: taskId =
>>>> e8986753-fc80-4b11-a11d-6d3470b1728c
>>>> 2014-01-26 14:45:00,662 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Failed in HSMGetTaskStatusVDS method
>>>> 2014-01-26 14:45:00,664 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Error code AcquireHostIdFailure and
>>>> error message VDSGenericException: VDSErrorException: Failed to
>>>> HSMGetTaskStatusVDS, error = Cannot acquire host id
>>>> 2014-01-26 14:45:00,665 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) spmStart polling ended: taskId =
>>>> e8986753-fc80-4b11-a11d-6d3470b1728c task status = finished
>>>> 2014-01-26 14:45:00,666 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Start SPM Task failed - result:
>>>> cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to
>>>> HSMGetTaskStatusVDS, error = Cannot acquire host id
>>>> 2014-01-26 14:45:00,695 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) spmStart polling ended, spm
>>>> status: Free
>>>> 2014-01-26 14:45:00,702 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) START,
>>>> HSMClearTaskVDSCommand(HostName = office4a, HostId =
>>>> 127ed939-34af-41a8-87a0-e2f6174b1877,
>>>> taskId=e8986753-fc80-4b11-a11d-6d3470b1728c), log id: 336ec5a6
>>>> 2014-01-26 14:45:00,722 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) FINISH, HSMClearTaskVDSCommand, log
>>>> id: 336ec5a6
>>>> 2014-01-26 14:45:00,724 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) FINISH, SpmStartVDSCommand, return:
>>>> org.ovirt.engine.core.common.businessentities.SpmStatusResult@13652652,
>>>> log id: 74c38eb7
>>>> 2014-01-26 14:45:00,733 INFO
>>>> [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Running command:
>>>> SetStoragePoolStatusCommand internal: true. Entities affected : ID:
>>>> 536a864d-83aa-473a-a675-e38aafdd9071 Type: StoragePool
>>>> 2014-01-26 14:45:00,778 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79)
>>>> IrsBroker::Failed::GetStoragePoolInfoVDS due to:
>>>> IrsSpmStartFailedException: IRSGenericException: IRSErrorException:
>>>> SpmStart failed
>>>>
>>>> Ted Miller
>>>> Elkhart, IN, USA
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users@ovirt.org
>>>> http://lists.ovirt.org/mailman/listinfo/users
>>>>
>>>
>>> is this gluster storage (guessing since you mentioned a 'volume')?
>> yes (mentioned under "setup" above)
>>> does it have a quorum?
>> Volume Name: VM2
>> Type: Replicate
>> Volume ID: 7bea8d3b-ec2a-4939-8da8-a82e6bda841e
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.41.65.2:/bricks/01/VM2
>> Brick2: 10.41.65.4:/bricks/01/VM2
>> Brick3: 10.41.65.4:/bricks/101/VM2
>> Options Reconfigured:
>> cluster.server-quorum-type: server
>> storage.owner-gid: 36
>> storage.owner-uid: 36
>> auth.allow: *
>> user.cifs: off
>> nfs.disa
>>> (there were reports of split brain on the domain metadata before when
>>> no quorum exists for gluster)
>> after full heal:
>>
>> [root@office4a ~]$ gluster volume heal VM2 info
>> Gathering Heal info on volume VM2 has been successful
>>
>> Brick 10.41.65.2:/bricks/01/VM2
>> Number of entries: 0
>>
>> Brick 10.41.65.4:/bricks/01/VM2
>> Number of entries: 0
>>
>> Brick 10.41.65.4:/bricks/101/VM2
>> Number of entries: 0
>> [root@office4a ~]$ gluster volume heal VM2 info split-brain
>> Gathering Heal info on volume VM2 has been successful
>>
>> Brick 10.41.65.2:/bricks/01/VM2
>> Number of entries: 0
>>
>> Brick 10.41.65.4:/bricks/01/VM2
>> Number of entries: 0
>>
>> Brick 10.41.65.4:/bricks/101/VM2
>> Number of entries: 0
>>
>> I noticed this in the host's /var/log/messages (while looking for something
>> else). The loop seems to repeat over and over.
>>
>> Jan 26 15:35:52 office4a sanlock[3763]: 2014-01-26 15:35:52-0500 14678
>> [30419]: read_sectors delta_leader offset 512 rv -90
>> /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
>>
>> Jan 26 15:35:53 office4a sanlock[3763]: 2014-01-26 15:35:53-0500 14679
>> [3771]: s1997 add_lockspace fail result -90
>> Jan 26 15:35:58 office4a vdsm TaskManager.Task ERROR
>> Task=`89885661-88eb-4ea3-8793-00438735e4ab`::Unexpected error#012Traceback
>> (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line
>> 857, in _run#012 return fn(*args, **kargs)#012 File
>> "/usr/share/vdsm/logUtils.py", line 45, in wrapper#012 res = f(*args,
>> **kwargs)#012 File "/usr/share/vdsm/storage/hsm.py", line 2111, in
>> getAllTasksStatuses#012 allTasksStatus = sp.getAllTasksStatuses()#012
>> File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper#012 raise
>> SecureError()#012SecureError
>> Jan 26 15:35:59 office4a sanlock[3763]: 2014-01-26 15:35:59-0500 14686
>> [30495]: read_sectors delta_leader offset 512 rv -90
>> /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
>>
>> Jan 26 15:36:00 office4a sanlock[3763]: 2014-01-26 15:36:00-0500 14687
>> [3772]: s1998 add_lockspace fail result -90
>> Jan 26 15:36:00 office4a vdsm TaskManager.Task ERROR
>> Task=`8db9ff1a-2894-407a-915a-279f6a7eb205`::Unexpected error#012Traceback
>> (most recent call last):#012 File "/usr/share/vdsm/storage/task.py", line
>> 857, in _run#012 return fn(*args, **kargs)#012 File
>> "/usr/share/vdsm/storage/task.py", line 318, in run#012 return
>> self.cmd(*self.argslist, **self.argsdict)#012 File
>> "/usr/share/vdsm/storage/sp.py", line 273, in startSpm#012
>> self.masterDomain.acquireHostId(self.id)#012 File
>> "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId#012
>> self._clusterLock.acquireHostId(hostId, async)#012 File
>> "/usr/share/vdsm/storage/clusterlock.py", line 189, in
>> acquireHostId#012 raise se.AcquireHostIdFailure(self._sdUUID,
>> e)#012AcquireHostIdFailure: Cannot acquire host id:
>> ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock
>> lockspace add failure', 'Message too long'))
>>
>> Ted Miller
>> Elkhart, IN, USA
>>
>
> this is the new storage domain? what about the previous volume for the
> first SD?
The default/default data center/cluster had to be abandoned because of a
split-brain that could not be healed: I can't remove the old storage from the
database, and I can't get the data center up due to the corrupt storage, so it
ends up a circular problem.
I started over with the same hosts and totally new storage in a new data center.
This mill/one data center/cluster was working fine with the VM2 storage, then died.
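In case it helps, here is a minimal set of checks on the 'ids' file that sanlock
is complaining about in the log above (paths copied from that log; the brick-side
getfattr path is an assumption about where the replica xattrs can be inspected):

# size/permissions of the sanlock lockspace file on the mounted domain
ls -l /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
# what sanlock currently holds on this host
sanlock client status
# on a brick, dump the afr xattrs of the same file to look for pending/split-brain flags
getfattr -d -m . -e hex /bricks/01/VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids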
Ted Miller
[Users] oVirt 3.4 - testing days report [iproute2 configurator]
by Douglas Schilling Landgraf
Hi,
During the tests I have faced the below bug:
vdsm-4.14.1-2 unable to restart on reboot after a network is defined on
ovirt-node
https://bugzilla.redhat.com/show_bug.cgi?id=1057657
Additionally (not related to iproute2 tests), I have faced:
[RFE] report BOOTPROTO and BONDING_OPTS independent of netdevice.cfg
https://bugzilla.redhat.com/show_bug.cgi?id=987813
(I have a workaround: manually creating ifcfg-em1 and ifcfg-ovirtmgmt)
firefox seg faults when using the Admin Portal on RHEL 6.5
https://bugzilla.redhat.com/show_bug.cgi?id=1044010
(Updated to firefox-24.2.0-6.el6_5.x86_64 resolved the problem.)
Test data for iproute2:
========================
- Setup Node -> put it in maintenance
- Changed the vdsm.conf on node to:
[vars]
ssl = true
net_configurator = iproute2
net_persistence = unified
[addresses]
management_port = 54321
- Restart vdsm/supervdsm
- Host is UP again, no problems
- DataCenter -> Logical Network -> New
- Name: net25 -> [x] Enable VLAN tagging [ ] VM Network
- Since I have just one NIC on the host, I added a dummy interface (see the
consolidated sketch after this list):
# ip link add name dummy_interface type dummy
- Put the host in maintenance and brought it back UP to recognize the new
interface
- Host -> Network -> Setup Host Networks
-> drag/drop net25 to dummy_interface
-> [x] save network interface
On the host, 'vdsClient -s 0 getVdsCaps' now shows [net25]
* Rebooted to check if the new net25 will be persistent.
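For anyone reproducing this, roughly the host-side commands involved (a minimal
sketch; the grep at the end is just one convenient way to spot the new network in
the caps output):

# create and bring up a dummy NIC so the host has something to attach net25 to
ip link add name dummy_interface type dummy
ip link set dummy_interface up
# after "Setup Host Networks", confirm vdsm reports the new network
vdsClient -s 0 getVdsCaps | grep -i net25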
--
Cheers
Douglas
[Users] oVirt 3.4 test day - Template Versions
by Federico Simoncelli
Feature tested:
http://www.ovirt.org/Features/Template_Versions
- create a new vm vm1 and make a template template1 from it
- create a new vm vm2 based on template1 and make some changes
- upgrade to 3.4
- create a new template template1.1 from vm2
- create a new vm vm3 from template1 (clone) - content ok
- create a new vm vm4 from template1.1 (thin) - content ok
- create a new vm vm5 from template1 latest (thin) - content ok (same as 1.1)
- try to remove template1 (failed as template1.1 is still present)
- try to remove template1.1 (failed as vm5 is still present)
- create a new vm vm6 and make a template blank1.1 as new version of the
blank template (succeeded)
- create a vm pool vmpool1 with the "latest" template from template1
- create a vm pool vmpool2 with the "template1.1" (last) template from template1
- start vmpool1 and vmpool2 and verify that the content is the same
- create a new template template1.2
- start vmpool1 and verify that the content is the same as latest (template1.2)
- start vmpool2 and verify that the content is the same as template1.1
Suggestions:
- the Blank template is special; I am not sure whether allowing versioning of it
may be confusing (for example, it is not even editable)
- as far as I can see the "Sub Version Name" is not editable anymore (after
picking it)
--
Federico
[Users] Hosted-engine runtime issues (3.4 BETA)
by Frank Wall
Hi,
finally I've got the new hosted-engine feature running on
RHEL6 using oVirt 3.4 BETA/nightly. I've come across a few
issues and wanted to clarify if this is the desired
behaviour:
1.) hosted-engine storage domain not visible in GUI
The NFS storage I used to install the hosted-engine
is not visible in oVirt's Admin Portal, even though it is mounted
on my oVirt node below /rhev/data-center/mnt/. I tried to
import this storage domain, but apparently that fails because
it's already mounted.
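For reference, what the setup registered can be double-checked on the node itself
(a minimal sketch; I'm assuming the usual 3.4 config path and CLI here):

# where hosted-engine --deploy recorded the engine storage domain
grep -i storage /etc/ovirt-hosted-engine/hosted-engine.conf
# confirm the mount and the HA agent's view of the engine VM
mount | grep /rhev/data-center/mnt
hosted-engine --vm-status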
Is there any way to make this storage domain visible?
2.) hosted-engine VM devices are not visible in GUI
The disk and network devices are not visible in the
admin portal. Thus I'm unable to change anything.
Is this intended? If so, how am I supposed to make changes?
3.) move hosted-engine VM to a different storage
Because of all of the above I seem to be unable to move
my hosted-engine VM to a different NFS-Storage. How can
this be done?
Thanks
- Frank
[Users] Cluster compatibility
by Piotr Kliczewski
I wanted to install two hosts, one on f19 and the second on el6. I
created an additional cluster for el6.
Host installation on el6 worked well and the host joined the cluster
without any issues, whereas the f19 host was successfully deployed
but failed to join the cluster due to:
Host fedora is compatible with versions (3.0,3.1,3.2,3.3) and cannot
join Cluster Default which is set to version 3.4.
Here are the versions that I use:
engine:
Name : ovirt-engine
Arch : noarch
Version : 3.4.0
Release : 0.5.beta1.fc19
Size : 1.5 M
Repo : installed
From repo : ovirt-3.4.0-prerelease
fedora host:
Name : vdsm
Arch : x86_64
Version : 4.14.1
Release : 2.fc19
Size : 2.9 M
Repo : installed
From repo : ovirt-3.4.0-prerelease
el6 host:
Name : vdsm
Arch : x86_64
Version : 4.14.1
Release : 2.el6
Size : 2.9 M
Repo : installed
From repo : ovirt-3.4.0-prerelease
Both clusters are set to be compatible with 3.4.
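For what it's worth, the levels a host claims to support can also be read straight
off the host (a minimal check; I'm assuming getVdsCaps still reports a
clusterLevels list, as in earlier releases):

# on the f19 host: which cluster compatibility levels vdsm advertises
vdsClient -s 0 getVdsCaps | grep -i clusterLevels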
Is there anything that I am missing?
Piotr
Re: [Users] networking: basic vlan help
by Juan Pablo Lorier
Hi Itamar,
I don't know if I read your post right, but to me it seems that if so
many users hit the same rock, this should be documented somewhere
visible, and in my opinion we should push to get bug
1049476 <https://bugzilla.redhat.com/show_bug.cgi?id=1049476> solved asap.
Regards,
[Users] Reboot causes poweroff of VM 3.4 Beta
by Jon Archer
Hi,
I seem to be suffering an issue in 3.4 where, if a VM is rebooted, it
actually shuts down. This occurs for all guests, regardless of the OS
installed within.
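One thing I can check (just a guess on my part) is what libvirt has recorded as
the reboot lifecycle action for an affected guest, e.g.:

# read-only dump of the running domain XML; look at the <on_reboot> action
# (replace <vm-name> with one of the affected guests)
virsh -r dumpxml <vm-name> | grep on_reboot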
Anyone seen this?
Jon
[Users] Issues starting hosted engine VM
by Andrew Lau
Hi,
With great help from sbonazzo, I managed to get past the initial bug
with hosted-engine-setup, but appear to have run into another show
stopper.
I ran through the install process successfully up to the stage where it
completed and the engine VM was to be shut down (the engine had already
been installed on the VM and the host had been connected to the engine).
The issue starts here: the host finds itself unable to start the VM
again.
VDSM Logs: http://www.fpaste.org/69592/00427141/
ovirt-hosted-engine-ha agent.log http://www.fpaste.org/69595/43609139/
It seems to keep failing to start the VM. When I restart the agent I can
see the score drop to 0 after 3 boot attempts. The interesting thing in the
VDSM logs seems to be "'Virtual machine does not exist', 'code': 1}}".
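For completeness, this is roughly what can be poked at by hand on the host
(assuming the 3.4 hosted-engine CLI; --vm-start should retry more or less what
the agent itself attempts):

# the HA state and score as the agent currently sees them
hosted-engine --vm-status
# try to start the engine VM manually and watch vdsm.log while it runs
hosted-engine --vm-start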
I'm not sure where else to look. Suggestions?
Cheers,
Andrew