Re: [Users] Data Center stuck between "Non Responsive" and "Contending"
by Ted Miller
On 1/26/2014 4:00 PM, Itamar Heim wrote:
> On 01/26/2014 10:51 PM, Ted Miller wrote:
>>
>> On 1/26/2014 3:10 PM, Itamar Heim wrote:
>>> On 01/26/2014 10:08 PM, Ted Miller wrote:
>>>> My Data Center is down, and won't come back up.
>>>>
>>>> Data Center Status on the GUI flips between "Non Responsive" and
>>>> "Contending"
>>>>
>>>> Also noted:
>>>> Host sometimes seen flipping between "Low" and "Contending" in SPM
>>>> column.
>>>> Storage VM2 "Data (Master)" is in "Cross Data-Center Status" = Unknown
>>>> VM2 is "up" under "Volumes" tab
>>>>
>>>> I created another volume for VM storage. It shows up in the "Volumes" tab,
>>>> but when I try to add a "New Domain" in the Storage tab, it says "There are
>>>> No Data Centers to which the Storage Domain can be attached"
>>>>
>>>> Setup:
>>>> 2 hosts w/ glusterfs storage
>>>> 1 engine
>>>> all 3 computers CentOS 6.5, just updated
>>>> ovirt-engine 3.3.0.1-1.el6
>>>> ovirt-engine-lib 3.3.2-1.el6
>>>> ovirt-host-deploy.noarch 1.1.3-1.el6
>>>> glusterfs.x86_64 3.4.2-1.el6
>>>>
>>>> This loop seems to repeat in the ovirt-engine log (grep of the log showing
>>>> only the DefaultQuartzScheduler_Worker-79 thread):
>>>>
>>>> 2014-01-26 14:44:58,416 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Irs placed on server
>>>> 9a591103-83be-4ca9-b207-06929223b541 failed. Proceed Failover
>>>> 2014-01-26 14:44:58,511 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79) hostFromVds::selectedVds - office4a,
>>>> spmStatus Free, storage pool mill
>>>> 2014-01-26 14:44:58,550 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79) SpmStatus on vds
>>>> 127ed939-34af-41a8-87a0-e2f6174b1877: Free
>>>> 2014-01-26 14:44:58,571 INFO
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79) starting spm on vds office4a, storage
>>>> pool mill, prevId 2, LVER 15
>>>> 2014-01-26 14:44:58,579 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) START, SpmStartVDSCommand(HostName =
>>>> office4a, HostId = 127ed939-34af-41a8-87a0-e2f6174b1877, storagePoolId =
>>>> 536a864d-83aa-473a-a675-e38aafdd9071, prevId=2, prevLVER=15,
>>>> storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log
>>>> id: 74c38eb7
>>>> 2014-01-26 14:44:58,617 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) spmStart polling started: taskId =
>>>> e8986753-fc80-4b11-a11d-6d3470b1728c
>>>> 2014-01-26 14:45:00,662 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Failed in HSMGetTaskStatusVDS method
>>>> 2014-01-26 14:45:00,664 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetTaskStatusVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Error code AcquireHostIdFailure and
>>>> error message VDSGenericException: VDSErrorException: Failed to
>>>> HSMGetTaskStatusVDS, error = Cannot acquire host id
>>>> 2014-01-26 14:45:00,665 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) spmStart polling ended: taskId =
>>>> e8986753-fc80-4b11-a11d-6d3470b1728c task status = finished
>>>> 2014-01-26 14:45:00,666 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Start SPM Task failed - result:
>>>> cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to
>>>> HSMGetTaskStatusVDS, error = Cannot acquire host id
>>>> 2014-01-26 14:45:00,695 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) spmStart polling ended, spm
>>>> status: Free
>>>> 2014-01-26 14:45:00,702 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) START,
>>>> HSMClearTaskVDSCommand(HostName = office4a, HostId =
>>>> 127ed939-34af-41a8-87a0-e2f6174b1877,
>>>> taskId=e8986753-fc80-4b11-a11d-6d3470b1728c), log id: 336ec5a6
>>>> 2014-01-26 14:45:00,722 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) FINISH, HSMClearTaskVDSCommand, log
>>>> id: 336ec5a6
>>>> 2014-01-26 14:45:00,724 INFO
>>>> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand]
>>>> (DefaultQuartzScheduler_Worker-79) FINISH, SpmStartVDSCommand, return:
>>>> org.ovirt.engine.core.common.businessentities.SpmStatusResult@13652652,
>>>> log id: 74c38eb7
>>>> 2014-01-26 14:45:00,733 INFO
>>>> [org.ovirt.engine.core.bll.storage.SetStoragePoolStatusCommand]
>>>> (DefaultQuartzScheduler_Worker-79) Running command:
>>>> SetStoragePoolStatusCommand internal: true. Entities affected : ID:
>>>> 536a864d-83aa-473a-a675-e38aafdd9071 Type: StoragePool
>>>> 2014-01-26 14:45:00,778 ERROR
>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
>>>> (DefaultQuartzScheduler_Worker-79)
>>>> IrsBroker::Failed::GetStoragePoolInfoVDS due to:
>>>> IrsSpmStartFailedException: IRSGenericException: IRSErrorException:
>>>> SpmStart failed
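(For anyone trying to follow this, the loop above was isolated with a plain
grep for the worker thread on the engine log; the path below assumes the
default engine log location.)

# grep 'DefaultQuartzScheduler_Worker-79' /var/log/ovirt-engine/engine.log | tail -n 40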
>>>>
>>>> Ted Miller
>>>> Elkhart, IN, USA
>>>>
>>>>
>>>>
>>>
>>> is this gluster storage (guessing, since you mentioned a 'volume')?
>> yes (mentioned under "setup" above)
>>> does it have a quorum?
>> Volume Name: VM2
>> Type: Replicate
>> Volume ID: 7bea8d3b-ec2a-4939-8da8-a82e6bda841e
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.41.65.2:/bricks/01/VM2
>> Brick2: 10.41.65.4:/bricks/01/VM2
>> Brick3: 10.41.65.4:/bricks/101/VM2
>> Options Reconfigured:
>> cluster.server-quorum-type: server
>> storage.owner-gid: 36
>> storage.owner-uid: 36
>> auth.allow: *
>> user.cifs: off
>> nfs.disa
>>> (there were reports of split brain on the domain metadata before, when
>>> no quorum exists for gluster)
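(A hedged aside on the quorum question: on a replica-3 volume the enforcement
settings can be checked and, if wanted, tightened with the standard gluster
volume options below; the values shown are only an example, not a
recommendation for this particular setup.)

# gluster volume info VM2 | grep -i quorum
# gluster volume set VM2 cluster.quorum-type auto
# gluster volume set all cluster.server-quorum-ratio 51%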
>> after full heal:
>>
>> [root@office4a ~]$ gluster volume heal VM2 info
>> Gathering Heal info on volume VM2 has been successful
>>
>> Brick 10.41.65.2:/bricks/01/VM2
>> Number of entries: 0
>>
>> Brick 10.41.65.4:/bricks/01/VM2
>> Number of entries: 0
>>
>> Brick 10.41.65.4:/bricks/101/VM2
>> Number of entries: 0
>> [root@office4a ~]$ gluster volume heal VM2 info split-brain
>> Gathering Heal info on volume VM2 has been successful
>>
>> Brick 10.41.65.2:/bricks/01/VM2
>> Number of entries: 0
>>
>> Brick 10.41.65.4:/bricks/01/VM2
>> Number of entries: 0
>>
>> Brick 10.41.65.4:/bricks/101/VM2
>> Number of entries: 0
>>
>> I noticed this in the host's /var/log/messages (while looking for something
>> else). The loop seems to repeat over and over.
>>
>> Jan 26 15:35:52 office4a sanlock[3763]: 2014-01-26 15:35:52-0500 14678
>> [30419]: read_sectors delta_leader offset 512 rv -90
>> /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
>>
>> Jan 26 15:35:53 office4a sanlock[3763]: 2014-01-26 15:35:53-0500 14679
>> [3771]: s1997 add_lockspace fail result -90
>> Jan 26 15:35:58 office4a vdsm TaskManager.Task ERROR
>> Task=`89885661-88eb-4ea3-8793-00438735e4ab`::Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
>>     res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 2111, in getAllTasksStatuses
>>     allTasksStatus = sp.getAllTasksStatuses()
>>   File "/usr/share/vdsm/storage/securable.py", line 66, in wrapper
>>     raise SecureError()
>> SecureError
>> Jan 26 15:35:59 office4a sanlock[3763]: 2014-01-26 15:35:59-0500 14686
>> [30495]: read_sectors delta_leader offset 512 rv -90
>> /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
>>
>> Jan 26 15:36:00 office4a sanlock[3763]: 2014-01-26 15:36:00-0500 14687
>> [3772]: s1998 add_lockspace fail result -90
>> Jan 26 15:36:00 office4a vdsm TaskManager.Task ERROR
>> Task=`8db9ff1a-2894-407a-915a-279f6a7eb205`::Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 857, in _run
>>     return fn(*args, **kargs)
>>   File "/usr/share/vdsm/storage/task.py", line 318, in run
>>     return self.cmd(*self.argslist, **self.argsdict)
>>   File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm
>>     self.masterDomain.acquireHostId(self.id)
>>   File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
>>     self._clusterLock.acquireHostId(hostId, async)
>>   File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
>>     raise se.AcquireHostIdFailure(self._sdUUID, e)
>> AcquireHostIdFailure: Cannot acquire host id:
>> ('0322a407-2b16-40dc-ac67-13d387c6eb4c', SanlockException(90, 'Sanlock
>> lockspace add failure', 'Message too long'))
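(Sanlock's rv -90 here is EMSGSIZE, the same 'Message too long' that shows up
in the AcquireHostIdFailure above; it often means the read of the dom_md/ids
file came back shorter than a full sector, for example when the file has ended
up empty. A quick, hedged first check is to compare the file size through the
gluster mount and on each brick; the brick-side path is an assumption based on
the volume layout shown earlier.)

# ls -l /rhev/data-center/mnt/glusterSD/10.41.65.2:VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids
# ls -l /bricks/01/VM2/0322a407-2b16-40dc-ac67-13d387c6eb4c/dom_md/ids    # run on each brick host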
>>
>> Ted Miller
>> Elkhart, IN, USA
>>
>
> this is the new storage domain? what about the previous volume for the
> first SD?
The default/default data center/cluster had to be abandoned because of a
split-brain that could not be healed. I can't remove the old storage from
the database, and I can't get the data center up due to the corrupt storage;
it ends up a circular argument.
I started over with the same hosts and totally new storage in a new data
center. This mill/one data center/cluster was working fine with the VM2
storage, then died.
Ted Miller
[Users] oVirt 3.4 - testing days report [iproute2 configurator]
by Douglas Schilling Landgraf
Hi,
During the tests I have faced the below bug:
vdsm-4.14.1-2 unable to restart on reboot after a network is defined on
ovirt-node
https://bugzilla.redhat.com/show_bug.cgi?id=1057657
Additionally (not related to iproute2 tests), I have faced:
[RFE] report BOOTPROTO and BONDING_OPTS independent of netdevice.cfg
https://bugzilla.redhat.com/show_bug.cgi?id=987813
(I have a workaround: manually creating ifcfg-em1 and ifcfg-ovirtmgmt)
firefox seg faults when using the Admin Portal on RHEL 6.5
https://bugzilla.redhat.com/show_bug.cgi?id=1044010
(Updating to firefox-24.2.0-6.el6_5.x86_64 resolved the problem.)
Test data for iproute2:
========================
- Setup Node -> put it in maintenance
- Changed the vdsm.conf on node to:
[vars]
ssl = true
net_configurator = iproute2
net_persistence = unified
[addresses]
management_port = 54321
- Restart vdsm/supervdsm
- Host is UP again, no problems
- DataCenter -> Logical Network -> New
- Name: net25 -> [x] Enable Vlan tagging [ ] VM Network
- Since I have just one NIC on the host, I added a dummy interface:
# ip link add name dummy_interface type dummy
- Put the host in maintenance and brought it back up to recognize the new
interface
- Host -> Network -> Setup Host Networks
-> drag/drop net25 to dummy_interface
-> [x] save network interface
On the host, vdsClient -s 0 getVdsCaps shows [net25]
* Rebooted to check if the new net25 will be persistent.
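(To confirm after the reboot that the unified persistence really restored
net25, something like the following can be checked on the host; the
persistence directory is an assumption based on where vdsm's unified network
config is normally kept and may differ between builds.)

# vdsClient -s 0 getVdsCaps | grep -A2 net25
# ls /var/lib/vdsm/persistence/netconf/nets/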
--
Cheers
Douglas
[Users] oVirt 3.4 test day - Template Versions
by Federico Simoncelli
Feature tested:
http://www.ovirt.org/Features/Template_Versions
- create a new vm vm1 and make a template template1 from it
- create a new vm vm2 based on template1 and make some changes
- upgrade to 3.4
- create a new template template1.1 from vm2
- create a new vm vm3 from template1 (clone) - content ok
- create a new vm vm4 from template1.1 (thin) - content ok
- create a new vm vm5 from template1 last (thin) - content ok (same as 1.1)
- try to remove template1 (failed as template1.1 is still present)
- try to remove template1.1 (failed as vm5 is still present)
- create a new vm vm6 and make a template blank1.1 as new version of the
blank template (succeeded)
- create a vm pool vmpool1 with the "latest" template from template1
- create a vm pool vmpool2 with the "template1.1" (last) template from template1
- start vmpool1 and vmpool2 and verify that the content is the same
- create a new template template1.2
- start vmpool1 and verify that the content is the same as latest (template1.2)
- start vmpool2 and verify that the content is the same as template1.1
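(The version information attached to each template can also be inspected
through the engine's REST API; a rough sketch, with the engine URL and
credentials as placeholders, and the exact version elements in the output
as described on the feature page rather than verified here.)

# curl -k -u 'admin@internal:PASSWORD' https://engine.example.com/api/templates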
Suggestions:
- the Blank template is special; I am not sure whether allowing versioning of
it may be confusing (for example, it is not even editable)
- as far as I can see, the "Sub Version Name" is not editable anymore (after
picking it)
--
Federico
[Users] Hosted-engine runtime issues (3.4 BETA)
by Frank Wall
Hi,
finally I've got the new hosted-engine feature running on
RHEL6 using oVirt 3.4 BETA/nightly. I've come across a few
issues and wanted to clarify if this is the desired
behaviour:
1.) hosted-engine storage domain not visible in GUI
The NFS storage I've used to install the hosted engine
is not visible in oVirt's Admin Portal, though it is mounted
on my oVirt node below /rhev/data-center/mnt/. I tried to
import this storage domain, but apparently that fails because
it's already mounted.
Is there any way to make this storage domain visible?
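(For what it's worth, what the hosted-engine setup actually configured can at
least be inspected from the host side; the config path and key names below are
the usual ones for the hosted-engine packages of that era, hedged in case they
moved between builds.)

# hosted-engine --vm-status
# grep -E '^(storage|sdUUID|spUUID)=' /etc/ovirt-hosted-engine/hosted-engine.conf
# mount | grep rhev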
2.) hosted-engine VM devices are not visible in GUI
The disk and network devices are not visible in the
admin portal. Thus I'm unable to change anything.
Is this intended? If so, how am I supposed to make changes?
3.) move hosted-engine VM to a different storage
Because of all of the above, I seem to be unable to move
my hosted-engine VM to a different NFS storage. How can
this be done?
Thanks
- Frank
[Users] Cluster compatibility
by Piotr Kliczewski
I wanted to install two hosts, one on f19 and the second on el6. I
created an additional cluster for el6.
Host installation for el6 worked well and it joined the cluster
without any issues, whereas the host on f19 was successfully deployed
but failed to join the cluster due to:
Host fedora is compatible with versions (3.0,3.1,3.2,3.3) and cannot
join Cluster Default which is set to version 3.4.
Here are the versions that I use:
engine:
Name : ovirt-engine
Arch : noarch
Version : 3.4.0
Release : 0.5.beta1.fc19
Size : 1.5 M
Repo : installed
From repo : ovirt-3.4.0-prerelease
fedora host:
Name : vdsm
Arch : x86_64
Version : 4.14.1
Release : 2.fc19
Size : 2.9 M
Repo : installed
From repo : ovirt-3.4.0-prerelease
el6 host:
Name : vdsm
Arch : x86_64
Version : 4.14.1
Release : 2.el6
Size : 2.9 M
Repo : installed
From repo : ovirt-3.4.0-prerelease
Both clusters are set to be compatible with 3.4.
Is there anything that I am missing?
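(One quick, hedged check is what the host itself advertises as supported
cluster levels; vdsClient comes with the vdsm-cli package, and clusterLevels
is the field getVdsCaps normally reports.)

# vdsClient -s 0 getVdsCaps | grep -i clusterLevels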
Piotr
Re: [Users] networking: basic vlan help
by Juan Pablo Lorier
Hi Itamar,
I don't know if I get your post right, but to me it seems that if so
many users hit the same rock, this should be documented somewhere
visible, and in my opinion we should push on getting bug
1049476 <https://bugzilla.redhat.com/show_bug.cgi?id=1049476> solved ASAP.
Regards,
[Users] Reboot causes poweroff of VM 3.4 Beta
by Jon Archer
Hi,
I seem to be suffering an issue in 3.4 where, if a VM is rebooted, it
actually shuts down. This occurs for all guests regardless of the OS
installed within.
Anyone seen this?
Jon
[Users] Issues starting hosted engine VM
by Andrew Lau
Hi,
With great help from sbonazzo, I managed to step past the initial bug
with hosted-engine-setup, but I appear to have run into another
showstopper.
I ran through the install process successfully up to the stage where it
completed and the engine VM was to be shut down (the engine had already
been installed on the VM and the host had been connected to the engine).
The issue starts here: the host finds itself unable to start the VM
up again.
VDSM Logs: http://www.fpaste.org/69592/00427141/
ovirt-hosted-engine-ha agent.log http://www.fpaste.org/69595/43609139/
It seems to keep failing to start the VM. When I restart the agent I can
see the score drop to 0 after 3 boot attempts. The interesting thing seems
to be in the VDSM logs: "'Virtual machine does not exist', 'code': 1}}"
I'm not sure where else to look. Suggestions?
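(Two hedged places to look when the agent keeps scoring the host at 0: the HA
agent's own view of the VM and a manual start attempt, both exposed by the
hosted-engine tool; the log paths assume the default package layout.)

# hosted-engine --vm-status
# hosted-engine --vm-start
# tail -f /var/log/ovirt-hosted-engine-ha/agent.log /var/log/vdsm/vdsm.log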
Cheers,
Andrew