HE setup failure
by Sahina Bose
Hi all,
The HE setup fails in ovirt-system-tests while deploying HE on a
hyperconverged Gluster setup using master.
Error:
Failed to execute stage 'Misc configuration': <ProtocolError for
localhost:54321/RPC2: 400 Bad Request>
Traceback from hosted-engine log:
ProtocolError: <ProtocolError for localhost:54321/RPC2: 400 Bad Request>
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
line 279, in create_volume
volUUID=volume_uuid
File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py",
line 245, in _get_volume_path
volUUID
File "/usr/lib64/python2.7/xmlrpclib.py", line **FILTERED**3, in __call__
return self.__send(self.__name, args)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
verbose=self.__verbose
File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
return self.single_request(host, handler, request_body, verbose)
File "/usr/lib64/python2.7/xmlrpclib.py", line 1321, in single_request
response.msg,
ProtocolError: <ProtocolError for localhost:54321/RPC2: 400 Bad Request>
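For anyone trying to narrow this down: the failing call is a plain
XML-RPC request to vdsm, so it can be poked at outside of
hosted-engine-setup with a few lines of Python 2. This is a minimal
sketch - it assumes vdsm answers unencrypted HTTP on 54321 (which will
not hold if SSL is enabled), and getVdsCapabilities is just a
convenient no-argument verb to test the endpoint with:

import xmlrpclib

# Same endpoint the traceback shows; adjust the scheme if vdsm uses SSL.
server = xmlrpclib.ServerProxy('http://localhost:54321/RPC2')
try:
    print server.getVdsCapabilities()
except xmlrpclib.ProtocolError as e:
    # A 400 here as well would point at vdsm (or whatever answers on
    # 54321) rejecting the request itself, not at the HE storage code.
    print e.errcode, e.errmsg

If this also returns 400 Bad Request, the regression is more likely on
the vdsm side than in ovirt-hosted-engine-ha.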
Is this a regression?
OST: HE vm does not restart on HC setup
by Sahina Bose
Hi all,
On the HC setup, the HE VM is not restarted.
The agent.log has
MainThread::INFO::2017-02-21
22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Global metadata: {}
MainThread::INFO::2017-02-21
22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Local (id 1): {'engine-health': {'reason': 'failed to getVmStats',
'health': 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge':
True, 'mem-free': 4079.0, 'maintenance': False, 'cpu-load': 0.0491,
'gateway': True}
...
MainThread::INFO::2017-02-21
22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Unknown local engine vm status no actions taken
MainThread::INFO::2017-02-21
22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Trying: notify time=1487733029.22 type=state_transition
detail=ReinitializeFSM-UnknownLocalVmState
hostname='lago-hc-basic-suite-master-host0'
MainThread::INFO::2017-02-21
22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition
(ReinitializeFSM-UnknownLocalVmState) sent? ignored
and the vdsm.log has
2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm]
(vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down:
User shut down from within the guest (code=7) (vm:1269)
2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm]
(vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection
(guestagent:429)
2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats
error=Virtual machine does not exist: {'vmId':
u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in method
ret = func(*args, **kwargs)
File "/usr/share/vdsm/API.py", line 335, in getStats
vm = self.vm
File "/usr/share/vdsm/API.py", line 130, in vm
raise exception.NoSuchVM(vmId=self._UUID)
NoSuchVM: Virtual machine does not exist: {'vmId':
u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'}
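My current reading of the two logs together: the VM was shut down from
inside the guest, vdsm then forgot about it, so the agent's
engine-health check gets NoSuchVM from getVmStats and records
everything as 'unknown' instead of 'down' - which would explain why the
FSM sits in UnknownLocalVmState with "no actions taken" and never
restarts the VM. A rough sketch of that logic as I understand it
(hypothetical names, not the actual ovirt-hosted-engine-ha code):

class NoSuchVM(Exception):
    # Stand-in for vdsm's exception.NoSuchVM from the traceback above.
    pass

def check_engine_health(get_vm_stats, vm_id):
    # Hypothetical condensation of the HE engine-health submonitor.
    try:
        stats = get_vm_stats(vm_id)
    except NoSuchVM:
        # The VM is gone from vdsm entirely. Reporting 'unknown' here,
        # rather than vm='down', leaves the agent FSM with nothing to
        # act on - matching the agent.log lines above.
        return {'vm': 'unknown', 'health': 'unknown',
                'detail': 'unknown', 'reason': 'failed to getVmStats'}
    # Normal path, simplified: derive health from the reported status.
    return {'vm': stats['status'], 'health': 'good',
            'detail': stats['status'], 'reason': ''}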
What should I be looking for to identify the issue?
The logs are at
http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBu...
thanks
sahina
7 years, 9 months
Re: [lago-devel] [ovirt-devel] OST: vm_run fails for me (basic-suite-master)
by Nadav Goldin
What is the host CPU you are using?
I came across the same error a few days ago, but without running OST -
I hit it when running with Lago directly:
fc24 host -> el7 vm -> el7 vm.
I have a slight suspicion that it is related to the CPU model we
configure in libvirt. I tried a few combinations (host-passthrough,
pinning down the CPU model), but it always failed with the same error:
kvm_put_msrs: Assertion `ret == n' failed.
My CPU is Broadwell, btw.
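For reference, the combinations I tried map roughly to these libvirt
domain XML fragments (illustrative, written from memory - not copied
from Lago's generated XML):

<!-- pass the host CPU straight through -->
<cpu mode='host-passthrough'/>

<!-- or pin an explicit model instead -->
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>Westmere</model>
</cpu>

Both variants ended in the same kvm_put_msrs assertion for me.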
Milan, any ideas? Do you think it might be related?
Nadav.
On Tue, Feb 7, 2017 at 11:14 PM, Ondrej Svoboda <osvoboda(a)redhat.com> wrote:
> Yes, I stated that in my message.
>
> root@osvoboda-t460p /home/src/ovirt-system-tests (git)-[master] # cat
> /sys/module/kvm_intel/parameters/nested
> :(
> Y
>
> On Tue, Feb 7, 2017 at 1:39 PM, Eyal Edri <eedri(a)redhat.com> wrote:
>>
>> Did you follow the instructions on [1] ?
>>
>> Specifically, verifying ' cat /sys/module/kvm_intel/parameters/nested '
>> gives you 'Y'.
>>
>> [1]
>> http://ovirt-system-tests.readthedocs.io/en/latest/docs/general/installat...
>>
>> On Tue, Feb 7, 2017 at 2:29 PM, Ondrej Svoboda <osvoboda(a)redhat.com>
>> wrote:
>>>
>>> Hi everyone,
>>>
>>> Even though I have nested virtualization enabled in my Arch Linux system
>>> which I use to run OST, vm_run is the first test to fail in 004_basic_sanity
>>> (followed by snapshots_merge and suspend_resume_vm).
>>>
>>> Can you point me to what I might be missing? I believe I get the same
>>> failure even on Fedora.
>>>
>>> This is what host0's CPU capabilities look like (vmx is there):
>>> [root@lago-basic-suite-master-host0 ~]# cat /proc/cpuinfo
>>> processor : 0
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>> model : 44
>>> model name : Westmere E56xx/L56xx/X56xx (Nehalem-C)
>>> stepping : 1
>>> microcode : 0x1
>>> cpu MHz : 2711.988
>>> cache size : 16384 KB
>>> physical id : 0
>>> siblings : 1
>>> core id : 0
>>> cpu cores : 1
>>> apicid : 0
>>> initial apicid : 0
>>> fpu : yes
>>> fpu_exception : yes
>>> cpuid level : 11
>>> wp : yes
>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>> cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc rep_good
>>> nopl xtopology pni pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes
>>> hypervisor lahf_lm arat tpr_shadow vnmi flexpriority ept vpid
>>> bogomips : 5423.97
>>> clflush size : 64
>>> cache_alignment : 64
>>> address sizes : 40 bits physical, 48 bits virtual
>>> power management:
>>>
>>> journalctl -b on host0 shows that libvirt complains about NUMA
>>> configuration:
>>>
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: libvirt
>>> version: 2.0.0, package: 10.el7_3.4 (CentOS BuildSystem
>>> <http://bugs.centos.org>, 2017-01-17-23:37:48, c1bm.rdu2.centos.org)
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port
>>> 2(vnet0) entered disabled state
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: device vnet0 left
>>> promiscuous mode
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 kernel: ovirtmgmt: port
>>> 2(vnet0) entered disabled state
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: hostname:
>>> lago-basic-suite-master-host0.lago.local
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: Unable to
>>> read from monitor: Connection reset by peer
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 libvirtd[12888]: internal
>>> error: qemu unexpectedly closed the monitor: 2017-02-07T11:33:23.058571Z
>>> qemu-kvm: warning: CPU(s) not present in any NUMA nodes: 1 2 3 4 5 6 7 8 9
>>> 10 11 12 13 14 15
>>>
>>> 2017-02-07T11:33:23.058826Z qemu-kvm: warning: All CPU(s) up to maxcpus
>>> should be described in NUMA config
>>> qemu-kvm:
>>> /builddir/build/BUILD/qemu-2.6.0/target-i386/kvm.c:1736: kvm_put_msrs:
>>> Assertion `ret == n' failed.
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 NetworkManager[657]: <info>
>>> [1486467203.1025] device (vnet0): state change: disconnected -> unmanaged
>>> (reason 'unmanaged') [30 10 3]
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 kvm[22059]: 0 guests now
>>> active
>>> Feb 07 06:33:23 lago-basic-suite-master-host0 systemd-machined[22044]:
>>> Machine qemu-1-vm0 terminated.
>>>
>>> Thanks,
>>> Ondra
>>>
>>
>>
>>
>>
>> --
>> Eyal Edri
>> Associate Manager
>> RHV DevOps
>> EMEA ENG Virtualization R&D
>> Red Hat Israel
>>
>> phone: +972-9-7692018
>> irc: eedri (on #tlv #rhev-dev #rhev-integ)
New Lago release - v0.34
by Nadav Goldin
Hi all,
A new Lago version is out - v0.34. Notable changes in this release:
1. New command - 'lago ansible_hosts'
This command generates an Ansible inventory file; it requires the
prefix to be initialized before calling it. VMs are grouped by their
respective 'vm-type' in the inventory file. For an example of how to
use it, see the PR[1] and the sample inventory after this list.
2. A workaround for the python2-paramiko-1.16.1-1.el7 failure[2],
which was caused by a new python2-crypto release. Hopefully a new
paramiko version will be pushed to EPEL soon, so we can remove this
workaround in the next release.
3. Require python-paramiko >= v2.1.1 on Fedora 24/25; it is available
under fedora/updates.
4. The video device will no longer be created.
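To illustrate item 1: for a prefix with one engine VM and two host
VMs, the generated inventory would look roughly like this (hypothetical
names, grouped by 'vm-type' as described above - see the PR[1] for
real output):

[ovirt-engine]
lago-basic-suite-master-engine

[ovirt-host]
lago-basic-suite-master-host0
lago-basic-suite-master-host1

Such a file can be fed straight to e.g. 'ansible -i <file> ovirt-host -m ping'.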
The full commit log is available at [3],
Thanks to everyone for their contributions!
Upgrading
---------------
To upgrade using yum or dnf, simply run:
yum/dnf update lago
Note to ovirt-system-tests users
-----------------------------------------------
Please ensure after upgrading that the 'ovirtlago' firewalld service
is enabled by executing:
sudo firewall-cmd --add-service=ovirtlago --permanent
sudo firewall-cmd --reload
As always, if you find any problems, please open an issue in the GitHub page[4].
Enjoy,
Nadav.
Docs: http://lago.readthedocs.io/en/0.34/
RPM Repository: http://resources.ovirt.org/repos/lago/stable/0.0/rpm/
GitHub: https://github.com/lago-project/lago/
For OST docs: http://ovirt-system-tests.readthedocs.io/en/latest/
[1] https://github.com/lago-project/lago/pull/428#issuecomment-274070364
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1419312
[3] https://github.com/lago-project/lago/compare/0.33...0.34
[4] https://github.com/lago-project/lago/issues/
vdsm service fails to start on HC setup
by Sahina Bose
Hi all,
While verifying the test to deploy hyperconverged HE [1], I'm running into
an issue today where vdsm fails to start.
In the logs -
lago-basic-suite-hc-host0 vdsmd_init_common.sh: Error:
Feb 6 02:21:32 lago-basic-suite-hc-host0 vdsmd_init_common.sh: One of the
modules is not configured to work with VDSM.
Starting it manually with 'vdsm-tool configure --force' gives:
Units need configuration: {'lvm2-lvmetad.service': {'LoadState': 'masked',
'ActiveState': 'failed'}}
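In case it helps anyone reproduce this, the state vdsm-tool complains
about can be inspected with plain systemd commands (the reset-failed
step is only a guess at clearing the stale 'failed' state, not a
verified fix):

# show why the unit is reported as both masked and failed
systemctl status lvm2-lvmetad.service lvm2-lvmetad.socket
# clear the stale 'failed' state, then retry the configurator
systemctl reset-failed lvm2-lvmetad.service
vdsm-tool configure --force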
Is this a known issue?
[1] - https://gerrit.ovirt.org/57283
thanks
sahina