Re: cannot migrate hosted engine to sandybridge hypervisor
by Douglas Duckworth
Thanks!
No big deal as it's now working.
On Fri, Aug 24, 2018, 9:04 AM Simone Tiraboschi <stirabos(a)redhat.com> wrote:
>
>
> On Fri, Aug 24, 2018 at 3:06 AM Douglas Duckworth <dod2014(a)med.cornell.edu>
> wrote:
>
>> Rebooted the hosted engine VM
>>
>> Migration now works
>>
>> Guess that's needed after changing cluster CPU type
>>
>
> Yes, exactly: we already have an open bug about that:
> https://bugzilla.redhat.com/show_bug.cgi?id=1585986
>
>
>>
>> Thanks,
>>
>> Douglas Duckworth, MSc, LFCS
>> HPC System Administrator
>> Scientific Computing Unit
>> Weill Cornell Medicine
>> 1300 York - LC-502
>> E: doug(a)med.cornell.edu
>> O: 212-746-6305
>> F: 212-746-8690
>>
>>
>> On Thu, Aug 23, 2018 at 8:46 PM, Douglas Duckworth <
>> dod2014(a)med.cornell.edu> wrote:
>>
>>> Hi
>>>
>>> I am unable to migrate to a hypervisor with a Sandy Bridge CPU.
>>>
>>> The cluster supports CPUs that old.
>>>
>>> This is the message on the host when the migration fails:
>>>
>>> Aug 23 20:40:14 lyon vdsm[9648]: WARN Worker blocked: <Worker
>>> name=periodic/1 running <Task <Operation
>>> action=<vdsm.virt.sampling.HostMonitor object at 0x7f2f8054b310> at
>>> 0x7f2f8054b350> timeout=15, duration=15 at 0x7f2f8048d290> task#=79 at
>>> 0x7f2f80540490>, traceback:#012File: "/usr/lib64/python2.7/threading.py",
>>> line 785, in __bootstrap#012 self.__bootstrap_inner()#012File:
>>> "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner#012
>>> self.run()#012File: "/usr/lib64/python2.7/threading.py", line 765, in
>>> run#012 self.__target(*self.__args, **self.__kwargs)#012File:
>>> "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in
>>> run#012 ret = func(*args, **kwargs)#012File:
>>> "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run#012
>>> self._execute_task()#012File:
>>> "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in
>>> _execute_task#012 task()#012File:
>>> "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in
>>> __call__#012 self._callable()#012File:
>>> "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 220, in
>>> __call__#012 self._func()#012File:
>>> "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 580, in
>>> __call__#012 stats = hostapi.get_stats(self._cif,
>>> self._samples.stats())#012File:
>>> "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 77, in
>>> get_stats#012 ret['haStats'] = _getHaInfo()#012File:
>>> "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in
>>> _getHaInfo#012 stats = instance.get_all_stats()#012File:
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>> line 94, in get_all_stats#012 stats =
>>> broker.get_stats_from_storage()#012File:
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>> line 135, in get_stats_from_storage#012 result =
>>> self._proxy.get_stats()#012File: "/usr/lib64/python2.7/xmlrpclib.py", line
>>> 1233, in __call__#012 return self.__send(self.__name, args)#012File:
>>> "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request#012
>>> verbose=self.__verbose#012File: "/usr/lib64/python2.7/xmlrpclib.py", line
>>> 1273, in request#012 return self.single_request(host, handler,
>>> request_body, verbose)#012File: "/usr/lib64/python2.7/xmlrpclib.py", line
>>> 1303, in single_request#012 response =
>>> h.getresponse(buffering=True)#012File: "/usr/lib64/python2.7/httplib.py",
>>> line 1113, in getresponse#012 response.begin()#012File:
>>> "/usr/lib64/python2.7/httplib.py", line 444, in begin#012 version, status,
>>> reason = self._read_status()#012File: "/usr/lib64/python2.7/httplib.py",
>>> line 400, in _read_status#012 line = self.fp.readline(_MAXLINE +
>>> 1)#012File: "/usr/lib64/python2.7/socket.py", line 476, in readline#012
>>> data = self._sock.recv(self._rbufsize)
>>> Aug 23 20:43:45 lyon vdsm[9648]: WARN ping was deprecated in favor of
>>> ping2 and confirmConnectivity
>>> Aug 23 20:43:45 lyon vdsm[9648]: WARN Attempting to add an existing net
>>> user: ovirtmgmt/adf14389-1563-4b1a-9af6-4b40370a825b
>>> Aug 23 20:43:46 lyon lldpad: recvfrom(Event interface): No buffer space
>>> available
>>> Aug 23 20:43:46 lyon kernel: ovirtmgmt: port 2(vnet0) entered blocking
>>> state
>>> Aug 23 20:43:46 lyon kernel: ovirtmgmt: port 2(vnet0) entered disabled
>>> state
>>> Aug 23 20:43:46 lyon kernel: device vnet0 entered promiscuous mode
>>> Aug 23 20:43:46 lyon kernel: ovirtmgmt: port 2(vnet0) entered blocking
>>> state
>>> Aug 23 20:43:46 lyon kernel: ovirtmgmt: port 2(vnet0) entered forwarding
>>> state
>>> Aug 23 20:43:46 lyon NetworkManager[1658]: <info> [1535071426.4326]
>>> manager: (vnet0): new Tun device
>>> (/org/freedesktop/NetworkManager/Devices/57)
>>> Aug 23 20:43:46 lyon NetworkManager[1658]: <info> [1535071426.4336]
>>> device (vnet0): state change: unmanaged -> unavailable (reason
>>> 'connection-assumed', sys-iface-state: 'external')
>>> Aug 23 20:43:46 lyon NetworkManager[1658]: <info> [1535071426.4341]
>>> device (vnet0): state change: unavailable -> disconnected (reason 'none',
>>> sys-iface-state: 'external')
>>> Aug 23 20:43:46 lyon dbus[1396]: [system] Activating via systemd:
>>> service name='org.freedesktop.machine1'
>>> unit='dbus-org.freedesktop.machine1.service'
>>> Aug 23 20:43:46 lyon systemd: Cannot add dependency job for unit
>>> lvm2-lvmetad.socket, ignoring: Unit is masked.
>>> Aug 23 20:43:46 lyon systemd: Starting Virtual Machine and Container
>>> Registration Service...
>>> Aug 23 20:43:46 lyon dbus[1396]: [system] Successfully activated service
>>> 'org.freedesktop.machine1'
>>> Aug 23 20:43:46 lyon systemd: Started Virtual Machine and Container
>>> Registration Service.
>>> Aug 23 20:43:46 lyon systemd-machined: New machine qemu-2-HostedEngine.
>>> Aug 23 20:43:46 lyon systemd: Started Virtual Machine
>>> qemu-2-HostedEngine.
>>> Aug 23 20:43:46 lyon systemd: Starting Virtual Machine
>>> qemu-2-HostedEngine.
>>> Aug 23 20:43:46 lyon kvm: 1 guest now active
>>> Aug 23 20:43:46 lyon libvirtd: 2018-08-24 00:43:46.940+0000: 9195: error
>>> : virCPUx86UpdateLive:2742 : operation failed: guest CPU doesn't
>>> match specification: missing features:
>>> fma,movbe,f16c,rdrand,fsgsbase,bmi1,hle,avx2,smep,bmi2,erms,invpcid,rtm,mpx,rdseed,adx,smap,xsavec,xgetbv1,abm,3dnowprefetch
>>> Aug 23 20:43:47 lyon kernel: ovirtmgmt: port 2(vnet0) entered disabled
>>> state
>>> Aug 23 20:43:47 lyon kernel: device vnet0 left promiscuous mode
>>> Aug 23 20:43:47 lyon kernel: ovirtmgmt: port 2(vnet0) entered disabled
>>> state
>>> Aug 23 20:43:47 lyon avahi-daemon[1300]: Withdrawing workstation service
>>> for vnet0.
>>> Aug 23 20:43:47 lyon NetworkManager[1658]: <info> [1535071427.2753]
>>> device (vnet0): state change: disconnected -> unmanaged (reason
>>> 'unmanaged', sys-iface-state: 'removed')
>>> Aug 23 20:43:47 lyon NetworkManager[1658]: <info> [1535071427.2753]
>>> device (vnet0): released from master device ovirtmgmt
>>> Aug 23 20:43:47 lyon lldpad: recvfrom(Event interface): No buffer space
>>> available
>>> Aug 23 20:43:47 lyon kvm: 0 guests now active
>>> Aug 23 20:43:47 lyon systemd-machined: Machine qemu-2-HostedEngine
>>> terminated.
>>> Aug 23 20:43:47 lyon vdsm[9648]: WARN File:
>>> /var/lib/libvirt/qemu/channels/adf14389-1563-4b1a-9af6-4b40370a825b.ovirt-guest-agent.0
>>> already removed
>>> Aug 23 20:43:47 lyon vdsm[9648]: WARN Attempting to remove a non
>>> existing network: ovirtmgmt/adf14389-1563-4b1a-9af6-4b40370a825b
>>> Aug 23 20:43:47 lyon vdsm[9648]: WARN Attempting to remove a non
>>> existing net user: ovirtmgmt/adf14389-1563-4b1a-9af6-4b40370a825b
>>> Aug 23 20:43:47 lyon vdsm[9648]: WARN File:
>>> /var/lib/libvirt/qemu/channels/adf14389-1563-4b1a-9af6-4b40370a825b.org.qemu.guest_agent.0
>>> already removed
>>> Aug 23 20:43:51 lyon lldpad: setsockopt nearest_bridge: Invalid argument
>>> Aug 23 20:43:52 lyon lldpad: setsockopt nearest_bridge: Invalid argument
>>>
>>> Can I fix this somehow?
>>>
>>> Thanks,
>>>
>>> Douglas Duckworth, MSc, LFCS
>>> HPC System Administrator
>>> Scientific Computing Unit
>>> Weill Cornell Medicine
>>> 1300 York - LC-502
>>> E: doug(a)med.cornell.edu
>>> O: 212-746-6305
>>> F: 212-746-8690
>>>
>>>
>> _______________________________________________
>> Users mailing list -- users(a)ovirt.org
>> To unsubscribe send an email to users-leave(a)ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/FBENIU2TRZS...
>>
>
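The fix here was rebooting the hosted engine VM after the cluster CPU type change. For reference, the cluster CPU type can also be read or changed through ovirtsdk4; a minimal sketch, assuming an authenticated Connection named `connection` (the cluster name and CPU type string are illustrative):

# Minimal sketch: inspect and change a cluster's CPU type with ovirtsdk4.
# Assumes an authenticated ovirtsdk4 Connection named `connection`.
import ovirtsdk4.types as types

clusters_service = connection.system_service().clusters_service()
cluster = clusters_service.list(search='name=Default')[0]
print(cluster.cpu.type)   # e.g. "Intel SandyBridge Family"

cluster_service = clusters_service.cluster_service(cluster.id)
cluster_service.update(
    types.Cluster(
        cpu=types.Cpu(type='Intel SandyBridge Family'),
    ),
)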
6 years, 3 months
cannot migrate hosted engine to sandybridge hypervisor
by Douglas Duckworth
Hi
I am unable to migrate to a hypervisor with a Sandy Bridge CPU.
The cluster supports CPUs that old.
This is the message on the host when the migration fails:
Aug 23 20:40:14 lyon vdsm[9648]: WARN Worker blocked: <Worker
name=periodic/1 running <Task <Operation
action=<vdsm.virt.sampling.HostMonitor object at 0x7f2f8054b310> at
0x7f2f8054b350> timeout=15, duration=15 at 0x7f2f8048d290> task#=79 at
0x7f2f80540490>, traceback:#012File: "/usr/lib64/python2.7/threading.py",
line 785, in __bootstrap#012 self.__bootstrap_inner()#012File:
"/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner#012
self.run()#012File: "/usr/lib64/python2.7/threading.py", line 765, in
run#012 self.__target(*self.__args, **self.__kwargs)#012File:
"/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in
run#012 ret = func(*args, **kwargs)#012File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run#012
self._execute_task()#012File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in
_execute_task#012 task()#012File:
"/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in
__call__#012 self._callable()#012File:
"/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 220, in
__call__#012 self._func()#012File:
"/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 580, in
__call__#012 stats = hostapi.get_stats(self._cif,
self._samples.stats())#012File:
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 77, in
get_stats#012 ret['haStats'] = _getHaInfo()#012File:
"/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in
_getHaInfo#012 stats = instance.get_all_stats()#012File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 94, in get_all_stats#012 stats =
broker.get_stats_from_storage()#012File:
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 135, in get_stats_from_storage#012 result =
self._proxy.get_stats()#012File: "/usr/lib64/python2.7/xmlrpclib.py", line
1233, in __call__#012 return self.__send(self.__name, args)#012File:
"/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request#012
verbose=self.__verbose#012File: "/usr/lib64/python2.7/xmlrpclib.py", line
1273, in request#012 return self.single_request(host, handler,
request_body, verbose)#012File: "/usr/lib64/python2.7/xmlrpclib.py", line
1303, in single_request#012 response =
h.getresponse(buffering=True)#012File: "/usr/lib64/python2.7/httplib.py",
line 1113, in getresponse#012 response.begin()#012File:
"/usr/lib64/python2.7/httplib.py", line 444, in begin#012 version, status,
reason = self._read_status()#012File: "/usr/lib64/python2.7/httplib.py",
line 400, in _read_status#012 line = self.fp.readline(_MAXLINE +
1)#012File: "/usr/lib64/python2.7/socket.py", line 476, in readline#012
data = self._sock.recv(self._rbufsize)
Aug 23 20:43:45 lyon vdsm[9648]: WARN ping was deprecated in favor of ping2
and confirmConnectivity
Aug 23 20:43:45 lyon vdsm[9648]: WARN Attempting to add an existing net
user: ovirtmgmt/adf14389-1563-4b1a-9af6-4b40370a825b
Aug 23 20:43:46 lyon lldpad: recvfrom(Event interface): No buffer space
available
Aug 23 20:43:46 lyon kernel: ovirtmgmt: port 2(vnet0) entered blocking state
Aug 23 20:43:46 lyon kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Aug 23 20:43:46 lyon kernel: device vnet0 entered promiscuous mode
Aug 23 20:43:46 lyon kernel: ovirtmgmt: port 2(vnet0) entered blocking state
Aug 23 20:43:46 lyon kernel: ovirtmgmt: port 2(vnet0) entered forwarding
state
Aug 23 20:43:46 lyon NetworkManager[1658]: <info> [1535071426.4326]
manager: (vnet0): new Tun device
(/org/freedesktop/NetworkManager/Devices/57)
Aug 23 20:43:46 lyon NetworkManager[1658]: <info> [1535071426.4336] device
(vnet0): state change: unmanaged -> unavailable (reason
'connection-assumed', sys-iface-state: 'external')
Aug 23 20:43:46 lyon NetworkManager[1658]: <info> [1535071426.4341] device
(vnet0): state change: unavailable -> disconnected (reason 'none',
sys-iface-state: 'external')
Aug 23 20:43:46 lyon dbus[1396]: [system] Activating via systemd: service
name='org.freedesktop.machine1' unit='dbus-org.freedesktop.machine1.service'
Aug 23 20:43:46 lyon systemd: Cannot add dependency job for unit
lvm2-lvmetad.socket, ignoring: Unit is masked.
Aug 23 20:43:46 lyon systemd: Starting Virtual Machine and Container
Registration Service...
Aug 23 20:43:46 lyon dbus[1396]: [system] Successfully activated service
'org.freedesktop.machine1'
Aug 23 20:43:46 lyon systemd: Started Virtual Machine and Container
Registration Service.
Aug 23 20:43:46 lyon systemd-machined: New machine qemu-2-HostedEngine.
Aug 23 20:43:46 lyon systemd: Started Virtual Machine qemu-2-HostedEngine.
Aug 23 20:43:46 lyon systemd: Starting Virtual Machine qemu-2-HostedEngine.
Aug 23 20:43:46 lyon kvm: 1 guest now active
Aug 23 20:43:46 lyon libvirtd: 2018-08-24 00:43:46.940+0000: 9195:
error : virCPUx86UpdateLive:2742
: operation failed: guest CPU doesn't match specification: missing
features:
fma,movbe,f16c,rdrand,fsgsbase,bmi1,hle,avx2,smep,bmi2,erms,invpcid,rtm,mpx,rdseed,adx,smap,xsavec,xgetbv1,abm,3dnowprefetch
Aug 23 20:43:47 lyon kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Aug 23 20:43:47 lyon kernel: device vnet0 left promiscuous mode
Aug 23 20:43:47 lyon kernel: ovirtmgmt: port 2(vnet0) entered disabled state
Aug 23 20:43:47 lyon avahi-daemon[1300]: Withdrawing workstation service
for vnet0.
Aug 23 20:43:47 lyon NetworkManager[1658]: <info> [1535071427.2753] device
(vnet0): state change: disconnected -> unmanaged (reason 'unmanaged',
sys-iface-state: 'removed')
Aug 23 20:43:47 lyon NetworkManager[1658]: <info> [1535071427.2753] device
(vnet0): released from master device ovirtmgmt
Aug 23 20:43:47 lyon lldpad: recvfrom(Event interface): No buffer space
available
Aug 23 20:43:47 lyon kvm: 0 guests now active
Aug 23 20:43:47 lyon systemd-machined: Machine qemu-2-HostedEngine
terminated.
Aug 23 20:43:47 lyon vdsm[9648]: WARN File:
/var/lib/libvirt/qemu/channels/adf14389-1563-4b1a-9af6-4b40370a825b.ovirt-guest-agent.0
already removed
Aug 23 20:43:47 lyon vdsm[9648]: WARN Attempting to remove a non existing
network: ovirtmgmt/adf14389-1563-4b1a-9af6-4b40370a825b
Aug 23 20:43:47 lyon vdsm[9648]: WARN Attempting to remove a non existing
net user: ovirtmgmt/adf14389-1563-4b1a-9af6-4b40370a825b
Aug 23 20:43:47 lyon vdsm[9648]: WARN File:
/var/lib/libvirt/qemu/channels/adf14389-1563-4b1a-9af6-4b40370a825b.org.qemu.guest_agent.0
already removed
Aug 23 20:43:51 lyon lldpad: setsockopt nearest_bridge: Invalid argument
Aug 23 20:43:52 lyon lldpad: setsockopt nearest_bridge: Invalid argument
Can I fix this somehow?
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
1300 York - LC-502
E: doug(a)med.cornell.edu
O: 212-746-6305
F: 212-746-8690
6 years, 3 months
Upgrading librbd1 and librados2 for oVirt 4.2.x
by andreas.elvers+ovirtforum@solutions.work
Hello Group!
I am in need of upgrading librbd1 and librados2 for an oVirt 4.2.x cluster.
The cluster was installed via node ng.
Taking the repository for Ceph Mimic or Luminous will end in a dependency
problem because liburcu-cds.so.1 is already installed as a more recent version
provided by the oVirt repository.
A hint on how to get more recent versions of the above libraries would be
very much appreciated.
Thanks,
- Andreas
6 years, 3 months
HA VMs not starting automatically on another host after host crash.
by Eduardo Mayoral
Hi,
Recently I have had 2 oVirt hosts (oVirt 4.2, CentOS 7.5) crash
unexpectedly (not at the same time). Both crashes seem hardware-related.
In both cases oVirt detected the host as non-responsive, fenced the host,
and set the VMs that were running on it at the time to "Down". So far,
so good.
The hosts were in 2 different clusters. Both had the Migration policy /
resilience policy set to "Migrate Virtual Machines". The VMs are not
configured as "Highly Available", but my understanding is that with that
resilience policy this makes no difference. My (probably wrong)
expectation is that oVirt should have attempted to start the VMs on
another host. It did not. A colleague started the VMs manually and they
came up with no further issues.
So: how should I configure oVirt to try to start the VMs on another
host when the physical host they are running on crashes?
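If automatic restart after a host failure is the goal, the VMs generally need to be flagged as "Highly Available"; the resilience policy by itself governs migration, not restart after a crash. A minimal sketch of setting that flag with ovirtsdk4, assuming an authenticated Connection named `connection` (the VM name and priority value are illustrative):

# Minimal sketch: flag a VM as highly available via ovirtsdk4.
# Assumes an authenticated ovirtsdk4 Connection named `connection`.
import ovirtsdk4.types as types

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]
vm_service = vms_service.vm_service(vm.id)

vm_service.update(
    types.Vm(
        high_availability=types.HighAvailability(
            enabled=True,
            priority=1,  # restart priority relative to other HA VMs
        ),
    )
)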
Thanks in advance!
--
Eduardo Mayoral.
6 years, 3 months
VM has been paused due to unknown storage error. - Only on NFS / EMC
by jeanbaptiste@nfrance.com
Hello,
I'm facing a strange issue on my oVirt dev pool.
Indeed, when I create high disk load on a VM (a kickstart installation or an iozone test, for example), the VM is paused due to a storage I/O error.
The problem is 100% reproducible and occurs only on NFS (v3 and v4) on my EMC VNXe3200 NAS boxes (I have a 10TB and a 20TB NAS).
I have done tests (simple iozone -a) with a VM with 1 vCPU / 2GB RAM and 2 disks (1*20GB + 1*10GB). Both VM disks are placed on the same SAN / NAS for each test. Results are:
- EMC VNXe3200 (10TB) NFSv3 => VM stopped 10-30s after iozone launch
- EMC VNXe3200 (20TB) NFSv3 => VM stopped 10-30s after iozone launch
- EMC VNXe3200 (10TB) iSCSI => No problem, the iozone test finishes, and performance is "standard" given the load on the VNXe (60MB/s sequential write, for info)
- EMC VNXe3200 (20TB) iSCSI => No problem, the iozone test finishes, and performance is "standard" given the load on the VNXe (40-60MB/s sequential write, for info)
- NETAPP FAS2240 NFSv3 => No problem, the iozone test finishes, and performance is good (100MB/s sequential write, for info)
- FreeBSD 10 NAS NFSv3 => No problem, the iozone test finishes, and performance is good for the NAS configuration (80MB/s sequential write, for info)
I can't explain why I have an issue on NFS but no issue on iSCSI (on the same EMC VNXe3200...).
The default NFS params are kept when the storage is added to the datacenter: (rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,port=0,timeo=600,retrans=6,sec=sys,clientaddr=XXXXXXX,local_lock=none,addr=XXXXXX)
(rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=XXXXXX,mountvers=3,mountport=1234,mountproto=udp,local_lock=all,addr=XXXXXX)
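For reference, the NFS version, timeout and retransmission values can also be set explicitly when the storage domain is created through the SDK instead of relying on the defaults above. A minimal sketch with ovirtsdk4, assuming an authenticated Connection named `connection` (host name, address, path and values are illustrative):

# Minimal sketch: add an NFS data domain with explicit NFS options.
# Assumes an authenticated ovirtsdk4 Connection named `connection`.
import ovirtsdk4.types as types

sds_service = connection.system_service().storage_domains_service()
sds_service.add(
    types.StorageDomain(
        name='emc_nfs_data',
        type=types.StorageDomainType.DATA,
        host=types.Host(name='myhost'),
        storage=types.HostStorage(
            type=types.StorageType.NFS,
            address='nas.example.com',
            path='/exports/data',
            nfs_version=types.NfsVersion.V3,
            nfs_timeo=600,    # tenths of a second, as in timeo=600 above
            nfs_retrans=6,
        ),
    ),
)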
Debug logs on the host do not help me a lot:
2018-08-22 15:36:13,883+0200 INFO (periodic/22) [vdsm.api] START multipath_health() from=internal, task_id=53da2eca-eb66-400c-8367-ab62cedc5dc1 (api:46)
2018-08-22 15:36:13,883+0200 INFO (periodic/22) [vdsm.api] FINISH multipath_health return={} from=internal, task_id=53da2eca-eb66-400c-8367-ab62cedc5dc1 (api:52)
2018-08-22 15:36:15,161+0200 INFO (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') abnormal vm stop device ua-179375b0-0a18-4fcb-a884-4aeb1c 8fed97 error eother (vm:5116)
2018-08-22 15:36:15,161+0200 INFO (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') CPU stopped: onIOError (vm:6157)
2018-08-22 15:36:15,162+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] values: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 'resumeBehavior': 'au to_resume', 'memGuaranteedSize': 1024, 'launchPaused': 'false', 'startTime': 1534944832.058459, 'destroy_on_reboot': False, 'pauseTime': 4999289.49} (metadata:596)
2018-08-22 15:36:15,162+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] values updated: {'minGuaranteedMemoryMb': 1024, 'clusterVersion': '4.2', 'resumeBehavi or': 'auto_resume', 'memGuaranteedSize': 1024, 'launchPaused': 'false', 'startTime': 1534944832.058459, 'destroy_on_reboot': False, 'pauseTime': 4999289.49} (metadat a:601)
2018-08-22 15:36:15,168+0200 DEBUG (libvirt/events) [virt.metadata.Descriptor] dumped metadata for b139a9b9-16bc-40ee-ba84-d1d59e5ce17a: <?xml version='1.0' encoding
metadata blablablabla............................>
2018-08-22 15:36:15,169+0200 DEBUG (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') event Suspended detail 2 opaque None (vm:5520)
2018-08-22 15:36:15,169+0200 INFO (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') CPU stopped: onSuspend (vm:6157)
2018-08-22 15:36:15,174+0200 WARN (libvirt/events) [virt.vm] (vmId='b139a9b9-16bc-40ee-ba84-d1d59e5ce17a') device sda reported I/O error (vm:4065)
2018-08-22 15:36:15,340+0200 DEBUG (vmchannels) [virt.vm] (vmId='46d496af-e2d0-4caa-9a13-10c624f265d8') Guest's message heartbeat: {u'memory-stat': {u'swap_out': 0, u'majflt': 0, u'swap_usage': 0, u'mem_cached': 119020, u'mem_free': 3693900, u'mem_buffers': 2108, u'swap_in': 0, u'swap_total': 8257532, u'pageflt': 141, u'mem_tota l': 3878980, u'mem_unused': 3572772}, u'free-ram': u'3607', u'apiVersion': 3} (guestagent:337)
Do you have any idea?
6 years, 3 months
Update disk alias by using ovirtsdk4
by branimirp@gmail.com
Hello list!
I have been trying to rename a disk alias that defaults to "some_template_name-Disk1" to something more logical by using ovirtsdk4 (in my example, to "vm2_disk_0"). On the list itself, I saw a couple of examples for older versions of the SDK. Basically, I am trying to create a VM and then update its properties, notably the "alias" attribute that can be found in types.Disk. The first portion of the code works well (VM provisioning), but the second one (alias update) does not, causing a TypeError exception. Obviously, I am doing something wrong here. I have spent most of the time trying to figure out how to actually do this. I would kindly ask for a hint.
Thank you in advance!
Regards,
Branimir
*******************************
import ovirtsdk4.types as types

# an authenticated ovirtsdk4.Connection named `connection` is assumed

# create a vm
vms_service = connection.system_service().vms_service()
vms_service.add(
types.Vm(
# --- vm name
name='vm2',
# --- vCpus
cpu=types.Cpu(
topology=types.CpuTopology(cores=2, sockets=1)
),
# --- memory
memory = 512*1024*1024,
# -- oVirt cluster
cluster = types.Cluster(
name='cluster_1',
),
# -- template to use
template = types.Template(
name='centos7-tmplt-compute-node-v1'),
# -- boot device
os = types.OperatingSystem(boot=types.Boot(devices=[types.BootDevice.HD])),
),
)
# update the vm in terms of disk name
vm = vms_service.list(search='name=vm2')[0]
vm_service = vms_service.vm_service(vm.id)
updated_vm = vm_service.update(
types.Vm(
disk_attachment = types.Disk(alias='vm2_disk_0')
)
)
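One likely cause of the TypeError is that types.Vm does not take a `disk_attachment` keyword; a VM's disks are normally reached through its disk attachments service instead. Below is a minimal sketch of renaming the disk that way, assuming ovirtsdk4 and the `connection`/`vms_service` objects from the snippet above (the exact update call may need adjusting for your SDK version):

# Minimal sketch: rename the disk of 'vm2' through its disk attachment.
# Assumes `connection` and `vms_service` from the example above;
# list()[0] simply picks the first attached disk.
vm = vms_service.list(search='name=vm2')[0]
vm_service = vms_service.vm_service(vm.id)

attachments_service = vm_service.disk_attachments_service()
attachment = attachments_service.list()[0]
attachment_service = attachments_service.attachment_service(attachment.id)

attachment_service.update(
    types.DiskAttachment(
        disk=types.Disk(alias='vm2_disk_0'),
    ),
)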
6 years, 3 months
Upgrading librbd1 and librados2 for oVirt 4.2.x
by Andreas Elvers
Hello,
I am in need of upgrading librbd1 and librados2 for an oVirt 4.2.x cluster.
The cluster was installed via node ng.
Taking the repository for Ceph Mimic or Luminous will end with a dependency
problem because liburcu-cds.so.1 is already installed as a more recent version.
A hint on how to get more recent versions of the above libraries would be
very much appreciated.
Thanks,
- Andreas
6 years, 3 months
Re: losing ib0 connection after activating host
by Douglas Duckworth
Here's a link to the files:
https://bit.ly/2wjZ6Vo
Thank you!
Thanks,
Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
1300 York - LC-502
E: doug(a)med.cornell.edu
O: 212-746-6305
F: 212-746-8690
On Thu, Aug 23, 2018 at 6:51 AM, Dominik Holler <dholler(a)redhat.com> wrote:
> Would you please share the vdsm.log and the supervdsm.log from this
> host?
>
> On Wed, 22 Aug 2018 11:36:09 -0400
> Douglas Duckworth <dod2014(a)med.cornell.edu> wrote:
>
> > Hi
> >
> > I keep losing the ib0 connection on the hypervisor after adding the host
> > to the engine. This makes the host not really work, since NFS will be
> > mounted over ib0.
> >
> > I don't really understand why this occurs.
> >
> > OS:
> >
> > [root@ovirt-hv2 ~]# cat /etc/redhat-release
> > CentOS Linux release 7.5.1804 (Core)
> >
> > Here's the network script:
> >
> > [root@ovirt-hv2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
> > DEVICE=ib0
> > BOOTPROTO=static
> > IPADDR=172.16.0.207
> > NETMASK=255.255.255.0
> > ONBOOT=yes
> > ZONE=public
> >
> > When I try "ifup"
> >
> > [root@ovirt-hv2 ~]# ifup ib0
> > Error: Connection activation failed: No suitable device found for this
> > connection.
> >
> > The error in syslog:
> >
> > Aug 22 11:31:50 ovirt-hv2 kernel: IPv4: martian source 172.16.0.87
> > from 172.16.0.49, on dev ib0
> > Aug 22 11:31:53 ovirt-hv2 NetworkManager[1070]: <info>
> > [1534951913.7486] audit: op="connection-activate"
> > uuid="2ab4abde-b8a5-6cbc-19b1-2bfb193e4e89" name="System ib0"
> > result="fail" reason="No suitable device found for this connection.
> >
> > As you can see media state up:
> >
> > [root@ovirt-hv2 ~]# ip a
> > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
> > group default qlen 1000
> > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > inet 127.0.0.1/8 scope host lo
> > valid_lft forever preferred_lft forever
> > 2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> > ovirtmgmt state UP group default qlen 1000
> > link/ether 50:9a:4c:89:d3:81 brd ff:ff:ff:ff:ff:ff
> > 3: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> > DOWN group default qlen 1000
> > link/ether 50:9a:4c:89:d3:82 brd ff:ff:ff:ff:ff:ff
> > 4: p1p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> > DOWN group default qlen 1000
> > link/ether b4:96:91:13:ea:68 brd ff:ff:ff:ff:ff:ff
> > 5: p1p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state
> > DOWN group default qlen 1000
> > link/ether b4:96:91:13:ea:6a brd ff:ff:ff:ff:ff:ff
> > 6: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
> > state UNKNOWN group default qlen 1000
> > link/ether 50:9a:4c:89:d3:84 brd ff:ff:ff:ff:ff:ff
> > inet 169.254.0.2/16 brd 169.254.255.255 scope global idrac
> > valid_lft forever preferred_lft forever
> > 7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP
> > group default qlen 256
> > link/infiniband
> > a0:00:02:08:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:1d:13:41 brd
> > 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> > 8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
> > group default qlen 1000
> > link/ether 12:b4:30:22:39:5b brd ff:ff:ff:ff:ff:ff
> > 9: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group
> > default qlen 1000
> > link/ether 3e:32:e6:66:98:49 brd ff:ff:ff:ff:ff:ff
> > 25: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> > noqueue state UP group default qlen 1000
> > link/ether 50:9a:4c:89:d3:81 brd ff:ff:ff:ff:ff:ff
> > inet 10.0.0.183/16 brd 10.0.255.255 scope global ovirtmgmt
> > valid_lft forever preferred_lft forever
> > 26: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc
> > noqueue master ovs-system state UNKNOWN group default qlen 1000
> > link/ether aa:32:82:1b:01:d9 brd ff:ff:ff:ff:ff:ff
> > 27: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
> > group default qlen 1000
> > link/ether 32:ff:5d:b8:c2:b4 brd ff:ff:ff:ff:ff:ff
> >
> > The card is FDR:
> >
> > [root@ovirt-hv2 ~]# lspci -v | grep Mellanox
> > 01:00.0 Network controller: Mellanox Technologies MT27500 Family
> > [ConnectX-3]
> > Subsystem: Mellanox Technologies Device 0051
> >
> > Latest OFED driver:
> >
> > [root@ovirt-hv2 ~]# /etc/init.d/openibd status
> >
> > HCA driver loaded
> >
> > Configured IPoIB devices:
> > ib0
> >
> > Currently active IPoIB devices:
> > ib0
> > Configured Mellanox EN devices:
> >
> > Currently active Mellanox devices:
> > ib0
> >
> > The following OFED modules are loaded:
> >
> > rdma_ucm
> > rdma_cm
> > ib_ipoib
> > mlx4_core
> > mlx4_ib
> > mlx4_en
> > mlx5_core
> > mlx5_ib
> > ib_uverbs
> > ib_umad
> > ib_ucm
> > ib_cm
> > ib_core
> > mlxfw
> > mlx5_fpga_tools
> >
> > I can add an IP to ib0 using "ip addr" though I need Network Manager
> > to work with ib0.
> >
> >
> > Thanks,
> >
> > Douglas Duckworth, MSc, LFCS
> > HPC System Administrator
> > Scientific Computing Unit
> > Weill Cornell Medicine
> > 1300 York - LC-502
> > E: doug(a)med.cornell.edu
> > O: 212-746-6305
> > F: 212-746-8690
>
>
6 years, 3 months
Data Center non-responsive, storage not loading, and psql errors after importing an ova
by josh@nightscapetech.com
I posted about this a few months ago on Reddit when it happened the first time, and I have now repeated it on two more separate installations. I have a simple oVirt 4.2 hosted-engine setup, two hosts, with a QNAP NAS as the shared NFS storage. I copied an OVA file (exported from VMware) to the NFS storage. I then imported it by logging into the engine GUI, going to "Virtual machines", selecting the host/file path/datacenter/etc, and starting the import. Once I started the import, I got an alert that it failed almost immediately, followed by a notice that the data center is in a non-responsive state. Clicking on the "Storage" tab under Data Center, or going to the "Storage Domain" page directly, yields the three ". . ." loading animation, which never ends.
I can still see the storage mounts on the hosts, and I can move files to and from them.
The engine.log file on the hosted-engine VM contains many instances of the following lines:
Caused by: org.springframework.dao.DataIntegrityViolationException: PreparedStatementCallback; SQL [select * from getstorage_domains_list_by_imageid
(?)]; ERROR: integer out of range
Where: PL/pgSQL function getstorage_domains_list_by_imageid(uuid) line 3 at RETURN QUERY; nested exception is org.postgresql.util.PSQLException: ER
ROR: integer out of range
Where: PL/pgSQL function getstorage_domains_list_by_imageid(uuid) line 3 at RETURN QUERY
at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:102) [spring-jdbc.jar:4.3.
9.RELEASE]
What I have tried:
Restarting the hosted-engine service
Restarting nfs and vdsm services
Rebooting the hosted-engine
Rebooting both hosts
Entering and exiting maintenance mode (one host never came out of maintenance, and refuses to with the error "General command validation failure")
Any idea what might have happened? And what should I have done to try to rectify it? Entering maintenance mode and restarting services seems to have made things much worse.
6 years, 3 months