Try listing the domains with
virsh -r list
maybe it just has some weird name...
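
For example (the -r flag connects read-only, so it normally does not ask for
the SASL credentials vdsm sets up):

  virsh -r list --all

Adding --all also shows domains that are defined but shut off, in case the
engine VM is registered under an unexpected name.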
Martin
On Fri, Jan 12, 2018 at 6:56 PM, Jayme <jaymef@gmail.com> wrote:
> I thought that it might be a good sign but unfortunately I cannot access it
> with the console :( if I could get console access to it I might be able to fix
> the problem. But seeing as the console is also not working, it leads me to
> believe there is a bigger issue at hand here.
>
> hosted-engine --console
> The engine VM is running on this host
> error: failed to get domain 'HostedEngine'
> error: Domain not found: no domain with matching name 'HostedEngine'
>
> I really wonder if this is all a symlinking problem in some way. Is it
> possible for me to upgrade the host to 4.2 RC2 without being able to upgrade
> the engine first, or should I keep everything on 4.2 as it is?
>
> On Fri, Jan 12, 2018 at 1:49 PM, Martin Sivak <msivak@redhat.com> wrote:
>>
>> Hi,
>>
>> the VM is up according to the status (at least for a while). You
>> should be able to use the console and diagnose anything that happened
>> inside (like the need for fsck and such) now.
>>
>> Check the presence of those links again now; the metadata file content
>> is not important, but the file has to exist (the agents will populate it
>> with status data). I have no new idea about what is wrong with that
>> though.
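>>
>> A quick sanity check might look like this (the UUIDs and mount point are
>> taken from the paths in your logs, so adjust them if yours differ):
>>
>>   # the run-time symlink the broker opens
>>   ls -l /var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/
>>   # the link target on the hosted-engine storage domain
>>   ls -l /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine/248f46f0-d793-4581-9810-c9d965e2f286/images/14a20941-1b84-4b82-be8f-ace38d7c037a/
>>   # the metadata volume must exist and be readable by vdsm:kvm
>>   sudo -u vdsm dd if=/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8 bs=4096 count=1 of=/dev/null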
>>
>> Best regards
>>
>> Martin
>>
>>
>>
>> On Fri, Jan 12, 2018 at 5:47 PM, Jayme <jaymef@gmail.com> wrote:
>> > The lock space issue was something I needed to clear, but I don't believe
>> > it has resolved the problem. I shut down the agent and broker on all hosts
>> > and disconnected hosted-storage, then enabled the broker/agent on just one
>> > host and connected storage. I started the VM and barely got any errors in
>> > the logs, which was good to see; however, the VM is still not running:
>> >
>> > HOST3:
>> >
>> > Engine status : {"reason": "failed liveliness
>> > check",
>> > "health": "bad", "vm": "up", "detail": "Up"}
>> >
>> > ==> /var/log/messages <==
>> > Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered
>> > disabled
>> > state
>> > Jan 12 12:42:57 cultivar3 kernel: device vnet0 entered promiscuous mode
>> > Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered
>> > blocking
>> > state
>> > Jan 12 12:42:57 cultivar3 kernel: ovirtmgmt: port 2(vnet0) entered
>> > forwarding state
>> > Jan 12 12:42:57 cultivar3 lldpad: recvfrom(Event interface): No buffer
>> > space
>> > available
>> > Jan 12 12:42:57 cultivar3 systemd-machined: New machine
>> > qemu-111-Cultivar.
>> > Jan 12 12:42:57 cultivar3 systemd: Started Virtual Machine
>> > qemu-111-Cultivar.
>> > Jan 12 12:42:57 cultivar3 systemd: Starting Virtual Machine
>> > qemu-111-Cultivar.
>> > Jan 12 12:42:57 cultivar3 kvm: 3 guests now active
>> > Jan 12 12:44:38 cultivar3 libvirtd: 2018-01-12 16:44:38.737+0000: 1535:
>> > error : qemuDomainAgentAvailable:6010 : Guest agent is not responding:
>> > QEMU
>> > guest agent is not connected
>> >
>> > Interestingly though, now I'm seeing this in the logs which may be a new
>> > clue:
>> >
>> >
>> > ==> /var/log/vdsm/vdsm.log <==
>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 126, in findDomain
>> >     return NfsStorageDomain(NfsStorageDomain.findDomainPath(sdUUID))
>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/nfsSD.py", line 116, in findDomainPath
>> >     raise se.StorageDomainDoesNotExist(sdUUID)
>> > StorageDomainDoesNotExist: Storage domain does not exist:
>> > (u'248f46f0-d793-4581-9810-c9d965e2f286',)
>> > jsonrpc/4::ERROR::2018-01-12
>> > 12:40:30,380::dispatcher::82::storage.Dispatcher::(wrapper) FINISH
>> > getStorageDomainInfo error=Storage domain does not exist:
>> > (u'248f46f0-d793-4581-9810-c9d965e2f286',)
>> > periodic/42::ERROR::2018-01-12
>> > 12:40:35,430::api::196::root::(_getHaInfo)
>> > failed to retrieve Hosted Engine HA score '[Errno 2] No such file or
>> > directory'Is the Hosted Engine setup finished?
>> > periodic/43::ERROR::2018-01-12
>> > 12:40:50,473::api::196::root::(_getHaInfo)
>> > failed to retrieve Hosted Engine HA score '[Errno 2] No such file or
>> > directory'Is the Hosted Engine setup finished?
>> > periodic/40::ERROR::2018-01-12
>> > 12:41:05,519::api::196::root::(_getHaInfo)
>> > failed to retrieve Hosted Engine HA score '[Errno 2] No such file or
>> > directory'Is the Hosted Engine setup finished?
>> > periodic/43::ERROR::2018-01-12
>> > 12:41:20,566::api::196::root::(_getHaInfo)
>> > failed to retrieve Hosted Engine HA score '[Errno 2] No such file or
>> > directory'Is the Hosted Engine setup finished?
>> >
>> > ==> /var/log/ovirt-hosted-engine-ha/broker.log <==
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>> > line 151, in get_raw_stats
>> >     f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>> > OSError: [Errno 2] No such file or directory:
>> > '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>> > StatusStorageThread::ERROR::2018-01-12
>> > 12:32:06,049::status_broker::92::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run)
>> > Failed to read state.
>> > Traceback (most recent call last):
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py",
>> > line 88, in run
>> >     self._storage_broker.get_raw_stats()
>> >   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>> > line 162, in get_raw_stats
>> >     .format(str(e)))
>> > RequestError: failed to read metadata: [Errno 2] No such file or directory:
>> > '/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/14a20941-1b84-4b82-be8f-ace38d7c037a/8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8'
>> >
>> > On Fri, Jan 12, 2018 at 12:02 PM, Martin Sivak <msivak@redhat.com>
>> > wrote:
>> >>
>> >> The lock is the issue.
>> >>
>> >> - try running sanlock client status on all hosts
>> >> - also make sure you do not have some forgotten host still connected
>> >> to the lockspace, but without ha daemons running (and with the VM)
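>> >>
>> >> For example (run as root on each host; the lockspace name comes from the
>> >> output of the first two commands):
>> >>
>> >>   sanlock client status                      # leases/resources held by this host
>> >>   sanlock client gets                        # lockspaces this host has joined
>> >>   sanlock client host_status -s <lockspace>  # which host ids are alive in it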
>> >>
>> >> I need to go to our presidential election now; I might check the email
>> >> later tonight.
>> >>
>> >> Martin
>> >>
>> >> On Fri, Jan 12, 2018 at 4:59 PM, Jayme <jaymef@gmail.com> wrote:
>> >> > Here are the newest logs from me trying to start hosted vm:
>> >> >
>> >> > ==> /var/log/messages <==
>> >> > Jan 12 11:58:14 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> > blocking
>> >> > state
>> >> > Jan 12 11:58:14 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> > disabled
>> >> > state
>> >> > Jan 12 11:58:14 cultivar0 kernel: device vnet4 entered promiscuous
>> >> > mode
>> >> > Jan 12 11:58:14 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> > blocking
>> >> > state
>> >> > Jan 12 11:58:14 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> > forwarding state
>> >> > Jan 12 11:58:14 cultivar0 lldpad: recvfrom(Event interface): No
>> >> > buffer
>> >> > space
>> >> > available
>> >> > Jan 12 11:58:14 cultivar0 NetworkManager[1092]: <info>
>> >> > [1515772694.8715]
>> >> > manager: (vnet4): new Tun device
>> >> > (/org/freedesktop/NetworkManager/Devices/140)
>> >> > Jan 12 11:58:14 cultivar0 NetworkManager[1092]: <info>
>> >> > [1515772694.8795]
>> >> > device (vnet4): state change: unmanaged -> unavailable (reason
>> >> > 'connection-assumed') [10 20 41]
>> >> >
>> >> > ==> /var/log/libvirt/qemu/Cultivar.log <==
>> >> > 2018-01-12 15:58:14.879+0000: starting up libvirt version: 3.2.0,
>> >> > package:
>> >> > 14.el7_4.7 (CentOS BuildSystem <http://bugs.centos.org>,
>> >> > 2018-01-04-19:31:34, c1bm.rdu2.centos.org), qemu version:
>> >> > 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.13.1), hostname:
>> >> > cultivar0.grove.silverorange.com
>> >> > LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
>> >> > QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name
>> >> > guest=Cultivar,debug-threads=on -S -object
>> >> >
>> >> >
>> >> > secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-119-Cultivar/master-key.aes
>> >> > -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off
>> >> > -cpu
>> >> > Conroe -m 8192 -realtime mlock=off -smp
>> >> > 2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid
>> >> > 4013c829-c9d7-4b72-90d5-6fe58137504c -smbios
>> >> > 'type=1,manufacturer=oVirt,product=oVirt
>> >> >
>> >> >
>> >> > Node,version=7-4.1708.el7.centos,serial=44454C4C-3300-1042-8031-B4C04F4B4831,uuid=4013c829-c9d7-4b72-90d5-6fe58137504c'
>> >> > -no-user-config -nodefaults -chardev
>> >> >
>> >> >
>> >> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-119-Cultivar/monitor.sock,server,nowait
>> >> > -mon chardev=charmonitor,id=monitor,mode=control -rtc
>> >> > base=2018-01-12T15:58:14,driftfix=slew -global
>> >> > kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on
>> >> > -device
>> >> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
>> >> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive
>> >> >
>> >> >
>> >> > file=/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-fbe47c0cd705,format=raw,if=none,id=drive-virtio-disk0,serial=c2dde892-f978-4dfc-a421-c8e04cf387f9,cache=none,werror=stop,rerror=stop,aio=threads
>> >> > -device
>> >> >
>> >> >
>> >> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> >> > -drive if=none,id=drive-ide0-1-0,readonly=on -device
>> >> > ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
>> >> > tap,fd=35,id=hostnet0,vhost=on,vhostfd=38 -device
>> >> >
>> >> >
>> >> > virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:7f:d6:83,bus=pci.0,addr=0x3
>> >> > -chardev
>> >> >
>> >> >
>> >> > socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.com.redhat.rhevm.vdsm,server,nowait
>> >> > -device
>> >> >
>> >> >
>> >> > virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
>> >> > -chardev
>> >> >
>> >> >
>> >> > socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.qemu.guest_agent.0,server,nowait
>> >> > -device
>> >> >
>> >> >
>> >> > virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
>> >> > -chardev spicevmc,id=charchannel2,name=vdagent -device
>> >> >
>> >> >
>> >> > virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
>> >> > -chardev
>> >> >
>> >> >
>> >> > socket,id=charchannel3,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.ovirt.hosted-engine-setup.0,server,nowait
>> >> > -device
>> >> >
>> >> >
>> >> > virtserialport,bus=virtio-serial0.0,nr=4,chardev=charchannel3,id=channel3,name=org.ovirt.hosted-engine-setup.0
>> >> > -chardev pty,id=charconsole0 -device
>> >> > virtconsole,chardev=charconsole0,id=console0 -spice
>> >> >
>> >> >
>> >> > tls-port=5904,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,seamless-migration=on
>> >> > -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -object
>> >> > rng-random,id=objrng0,filename=/dev/urandom -device
>> >> > virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x5 -msg
>> >> > timestamp=on
>> >> >
>> >> > ==> /var/log/messages <==
>> >> > Jan 12 11:58:14 cultivar0 NetworkManager[1092]: <info>
>> >> > [1515772694.8807]
>> >> > device (vnet4): state change: unavailable -> disconnected (reason
>> >> > 'none')
>> >> > [20 30 0]
>> >> > Jan 12 11:58:14 cultivar0 systemd-machined: New machine
>> >> > qemu-119-Cultivar.
>> >> > Jan 12 11:58:14 cultivar0 systemd: Started Virtual Machine
>> >> > qemu-119-Cultivar.
>> >> > Jan 12 11:58:14 cultivar0 systemd: Starting Virtual Machine
>> >> > qemu-119-Cultivar.
>> >> >
>> >> > ==> /var/log/libvirt/qemu/Cultivar.log <==
>> >> > 2018-01-12T15:58:15.094002Z qemu-kvm: -chardev pty,id=charconsole0:
>> >> > char
>> >> > device redirected to /dev/pts/1 (label charconsole0)
>> >> >
>> >> > ==> /var/log/messages <==
>> >> > Jan 12 11:58:15 cultivar0 kvm: 5 guests now active
>> >> >
>> >> > ==> /var/log/libvirt/qemu/Cultivar.log <==
>> >> > 2018-01-12 15:58:15.217+0000: shutting down, reason=failed
>> >> >
>> >> > ==> /var/log/messages <==
>> >> > Jan 12 11:58:15 cultivar0 libvirtd: 2018-01-12 15:58:15.217+0000:
>> >> > 1908:
>> >> > error : virLockManagerSanlockAcquire:1041 : resource busy: Failed to
>> >> > acquire
>> >> > lock: Lease is held by another host
>> >> >
>> >> > ==> /var/log/libvirt/qemu/Cultivar.log <==
>> >> > 2018-01-12T15:58:15.219934Z qemu-kvm: terminating on signal 15 from
>> >> > pid
>> >> > 1773
>> >> > (/usr/sbin/libvirtd)
>> >> >
>> >> > ==> /var/log/messages <==
>> >> > Jan 12 11:58:15 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> > disabled
>> >> > state
>> >> > Jan 12 11:58:15 cultivar0 kernel: device vnet4 left promiscuous mode
>> >> > Jan 12 11:58:15 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> > disabled
>> >> > state
>> >> > Jan 12 11:58:15 cultivar0 NetworkManager[1092]: <info>
>> >> > [1515772695.2348]
>> >> > device (vnet4): state change: disconnected -> unmanaged (reason
>> >> > 'unmanaged')
>> >> > [30 10 3]
>> >> > Jan 12 11:58:15 cultivar0 NetworkManager[1092]: <info>
>> >> > [1515772695.2349]
>> >> > device (vnet4): released from master device ovirtmgmt
>> >> > Jan 12 11:58:15 cultivar0 kvm: 4 guests now active
>> >> > Jan 12 11:58:15 cultivar0 systemd-machined: Machine qemu-119-Cultivar
>> >> > terminated.
>> >> >
>> >> > ==> /var/log/vdsm/vdsm.log <==
>> >> > vm/4013c829::ERROR::2018-01-12
>> >> > 11:58:15,444::vm::914::virt.vm::(_startUnderlyingVm)
>> >> > (vmId='4013c829-c9d7-4b72-90d5-6fe58137504c') The vm start process
>> >> > failed
>> >> > Traceback (most recent call last):
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 843,
>> >> > in
>> >> > _startUnderlyingVm
>> >> > self._run()
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2721,
>> >> > in
>> >> > _run
>> >> > dom.createWithFlags(flags)
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py",
>> >> > line
>> >> > 126, in wrapper
>> >> > ret = f(*args, **kwargs)
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in
>> >> > wrapper
>> >> > return func(inst, *args, **kwargs)
>> >> > File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1069, in
>> >> > createWithFlags
>> >> > if ret == -1: raise libvirtError ('virDomainCreateWithFlags()
>> >> > failed',
>> >> > dom=self)
>> >> > libvirtError: resource busy: Failed to acquire lock: Lease is held by
>> >> > another host
>> >> > jsonrpc/6::ERROR::2018-01-12
>> >> > 11:58:16,421::__init__::611::jsonrpc.JsonRpcServer::(_handle_request)
>> >> > Internal server error
>> >> > Traceback (most recent call last):
>> >> > File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
>> >> > 606,
>> >> > in _handle_request
>> >> > res = method(**params)
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
>> >> > 201,
>> >> > in
>> >> > _dynamicMethod
>> >> > result = fn(*methodArgs)
>> >> > File "<string>", line 2, in getAllVmIoTunePolicies
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line
>> >> > 48,
>> >> > in
>> >> > method
>> >> > ret = func(*args, **kwargs)
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in
>> >> > getAllVmIoTunePolicies
>> >> > io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 524,
>> >> > in
>> >> > getAllVmIoTunePolicies
>> >> > 'current_values': v.getIoTune()}
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3481,
>> >> > in
>> >> > getIoTune
>> >> > result = self.getIoTuneResponse()
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3500,
>> >> > in
>> >> > getIoTuneResponse
>> >> > res = self._dom.blockIoTune(
>> >> > File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py",
>> >> > line
>> >> > 47,
>> >> > in __getattr__
>> >> > % self.vmid)
>> >> > NotConnectedError: VM '4013c829-c9d7-4b72-90d5-6fe58137504c' was not
>> >> > defined
>> >> > yet or was undefined
>> >> >
>> >> > ==> /var/log/messages <==
>> >> > Jan 12 11:58:16 cultivar0 journal: vdsm jsonrpc.JsonRpcServer ERROR
>> >> > Internal
>> >> > server error#012Traceback (most recent call last):#012 File
>> >> > "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606,
>> >> > in
>> >> > _handle_request#012 res = method(**params)#012 File
>> >> > "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 201, in
>> >> > _dynamicMethod#012 result = fn(*methodArgs)#012 File "<string>",
>> >> > line 2,
>> >> > in getAllVmIoTunePolicies#012 File
>> >> > "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in
>> >> > method#012 ret = func(*args, **kwargs)#012 File
>> >> > "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1354, in
>> >> > getAllVmIoTunePolicies#012 io_tune_policies_dict =
>> >> > self._cif.getAllVmIoTunePolicies()#012 File
>> >> > "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 524, in
>> >> > getAllVmIoTunePolicies#012 'current_values': v.getIoTune()}#012
>> >> > File
>> >> > "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3481, in
>> >> > getIoTune#012 result = self.getIoTuneResponse()#012 File
>> >> > "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3500, in
>> >> > getIoTuneResponse#012 res = self._dom.blockIoTune(#012 File
>> >> > "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47,
>> >> > in
>> >> > __getattr__#012 % self.vmid)#012NotConnectedError: VM
>> >> > '4013c829-c9d7-4b72-90d5-6fe58137504c' was not defined yet or was
>> >> > undefined
>> >> >
>> >> > On Fri, Jan 12, 2018 at 11:55 AM, Jayme <jaymef@gmail.com> wrote:
>> >> >>
>> >> >> One other tidbit I noticed is that there seem to be fewer errors
>> >> >> if I start it in paused mode:
>> >> >>
>> >> >> but still shows: Engine status : {"reason":
>> >> >> "bad
>> >> >> vm
>> >> >> status", "health": "bad", "vm": "up", "detail": "Paused"}
>> >> >>
>> >> >> ==> /var/log/messages <==
>> >> >> Jan 12 11:55:05 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> >> blocking state
>> >> >> Jan 12 11:55:05 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> >> disabled state
>> >> >> Jan 12 11:55:05 cultivar0 kernel: device vnet4 entered promiscuous
>> >> >> mode
>> >> >> Jan 12 11:55:05 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> >> blocking state
>> >> >> Jan 12 11:55:05 cultivar0 kernel: ovirtmgmt: port 6(vnet4) entered
>> >> >> forwarding state
>> >> >> Jan 12 11:55:05 cultivar0 lldpad: recvfrom(Event interface): No
>> >> >> buffer
>> >> >> space available
>> >> >> Jan 12 11:55:05 cultivar0 NetworkManager[1092]: <info>
>> >> >> [1515772505.3625]
>> >> >> manager: (vnet4): new Tun device
>> >> >> (/org/freedesktop/NetworkManager/Devices/139)
>> >> >>
>> >> >> ==> /var/log/libvirt/qemu/Cultivar.log <==
>> >> >> 2018-01-12 15:55:05.370+0000: starting up libvirt version: 3.2.0,
>> >> >> package:
>> >> >> 14.el7_4.7 (CentOS BuildSystem <http://bugs.centos.org>,
>> >> >> 2018-01-04-19:31:34, c1bm.rdu2.centos.org), qemu version:
>> >> >> 2.9.0(qemu-kvm-ev-2.9.0-16.el7_4.13.1), hostname:
>> >> >> cultivar0.grove.silverorange.com
>> >> >> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
>> >> >> QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name
>> >> >> guest=Cultivar,debug-threads=on -S -object
>> >> >>
>> >> >>
>> >> >> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-118-Cultivar/master-key.aes
>> >> >> -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off
>> >> >> -cpu
>> >> >> Conroe -m 8192 -realtime mlock=off -smp
>> >> >> 2,maxcpus=16,sockets=16,cores=1,threads=1 -uuid
>> >> >> 4013c829-c9d7-4b72-90d5-6fe58137504c -smbios
>> >> >> 'type=1,manufacturer=oVirt,product=oVirt
>> >> >>
>> >> >>
>> >> >> Node,version=7-4.1708.el7.centos,serial=44454C4C-3300-1042-8031-B4C04F4B4831,uuid=4013c829-c9d7-4b72-90d5-6fe58137504c'
>> >> >> -no-user-config -nodefaults -chardev
>> >> >>
>> >> >>
>> >> >> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-118-Cultivar/monitor.sock,server,nowait
>> >> >> -mon chardev=charmonitor,id=monitor,mode=control -rtc
>> >> >> base=2018-01-12T15:55:05,driftfix=slew -global
>> >> >> kvm-pit.lost_tick_policy=delay -no-hpet -no-reboot -boot strict=on
>> >> >> -device
>> >> >> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
>> >> >> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive
>> >> >>
>> >> >>
>> >> >> file=/var/run/vdsm/storage/248f46f0-d793-4581-9810-c9d965e2f286/c2dde892-f978-4dfc-a421-c8e04cf387f9/23aa0a66-fa6c-4967-a1e5-fbe47c0cd705,format=raw,if=none,id=drive-virtio-disk0,serial=c2dde892-f978-4dfc-a421-c8e04cf387f9,cache=none,werror=stop,rerror=stop,aio=threads
>> >> >> -device
>> >> >>
>> >> >>
>> >> >> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> >> >> -drive if=none,id=drive-ide0-1-0,readonly=on -device
>> >> >> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev
>> >> >> tap,fd=35,id=hostnet0,vhost=on,vhostfd=38 -device
>> >> >>
>> >> >>
>> >> >> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:7f:d6:83,bus=pci.0,addr=0x3
>> >> >> -chardev
>> >> >>
>> >> >>
>> >> >> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.com.redhat.rhevm.vdsm,server,nowait
>> >> >> -device
>> >> >>
>> >> >>
>> >> >> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
>> >> >> -chardev
>> >> >>
>> >> >>
>> >> >> socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.qemu.guest_agent.0,server,nowait
>> >> >> -device
>> >> >>
>> >> >>
>> >> >> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
>> >> >> -chardev spicevmc,id=charchannel2,name=vdagent -device
>> >> >>
>> >> >>
>> >> >> virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0
>> >> >> -chardev
>> >> >>
>> >> >>
>> >> >> socket,id=charchannel3,path=/var/lib/libvirt/qemu/channels/4013c829-c9d7-4b72-90d5-6fe58137504c.org.ovirt.hosted-engine-setup.0,server,nowait
>> >> >> -device
>> >> >>
>> >> >>
>> >> >> virtserialport,bus=virtio-serial0.0,nr=4,chardev=charchannel3,id=channel3,name=org.ovirt.hosted-engine-setup.0
>> >> >> -chardev pty,id=charconsole0 -device
>> >> >> virtconsole,chardev=charconsole0,id=console0 -spice
>> >> >>
>> >> >>
>> >> >> tls-port=5904,addr=0,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=default,seamless-migration=on
>> >> >> -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -object
>> >> >> rng-random,id=objrng0,filename=/dev/urandom -device
>> >> >> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x5 -msg
>> >> >> timestamp=on
>> >> >>
>> >> >> ==> /var/log/messages <==
>> >> >> Jan 12 11:55:05 cultivar0 NetworkManager[1092]: <info>
>> >> >> [1515772505.3689]
>> >> >> device (vnet4): state change: unmanaged -> unavailable (reason
>> >> >> 'connection-assumed') [10 20 41]
>> >> >> Jan 12 11:55:05 cultivar0 NetworkManager[1092]: <info>
>> >> >> [1515772505.3702]
>> >> >> device (vnet4): state change: unavailable -> disconnected (reason
>> >> >> 'none')
>> >> >> [20 30 0]
>> >> >> Jan 12 11:55:05 cultivar0 systemd-machined: New machine
>> >> >> qemu-118-Cultivar.
>> >> >> Jan 12 11:55:05 cultivar0 systemd: Started Virtual Machine
>> >> >> qemu-118-Cultivar.
>> >> >> Jan 12 11:55:05 cultivar0 systemd: Starting Virtual Machine
>> >> >> qemu-118-Cultivar.
>> >> >>
>> >> >> ==> /var/log/libvirt/qemu/Cultivar.log <==
>> >> >> 2018-01-12T15:55:05.586827Z qemu-kvm: -chardev pty,id=charconsole0:
>> >> >> char
>> >> >> device redirected to /dev/pts/1 (label charconsole0)
>> >> >>
>> >> >> ==> /var/log/messages <==
>> >> >> Jan 12 11:55:05 cultivar0 kvm: 5 guests now active
>> >> >>
>> >> >> On Fri, Jan 12, 2018 at 11:36 AM, Jayme <jaymef@gmail.com> wrote:
>> >> >>>
>> >> >>> Yeah I am in global maintenance:
>> >> >>>
>> >> >>> state=GlobalMaintenance
>> >> >>>
>> >> >>> host0: {"reason": "vm not running on this host", "health": "bad",
>> >> >>> "vm":
>> >> >>> "down", "detail": "unknown"}
>> >> >>> host2: {"reason": "vm not running on this host", "health": "bad",
>> >> >>> "vm":
>> >> >>> "down", "detail": "unknown"}
>> >> >>> host3: {"reason": "vm not running on this host", "health": "bad",
>> >> >>> "vm":
>> >> >>> "down", "detail": "unknown"}
>> >> >>>
>> >> >>> I understand the lock is an issue; I'll try to make sure it is fully
>> >> >>> stopped on all three before starting, but I don't think that is the
>> >> >>> issue at hand either. What concerns me most is that it seems to be
>> >> >>> unable to read the metadata; I think that might be the heart of the
>> >> >>> problem, but I'm not sure what is causing it.
>> >> >>>
>> >> >>> On Fri, Jan 12, 2018 at 11:33 AM, Martin Sivak <msivak@redhat.com>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> > On all three hosts I ran hosted-engine --vm-shutdown;
>> >> >>>> > hosted-engine
>> >> >>>> > --vm-poweroff
>> >> >>>>
>> >> >>>> Are you in global maintenance? I think you were in one of the
>> >> >>>> previous
>> >> >>>> emails, but worth checking.
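>> >> >>>>
>> >> >>>> (A quick way to confirm: hosted-engine --vm-status on any host should
>> >> >>>> report state=GlobalMaintenance while it is enabled.)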
>> >> >>>>
>> >> >>>> > I started ovirt-ha-broker with systemctl as root user but it
>> >> >>>> > does
>> >> >>>> > appear to be running under vdsm:
>> >> >>>>
>> >> >>>> That is the correct behavior.
>> >> >>>>
>> >> >>>> > libvirtError: resource busy: Failed to acquire lock: Lease is
>> >> >>>> > held
>> >> >>>> > by
>> >> >>>> > another host
>> >> >>>>
>> >> >>>> sanlock seems to think the VM runs somewhere and it is possible
>> >> >>>> that
>> >> >>>> some other host tried to start the VM as well unless you are in
>> >> >>>> global
>> >> >>>> maintenance (that is why I asked the first question here).
>> >> >>>>
>> >> >>>> Martin
>> >> >>>>
>> >> >>>> On Fri, Jan 12, 2018 at 4:28 PM, Jayme <jaymef@gmail.com> wrote:
>> >> >>>> > Martin,
>> >> >>>> >
>> >> >>>> > Thanks so much for sticking with me, this is driving me crazy! I
>> >> >>>> > really do appreciate it, thanks again.
>> >> >>>> >
>> >> >>>> > Let's go through this:
>> >> >>>> >
>> >> >>>> > HE VM is down - YES
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > HE agent fails when opening metadata using the symlink - YES
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > the symlink is there and readable by vdsm:kvm - it appears to
>> >> >>>> > be:
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > lrwxrwxrwx. 1 vdsm kvm 159 Jan 10 21:20
>> >> >>>> > 14a20941-1b84-4b82-be8f-ace38d7c037a
>> >> >>>> > ->
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine/248f46f0-d793-4581-9810-c9d965e2f286/images/14a20941-1b84-4b82-be8f-ace38d7c037a
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > And the files in the linked directory exist and have vdsm:kvm
>> >> >>>> > perms
>> >> >>>> > as
>> >> >>>> > well:
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > # cd
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > /rhev/data-center/mnt/cultivar0.grove.silverorange.com:_exports_hosted__engine/248f46f0-d793-4581-9810-c9d965e2f286/images/14a20941-1b84-4b82-be8f-ace38d7c037a
>> >> >>>> >
>> >> >>>> > [root@cultivar0 14a20941-1b84-4b82-be8f-ace38d7c037a]# ls -al
>> >> >>>> >
>> >> >>>> > total 2040
>> >> >>>> >
>> >> >>>> > drwxr-xr-x. 2 vdsm kvm 4096 Jan 12 10:51 .
>> >> >>>> >
>> >> >>>> > drwxr-xr-x. 8 vdsm kvm 4096 Feb 3 2016 ..
>> >> >>>> >
>> >> >>>> > -rw-rw----. 1 vdsm kvm 1028096 Jan 12 11:19
>> >> >>>> > 8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8
>> >> >>>> >
>> >> >>>> > -rw-rw----. 1 vdsm kvm 1048576 Feb 3 2016
>> >> >>>> > 8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8.lease
>> >> >>>> >
>> >> >>>> > -rw-r--r--. 1 vdsm kvm 283 Feb 3 2016
>> >> >>>> > 8582bdfc-ef54-47af-9f1e-f5b7ec1f1cf8.meta
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > I started ovirt-ha-broker with systemctl as root user but it
>> >> >>>> > does
>> >> >>>> > appear to
>> >> >>>> > be running under vdsm:
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > vdsm 16928 0.6 0.0 1618244 43328 ? Ssl 10:33 0:18
>> >> >>>> > /usr/bin/python
>> >> >>>> > /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > Here is something I tried:
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > - On all three hosts I ran hosted-engine --vm-shutdown;
>> >> >>>> > hosted-engine
>> >> >>>> > --vm-poweroff
>> >> >>>> >
>> >> >>>> > - On HOST0 (cultivar0) I disconnected and reconnected storage
>> >> >>>> > using
>> >> >>>> > hosted-engine
>> >> >>>> >
>> >> >>>> > - Tried starting up the hosted VM on cultivar0 while tailing the
>> >> >>>> > logs:
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > # hosted-engine --vm-start
>> >> >>>> >
>> >> >>>> > VM exists and is down, cleaning up and restarting
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > ==> /var/log/vdsm/vdsm.log <==
>> >> >>>> >
>> >> >>>> > jsonrpc/2::ERROR::2018-01-12
>> >> >>>> > 11:27:27,194::vm::1766::virt.vm::(_getRunningVmStats)
>> >> >>>> > (vmId='4013c829-c9d7-4b72-90d5-6fe58137504c') Error fetching vm
>> >> >>>> > stats
>> >> >>>> >
>> >> >>>> > Traceback (most recent call last):
>> >> >>>> >
>> >> >>>> > File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line
>> >> >>>> > 1762,
>> >> >>>> > in
>> >> >>>> > _getRunningVmStats
>> >> >>>> >
>> >> >>>> > vm_sample.interval)
>> >> >>>> >
>> >> >>>> > File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py",
>> >> >>>> > line
>> >> >>>> > 45, in
>> >> >>>> > produce
>> >> >>>> >
>> >> >>>> > networks(vm, stats, first_sample, last_sample, interval)
>> >> >>>> >
>> >> >>>> > File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py",
>> >> >>>> > line
>> >> >>>> > 322, in
>> >> >>>> > networks
>> >> >>>> >
>> >> >>>> > if nic.name.startswith('hostdev'):
>> >> >>>> >
>> >> >>>> > AttributeError: name
>> >> >>>> >
>> >> >>>> > jsonrpc/3::ERROR::2018-01-12
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > 11:27:27,221::__init__::611::jsonrpc.JsonRpcServer::(_handle_request)
>> >> >>>> > Internal server error
>> >> >>>> >
>> >> >>>> > Traceback (most recent call last):
>> >> >>>> >
>> >> >>>> > File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py",