Thanks Ravi.
it seems there are two problems.
1. the host deploy failure
2. random failures caused by host monitoring exception.
Please note that master is broken for ovirt-engine and we need to stop
merging until these issues are fixed.
On Mon, Sep 10, 2018 at 8:48 PM, Ravi Shankar Nori <rnori(a)redhat.com> wrote:
VdsDeploy is failing with message
Failed to execute stage 'Initializing': Invalid response opcode
'CONFIRM'
The SQL exceptions occur when trying to create an auditlog event with the
above message which is a separate issue.
On Mon, Sep 10, 2018 at 7:41 AM, Dafna Ron <dron(a)redhat.com> wrote:
> I am seeing a failure in 4.2 which suggests the same issue is causing
> failure in 4.2 as well (on different test)
>
> full logs:
>
>
https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/
> 3069/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-4
> .2/post-006_migrations.py/
>
> error:
>
> 2018-09-10 05:27:19,084-0400 INFO (jsonrpc/4) [api.virt] FINISH migrate
> return={'status': {'message': 'Migration in progress',
'code': 0},
> 'progress': 0} from=::ffff:192.168.200.2,36202, flow_id=107cdbdd,
> vmId=9d5e4d63-41ad-47eb-bf04-0fc
> 8a49d59a6 (api:52)
> 2018-09-10 05:27:19,085-0400 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer]
> RPC call VM.migrate succeeded in 0.01 seconds (__init__:573)
> 2018-09-10 05:27:21,583-0400 INFO (migsrc/9d5e4d63) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Creation of destination VM
> took: 2 seconds (migration:473)
> 2018-09-10 05:27:21,583-0400 INFO (migsrc/9d5e4d63) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') starting migration to
> qemu+tls://lago-basic-suite-4-2-host-1/system with miguri
> tcp://lago-basic-suite-4-2-host-1 (migration:502)
> 2018-09-10 05:27:25,044-0400 ERROR (periodic/0) [virt.periodic.Operation]
> <vdsm.virt.sampling.HostMonitor object at 0x7f0ddfc2b190> operation
> failed (periodic:222)
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line
> 220, in __call__
> self._func()
> File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
> 576, in __call__
> sample = HostSample(self._pid)
> File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
> 240, in __init__
> self.interfaces = _get_interfaces_and_samples()
> File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
> 202, in _get_interfaces_and_samples
> links_and_samples[link.name] = InterfaceSample(link)
> File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
> 109, in __init__
> self.speed = _getLinkSpeed(link)
> File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line
> 590, in _getLinkSpeed
> speed = vlan.speed(dev.name)
> File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py",
> line 35, in speed
> dev_speed = nic.read_speed_using_sysfs(dev_name)
> File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line
> 47, in read_speed_using_sysfs
> s = int(f.read())
> IOError: [Errno 22] Invalid argument
> 2018-09-10 05:27:25,499-0400 INFO (libvirt/events) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') CPU stopped: onSuspend
> (vm:6203)
> 2018-09-10 05:27:27,167-0400 INFO (migsrc/9d5e4d63) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') migration took 7 seconds
> to complete (migration:514)
> 2018-09-10 05:27:27,168-0400 INFO (migsrc/9d5e4d63) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Changed state to Down:
> Migration succeeded (code=4) (vm:1693)
> 2018-09-10 05:27:27,176-0400 INFO (migsrc/9d5e4d63) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Stopping connection
> (guestagent:438)
> 2018-09-10 05:27:27,183-0400 INFO (jsonrpc/6) [api.virt] START
> destroy(gracefulAttempts=1) from=::ffff:192.168.200.2,36202,
> vmId=9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6 (api:46)
> 2018-09-10 05:27:27,183-0400 INFO (jsonrpc/6) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Release VM resources
> (vm:5283)
> 2018-09-10 05:27:27,183-0400 WARN (jsonrpc/6) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') trying to set state to
> Powering down when already Down (vm:612)
> 2018-09-10 05:27:27,183-0400 INFO (jsonrpc/6) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Stopping connection
> (guestagent:438)
> 2018-09-10 05:27:27,183-0400 INFO (jsonrpc/6) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') _destroyVmGraceful attempt
> #0 (vm:5320)
> 2018-09-10 05:27:27,184-0400 WARN (jsonrpc/6) [virt.vm]
> (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') VM
> '9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6' couldn't be destroyed in libvirt:
> Requested operation is not valid: domain is not running (vm
> :5341)
> 2018-09-10 05:27:27,185-0400 INFO (jsonrpc/6) [vdsm.api] START
> teardownImage(sdUUID='8cbc7539-1ab3-4a3e-9504-65b5002c6546',
> spUUID='3b4effaf-6bad-42b8-810b-80faaa5d5a1e',
> imgUUID='90402f78-6c84-44cf-bcdc-7823f252d185', volUUID=None) from
> =::ffff:192.168.200.2,36202, task_id=73c097e0-6cb6-449e-bebe-f0dfa55e2155
> (api:46)
> 2018-09-10 05:27:27,186-0400 INFO (jsonrpc/6) [storage.StorageDomain]
> Removing image run directory '/var/run/vdsm/storage/8cbc753
> 9-1ab3-4a3e-9504-65b5002c6546/90402f78-6c84-44cf-bcdc-7823f252d185'
> (blockSD:1360)
> 2018-09-10 05:27:27,186-0400 INFO (jsonrpc/6) [storage.fileUtils]
> Removing directory: /var/run/vdsm/storage/8cbc7539
> -1ab3-4a3e-9504-65b5002c6546/90402f78-6c84-44cf-bcdc-7823f252d185
> (fileUtils:178)
> 2018-09-10 05:27:27,325-0400 INFO (jsonrpc/6) [storage.LVM] Deactivating
> lvs: vg=8cbc7539-1ab3-4a3e-9504-65b5002c6546
> lvs=['715b9383-87ec-4381-aaff-f084df42b3ce',
> 'ba23b2ac-90db-41b7-ab0f-c49f0c4d1b1e', '9201288a-918b-41a8-8ff2-66438
> 33e1
> d91'] (lvm:1310)
>
https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/
> 3069/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-4
> .2/post-006_migrations.py/
>
> On Mon, Sep 10, 2018 at 11:03 AM, Dafna Ron <dron(a)redhat.com> wrote:
>
>> Hi,
>>
>> we had a failure to deploy hosts for change;
>>
https://gerrit.ovirt.org/#/c/94228/1
>>
>> I don't think its related directly to the change but anything between
>> packaging: use system provided nimbus-jose-jwt also on el7 patch and
>> current patch could have caused the failure.
>>
>>
https://gerrit.ovirt.org/#/q/project:ovirt-engine+status:Mer
>> ged+branch:master
>>
>> Sandro, Ondra, can you please have a look?
>>
>> (Relevant) error snippet from the log:
>>
>> <error>
>>
>> 018-09-10 04:05:33,850-04 INFO
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Installation of
>> lago-upgrade-from-release-suite-master-host-0. Executing command via
>> SSH umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
>> ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" >
/dev/null 2>&1; rm
>> -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar -b1
--warning=no-timestamp -C
>> "${MYTMP}" -x && "${MYTMP}"/ovirt-host-deploy
DIALOG/dialect=str:machine
>> DIALOG/customization=bool:True < /var/cache/ovirt-engine/ovirt-
>> host-deploy.tar
>> 2018-09-10 04:05:33,850-04 INFO
[org.ovirt.engine.core.utils.archivers.tar.CachedTar]
>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Tarball
>> '/var/cache/ovirt-engine/ovirt-host-deploy.tar' refresh
>> 2018-09-10 04:05:33,915-04 INFO [org.ovirt.engine.core.uutils.ssh.SSHDialog]
>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] SSH execute
>> 'root@lago-upgrade-from-release-suite-master-host-0' 'umask 0077;
>> MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t
ovirt-XXXXXXXXXX)"; trap
>> "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr
\"${MYTMP}\" >
>> /dev/null 2>&1" 0; tar -b1 --warning=no-timestamp -C
"${MYTMP}" -x &&
>> "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine
>> DIALOG/customization=bool:True'
>> 2018-09-10 04:05:34,694-04 INFO [org.ovirt.engine.core.dal.dbb
>> roker.auditloghandling.AuditLogDirector] (VdsDeploy) [33bb53b]
>> EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host
>> lago-upgrade-from-release-suite-master-host-0. Stage: Initializing.
>> 2018-09-10 04:05:34,710-04 INFO
[org.ovirt.engine.core.utils.transaction.TransactionSupport]
>> (VdsDeploy) [33bb53b] transaction rolled back
>> 2018-09-10 04:05:34,719-04 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>> (VdsDeploy) [33bb53b] Error during deploy dialog
>> 2018-09-10 04:05:34,723-04 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Error during host
>> lago-upgrade-from-release-suite-master-host-0 install
>> 2018-09-10 04:05:34,736-04 ERROR [org.ovirt.engine.core.dal.dbb
>> roker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-1)
>> [33bb53b] EVENT_ID: VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has
>> occurred during installation of Host
lago-upgrade-from-release-suite-master-host-0:
>> CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?,
>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>> ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested
>> exception is org.postgresql.util.PSQLException: ERROR: invalid byte
>> sequence for encoding "UTF8": 0x00.
>> 2018-09-10 04:05:34,737-04 ERROR
[org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase]
>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Error during host
>> lago-upgrade-from-release-suite-master-host-0 install, preferring first
>> exception: CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?,
>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>> ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00;
>> nested exception is org.postgresql.util.PSQLException: ERROR: invalid
>> byte sequence for encoding "UTF8": 0x00
>> 2018-09-10 04:05:34,737-04 ERROR [org.ovirt.engine.core.bll.hos
>> tdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1)
>> [33bb53b] Host installation failed for host
'febee66d-fa4b-4321-9c99-cb053ddd4817',
>> 'lago-upgrade-from-release-suite-master-host-0':
>> CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?,
>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>> ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested
>> exception is org.postgresql.util.PSQLException: ERROR: invalid byte
>> sequence for encoding "UTF8": 0x00
>> 2018-09-10 04:05:34,760-04 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] START,
>> SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0,
>>
SetVdsStatusVDSCommandParameters:{hostId='febee66d-fa4b-4321-9c99-cb053ddd4817',
>> status='InstallFailed', nonOperationalReason='NONE',
>> stopSpmFailureLogged='false', maintenanceReason='null'}), log id:
793ad1ed
>> 2018-09-10 04:05:34,769-04 INFO
[org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand]
>> (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] FINISH,
>> SetVdsStatusVDSCommand, return: , log id: 793ad1ed
>> 2018-09-10 04:05:34,781-04 ERROR [org.ovirt.engine.core.dal.dbb
>> roker.auditloghandling.AuditLogDirector]
(EE-ManagedThreadFactory-engine-Thread-1)
>> [33bb53b] EVENT_ID: VDS_INSTALL_FAILED(505), Host
>> lago-upgrade-from-release-suite-master-host-0 installation failed.
>> CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?,
>> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
>> ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested
>> exception is org.postgresql.util.PSQLException: ERROR: invalid byte
>> sequence for encoding "UTF8": 0x00.
>> 2018-09-10 04:05:34,794-04 INFO [org.ovirt.engine.core.bll.hos
>> tdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1)
>> [33bb53b] Lock freed to object 'EngineLock:{exclusiveLocks='[
>> febee66d-fa4b-4321-9c99-cb053ddd4817=VDS]', sharedLocks=''}'
>> 2018-09-10 04:06:10,003-04 INFO [org.ovirt.engine.core.bll.quota.QuotaManager]
>> (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] Quota Cache
>> updated. (20 msec)
>> (END)
>>
>>
>> </error>
>>
>>
>
> _______________________________________________
> Infra mailing list -- infra(a)ovirt.org
> To unsubscribe send an email to infra-leave(a)ovirt.org
> Privacy Statement:
https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
https://www.ovirt.org/communit
> y/about/community-guidelines/
> List Archives:
https://lists.ovirt.org/archiv
> es/list/infra(a)ovirt.org/message/TZGEPC56JYKKCOZ5XODOJMVLBR7MI6KG/
>
>