Thanks Ravi.
it seems there are two problems.
1. the host deploy failure
2. random failures caused by host monitoring exception.

Please note that master is broken for ovirt-engine and we need to stop merging until these issues are fixed.



On Mon, Sep 10, 2018 at 8:48 PM, Ravi Shankar Nori <rnori@redhat.com> wrote:
VdsDeploy is failing with message

Failed to execute stage 'Initializing': Invalid response opcode 'CONFIRM'

The SQL exceptions occur when trying to create an auditlog event with the above message which is a separate issue.

On Mon, Sep 10, 2018 at 7:41 AM, Dafna Ron <dron@redhat.com> wrote:
I am seeing a failure in 4.2 which suggests the same issue is causing failure in 4.2 as well (on different test)

full logs:

https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3069/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-4.2/post-006_migrations.py/

error:

2018-09-10 05:27:19,084-0400 INFO  (jsonrpc/4) [api.virt] FINISH migrate return={'status': {'message': 'Migration in progress', 'code': 0}, 'progress': 0} from=::ffff:192.168.200.2,36202, flow_id=107cdbdd, vmId=9d5e4d63-41ad-47eb-bf04-0fc
8a49d59a6 (api:52)
2018-09-10 05:27:19,085-0400 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call VM.migrate succeeded in 0.01 seconds (__init__:573)
2018-09-10 05:27:21,583-0400 INFO  (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Creation of destination VM took: 2 seconds (migration:473)
2018-09-10 05:27:21,583-0400 INFO  (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') starting migration to qemu+tls://lago-basic-suite-4-2-host-1/system with miguri tcp://lago-basic-suite-4-2-host-1 (migration:502)
2018-09-10 05:27:25,044-0400 ERROR (periodic/0) [virt.periodic.Operation] <vdsm.virt.sampling.HostMonitor object at 0x7f0ddfc2b190> operation failed (periodic:222)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 220, in __call__
    self._func()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 576, in __call__
    sample = HostSample(self._pid)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 240, in __init__
    self.interfaces = _get_interfaces_and_samples()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 202, in _get_interfaces_and_samples
    links_and_samples[link.name] = InterfaceSample(link)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 109, in __init__
    self.speed = _getLinkSpeed(link)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 590, in _getLinkSpeed
    speed = vlan.speed(dev.name)
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/vlan.py", line 35, in speed
    dev_speed = nic.read_speed_using_sysfs(dev_name)
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/nic.py", line 47, in read_speed_using_sysfs
    s = int(f.read())
IOError: [Errno 22] Invalid argument
2018-09-10 05:27:25,499-0400 INFO  (libvirt/events) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') CPU stopped: onSuspend (vm:6203)
2018-09-10 05:27:27,167-0400 INFO  (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') migration took 7 seconds to complete (migration:514)
2018-09-10 05:27:27,168-0400 INFO  (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Changed state to Down: Migration succeeded (code=4) (vm:1693)
2018-09-10 05:27:27,176-0400 INFO  (migsrc/9d5e4d63) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Stopping connection (guestagent:438)
2018-09-10 05:27:27,183-0400 INFO  (jsonrpc/6) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.200.2,36202, vmId=9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6 (api:46)
2018-09-10 05:27:27,183-0400 INFO  (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Release VM resources (vm:5283)
2018-09-10 05:27:27,183-0400 WARN  (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') trying to set state to Powering down when already Down (vm:612)
2018-09-10 05:27:27,183-0400 INFO  (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') Stopping connection (guestagent:438)
2018-09-10 05:27:27,183-0400 INFO  (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') _destroyVmGraceful attempt #0 (vm:5320)
2018-09-10 05:27:27,184-0400 WARN  (jsonrpc/6) [virt.vm] (vmId='9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6') VM '9d5e4d63-41ad-47eb-bf04-0fc8a49d59a6' couldn't be destroyed in libvirt: Requested operation is not valid: domain is not running (vm
:5341)
2018-09-10 05:27:27,185-0400 INFO  (jsonrpc/6) [vdsm.api] START teardownImage(sdUUID='8cbc7539-1ab3-4a3e-9504-65b5002c6546', spUUID='3b4effaf-6bad-42b8-810b-80faaa5d5a1e', imgUUID='90402f78-6c84-44cf-bcdc-7823f252d185', volUUID=None) from
=::ffff:192.168.200.2,36202, task_id=73c097e0-6cb6-449e-bebe-f0dfa55e2155 (api:46)
2018-09-10 05:27:27,186-0400 INFO  (jsonrpc/6) [storage.StorageDomain] Removing image run directory '/var/run/vdsm/storage/8cbc7539-1ab3-4a3e-9504-65b5002c6546/90402f78-6c84-44cf-bcdc-7823f252d185' (blockSD:1360)
2018-09-10 05:27:27,186-0400 INFO  (jsonrpc/6) [storage.fileUtils] Removing directory: /var/run/vdsm/storage/8cbc7539-1ab3-4a3e-9504-65b5002c6546/90402f78-6c84-44cf-bcdc-7823f252d185 (fileUtils:178)
2018-09-10 05:27:27,325-0400 INFO  (jsonrpc/6) [storage.LVM] Deactivating lvs: vg=8cbc7539-1ab3-4a3e-9504-65b5002c6546 lvs=['715b9383-87ec-4381-aaff-f084df42b3ce', 'ba23b2ac-90db-41b7-ab0f-c49f0c4d1b1e', '9201288a-918b-41a8-8ff2-6643833e1
d91'] (lvm:1310)
https://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/3069/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-4.2/post-006_migrations.py/

On Mon, Sep 10, 2018 at 11:03 AM, Dafna Ron <dron@redhat.com> wrote:
Hi,

we had a failure to deploy hosts for change; https://gerrit.ovirt.org/#/c/94228/1

I don't think its related directly to the change but anything between
packaging: use system provided nimbus-jose-jwt also on el7 patch and current patch could have caused the failure.

https://gerrit.ovirt.org/#/q/project:ovirt-engine+status:Merged+branch:master

Sandro, Ondra, can you please have a look?

(Relevant) error snippet from the log:

<error>


018-09-10 04:05:33,850-04 INFO  [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Installation of lago-upgrade-from-release-suite-master-host-0. Executing command via SSH umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar -b1 --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True < /var/cache/ovirt-engine/ovirt-host-deploy.tar
2018-09-10 04:05:33,850-04 INFO  [org.ovirt.engine.core.utils.archivers.tar.CachedTar] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Tarball '/var/cache/ovirt-engine/ovirt-host-deploy.tar' refresh
2018-09-10 04:05:33,915-04 INFO  [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] SSH execute 'root@lago-upgrade-from-release-suite-master-host-0' 'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar -b1 --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True'
2018-09-10 04:05:34,694-04 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [33bb53b] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host lago-upgrade-from-release-suite-master-host-0. Stage: Initializing.
2018-09-10 04:05:34,710-04 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (VdsDeploy) [33bb53b] transaction rolled back
2018-09-10 04:05:34,719-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (VdsDeploy) [33bb53b] Error during deploy dialog
2018-09-10 04:05:34,723-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Error during host lago-upgrade-from-release-suite-master-host-0 install
2018-09-10 04:05:34,736-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] EVENT_ID: VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has occurred during installation of Host lago-upgrade-from-release-suite-master-host-0: CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested exception is org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00.
2018-09-10 04:05:34,737-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Error during host lago-upgrade-from-release-suite-master-host-0 install, preferring first exception: CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested exception is org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
2018-09-10 04:05:34,737-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Host installation failed for host 'febee66d-fa4b-4321-9c99-cb053ddd4817', 'lago-upgrade-from-release-suite-master-host-0': CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested exception is org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
2018-09-10 04:05:34,760-04 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] START, SetVdsStatusVDSCommand(HostName = lago-upgrade-from-release-suite-master-host-0, SetVdsStatusVDSCommandParameters:{hostId='febee66d-fa4b-4321-9c99-cb053ddd4817', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 793ad1ed
2018-09-10 04:05:34,769-04 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] FINISH, SetVdsStatusVDSCommand, return: , log id: 793ad1ed
2018-09-10 04:05:34,781-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] EVENT_ID: VDS_INSTALL_FAILED(505), Host lago-upgrade-from-release-suite-master-host-0 installation failed. CallableStatementCallback; SQL [{call insertauditlog(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}ERROR: invalid byte sequence for encoding "UTF8": 0x00; nested exception is org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00.
2018-09-10 04:05:34,794-04 INFO  [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [33bb53b] Lock freed to object 'EngineLock:{exclusiveLocks='[febee66d-fa4b-4321-9c99-cb053ddd4817=VDS]', sharedLocks=''}'
2018-09-10 04:06:10,003-04 INFO  [org.ovirt.engine.core.bll.quota.QuotaManager] (EE-ManagedThreadFactory-engineScheduled-Thread-44) [] Quota Cache updated. (20 msec)
(END)


</error>




_______________________________________________
Infra mailing list -- infra@ovirt.org
To unsubscribe send an email to infra-leave@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/infra@ovirt.org/message/TZGEPC56JYKKCOZ5XODOJMVLBR7MI6KG/