
----- Original Message -----
On 11 Feb 2016, at 17:02, Johannes Tiefenbacher <jojo@linbit.com> wrote:
Hi, finally I am posting something to this list :) I have been reading it for quite some time now and I have been an oVirt user since 3.0.
Hi, welcome:)
I updated an engine installation from 3.2 to 3.6 (stepwise of course, and yes I know that's pretty outdated ;-). Then I updated vdsm on the associated CentOS 6 hosts as well, from 3.10.x to 3.16.30. I also set my cluster compatibility level to 3.5 (the 3.6 compatibility level is only possible with EL7 hosts, if I understood correctly).
After my first failover test a VM could not be restarted, although the host where it was running could be fenced correctly.
The reason according to engine's log was this:
VM xxxxxxxx is down with error. Exit message: internal error process exited while connecting to monitor: qemu-kvm: -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4: Duplicate ID 'virtio-serial0' for device
I then realized that I was not able to run this VM on any host. I checked the virtual hardware in the engine database and could confirm that ALL my VMs had this problem: 2 devices with alias='virtio-serial0'
It may very well be a bug, but it would be quite difficult to say unless it is reproducible. It may have been broken in earlier releases. Arik/Shmuel, does it ring a bell?
In 3.6 we changed virtio-serial to be a managed device. The script named 03_06_0310_change_virtio_serial_to_managed_device.sql changes unmanaged virtio-serial devices (that were all unmanaged before) to be managed. A potential flow that will cause this duplication I can think of is:
1. Have a running VM in a pre-3.6 engine - it has unmanaged virtio-serial
2. Upgrade to 3.6 while the VM is running - the unmanaged virtio-serial becomes managed
3. Do something that will change the hash of the devices => the engine will add an additional unmanaged virtio-serial device
Why didn't it happen before? Because the handling of unmanaged devices was:
1. Upon a change in the VM devices (their hash), ask for all the devices (full-list)
2. Remove all previous unmanaged devices
3. Add every device that does not exist in the database
When we add an unmanaged device we generate a new ID (!), therefore we had to remove all the previous unmanaged devices before adding the new ones. If the previous unmanaged virtio-serial became managed, it is not removed, and we end up with two virtio-serial devices.
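To make the flow above concrete, here is a toy Python model of it. This is not engine code: the dict fields and the names `sync_devices` and `run_upgrade_script` are made up for illustration, and it assumes that a reported device is matched against the database by device ID and that the running VM's virtio-serial is reported without an ID the engine knows.

```python
import uuid


def sync_devices(db_devices, reported):
    """Toy model of the unmanaged-device handling described above."""
    # Step 2: remove all previous unmanaged devices, keep managed ones.
    kept = [d for d in db_devices if d["is_managed"]]
    known_ids = {d["device_id"] for d in kept}
    # Step 3: add every reported device that does not exist in the database.
    for dev in reported:
        if dev.get("device_id") in known_ids:
            continue
        kept.append({
            "device_id": str(uuid.uuid4()),  # a new ID is generated each time
            "device": dev["device"],
            "alias": dev["alias"],
            "is_managed": False,
        })
    return kept


def run_upgrade_script(db_devices):
    """Stand-in for the 3.6 upgrade script: virtio-serial becomes managed."""
    for d in db_devices:
        if d["device"] == "virtio-serial":
            d["is_managed"] = True


# Hypothetical device list reported for the running VM (no known ID).
reported = [{"device": "virtio-serial", "alias": "virtio-serial0"}]

db = sync_devices([], reported)   # pre-3.6: one unmanaged virtio-serial
db = sync_devices(db, reported)   # device hash changes: still exactly one
count_before_upgrade = len(db)

run_upgrade_script(db)            # upgrade while the VM keeps running
db = sync_devices(db, reported)   # device hash changes after the upgrade
duplicates = [d for d in db if d["alias"] == "virtio-serial0"]
print(count_before_upgrade, len(duplicates))
```

In this model the pre-upgrade syncs never accumulate rows because step 2 clears the old unmanaged entry first; after the upgrade the managed row survives step 2, so step 3 adds a second row with the same alias.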
@Johannes - is it true that the VM was running before the engine got updated to 3.6 and wasn't powered-off since then?
I managed to simulate this. We probably need to prevent the addition of unmanaged virtio-serial in the 3.6 engine, but IMO we should also use the ID reported by VDSM instead of generating a new one, to eliminate similar issues in the future. @Eli, Omer - can you recall why we can't use the ID we get from VDSM for the unmanaged devices? (we can continue this discussion on the devel list or in bugzilla..)
e.g.:
----
engine=# SELECT * FROM vm_device WHERE vm_device.device = 'virtio-serial' AND vm_id = 'cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec' ORDER BY vm_id;
-[ RECORD 1 ]-------------+-------------------------------------------------------------
device_id                 | 2821d03c-ce88-4613-9095-e88eadcd3792
vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
type                      | controller
device                    | virtio-serial
address                   |
boot_order                | 0
spec_params               | { }
is_managed                | t
is_plugged                | f
is_readonly               | f
_create_date              | 2016-01-14 08:30:43.797161+01
_update_date              | 2016-02-10 10:04:56.228724+01
alias                     | virtio-serial0
custom_properties         | { }
snapshot_id               |
logical_name              |
is_using_scsi_reservation | f
-[ RECORD 2 ]-------------+-------------------------------------------------------------
device_id                 | 29e0805f-d836-451a-9ec3-9031baa995e6
vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
type                      | controller
device                    | virtio-serial
address                   | {bus=0x00, domain=0x0000, type=pci, slot=0x04, function=0x0}
boot_order                | 0
spec_params               | { }
is_managed                | f
is_plugged                | t
is_readonly               | f
_create_date              | 2016-02-11 13:47:02.69992+01
_update_date              |
alias                     | virtio-serial0
custom_properties         |
snapshot_id               |
logical_name              |
is_using_scsi_reservation | f
----
My solution was this:
DELETE FROM vm_device WHERE vm_id='cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec' AND vm_device.device = 'virtio-serial' AND address = '';
(just renaming one of the aliases to "virtio-serial1" did not help)
I believe this is not the right solution; it is better to remove the unmanaged device:
1. For consistency
2. We changed the virtio-serial device to be managed in order to prevent a problem with VM pools, where in some cases a Windows OS detects an existing virtio-serial device as a new device (and therefore pops up a dialog to search for an appropriate driver). By having the virtio-serial device managed we preserve its address and eliminate this problem.
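As a rough sketch of what "remove the unmanaged device" could look like, the snippet below replays the two rows shown earlier in an in-memory SQLite database and deletes only the unmanaged virtio-serial row of a VM that also has a managed one. SQLite is just a stand-in here: the real engine database is PostgreSQL, the table is cut down to five columns, and booleans are modelled as 0/1 (in PostgreSQL the condition would be `is_managed = false`). It has not been run against a real engine database, so treat it as an illustration and take a backup before touching vm_device.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for the engine's vm_device table (assumption: only the
# columns needed for this illustration).
conn.execute(
    "CREATE TABLE vm_device ("
    "device_id TEXT PRIMARY KEY, vm_id TEXT, device TEXT, "
    "alias TEXT, is_managed INTEGER)"
)

vm_id = "cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec"
conn.executemany(
    "INSERT INTO vm_device VALUES (?, ?, ?, ?, ?)",
    [
        # RECORD 1 from the query output: the managed device
        ("2821d03c-ce88-4613-9095-e88eadcd3792", vm_id,
         "virtio-serial", "virtio-serial0", 1),
        # RECORD 2 from the query output: the unmanaged duplicate
        ("29e0805f-d836-451a-9ec3-9031baa995e6", vm_id,
         "virtio-serial", "virtio-serial0", 0),
    ],
)

# Delete the unmanaged virtio-serial only where a managed one exists for
# the same VM, so a VM can never be left without the device.
conn.execute(
    """
    DELETE FROM vm_device
    WHERE device = 'virtio-serial'
      AND is_managed = 0
      AND vm_id IN (SELECT vm_id FROM vm_device
                    WHERE device = 'virtio-serial' AND is_managed = 1)
    """
)

remaining = conn.execute(
    "SELECT device_id, is_managed FROM vm_device WHERE vm_id = ?", (vm_id,)
).fetchall()
print(remaining)
```

Filtering on is_managed rather than on an empty address keeps the managed row, which is the one the engine is meant to keep tracking.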
And then restart the VM, of course; otherwise the device will be added again the next time the devices change.
Is this a known issue? Couldn't find anything so far.
Should I also post this to the developer list? I am not subscribed there yet, wanted to check out here first.
I think it would be best to track and have it documented in bugzilla. Please open a bug (https://bugzilla.redhat.com)
thanks in advance and all the best Jojo @ LINBIT VIE
_______________________________________________ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users