
15 Feb 2016, 4:02 p.m.
On 2016-02-14 12:16, Arik Hadas wrote:
>
> ----- Original Message -----
>>
>> ----- Original Message -----
>>>> On 11 Feb 2016, at 17:02, Johannes Tiefenbacher <jojo@linbit.com> wrote:
>>>>
>>>> Hi,
>>>> finally I am posting something to this list :) I have been reading it for
>>>> quite some time now, and I have been an oVirt user since 3.0.
>>> Hi,
>>> welcome :)
>>>
>>>>
>>>> I updated an engine installation from 3.2 to 3.6 (stepwise of course, and
>>>> yes, I know that's pretty outdated ;-). Then I updated VDSM on the
>>>> associated CentOS 6 hosts as well, from 3.10.x to 3.16.30. I also set my
>>>> cluster compatibility level to 3.5 (the 3.6 compatibility level is only
>>>> possible with EL7 hosts, if I understood correctly).
>>>>
>>>> After my first failover test a VM could not be restarted, although the
>>>> host where it was running was correctly fenced.
>>>>
>>>> The reason, according to the engine's log, was this:
>>>>
>>>> VM xxxxxxxx is down with error. Exit message: internal error process
>>>> exited while connecting to monitor: qemu-kvm: -device
>>>> virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4:
>>>> Duplicate ID 'virtio-serial0' for device
>>>>
>>>>
>>>> I then realized that I am not able to run this VM on any host. I
>>>> checked the virtual hardware in the engine database and could confirm
>>>> that ALL my VMs had this problem: 2 devices with alias='virtio-serial0'
>>> It may very well be a bug, but it would be quite difficult to say unless it
>>> is reproducible. It may be broken from earlier releases.
>>> Arik/Shmuel, maybe it rings a bell?
>> In 3.6 we changed virtio-serial to be a managed device.
>> The script named 03_06_0310_change_virtio_serial_to_managed_device.sql
>> changes unmanaged virtio-serial devices (which were all unmanaged before)
>> into managed ones.
>> One potential flow I can think of that would cause this duplication is:
>> 1. Have a running VM in a pre-3.6 engine - it has an unmanaged virtio-serial
>> 2. Upgrade to 3.6 while the VM is running - the unmanaged virtio-serial
>>    becomes managed
>> 3. Do something that changes the hash of the devices
>> => the engine will add an additional unmanaged virtio-serial device
>>
>> Why didn't it happen before? Because the handling of unmanaged devices was:
>> 1. Upon a change in the VM devices (their hash), ask for all the devices
>>    (full list)
>> 2. Remove all previous unmanaged devices
>> 3. Add every device that does not exist in the database
>> When we add an unmanaged device we generate a new ID (!), so we had to
>> remove all the previous unmanaged devices before adding the new ones.
>> If the previous unmanaged virtio-serial became managed, it is not removed,
>> and we end up with two virtio-serial devices.
>>
>> @Johannes - is it true that the VM was running before the engine got updated
>> to 3.6 and wasn't powered off since then?
Yes, that's true.
>>
>> I managed to simulate this.
>> We probably need to prevent the addition of an unmanaged virtio-serial in a
>> 3.6 engine, but IMO we should also use the ID reported by VDSM instead of
>> generating a new one, to eliminate similar issues in the future.
>> @Eli, Omer - can you recall why we can't use the ID we get from VDSM for the
>> unmanaged devices?
>> (We can continue this discussion on the devel list or in Bugzilla.)
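The refresh logic described above can be replayed as a small simulation. This is a hypothetical, simplified model, not the engine's actual code: the `Device` class, the `refresh_devices`, `upgrade_to_36` and `run_scenario` helpers, and the `"vdsm-serial-id"` value are invented for illustration, and it assumes the engine matches reported devices to stored rows by device ID. It walks through the three steps (unmanaged virtio-serial before 3.6, upgrade while the VM runs, device-hash change) and compares generating a fresh ID with the proposed alternative of reusing the ID reported by VDSM.

```python
import uuid
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a vm_device row.
# This is NOT oVirt engine code; it only mirrors the steps in the thread.
@dataclass
class Device:
    device_id: str
    device: str
    is_managed: bool


def refresh_devices(db, reported, reuse_vdsm_id=False):
    """Simulate the unmanaged-device refresh triggered by a device-hash change.

    db       -- list of Device rows currently stored by the engine
    reported -- (vdsm_id, device_name) pairs from the full device list (step 1)
    """
    # Step 2: remove all previous unmanaged devices.
    db = [d for d in db if d.is_managed]
    known_ids = {d.device_id for d in db}
    # Step 3: add every reported device whose ID is not in the database.
    for vdsm_id, name in reported:
        if vdsm_id not in known_ids:
            # Old behaviour generates a fresh ID; the proposed fix reuses VDSM's.
            new_id = vdsm_id if reuse_vdsm_id else str(uuid.uuid4())
            db.append(Device(new_id, name, is_managed=False))
    return db


def upgrade_to_36(db):
    """Mimic the 3.6 upgrade script: unmanaged virtio-serial becomes managed."""
    return [Device(d.device_id, d.device, True) if d.device == "virtio-serial" else d
            for d in db]


def run_scenario(reuse_vdsm_id):
    reported = [("vdsm-serial-id", "virtio-serial")]  # the running VM's device
    db = refresh_devices([], reported, reuse_vdsm_id)  # 1. pre-3.6, unmanaged
    db = upgrade_to_36(db)                              # 2. upgrade while running
    db = refresh_devices(db, reported, reuse_vdsm_id)   # 3. device hash changes
    return [d for d in db if d.device == "virtio-serial"]


print(len(run_scenario(reuse_vdsm_id=False)))  # 2: duplicate virtio-serial
print(len(run_scenario(reuse_vdsm_id=True)))   # 1: no duplicate
```

With fresh IDs the VM ends up with one managed and one unmanaged virtio-serial row, the same shape as the two database records in this thread; reusing the VDSM ID lets step 3 recognize the device that the upgrade turned into a managed one.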
>>
>>>> e.g.:
>>>>
>>>> ----
>>>> engine=# SELECT * FROM vm_device WHERE vm_device.device = 'virtio-serial'
>>>> AND vm_id = 'cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec' ORDER BY vm_id;
>>>> -[ RECORD 1 ]-------------+-------------------------------------------------------------
>>>> device_id                 | 2821d03c-ce88-4613-9095-e88eadcd3792
>>>> vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
>>>> type                      | controller
>>>> device                    | virtio-serial
>>>> address                   |
>>>> boot_order                | 0
>>>> spec_params               | { }
>>>> is_managed                | t
>>>> is_plugged                | f
>>>> is_readonly               | f
>>>> _create_date              | 2016-01-14 08:30:43.797161+01
>>>> _update_date              | 2016-02-10 10:04:56.228724+01
>>>> alias                     | virtio-serial0
>>>> custom_properties         | { }
>>>> snapshot_id               |
>>>> logical_name              |
>>>> is_using_scsi_reservation | f
>>>> -[ RECORD 2 ]-------------+-------------------------------------------------------------
>>>> device_id                 | 29e0805f-d836-451a-9ec3-9031baa995e6
>>>> vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
>>>> type                      | controller
>>>> device                    | virtio-serial
>>>> address                   | {bus=0x00, domain=0x0000, type=pci, slot=0x04, function=0x0}
>>>> boot_order                | 0
>>>> spec_params               | { }
>>>> is_managed                | f
>>>> is_plugged                | t
>>>> is_readonly               | f
>>>> _create_date              | 2016-02-11 13:47:02.69992+01
>>>> _update_date              |
>>>> alias                     | virtio-serial0
>>>> custom_properties         |
>>>> snapshot_id               |
>>>> logical_name              |
>>>> is_using_scsi_reservation | f
>>>>
>>>> ----
>>>>
>>>> My solution was this:
>>>>
>>>> DELETE FROM vm_device WHERE vm_id='cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec'
>>>> AND vm_device.device = 'virtio-serial' AND address = '';
>>>>
>>>> (just renaming one of the aliases to "virtio-serial1" did not help)
>> I believe it is not the right solution; it is better to remove the unmanaged
>> device:
>> 1. For consistency
>> 2. We changed the virtio-serial device to be managed in order to prevent a
>>    problem with VM pools where, in some cases, the Windows OS detects an
>>    existing virtio-serial device as a new device (and therefore pops up a
>>    dialog searching for an appropriate driver). By having the virtio-serial
>>    device managed, we preserve its address and eliminate this problem.
> And then restart the VM, of course, otherwise it will be added again the
> next time the devices change.
Alright, so I just deleted the wrong one. It was a 50:50 chance ;-) I have
some test VMs that can be rebooted and that I can experiment with, though for
my production VMs I will delete the unmanaged one as well.

It shouldn't hurt if there is no virtio-serial device at all, right? I am
pretty sure we are not using these devices. Does a VM care if these devices
are gone? I also recently found the checkbox in the VM settings where I can
enable or disable a virtio-serial device. It is unchecked in all my VMs. I
mention this just in case it was not obvious, and to make my information
complete.

I think I should first understand the difference between managed and
unmanaged devices. This should help, I guess:
http://www.ovirt.org/Features/Design/StableDeviceAddresses
>
>>>>
>>>>
>>>> Is this a known issue? Couldn't find anything so far.
>>>>
>>>> Should I also post this to the developer list? I am not subscribed there
>>>> yet; I wanted to check here first.
>> I think it would be best to track it and have it documented in Bugzilla.
>> Please open a bug (https://bugzilla.redhat.com)
>>
Alright, I'll open a bug.

And just for my future behaviour on this list: is it good practice to post
something like this here first, before bothering the devel list or
immediately opening a Bugzilla entry without knowing whether it's actually a
bug? Or should I have posted this to the devel list in the first place?

Thank you all for your replies.

Best,
Jojo
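A sketch of the cleanup recommended in the thread: find VMs with more than one virtio-serial device, delete only the unmanaged row, then restart the VM. It runs against a throwaway SQLite table holding a hypothetical subset of the `vm_device` columns, seeded with the two records shown in the thread. The real engine database is PostgreSQL; the statements are written to be portable (`NOT is_managed` works for both SQLite integers and PostgreSQL booleans), but that is an assumption, not something confirmed against a live engine. Filtering on `is_managed` instead of `address = ''` is the key change: in the records above, the row with the empty address is the managed one, which is why the original DELETE removed the wrong device.

```python
import sqlite3

# Hypothetical, minimal stand-in for the engine's vm_device table: only the
# columns the queries need. SQLite is used just to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE vm_device (
    device_id TEXT PRIMARY KEY, vm_id TEXT, device TEXT,
    is_managed BOOLEAN, alias TEXT)""")
vm = "cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec"
conn.executemany(
    "INSERT INTO vm_device VALUES (?, ?, ?, ?, ?)",
    [("2821d03c-ce88-4613-9095-e88eadcd3792", vm, "virtio-serial", True, "virtio-serial0"),
     ("29e0805f-d836-451a-9ec3-9031baa995e6", vm, "virtio-serial", False, "virtio-serial0")])

# 1. Find VMs that have more than one virtio-serial device.
DUPLICATES = """
    SELECT vm_id, count(*) FROM vm_device
    WHERE device = 'virtio-serial'
    GROUP BY vm_id HAVING count(*) > 1"""
print(conn.execute(DUPLICATES).fetchall())

# 2. Delete only the unmanaged copy; the VM must then be restarted.
CLEANUP = """
    DELETE FROM vm_device
    WHERE vm_id = ? AND device = 'virtio-serial' AND NOT is_managed"""
conn.execute(CLEANUP, (vm,))

# Only the managed virtio-serial row should remain.
print(conn.execute(
    "SELECT device_id, is_managed FROM vm_device WHERE vm_id = ?", (vm,)).fetchall())
```

Without the restart, the engine would presumably re-add the unmanaged device the next time the device hash changes, as noted in the thread.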