Re: [ovirt-users] virtio-serial0 duplicate id

15 Feb 2016

On 2016-02-14 12:16, Arik Hadas wrote:
>
> ----- Original Message -----
>>
>> ----- Original Message -----
>>>> On 11 Feb 2016, at 17:02, Johannes Tiefenbacher <jojo@linbit.com> wrote:
>>>>
>>>> Hi,
>>>> finally I am posting something to this list :) I have been reading it for
>>>> quite some time now and have been an oVirt user since 3.0.
>>> Hi,
>>> welcome:)
>>>
>>>>
>>>> I updated an engine installation from 3.2 to 3.6 (stepwise of course, and
>>>> yes I know that's pretty outdated ;-). Then I updated vdsm on the associated
>>>> CentOS 6 hosts as well, from 3.10.x to 3.16.30. I also set my cluster
>>>> compatibility level to 3.5 (the 3.6 level is only possible with EL7 hosts,
>>>> if I understood correctly).
>>>>
>>>> After my first failover test a VM could not be restarted, although the
>>>> host where it was running could be fenced correctly.
>>>>
>>>> The reason according to engine's log was this:
>>>>
>>>> VM xxxxxxxx is down with error. Exit message: internal error process
>>>> exited
>>>> while connecting to monitor: qemu-kvm: -device
>>>> virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4:
>>>> Duplicate ID 'virtio-serial0' for device
>>>>
>>>>
>>>> I then realized that I was not able to run this VM on any host. I
>>>> checked the virtual hardware in the engine database and could confirm
>>>> that ALL my VMs had this problem: 2 devices with alias='virtio-serial0'
>>> It may very well be a bug, but it would be quite difficult to say unless it
>>> is reproducible. It may have been broken since an earlier release.
>>> Arik/Shmuel, maybe it rings a bell?
>> In 3.6 we changed virtio-serial to be a managed device.
>> The script named 03_06_0310_change_virtio_serial_to_managed_device.sql
>> changes unmanaged virtio-serial devices (that were all unmanaged before) to
>> be managed.
>> A potential flow that will cause this duplication I can think of is:
>> 1. Have a running VM in a pre-3.6 engine - it has unmanaged virtio-serial
>> 2. Upgrade to 3.6 while the VM is running - the unmanaged virtio-serial
>> becomes managed
>> 3. Do something that will change the hash of the devices
>> => the engine will add an additional unmanaged virtio-serial device
>>
>> Why didn't it happen before? Because the handling of unmanaged devices was:
>> 1. Upon change in the VM devices (their hash), ask for all the devices
>> (full-list)
>> 2. Remove all previous unmanaged devices
>> 3. Add every device that does not exist in the database
>> When we add an unmanaged device we generate a new ID (!) - therefore we had
>> to remove all the previous unmanaged devices before adding the new ones.
>> If the previous unmanaged virtio-serial became managed, it is not removed and
>> we will end up having two virtio-serial devices.
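
The flow described above can be sketched as a small simulation. This is only an illustration in Python, not the engine's actual code: the `Device` class, the `sync_devices` function and the shape of the reported device list are made up for the example, and it assumes that a device the VM was started with while still unmanaged is reported without an ID the engine knows.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Device:
    alias: str
    is_managed: bool
    # The engine generates a fresh ID for every row it adds.
    device_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def sync_devices(db_devices, reported):
    """Apply steps 2 and 3 of the handling described above."""
    # Step 2: remove all previously stored unmanaged devices.
    kept = [d for d in db_devices if d.is_managed]
    known_ids = {d.device_id for d in kept}
    # Step 3: add every reported device that is not known by ID.
    # ASSUMPTION: a device the VM was started with while it was still
    # unmanaged is reported without an engine ID (None here).
    for dev in reported:
        if dev["device_id"] not in known_ids:
            kept.append(Device(alias=dev["alias"], is_managed=False))
    return kept


reported = [{"alias": "virtio-serial0", "device_id": None}]

# Pre-3.6: the stored device is unmanaged, so it is replaced, not duplicated.
before = sync_devices([Device("virtio-serial0", is_managed=False)], reported)

# After the 3.6 upgrade script the stored device is managed, so it survives
# step 2, and step 3 adds a second, unmanaged row with the same alias.
after = sync_devices([Device("virtio-serial0", is_managed=True)], reported)

print(len(before), len(after))
```

With these inputs `before` holds one device and `after` holds two, one managed and one unmanaged, both with the alias virtio-serial0.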
>>
>> @Johannes - is it true that the VM was running before the engine got updated
>> to 3.6 and wasn't powered-off since then?
yes that's true
>>
>> I managed to simulate this.
>> We probably need to prevent the addition of unmanaged virtio-serial in 3.6
>> engine but IMO we should also use the ID reported by VDSM instead of
>> generating a new one to eliminate similar issues in the future.
>> @Eli, Omer - can you recall why we can't use the ID we get from VDSM for the
>> unmanaged devices?
>> (we can continue this discussion in devel-list or in bugzilla..)
>>
>>>> e.g.:
>>>>
>>>> ----
>>>> engine=# SELECT * FROM vm_device WHERE vm_device.device = 'virtio-serial'
>>>> AND vm_id = 'cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec' ORDER BY vm_id;
>>>> -[ RECORD 1 ]-------------+-------------------------------------------------------------
>>>> device_id                 | 2821d03c-ce88-4613-9095-e88eadcd3792
>>>> vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
>>>> type                      | controller
>>>> device                    | virtio-serial
>>>> address                   |
>>>> boot_order                | 0
>>>> spec_params               | { }
>>>> is_managed                | t
>>>> is_plugged                | f
>>>> is_readonly               | f
>>>> _create_date              | 2016-01-14 08:30:43.797161+01
>>>> _update_date              | 2016-02-10 10:04:56.228724+01
>>>> alias                     | virtio-serial0
>>>> custom_properties         | { }
>>>> snapshot_id               |
>>>> logical_name              |
>>>> is_using_scsi_reservation | f
>>>> -[ RECORD 2 ]-------------+-------------------------------------------------------------
>>>> device_id                 | 29e0805f-d836-451a-9ec3-9031baa995e6
>>>> vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
>>>> type                      | controller
>>>> device                    | virtio-serial
>>>> address                   | {bus=0x00, domain=0x0000, type=pci, slot=0x04, function=0x0}
>>>> boot_order                | 0
>>>> spec_params               | { }
>>>> is_managed                | f
>>>> is_plugged                | t
>>>> is_readonly               | f
>>>> _create_date              | 2016-02-11 13:47:02.69992+01
>>>> _update_date              |
>>>> alias                     | virtio-serial0
>>>> custom_properties         |
>>>> snapshot_id               |
>>>> logical_name              |
>>>> is_using_scsi_reservation | f
>>>>
>>>> ----
>>>>
>>>> My solution was this:
>>>>
>>>> DELETE FROM vm_device WHERE vm_id='cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec'
>>>> AND vm_device.device = 'virtio-serial' AND address = '';
>>>>
>>>> (just renaming one of the aliases to "virtio-serial1" did not help)
>> I believe that is not the right solution; it is better to remove the
>> unmanaged device:
>> 1. For consistency
>> 2. We changed the virtio-serial device to be managed in order to prevent a
>> problem with VM-pools where in some cases Windows OS detects an existing
>> virtio-serial device as a new device (and therefore pops up a dialog for
>> searching for an appropriate driver). By having the virtio-serial device
>> managed we preserve its address and eliminate this problem.
> And then restart the VM, of course; otherwise it will be added again the next time the devices change.
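
A minimal sketch of removing only the unmanaged duplicate, using the two rows shown earlier in the thread. It runs against an in-memory SQLite table as a stand-in, with a reduced set of columns; the real engine database is PostgreSQL, where `is_managed` is a boolean, so the condition there would be written `is_managed = false` rather than `is_managed = 0`.

```python
import sqlite3

# Stand-in for the engine's vm_device table (ASSUMPTION: only the columns
# needed for the example, with is_managed stored as 0/1).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vm_device ("
    " device_id TEXT PRIMARY KEY, vm_id TEXT, device TEXT,"
    " address TEXT, is_managed INTEGER, alias TEXT)"
)

VM_ID = "cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec"
rows = [
    # Managed device with an empty address (RECORD 1).
    ("2821d03c-ce88-4613-9095-e88eadcd3792", VM_ID, "virtio-serial",
     "", 1, "virtio-serial0"),
    # Unmanaged duplicate with a PCI address (RECORD 2).
    ("29e0805f-d836-451a-9ec3-9031baa995e6", VM_ID, "virtio-serial",
     "{bus=0x00, domain=0x0000, type=pci, slot=0x04, function=0x0}",
     0, "virtio-serial0"),
]
conn.executemany("INSERT INTO vm_device VALUES (?, ?, ?, ?, ?, ?)", rows)

# Select the row to delete by is_managed, not by its (empty) address.
conn.execute(
    "DELETE FROM vm_device"
    " WHERE vm_id = ? AND device = 'virtio-serial' AND is_managed = 0",
    (VM_ID,),
)

remaining = conn.execute(
    "SELECT device_id, is_managed FROM vm_device WHERE vm_id = ?", (VM_ID,)
).fetchall()
print(remaining)
```

Filtering on `address = ''` instead would match the managed row in this data, which is the one that should stay.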

alright, so i just deleted the wrong one. it was a 50:50 chance ;-)

I have some test vms that can be rebooted and that i can experiment with.

for my production vms, though, i will delete the unmanaged one as well.
it shouldn't hurt if there is no virtio-serial device at all, right? i
am pretty sure we are not using these devices. does a vm mind if these
devices are gone?

i also recently found the checkbox in the vm settings where i can
activate or deactivate a virtio-serial device. it is unchecked in all
my vms. i mention this in case it was not obvious to you, and to make
my information complete.

I think I should understand the difference between managed and unmanaged 
devices first.... this should help i guess: 
http://www.ovirt.org/Features/Design/StableDeviceAddresses

>
>>>>
>>>>
>>>> Is this a known issue? Couldn't find anything so far.
>>>>
>>>> Should I also post this to the developer list? I am not subscribed there
>>>> yet, wanted to check out here first.
>> I think it would be best to track and have it documented in bugzilla.
>> Please open a bug (https://bugzilla.redhat.com)
>>
alright, i'll open a bug.
and just for my future behaviour on this list: is it good practice to
post things like this to this list first, before bothering the devel list
or opening a bugzilla right away without knowing if it's actually a bug?
or should i have posted this to the devel list in the first place?

thank you all for your replies
best
Jojo