[ovirt-users] vdsm storage problem - maybe cache problem?

Maor Lipchuk mlipchuk at redhat.com
Fri May 22 13:10:10 UTC 2015


Thanks!
Looks very good

Regards,
Maor


----- Original Message -----
> From: ml at ohnewald.net
> To: "Maor Lipchuk" <mlipchuk at redhat.com>
> Cc: users at ovirt.org
> Sent: Thursday, May 21, 2015 9:04:33 PM
> Subject: Re: [ovirt-users] vdsm storage problem - maybe cache problem?
> 
> Done: https://bugzilla.redhat.com/show_bug.cgi?id=1223925
> 
> Anything else I forgot?
> 
> Thanks,
> Mario
> 
> 
> On 20.05.2015 at 15:12, Maor Lipchuk wrote:
> > Mario,
> >
> > Can you please open a bug with all the VDSM and engine logs,
> > so this issue can be tracked?
> >
> > This is the link:
> >   https://bugzilla.redhat.com/enter_bug.cgi?product=oVirt
> >
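> > One hedged way to gather those logs for the report (a plain tar of the
> > usual log paths; the ovirt-log-collector tool on the engine can automate
> > this, if it is installed):
> >
> >   # on each host:
> >   tar czf vdsm-logs-$(hostname).tar.gz /var/log/vdsm/vdsm.log*
> >   # on the engine:
> >   tar czf engine-logs.tar.gz /var/log/ovirt-engine/engine.log*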
> >
> > Regards,
> > Maor
> >
> >
> >
> > ----- Original Message -----
> >> From: ml at ohnewald.net
> >> Cc: users at ovirt.org
> >> Sent: Wednesday, May 20, 2015 1:28:53 PM
> >> Subject: Re: [ovirt-users] vdsm storage problem - maybe cache problem?
> >>
> >> Hello List,
> >>
> >> I really need some help here... could someone please give it a try? I
> >> already tried to get help on the IRC channel, but it seems that my
> >> problem is too complicated, or maybe I am not providing useful information?
> >>
> >> DB & vdsClient: http://fpaste.org/223483/14320588/ (I think this part is
> >> very interesting)
> >>
> >> engine.log: http://paste.fedoraproject.org/223349/04494414
> >> Node02 vdsm Log: http://paste.fedoraproject.org/223350/43204496
> >> Node01 vdsm Log: http://paste.fedoraproject.org/223347/20448951
> >>
> >> Why does my vdsm look for StorageDomain
> >> 036b5575-51fa-4f14-8b05-890d7807894c? => This was an NFS export which I
> >> deleted from the GUI yesterday (!!!). (A query sketch follows the dump
> >> below.)
> >>
> >>     From the Database Log/Dump:
> >> =============================
> >> USER_FORCE_REMOVE_STORAGE_DOMAIN        981     0       Storage Domain
> >> EXPORT2 was forcibly removed by admin at internal   f
> >> b384b3da-02a6-44f3-a3f6-56751ce8c26d    HP_Proliant_DL180G6
> >> 036b5575-51fa-4f14-8b05-890d7807894c    EXPORT2
> >> 00000000-0000-0000-0000-000000000000            c1754ff
> >> 807321f6-2043-4a26-928c-0ce6b423c381
> >>
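> >> For illustration, a minimal sketch of how to check whether the engine
> >> database still references that domain (this assumes the oVirt 3.x schema
> >> with the storage_domain_static and storage_pool_iso_map tables; adjust
> >> the names if your schema differs):
> >>
> >>   # on the engine host, as the postgres user:
> >>   su - postgres -c "psql engine -c \"SELECT id, storage_name FROM storage_domain_static WHERE id = '036b5575-51fa-4f14-8b05-890d7807894c';\""
> >>   su - postgres -c "psql engine -c \"SELECT * FROM storage_pool_iso_map WHERE storage_id = '036b5575-51fa-4f14-8b05-890d7807894c';\""
> >>   # an empty result means the engine itself no longer knows the domain,
> >>   # so the stale reference must live on the host / pool metadata side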
> >>
> >> I already put one node into maintenance, rebooted it and activated it
> >> again. Still the same problem.
> >>
> >> Some screenshots:
> >> --------------------
> >> http://postimg.org/image/8zo4ujgjb/
> >> http://postimg.org/image/le918grdr/
> >> http://postimg.org/image/wnawwhrgh/
> >>
> >>
> >> My GlusterFS fully works and is not the problem here, I guess (quick
> >> sanity checks are sketched after the engine log excerpt below):
> >> ==============================================================
> >>
> >> 2015-05-19 04:09:06,292 INFO
> >> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
> >> (DefaultQuartzScheduler_Worker-43) [3df9132b] START,
> >> HSMClearTaskVDSCommand(HostName = ovirt-node02.stuttgart.imos.net,
> >> HostId = 6948da12-0b8a-4b6d-a9af-162e6c25dad3,
> >> taskId=2aeec039-5b95-40f0-8410-da62b44a28e8), log id: 19b18840
> >> 2015-05-19 04:09:06,337 INFO
> >> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand]
> >> (DefaultQuartzScheduler_Worker-43) [3df9132b] FINISH,
> >> HSMClearTaskVDSCommand, log id: 19b18840
> >>
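> >> For reference, the quick host-side sanity checks I mean by "fully works"
> >> (hedged; adjust to your volume names):
> >>
> >>   gluster volume status    # all bricks online?
> >>   mount | grep gluster     # is the data domain still mounted on the host?
> >>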
> >> I already had a chat with Maor Lipchuk and he told me to add another
> >> host to the datacenter and then re-initialize it. I will migrate back to
> >> ESXi for now to free up those two nodes. Then we can mess with it if
> >> anyone is interested in helping me. Otherwise I will have to stay with
> >> ESXi :-(
> >>
> >> Does anyone else have an idea until then? Why is my vdsm host all messed
> >> up with that zombie StorageDomain?
> >>
> >>
> >> Thanks for your time. I really, really appreciate it.
> >>
> >> Mario
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 19.05.15 at 14:57, ml at ohnewald.net wrote:
> >>> Hello List,
> >>>
> >>> Okay, I really need some help now. I stopped vdsmd for a little bit too
> >>> long, fencing stepped in and rebooted node01.
> >>>
> >>> I now can NOT start any VM because the storage is marked "Unknown".
> >>>
> >>>
> >>> Since node01 rebooted, I am wondering why the hell it is still looking at
> >>> this StorageDomain:
> >>>
> >>> StorageDomainCache::(_findDomain) domain
> >>> 036b5575-51fa-4f14-8b05-890d7807894c not found
> >>>
> >>> Can anyone please, please tell me where exactly this ID comes from?
> >>>
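> >>> The only hedged check I could come up with so far is to grep for the
> >>> UUID on the host itself (the paths below are the usual vdsm layout,
> >>> adjust if yours differs); as far as I understand, the pool metadata kept
> >>> on the master storage domain is one place where a force-removed domain
> >>> could survive a reboot:
> >>>
> >>>   grep -r 036b5575-51fa-4f14-8b05-890d7807894c /rhev/data-center/ 2>/dev/null
> >>>   # the POOL_DOMAINS list in the master domain's metadata file:
> >>>   find /rhev/data-center -name metadata -exec grep -H POOL_DOMAINS {} \; 2>/dev/null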
> >>>
> >>> Thanks,
> >>> Mario
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 18.05.15 at 14:21, Maor Lipchuk wrote:
> >>>> Hi Mario,
> >>>>
> >>>> Can you try to mount this directly from the host?
> >>>> Can you please attach the VDSM and engine logs?
> >>>>
> >>>> Thanks,
> >>>> Maor
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> From: ml at ohnewald.net
> >>>>> To: "Maor Lipchuk" <mlipchuk at redhat.com>
> >>>>> Cc: users at ovirt.org
> >>>>> Sent: Monday, May 18, 2015 2:36:38 PM
> >>>>> Subject: Re: [ovirt-users] vdsm storage problem - maybe cache problem?
> >>>>>
> >>>>> Hi Maor,
> >>>>>
> >>>>> thanks for the quick reply.
> >>>>>
> >>>>> On 18.05.15 at 13:25, Maor Lipchuk wrote:
> >>>>>
> >>>>>>> Now my question: Why does the vdsm node not know that I deleted the
> >>>>>>> storage? Has vdsm cached this mount information? Why does it still
> >>>>>>> try to access 036b5575-51fa-4f14-8b05-890d7807894c?
> >>>>>>
> >>>>>> Yes, vdsm uses a cache for Storage Domains. You can try to restart
> >>>>>> the vdsmd service instead of rebooting the host.
> >>>>>>
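> >>>>> For what it is worth, a hedged way to see what vdsm itself reports
> >>>>> before and after the restart (this assumes the vdsClient verbs below
> >>>>> are available on this host):
> >>>>>
> >>>>>   vdsClient -s 0 getStorageDomainsList
> >>>>>   # does the pool info still list the force-removed export domain?
> >>>>>   vdsClient -s 0 getStoragePoolInfo b384b3da-02a6-44f3-a3f6-56751ce8c26d
> >>>>>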
> >>>>> I am still getting the same error.
> >>>>>
> >>>>>
> >>>>> [root at ovirt-node01 ~]# /etc/init.d/vdsmd stop
> >>>>> Shutting down vdsm daemon:
> >>>>> vdsm watchdog stop                                         [  OK  ]
> >>>>> vdsm: Running run_final_hooks                              [  OK  ]
> >>>>> vdsm stop                                                  [  OK  ]
> >>>>> [root at ovirt-node01 ~]#
> >>>>> [root at ovirt-node01 ~]#
> >>>>> [root at ovirt-node01 ~]#
> >>>>> [root at ovirt-node01 ~]# ps aux | grep vdsmd
> >>>>> root      3198  0.0  0.0  11304   740 ?        S<   May07   0:00
> >>>>> /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon
> >>>>> --masterpid /var/run/vdsm/supervdsm_respawn.pid
> >>>>> /usr/share/vdsm/supervdsmServer --sockfile /var/run/vdsm/svdsm.sock
> >>>>> --pidfile /var/run/vdsm/supervdsmd.pid
> >>>>> root      3205  0.0  0.0 922368 26724 ?        S<l  May07  12:10
> >>>>> /usr/bin/python /usr/share/vdsm/supervdsmServer --sockfile
> >>>>> /var/run/vdsm/svdsm.sock --pidfile /var/run/vdsm/supervdsmd.pid
> >>>>> root     15842  0.0  0.0 103248   900 pts/0    S+   13:35   0:00 grep
> >>>>> vdsmd
> >>>>>
> >>>>>
> >>>>> [root at ovirt-node01 ~]# /etc/init.d/vdsmd start
> >>>>> initctl: Job is already running: libvirtd
> >>>>> vdsm: Running mkdirs
> >>>>> vdsm: Running configure_coredump
> >>>>> vdsm: Running configure_vdsm_logs
> >>>>> vdsm: Running run_init_hooks
> >>>>> vdsm: Running gencerts
> >>>>> vdsm: Running check_is_configured
> >>>>> libvirt is already configured for vdsm
> >>>>> sanlock service is already configured
> >>>>> vdsm: Running validate_configuration
> >>>>> SUCCESS: ssl configured to true. No conflicts
> >>>>> vdsm: Running prepare_transient_repository
> >>>>> vdsm: Running syslog_available
> >>>>> vdsm: Running nwfilter
> >>>>> vdsm: Running dummybr
> >>>>> vdsm: Running load_needed_modules
> >>>>> vdsm: Running tune_system
> >>>>> vdsm: Running test_space
> >>>>> vdsm: Running test_lo
> >>>>> vdsm: Running restore_nets
> >>>>> vdsm: Running unified_network_persistence_upgrade
> >>>>> vdsm: Running upgrade_300_nets
> >>>>> Starting up vdsm daemon:
> >>>>> vdsm start                                                 [  OK  ]
> >>>>> [root at ovirt-node01 ~]#
> >>>>>
> >>>>> [root at ovirt-node01 ~]# grep ERROR /var/log/vdsm/vdsm.log | tail -n 20
> >>>>> Thread-13::ERROR::2015-05-18
> >>>>> 13:35:03,631::sdc::137::Storage.StorageDomainCache::(_findDomain)
> >>>>> looking for unfetched domain abc51e26-7175-4b38-b3a8-95c6928fbc2b
> >>>>> Thread-13::ERROR::2015-05-18
> >>>>> 13:35:03,632::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain)
> >>>>>
> >>>>> looking for domain abc51e26-7175-4b38-b3a8-95c6928fbc2b
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:11,607::sdc::137::Storage.StorageDomainCache::(_findDomain)
> >>>>> looking for unfetched domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:11,621::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain)
> >>>>>
> >>>>> looking for domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:11,960::sdc::143::Storage.StorageDomainCache::(_findDomain)
> >>>>> domain
> >>>>> 036b5575-51fa-4f14-8b05-890d7807894c not found
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:11,960::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain)
> >>>>>
> >>>>> Error while collecting domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> monitoring information
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:21,962::sdc::137::Storage.StorageDomainCache::(_findDomain)
> >>>>> looking for unfetched domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:21,965::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain)
> >>>>>
> >>>>> looking for domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:22,068::sdc::143::Storage.StorageDomainCache::(_findDomain)
> >>>>> domain
> >>>>> 036b5575-51fa-4f14-8b05-890d7807894c not found
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:22,072::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain)
> >>>>>
> >>>>> Error while collecting domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> monitoring information
> >>>>> Thread-15::ERROR::2015-05-18
> >>>>> 13:35:33,821::task::866::TaskManager.Task::(_setError)
> >>>>> Task=`54bdfc77-f63a-493b-b24e-e5a3bc4977bb`::Unexpected error
> >>>>> Thread-15::ERROR::2015-05-18
> >>>>> 13:35:33,864::dispatcher::65::Storage.Dispatcher.Protect::(run)
> >>>>> {'status': {'message': "Unknown pool id, pool not connected:
> >>>>> ('b384b3da-02a6-44f3-a3f6-56751ce8c26d',)", 'code': 309}}
> >>>>> Thread-13::ERROR::2015-05-18
> >>>>> 13:35:33,930::sdc::137::Storage.StorageDomainCache::(_findDomain)
> >>>>> looking for unfetched domain abc51e26-7175-4b38-b3a8-95c6928fbc2b
> >>>>> Thread-15::ERROR::2015-05-18
> >>>>> 13:35:33,928::task::866::TaskManager.Task::(_setError)
> >>>>> Task=`fe9bb0fa-cf1e-4b21-af00-0698c6d1718f`::Unexpected error
> >>>>> Thread-13::ERROR::2015-05-18
> >>>>> 13:35:33,932::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain)
> >>>>>
> >>>>> looking for domain abc51e26-7175-4b38-b3a8-95c6928fbc2b
> >>>>> Thread-15::ERROR::2015-05-18
> >>>>> 13:35:33,978::dispatcher::65::Storage.Dispatcher.Protect::(run)
> >>>>> {'status': {'message': 'Not SPM: ()', 'code': 654}}
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:41,117::sdc::137::Storage.StorageDomainCache::(_findDomain)
> >>>>> looking for unfetched domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:41,131::sdc::154::Storage.StorageDomainCache::(_findUnfetchedDomain)
> >>>>>
> >>>>> looking for domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:41,452::sdc::143::Storage.StorageDomainCache::(_findDomain)
> >>>>> domain
> >>>>> 036b5575-51fa-4f14-8b05-890d7807894c not found
> >>>>> Thread-36::ERROR::2015-05-18
> >>>>> 13:35:41,453::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain)
> >>>>>
> >>>>> Error while collecting domain 036b5575-51fa-4f14-8b05-890d7807894c
> >>>>> monitoring information
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Mario
> >>>>>
> >>>>>
> >>
> >>
> >> _______________________________________________
> >> Users mailing list
> >> Users at ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/users
> >>
> 
> 


