[Users] host deploy and after reboot not responsive

Hello, I'm using oVirt 3.2 on Fedora 18 from ovirt-nightly 3.2.0-1.20130113.gitc954518. After deploy and reboot of a Fedora 18 host, it stays non-responsive. What should I check, and which log files should I send from the engine and the node? Thanks, Gianluca

On Tue, Jan 15, 2013 at 9:02 AM, Alon Bar-Lev wrote:
/var/log/ovirt-engine/engine.log /var/log/ovirt-engine/host-deploy/*-<host>.log
Here they are:
engine: https://docs.google.com/file/d/0BwoPbcrMv8mvTllaNUh3R1pXMDg/edit
host-deploy: https://docs.google.com/file/d/0BwoPbcrMv8mvQklOS1ZtTFJkbzg/edit
Thanks, Gianluca

----- Original Message -----
It is as if the engine does not even try to access this host. After the five-minute wait there is the following:
2013-01-15 06:12:08,895 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (pool-3-thread-49) [b4e1be9] START, SetVdsStatusVDSCommand(HostName = f18ovn03.ceda.polimi.it, HostId = 56b94303-99d4-48ca-bc73-b3ff9943270b, status=NonResponsive, nonOperationalReason=NONE), log id: 12c2e2ae
2013-01-15 06:12:08,901 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (pool-3-thread-49) [b4e1be9] FINISH, SetVdsStatusVDSCommand, log id: 12c2e2ae
And then nothing.
Can you just confirm that vdsm is up on the host and attach /var/log/vdsm/vdsm.log? Thanks!

On Tue, Jan 15, 2013 at 9:20 AM, Alon Bar-Lev wrote:
Can you just confirm that vdsm is up on host and attach /var/log/vdsm/vdsm.log?
Right now I'm out of the office, but I collected vdsm.log and it is here: https://docs.google.com/file/d/0BwoPbcrMv8mvQW9lTnhtN3VJRTA/edit
What should I execute to see if vdsm is up? systemctl status vdsm.service, or something similar?
As you will see in the log, the server has access to some clustered volumes that it skips. For example you see:
storageRefresh::DEBUG::2013-01-15 06:07:15,735::misc::84::Storage.Misc.excCmd::(<lambda>) FAILED: <err> = ' Skipping clustered volume group VG_VIRT04\n Skipping volume group VG_VIRT04\n Skipping clustered volume group VG_VIRT02\n Skipping volume group VG_VIRT02\n Skipping clustered volume group VG_VIRT03\n Skipping volume group VG_VIRT03\n Skipping clustered volume group VG_VIRT03\n Skipping volume group VG_VIRT03\n Skipping clustered volume group VG_VIRT01\n Skipping volume group VG_VIRT01\n Skipping clustered volume group VG_VIRT01\n Skipping volume group VG_VIRT01\n'; <rc> = 5
Does this prevent it from starting correctly, so that I have to block access to these LUNs or blacklist them in some way in LVM, or is the reason something else entirely?
Gianluca
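For the LVM side of that question: hiding specific LUNs from LVM scanning is usually done with a filter in the devices section of /etc/lvm/lvm.conf. A rough sketch, with a placeholder device pattern rather than the real names of this setup:

    # /etc/lvm/lvm.conf -- reject the multipath devices backing the clustered VGs,
    # accept everything else (the WWID pattern below is a placeholder)
    devices {
        filter = [ "r|^/dev/mapper/36005076.*|", "a|.*|" ]
    }

After changing the filter, a vgscan (or a reboot) makes LVM re-read the remaining devices. Whether this is needed at all is answered below: the skipped clustered VGs turn out not to be what keeps vdsm from starting.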

On Tue, Jan 15, 2013 at 11:13:39AM +0100, Gianluca Cecchi wrote:
Yes, and you can check its responsiveness with vdsClient; tailing /var/log/vdsm.log can give clues, too. I see that something has asked vdsm to stop:
MainThread::INFO::2013-01-15 06:07:19,097::logUtils::41::dispatcher::(wrapper) Run and protect: prepareForShutdown(options=None)
Do you have systemd's logs? (/var/log/messages may have traces of vdsm's stop/start.) What happens if you start vdsm now: does it crash again, or does it run happily?
I believe it's something else. Vdsm can operate with non-accessible VGs showing up in vgs.

On Tue, Jan 15, 2013 at 11:30 AM, Dan Kenigsberg wrote:
I found this strange line in messages that could indicate something:
Jan 15 06:07:16 f18ovn03 vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
Anyway, full node messages are here: https://docs.google.com/file/d/0BwoPbcrMv8mvZnRZZnMtdU9hR28/edit
I asked a guy who can access the server to run systemctl status vdsmd.service (I forgot the final d in the service name in my previous e-mail) and then systemctl start vdsmd.service. In the meantime, any hint on the vdsClient command is welcome... Gianluca

On Tue, Jan 15, 2013 at 11:38:29AM +0100, Gianluca Cecchi wrote:
That's plain log noise, since you do not have vdsm-json installed. oVirt currently uses only xmlrpc, so this is not a real issue. I suppose that the wording of that line should be refined to something like "If you really want jsonrpc, please make sure it is installed".
Anyway full node messages here: https://docs.google.com/file/d/0BwoPbcrMv8mvZnRZZnMtdU9hR28/edit
It seems that vdsm was stopped when the host was rebooted...
Jan 15 06:07:16 f18ovn03 vdsm vds WARNING Unable to load the json rpc server mod
Jan 15 06:07:18 f18ovn03 systemd[1]: Deactivating swap /dev/dm-0...
Jan 15 06:07:18 f18ovn03 systemd[1]: Deactivating swap /dev/dm-0...
Jan 15 06:07:18 f18ovn03 systemd[1]: Deactivating swap /dev/dm-0...
Jan 15 06:07:18 f18ovn03 systemd[1]: Deactivating swap /dev/dm-0...
Jan 15 06:07:18 f18ovn03 systemd[1]: Deactivating swap /dev/dm-0...
Jan 15 06:07:18 f18ovn03 systemd[1]: Stopping Stop Read-Ahead Data Collection 10
Jan 15 06:07:18 f18ovn03 systemd[1]: Stopped Stop Read-Ahead Data Collection 10s
Jan 15 06:07:18 f18ovn03 systemd[1]: Stopping Shared Storage Lease Manager...
Jan 15 06:07:18 f18ovn03 systemd[1]: Stopping Watchdog Multiplexing Daemon...
Jan 15 06:07:18 f18ovn03 systemd[1]: Stopping Network Time Service...
Jan 15 06:07:18 f18ovn03 systemd[1]: Stopping Multi-User.
The only remaining question is why vdsm did not start up after boot. Could it be that it is missing `systemctl enable`?
vdsClient -s 0 getVdsCaps would access localhost vdsm (assuming you are using the default ssl). If it barfs, we have a problem. Dan.
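Putting Dan's two checks together, a quick way to verify both that the service is enabled for boot and that vdsm answers locally (the -s flag assumes the default SSL setup, as noted above):

    systemctl is-enabled vdsmd.service   # should print "enabled"
    systemctl status vdsmd.service       # is it actually running right now?
    vdsClient -s 0 getVdsCaps            # does vdsm answer on localhost?

If is-enabled says "enabled" but the service is dead after every boot, the problem lies in startup ordering or a failing dependency rather than in enablement.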

On Tue, Jan 15, 2013 at 11:49 AM, Dan Kenigsberg wrote:
# systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
Active: inactive (dead)
CGroup: name=systemd:/system/vdsmd.service

# vdsClient -s 0 getVdsCaps
Connection to 10.4.4.59:54321 refused

# systemctl start vdsmd.service
#
# systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
Active: active (running) since Tue, 2013-01-15 12:09:11 CET; 15s ago
Process: 6566 ExecStart=/lib/systemd/systemd-vdsmd start (code=exited, status=0/SUCCESS)
Main PID: 6834 (respawn)
CGroup: name=systemd:/system/vdsmd.service
├ 6834 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.pid ...
├ 6837 /usr/bin/python /usr/share/vdsm/vdsm
├ 6855 /usr/bin/sudo -n /usr/bin/python /usr/share/vdsm/supervdsmServer.py d1587fa5-b439-4a5e-bca3-e3971dc08...
└ 6856 /usr/bin/python /usr/share/vdsm/supervdsmServer.py d1587fa5-b439-4a5e-bca3-e3971dc08627 6837 /var/run...

Jan 15 12:09:11 f18ovn03 runuser[6831]: pam_unix(runuser:session): session closed for user vdsm
Jan 15 12:09:11 f18ovn03 systemd-vdsmd[6566]: [27B blob data]
Jan 15 12:09:11 f18ovn03 systemd[1]: Started Virtual Desktop Server Manager.
Jan 15 12:09:11 f18ovn03 python[6837]: DIGEST-MD5 client step 2
Jan 15 12:09:11 f18ovn03 python[6837]: DIGEST-MD5 client step 2
Jan 15 12:09:11 f18ovn03 python[6837]: DIGEST-MD5 client step 3
Jan 15 12:09:11 f18ovn03 vdsm[6837]: vdsm fileUtils WARNING Dir /rhev/data-center/mnt already exists
Jan 15 12:09:14 f18ovn03 vdsm[6837]: vdsm Storage.LVM WARNING lvm pvs failed: 5 [] [' Skipping clustered vo...T01']
Jan 15 12:09:14 f18ovn03 vdsm[6837]: vdsm Storage.LVM WARNING lvm vgs failed: 5 [] [' Skipping clustered vo...T01']
Jan 15 12:09:14 f18ovn03 vdsm[6837]: vdsm vds WARNING Unable to load the json rpc server module. Please make...lled.

Is the message "WARNING Dir /rhev/data-center/mnt already exists" normal?
Now # vdsClient -s 0 getVdsCaps HBAInventory = {'iSCSI': [{'InitiatorName': 'iqn.1994-05.com.redhat:3563e5612db4'}], 'FC': [{'wwpn': '50014380011bf958', 'wwnn': '50014380011bf959', 'model': 'QMH2462 - PCI-Express Dual Channel 4Gb Fibre Channel Mezzanine HBA'}, {'wwpn': '50014380011bf95a', 'wwnn': '50014380011bf95b', 'model': 'QMH2462 - PCI-Express Dual Channel 4Gb Fibre Channel Mezzanine HBA'}]} ISCSIInitiatorName = iqn.1994-05.com.redhat:3563e5612db4 bondings = {'bond4': {'addr': '', 'cfg': {}, 'mtu': '1500', 'netmask': '', 'slaves': [], 'hwaddr': '00:00:00:00:00:00'}, 'bond0': {'addr': '', 'cfg': {}, 'mtu': '1500', 'netmask': '', 'slaves': [], 'hwaddr': '00:00:00:00:00:00'}} bridges = {'ovirtmgmt': {'addr': '10.4.4.59', 'cfg': {'IPV6INIT': 'no', 'IPADDR': '10.4.4.59', 'ONBOOT': 'yes', 'DELAY': '0', 'NM_CONTROLLED': 'no', 'NETMASK': '255.255.255.0', 'BOOTPROTO': 'none', 'DEVICE': 'ovirtmgmt', 'TYPE': 'Bridge', 'GATEWAY': '10.4.4.250'}, 'mtu': '1500', 'netmask': '255.255.255.0', 'stp': 'off', 'ports': ['em3']}} clusterLevels = ['3.0', '3.1', '3.2'] cpuCores = 8 cpuFlags = fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,fxsr,sse,sse2,ht,syscall,nx,mmxext,fxsr_opt,rdtscp,lm,3dnowext,3dnow,rep_good,nopl,extd_apicid,pni,cx16,lahf_lm,cmp_legacy,svm,extapic,cr8_legacy,model_athlon,model_Opteron_G1,model_Opteron_G2 cpuModel = Dual-Core AMD Opteron(tm) Processor 8222 cpuSockets = 4 cpuSpeed = 3013.706 cpuThreads = 8 emulatedMachines = ['pc-1.2', 'none', 'pc', 'pc-1.1', 'pc-1.0', 'pc-0.15', 'pc-0.14', 'pc-0.13', 'pc-0.12', 'pc-0.11', 'pc-0.10', 'isapc'] guestOverhead = 65 hooks = {} kvmEnabled = true lastClient = 10.4.4.60 lastClientIface = ovirtmgmt management_ip = memSize = 32176 netConfigDirty = False networks = {'ovirtmgmt': {'iface': 'ovirtmgmt', 'addr': '10.4.4.59', 'cfg': {'IPV6INIT': 'no', 'IPADDR': '10.4.4.59', 'ONBOOT': 'yes', 'DELAY': '0', 'NM_CONTROLLED': 'no', 'NETMASK': '255.255.255.0', 'BOOTPROTO': 'none', 'DEVICE': 'ovirtmgmt', 'TYPE': 'Bridge', 'GATEWAY': '10.4.4.250'}, 'mtu': '1500', 'netmask': '255.255.255.0', 'stp': 'off', 'bridged': True, 'gateway': '10.4.4.250', 'ports': ['em3']}} nics = {'em4': {'addr': '', 'cfg': {'DEVICE': 'em4', 'HWADDR': '00:1c:c4:ab:3a:de', 'ONBOOT': 'yes', 'NM_CONTROLLED': 'no'}, 'mtu': '1500', 'netmask': '', 'hwaddr': '00:1c:c4:ab:3a:de', 'speed': 1000}, 'em1': {'addr': '', 'cfg': {'DEVICE': 'em1', 'NM_CONTROLLED': 'no', 'TYPE': 'Ethernet', 'ONBOOT': 'yes', 'HWADDR': '00:1E:0B:21:B8:C4'}, 'mtu': '1500', 'netmask': '', 'hwaddr': '00:1e:0b:21:b8:c4', 'speed': 1000}, 'em3': {'addr': '', 'cfg': {'BRIDGE': 'ovirtmgmt', 'IPV6INIT': 'no', 'NM_CONTROLLED': 'no', 'DEVICE': 'em3', 'HWADDR': '00:1c:c4:ab:3a:dd', 'ONBOOT': 'yes'}, 'mtu': '1500', 'netmask': '', 'hwaddr': '00:1c:c4:ab:3a:dd', 'speed': 1000}, 'em2': {'addr': '', 'cfg': {'DEVICE': 'em2', 'NM_CONTROLLED': 'no', 'TYPE': 'Ethernet', 'ONBOOT': 'yes', 'HWADDR': '00:1E:0B:21:B8:C6'}, 'mtu': '1500', 'netmask': '', 'hwaddr': '00:1e:0b:21:b8:c6', 'speed': 1000}} operatingSystem = {'release': '1', 'version': '18', 'name': 'Fedora'} packages2 = {'kernel': {'release': '3.fc18.x86_64', 'buildtime': 1355776539.0, 'version': '3.6.11'}, 'spice-server': {'release': '1.fc18', 'buildtime': 1356035501, 'version': '0.12.2'}, 'vdsm': {'release': '0.78.gitb005b54.fc18', 'buildtime': 1358090637, 'version': '4.10.3'}, 'qemu-kvm': {'release': '1.fc18', 'buildtime': 1355702442, 'version': '1.2.2'}, 'libvirt': {'release': '3.fc18', 'buildtime': 1355788803, 'version': '0.10.2.2'}, 'qemu-img': 
{'release': '1.fc18', 'buildtime': 1355702442, 'version': '1.2.2'}, 'mom': {'release': '1.fc18', 'buildtime': 1349470214, 'version': '0.3.0'}} reservedMem = 321 software_revision = 0.78 software_version = 4.10 supportedENGINEs = ['3.0', '3.1'] supportedProtocols = ['2.2', '2.3'] uuid = 34353439-3036-435A-4A38-303330393338 version_name = Snow Man vlans = {} vmTypes = ['kvm'] and from webadmin I see the host up...

On Tue, Jan 15, 2013 at 12:14:45PM +0100, Gianluca Cecchi wrote:
So vdsm is enabled; it should've been up after reboot, but it has died. Maybe you can find out when that happened and correlate it with vdsm.log or libvirtd.log for clues. I'm out of guesses right now.
yeah, just more log noise, I suppose :-(
Is it repeatable that vdsm is down after boot? Dan.
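One way to follow Dan's suggestion and pin down when vdsm died, using only the standard log locations already mentioned in this thread:

    tail -n 50 /var/log/vdsm/vdsm.log               # last thing vdsm logged before dying
    grep -iE 'vdsm|libvirtd' /var/log/messages | tail -n 100
    journalctl | grep -iE 'vdsm|libvirtd'           # same events as seen by systemd, if the journal goes back far enough
    tail -n 50 /var/log/libvirt/libvirtd.log        # libvirtd's own log, if file logging is configured (path may differ)

Matching the timestamp of vdsm's last message against the surrounding systemd and libvirt entries usually shows whether it exited on its own or was stopped by something else.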

On Tue, Jan 15, 2013 at 2:44 PM, Dan Kenigsberg wrote:
Is it repeatable that vdsm is down after boot?
I will try asap and report back (I first have to solve the problem regarding webadmin accessibility, see the other thread ;-)
A question/doubt: if I install a host without configuring power management at that point, then after the first reboot initiated by the engine, do I still have to select "Confirm host has been rebooted" in some way, or is this not required for the initial deploy? Gianluca

On Tue, Jan 15, 2013 at 2:51 PM, Gianluca Cecchi wrote:
I will try asap and report back
And also: for a host that is up and configured without power management, what is the workflow to reboot it and so test what you ask ("Is it repeatable that vdsm is down after boot?")? In my case the DC is uninitialized and the host is up. Can I simply run "shutdown -r now" inside the host OS, or should I put it into maintenance first? Do I have to select "Confirm host has been rebooted" at any point? Gianluca

On Tue, Jan 15, 2013 at 02:51:13PM +0100, Gianluca Cecchi wrote:
AFAIK confirmation is needed only when the host enters a dangerous state (running VMs, SPM), loses connectivity, and fails to show up pristine to Engine after its reboot.
On Tue, Jan 15, 2013 at 03:17:32PM +0100, Gianluca Cecchi wrote:
My question regarded a simple node-only reboot. I wonder whether vdsm fails to start ALWAYS, or whether this is an even more complex glitch.

On Wed, Jan 16, 2013 at 12:08 AM, Dan Kenigsberg wrote:
My question regarded a simple node-only reboot. I wonder whether vdsm fails to start ALWAYS, or whether this is an even more complex glitch.
Just did another reboot, and the same problem. It seems vdsmd doesn't start at all:
[root@f18ovn03 ~]# sudo systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
Active: inactive (dead)
CGroup: name=systemd:/system/vdsmd.service
Under /var/log/vdsm I see vdsm.log with the timestamp of when I stopped the system:
MainThread::INFO::2013-01-16 00:45:57,716::vdsm::95::vds::(run) VDSM main thread ended. Waiting for 16 other threads...
MainThread::INFO::2013-01-16 00:45:57,716::vdsm::99::vds::(run) <Thread(libvirtEventLoop, started daemon 140084926744320)>
MainThread::INFO::2013-01-16 00:45:57,716::vdsm::99::vds::(run) <WorkerThread(Thread-5, started daemon 140084753970944)>
MainThread::INFO::2013-01-16 00:45:57,716::vdsm::99::vds::(run) <HostMonitor(HostMonitor, started daemon 140083713799936)>
MainThread::INFO::2013-01-16 00:45:57,716::vdsm::99::vds::(run) <WorkerThread(Thread-7, started daemon 140084737185536)>
MainThread::INFO::2013-01-16 00:45:57,717::momIF::65::MOM::(stop) Shutting down MOM
MainThread::INFO::2013-01-16 00:45:57,717::vdsm::99::vds::(run) <MomThread(MOM, started 140084200314624)>
MainThread::INFO::2013-01-16 00:45:57,717::vdsm::99::vds::(run) <WorkerThread(Thread-4, started daemon 140084762363648)>
MainThread::INFO::2013-01-16 00:45:57,717::vdsm::99::vds::(run) <WorkerThread(Thread-10, started daemon 140084233885440)>
MainThread::INFO::2013-01-16 00:45:57,717::vdsm::99::vds::(run) <WorkerThread(Thread-9, started daemon 140084242278144)>
MainThread::INFO::2013-01-16 00:45:57,717::vdsm::99::vds::(run) <WorkerThread(Thread-1, started daemon 140084787541760)>
MainThread::INFO::2013-01-16 00:45:57,717::vdsm::99::vds::(run) <WorkerThread(Thread-3, started daemon 140084770756352)>
MainThread::INFO::2013-01-16 00:45:57,717::vdsm::99::vds::(run) <WorkerThread(Thread-8, started daemon 140084250670848)>
MainThread::INFO::2013-01-16 00:45:57,717::vdsm::99::vds::(run) <_MainThread(MainThread, started 140085026416448)>
MainThread::INFO::2013-01-16 00:45:57,718::vdsm::99::vds::(run) <GuestManager(GuestManager, started daemon 140083688621824)>
MainThread::INFO::2013-01-16 00:45:57,718::vdsm::99::vds::(run) <PolicyEngine(PolicyEngine, started daemon 140083680229120)>
MainThread::INFO::2013-01-16 00:45:57,718::vdsm::99::vds::(run) <WorkerThread(Thread-2, started daemon 140084779149056)>
MainThread::INFO::2013-01-16 00:45:57,718::vdsm::99::vds::(run) <WorkerThread(Thread-6, started daemon 140084745578240)>
MainThread::INFO::2013-01-16 00:45:57,718::vmChannels::136::vds::(stop) VM channels listener was stopped.
MainThread::INFO::2013-01-16 00:45:57,718::vdsm::99::vds::(run) <Listener(VM Channels Listener, started daemon 140083705407232)>
If I then manually start it:
[root@f18ovn03 vdsm]# systemctl start vdsmd.service
[root@f18ovn03 vdsm]# echo $?
0
[root@f18ovn03 vdsm]# systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
Active: active (running) since Wed, 2013-01-16 01:09:44 CET; 10s ago
Process: 2483 ExecStart=/lib/systemd/systemd-vdsmd start (code=exited, status=0/SUCCESS)
Main PID: 2770 (respawn)
CGroup: name=systemd:/system/vdsmd.service
├ 2770 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/respawn.p...
├ 2773 /usr/bin/python /usr/share/vdsm/vdsm
├ 2791 /usr/bin/sudo -n /usr/bin/python /usr/share/vdsm/supervdsmServer.py 75cc5638-f5c7-4431-bcbe-4a0cd8...
└ 2792 /usr/bin/python /usr/share/vdsm/supervdsmServer.py 75cc5638-f5c7-4431-bcbe-4a0cd817a0e4 2773 /var/...

Jan 16 01:09:44 f18ovn03.ceda.polimi.it systemd-vdsmd[2483]: [27B blob data]
Jan 16 01:09:44 f18ovn03.ceda.polimi.it systemd[1]: Started Virtual Desktop Server Manager.
Jan 16 01:09:45 f18ovn03.ceda.polimi.it python[2773]: DIGEST-MD5 client step 2
Jan 16 01:09:45 f18ovn03.ceda.polimi.it python[2773]: DIGEST-MD5 client step 2
Jan 16 01:09:45 f18ovn03.ceda.polimi.it python[2773]: DIGEST-MD5 client step 3
Jan 16 01:09:45 f18ovn03.ceda.polimi.it vdsm[2773]: vdsm fileUtils WARNING Dir /rhev/data-center/mnt already exists
Jan 16 01:09:48 f18ovn03.ceda.polimi.it vdsm[2773]: vdsm Storage.LVM WARNING lvm pvs failed: 5 [' NQRb0Q-3C0k-3RRo-1LZZ...T01']
Jan 16 01:09:48 f18ovn03.ceda.polimi.it vdsm[2773]: vdsm Storage.LVM WARNING lvm vgs failed: 5 [' x3XSZx-avUC-0NNI-w5K4...T01']
Jan 16 01:09:48 f18ovn03.ceda.polimi.it vdsm[2773]: vdsm vds WARNING Unable to load the json rpc server module. Please m...lled.
Jan 16 01:09:51 f18ovn03.ceda.polimi.it vdsm[2773]: vdsm fileUtils WARNING Dir /rhev/data-center/hsm-tasks already exists
I also tried to run enable again (systemctl enable vdsmd.service) and reboot, but same behaviour. I notice that the same is true for the ntpd service, configured as enabled but not started. When I manually start vdsmd, ntpd also starts, as it is in the required chain for vdsmd... strange. Can it depend on a delay in network/bridge initialization? Before I manually start the service, the output of systemctl show vdsmd.service is:
Id=vdsmd.service Names=vdsmd.service Requires=multipathd.service libvirtd.service ntpd.service systemd-journald.socket basic.target WantedBy=multi-user.target Conflicts=libvirt-guests.service shutdown.target Before=shutdown.target multi-user.target After=systemd-journald.socket basic.target Description=Virtual Desktop Server Manager LoadState=loaded ActiveState=inactive SubState=dead FragmentPath=/usr/lib/systemd/system/vdsmd.service UnitFileState=enabled InactiveExitTimestampMonotonic=0 ActiveEnterTimestampMonotonic=0 ActiveExitTimestampMonotonic=0 InactiveEnterTimestampMonotonic=0 CanStart=yes CanStop=yes CanReload=no CanIsolate=no StopWhenUnneeded=no RefuseManualStart=no RefuseManualStop=no AllowIsolate=no DefaultDependencies=yes OnFailureIsolate=no IgnoreOnIsolate=no IgnoreOnSnapshot=no DefaultControlGroup=name=systemd:/system/vdsmd.service ControlGroup=cpu:/system/vdsmd.service name=systemd:/system/vdsmd.service NeedDaemonReload=no JobTimeoutUSec=0 ConditionTimestampMonotonic=0 ConditionResult=no Type=forking Restart=no NotifyAccess=none RestartUSec=100ms TimeoutUSec=1min 30s TimeoutStartUSec=1min 30s TimeoutStopUSec=1min 30s WatchdogUSec=0 WatchdogTimestampMonotonic=0 StartLimitInterval=10000000 StartLimitBurst=5 StartLimitAction=none ExecStart={ path=/lib/systemd/systemd-vdsmd ; argv[]=/lib/systemd/systemd-vdsmd start ; ignore_errors=no ; start_time=[n/a] ; st ExecStop={ path=/lib/systemd/systemd-vdsmd ; argv[]=/lib/systemd/systemd-vdsmd stop ; ignore_errors=no ; start_time=[n/a] ; stop PermissionsStartOnly=no RootDirectoryStartOnly=no RemainAfterExit=no GuessMainPID=yes MainPID=0 ControlPID=0 Result=success EnvironmentFile=/etc/sysconfig/vdsm (ignore_errors=yes) UMask=0022 LimitCPU=18446744073709551615 LimitFSIZE=18446744073709551615 LimitDATA=18446744073709551615 LimitSTACK=18446744073709551615 LimitCORE=18446744073709551615 LimitRSS=18446744073709551615 LimitNOFILE=4096 LimitAS=18446744073709551615 LimitNPROC=256714 LimitMEMLOCK=65536
LimitLOCKS=18446744073709551615 LimitSIGPENDING=256714 LimitMSGQUEUE=819200 LimitNICE=0 LimitRTPRIO=0 LimitRTTIME=18446744073709551615 OOMScoreAdjust=0 Nice=0 IOScheduling=0 CPUSchedulingPolicy=0 CPUSchedulingPriority=0 TimerSlackNSec=50000 CPUSchedulingResetOnFork=no NonBlocking=no StandardInput=null StandardOutput=journal StandardError=inherit TTYReset=no TTYVHangup=no TTYVTDisallocate=no SyslogPriority=30 SyslogLevelPrefix=yes SecureBits=0 CapabilityBoundingSet=18446744073709551615 MountFlags=0 PrivateTmp=no PrivateNetwork=no SameProcessGroup=no ControlGroupModify=no ControlGroupPersistent=no IgnoreSIGPIPE=yes NoNewPrivileges=no KillMode=control-group KillSignal=15 SendSIGKILL=yes ExecMainStartTimestampMonotonic=0 ExecMainExitTimestampMonotonic=0 ExecMainPID=0 ExecMainCode=0 ExecMainStatus=0
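To answer the "can it depend on a delay" question above, the dependency and timing data can be pulled out of systemd directly; a sketch using standard tooling (output format may differ slightly on this systemd version):

    # hard dependencies and ordering declared by the unit
    systemctl show -p Requires -p Wants -p After -p Before vdsmd.service
    # which units were slowest during this boot
    systemd-analyze blame | head -n 20
    # boot-time journal entries around the services in question
    journalctl -b | grep -iE 'vdsm|ntpd|chrony|network'

In the output quoted above, the relevant bits are Requires=... ntpd.service and UnitFileState=enabled: the unit is enabled, so the suspicion naturally falls on one of the required units not coming up, which is what the rest of the thread chases down.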

On Wed, Jan 16, 2013 at 01:19:47AM +0100, Gianluca Cecchi wrote:
It might be it, but there must be more clues in the logs. I'd like to see both vdsm.log and /var/log/messages from boot time. journalctl may have stories to tell, too. If you can manually zap ntpd off of vdsmd.service and reboot, it could eliminate one usual suspect. Would you?
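One way to do the "zap", keeping the packaged unit file intact by overriding it from /etc/ (a sketch; the same effect is obtained later in the thread by editing the file under /usr/lib directly):

    cp /usr/lib/systemd/system/vdsmd.service /etc/systemd/system/vdsmd.service
    # in the copy, change
    #   Requires=multipathd.service libvirtd.service ntpd.service
    # to
    #   Requires=multipathd.service libvirtd.service
    sed -i 's/ ntpd\.service//' /etc/systemd/system/vdsmd.service
    systemctl daemon-reload
    reboot

Units in /etc/systemd/system take precedence over the ones shipped in /usr/lib/systemd/system, so this override also survives a vdsm package update.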

On Sat, Jan 19, 2013 at 8:26 PM, Dan Kenigsberg wrote:
In the meantime, here are:
messages: https://docs.google.com/file/d/0BwoPbcrMv8mvNHpPREZGMnhYSzg/edit
vdsm.log: https://docs.google.com/file/d/0BwoPbcrMv8mvQ0k0aEJ5cWtXeTA/edit
They relate to the boot on 16/01 at 01:04 AM. In the afternoon I should be able to make the ntp changes you requested.
journalctl was rotated. Is there any command to make journalctl read another log file, if one is present? It now says:
-- Logs begin at Wed, 2013-01-16 14:39:23 CET, end at Tue, 2013-01-22 12:52:05 CET. --
Gianluca
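On the journalctl question: by default the journal on Fedora 18 is kept in volatile storage under /run and does not survive reboots, which is why older boots disappear. A sketch for making it persistent (standard journald behaviour, assuming there is room under /var):

    mkdir -p /var/log/journal
    # optionally make it explicit in /etc/systemd/journald.conf:
    #   [Journal]
    #   Storage=persistent
    systemctl restart systemd-journald.service

From then on journalctl keeps entries from previous boots as well; it cannot, however, import the plain-text logs that were already rotated away.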

On Tue, Jan 22, 2013 at 12:52 PM, Gianluca Cecchi wrote:
In the afternoon I should be able to make the ntp changes you requested.
OK, so it seems ntp indeed plays its part in vdsmd not starting.
Changed /usr/lib/systemd/system/vdsmd.service from
Requires=multipathd.service libvirtd.service ntpd.service
to
Requires=multipathd.service libvirtd.service
and after a # shutdown -r now on the node, vdsmd starts correctly at boot:
[root@f18ovn03 ~]# uptime
14:20:16 up 5 min, 1 user, load average: 0.27, 0.35, 0.19
# systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
Active: active (running) since Tue, 2013-01-22 14:15:52 CET; 1min 13s ago
Process: 1201 ExecStart=/lib/systemd/systemd-vdsmd start (code=exited, status=0/SUCCESS)
Main PID: 2308 (respawn)
CGroup: name=systemd:/system/vdsmd.service
├ 2308 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/v...
├ 2312 /usr/bin/python /usr/share/vdsm/vdsm
├ 2331 /usr/bin/sudo -n /usr/bin/python /usr/share/vdsm/supervdsmServer.py 8596d6dc-fc63-4a2...
├ 2332 /usr/bin/python /usr/share/vdsm/supervdsmServer.py 8596d6dc-fc63-4a20-8589-8782298ea1...
├ 2538 rpc.statd --no-notify
├ 2545 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 32 30
├ 2769 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 39 37
├ 2771 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 41 39
└ 2775 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 41 39

Jan 22 14:15:53 f18ovn03 vdsm[2312]: vdsm fileUtils WARNING Dir /rhev/data-center/mnt already exists
Jan 22 14:15:56 f18ovn03 vdsm[2312]: vdsm vds WARNING Unable to load the json rpc server modu...led.
Jan 22 14:15:59 f18ovn03 vdsm[2312]: vdsm fileUtils WARNING Dir /rhev/data-center/hsm-tasks a...ists
Jan 22 14:16:16 f18ovn03 rpc.statd[2538]: Version 1.2.7 starting
Jan 22 14:16:16 f18ovn03 rpc.statd[2538]: Flags: TI-RPC
Jan 22 14:16:18 f18ovn03 vdsm[2312]: vdsm TaskManager.Task ERROR Task=`801dd9ae-896a-4790-94c...rror
Jan 22 14:16:18 f18ovn03 vdsm[2312]: vdsm Storage.Dispatcher.Protect ERROR {'status': {'messa...09}}
Jan 22 14:16:24 f18ovn03 vdsm[2312]: vdsm Storage.PersistentDict WARNING data has no embedded...t is
Jan 22 14:16:24 f18ovn03 vdsm[2312]: vdsm Storage.LVM WARNING lvm vgs failed: 5 [] [' Volume...nd']
Jan 22 14:16:24 f18ovn03 vdsm[2312]: vdsm Storage.LVM WARNING lvm vgs failed: 5 [] [' Volume...nd']

And from the engine point of view all is OK too; the host is UP, and on the webadmin console I have:
2013-Jan-22, 14:19 Storage Pool Manager runs on Host f18ovn03 (Address: 10.4.4.59).
Gianluca

On Tue, Jan 22, 2013 at 9:02 PM, Dan Kenigsberg wrote:
Hmm... I think that in Fedora 18 itself and/or due to the oVirt setup there is some conflict between time synchronization services... In fact, alongside the standard ntp there is chrony, which I didn't know about until today... My suspicion is this:

- At install time, Fedora 18 now installs chrony by default and does not install ntpd. In fact my anaconda log in /root:
[root@f18ovn03 ~]# ll /root/anaconda-tb-38wvGO
-rw-r--r--. 1 root root 1493714 Dec 18 14:27 /root/anaconda-tb-38wvGO
contains:
14:19:00,759 DEBUG packaging: select package chrony
and no entry for ntp.
At this moment chronyd was still enabled, as per the default at install time:
[root@f18ovn03 ~]# systemctl status chronyd.service
chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled)
Active: inactive (dead) since Tue, 2013-01-22 14:15:42 CET; 9h ago
Main PID: 1348 (code=exited, status=0/SUCCESS)
CGroup: name=systemd:/system/chronyd.service

Jan 22 14:15:40 f18ovn03.ceda.polimi.it chronyd[1348]: chronyd version 1.27-git1ca844 starting
Jan 22 14:15:40 f18ovn03.ceda.polimi.it chronyd[1348]: Linux kernel major=3 minor=6 patch=11
Jan 22 14:15:40 f18ovn03.ceda.polimi.it chronyd[1348]: hz=100 shift_hz=7 freq_scale=1.00000000 nominal_tick=10000...ll=2
Jan 22 14:15:41 f18ovn03.ceda.polimi.it systemd[1]: Started NTP client/server.
Jan 22 14:15:42 f18ovn03.ceda.polimi.it systemd[1]: Stopping NTP client/server...
Jan 22 14:15:42 f18ovn03.ceda.polimi.it systemd[1]: Stopped NTP client/server.

- When I ran the host install from ovirt-engine, it pulled in ntp as one of its dependencies, enabling it. In yum.log in fact I find:
Jan 15 06:05:53 Installed: libvirt-0.10.2.2-3.fc18.x86_64
Jan 15 06:05:53 Installed: mom-0.3.0-1.fc18.noarch
Jan 15 06:05:54 Installed: fence-agents-3.1.10-1.fc18.x86_64
Jan 15 06:05:54 Installed: usbredir-0.6-1.fc18.x86_64
Jan 15 06:05:54 Installed: ipxe-roms-qemu-20120328-2.gitaac9718.fc18.noarch
Jan 15 06:05:55 Installed: 2:qemu-system-x86-1.2.2-1.fc18.x86_64
Jan 15 06:05:55 Installed: 2:qemu-kvm-1.2.2-1.fc18.x86_64
Jan 15 06:05:56 Installed: autogen-libopts-5.12-2.fc17.x86_64
---> Jan 15 06:05:56 Installed: ntp-4.2.6p5-8.fc18.x86_64 <------
Jan 15 06:06:02 Installed: vdsm-4.10.3-0.78.gitb005b54.fc18.x86_64
Jan 15 06:06:04 Installed: tuned-2.1.2-1.fc18.noarch
Jan 15 06:06:04 Installed: vdsm-cli-4.10.3-0.78.gitb005b54.fc18.noarch
Jan 15 06:06:04 Installed: 2:qemu-kvm-tools-1.2.2-1.fc18.x86_64

Probably chronyd runs before ntpd and creates a conflict...
So I tested disabling chrony:
# systemctl disable chronyd
and re-enabling ntp as a dependency for vdsmd, to see if something changes. Changed /usr/lib/systemd/system/vdsmd.service back to:
Requires=multipathd.service libvirtd.service ntpd.service
Bingo! After reboot:
[root@f18ovn03 ~]# systemctl status ntpd.service
ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled)
Active: active (running) since Wed, 2013-01-23 00:31:00 CET; 1min 47s ago
Main PID: 1424 (ntpd)
CGroup: name=systemd:/system/ntpd.service
└ 1424 /usr/sbin/ntpd -u ntp:ntp -g

Jan 23 00:31:07 f18ovn03 ntpd[1424]: Listen normally on 8 em3 fe80::21c:c4ff:feab:3add UDP 123
Jan 23 00:31:07 f18ovn03 ntpd[1424]: Listen normally on 9 em2 fe80::21e:bff:fe21:b8c6 UDP 123
Jan 23 00:31:07 f18ovn03 ntpd[1424]: Listen normally on 10 em1 fe80::21e:bff:fe21:b8c4 UDP 123
Jan 23 00:31:07 f18ovn03 ntpd[1424]: peers refreshed
...
[root@f18ovn03 ~]# systemctl status vdsmd.service
vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
Active: active (running) since Wed, 2013-01-23 00:31:10 CET; 2min 25s ago
Process: 1434 ExecStart=/lib/systemd/systemd-vdsmd start (code=exited, status=0/SUCCESS)
Main PID: 2778 (respawn)
CGroup: name=systemd:/system/vdsmd.service
├ 2778 /bin/bash -e /usr/share/vdsm/respawn --minlifetime 10 --daemon --masterpid /var/run/vdsm/r...
├ 2781 /usr/bin/python /usr/share/vdsm/vdsm
├ 2801 /usr/bin/sudo -n /usr/bin/python /usr/share/vdsm/supervdsmServer.py 95a1f3e1-8c0d-4efe-a65...
├ 2802 /usr/bin/python /usr/share/vdsm/supervdsmServer.py 95a1f3e1-8c0d-4efe-a65a-c5af1c7f4d61 27...
├ 3035 rpc.statd --no-notify
├ 3041 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 31 30
├ 3214 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 44 43
├ 3216 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 49 46
└ 3229 /usr/bin/python /usr/share/vdsm/storage/remoteFileHandler.pyc 37 34

Jan 23 00:31:38 f18ovn03 rpc.statd[3035]: Version 1.2.7 starting
Jan 23 00:31:38 f18ovn03 rpc.statd[3035]: Flags: TI-RPC
...

I also modified ntp.conf, commenting out
#server 0.fedora.pool.ntp.org iburst
#server 1.fedora.pool.ntp.org iburst
#server 2.fedora.pool.ntp.org iburst
#server 3.fedora.pool.ntp.org iburst
and putting in a "server xxx.yyy.www.zzz" line pointing to an internal ntp server, but I don't think this influenced my problem. To be sure, I anyway reverted to the old ntp.conf config with
server 0.fedora.pool.ntp.org iburst
server 1.fedora.pool.ntp.org iburst
server 2.fedora.pool.ntp.org iburst
server 3.fedora.pool.ntp.org iburst
# server xxx.yyy.www.zzz
and both ntpd and vdsmd started again, with the host up and SPM in webadmin.
Probably another thing to take care of for Fedora 18 configured as an oVirt node.
Gianluca
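For reference, the working setup described above boils down to roughly these steps (a recap sketch of the message, not additional changes):

    systemctl disable chronyd.service            # stop chrony from being started at boot
    systemctl stop chronyd.service               # and stop it now, if it is running
    systemctl enable ntpd.service                # keep ntpd enabled (it already was, here)
    # restore the dependency in /usr/lib/systemd/system/vdsmd.service:
    #   Requires=multipathd.service libvirtd.service ntpd.service
    systemctl daemon-reload
    reboot
    systemctl status ntpd.service vdsmd.service  # both should be active after boot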

This is nice to know... otopi is ready:
commit d756b789d60934f935718ff25f8208f563b0f123
Author: Alon Bar-Lev <alonbl@redhat.com>
Date: Wed Jan 23 02:27:51 2013 +0200
    system: clock: support chrony as ntpd
Change-Id: I2917bdd8248eb0b123f6b2cca875820f2cac664c
Signed-off-by: Alon Bar-Lev <alonbl@redhat.com>
http://gerrit.ovirt.org/#/c/11288/

On Tue, Jan 22, 2013 at 07:29:43PM -0500, Alon Bar-Lev wrote:
Alon, if both ntpq and chronyc are installed, do you intentionally run them both?
Gianluca, how about softening our ntpd requirement with something like http://gerrit.ovirt.org/11291 ? Could you verify that it's working on your system?
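The patch itself is not quoted in this thread; purely as an illustration of the idea, "softening" a hard systemd dependency usually means trading Requires= for Wants=, so vdsmd still asks for a time service but no longer fails or gets pulled down when that service is absent. A hypothetical unit fragment, not the actual gerrit change:

    [Unit]
    Requires=multipathd.service libvirtd.service
    Wants=ntpd.service chronyd.service
    After=ntpd.service chronyd.service

As Dan notes below, wanting both ntpd and chronyd produces noise, since the two conflict; wanting only one of them would be quieter.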

On Wed, Jan 23, 2013 at 9:40 AM, Dan Kenigsberg wrote:
Yes, I can test the service modification for vdsmd. I presume you want me to test it with default Fedora 18, so chronyd enabled and ntpd disabled, correct?
BTW: for independent services that are both enabled at a target level, how does systemd process them, in parallel? In old-style init there were S25, S35, etc. prefixes that ruled the start order, together with some rules inside the scripts themselves, such as # Provides: ... # Required-Start: ... etc.

On Wed, Jan 23, 2013 at 10:57:03AM +0100, Gianluca Cecchi wrote:
Yes, though my patch may generate excessive noise, with its attempt to start two conflicting services. I suppose that we can/should "want" only chrony. But please provide your input.
I'm no systemd expert, but I believe that it builds a dependency tree and attempts to start independent services in parallel, to save time. Dan.

----- Original Message -----
We should support either, with a default. On other, non-systemd distributions there is the concept of 'provides'. This means that a service like vdsmd can depend on timesync, while services like chronyd *AND* ntpd both provide timesync. It is a kind of logical name for a service. Other non-systemd distributions also provide the ability to script the dependencies. I could not find either of the above methods in the systemd documentation.
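For comparison, a rough illustration of the 'provides' idea Alon describes, using LSB init-script headers (a generic example with a made-up facility name, not taken from any of the packages discussed): the time daemons declare that they provide a facility, and vdsmd requires the facility rather than a specific daemon.

    # /etc/init.d/ntpd (or chronyd) -- the provider side
    ### BEGIN INIT INFO
    # Provides:       time-sync
    ### END INIT INFO

    # /etc/init.d/vdsmd -- the consumer side
    ### BEGIN INIT INFO
    # Required-Start: $network time-sync
    ### END INIT INFO

With this scheme, adding a third time daemon only requires the newcomer to declare Provides: time-sync; none of the consumers need to change, which is exactly the property Alon finds missing in systemd.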

On Wed, Jan 23, 2013 at 05:16:09AM -0500, Alon Bar-Lev wrote:
I'm not sure that I parse your English properly. Could you be more explicit, preferably in a code review of http://gerrit.ovirt.org/#/c/11291/ ?

----- Original Message -----
I guess this will work; however, it is a primitive approach that each service should explicitly list its possible providers in a multi-option configuration. Let's assume there is yet another alternative to ntpd: then all services would have to list three alternatives... And what if I add my own alternative on my system? Then I need to manually fix all services, as their authors were not aware of the new service I introduced. It is systemd I don't like...

On Wed, Jan 23, 2013 at 11:42 AM, Alon Bar-Lev wrote:
And what if I add my own alternative on my system? Then I need to manually fix all services, as their authors were not aware of the new service I introduced.
It is systemd I don't like...
If it can be of any help, it seems that others have run into similar problems. See the freeipa-devel mailing list, November 2012: http://www.redhat.com/archives/freeipa-devel/2012-November/msg00198.html and in particular its final progression (around December ;-): http://www.redhat.com/archives/freeipa-devel/2012-December/msg00088.html
The approach for installing oVirt hosts could be similar... There the point was also distinguishing between FreeIPA server and client, and the conclusion was:
"During server installation, user is warned when running conflicting time service. Installation then enforces ntpd configuration. During client installation, user is also warned, but continuing in installation omits ntpd configuration instead. But user can use --force-ntpd to force ntpd configuration."
We could follow the server-side approach for the oVirt node. Gianluca

On Tue, Jan 22, 2013 at 12:52:43PM +0100, Gianluca Cecchi wrote:
Sorry, this gives no trace of a vdsmd failure. It would be great to know if removal of the ntpd requirement helps vdsm on boot. Jan 16 01:09:40 f18ovn03 ntpd[2489]: 0.0.0.0 c011 01 freq_not_set Jan 16 01:09:40 f18ovn03 systemd-vdsmd[2483]: Note: Forwarding request to 'systemctl disable libvirt-guests.service'. Jan 16 01:09:40 f18ovn03 systemd[1]: Reloading. Jan 16 01:09:40 f18ovn03 systemd[1]: tuned.service has a D-Bus service name specified, but is not of type dbus. Ignoring. Jan 16 01:09:40 f18ovn03 systemd-vdsmd[2483]: vdsm: libvirt already configured for vdsm [ OK ] Jan 16 01:09:41 f18ovn03 systemd-vdsmd[2483]: Starting wdmd... Jan 16 01:09:41 f18ovn03 systemd-vdsmd[2483]: Redirecting to /bin/systemctl start wdmd.service Jan 16 01:09:41 f18ovn03 systemd[1]: Starting Watchdog Multiplexing Daemon... Jan 16 01:09:41 f18ovn03 wdmd[2602]: wdmd started S0 H1 G179 Jan 16 01:09:41 f18ovn03 systemd-wdmd[2594]: Starting wdmd: [ OK ] Jan 16 01:09:41 f18ovn03 wdmd[2602]: /dev/watchdog armed with fire_timeout 60 Jan 16 01:09:41 f18ovn03 systemd[1]: Started Watchdog Multiplexing Daemon. Jan 16 01:09:41 f18ovn03 systemd-vdsmd[2483]: Starting sanlock... Jan 16 01:09:41 f18ovn03 systemd-vdsmd[2483]: Redirecting to /bin/systemctl start sanlock.service Jan 16 01:09:41 f18ovn03 systemd[1]: Starting Shared Storage Lease Manager... Jan 16 01:09:41 f18ovn03 systemd-sanlock[2618]: Starting sanlock: [ OK ] Jan 16 01:09:41 f18ovn03 systemd[1]: Started Shared Storage Lease Manager. Jan 16 01:09:41 f18ovn03 systemd-vdsmd[2483]: Starting iscsid: Jan 16 01:09:41 f18ovn03 systemd-vdsmd[2483]: Redirecting to /bin/systemctl start libvirtd.service Jan 16 01:09:41 f18ovn03 systemd[1]: Started Virtualization daemon. Jan 16 01:09:44 f18ovn03 kernel: [ 361.499864] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Jan 16 01:09:44 f18ovn03 kernel: [ 361.528772] 8021q: 802.1Q VLAN Support v1.8 Jan 16 01:09:44 f18ovn03 kernel: [ 361.530500] bonding: bond4 is being created... Jan 16 01:09:44 f18ovn03 systemd-vdsmd[2483]: Starting up vdsm daemon: Jan 16 01:09:44 f18ovn03 systemd-vdsmd[2483]: [ OK ]#015vdsm start[ OK ] Jan 16 01:09:44 f18ovn03 systemd[1]: Started Virtual Desktop Server Manager. 
Jan 16 01:09:45 f18ovn03 vdsm fileUtils WARNING Dir /rhev/data-center/mnt already exists Jan 16 01:09:48 f18ovn03 vdsm Storage.LVM WARNING lvm pvs failed: 5 [' NQRb0Q-3C0k-3RRo-1LZZ-NNy2-42A1-c7zO8e|/dev/mapper/3600507630efe05800000000000001601|106971529216|c6bb44ee-b824-44a0-a62c-f537a23d2e2b|x3XSZx-avUC-0NNI-w5K4-nCOp-uxp5-TD9GvH|135266304|797|0|2|107374182400'] [' Skipping clustered volume group VG_VIRT04', ' Skipping volume group VG_VIRT04', ' Skipping clustered volume group VG_VIRT02', ' Skipping volume group VG_VIRT02', ' Skipping clustered volume group VG_VIRT03', ' Skipping volume group VG_VIRT03', ' Skipping clustered volume group VG_VIRT03', ' Skipping volume group VG_VIRT03', ' Skipping clustered volume group VG_VIRT01', ' Skipping volume group VG_VIRT01', ' Skipping clustered volume group VG_VIRT01', ' Skipping volume group VG_VIRT01'] Jan 16 01:09:48 f18ovn03 vdsm Storage.LVM WARNING lvm vgs failed: 5 [' x3XSZx-avUC-0NNI-w5K4-nCOp-uxp5-TD9GvH|c6bb44ee-b824-44a0-a62c-f537a23d2e2b|wz--n-|106971529216|106971529216|134217728|797|797|RHAT_storage_domain_UNREADY|134217728|67107328'] [' Skipping clustered volume group VG_VIRT04', ' Skipping clustered volume group VG_VIRT02', ' Skipping clustered volume group VG_VIRT03', ' Skipping clustered volume group VG_VIRT01'] Jan 16 01:09:48 f18ovn03 vdsm vds WARNING Unable to load the json rpc server module. Please make sure it is installed.
participants (3):
- Alon Bar-Lev
- Dan Kenigsberg
- Gianluca Cecchi