
On Wed, Feb 1, 2017 at 8:22 PM, Gianluca Cecchi <gianluca.cecchi@gmail.com> wrote:
OK. In the meantime I have applied your suggested config and restarted the two nodes. Let me test and see if I find any problems, also running some I/O tests. Thanks meanwhile, Gianluca
Quick test without much success. Inside the guest I run this loop:

while true
do
  time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile
  sleep 5
done

BTW: my home is inside the / filesystem of the guest, which has space to accommodate the 1 GB written by the dd command:

[g.cecchi@ol65 ~]$ df -h /home/g.cecchi/
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/vg_ol65-lv_root   20G  4.9G   14G  27% /
[g.cecchi@ol65 ~]$

After about 7 rounds I get this in messages of the host where the VM is running:

Feb  1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!
Feb  1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!

Nothing on the other host. In the web admin events pane:

Feb 1, 2017 11:31:44 PM VM ol65 has been paused due to no Storage space error.
Feb 1, 2017 11:31:44 PM VM ol65 has been paused.
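For reference, a pause like this usually means the guest wrote into the thin disk faster than VDSM could extend the underlying LV. As a sketch (values here are illustrative, not verified on this setup), the watermark behaviour is tunable in /etc/vdsm/vdsm.conf on each host, followed by a vdsmd restart:

```ini
# /etc/vdsm/vdsm.conf -- [irs] section (illustrative values)
[irs]
# Request an extension when the free space in the volume drops below
# this percentage of utilization (default 50).
volume_utilization_percent = 50
# Extend the volume by this many MiB each time (default 1024 = 1 GiB);
# a larger chunk gives fast sequential writers more headroom before
# the next extension round-trip through the SPM mailbox.
volume_utilization_chunk_mb = 2048
```

This is a config fragment only; whether raising the chunk size is appropriate depends on how much over-allocation of the storage domain is acceptable.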
I stop the dd loop and after some seconds:

Feb 1, 2017 11:32:32 PM VM ol65 has recovered from paused back to up

Multipath status for my device:

3600a0b8000299aa80000d08b55014119 dm-2 IBM     ,1814      FAStT
size=4.0T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| |- 0:0:1:3 sdj 8:144 active undef running
| `- 2:0:1:3 sdp 8:240 active undef running
`-+- policy='service-time 0' prio=0 status=enabled
  |- 0:0:0:3 sdd 8:48  active undef running
  `- 2:0:0:3 sdk 8:160 active undef running

In engine.log:

2017-02-01 23:22:01,449 INFO  [org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default task-15) [530aee87] Running command: CreateUserSessionCommand internal: false.
2017-02-01 23:22:04,011 INFO  [org.ovirt.engine.docs.utils.servlet.ContextSensitiveHelpMappingServlet] (default task-12) [] Context-sensitive help is not installed. Manual directory doesn't exist: /usr/share/ovirt-engine/manual
2017-02-01 23:31:43,936 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-10) [10c15e39] VM '932db7c7-4121-4cbe-ad8d-09e4e99b3cdd'(ol65) moved from 'Up' --> 'Paused'
2017-02-01 23:31:44,087 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has been paused.
2017-02-01 23:31:44,227 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has been paused due to no Storage space error.
2017-02-01 23:32:32,100 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (DefaultQuartzScheduler5) [1c02a371] VM '932db7c7-4121-4cbe-ad8d-09e4e99b3cdd'(ol65) moved from 'Paused' --> 'Up'
2017-02-01 23:32:32,259 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler5) [1c02a371] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM ol65 has recovered from paused back to up.

In vdsm.log of the host where the VM is running I see this:

mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:37,729::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:37,764::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0061866 s, 166 MB/s\n'; <rc> = 0
periodic/8::INFO::2017-02-01 23:31:39,699::vm::1028::virt.vm::(extendDrivesIfNeeded) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting extension for volume 66615920-619f-4f52-ad5a-3f062c054094 on domain 922b5269-ab56-4c4d-838f-49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated: 12355698688, physical: 12884901888)
periodic/8::DEBUG::2017-02-01 23:31:39,699::vm::1093::virt.vm::(__extendDriveVolume) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting an extension for the volume: {'newSize': 13958643712, 'domainID': u'922b5269-ab56-4c4d-838f-49d33427e2ab', 'name': u'vda', 'volumeID': u'66615920-619f-4f52-ad5a-3f062c054094', 'imageID': u'32b0e94d-0c8e-4d3d-a4eb-448885297e93', 'internal': False, 'poolID': u'588237b8-0031-02f6-035d-000000000136'}
periodic/8::DEBUG::2017-02-01 23:31:39,700::task::597::Storage.TaskManager.Task::(_updateState) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::moving from state init -> state preparing
periodic/8::INFO::2017-02-01 23:31:39,700::logUtils::49::dispatcher::(wrapper) Run and protect: sendExtendMsg(spUUID=u'588237b8-0031-02f6-035d-000000000136', volDict={'newSize': 13958643712, 'domainID': u'922b5269-ab56-4c4d-838f-49d33427e2ab', 'name': u'vda', 'volumeID': u'66615920-619f-4f52-ad5a-3f062c054094', 'imageID': u'32b0e94d-0c8e-4d3d-a4eb-448885297e93', 'internal': False, 'poolID': u'588237b8-0031-02f6-035d-000000000136'}, newSize=13958643712, callbackFunc=<bound method Vm.__afterVolumeExtension of <virt.vm.Vm object at 0x2feb5d0>>)
periodic/8::DEBUG::2017-02-01 23:31:39,701::storage_mailbox::125::Storage.SPM.Messages.Extend::(__init__) new extend msg created: domain: 922b5269-ab56-4c4d-838f-49d33427e2ab, volume: 66615920-619f-4f52-ad5a-3f062c054094
periodic/8::INFO::2017-02-01 23:31:39,701::logUtils::52::dispatcher::(wrapper) Run and protect: sendExtendMsg, Return response: None
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::1193::Storage.TaskManager.Task::(prepare) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::finished: None
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::597::Storage.TaskManager.Task::(_updateState) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::moving from state preparing -> state finished
periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
periodic/8::DEBUG::2017-02-01 23:31:39,701::task::995::Storage.TaskManager.Task::(_decref) Task=`d5169f55-8fa6-494e-bf3c-b077b922e06c`::ref 0 aborting False
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,702::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 64, end: 128, len: 4096, message(1/63): "1xtnd\xab\xe2'4\xd3I\x8f\x83MLV\xabiR+\x92\x94@\x05,\x06?Z\xadRO\x9fa Yaf000000000000340000000000000"
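The extension request numbers above are consistent with the default behaviour: the volume's apparent size is 12884901888 bytes and the requested newSize is 13958643712, i.e. exactly one 1 GiB chunk on top of the current size. A quick sanity check:

```shell
# requested newSize minus current apparent size = one extension chunk
echo $((13958643712 - 12884901888))   # 1073741824 bytes = 1 GiB

# capacity reported in the log, expressed in GiB
echo $((96636764160 / 1073741824))    # 90 -> matches the 90G virtual disk
```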
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,702::storage_mailbox::375::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/outbox iflag=direct,fullblock bs=512 count=8 skip=16 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:39,790::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,815::storage_mailbox::375::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 0.0900633 s, 45.5 kB/s\n'; <rc> = 0
mailbox.HSMMonitor::INFO::2017-02-01 23:31:39,816::storage_mailbox::387::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=16']
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:39,816::storage_mailbox::86::Storage.Misc.excCmd::(_mboxExecCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd of=/rhev/data-center/588237b8-0031-02f6-035d-000000000136/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=16 (cwd None)
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:39,844::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0290379 s, 35.3 MB/s\n'; <rc> = 0
periodic/7::INFO::2017-02-01 23:31:41,700::vm::1028::virt.vm::(extendDrivesIfNeeded) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::Requesting extension for volume 66615920-619f-4f52-ad5a-3f062c054094 on domain 922b5269-ab56-4c4d-838f-49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated: 12617187328, physical: 12884901888)
...
mailbox.SPMMonitor::DEBUG::2017-02-01 23:31:42,181::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.30031 s, 3.4 MB/s\n'; <rc> = 0
mailbox.SPMMonitor::ERROR::2017-02-01 23:31:42,182::storage_mailbox::619::Storage.MailBox.SpmMailMonitor::(_validateMailbox) SPM_MailMonitor: mailbox 2 checksum failed, not clearing mailbox, clearing newMail.
jsonrpc.Executor/6::DEBUG::2017-02-01 23:31:42,587::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.getAllVmStats' in bridge with {}
jsonrpc.Executor/6::DEBUG::2017-02-01 23:31:42,589::__init__::555::jsonrpc.JsonRpcServer::(_handle_request) Return 'Host.getAllVmStats' in bridge with (suppressed)
jsonrpc.Executor/6::INFO::2017-02-01 23:31:42,590::__init__::513::jsonrpc.JsonRpcServer::(_serveRequest) RPC call Host.getAllVmStats succeeded in 0.00 seconds
libvirtEventLoop::INFO::2017-02-01 23:31:43,586::vm::4041::virt.vm::(onIOError) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::abnormal vm stop device virtio-disk0 error enospc
libvirtEventLoop::INFO::2017-02-01 23:31:43,586::vm::4877::virt.vm::(_logGuestCpuStatus) vmId=`932db7c7-4121-4cbe-ad8d-09e4e99b3cdd`::CPU stopped: onIOError
periodic/6::WARNING::2017-02-01 23:31:43,698::periodic::277::virt.periodic.VmDispatcher::(__call__) could not run <class 'vdsm.virt.periodic.DriveWatermarkMonitor'> on [u'932db7c7-4121-4cbe-ad8d-09e4e99b3cdd']
mailbox.HSMMonitor::DEBUG::2017-02-01 23:31:43,790::storage_mailbox::86::Storage.Misc.excCmd::(_mboxExecCmd) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 3.95001 s, 1.0 kB/s\n'; <rc> = 0

The disk of the VM is 90 GB in virtual size, thin provisioned.

Output of the commands inside the guest until Ctrl+C (I interrupted the loop when I received the VM paused error...):

[g.cecchi@ol65 ~]$ while true; do time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile; sleep 5; done
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.27353 s, 116 MB/s

real    0m9.353s
user    0m0.007s
sys     0m3.174s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.68965 s, 111 MB/s

real    0m9.929s
user    0m0.000s
sys     0m2.972s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.16642 s, 117 MB/s

real    0m9.412s
user    0m0.008s
sys     0m2.766s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.68209 s, 400 MB/s

real    0m2.931s
user    0m0.004s
sys     0m2.927s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.11601 s, 176 MB/s

real    0m9.158s
user    0m0.002s
sys     0m2.922s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.67673 s, 401 MB/s

real    0m2.921s
user    0m0.004s
sys     0m2.916s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.10537 s, 262 MB/s

real    0m6.885s
user    0m0.009s
sys     0m2.984s
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.56343 s, 193 MB/s

real    0m7.866s
user    0m0.004s
sys     0m3.016s
^C

For the VM disk I get this info:

[root@ovmsrv06 vdsm]# qemu-img info /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094
image: /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16
[root@ovmsrv06 vdsm]#

Apparently disk size 0?????
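If I understand qemu-img correctly, the "disk size: 0" is an artifact of querying a block device: on a regular file the disk size is derived from the file's allocated blocks, and a logical volume doesn't report that, so 0 is printed rather than the true allocation. The actual allocation is the LV size itself, which the vdsm log above already shows as "physical: 12884901888":

```shell
# physical size from the vdsm extension log, expressed in GiB
# (the LV backing the thin disk, not the qcow2 payload)
echo $((12884901888 / 1024 / 1024 / 1024))   # 12 (GiB)
```

On a live host the same figure should be visible with lvs against the storage-domain VG, e.g. lvs --units b on the volume's LV (exact invocation depends on the setup).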
In cat /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/dom_md/metadata I get:

EOF
DOMAIN=922b5269-ab56-4c4d-838f-49d33427e2ab
CTIME=1485272737
FORMAT=COW
DISKTYPE=2
LEGALITY=LEGAL
SIZE=188743680
VOLTYPE=LEAF
DESCRIPTION={"DiskAlias":"ol65_Disk1","DiskDescription":""}
IMAGE=32b0e94d-0c8e-4d3d-a4eb-448885297e93
PUUID=00000000-0000-0000-0000-000000000000
MTIME=0
POOL_UUID=
TYPE=SPARSE
EOF
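One note on this metadata: SIZE is counted in 512-byte sectors, not bytes, so it is consistent with the 90G virtual size that qemu-img reports:

```shell
# SIZE=188743680 from the volume metadata is in 512-byte sectors
echo $((188743680 * 512))                        # 96636764160 bytes
echo $((188743680 * 512 / 1024 / 1024 / 1024))   # 90 (GiB), matching qemu-img
```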