<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Feb 1, 2017 at 8:22 PM, Gianluca Cecchi <span dir="ltr"><<a href="mailto:gianluca.cecchi@gmail.com" target="_blank">gianluca.cecchi@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="auto"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"></span></div><br></div><div class="gmail_extra">OK. In the meantime I have applied your suggested config and restarted the 2 nodes.<br></div><div class="gmail_extra">Let me test and see if I find any problems, also running some I/O tests.<br></div><div class="gmail_extra">Thanks in the meantime,<br></div><div class="gmail_extra">Gianluca<br></div></div></div>
</div>
</blockquote></div><br><br><div class="gmail_extra">Quick test, without much success.<br><br></div><div class="gmail_extra">Inside the guest I ran this loop:<br>while true <br>do <br>time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile<br>sleep 5<br>done<br><br></div><div class="gmail_extra">BTW: my home directory is inside the guest's / filesystem, which has enough space to accommodate the 1 GB written by dd:<br>[g.cecchi@ol65 ~]$ df -h /home/g.cecchi/<br>Filesystem Size Used Avail Use% Mounted on<br>/dev/mapper/vg_ol65-lv_root<br> 20G 4.9G 14G 27% /<br>[g.cecchi@ol65 ~]$ <br><br></div><div class="gmail_extra">After about 7 rounds I got this in /var/log/messages on the host where the VM is running:<br><br>Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb 1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br><br></div><div class="gmail_extra">Nothing on the other host.<br></div>In the web admin events pane:<br>Feb 1, 2017 11:31:44 PM VM ol65 has been paused due to no Storage space error.<br>Feb 1, 2017 11:31:44 PM VM ol65 has been paused.<br><br><div class="gmail_extra">I stopped the dd loop, and after a few seconds:<br>Feb 1, 2017 11:32:32 PM VM ol65 has recovered from paused back to up<br><br>Multipath status for my device:<br><br>3600a0b8000299aa80000d08b55014119 dm-2 IBM ,1814 FAStT <br>size=4.0T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw<br>|-+- policy='service-time 0' prio=0 status=active<br>| |- 0:0:1:3 sdj 8:144 active undef running<br>| `- 2:0:1:3 sdp 8:240 active undef running<br>`-+- policy='service-time 0' prio=0 status=enabled<br> |- 0:0:0:3 sdd 8:48 active undef running<br> `- 2:0:0:3 sdk 8:160 active undef running<br><br></div><div class="gmail_extra">In engine.log:<br><br>2017-02-01
23:22:01,449 INFO
[org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default
task-15) [530aee87] Running command: CreateUserSessionCommand internal:
false.<br>2017-02-01 23:22:04,011 INFO
[org.ovirt.engine.docs.utils.servlet.ContextSensitiveHelpMappingServlet]
(default task-12) [] Context-sensitive help is not installed. Manual
directory doesn't exist: /usr/share/ovirt-engine/manual<br>2017-02-01
23:31:43,936 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
(ForkJoinPool-1-worker-10) [10c15e39] VM
'932db7c7-4121-4cbe-ad8d-09e4e99b3cdd'(ol65) moved from 'Up' -->
'Paused'<br>2017-02-01 23:31:44,087 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM ol65 has been paused.<br>2017-02-01
23:31:44,227 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM ol65 has been paused due to no
Storage space error.<br>2017-02-01 23:32:32,100 INFO
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
(DefaultQuartzScheduler5) [1c02a371] VM
'932db7c7-4121-4cbe-ad8d-09e4e99b3cdd'(ol65) moved from 'Paused' -->
'Up'<br>2017-02-01 23:32:32,259 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler5) [1c02a371] Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM ol65 has recovered from paused
back to up.<br><br><br></div><div class="gmail_extra">In vdsm.log of the host where the VM is running I see this:<br><br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:37,729::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>chec<br>kForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-<wbr>0031-02f6-035d-00000000013<br>6/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)<br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:37,764::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>chec<br>kForMail) SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0061866<br> s, 166 MB/s\n'; <rc> = 0<br>periodic/8::INFO::2017-02-01 23:31:39,699::vm::1028::virt.<wbr>vm::(extendDrivesIfNeeded) vmId=`932db7c7-4<br>121-4cbe-ad8d-09e4e99b3cdd`::<wbr>Requesting extension for volume 66615920-619f-4f52-ad5a-<wbr>3f062c054094 on <br>domain 922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated:<br> 12355698688, physical: 12884901888)<br>periodic/8::DEBUG::2017-02-01 23:31:39,699::vm::1093::virt.<wbr>vm::(__extendDriveVolume) vmId=`932db7c7-4<br>121-4cbe-ad8d-09e4e99b3cdd`::<wbr>Requesting an extension for the volume: {'newSize': 13958643712, 'domain<br>ID': u'922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab', 'name': u'vda', 'volumeID': u'66615920-619f-4f52-ad5a-3<br>f062c054094', 'imageID': u'32b0e94d-0c8e-4d3d-a4eb-<wbr>448885297e93', 'internal': False, 'poolID': u'5882<br>37b8-0031-02f6-035d-<wbr>000000000136'}<br>periodic/8::DEBUG::2017-02-01 23:31:39,700::task::597::<wbr>Storage.TaskManager.Task::(_<wbr>updateState) Task=<br>`d5169f55-8fa6-494e-bf3c-<wbr>b077b922e06c`::moving from state init -> state preparing<br>periodic/8::INFO::2017-02-01 23:31:39,700::logUtils::49::<wbr>dispatcher::(wrapper) Run and protect: sendExtendMsg(spUUID=u'<wbr>588237b8-0031-02f6-035d-<wbr>000000000136', volDict={'newSize': 13958643712, 'domainID': 
u'922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab', 'name': u'vda', 'volumeID': u'66615920-619f-4f52-ad5a-<wbr>3f062c054094', 'imageID': u'32b0e94d-0c8e-4d3d-a4eb-<wbr>448885297e93', 'internal': False, 'poolID': u'588237b8-0031-02f6-035d-<wbr>000000000136'},
newSize=13958643712, callbackFunc=<bound method
Vm.__afterVolumeExtension of <virt.vm.Vm object at 0x2feb5d0>>)<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::storage_mailbox:<wbr>:125::Storage.SPM.Messages.<wbr>Extend::(__init__) new extend msg created: domain: 922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab, volume: 66615920-619f-4f52-ad5a-<wbr>3f062c054094<br>periodic/8::INFO::2017-02-01 23:31:39,701::logUtils::52::<wbr>dispatcher::(wrapper) Run and protect: sendExtendMsg, Return response: None<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::task::1193::<wbr>Storage.TaskManager.Task::(<wbr>prepare) Task=`d5169f55-8fa6-494e-bf3c-<wbr>b077b922e06c`::finished: None<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::task::597::<wbr>Storage.TaskManager.Task::(_<wbr>updateState) Task=`d5169f55-8fa6-494e-bf3c-<wbr>b077b922e06c`::moving from state preparing -> state finished<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager:<wbr>:952::Storage.ResourceManager.<wbr>Owner::(releaseAll) Owner.releaseAll requests {} resources {}<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager:<wbr>:989::Storage.ResourceManager.<wbr>Owner::(cancelAll) Owner.cancelAll requests {}<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::task::995::<wbr>Storage.TaskManager.Task::(_<wbr>decref) Task=`d5169f55-8fa6-494e-bf3c-<wbr>b077b922e06c`::ref 0 aborting False<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,702::storage_mailbox:<wbr>:428::Storage.MailBox.<wbr>HsmMailMonitor::(_<wbr>handleMessage) HSM_MailMonitor - start: 64, end: 128, len: 4096, message(1/63): "1xtnd\xab\xe2'4\xd3I\x8f\<wbr>x83MLV\xabiR+\x92\x94@\x05,\<wbr>x06?Z\xadRO\x9fa Yaf000000000000340000000000000<wbr>"<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,702::storage_mailbox:<wbr>:375::Storage.Misc.excCmd::(_<wbr>checkForMail) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd if=/rhev/data-center/588237b8-<wbr>0031-02f6-035d-00<br></div><div class="gmail_extra">0000000136/mastersd/dom_md/<wbr>outbox iflag=direct,fullblock bs=512 count=8 skip=16 (cwd 
None)<br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,790::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>checkForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-<wbr>0031-02f6-035d-000000000136/<wbr>mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,815::storage_mailbox:<wbr>:375::Storage.Misc.excCmd::(_<wbr>checkForMail)
SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes
(4.1 kB) copied, 0.0900633 s, 45.5 kB/s\n'; <rc> = 0<br>mailbox.HSMMonitor::INFO::<wbr>2017-02-01 23:31:39,816::storage_mailbox:<wbr>:387::Storage.MailBox.<wbr>HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['/usr/bin/dd', 'of=/rhev/data-center/<wbr>588237b8-0031-02f6-035d-<wbr>000000000136/mastersd/dom_md/<wbr>inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=16']<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,816::storage_mailbox:<wbr>:86::Storage.Misc.excCmd::(_<wbr>mboxExecCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd of=/rhev/data-center/588237b8-<wbr>0031-02f6-035d-000000000136/<wbr>mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=16 (cwd None)<br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,844::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>checkForMail)
SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes
(1.0 MB) copied, 0.0290379 s, 35.3 MB/s\n'; <rc> = 0<br>periodic/7::INFO::2017-02-01 23:31:41,700::vm::1028::virt.<wbr>vm::(extendDrivesIfNeeded) vmId=`932db7c7-4121-4cbe-ad8d-<wbr>09e4e99b3cdd`::Requesting extension for volume 66615920-619f-4f52-ad5a-<wbr>3f062c054094 on domain 922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated: 12617187328, physical: 12884901888)<br>...<br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:42,181::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>checkForMail)
SUCCESS: <err> = '1+0 records in\n1+0 records out\n1024000 bytes
(1.0 MB) copied, 0.30031 s, 3.4 MB/s\n'; <rc> = 0<br>mailbox.SPMMonitor::ERROR::<wbr>2017-02-01 23:31:42,182::storage_mailbox:<wbr>:619::Storage.MailBox.<wbr>SpmMailMonitor::(_<wbr>validateMailbox) SPM_MailMonitor: mailbox 2 checksum failed, not clearing mailbox, clearing newMail.<br>jsonrpc.Executor/6::DEBUG::<wbr>2017-02-01 23:31:42,587::__init__::530::<wbr>jsonrpc.JsonRpcServer::(_<wbr>handle_request) Calling 'Host.getAllVmStats' in bridge with {}<br>jsonrpc.Executor/6::DEBUG::<wbr>2017-02-01 23:31:42,589::__init__::555::<wbr>jsonrpc.JsonRpcServer::(_<wbr>handle_request) Return 'Host.getAllVmStats' in bridge with (suppressed)<br>jsonrpc.Executor/6::INFO::<wbr>2017-02-01 23:31:42,590::__init__::513::<wbr>jsonrpc.JsonRpcServer::(_<wbr>serveRequest) RPC call Host.getAllVmStats succeeded in 0.00 seconds<br>libvirtEventLoop::INFO::2017-<wbr>02-01 23:31:43,586::vm::4041::virt.<wbr>vm::(onIOError) vmId=`932db7c7-4121-4cbe-ad8d-<wbr>09e4e99b3cdd`::abnormal vm stop device virtio-disk0 error enospc<br>libvirtEventLoop::INFO::2017-<wbr>02-01 23:31:43,586::vm::4877::virt.<wbr>vm::(_logGuestCpuStatus) vmId=`932db7c7-4121-4cbe-ad8d-<wbr>09e4e99b3cdd`::CPU stopped: onIOError<br>periodic/6::WARNING::2017-02-<wbr>01 23:31:43,698::periodic::277::<wbr>virt.periodic.VmDispatcher::(_<wbr>_call__) could not run <class 'vdsm.virt.periodic.<wbr>DriveWatermarkMonitor'> on [u'932db7c7-4121-4cbe-ad8d-<wbr>09e4e99b3cdd']<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:43,790::storage_mailbox:<wbr>:86::Storage.Misc.excCmd::(_<wbr>mboxExecCmd)
SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes
(4.1 kB) copied, 3.95001 s, 1.0 kB/s\n'; <rc> = 0<br><br></div><div class="gmail_extra">The VM's disk is 90 GB in virtual size, thin provisioned.<br></div><br>Output of the commands inside the guest until Ctrl+C (I interrupted the loop when I received the VM paused error...):<br><br>[g.cecchi@ol65 ~]$ while true; do time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile; sleep 5; done<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 9.27353 s, 116 MB/s<br><br>real 0m9.353s<br>user 0m0.007s<br>sys 0m3.174s<br><br><br><br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 9.68965 s, 111 MB/s<br><br>real 0m9.929s<br>user 0m0.000s<br>sys 0m2.972s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 9.16642 s, 117 MB/s<br><br>real 0m9.412s<br>user 0m0.008s<br>sys 0m2.766s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 2.68209 s, 400 MB/s<br><br>real 0m2.931s<br>user 0m0.004s<br>sys 0m2.927s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 6.11601 s, 176 MB/s<br><br>real 0m9.158s<br>user 0m0.002s<br>sys 0m2.922s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 2.67673 s, 401 MB/s<br><br>real 0m2.921s<br>user 0m0.004s<br>sys 0m2.916s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 4.10537 s, 262 MB/s<br><br>real 0m6.885s<br>user 0m0.009s<br>sys 0m2.984s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 5.56343 s, 193 MB/s<br><br>real 0m7.866s<br>user 0m0.004s<br>sys 0m3.016s<br>^C<br><br>For the VM disk I get this info:<br><br>[root@ovmsrv06
vdsm]# qemu-img info
/rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094
<br>image:
/rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094<br>file format: qcow2<br>virtual size: 90G (96636764160 bytes)<br>disk size: 0<br>cluster_size: 65536<br>Format specific information:<br> compat: 0.10<br> refcount bits: 16<br>[root@ovmsrv06 vdsm]# <br><br>Apparently disk size 0?????<br><br>In <br>cat /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/dom_md/metadata<br><br>I get<br>EOF<br>DOMAIN=922b5269-ab56-4c4d-838f-49d33427e2ab<br>CTIME=1485272737<br>FORMAT=COW<br>DISKTYPE=2<br>LEGALITY=LEGAL<br>SIZE=188743680<br>VOLTYPE=LEAF<br>DESCRIPTION={"DiskAlias":"ol65_Disk1","DiskDescription":""}<br>IMAGE=32b0e94d-0c8e-4d3d-a4eb-448885297e93<br>PUUID=00000000-0000-0000-0000-000000000000<br>MTIME=0<br>POOL_UUID=<br>TYPE=SPARSE<br>EOF<br></div></div>
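A side note on the numbers in the vdsm log above: the extension request grows the thin LV by exactly one 1 GiB chunk (apparent size 12884901888 → newSize 13958643712). That step size matches vdsm's default extension chunk of 1024 MiB; the constant below is an assumption based on that default (volume_utilization_chunk_mb), not taken from this setup's config. A minimal sketch of the arithmetic:

```python
# Reproduce the extension request seen in the vdsm log above.
# Assumption: default extension chunk of 1024 MiB
# (vdsm's volume_utilization_chunk_mb = 1024).
MiB = 1024 ** 2
chunk_mb = 1024  # assumed default

apparent = 12884901888   # current physical size from the log (12 GiB)
capacity = 96636764160   # virtual size from the log (90 GiB)

# vdsm asks the SPM, via the storage mailbox, to grow the thin LV
# by one chunk beyond its current physical size:
new_size = apparent + chunk_mb * MiB
print(new_size)  # matches 'newSize': 13958643712 in the log
```

If the guest writes faster than these 1 GiB extensions complete (each round-trips through the SPM mailbox dd commands quoted above, which slow to seconds under the I/O load), qemu hits ENOSPC on the thin LV and libvirt pauses the VM, which is exactly the "abnormal vm stop device virtio-disk0 error enospc" event in the log.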