<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Feb 1, 2017 at 8:22 PM, Gianluca Cecchi <span dir="ltr">&lt;<a href="mailto:gianluca.cecchi@gmail.com" target="_blank">gianluca.cecchi@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="auto"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-"></span></div><br></div><div class="gmail_extra">OK. In the meantime I have applied your suggested config and restarted the 2 nodes.<br></div><div class="gmail_extra">Let me test and see whether I find any problems when running some I/O tests as well.<br></div><div class="gmail_extra">Thanks in the meantime,<br></div><div class="gmail_extra">Gianluca<br></div></div></div>
</div>
</blockquote></div><br><br><div class="gmail_extra">Quick test, without much success.<br><br></div><div class="gmail_extra">Inside the guest I ran this loop:<br>while true <br>do <br>time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile<br>sleep 5<br>done<br><br></div><div class="gmail_extra">BTW: my home directory is on the guest&#39;s / filesystem, which has enough free space to accommodate the 1 GB written by the dd command:<br>[g.cecchi@ol65 ~]$ df -h /home/g.cecchi/<br>Filesystem            Size  Used Avail Use% Mounted on<br>/dev/mapper/vg_ol65-lv_root<br>                       20G  4.9G   14G  27% /<br>[g.cecchi@ol65 ~]$ <br><br></div><div class="gmail_extra">After about 7 rounds I got this in /var/log/messages on the host where the VM is running:<br><br>Feb  1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:39 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:44 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:45 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:47 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:50 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:56 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br>Feb  1 23:31:57 ovmsrv06 kernel: dd: sending ioctl 80306d02 to a partition!<br><br></div><div class="gmail_extra">Nothing on the other host.<br></div>In the web admin events pane:<br>Feb 1, 2017 11:31:44 PM VM ol65 has been paused due to no Storage space error.<br>Feb 1, 2017 11:31:44 PM VM ol65 has been paused.<br><br><div class="gmail_extra">I stopped the dd loop, and after a few seconds:<br>Feb 1, 2017 11:32:32 PM VM ol65 has recovered from paused back to up<br><br>Multipath status for my device:<br><br>3600a0b8000299aa80000d08b55014<wbr>119 dm-2 IBM     
,1814      FAStT <br>size=4.0T features=&#39;3 queue_if_no_path pg_init_retries 50&#39; hwhandler=&#39;1 rdac&#39; wp=rw<br>|-+- policy=&#39;service-time 0&#39; prio=0 status=active<br>| |- 0:0:1:3 sdj 8:144 active undef running<br>| `- 2:0:1:3 sdp 8:240 active undef running<br>`-+- policy=&#39;service-time 0&#39; prio=0 status=enabled<br>  |- 0:0:0:3 sdd 8:48  active undef running<br>  `- 2:0:0:3 sdk 8:160 active undef running<br><br></div><div class="gmail_extra">In engine.log<br><br>2017-02-01
 23:22:01,449 INFO  
[org.ovirt.engine.core.bll.aaa.CreateUserSessionCommand] (default 
task-15) [530aee87] Running command: CreateUserSessionCommand internal: 
false.<br>2017-02-01 23:22:04,011 INFO  
[org.ovirt.engine.docs.utils.servlet.ContextSensitiveHelpMappingServlet]
 (default task-12) [] Context-sensitive help is not installed. Manual 
directory doesn&#39;t exist: /usr/share/ovirt-engine/manual<br>2017-02-01 
23:31:43,936 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(ForkJoinPool-1-worker-10) [10c15e39] VM 
&#39;932db7c7-4121-4cbe-ad8d-09e4e99b3cdd&#39;(ol65) moved from &#39;Up&#39; --&gt; 
&#39;Paused&#39;<br>2017-02-01 23:31:44,087 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM ol65 has been paused.<br>2017-02-01
 23:31:44,227 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(ForkJoinPool-1-worker-10) [10c15e39] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM ol65 has been paused due to no 
Storage space error.<br>2017-02-01 23:32:32,100 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(DefaultQuartzScheduler5) [1c02a371] VM 
&#39;932db7c7-4121-4cbe-ad8d-09e4e99b3cdd&#39;(ol65) moved from &#39;Paused&#39; --&gt; 
&#39;Up&#39;<br>2017-02-01 23:32:32,259 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler5) [1c02a371] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VM ol65 has recovered from paused 
back to up.<br><br><br></div><div class="gmail_extra">In vdsm.log of the host where the VM is running I see this:<br><br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:37,729::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>chec<br>kForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-<wbr>0031-02f6-035d-00000000013<br>6/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)<br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:37,764::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>chec<br>kForMail) SUCCESS: &lt;err&gt; = &#39;1+0 records in\n1+0 records out\n1024000 bytes (1.0 MB) copied, 0.0061866<br> s, 166 MB/s\n&#39;; &lt;rc&gt; = 0<br>periodic/8::INFO::2017-02-01 23:31:39,699::vm::1028::virt.<wbr>vm::(extendDrivesIfNeeded) vmId=`932db7c7-4<br>121-4cbe-ad8d-09e4e99b3cdd`::<wbr>Requesting extension for volume 66615920-619f-4f52-ad5a-<wbr>3f062c054094 on <br>domain 922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated:<br> 12355698688, physical: 12884901888)<br>periodic/8::DEBUG::2017-02-01 23:31:39,699::vm::1093::virt.<wbr>vm::(__extendDriveVolume) vmId=`932db7c7-4<br>121-4cbe-ad8d-09e4e99b3cdd`::<wbr>Requesting an extension for the volume: {&#39;newSize&#39;: 13958643712, &#39;domain<br>ID&#39;: u&#39;922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab&#39;, &#39;name&#39;: u&#39;vda&#39;, &#39;volumeID&#39;: u&#39;66615920-619f-4f52-ad5a-3<br>f062c054094&#39;, &#39;imageID&#39;: u&#39;32b0e94d-0c8e-4d3d-a4eb-<wbr>448885297e93&#39;, &#39;internal&#39;: False, &#39;poolID&#39;: u&#39;5882<br>37b8-0031-02f6-035d-<wbr>000000000136&#39;}<br>periodic/8::DEBUG::2017-02-01 23:31:39,700::task::597::<wbr>Storage.TaskManager.Task::(_<wbr>updateState) Task=<br>`d5169f55-8fa6-494e-bf3c-<wbr>b077b922e06c`::moving from state init -&gt; state preparing<br>periodic/8::INFO::2017-02-01 23:31:39,700::logUtils::49::<wbr>dispatcher::(wrapper) Run and protect: 
sendExtendMsg(spUUID=u&#39;<wbr>588237b8-0031-02f6-035d-<wbr>000000000136&#39;, volDict={&#39;newSize&#39;: 13958643712, &#39;domainID&#39;: u&#39;922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab&#39;, &#39;name&#39;: u&#39;vda&#39;, &#39;volumeID&#39;: u&#39;66615920-619f-4f52-ad5a-<wbr>3f062c054094&#39;, &#39;imageID&#39;: u&#39;32b0e94d-0c8e-4d3d-a4eb-<wbr>448885297e93&#39;, &#39;internal&#39;: False, &#39;poolID&#39;: u&#39;588237b8-0031-02f6-035d-<wbr>000000000136&#39;},
 newSize=13958643712, callbackFunc=&lt;bound method 
Vm.__afterVolumeExtension of &lt;virt.vm.Vm object at 0x2feb5d0&gt;&gt;)<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::storage_mailbox:<wbr>:125::Storage.SPM.Messages.<wbr>Extend::(__init__) new extend msg created: domain: 922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab, volume: 66615920-619f-4f52-ad5a-<wbr>3f062c054094<br>periodic/8::INFO::2017-02-01 23:31:39,701::logUtils::52::<wbr>dispatcher::(wrapper) Run and protect: sendExtendMsg, Return response: None<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::task::1193::<wbr>Storage.TaskManager.Task::(<wbr>prepare) Task=`d5169f55-8fa6-494e-bf3c-<wbr>b077b922e06c`::finished: None<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::task::597::<wbr>Storage.TaskManager.Task::(_<wbr>updateState) Task=`d5169f55-8fa6-494e-bf3c-<wbr>b077b922e06c`::moving from state preparing -&gt; state finished<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager:<wbr>:952::Storage.ResourceManager.<wbr>Owner::(releaseAll) Owner.releaseAll requests {} resources {}<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::resourceManager:<wbr>:989::Storage.ResourceManager.<wbr>Owner::(cancelAll) Owner.cancelAll requests {}<br>periodic/8::DEBUG::2017-02-01 23:31:39,701::task::995::<wbr>Storage.TaskManager.Task::(_<wbr>decref) Task=`d5169f55-8fa6-494e-bf3c-<wbr>b077b922e06c`::ref 0 aborting False<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,702::storage_mailbox:<wbr>:428::Storage.MailBox.<wbr>HsmMailMonitor::(_<wbr>handleMessage) HSM_MailMonitor - start: 64, end: 128, len: 4096, message(1/63): &quot;1xtnd\xab\xe2&#39;4\xd3I\x8f\<wbr>x83MLV\xabiR+\x92\x94@\x05,\<wbr>x06?Z\xadRO\x9fa Yaf000000000000340000000000000<wbr>&quot;<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,702::storage_mailbox:<wbr>:375::Storage.Misc.excCmd::(_<wbr>checkForMail) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd if=/rhev/data-center/588237b8-<wbr>0031-02f6-035d-00<br></div><div class="gmail_extra">0000000136/mastersd/dom_md/<wbr>outbox iflag=direct,fullblock bs=512 
count=8 skip=16 (cwd None)<br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,790::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>checkForMail) /usr/bin/taskset --cpu-list 0-7 dd if=/rhev/data-center/588237b8-<wbr>0031-02f6-035d-000000000136/<wbr>mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000 (cwd None)<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,815::storage_mailbox:<wbr>:375::Storage.Misc.excCmd::(_<wbr>checkForMail)
 SUCCESS: &lt;err&gt; = &#39;8+0 records in\n8+0 records out\n4096 bytes 
(4.1 kB) copied, 0.0900633 s, 45.5 kB/s\n&#39;; &lt;rc&gt; = 0<br>mailbox.HSMMonitor::INFO::<wbr>2017-02-01 23:31:39,816::storage_mailbox:<wbr>:387::Storage.MailBox.<wbr>HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - [&#39;/usr/bin/dd&#39;, &#39;of=/rhev/data-center/<wbr>588237b8-0031-02f6-035d-<wbr>000000000136/mastersd/dom_md/<wbr>inbox&#39;, &#39;iflag=fullblock&#39;, &#39;oflag=direct&#39;, &#39;conv=notrunc&#39;, &#39;bs=512&#39;, &#39;seek=16&#39;]<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,816::storage_mailbox:<wbr>:86::Storage.Misc.excCmd::(_<wbr>mboxExecCmd) /usr/bin/taskset --cpu-list 0-7 /usr/bin/dd of=/rhev/data-center/588237b8-<wbr>0031-02f6-035d-000000000136/<wbr>mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=16 (cwd None)<br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:39,844::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>checkForMail)
 SUCCESS: &lt;err&gt; = &#39;1+0 records in\n1+0 records out\n1024000 bytes 
(1.0 MB) copied, 0.0290379 s, 35.3 MB/s\n&#39;; &lt;rc&gt; = 0<br>periodic/7::INFO::2017-02-01 23:31:41,700::vm::1028::virt.<wbr>vm::(extendDrivesIfNeeded) vmId=`932db7c7-4121-4cbe-ad8d-<wbr>09e4e99b3cdd`::Requesting extension for volume 66615920-619f-4f52-ad5a-<wbr>3f062c054094 on domain 922b5269-ab56-4c4d-838f-<wbr>49d33427e2ab (apparent: 12884901888, capacity: 96636764160, allocated: 12617187328, physical: 12884901888)<br>...<br>mailbox.SPMMonitor::DEBUG::<wbr>2017-02-01 23:31:42,181::storage_mailbox:<wbr>:733::Storage.Misc.excCmd::(_<wbr>checkForMail)
 SUCCESS: &lt;err&gt; = &#39;1+0 records in\n1+0 records out\n1024000 bytes 
(1.0 MB) copied, 0.30031 s, 3.4 MB/s\n&#39;; &lt;rc&gt; = 0<br>mailbox.SPMMonitor::ERROR::<wbr>2017-02-01 23:31:42,182::storage_mailbox:<wbr>:619::Storage.MailBox.<wbr>SpmMailMonitor::(_<wbr>validateMailbox) SPM_MailMonitor: mailbox 2 checksum failed, not clearing mailbox, clearing newMail.<br>jsonrpc.Executor/6::DEBUG::<wbr>2017-02-01 23:31:42,587::__init__::530::<wbr>jsonrpc.JsonRpcServer::(_<wbr>handle_request) Calling &#39;Host.getAllVmStats&#39; in bridge with {}<br>jsonrpc.Executor/6::DEBUG::<wbr>2017-02-01 23:31:42,589::__init__::555::<wbr>jsonrpc.JsonRpcServer::(_<wbr>handle_request) Return &#39;Host.getAllVmStats&#39; in bridge with (suppressed)<br>jsonrpc.Executor/6::INFO::<wbr>2017-02-01 23:31:42,590::__init__::513::<wbr>jsonrpc.JsonRpcServer::(_<wbr>serveRequest) RPC call Host.getAllVmStats succeeded in 0.00 seconds<br>libvirtEventLoop::INFO::2017-<wbr>02-01 23:31:43,586::vm::4041::virt.<wbr>vm::(onIOError) vmId=`932db7c7-4121-4cbe-ad8d-<wbr>09e4e99b3cdd`::abnormal vm stop device virtio-disk0 error enospc<br>libvirtEventLoop::INFO::2017-<wbr>02-01 23:31:43,586::vm::4877::virt.<wbr>vm::(_logGuestCpuStatus) vmId=`932db7c7-4121-4cbe-ad8d-<wbr>09e4e99b3cdd`::CPU stopped: onIOError<br>periodic/6::WARNING::2017-02-<wbr>01 23:31:43,698::periodic::277::<wbr>virt.periodic.VmDispatcher::(_<wbr>_call__) could not run &lt;class &#39;vdsm.virt.periodic.<wbr>DriveWatermarkMonitor&#39;&gt; on [u&#39;932db7c7-4121-4cbe-ad8d-<wbr>09e4e99b3cdd&#39;]<br>mailbox.HSMMonitor::DEBUG::<wbr>2017-02-01 23:31:43,790::storage_mailbox:<wbr>:86::Storage.Misc.excCmd::(_<wbr>mboxExecCmd)
 SUCCESS: &lt;err&gt; = &#39;8+0 records in\n8+0 records out\n4096 bytes 
 (4.1 kB) copied, 3.95001 s, 1.0 kB/s\n&#39;; &lt;rc&gt; = 0<br><br></div><div class="gmail_extra">The disk of the VM is 90 GB in virtual size, thin provisioned.<br></div><br>Output of the commands inside the guest until Ctrl+C (I interrupted the loop when I received the VM paused error...):<br><br>[g.cecchi@ol65 ~]$ while true; do time dd if=/dev/zero bs=1024k count=1024 of=/home/g.cecchi/testfile; sleep 5; done<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 9.27353 s, 116 MB/s<br><br>real    0m9.353s<br>user    0m0.007s<br>sys    0m3.174s<br><br><br><br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 9.68965 s, 111 MB/s<br><br>real    0m9.929s<br>user    0m0.000s<br>sys    0m2.972s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 9.16642 s, 117 MB/s<br><br>real    0m9.412s<br>user    0m0.008s<br>sys    0m2.766s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 2.68209 s, 400 MB/s<br><br>real    0m2.931s<br>user    0m0.004s<br>sys    0m2.927s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 6.11601 s, 176 MB/s<br><br>real    0m9.158s<br>user    0m0.002s<br>sys    0m2.922s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 2.67673 s, 401 MB/s<br><br>real    0m2.921s<br>user    0m0.004s<br>sys    0m2.916s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 4.10537 s, 262 MB/s<br><br>real    0m6.885s<br>user    0m0.009s<br>sys    0m2.984s<br>1024+0 records in<br>1024+0 records out<br>1073741824 bytes (1.1 GB) copied, 5.56343 s, 193 MB/s<br><br>real    0m7.866s<br>user    0m0.004s<br>sys    0m3.016s<br>^C<br><br>For the VM disk I get this info:<br><br>[root@ovmsrv06
 vdsm]# qemu-img info 
/rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094
 <br>image: 
/rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/images/32b0e94d-0c8e-4d3d-a4eb-448885297e93/66615920-619f-4f52-ad5a-3f062c054094<br>file format: qcow2<br>virtual size: 90G (96636764160 bytes)<br>disk size: 0<br>cluster_size: 65536<br>Format specific information:<br>    compat: 0.10<br>    refcount bits: 16<br>[root@ovmsrv06 vdsm]# <br><br>Apparently a disk size of 0?????<br><br>Running<br>cat /rhev/data-center/mnt/blockSD/922b5269-ab56-4c4d-838f-49d33427e2ab/dom_md/metadata<br><br>I get:<br>EOF<br>DOMAIN=922b5269-ab56-4c4d-838f-49d33427e2ab<br>CTIME=1485272737<br>FORMAT=COW<br>DISKTYPE=2<br>LEGALITY=LEGAL<br>SIZE=188743680<br>VOLTYPE=LEAF<br>DESCRIPTION={&quot;DiskAlias&quot;:&quot;ol65_Disk1&quot;,&quot;DiskDescription&quot;:&quot;&quot;}<br>IMAGE=32b0e94d-0c8e-4d3d-a4eb-448885297e93<br>PUUID=00000000-0000-0000-0000-000000000000<br>MTIME=0<br>POOL_UUID=<br>TYPE=SPARSE<br>EOF<br></div></div>
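For what it's worth, the extension request in the vdsm.log excerpt above matches my understanding of oVirt's thin-provisioning watermark behavior: VDSM asks the SPM (via the storage mailbox) to grow the LV by one chunk once free space inside the volume falls below a threshold derived from the volume_utilization_percent and volume_utilization_chunk_mb settings in vdsm.conf. A minimal sketch with the numbers from the 23:31:39 extendDrivesIfNeeded line (the function names here are illustrative, not VDSM's actual code):

```python
# Illustrative sketch only, not VDSM's implementation. The defaults
# (chunk 1024 MB, utilization 50%) are the documented vdsm.conf values
# volume_utilization_chunk_mb / volume_utilization_percent, as I read them.
MiB = 1024 ** 2

def needs_extension(allocated, physical,
                    chunk_mb=1024, utilization_percent=50):
    # Extend once free space in the LV drops below
    # (100 - utilization_percent)% of one chunk (512 MiB by default).
    watermark = chunk_mb * MiB * (100 - utilization_percent) // 100
    return physical - allocated < watermark

def next_size(physical, capacity, chunk_mb=1024):
    # Grow by one chunk, capped at the virtual disk size.
    return min(physical + chunk_mb * MiB, capacity)

# Values from the extendDrivesIfNeeded log line:
# allocated 12355698688, physical 12884901888, capacity 96636764160
allocated, physical, capacity = 12355698688, 12884901888, 96636764160
print(needs_extension(allocated, physical))  # True: ~505 MiB free < 512 MiB
print(next_size(physical, capacity))         # 13958643712, the newSize requested
```

If that reading is right, a guest writing at ~400 MB/s can consume the remaining ~500 MiB of headroom faster than the SPM mailbox round trip completes the LV extension, which would be consistent with the ENOSPC pause followed by the automatic resume once the extension lands.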