Using Microsoft NFS server as storage domain


Replying inline. On Thu, Jan 21, 2016 at 7:54 AM, Pavel Gashev <Pax@acronis.com> wrote:
Hello,
First of all I would like to ask if anybody has any experience with using Microsoft NFS server as a storage domain.
I have used one as an ISO domain for years. It wasn't great, but it was good enough. Never a data domain, though.
The main issue with MS NFS is NTFS :) NTFS doesn't support sparse files. Technically it's possible to get them by enabling NTFS compression, but that has bad performance on huge files, which is our case. Also, there is no option in the oVirt web interface to use the COW format on NFS storage domains.
Since it looks like oVirt doesn't support MS NFS, I decided to migrate all my VMs out of MS NFS to other storage. And I hit a bug. Live storage migration *silently* *corrupts* *data* if you migrate a disk from an MS NFS storage domain. So if you shut down a just-migrated VM and check its filesystem, you find that it has a lot of unrecoverable errors.
There are the following symptoms:
1. It corrupts data if you migrate a disk from MS NFS to Linux NFS.
2. It corrupts data if you migrate a disk from MS NFS to iSCSI.
3. There is no corruption if you migrate from Linux NFS to iSCSI and vice versa.
4. There is no corruption if you migrate from anywhere to MS NFS.
5. Data corruption happens after the 'Auto-generated for Live Storage Migration' snapshot. So if you roll back the snapshot, you see an absolutely clean filesystem.
6. It doesn't depend on SPM. It corrupts data whether SPM is on the same host or another.
7. There are no error messages in the vdsm/qemu/system logs.
Yes, of course I could migrate from MS NFS with downtime – it's not an issue. The issue is that oVirt silently corrupts data under some circumstances.
Could you please help me to understand the reason for the data corruption?
vdsm-4.17.13-1.el7.noarch
qemu-img-ev-2.3.0-31.el7_2.4.1.x86_64
libvirt-daemon-1.2.17-13.el7_2.2.x86_64
ovirt-engine-backend-3.6.1.3-1.el7.centos.noarch
Thank you

On Thu, Jan 21, 2016 at 2:54 PM, Pavel Gashev <Pax@acronis.com> wrote:
Hello,
First of all I would like to ask if anybody has any experience with using Microsoft NFS server as a storage domain.
The main issue with MS NFS is NTFS :) NTFS doesn't support sparse files. Technically it's possible to get them by enabling NTFS compression, but that has bad performance on huge files, which is our case. Also, there is no option in the oVirt web interface to use the COW format on NFS storage domains.
You can:
1. create a small disk (1G)
2. create a snapshot
3. extend the disk to the final size
And you have NFS with COW format. The performance difference with one snapshot should be small.
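(For illustration, this is roughly the equivalent of that trick done by hand with qemu-img; the file names and sizes below are made up, and in practice oVirt creates and tracks the volumes itself:)

    # small raw base volume
    qemu-img create -f raw base.img 1G
    # a snapshot adds a qcow2 overlay on top of the raw base
    qemu-img create -f qcow2 -o backing_file=base.img,backing_fmt=raw overlay.qcow2
    # grow the active qcow2 layer to the final size
    qemu-img resize overlay.qcow2 100G
    # confirm the format and backing chain of the active layer
    qemu-img info overlay.qcow2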
Since it looks like oVirt doesn't support MS NFS, I decided to migrate all my VMs out of MS NFS to other storage. And I hit a bug. Live storage migration silently corrupts data if you migrate a disk from an MS NFS storage domain. So if you shut down a just-migrated VM and check its filesystem, you find that it has a lot of unrecoverable errors.
There are the following symptoms:
1. It corrupts data if you migrate a disk from MS NFS to Linux NFS.
2. It corrupts data if you migrate a disk from MS NFS to iSCSI.
3. There is no corruption if you migrate from Linux NFS to iSCSI and vice versa.
4. There is no corruption if you migrate from anywhere to MS NFS.
5. Data corruption happens after the 'Auto-generated for Live Storage Migration' snapshot. So if you roll back the snapshot, you see an absolutely clean filesystem.
Can you try to create a live snapshot on MS NFS? It seems that this is the issue, not live storage migration. Do you have qemu-guest-agent on the VM? Without qemu-guest-agent, filesystems on the guest will not be frozen during the snapshot, which may cause an inconsistent snapshot. Can you reproduce this with virt-manager, or by creating a VM and taking a snapshot using virsh?
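(If it helps, a disk-only live snapshot can be taken outside oVirt roughly like this; the VM and snapshot names are placeholders:)

    # external disk-only snapshot of a running guest
    virsh snapshot-create-as VM1 testsnap --disk-only --atomic
    # with qemu-guest-agent running in the guest, add --quiesce to freeze guest filesystems first
    virsh snapshot-create-as VM1 testsnap2 --disk-only --atomic --quiesce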
6. It doesn't depend on SPM. It corrupts data whether SPM is on the same host or another.
7. There are no error messages in the vdsm/qemu/system logs.
Yes, of course I could migrate from MS NFS with downtime – it's not an issue. The issue is that oVirt silently corrupts data under some circumstances.
Could you please help me to understand the reason for the data corruption?
Please file a bug and attach:
- /var/log/vdsm/vdsm.log
- /var/log/messages
- /var/log/sanlock.log
- output of nfsstat during the test, maybe run it every minute?
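(Something along these lines on the host should capture the nfsstat samples during the test; the output path is arbitrary:)

    # record NFS client statistics once a minute while reproducing the problem
    while true; do date; nfsstat -c; sleep 60; done >> /tmp/nfsstat-during-test.log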
vdsm-4.17.13-1.el7.noarch
qemu-img-ev-2.3.0-31.el7_2.4.1.x86_64
libvirt-daemon-1.2.17-13.el7_2.2.x86_64
ovirt-engine-backend-3.6.1.3-1.el7.centos.noarch
Thank you


On Thu, Jan 21, 2016 at 10:13 PM, Pavel Gashev <Pax@acronis.com> wrote:
On Thu, 2016-01-21 at 18:42 +0000, Nir Soffer wrote:
On Thu, Jan 21, 2016 at 2:54 PM, Pavel Gashev <Pax@acronis.com> wrote:
Also, there is no option in the oVirt web interface to use the COW format on NFS storage domains.
You can:
1. create a small disk (1G)
2. create a snapshot
3. extend the disk to the final size
And you have NFS with COW format. The performance difference with one snapshot should be small.
Yes. And there are other workarounds:
1. Use some block (e.g. iSCSI) storage for creating a thin-provisioned disk (which is COW) and then move it to the required storage.
2. Keep an empty 1G COW disk and copy+resize it when required.
3. Use ovirt-shell for creating disks.
Unfortunately, these are not native ways. These are ways for a hacker. A plain user clicks "New" in the "Disks" tab and selects the "Thin Provision" allocation policy. It's hard to explain to users that the simplest and most obvious way is wrong. I hope it's wrong only for MS NFS.
Sure, I agree. I think we do not use the qcow format on file storage since there is no need for it: the file system is always sparse. I guess we did not plan for MS NFS. I would open a bug for supporting the qcow format on file storage. If this works for some users, I think this is an option that should be possible in the UI. Hopefully there are not too many assumptions in the code about this. Allon, do you see any reason not to support this for users that need this option?
5. Data corruption happens after the 'Auto-generated for Live Storage Migration' snapshot. So if you roll back the snapshot, you see an absolutely clean filesystem.
Can you try to create a live snapshot on MS NFS? It seems that this is the issue, not live storage migration.
Live snapshots work very well on MS NFS. Creating and deleting works live without any issues. I did it many times. Please note that everything before the snapshot remains consistent. Data corruption occurs after the snapshot. So only non-snapshotted data is corrupted.
Live storage migration starts by creating a snapshot, then copying the disks to the new storage, and then mirroring the active layer so that the old and the new disks are the same. Finally we switch to the new disk and delete the old disk. So probably the issue is in the mirroring step. This is most likely a qemu issue.
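(Roughly, the mirroring and switch steps correspond to something like the following at the libvirt level; the domain name, disk name and destination path are placeholders, and oVirt drives this through the libvirt API rather than virsh:)

    # mirror the active layer onto the already-copied destination volume
    virsh blockcopy VM1 vda /path/to/destination/volume --shallow --reuse-external --wait --verbose
    # switch the VM over to the destination volume
    virsh blockjob VM1 vda --abort --pivot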
Do you have qemu-guest-agent on the VM? Without qemu-guest-agent, filesystems on the guest will not be frozen during the snapshot, which may cause an inconsistent snapshot.
I tried it with and without qemu-guest-agent. It makes no difference.
Can you reproduce this with virt-manager, or by creating a VM and taking a snapshot using virsh?
Sorry, I'm not sure how I can reproduce the issue using virsh.
I'll try to get instructions for this from libvirt developers. If this happens with libvirt alone, this is a libvirt or qemu bug, and there is little we (oVirt) can do about it.
Please file a bug and attach:
- /var/log/vdsm/vdsm.log
- /var/log/messages
- /var/log/sanlock.log
- output of nfsstat during the test, maybe run it every minute?
OK, I will collect the logs and file a bug.
Thanks


Nir,
On 21/01/16 23:55, "Nir Soffer" <nsoffer@redhat.com> wrote:
Live storage migration starts by creating a snapshot, then copying the disks to the new storage, and then mirroring the active layer so that the old and the new disks are the same. Finally we switch to the new disk and delete the old disk.
So probably the issue is in the mirroring step. This is most likely a qemu issue.
Thank you for the clarification. This gave me the idea to check the consistency of the old disk. I performed the following test:
1. Create a VM on MS NFS
2. Initiate live disk migration to another storage
3. Catch the source files before oVirt removes them, by creating hard links to another directory (see the sketch below)
4. Shut down the VM
5. Create another VM and move the caught files to the place where the new disk files are located
6. Check the consistency of the filesystem in both VMs
The source disk is consistent. The destination disk is corrupted.
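(A minimal sketch of the hard-link trick in step 3, assuming the MS NFS domain is mounted under /rhev/data-center/mnt/; the placeholder path components are hypothetical:)

    # keep extra directory entries for the source volumes so the data
    # survives oVirt's cleanup when the migration finishes
    mkdir /rhev/data-center/mnt/<ms-nfs-export>/saved
    ln /rhev/data-center/mnt/<ms-nfs-export>/<sd-uuid>/images/<disk-uuid>/* \
       /rhev/data-center/mnt/<ms-nfs-export>/saved/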
I'll try to get instructions for this from libvirt developers. If this happens with libvirt alone, this is a libvirt or qemu bug, and there is little we (oVirt) can do about it.
I've tried to reproduce the mirroring of the active layer:
1. Create two thin-provisioned VMs from the same template on different storages.
2. Start VM1
3. virsh blockcopy VM1 vda /rhev/data-center/...path.to.disk.of.VM2.. --wait --verbose --reuse-external --shallow
4. virsh blockjob VM1 vda --abort --pivot
5. Shut down VM1
6. Start VM2. Boot in recovery mode and check the filesystem.
I did try this a dozen times. Everything works fine. No data corruption. Ideas?

On Fri, Jan 22, 2016 at 5:15 PM, Pavel Gashev <Pax@acronis.com> wrote:
Nir,
On 21/01/16 23:55, "Nir Soffer" <nsoffer@redhat.com> wrote:
Live storage migration starts by creating a snapshot, then copying the disks to the new storage, and then mirroring the active layer so that the old and the new disks are the same. Finally we switch to the new disk and delete the old disk.
So probably the issue is in the mirroring step. This is most likely a qemu issue.
Thank you for the clarification. This gave me the idea to check the consistency of the old disk.
I performed the following test:
1. Create a VM on MS NFS
2. Initiate live disk migration to another storage
3. Catch the source files before oVirt removes them, by creating hard links to another directory
4. Shut down the VM
5. Create another VM and move the caught files to the place where the new disk files are located
6. Check the consistency of the filesystem in both VMs
The source disk is consistent. The destination disk is corrupted.
I'll try to get instructions for this from libvirt developers. If this happens with libvirt alone, this is a libvirt or qemu bug, and there is little we (oVirt) can do about it.
I've tried to reproduce the mirroring of the active layer:
1. Create two thin-provisioned VMs from the same template on different storages.
2. Start VM1
3. virsh blockcopy VM1 vda /rhev/data-center/...path.to.disk.of.VM2.. --wait --verbose --reuse-external --shallow
4. virsh blockjob VM1 vda --abort --pivot
5. Shut down VM1
6. Start VM2. Boot in recovery mode and check the filesystem.
I did try this a dozen times. Everything works fine. No data corruption.
If you take the same VM and do a live storage migration in oVirt, is the file system corrupted after the migration? What is the guest OS? Did you try with more than one?
Ideas?
Thanks for this research! The next step is to open a bug with the logs I requested in my last message. Please mark the bug as urgent.
I'm adding Kevin (from qemu) and Eric (from libvirt); hopefully they can tell whether the virsh flow is indeed identical to what oVirt does, and what the next step for debugging this should be.
oVirt uses blockCopy if available (it should have been available everywhere for some time), or falls back to blockRebase. Do you see this warning?

    blockCopy not supported, using blockRebase

For reference, this is the relevant code in oVirt for the mirroring part. The mirroring starts with diskReplicateStart() and ends with diskReplicateFinish(). I removed the parts about managing vdsm state and left the calls to libvirt.

    def diskReplicateFinish(self, srcDisk, dstDisk):
        ...
        blkJobInfo = self._dom.blockJobInfo(drive.name, 0)
        ...
        if srcDisk != dstDisk:
            self.log.debug("Stopping the disk replication switching to the "
                           "destination drive: %s", dstDisk)
            blockJobFlags = libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
            ...
        else:
            self.log.debug("Stopping the disk replication remaining on the "
                           "source drive: %s", dstDisk)
            blockJobFlags = 0
            ...
        try:
            # Stopping the replication
            self._dom.blockJobAbort(drive.name, blockJobFlags)
        except Exception:
            self.log.exception("Unable to stop the replication for"
                               " the drive: %s", drive.name)
        ...

    def _startDriveReplication(self, drive):
        destxml = drive.getReplicaXML().toprettyxml()
        self.log.debug("Replicating drive %s to %s", drive.name, destxml)

        flags = (libvirt.VIR_DOMAIN_BLOCK_COPY_SHALLOW |
                 libvirt.VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)

        # TODO: Remove fallback when using libvirt >= 1.2.9.
        try:
            self._dom.blockCopy(drive.name, destxml, flags=flags)
        except libvirt.libvirtError as e:
            if e.get_error_code() != libvirt.VIR_ERR_NO_SUPPORT:
                raise

            self.log.warning("blockCopy not supported, using blockRebase")

            base = drive.diskReplicate["path"]
            self.log.debug("Replicating drive %s to %s", drive.name, base)

            flags = (libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY |
                     libvirt.VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT |
                     libvirt.VIR_DOMAIN_BLOCK_REBASE_SHALLOW)

            if drive.diskReplicate["diskType"] == DISK_TYPE.BLOCK:
                flags |= libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY_DEV

            self._dom.blockRebase(drive.name, base, flags=flags)
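(A quick way to check for that fallback on the hosts, assuming the default vdsm log location:)

    grep -i "blockCopy not supported" /var/log/vdsm/vdsm.log*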

Nir,
On Fri, 2016-01-22 at 20:47 +0000, Nir Soffer wrote:
On Fri, Jan 22, 2016 at 5:15 PM, Pavel Gashev <Pax@acronis.com> wrote:
I've tried to reproduce the mirroring of the active layer:
1. Create two thin-provisioned VMs from the same template on different storages.
2. Start VM1
3. virsh blockcopy VM1 vda /rhev/data-center/...path.to.disk.of.VM2.. --wait --verbose --reuse-external --shallow
4. virsh blockjob VM1 vda --abort --pivot
5. Shut down VM1
6. Start VM2. Boot in recovery mode and check the filesystem.
I did try this a dozen times. Everything works fine. No data corruption.
If you take the same VM and do a live storage migration in oVirt, is the file system corrupted after the migration?
Yes. And I've reproduced the issue:
1. Create a VM on MS NFS
2. Start the VM
3. Create a disk-only snapshot
4. virsh blockcopy VM1 /some/file --wait --verbose --reuse-external --shallow
5. virsh blockjob VM1 vda --abort --pivot
6. Shut down the VM
7. Copy /some/file back to /rhev/data-center/..the.latest.snapshot.of.VM..
8. Start the VM and check the filesystem
In other words, creating a snapshot is an important step for reproducing the issue.
What is the guest OS? Did you try with more than one?
Guest OS is W2K12. I was unable to reproduce the issue with Linux.
The next step is to open a bug with the logs I requested in my last message. Please mark the bug as urgent.
https://bugzilla.redhat.com/show_bug.cgi?id=1301713
I'm adding Kevin (from qemu) and Eric (from libvirt); hopefully they can tell whether the virsh flow is indeed identical to what oVirt does, and what the next step for debugging this should be.
oVirt uses blockCopy if available (it should have been available everywhere for some time), or falls back to blockRebase. Do you see this warning?
blockCopy not supported, using blockRebase
No such warning. There is 'Replicating drive vda to <disk...' Please find vdsm.log attached to the bug report. Thanks

On Mon, Jan 25, 2016 at 9:20 PM, Pavel Gashev <Pax@acronis.com> wrote:
Nir,
On Fri, 2016-01-22 at 20:47 +0000, Nir Soffer wrote:
On Fri, Jan 22, 2016 at 5:15 PM, Pavel Gashev <Pax@acronis.com> wrote:
I've tried to reproduce the mirroring of the active layer:
1. Create two thin-provisioned VMs from the same template on different storages.
2. Start VM1
3. virsh blockcopy VM1 vda /rhev/data-center/...path.to.disk.of.VM2.. --wait --verbose --reuse-external --shallow
4. virsh blockjob VM1 vda --abort --pivot
5. Shut down VM1
6. Start VM2. Boot in recovery mode and check the filesystem.
I did try this a dozen times. Everything works fine. No data corruption.
If you take the same VM and do a live storage migration in oVirt, is the file system corrupted after the migration?
Yes. And I've reproduced the issue:
1. Create a VM on MS NFS
2. Start the VM
3. Create a disk-only snapshot
4. virsh blockcopy VM1 /some/file --wait --verbose --reuse-external --shallow
5. virsh blockjob VM1 vda --abort --pivot
At this point, /some/file should be the top layer of the VM, instead of the latest snapshot of the VM. Can you add the output of qemu-img info /some/file?
6. Shut down the VM
7. Copy /some/file back to /rhev/data-center/..the.latest.snapshot.of.VM..
Can you add the output of qemu-img info on the latest snapshot of the VM?
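(For example, something like the following, using the paths from the reproduction steps, with the snapshot volume path elided as in the original:)

    # inspect the copied top layer and the snapshot volume it was copied from
    qemu-img info --backing-chain /some/file
    qemu-img info --backing-chain /rhev/data-center/..the.latest.snapshot.of.VM..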
8. Start the VM and check the filesystem
What are the results of this check? Can you answer this on the bug, so we have more information for libvirt and qemu developers?
In other words, creating a snapshot is an important step for reproducing the issue.
So this happens only when mirroring a volume that is a result of a live snapshot, right?
What is the guest OS? Did you try with more than one?
Guest OS is W2K12. I was unable to reproduce the issue with Linux.
W2K - Windows 2000?
The next step is to open a bug with the logs I requested in my last message. Please mark the bug as urgent.
https://bugzilla.redhat.com/show_bug.cgi?id=1301713
I'm adding Kevin (from qemu) and Eric (from libvirt); hopefully they can tell whether the virsh flow is indeed identical to what oVirt does, and what the next step for debugging this should be.
oVirt uses blockCopy if available (it should have been available everywhere for some time), or falls back to blockRebase. Do you see this warning?
blockCopy not supported, using blockRebase
No such warning. There is 'Replicating drive vda to <disk...' Please find vdsm.log attached to the bug report.
Thanks
Participants: Dan Yasny, Nir Soffer, Pavel Gashev