On Fri, Sep 18, 2020 at 3:50 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>
> On 2020/9/17 17:42, Yedidyah Bar David wrote:
>> On Thu, Sep 17, 2020 at 11:57 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>> On 2020/9/17 16:38, Yedidyah Bar David wrote:
>>>> On Thu, Sep 17, 2020 at 11:29 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>>>> On 2020/9/17 15:07, Yedidyah Bar David wrote:
>>>>>> On Thu, Sep 17, 2020 at 8:16 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>>>>>> On 2020/9/16 15:53, Yedidyah Bar David wrote:
>>>>>>>> On Wed, Sep 16, 2020 at 10:46 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>>>>>>>> On 2020/9/16 15:12, Yedidyah Bar David wrote:
>>>>>>>>>> On Wed, Sep 16, 2020 at 6:10 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>>>>>>>>>> Hi oVirt,
>>>>>>>>>>>
>>>>>>>>>>> I just tried to upgrade a self-hosted engine from 4.3.10 to 4.4.1.4. I followed the steps in the document:
>>>>>>>>>>>
>>>>>>>>>>> https://www.ovirt.org/documentation/upgrade_guide/#SHE_Upgrading_from_4-3
>>>>>>>>>>>
>>>>>>>>>>> The old 4.3 environment has an FC storage domain as the engine storage domain, and I have created a new FC storage vv for the new storage domain to be used in the next steps.
>>>>>>>>>>>
>>>>>>>>>>> I backed up the old 4.3 environment and prepared a completely new host to restore the environment on.
>>>>>>>>>>>
>>>>>>>>>>> In chapter 4.4, step 8, it says:
>>>>>>>>>>>
>>>>>>>>>>> "During the deployment you need to provide a new storage domain. The deployment script renames the 4.3 storage domain and retains its data."
>>>>>>>>>>>
>>>>>>>>>>> It does rename the old storage domain, but it didn't let me choose a new storage domain during the deployment. So the new engine was just deployed on the new host's local storage and cannot be moved to the FC storage domain.
>>>>>>>>>>>
>>>>>>>>>>> Can anyone tell me what the problem is?
>>>>>>>>>> What do you mean by "deployed in the new host's local storage"?
>>>>>>>>>>
>>>>>>>>>> Did deploy finish successfully?
>>>>>>>>> I don't think it has finished yet.
>>>>>>>> You did 'hosted-engine --deploy --restore-from-file=something', right?
>>>>>>>>
>>>>>>>> Did this finish?
>>>>>>> Not finished yet.
>>>>>>>> What are the last few lines of the output?
>>>>>>> [ INFO ] You can now connect to https://ovirt6.ntbaobei.com:6900/ovirt-engine/ and check the status of this host and eventually remediate it, please continue only when the host is listed as 'up'
>>>>>>>
>>>>>>> [ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
>>>>>>>
>>>>>>> [ INFO ] ok: [localhost]
>>>>>>> [ INFO ] TASK [ovirt.hosted_engine_setup : Create temporary lock file]
>>>>>>> [ INFO ] changed: [localhost]
>>>>>>>
>>>>>>> [ INFO ] TASK [ovirt.hosted_engine_setup : Pause execution until /tmp/ansible.g2opa_y6_he_setup_lock is removed, delete it once ready to proceed]
>>>>>> Great. This means that you replied 'Yes' to 'Pause the execution after adding this host to the engine?', and it's now waiting.
>>>>>>
>>>>>>> But the status of the new host that runs the self-hosted engine is "NonOperational", and it will never become "up".
>>>>>> You seem to imply that you expected it to become "up" by itself, and you claim that this will never happen, in which you are correct.
>>>>>>
>>>>>> But that's not the intention. The message you got is:
>>>>>>
>>>>>> You will be able to iteratively connect to the restored engine in order to manually review and remediate its configuration before proceeding with the deployment:
>>>>>> please ensure that all the datacenter hosts and storage domain are listed as up or in maintenance mode before proceeding.
>>>>>> This is normally not required when restoring an up to date and coherent backup.
>>>>>>
>>>>>> This means that it's up to you to handle this nonoperational host, and that you are requested to continue (by removing that file) only after that.
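The pause behaves like a "wait until this file disappears" loop. Below is a minimal Python sketch of that idea, for illustration only: it is not the code of the ovirt.hosted_engine_setup Ansible role, and `wait_until_removed` is a hypothetical helper name.

```python
import os
import time
from typing import Optional


def wait_until_removed(path: str, poll_seconds: float = 5.0,
                       timeout: Optional[float] = None) -> bool:
    """Poll until `path` no longer exists.

    Returns True as soon as the file is gone, or False if `timeout`
    seconds pass first (timeout=None means wait forever).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

On the real host you don't need any script: once the host is Up (or in maintenance), delete the lock file named in the task, e.g. `rm /tmp/ansible.g2opa_y6_he_setup_lock` (that name is specific to this particular run).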
>>>>>>
>>>>>> So now, let's try to understand why the host is nonoperational, and try to fix that. Ok?
>>>>>>
>>>>>> You should be able to find the current (private/local) IP address of the engine VM by searching the hosted-engine setup logs for 'local_vm_ip'. You can ssh (and scp etc.) there from the host, using user 'root' and the password you supplied.
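`grep -r local_vm_ip /var/log/ovirt-hosted-engine-setup/` does this search directly. For illustration, here is a small Python sketch of the same recursive search. `find_lines` is a hypothetical helper, and the exact format of the matching log lines is not shown in this thread, so it just returns whole lines.

```python
import os
from typing import List, Tuple


def find_lines(log_dir: str, needle: str = "local_vm_ip") -> List[Tuple[str, str]]:
    """Return (file path, line) pairs for every line under log_dir
    (including subdirectories) that contains `needle`."""
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for line in fh:
                        if needle in line:
                            hits.append((path, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it and keep searching
    return hits
```

Usage on the host: `for path, line in find_lines("/var/log/ovirt-hosted-engine-setup"): print(path, line)`.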
>>>>>>
>>>>>> Please check/share all of /var/log/ovirt-engine on the engine VM. In particular, please check the host-deploy/* logs there. The last lines show a summary, like:
>>>>>>
>>>>>> HOSTNAME : ok=97 changed=34 unreachable=0 failed=0 skipped=46 rescued=0 ignored=1
>>>>> My log here is:
>>>>>
>>>>> 2020-09-17 12:19:40 CST - TASK [Executing post tasks defined by user] ************************************
>>>>> 2020-09-17 12:19:40 CST - PLAY RECAP *********************************************************************
>>>>> ovirt2.ntbaobei.com : ok=99 changed=45 unreachable=0 failed=0 skipped=45 rescued=0 ignored=1
>>>> Good.
>>>>
>>>>>> Is 'failed' higher than 0? If so, please find the failed task and check/share the relevant error (or just the entire file).
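Checking the summary is just reading the `failed=` counter from the recap line. A minimal Python sketch, assuming the recap format shown in this thread (`parse_recap` and `has_failures` are hypothetical names):

```python
import re
from typing import Dict

# Counter names that appear in the PLAY RECAP lines quoted in this thread.
COUNTER_RE = re.compile(r"\b(ok|changed|unreachable|failed|skipped|rescued|ignored)=(\d+)")


def parse_recap(line: str) -> Dict[str, int]:
    """Extract the counters from a recap line such as
    'ovirt2.ntbaobei.com : ok=99 changed=45 unreachable=0 failed=0 ...'."""
    return {name: int(value) for name, value in COUNTER_RE.findall(line)}


def has_failures(line: str) -> bool:
    """True if the recap reports at least one failed task."""
    return parse_recap(line).get("failed", 0) > 0
```

Applied to the recap line quoted here, `has_failures` returns False, which matches the "Good." reply.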
>>>>>>
>>>>>> Also, please check engine.log there for any ' ERROR '.
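`grep ' ERROR ' /var/log/ovirt-engine/engine.log` is enough for a first look. If you want the entries split into fields, here is a Python sketch. The line layout in the regex is an assumption inferred from the excerpts in this thread (timestamp, level, [logger], (thread), [correlation id], message), and `errors`/`LogEntry` are hypothetical names.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed layout, inferred from the engine.log excerpts in this thread:
# <date> <time>+<tz> <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) +(?P<level>[A-Z]+) +\[(?P<logger>[^\]]+)\] +"
    r"\((?P<thread>[^)]*)\) +\[(?P<corr>[^\]]*)\] +(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    ts: str
    logger: str
    corr: str
    msg: str


def errors(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield parsed ERROR entries; lines not matching the layout are skipped."""
    for line in lines:
        if " ERROR " not in line:
            continue
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") == "ERROR":
            yield LogEntry(m["ts"], m["logger"], m["corr"], m["msg"])
```

Usage on the engine VM: `with open("/var/log/ovirt-engine/engine.log") as fh: for e in errors(fh): print(e.ts, e.logger, e.msg)`. Note that multi-line entries (e.g. stack traces) only have their first line parsed.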
>>>>> I collected some errors from engine.log:
>>>> Only those below?
>>>>
>>>>> 2020-09-17 12:14:35,084+08 ERROR
>>>>> [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand]
>>>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-83)
>>>>> [4a6cf221] Command 'UploadStreamVDSCommand(HostName = ovirt6.ntbaobei.com,
>>>>> UploadStreamVDSCommandParameters:{hostId='784eada4-49e3-4d6c-95cd-f7c81337c2f7'})'
>>>>> execution failed: java.net.SocketException: Connection reset
>>>> This, and similar ones, are expected - the engine is still on the
>>>> private network, so it can't access the other hosts.
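To separate this kind of noise from errors worth investigating, a simple substring filter works. The pattern list below is a hypothetical heuristic built only from the errors that the replies in this thread call expected (or ignorable for now) while the engine VM is on the private network; it is not an official oVirt list.

```python
from typing import Iterable, List, Sequence

# Hypothetical heuristic: substrings of errors described in this thread as
# expected while the restored engine VM is still on the host's private network.
# NOT an official oVirt list; adjust it to what you actually see.
EXPECTED_WHILE_ISOLATED = (
    "java.net.SocketException: Connection reset",
    "VDS_NETWORK_ERROR",
    "OVF_STORE",
    "ProcessOvfUpdateForStorageDomain",
    "no suitable proxy host was found",
)


def is_expected_while_isolated(line: str,
                               patterns: Sequence[str] = EXPECTED_WHILE_ISOLATED) -> bool:
    """True if the error line matches one of the 'expected for now' patterns."""
    return any(p in line for p in patterns)


def unexpected(lines: Iterable[str]) -> List[str]:
    """Keep only the error lines that deserve a closer look."""
    return [line for line in lines if not is_expected_while_isolated(line)]
```

Whatever `unexpected` leaves behind is a better starting point for finding why the host went NonOperational.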
>>>>
>>>>> ...
>>>>>
>>>>> 2020-09-17 12:14:35,085+08 ERROR
>>>>> [org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand]
>>>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-83)
>>>>> [4a6cf221] Command
>>>>> 'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' failed:
>>>>> EngineException:
>>>>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
>>>>> java.net.SocketException: Connection reset (Failed with error
>>>>> VDS_NETWORK_ERROR and code 5022)
>>>>>
>>>>> ...
>>>>>
>>>>> 2020-09-17 12:14:40,322+08 ERROR
>>>>> [org.ovirt.engine.core.bll.pm.FenceProxyLocator]
>>>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53)
>>>>> [8b0987a] Can not run fence action on host 'ovirt2.ntbaobei.com', no
>>>>> suitable proxy host was found.
>>>> Not sure why it would want to fence ovirt2, but I think it can be ignored
>>>> for now as well.
>>>>
>>>>> ...
>>>>>
>>>>> 2020-09-17 12:14:48,861+08 ERROR
>>>>> [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand]
>>>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-2)
>>>>> [4a6cf221] Ending command
>>>>> 'org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand'
>>>>> with failure.
>>>> Same - it can't access the storage, so updating ovfstore fails. OK.
>>>>
>>>>> 2020-09-17 12:14:52,630+08 ERROR
>>>>> [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand]
>>>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
>>>>> [56d6bb10] Failed to update OVF_STORE content
>>>>> 2020-09-17 12:14:52,630+08 ERROR
>>>>> [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
>>>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
>>>>> [56d6bb10] Command 'ProcessOvfUpdateForStorageDomain' id:
>>>>> '8e6e1fa1-1fdf-4928-9153-4fe2ae9b77b0' with children
>>>>> [1c4d99f8-2d05-4b0a-938b-8733157778e1,
>>>>> 62caf674-5567-461c-8e86-4ed7b03306af] failed when attempting to perform
>>>>> the next operation, marking as 'ACTIVE'
>>>>> 2020-09-17 12:14:52,630+08 ERROR
>>>>> [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
>>>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
>>>>> [56d6bb10] null: java.lang.RuntimeException
>>>> Same.
>>>>
>>>> Are these the only errors?
>>>>
>>>> In particular, try to search for 'ovirt2' (your host's name), try to
>>>> find when it became nonoperational, and check errors around this.
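That search can be sketched in two steps: find the first line that mentions the host together with the NonOperational status, then list the ERROR lines in a window around it. The exact wording the engine uses for the status change is not shown in this thread, so matching on the host name plus "NonOperational" on the same line is an assumption; `first_match` and `errors_around` are hypothetical helpers.

```python
from typing import List, Optional, Sequence


def first_match(lines: Sequence[str], host: str,
                marker: str = "NonOperational") -> Optional[int]:
    """Index of the first line mentioning both `host` and `marker` (case-insensitive
    for the marker), or None. Assumes both appear on the same log line."""
    for i, line in enumerate(lines):
        if host in line and marker.lower() in line.lower():
            return i
    return None


def errors_around(lines: Sequence[str], index: int,
                  before: int = 50, after: int = 50) -> List[str]:
    """ERROR lines within `before`/`after` lines of `index`."""
    lo = max(0, index - before)
    hi = min(len(lines), index + after + 1)
    return [line for line in lines[lo:hi] if " ERROR " in line]
```

Usage: read engine.log into a list with `readlines()`, call `first_match(lines, "ovirt2")`, and if it is not None pass the index to `errors_around`.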
>>> The host has permission to access the storage. I don't know why it
>>> can't access the storage.
>> Me neither, but that's still irrelevant. First the node has to be Up, then
>> you should check the storage.
>>
>>> Should I use one of the hosts of the original cluster to install the new
>>> self-hosted engine and restore the backup file?
>> I thought this is what you did, no?
>>
>> Please explain what you did.
> For example, I have an oVirt cluster with 3 hosts, named
> ovirt1.example.com, ovirt2.example.com and ovirt3.example
>
> I backed up the engine and prepared a new host named ovirt4.example.com to
> restore the backup file on. Is that why ovirt4 cannot manage the storage
> domain?
No.
OK. I gave up, since we have to accept a certain amount of downtime anyway.
In the end, I created a new oVirt environment and used an export domain to migrate all the
VMs to it.
I know it's ugly, but it works.
Best regards,
>> Thanks,
>>
>>>> Thanks,
>>>>
>>>>>> Good luck and best regards,
>>>>>>
>>>>>>>> Please also check/share logs from /var/log/ovirt-hosted-engine-setup/*
>>>>>>>> (including subdirs).
>>>>>>>> No more errors there, just a lot of DEBUG messages.
>>>>>>>>> It didn't tell me to choose a new storage domain, and just gave me the new host's FQDN as the engine's URL,
>>>>>>>>> like host6.example.com:6900.
>>>>>>>> Yes, that's temporary, to let you access the engine VM (on the local network).
>>>>>>>>
>>>>>>>>> I can log in using host6.example.com:6900, and I saw the engine VM running
>>>>>>>>> in host6's /tmp dir.
>>>>>>>>>
>>>>>>>>>> HE deploy (since 4.3) first creates a VM for the engine on local
>>>>>>>>>> storage, then prompts you to provide the storage you want to use, and
>>>>>>>>>> then moves the VM disk image there.
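Put together with the pause option discussed above, the order of the deploy can be modelled as a short sequence of stages. This is a simplified sketch for illustration, not the actual hosted-engine-setup code, and the stage names are hypothetical. It shows why no storage prompt appeared yet: the pause after adding the host comes before the storage step.

```python
from enum import Enum
from typing import Optional


class DeployStage(Enum):
    """Simplified order of a hosted-engine deploy (4.3+), as described in this thread."""
    LOCAL_VM = 1        # engine VM runs from local storage on the host (e.g. under /tmp)
    ADD_HOST = 2        # host is added to the (restored) engine; optional pause here
    CHOOSE_STORAGE = 3  # you are prompted for the hosted-engine storage domain
    MOVE_DISK = 4       # the VM disk image is moved to the chosen storage


def next_stage(stage: DeployStage) -> Optional[DeployStage]:
    """Return the stage that follows `stage`, or None after the last one."""
    members = list(DeployStage)
    i = members.index(stage)
    return members[i + 1] if i + 1 < len(members) else None
```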
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Adam Xu
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Users mailing list -- users(a)ovirt.org
>>>>>>>>>>> To unsubscribe send an email to users-leave(a)ovirt.org
>>>>>>>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>>>>>>>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>>>>>>>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/XHDGJB2ZAFS...
>>>>>>>>> --
>>>>>>>>> Adam Xu
>>>>>>>>> Phone: 86-512-8777-3585
>>>>>>>>> Adagene (Suzhou) Limited
>>>>>>>>> C14, No. 218, Xinghu Street, Suzhou Industrial Park
>>
--
Adam Xu
Phone: 86-512-8777-3585
Adagene (Suzhou) Limited
C14, No. 218, Xinghu Street, Suzhou Industrial Park