On Thu, Sep 17, 2020 at 11:57 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>
> On 2020/9/17 16:38, Yedidyah Bar David wrote:
>> On Thu, Sep 17, 2020 at 11:29 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>> On 2020/9/17 15:07, Yedidyah Bar David wrote:
>>>> On Thu, Sep 17, 2020 at 8:16 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>>>> On 2020/9/16 15:53, Yedidyah Bar David wrote:
>>>>>> On Wed, Sep 16, 2020 at 10:46 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>>>>>> On 2020/9/16 15:12, Yedidyah Bar David wrote:
>>>>>>>> On Wed, Sep 16, 2020 at 6:10 AM Adam Xu <adam_xu(a)adagene.com.cn> wrote:
>>>>>>>>> Hi ovirt
>>>>>>>>>
>>>>>>>>> I just tried to upgrade a self-hosted engine from 4.3.10 to
>>>>>>>>> 4.4.1.4. I followed the steps in the document:
>>>>>>>>>
>>>>>>>>> https://www.ovirt.org/documentation/upgrade_guide/#SHE_Upgrading_from_4-3
>>>>>>>>>
>>>>>>>>> The old 4.3 env has an FC storage domain as the engine storage
>>>>>>>>> domain, and I have created a new FC storage vv for the new
>>>>>>>>> storage domain to be used in the next steps.
>>>>>>>>>
>>>>>>>>> I backed up the old 4.3 env and prepared a totally new host to
>>>>>>>>> restore the env.
>>>>>>>>>
>>>>>>>>> In chapter 4.4, step 8, it says:
>>>>>>>>>
>>>>>>>>> "During the deployment you need to provide a new storage
>>>>>>>>> domain. The deployment script renames the 4.3 storage domain
>>>>>>>>> and retains its data."
>>>>>>>>>
>>>>>>>>> It does rename the old storage domain, but it didn't let me
>>>>>>>>> choose a new storage domain during the deployment. So the new
>>>>>>>>> engine was just deployed on the new host's local storage and
>>>>>>>>> cannot be moved to the FC storage domain.
>>>>>>>>>
>>>>>>>>> Can anyone tell me what the problem is?
>>>>>>>> What do you mean by "deployed in the new host's local storage"?
>>>>>>>>
>>>>>>>> Did deploy finish successfully?
>>>>>>> I think it was not finished yet.
>>>>>> You did 'hosted-engine --deploy --restore-from-file=something', right?
>>>>>>
>>>>>> Did this finish?
>>>>> not finished yet.
>>>>>> What are the last few lines of the output?
>>>>> [ INFO ] You can now connect to
>>>>> https://ovirt6.ntbaobei.com:6900/ovirt-engine/ and check the status of
>>>>> this host and eventually remediate it, please continue only when the
>>>>> host is listed as 'up'
>>>>>
>>>>> [ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
>>>>>
>>>>> [ INFO ] ok: [localhost]
>>>>> [ INFO ] TASK [ovirt.hosted_engine_setup : Create temporary lock file]
>>>>> [ INFO ] changed: [localhost]
>>>>>
>>>>> [ INFO ] TASK [ovirt.hosted_engine_setup : Pause execution until
>>>>> /tmp/ansible.g2opa_y6_he_setup_lock is removed, delete it once ready to
>>>>> proceed]
>>>> Great. This means that you replied 'Yes' to 'Pause the execution
>>>> after adding this host to the engine?', and it's now waiting.
>>>>
>>>>> but the status of the new host, which runs the self-hosted engine, is
>>>>> "NonOperational" and it will never be "up"
>>>> You seem to imply that you expected it to become "up" by itself,
>>>> and you claim that this will never happen, in which you are
>>>> correct.
>>>>
>>>> But that's not the intention. The message you got is:
>>>>
>>>> You will be able to iteratively connect to the restored engine in
>>>> order to manually review and remediate its configuration before
>>>> proceeding with the deployment:
>>>> please ensure that all the datacenter hosts and storage domain are
>>>> listed as up or in maintenance mode before proceeding.
>>>> This is normally not required when restoring an up to date and
>>>> coherent backup.
>>>>
>>>> This means that it's up to you to handle this nonoperational host,
>>>> and that you are requested to continue (by removing that file) only
>>>> then.
>>>>
>>>> So now, let's try to understand why the host is nonoperational, and
>>>> try to fix that. Ok?
>>>>
>>>> You should be able to find the current (private/local) IP address of
>>>> the engine vm by searching the hosted-engine setup logs for 'local_vm_ip'.
>>>> You can ssh (and scp etc.) there from the host, using user 'root' and
>>>> the password you supplied.
>>>>
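
If it helps, the search for 'local_vm_ip' can be scripted. Below is a rough
sketch, not something taken from the setup tool itself. It assumes the setup
logs live under /var/log/ovirt-hosted-engine-setup with a .log extension, and
that the address appears as a plain IPv4 string on the same line as
'local_vm_ip'. The function names and the sample line are hypothetical.

```python
import ipaddress
import re
from pathlib import Path

# Loose IPv4 pattern; candidates are validated with ipaddress below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_local_vm_ips(lines):
    """Collect valid IPv4 addresses from lines that mention 'local_vm_ip'."""
    found = []
    for line in lines:
        if "local_vm_ip" not in line:
            continue
        for candidate in IPV4_RE.findall(line):
            try:
                ipaddress.IPv4Address(candidate)
            except ValueError:
                continue  # looked like an address but is not a valid one
            if candidate not in found:
                found.append(candidate)
    return found


def scan_log_dir(log_dir="/var/log/ovirt-hosted-engine-setup"):
    """Scan every *.log file below log_dir, including subdirectories.

    Both the directory default and the *.log pattern are assumptions.
    """
    ips = []
    for path in sorted(Path(log_dir).rglob("*.log")):
        with path.open(errors="replace") as handle:
            for ip in find_local_vm_ips(handle):
                if ip not in ips:
                    ips.append(ip)
    return ips


# Hypothetical sample line, only to show the call; real log lines may differ.
print(find_local_vm_ips(['"local_vm_ip": {"stdout": "192.168.1.5"}']))
```

On the host you would call scan_log_dir() and then ssh as root to the
address it reports.
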
>>>> Please check/share all of /var/log/ovirt-engine on the engine vm.
>>>> In particular, please check host-deploy/* logs there. The last lines
>>>> show a summary, like:
>>>>
>>>> HOSTNAME : ok=97 changed=34 unreachable=0 failed=0
>>>> skipped=46 rescued=0 ignored=1
>>> my log here is:
>>>
>>> 2020-09-17 12:19:40 CST - TASK [Executing post tasks defined by user]
>>> ************************************
>>> 2020-09-17 12:19:40 CST - PLAY RECAP
>>> *********************************************************************
>>> ovirt2.ntbaobei.com : ok=99 changed=45 unreachable=0
>>> failed=0 skipped=45 rescued=0 ignored=1
>> Good.
>>
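
For the record, the check on the recap line can be automated. This is only a
sketch, assuming the recap line looks like the one you pasted (host name, a
colon, then name=number counters, possibly wrapped onto a second line). The
function names are made up for illustration.

```python
import re

# One host line of an Ansible "PLAY RECAP", e.g.
#   ovirt2.ntbaobei.com : ok=99 changed=45 unreachable=0 failed=0 ...
RECAP_RE = re.compile(r"^\s*(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(text):
    """Return {host: {counter: value}} for every recap line found in text."""
    # Pasted logs are often wrapped, so re-join a continuation line that
    # starts with a counter (such as "failed=0 skipped=45 ...").
    joined = re.sub(r"\n\s*(?=\w+=\d+)", " ", text)
    result = {}
    for line in joined.splitlines():
        match = RECAP_RE.match(line)
        if match:
            pairs = (item.split("=") for item in match.group("counters").split())
            result[match.group("host")] = {key: int(value) for key, value in pairs}
    return result


def hosts_with_problems(recap):
    """Hosts whose 'failed' or 'unreachable' counter is above zero."""
    return [host for host, counters in recap.items()
            if counters.get("failed", 0) > 0 or counters.get("unreachable", 0) > 0]


sample = """ovirt2.ntbaobei.com : ok=99 changed=45 unreachable=0
failed=0 skipped=45 rescued=0 ignored=1"""
recap = parse_recap(sample)
print(recap["ovirt2.ntbaobei.com"]["failed"], hosts_with_problems(recap))
```

An empty list from hosts_with_problems() matches what we see here: host-deploy
itself did not fail, so the reason for NonOperational has to be elsewhere.
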
>>>> Is 'failed' higher than 0? If so, please find the failed task and
>>>> check/share the relevant error (or just the entire file).
>>>>
>>>> Also, please check engine.log there for any ' ERROR '.
>>> I collected some errors from engine.log:
>> Only those below?
>>
>>> 2020-09-17 12:14:35,084+08 ERROR
>>> [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand]
>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-83)
>>> [4a6cf221] Command 'UploadStreamVDSCommand(HostName = ovirt6.ntbaobei.com,
>>> UploadStreamVDSCommandParameters:{hostId='784eada4-49e3-4d6c-95cd-f7c81337c2f7'})'
>>> execution failed: java.net.SocketException: Connection reset
>> This, and similar ones, are expected - the engine is still on the
>> private network, so it can't access the other hosts.
>>
>>> ...
>>>
>>> 2020-09-17 12:14:35,085+08 ERROR
>>> [org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand]
>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-83)
>>> [4a6cf221] Command
>>> 'org.ovirt.engine.core.bll.storage.ovfstore.UploadStreamCommand' failed:
>>> EngineException:
>>> org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
>>> java.net.SocketException: Connection reset (Failed with error
>>> VDS_NETWORK_ERROR and code 5022)
>>>
>>> ...
>>>
>>> 2020-09-17 12:14:40,322+08 ERROR
>>> [org.ovirt.engine.core.bll.pm.FenceProxyLocator]
>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53)
>>> [8b0987a] Can not run fence action on host 'ovirt2.ntbaobei.com', no
>>> suitable proxy host was found.
>> Not sure why it would want to fence ovirt2, but I think it can be ignored
>> for now as well.
>>
>>> ...
>>>
>>> 2020-09-17 12:14:48,861+08 ERROR
>>> [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand]
>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-2)
>>> [4a6cf221] Ending command
>>> 'org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand'
>>> with failure.
>> Same - it can't access the storage, so updating ovfstore fails. OK.
>>
>>> 2020-09-17 12:14:52,630+08 ERROR
>>> [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand]
>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
>>> [56d6bb10] Failed to update OVF_STORE content
>>> 2020-09-17 12:14:52,630+08 ERROR
>>> [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
>>> [56d6bb10] Command 'ProcessOvfUpdateForStorageDomain' id:
>>> '8e6e1fa1-1fdf-4928-9153-4fe2ae9b77b0' with children
>>> [1c4d99f8-2d05-4b0a-938b-8733157778e1,
>>> 62caf674-5567-461c-8e86-4ed7b03306af] failed when attempting to perform
>>> the next operation, marking as 'ACTIVE'
>>> 2020-09-17 12:14:52,630+08 ERROR
>>> [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
>>> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-41)
>>> [56d6bb10] null: java.lang.RuntimeException
>> Same.
>>
>> Are these the only errors?
>>
>> In particular, try to search for 'ovirt2' (your host's name), try to
>> find when it became nonoperational, and check errors around this.
> The host has permission to access the storage. I don't know why it
> can't access the storage.
Me neither, but that's still irrelevant. First the node has to be Up, then
you should check the storage.
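
To find when the host went NonOperational, something like the following could
pull the relevant entries out of engine.log. It is a sketch under two
assumptions: every entry starts with a timestamp in the form seen in your
paste, and the log level is the third whitespace-separated field. The helper
names are hypothetical. The sample reuses two entries from this thread,
shortened and wrapped over several lines for width.

```python
import re

# engine.log entries start with a timestamp such as "2020-09-17 12:14:35,084+08".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def iter_entries(lines):
    """Group physical lines into log entries (an entry may span several lines)."""
    entry = []
    for line in lines:
        if ENTRY_START.match(line) and entry:
            yield "\n".join(entry)
            entry = []
        entry.append(line.rstrip("\n"))
    if entry:
        yield "\n".join(entry)


def entries_about(lines, needle, levels=("ERROR", "WARN")):
    """Entries that mention `needle` and carry one of the given log levels."""
    wanted = re.compile(r"^\S+ \S+ +(?:%s)\b" % "|".join(levels))
    return [e for e in iter_entries(lines) if needle in e and wanted.match(e)]


sample = """\
2020-09-17 12:14:35,084+08 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.UploadStreamVDSCommand]
 [4a6cf221] Command 'UploadStreamVDSCommand(HostName = ovirt6.ntbaobei.com,
 execution failed: java.net.SocketException: Connection reset
2020-09-17 12:14:40,322+08 ERROR [org.ovirt.engine.core.bll.pm.FenceProxyLocator]
 [8b0987a] Can not run fence action on host 'ovirt2.ntbaobei.com', no
 suitable proxy host was found.
""".splitlines()

for entry in entries_about(sample, "ovirt2"):
    print(entry.splitlines()[0])
```

On the engine vm you would pass open("/var/log/ovirt-engine/engine.log")
instead of the sample, and then read the entries just before and after the
first hit.
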
> should I use one host of the original cluster to install the new
> self-Hosted engine and restore the backup file?
I thought this is what you did, no?
Please explain what you did.
to restore the backup file. Is that why ovirt4 cannot manage the storage
domain?
Thanks,
>> Thanks,
>>
>>>> Good luck and best regards,
>>>>
>>>>>> Please also check/share logs from /var/log/ovirt-hosted-engine-setup/*
>>>>>> (including subdirs).
>>>>> no more errors there, just a lot of DEBUG messages.
>>>>>>> It didn't tell me to choose a new
>>>>>>> storage domain and just gave me the new host's FQDN as the engine's URL,
>>>>>>> like host6.example.com:6900 .
>>>>>> Yes, that's temporary, to let you access the engine VM (on the
>>>>>> local network).
>>>>>>
>>>>>>> I can log in using host6.example.com:6900 and I saw the engine vm
>>>>>>> running in host6's /tmp dir.
>>>>>>>
>>>>>>>> HE deploy (since 4.3) first creates a VM for the engine on local
>>>>>>>> storage, then prompts you to provide the storage you want to use, and
>>>>>>>> then moves the VM disk image there.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Adam Xu
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list -- users(a)ovirt.org
>>>>>>>>> To unsubscribe send an email to users-leave(a)ovirt.org
>>>>>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>>>>>>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>>>>>>>>> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/XHDGJB2ZAFS...
>>>>>>> --
>>>>>>> Adam Xu
>>>>>>> Phone: 86-512-8777-3585
>>>>>>> Adagene (Suzhou) Limited
>>>>>>> C14, No. 218, Xinghu Street, Suzhou Industrial Park
>>>>> --
>>>>> Adam Xu
>>
--
Adam Xu
Phone: 86-512-8777-3585
Adagene (Suzhou) Limited
C14, No. 218, Xinghu Street, Suzhou Industrial Park