Deployment stuck at 0% #6

Open
red3333 opened this issue Mar 7, 2024 · 6 comments

Comments

@red3333

red3333 commented Mar 7, 2024

I followed the tutorial on my Nextcloud 28: I installed exApp, created a daemon worker, and added the AiImageGeneratorBot app.
After several hours, the progress was still stuck at "0% deploying", and I eventually got a heartbeat failure.
I checked my Docker install:

  • a container has been created and started: "nc_app_ai_image_generator_bot"
  • there doesn't seem to be any activity in it (no memory or disk usage increase, no CPU usage)
  • logs seem to be ok:
    The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache().
    0it [00:00, ?it/s]
    INFO: Started server process [1]
    INFO: Waiting for application startup.
    TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}, 'state': {}}
    TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
    TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
    INFO: Application startup complete.
    INFO: Uvicorn running on http://127.0.0.1:23000 (Press CTRL+C to quit)

Not sure if it is important, but I don't have any GPU on that server, only plenty of RAM and CPU cores.
Does anyone have an idea?
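
A minimal sketch for checking whether the app itself answers, assuming it is run inside the nc_app_ai_image_generator_bot container (the Uvicorn log above shows the server bound to 127.0.0.1:23000, so it is only reachable from within that network namespace). The helper name `probe` is hypothetical, and the `/heartbeat` path is the one AppAPI requests later in this thread. It only shows that the endpoint responds; it does not reproduce AppAPI's own checks.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Send a GET request and return (HTTP status, response body)."""
    # Ignore proxy environment variables: the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code, exc.read().decode("utf-8", errors="replace")
```

Calling `probe("http://127.0.0.1:23000/heartbeat")` returns the status code and body if the app answers. A connection failure (refused, timed out) propagates as an `OSError`, which would point at the app rather than at the proxy in front of it.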

@bigcat88
Contributor

bigcat88 commented Mar 7, 2024

Can you describe your configuration?
Where is Nextcloud installed, and where is Docker located?
Also, can you ping the Nextcloud instance from the Docker container?

Do you use Docker Socket Proxy or not?

@red3333
Author

red3333 commented Mar 7, 2024

All machines are on an ESXi hypervisor:

  • Nextcloud running on apache on 1st machine (connected to Internet)

  • All containers running on a 2nd machine behind the 1st one (local network) (64 GB RAM, 28 cores, no GPU).

  • I can ping Nextcloud machine from docker

  • I can curl my Nextcloud main page from docker and from inside the nc_app_ai_image_generator_bot container

  • I'm using an HTTP connection between exApp and its docker_socket_proxy container worker; it was created with the following command:
    docker run -e NC_HAPROXY_PASSWORD="$password" \
    -p 2375:2375 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --name nextcloud-appapi-dsp -h nextcloud-appapi-dsp \
    --restart unless-stopped --privileged -d ghcr.io/cloud-py-api/nextcloud-appapi-dsp:release

@bigcat88
Contributor

bigcat88 commented Mar 7, 2024

  • I can curl my Nextcloud main page from docker and from inside the nc_app_ai_image_generator_bot container

Strange, then it should work.
I assume you can curl the URL that is in the Daemon config you created?
Can you show part of that URL? Maybe it is not a valid one.

@red3333
Author

red3333 commented Mar 8, 2024

I switched to wget for my tests, as the URL is HTTPS:
wget https://my.domain.name/index.php
gives me the main page from the Docker machine, the Daemon container, and the nc_app_ai_image_generator_bot container.
The same URL is configured in the exApp Daemon configuration (and is set in the NEXTCLOUD_URL= env variable of the generator bot).

I also tried the HTTPS proxy daemon, with the same results as before.

@red3333
Author

red3333 commented Mar 12, 2024

So I found a problem with cloud-py-api/docker-socket-proxy not forwarding requests to the app
(e.g. the /heartbeat request was seen in the docker-socket-proxy logs, but not in the nc_app_ai_image_generator_bot container).
I don't know the exact reason, but replacing localhost with 127.0.0.1 in haproxy_ex_apps.cfg solved the problem.
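
The thread does not establish why that replacement helped. One possible explanation, offered only as a hypothesis, is that `localhost` also resolves to the IPv6 address `::1` in that environment, while the app (per the Uvicorn log) listens only on IPv4 `127.0.0.1`. The sketch below lists what a host name resolves to through the system resolver; `resolve_addresses` is a hypothetical helper, and HAProxy's own name resolution may behave differently from Python's.

```python
import socket


def resolve_addresses(host: str, port: int = 23000) -> list[str]:
    """Return the sorted, de-duplicated IP addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for name in ("localhost", "127.0.0.1"):
        print(name, "->", resolve_addresses(name))
```

If `localhost` prints more than one address, or only `::1`, a client that picks the IPv6 address would not reach a server bound to `127.0.0.1` alone.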

Now my nc_app_ai_image_generator_bot has received the /heartbeat and /init requests:
TRACE: 127.0.0.1:54674 - HTTP connection made
TRACE: 127.0.0.1:54674 - ASGI [2] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 23000), 'client': ('127.0.0.1', 54674), 'scheme': 'http', 'root_path': '', 'headers': '<...>', 'state': {}, 'method': 'GET', 'path': '/heartbeat', 'raw_path': b'/heartbeat', 'query_string': b''}
TRACE: 127.0.0.1:54674 - ASGI [2] Send {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
INFO: 127.0.0.1:54674 - "GET /heartbeat HTTP/1.1" 200 OK
TRACE: 127.0.0.1:54674 - ASGI [2] Send {'type': 'http.response.body', 'body': '<15 bytes>'}
TRACE: 127.0.0.1:54674 - ASGI [2] Completed
TRACE: 127.0.0.1:54674 - ASGI [3] Started scope={'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'server': ('127.0.0.1', 23000), 'client': ('127.0.0.1', 54674), 'scheme': 'http', 'root_path': '', 'headers': '<...>', 'state': {}, 'method': 'POST', 'path': '/init', 'raw_path': b'/init', 'query_string': b''}
TRACE: 127.0.0.1:54674 - ASGI [3] Send {'type': 'http.response.start', 'status': 200, 'headers': '<...>'}
INFO: 127.0.0.1:54674 - "POST /init HTTP/1.1" 200 OK
TRACE: 127.0.0.1:54674 - ASGI [3] Send {'type': 'http.response.body', 'body': '<2 bytes>'}
TRACE: 127.0.0.1:54674 - HTTP connection lost

... but there is still no visible activity on nc_app_ai_image_generator_bot: no CPU usage, no RAM usage (actually, some RAM usage, but in the OS cache, which may be related to other containers), and no other messages.
In Nextcloud, the app is now at "0% initialization", status: "initialization timed out".

@red3333
Author

red3333 commented Mar 19, 2024

I tracked the error a bit further:

"/heartbeat" works fine.

The "/init" request returns a successful "OK".
But then set_init_status raises an exception:
ERROR: Exception in ASGI application
Traceback (most recent call last):
[...]
nc_py_api._exceptions.NextcloudException: [401] Unauthorized <request: PUT /ocs/v1.php/apps/app_api/apps/status/ai_image_generator_bot>
As a result, models--stabilityai--sdxl-turbo is not downloaded.
I then managed to download the models from outside the container and put the result in the container's persistent storage.
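
The comment does not say which tool was used for that manual download. A sketch of one way it could be done, assuming the `huggingface_hub` package is installed on the machine doing the download and that the folder name above corresponds to the Hugging Face repository `stabilityai/sdxl-turbo` (the cache names model folders as `models--<owner>--<name>`). Both function names are hypothetical, and the target directory must be whatever path the container's persistent storage is actually mounted at, which this thread does not state.

```python
from pathlib import Path


def hf_cache_folder(repo_id: str, repo_type: str = "model") -> str:
    """Folder name the Hugging Face cache uses for a repository."""
    return f"{repo_type}s--" + repo_id.replace("/", "--")


def predownload(repo_id: str, cache_dir: Path) -> Path:
    """Download a full snapshot of `repo_id` into `cache_dir`.

    Requires network access and the third-party huggingface_hub package,
    which is imported lazily so the helper above works without it.
    """
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, cache_dir=cache_dir))
```

After `predownload("stabilityai/sdxl-turbo", cache_dir)` finishes, `cache_dir` should contain a folder named `hf_cache_folder("stabilityai/sdxl-turbo")`, which is the directory the app was unable to create on its own.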

After a (long) time, the app status becomes "Initialization timeout 0%" and it keeps waiting.
However, the Nextcloud app_api then sends an "/enabled?enabled=1" request, which returns a successful "OK".
In Nextcloud, the app remains at "Initialization timeout", but the bot can be added to the Talk app, and requests (e.g. @image cinematic portrait of fluffy cat with black eyes) successfully generate images.
