Failed to open X display when running in docker #1285

Open
YoelRidgway opened this issue Jun 18, 2024 · 19 comments

Comments

@YoelRidgway

YoelRidgway commented Jun 18, 2024

Hello there. I'm using tileserver-gl v4.11.1 and I'm getting the classic:

terminate called after throwing an instance of 'std::runtime_error'
   what(): Failed to open X display.

Normally this happens when the X display server is not running, but I'm running with Docker. Are there any suggestions on how I can investigate this issue? I'm not sure where to start. It happened a few times on my local machine, where restarting fixed it, but now the server is hosted on a remote machine and it happens every time.

@nathanpackard

I'm having the same problem.

@acalcutt
Collaborator

acalcutt commented Jul 4, 2024

Does it work for either of you with previous versions? What is the last version that worked for you?

@nathanpackard

nathanpackard commented Jul 4, 2024

I am actually running tileserver-gl v4.4.10 and get this error.

I am new to the project I'm working on so can't comment on the last version that worked.

For me, it mostly doesn't work. However, every once in a while I'll try again and it works (where I didn't change anything). It is sort of hit and miss when it works.

It happens when I run: docker-compose up
When it works, I get:
tileservergl | Starting tileserver-gl v4.4.10
tileservergl | Using specified config file from config.json
tileservergl | Starting server
tileservergl | Listening at http://[::]:8080/
tileservergl | Style "dark_offline" changed, updating...
tileservergl | Style "dark_online" changed, updating...
tileservergl | Style "dark_terrain_offline" changed, updating...
tileservergl | Style "dark_terrain_online" changed, updating...
tileservergl | Style "legacy" changed, updating...
tileservergl | Startup complete
tileservergl | GET /health 200 2 - 2.156 ms

When it doesn't work, I get:
tileservergl | Starting tileserver-gl v4.4.10
tileservergl | Using specified config file from config.json
tileservergl | Starting server
tileservergl | Listening at http://[::]:8080/
tileservergl | Style "dark_offline" changed, updating...
tileservergl | Style "dark_online" changed, updating...
tileservergl | Style "dark_terrain_offline" changed, updating...
tileservergl | Style "dark_terrain_online" changed, updating...
tileservergl | terminate called after throwing an instance of 'std::runtime_error'
tileservergl | what(): Failed to open X display.
tileservergl exited with code 0

@acalcutt
Collaborator

acalcutt commented Jul 4, 2024

The Docker image uses xvfb to provide an X display, so the only thing I can think of is that your CPU doesn't meet the requirements for it to emulate OpenGL. What CPU and OS are you running on?

I ran into something like this when running directly on Windows 2022 on a virtual server, since it didn't support OpenGL. There I had to force it to use mesa3d, which is an emulated OpenGL, similar to xvfb.

The crash is likely happening when you visit the index page, where it needs to render thumbnails, or when loading rendered tiles, since this is when maplibre-native needs an X display to render.

@acalcutt
Collaborator

acalcutt commented Jul 4, 2024

We have found that when using xvfb in maplibre-native CI workflows, it does fail with that error sometimes. It seemed to be a known xvfb issue.

@nathanpackard

Yeah, it sounds like a resource issue. However, I have lots of resources. Here is my system summary:
image

Also, my docker-compose.yml file has a lot of resources assigned:
mem_limit: 24G
cpus: '8.0'
deploy:
  resources:
    limits:
      cpus: '8.0'
      memory: 24G
    reservations:
      cpus: '8.0'
      memory: 24G

@acalcutt
Collaborator

acalcutt commented Jul 6, 2024

Are you able to test swapping these two packages around to see if it makes a difference?
https://github.com/maptiler/tileserver-gl/blob/master/src/serve_rendered.js#L5-L11

Edit: I guess this is a slightly different error, so probably not it.

@asukachiharu

We have found that when using xvfb in maplibre-native CI workflows, it does fail with that error sometimes. It seemed to be a known xvfb issue.

I found that as long as there is a style.json file in the folder, starting the Docker container will result in this error. However, this issue almost never occurs on Windows.

@asukachiharu

Specifically, this error does not appear locally. However, when previewing the raster, an error occurs: [error: failed to parse json: the document is empty. at offset 0] /GET xxxxx/xx512/0/0/0.png 500.

@docuracy

docuracy commented Jan 1, 2025

I guess this is the same problem:

2024-12-31T22:47:14.607Z | [CI] Failed to open X display, retrying...
... (20 of these retry notifications in total) ...
2024-12-31T22:47:24.115Z | [CI] Failed to open X display, retrying...
2024-12-31T22:47:24.615Z | terminate called after throwing an instance of 'std::runtime_error'
2024-12-31T22:47:24.615Z |   what():  Failed to open X display.

It's triggered by any call for a static map, for example:

curl -I "http://localhost:30080/styles/elevation/static/9.051,48.228,10/1x1.png"

I'm running the Docker v5.0.0 image in Kubernetes (the image includes xvfb, which I understand is intended to run as a daemon when required), with these resources:

    requests:
      memory: "2Gi"
      cpu: "2"
    limits:
      memory: "4Gi"
      cpu: "4"

Any suggestions, please?

@mloskot
Contributor

mloskot commented Jan 27, 2025

I'm experiencing the same issue as @docuracy in #1285 (comment) when running the latest maptiler/tileserver-gl:v5.1.3 container on an AKS cluster.
Preview thumbnails do not load, and attempting to access a thumbnail URL directly crashes TileServer-GL and terminates the container.

I run the container with the following command and args:

containers:
- name: tileserver-gl
  image: maptiler/tileserver-gl:v5.1.3
  command:
  - node
  args:
  - /usr/src/app
  - "--verbose"
  - "--public_url"
  - "https://svc.example.com/test/mbtiles/"

The container has Xvfb included, but this logic seems to be skipped in that case, doesn't it?

if ! which -- "${1}"; then
  # first arg is not an executable
  if [ -e /tmp/.X99-lock ]; then rm /tmp/.X99-lock -f; fi
  export DISPLAY=:99
  Xvfb "${DISPLAY}" -nolisten unix &
  exec node /usr/src/app/ "$@"
fi

@mloskot
Contributor

mloskot commented Jan 27, 2025

Fixed

This is a quick follow-up to my previous #1285 (comment)

I think I have fixed or rather worked around the issue:

I removed the explicit execution of node (compare with the YAML snippet in my previous comment):

containers:
- name: tileserver-gl
  image: maptiler/tileserver-gl:v5.1.3
  args:
  - "--verbose"
  - "--public_url"
  - "https://svc.example.com/test/mbtiles/"

so that this conditional logic in the entrypoint is triggered and Xvfb is started:

if ! which -- "${1}"; then
  # first arg is not an executable
  if [ -e /tmp/.X99-lock ]; then rm /tmp/.X99-lock -f; fi
  export DISPLAY=:99
  Xvfb "${DISPLAY}" -nolisten unix &
  exec node /usr/src/app/ "$@"
fi
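
To see which way that check goes for a given first argument, the condition can be tried in isolation. This is only a sketch: `entrypoint_branch` is a hypothetical helper that is not part of the image, it assumes `which` is available on the machine where it runs, and it uses `sh` as a stand-in for `node` so that it also works where Node.js is not installed. Note that it only models the quoted snippet; when Kubernetes `command` replaces the ENTRYPOINT, the entrypoint script is not run at all.

```shell
#!/bin/sh
# Hypothetical helper, not part of the image: reports which way the quoted
# `which` check goes for a given first argument.
entrypoint_branch() {
  if ! which -- "${1}" >/dev/null 2>&1; then
    # Same condition as the quoted snippet: first arg is not an executable.
    echo "if-block runs: Xvfb is started before node"
  else
    echo "if-block skipped: this snippet does not start Xvfb"
  fi
}

entrypoint_branch "--verbose"   # an option, as in the args-only manifest
entrypoint_branch "sh"          # stand-in for any executable on PATH, e.g. node
```

With only options in `args`, the first argument is not an executable, so the Xvfb branch is taken.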

Once that tweak was deployed, I opened a shell in the TileServer-GL container and ran ps aux to verify that Xvfb :99 -nolisten unix is indeed running.

Finally, the TileServer-GL front page shows the preview thumbnail
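
The `ps aux` check described above could be scripted roughly as follows. This is a sketch under assumptions: `has_xvfb` is a hypothetical helper, and the process listing is a made-up sample modeled on the command line mentioned in this comment, not real output from the container.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the process listing on stdin mentions Xvfb.
# The [X] bracket keeps the pattern from matching the grep process itself
# when real `ps aux` output is piped in.
has_xvfb() {
  grep -q '[X]vfb'
}

# Made-up sample listing for illustration only. Inside the container you
# would pipe real output instead, e.g.:  ps aux | has_xvfb
sample='root  7  Xvfb :99 -nolisten unix
root  8  node /usr/src/app/ --verbose'

if printf '%s\n' "$sample" | has_xvfb; then
  echo "Xvfb is running"
else
  echo "Xvfb is NOT running"
fi
```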

Image

and no more crashes logged by the server

Image

Workaround

Above, I referred to the solution as a workaround because I think the container entrypoint could be improved to make it harder for users to trip over the explicit execution of node :) The container could allow values for all the command line options to be passed via env vars, or those could be made configurable in config.json; there are lots of ways. I'm happy to propose a PR, but I'd like to hear the developers' preferences here.

@okimiko
Contributor

okimiko commented Jan 31, 2025

Since @mloskot's issue was caused by an entrypoint override (command == ENTRYPOINT, args == CMD, see https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#define-a-command-and-arguments-when-you-create-a-pod), it may be helpful for the others to check their startup config, too.

@mloskot I like the approach of configuration by environment variables, but I'm not sure it fits every need. I mount a preconfigured file in the container. Maybe this is an option for you, too (see https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/).
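
One way to check a running pod for such an override is to read the `command` field of its spec and see whether it is set. The sketch below is an assumption-laden illustration: `describe_command_field` is a hypothetical helper, POD is a placeholder for your pod name, and the sample values passed at the end are made up to show both cases.

```shell
#!/bin/sh
# Hypothetical helper: interprets the value of a container's `command` field,
# which can be obtained with (POD is a placeholder):
#   kubectl get pod POD -o jsonpath='{.spec.containers[0].command}'
# An empty value means the field is unset, so the image ENTRYPOINT is kept.
describe_command_field() {
  if [ -z "${1}" ]; then
    echo "default ENTRYPOINT kept"
  else
    echo "ENTRYPOINT overridden by: ${1}"
  fi
}

# Made-up sample values for illustration.
describe_command_field ''
describe_command_field '["node"]'
```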

@mloskot
Contributor

mloskot commented Jan 31, 2025

@okimiko What are the config.json equivalents of command line options such as --verbose?

No single solution fits every need, but the issue here boils down to having any solution available. Right now, those options are only configurable via containers.[].command and/or containers.[].args, which can (too) easily lead to shooting yourself in the foot (see the whole story above).

p.s. Some of my workflows use init containers to prepare and deploy my config.json for primary container with TileServer-GL.

@okimiko
Contributor

okimiko commented Jan 31, 2025

@mloskot AFAIK: None, and in my opinion that is fine: I'm not a fan of long process calls. For simple needs it may help to have some more command line options, but for that I would prefer environment variables. But (as said before) that is just my opinion.

PS: I just checked the source; next to UV_THREADPOOL_SIZE, there is already something for PORT/BIND and a development check. I think it should not be that hard to extend this to non-list options.

PPS: I think we are a bit off-topic; this may be something for a discussion (or a PR ;-))

@mloskot
Contributor

mloskot commented Jan 31, 2025

@okimiko

None and in my opinion that is fine: I'm not a fan of long process calls.

You seem to be misunderstanding my suggestion. The fact that command line options cannot be controlled in config.json leads users to tweak them via the container command line, and THAT leads to run-time problems like the one discussed in this issue.

If users could stick to the default entrypoint of the official container image in every scenario, they would not have experienced crashes. I'm happy to learn I'm wrong about this.

Having said that, with respect, "how can we let users configure containers with all aspects of TileServer-GL without touching command line arguments" is absolutely on-topic here, as addressing that question is part of the solution to this issue.

@okimiko
Contributor

okimiko commented Jan 31, 2025

Yes, you are right then, I misunderstood your suggestion. But then my opinion is even more opposite: You already can append every command line option (using args or CMD or command) if you do not overwrite the default entrypoint (like in every other image, too).

@mloskot
Contributor

mloskot commented Jan 31, 2025

@okimiko

You already can append every command line option (using args or CMD or command) if you do not overwrite the default entrypoint

You cannot use the command without overwriting the default entry point.

From https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/

The command field corresponds to ENTRYPOINT, and the args field corresponds to CMD in some container runtimes.

If users need to specify --public_url, they will apply the obvious canonical manifest:

command: [ "node" ]
args: [ "/usr/src/app", "--public_url", "..." ]

then, TileServer-GL will start crashing at them!

If you can't see that there is room for improvement around the configuration aspect here to help users avoid shooting themselves in the foot, then I can't help, and I feel like we are running in circles. So, I will stop there.

@okimiko
Contributor

okimiko commented Jan 31, 2025

Then stop using command. My point is that I do not understand what prevents you from just using this:

args: [ "--public_url", "..." ]
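
For plain Docker, the equivalent of the args-only manifest is to put the options after the image name and not pass `--entrypoint`, so the image's own entrypoint stays in place. The sketch below only prints the command rather than running it; `tileserver_cmd` is a hypothetical helper, and the `-p 8080:8080` mapping is an assumption based on the port shown in the logs earlier in this thread.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the docker command instead of running it.
# Everything after the image name replaces CMD and is handed to the image's
# ENTRYPOINT, which is left untouched because --entrypoint is not used.
tileserver_cmd() {
  echo docker run --rm -p 8080:8080 maptiler/tileserver-gl:v5.1.3 "$@"
}

tileserver_cmd --verbose --public_url "https://svc.example.com/test/mbtiles/"
```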
