Make the geoserver container populate the datadir, at runtime #57

Open
giohappy opened this issue Mar 7, 2025 · 7 comments · May be fixed by #58
@giohappy
Contributor

giohappy commented Mar 7, 2025

This task aims to solve two issues at the same time:

  1. get rid of the geoserver_data image and container, whose single purpose is to bake the content of the datadir volume
  2. populate the datadir content at runtime instead of at build time. This is required to allow bind mounting the data directory instead of relying on Docker's internal volume. Explanation: the VOLUME directive inside the geoserver_data dir, combined with copying the content at build time, forces the creation of an internal volume (which is shared with the geoserver image). If a bind mount is configured in the Docker Compose configuration, it will be empty, since inside the container it is shadowed by the Docker volume. By moving the copy of the datadir content to runtime, we ensure that either the internal volume or a bind-mounted directory is populated the first time the container runs.
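
A minimal sketch of what such a runtime step could look like, assuming the image ships a copy of the default datadir at a path outside the mounted directory. The function name `populate_datadir` and the source path in the usage comment are hypothetical and not taken from the PR; the target would be whatever path the container mounts as its data directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): copy the default datadir that is
# baked into the image into the target directory, but only when the target
# is still empty. This leaves an already populated volume or bind mount alone.
populate_datadir() {
    src="$1"   # assumed location of the defaults inside the image
    dst="$2"   # the mounted data directory (named volume or bind mount)

    mkdir -p "$dst"

    # `ls -A` prints nothing for an empty directory (hidden files included).
    if [ -z "$(ls -A "$dst")" ]; then
        # "$src"/. copies the directory content, including dotfiles.
        cp -a "$src"/. "$dst"/
        echo "populated"
    else
        echo "skipped"
    fi
}

# Assumed usage in an entrypoint, before GeoServer starts:
#   populate_datadir /opt/default_datadir "$GEOSERVER_DATA_DIR"
```

Because the copy happens after Docker has set up the mounts, it would behave the same for an internal volume and for a bind-mounted host folder.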
@giohappy giohappy self-assigned this Mar 7, 2025
@giohappy giohappy changed the title Make the geoserver container populate the datadir at runtime Make the geoserver container populate the datadir, at runtime Mar 7, 2025
@giohappy giohappy linked a pull request Mar 7, 2025 that will close this issue
@ridoo

ridoo commented Mar 11, 2025

Related: #38

@giohappy
Contributor Author

Related: #38

@ridoo what's your feedback on this PR?

@ridoo

ridoo commented Mar 11, 2025

hey @giohappy .. had a quick look. Here are my 2cts:

  • Either use curl or wget .. AFAIK curl is pre-installed more often
  • data_dir from geonode-geoserver-ext could be part of this repository -- even better: keep only those which are relevant to GeoNode
  • In the Dockerfile copy the GeoNode specific files to the appropriate locations

Once done, the default data_dir is part of the image but volatile in a created container. However, users of course want to persist the data dir. The docker-compose setup has to reference that data dir as a shared volume, like so:

compose.yml

services:
  my-service:
    image: hello-world:latest
    volumes:
      - my-shared-volume:/etc

volumes:
  my-shared-volume:

The volume is created and populated the first time the container runs, and is available under the host's /var/lib/docker/volumes by default. It can of course be configured to be stored elsewhere (see the NFS example in the reference docs).

With this

  • The data_dir is versioned and is located where it is actually used
  • No extra repository needed
  • Reduced complexity in creating the image

@giohappy
Contributor Author

@ridoo I'm not sure I get all your points

data_dir from geonode-geoserver-ext could be part of this repository -- even better: keep only those which are relevant to GeoNode

the PR already removed the additional docker image and moved the management of the default datadir to the geoserver image.
I'm probably missing the point here.

And what do you mean by "keep only those which are relevant to GeoNode"?

Once done, the default data_dir is part of the image but volatile in a created container. However, users of course want to persist the data dir. The docker-compose setup has to reference that data dir as a shared volume, like so:

This is what it already does with /geoserver_datadir, do you mean something different?
The point of populating the directory at runtime is to allow bind mounting the volumes. If it were populated at build time, the bind-mounted (empty) folder would shadow the content baked into the image.

Are you suggesting that we keep using Docker volumes (as we do now)? I haven't tested the driver options of the Docker volume. It could be an alternative to bind mounting, I guess. Do you think this would allow us to keep populating the image at build time and still have an externally managed volume?

@ridoo

ridoo commented Mar 11, 2025

the PR already removed the additional docker image and moved the management of the default datadir to the geoserver image.
I'm probably missing the point here.

I mean: get rid of the geonode-geoserver-ext repository and move it into this repository. Keep things as close together as possible.

This is what it already does with /geoserver_datadir, do you mean something different?

I know. But the way it is done is unnecessarily complex (as far as I can see). What I wanted to say is this:

The data dir baked into the image serves as a "good-to-start" default. The running data dir in the container is volatile and will be gone once the container is re-created. You say this:

If it were populated at build time, the bind-mounted (empty) folder would shadow the content baked into the image.

If the geoserver image contains the content from geonode-geoserver-ext, the default data dir is not empty. Its content will even be copied into a shared volume if that volume does not exist yet. No need to download a zipped data dir and merge/replace the data dir from geoserver.

And what do you mean by "keep only those which are relevant to GeoNode"?

Once you have moved the data dir to this repository, you could consider keeping only those files which are really relevant to GeoNode. However, this would mean mounting them separately after mounting the whole data dir. This should be done in a later step.

@giohappy
Contributor Author

giohappy commented Mar 12, 2025

@ridoo we agree that we could bake the default datadir inside this image, without downloading the artifact generated from geonode-geoserver-ext.
BTW we still need the workflow that builds and publishes the datadir artifact to support deployments without Docker. If we go that way, this workflow should be moved into this repo...

The other point of this PR is to allow Docker bind mounts. From my tests, if we configure a bind mount in the Docker Compose file, that is, map a host path to the container's $GEOSERVER_DATA_DIR path, the result is that the (initially) empty host path shadows the container's $GEOSERVER_DATA_DIR content, UNLESS we populate that path at runtime. I.e. the content inside the folder is created AFTER the bind mounting.
Maybe I'm missing the alternative solution that you have in mind, but the goal is to support bind mounting.

@ridoo

ridoo commented Mar 13, 2025

BTW we still need the workflow that builds and publishes the datadir artifact to support deployments without Docker. If we go that way, this workflow should be moved into this repo...

Totally fine for me :)

the (initially) empty host path shadows the container's $GEOSERVER_DATA_DIR content, UNLESS we populate that path at runtime. I.e. the content inside the folder is created AFTER the bind mounting.

In my experience, shared-folders (which do not exist yet) will be populated with the content of the container. Imagine a postgresql container which gets a host folder mounted as a shared volume. The database entries will be populated on first use but will be re-used once the container starts again. You can try it out:

docker run --rm -e POSTGRES_PASSWORD=postgres -v /tmp/pg-test-data:/var/lib/postgresql/data --name test -d postgres:15
docker rm -f test
ls /tmp/pg-test-data

I added optional bind mounts to the docker blueprint via an include:

include:
  - ./compose-volumes_${VOLUMES:-default}.yml

This can be configured via VOLUMES=prod in the .env file. Then compose-volumes_prod.yml will be used instead (which can be excluded from versioning via .gitignore). The default shared volumes have simply been moved into compose-volumes_default.yml so that they do not conflict with upstream changes.
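
The `${VOLUMES:-default}` form used in the include follows the same rule as POSIX shell parameter expansion: the fallback applies when the variable is unset or empty. A small sketch of how the file name resolves; the function name `compose_volumes_file` is made up for illustration and is not part of the blueprint.

```shell
#!/bin/sh
# Illustrative only: mirrors how the include path is interpolated.
# ${VOLUMES:-default} yields "default" when VOLUMES is unset or empty.
compose_volumes_file() {
    echo "./compose-volumes_${VOLUMES:-default}.yml"
}
```

With VOLUMES unset, this resolves to ./compose-volumes_default.yml; with VOLUMES=prod, it resolves to ./compose-volumes_prod.yml.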
