Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.
Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
Can you double check that the way I have this set only exposes it locally? It is important that the manhole is not available to the outside world since it is quite powerful and the password is hard coded.
matrix_synapse_storage_path is already defined in matrix-synapse/defaults/main.yml (with a default of "{{ matrix_synapse_base_path }}/storage"), but was not being used for its presumed purpose in matrix-synapse.service.j2. As a result, if matrix_synapse_storage_path was overridden (in a vars.yml), the synapse service failed to start.
Bridges start matrix-synapse.service as a dependency, but
Synapse is sometimes slow to start, while bridges are quick to
hit it and die (if unavailable).
They'll auto-restart later, but .. this still breaks `--tags=start`,
which doesn't wait long enough for such a restart to happen.
This attempts to slow down bridge startup enough to ensure Synapse
is up and no failures happen at all.
Reasoning is the same as for matrix-org/synapse#5023.
For us, the journal used to contain `docker` for all services, which
is not very helpful when looking at them all together (`journalctl -f`).
We run containers as a non-root user (no effective capabilities).
Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"
We'd rather prevent such a potential escalation by dropping ALL
capabilities.
The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
This makes all containers (except mautrix-telegram and
mautrix-whatsapp), start as a non-root user.
We do this, because we don't trust some of the images.
In any case, we'd rather not trust ALL images and avoid giving
`root` access at all. We can't be sure they would drop privileges
or what they might do before they do it.
Because Postfix doesn't support running as non-root,
it had to be replaced by an Exim mail server.
The matrix-nginx-proxy nginx container image is patched up
(by replacing its main configuration) so that it can work as non-root.
It seems like there's no other good image that we can use and that is up-to-date
(https://hub.docker.com/r/nginxinc/nginx-unprivileged is outdated).
Likewise for riot-web (https://hub.docker.com/r/bubuntux/riot-web/),
we patch it up ourselves when starting (replacing the main nginx
configuration).
Ideally, it would be fixed upstream so we can simplify.
We do match the defaults anyway (by default that is),
but people can customize `matrix_user_uid` and `matrix_user_uid`
and it wouldn't be correct then.
In any case, it's better to be explicit about such an important thing.
With this change, the following roles are now only dependent
on the minimal `matrix-base` role:
- `matrix-corporal`
- `matrix-coturn`
- `matrix-mailer`
- `matrix-mxisd`
- `matrix-postgres`
- `matrix-riot-web`
- `matrix-synapse`
The `matrix-nginx-proxy` role still does too much and remains
dependent on the others.
Wiring up the various (now-independent) roles happens
via a glue variables file (`group_vars/matrix-servers`).
It's triggered for all hosts in the `matrix-servers` group.
According to Ansible's rules of priority, we have the following
chain of inclusion/overriding now:
- role defaults (mostly empty or good for independent usage)
- playbook glue variables (`group_vars/matrix-servers`)
- inventory host variables (`inventory/host_vars/matrix.<your-domain>`)
All roles default to enabling their main component
(e.g. `matrix_mxisd_enabled: true`, `matrix_riot_web_enabled: true`).
Reasoning: if a role is included in a playbook (especially separately,
in another playbook), it should "work" by default.
Our playbook disables some of those if they are not generally useful
(e.g. `matrix_corporal_enabled: false`).
As suggested in #63 (Github issue), splitting the
playbook's logic into multiple roles will be beneficial for
maintainability.
This patch realizes this split. Still, some components
affect others, so the roles are not really independent of one
another. For example:
- disabling mxisd (`matrix_mxisd_enabled: false`), causes Synapse
and riot-web to reconfigure themselves with other (public)
Identity servers.
- enabling matrix-corporal (`matrix_corporal_enabled: true`) affects
how reverse-proxying (by `matrix-nginx-proxy`) is done, in order to
put matrix-corporal's gateway server in front of Synapse
We may be able to move away from such dependencies in the future,
at the expense of a more complicated manual configuration, but
it's probably not worth sacrificing the convenience we have now.
As part of this work, the way we do "start components" has been
redone now to use a loop, as suggested in #65 (Github issue).
This should make restarting faster and more reliable.
This is described in Github issue #58.
Until now, we had the variable, but if you redefined it, you'd run
into multiple problems:
- we actually always mounted some "storage" directory to the Synapse
container. So if your media store is not there, you're out of luck
- homeserver.yaml always hardcoded the path to the media store,
as a directory called "media-store" inside the storage directory.
Relocating to outside the storage directory was out of the question.
Moreover, even if you had simply renamed the media store directory
(e.g. "media-store" -> "media_store"), it would have also caused trouble.
With this patch, we mount the media store's parent to the Synapse container.
This way, we don't care where the media store is (inside storage or
not). We also don't assume (anymore) that the final part of the path
is called "media-store" -- anything can be used.
The "storage" directory and variable (`matrix_synapse_storage_path`)
still remain for compatibility purposes. People who were previously
overriding `matrix_synapse_storage_path` can continue doing so
and their media store will be at the same place.
The playbook no longer explicitly creates the `matrix_synapse_storage_path` directory
though. It's not necessary. If the media store is specified to be within it, it will
get created when the media store directory is created by the playbook.
Pretty much all variables live in their own `matrix_<whatever>`
prefix now and are grouped closer together in the default
variables file (`roles/matrix-server/defaults/main.yml`).
`--log-driver=none` is used for all Docker containers now.
All these containers are started through systemd anyway and get logged in journald,
so there's no need for Docker to be logging the same thing using the default `json-file` driver.
Doing that was growing `/var/lib/docker/containers/..` infinitely until service/container restart.
As a result of this, things like `docker logs matrix-synapse` won't work anymore.
`journalctl -u matrix-synapse` is how one can see the logs.
Moving away from using the default bridge network to using our own.
This isolates our services from other Docker containers running
on the default network on the same host.
The benefits are that:
- isolation is a little better - we no longer share a default
bridge network with any other containers that might be running on the host
- there are no longer hard dependencies - we do service discovery
by DNS name, and not via explicit `--link` usage during container start,
so containers can start out of order and fail without bringing down others
with them
(`matrix-nginx-proxy` can continue running, even if one of the other services dies)
In the future, when other services get introduced,
the increased resilience and simplicity will help as well.
Switching from from avhost/docker-matrix (silviof/docker-matrix)
to matrixdotorg/synapse.
The avhost/docker-matrix (silviof/docker-matrix) image used to bundle
in the coturn STUN/TURN server, so as part of the move,
we're separating this to a separately-ran service
(matrix-coturn.service, powered by instrumentisto/coturn-docker-image)
As described here (
https://github.com/matrix-org/synapse/issues/2438#issuecomment-327424711
), using own SSL certificates for the federation port is more fragile,
as renewing them could cause federation outages.
The recommended setup is to use the self-signed certificates generated
by Synapse.
On the 443 port (matrix-nginx-proxy) side, we still use the Let's Encrypt
certificates, which ensures API consumers work without having to trust
"our own CA".
Having done this, we also don't need to ever restart Synapse anymore,
as no new SSL certificates need to be applied there.
It's just matrix-nginx-proxy that needs to be restarted, and it doesn't
even need a full restart as an "nginx reload" does the job of swithing
to the new SSL certificates.
Moving keeps everything in the /matrix directory, so that we
wouldn't contaminate anything else on the system or risk
clashing with something else.
Also retrieving certificates separately for the Riot and Matrix domains,
which should help in multiple ways:
- allows them to be very different (completely separate base domain..)
- allows for Riot to be disabled for the playbook some time later
and still have the code not break
The goal is to allow these to be on separate partitions
(including remote ones in the future).
Because the `silviof/docker-matrix` image chowns
everything to MATRIX_UID:MATRIX_GID on startup,
we definitely don't want to include `media_store` in it.
If it's on a remote FS, it would cause a slow startup.
Also, adding some safety checks to the "import media store"
task, after passing a wrong path to it on multiple occassions and
wondering what's wrong.
Also, making logging configurable. The default of keeping 10x100MB
log files is likely excessive and people may want to change that.