Matches the earlier Python -> Go rewrites of the other mautrix-* bridges.
Related to:
- https://github.com/mautrix/telegram/releases/tag/v0.2604.0
- https://mau.fi/blog/2026-04-mautrix-release/
The bridge is now a Go binary with upstream-handled automatic database and
config migration on first start, so in-place upgrades on Postgres should
Just Work for users on the defaults. The lottieconverter sidecar container
is gone (bundled upstream), and the public web-based login endpoint is
gone (login happens inside Matrix now).
Upstream v0.2604.0 has a known bug in the legacy SQLite migration that
can corrupt data. The role detects legacy Python-bridge SQLite databases
(via the `telethon_sessions` table signature) and refuses to upgrade,
pointing users to switch to Postgres (playbook-managed pgloader migration)
or wait for the next upstream release. The guard is isolated in its own
`validate_config_sqlite_legacy_migration_bug.yml` so it can be deleted
cleanly once upstream fixes the bug.
Removed variables (all caught by the deprecation check in
`validate_config.yml` with actionable rename/removal hints): the entire
`_hostname` / `_path_prefix` / `_scheme` / `_public_endpoint` /
`_appservice_public_*` / `_container_labels_public_endpoint_*` /
`_container_http_host_bind_port` family (web login endpoint is gone);
`_bot_token` (old-style relaybot is gone, use the common bridgev2 relay
mode); `_filter_mode` (dropped upstream); `_bridge_login_shared_secret_map*`
(use Appservice Double Puppet); `_username_template`, `_alias_template`,
`_displayname_template` (templates moved under `network:`, new Go-template
syntax, exposed via `_network_displayname_template`); all
`_lottieconverter_*` variables; `_appservice_database` (renamed to
`_appservice_database_uri`).
Added playbook-time validation that catches legacy permission values
(`relaybot`, `puppeting`, `full`) in the fully-merged config (so overrides
via `matrix_mautrix_telegram_configuration_extension_yaml` are caught too),
with a mapping hint in the error message.
Other notes:
- The legacy sqlite->postgres relocation of `{base_path}/mautrix-telegram.db`
to `{data_path}/mautrix-telegram.db` now happens BEFORE the pgloader
migration step, so users who flip to Postgres as part of this upgrade
get their data imported correctly.
- The Ketesa managed-user regex for the telegram namespace is updated to
match both regular IDs and the new `channel-<id>` form used by bridgev2.
- `matrix_playbook_migration_expected_version` bumped to v2026.04.24.0,
with a new breaking-change entry pointing at the CHANGELOG section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These three roles have multiple variable prefixes each:
- kakaotalk: matrix_appservice_kakaotalk + matrix_appservice_kakaotalk_node
- telegram: matrix_mautrix_telegram + matrix_mautrix_telegram_lottieconverter
- synapse: matrix_synapse + matrix_synapse_customized + matrix_synapse_rust_synapse_compress_state
For each: renamed _docker_image* to _container_image* (and _docker_src*,
_docker_repo* where applicable), added deprecation entries in
validate_config.yml, updated group_vars references, and moved
deprecation tasks to the front of validate_config.yml.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This is an attempt at optimizing service startup.
The effect is most pronounced when many services are restarted one by one.
The systemd service manager role sometimes does this - for example when `just install-service synapse` runs.
In such cases, a 5-second delay for each Synapse worker service
(or other bridge/bot service that waits on the homeserver) quickly adds up to a lot.
When services are all stopped fully and then started, the effect is not so pronounced, because
`matrix-synapse.service` starts first and pulls all worker services (defined as `Wants=` for it).
Later on, when the systemd service manager role "starts" these worker services, they're started already.
Even if they had a 5-second wait each, it would have happened in parallel.
People often report and ask about these "failures".
More-so previously, when the `docker kill/rm` output was collected,
but it still happens now when people do `systemctl status
matrix-something` and notice that it says "FAILURE".
Suppressing to avoid further time being wasted on saying "this is
expected".
Reverts b1b4ba501fdfaa, 90c9801c560b6, a3c84f78ca9c65a, ..
I haven't really traced it (yet), but on some servers, I'm observing
`ansible-playbook ... --tags=start` completing very slowly, waiting
to stop services. I can't reproduce this on all Matrix servers I manage.
I suspect that either the systemd version is to blame or that some
specific service is not responding well to some `docker kill/rm` command.
`ExecStop` seems to work great in all cases and it's what we've been
using for a very long time, so I'm reverting to that.
If they do, our next playbook runs would simply revert it
and report "changed" for that task.
There's no benefit to letting the bridge spew a new config file.
This does not apply to the mautrix whatsapp bridge, because that one
is written in Go (not Python) and takes different flags. There's no
equivalent flag there.
These are just defensive cleanup tasks that we run.
In the good case, there's nothing to kill or remove, so they trigger an
error like this:
> Error response from daemon: Cannot kill container: something: No such container: something
and:
> Error: No such container: something
People often ask us if this is a problem, so instead of always having to
answer with "no, this is to be expected", we'd rather eliminate it now
and make logs cleaner.
In the event that:
- a container is really stuck and needs cleanup using kill/rm
- and cleanup fails, and we fail to report it because of error
suppression (`2>/dev/null`)
.. we'd still get an error when launching ("container name already in use .."),
so it shouldn't be too hard to investigate.
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
Postgres setup like
matrix_mautrix_telegram_configuration_extension_yaml: |
appservice:
database: "postgres://XXX:XXX@matrix-postgres:5432/mxtg"
will fail without the right Dockernetwork
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.
Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
Bridges start matrix-synapse.service as a dependency, but
Synapse is sometimes slow to start, while bridges are quick to
hit it and die (if unavailable).
They'll auto-restart later, but .. this still breaks `--tags=start`,
which doesn't wait long enough for such a restart to happen.
This attempts to slow down bridge startup enough to ensure Synapse
is up and no failures happen at all.
The goal is to move each bridge into its own separate role.
This commit starts off the work on this with 2 bridges:
- mautrix-telegram
- mautrix-whatsapp
Each bridge's role (including these 2) is meant to:
- depend only on the matrix-base role
- integrate nicely with the matrix-synapse role (if available)
- integrate nicely with the matrix-nginx-proxy role (if available and if
required). mautrix-telegram bridge benefits from integrating with
it.
- not break if matrix-synapse or matrix-nginx-proxy are not used at all
This has been provoked by #174 (Github Issue).
As suggested in #63 (Github issue), splitting the
playbook's logic into multiple roles will be beneficial for
maintainability.
This patch realizes this split. Still, some components
affect others, so the roles are not really independent of one
another. For example:
- disabling mxisd (`matrix_mxisd_enabled: false`), causes Synapse
and riot-web to reconfigure themselves with other (public)
Identity servers.
- enabling matrix-corporal (`matrix_corporal_enabled: true`) affects
how reverse-proxying (by `matrix-nginx-proxy`) is done, in order to
put matrix-corporal's gateway server in front of Synapse
We may be able to move away from such dependencies in the future,
at the expense of a more complicated manual configuration, but
it's probably not worth sacrificing the convenience we have now.
As part of this work, the way we do "start components" has been
redone now to use a loop, as suggested in #65 (Github issue).
This should make restarting faster and more reliable.
Pretty much all variables live in their own `matrix_<whatever>`
prefix now and are grouped closer together in the default
variables file (`roles/matrix-server/defaults/main.yml`).
`--log-driver=none` is used for all Docker containers now.
All these containers are started through systemd anyway and get logged in journald,
so there's no need for Docker to be logging the same thing using the default `json-file` driver.
Doing that was growing `/var/lib/docker/containers/..` infinitely until service/container restart.
As a result of this, things like `docker logs matrix-synapse` won't work anymore.
`journalctl -u matrix-synapse` is how one can see the logs.