I have 3 ejabberd pods running on GCP. The part of the configuration file that selects which database to use looks like this:
{%- if env["DEFAULT_DB"] is defined %}
default_db: {{ env["DEFAULT_DB"] }}
{%- endif %}
The problem is that when I connect to the pods, only one of them returns the correct result when I call the get_user_rooms
endpoint. All the others return an empty array.
I tried reloading the config, restarting the pod, and deleting the pod. In every case the log shows that the configuration was loaded successfully, with no errors during startup, but the other pods still return the wrong result.
2025-01-31 14:28:07.432 GET
2025-01-31 10:28:07.431631+00:00 [info] Loading configuration from /home/ejabberd/conf/ejabberd.yml
2025-01-31 14:28:07.437 GET
2025-01-31 10:28:07.435907+00:00 [warning] Option 'commands_admin_access' is deprecated and has no effect anymore. Use option 'api_permissions' instead.
2025-01-31 14:28:07.613 GET
2025-01-31 10:28:07.612765+00:00 [info] Configuration loaded successfully
...
2025-01-31 14:28:11.378 GET
[entrypoint_script] ejabberd did join cluster successfully
I'll give you several ideas to investigate. Hopefully one of them will lead you to the problem.
On each pod, dump the configuration options that node is really using, and compare ALL the dumped files. Maybe the nodes aren't really using the same database:
$ ejabberdctl dump_config /tmp/aaa.yml
$ cat /tmp/aaa.yml
Is there any difference between the node that shows the rooms in get_user_rooms and the other two?
Register an account in the database, then check on all three nodes that each one really sees that account:
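If comparing three dumps by eye is tedious, a small script can do it. This is only a sketch: it assumes you have already copied each pod's dump_config output to your machine, and the file names below are hypothetical. It does a plain line-by-line text diff with Python's standard library, not a semantic YAML comparison, so reordered options will also show up as differences.

```python
import difflib
from pathlib import Path


def diff_dumps(reference, others):
    """Return {path: [diff lines]} for every dump that differs from the reference."""
    ref_lines = Path(reference).read_text().splitlines()
    report = {}
    for other in others:
        other_lines = Path(other).read_text().splitlines()
        delta = list(difflib.unified_diff(
            ref_lines, other_lines,
            fromfile=str(reference), tofile=str(other), lineterm="",
        ))
        if delta:  # identical files produce an empty diff and are skipped
            report[str(other)] = delta
    return report


if __name__ == "__main__":
    # Hypothetical file names: one dump_config output per pod.
    dumps = ["pod0.yml", "pod1.yml", "pod2.yml"]
    if all(Path(p).exists() for p in dumps):
        for path, delta in diff_dumps(dumps[0], dumps[1:]).items():
            print(f"{path} differs from {dumps[0]}:")
            print("\n".join(delta))
```

Using the pod that returns the correct rooms as the reference makes the output easiest to read: any line that differs in the other two is a candidate cause.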
$ ejabberdctl registered_users localhost
admin
An account is registered in the cluster, and the user can log in with those credentials on any node of the cluster. When the client logs in to that account on a node, the session exists only on that node.
Similarly, the configuration of the rooms is stored in the cluster, and a room can be created on any node and will be accessible transparently from all the other nodes.
The MUC room itself, however, is alive on one specific node, and the other nodes just point to that room on that node:
Rooms are distributed at creation time on all available MUC module instances. The multi-user chat module is clustered but the rooms themselves are not clustered nor fault-tolerant: if the node managing a set of rooms goes down, the rooms disappear and they will be recreated on an available node on first connection attempt.
So, maybe the ejabberd nodes connect correctly to the same database, but get_user_rooms doesn't show correct values, or the problem is only in the MUC service?
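To narrow this down, it can help to put the three answers side by side. The sketch below assumes you have already collected the get_user_rooms output from each node by whatever means you use today. The node names and room JIDs are made up for illustration. It reports which rooms each node fails to list, compared with the union of all the answers.

```python
def missing_rooms(results):
    """Map each node to the sorted rooms it does not report but some other node does."""
    all_rooms = set()
    for rooms in results.values():
        all_rooms.update(rooms)
    return {node: sorted(all_rooms - set(rooms)) for node, rooms in results.items()}


if __name__ == "__main__":
    # Hypothetical data: get_user_rooms output collected from each node.
    results = {
        "ejabberd-0": ["room1@conference.localhost", "room2@conference.localhost"],
        "ejabberd-1": [],
        "ejabberd-2": [],
    }
    for node, missing in missing_rooms(results).items():
        print(node, "is missing", missing or "nothing")
```

Running the same comparison on the registered_users output may help separate the two cases: if the accounts match on every node but the rooms do not, the database connection is probably fine and the MUC service is the more likely place to look.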