Clients unable to reconnect after server container restart

Hi everyone!

I’m running a small TS3 server in a docker swarm with an external mysql server.

My problem is: every time I restart the server (e.g. using docker service update --force teamspeak_teamspeak), every client that was connected is unable to auto-reconnect until the client itself is restarted. I’m not sure whether this is a server or a client issue. The client resolves the correct address and port but fails to connect, and the server doesn’t log anything at all.

Is anyone else hosting TS3 in a docker swarm with more success than I am? I’m happy about any hint.

The server configuration:

licensepath=
query_protocols=raw
query_timeout=300
query_ssh_rsa_host_key=host_key
query_ip_whitelist=/whitelist.txt
query_ip_blacklist=query_ip_blacklist.txt
dbplugin=ts3db_mariadb
dbpluginparameter=/var/run/ts3server/ts3db.ini
dbsqlpath=/opt/ts3server/sql/
dbsqlcreatepath=create_mariadb
dbconnections=10
dbclientkeepdays=30
logpath=/var/ts3server/logs
logquerycommands=0
logappend=0
serverquerydocs_path=/opt/ts3server/serverquerydocs/
query_port=10011
filetransfer_port=30033
default_voice_port=9987
query_ssh_port=10022

The stack I deploy in swarm:

version: '3.7'

services:
    teamspeak:
        image: "${IMAGE_NAME}"
        deploy:
            endpoint_mode: vip
            mode: replicated
            replicas: 1
            restart_policy:
                condition: any
                delay: 10s
            update_config:
                parallelism: 1
                delay: 10s
                monitor: 10s
                failure_action: rollback
                order: stop-first
            rollback_config:
                parallelism: 1
                delay: 10s
                monitor: 10s
                failure_action: pause
                order: stop-first
            resources:
                limits:
                    cpus: '1'
                    memory: 500M
        stop_grace_period: '5m'
        stop_signal: SIGTERM
        ports:
          - '9987:9987/udp'
          - '9988:9988/udp'
          - '9989:9989/udp'
          - '9990:9990/udp'
          - '9991:9991/udp'
          - '9992:9992/udp'
          - '9993:9993/udp'
          - '9994:9994/udp'
          - '10011:10011'
          - '30033:30033'
        environment:
            TS3SERVER_DB_PLUGIN: 'ts3db_mariadb'
            TS3SERVER_DB_SQLCREATEPATH: 'create_mariadb'
            TS3SERVER_DB_HOST: '...'
            TS3SERVER_DB_USER: '...'
            TS3SERVER_DB_PASSWORD: '...'
            TS3SERVER_DB_NAME: '...'
            TS3SERVER_LICENSE: 'accept'
            TS3SERVER_IP_WHITELIST: '/whitelist.txt'
        volumes:
          - 'ts-files:/var/ts3server/files'
          - '/etc/localtime:/etc/localtime'
          - '/etc/timezone:/etc/timezone'
        configs:
          - source: whitelist
            target: /whitelist.txt
        networks:
          - teamspeak

volumes:
    ts-files:

configs:
    whitelist:
        file: ./whitelist.txt

networks:
    teamspeak:
        name: teamspeak
        attachable: true

The swarm is running without userland proxy and the public IP is NAT’ed to the swarm host.
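For completeness, the userland proxy is disabled on the swarm nodes through the Docker daemon configuration; a minimal sketch, assuming it was done in the default /etc/docker/daemon.json rather than via a dockerd flag:

    {
        "userland-proxy": false
    }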

I also just tried services.teamspeak.deploy.update_config.order: start-first, but that didn’t help.
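In the stack file above, that dotted path maps to the following fragment:

    services:
        teamspeak:
            deploy:
                update_config:
                    order: start-first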

Yes, but I have the same problem.

I suspect it may be related to the ingress network, because I can’t reproduce it with a non-Docker installation (extracted tar) or with the ports published in host mode instead of through the ingress.

Good point. I tried replacing '9987:9987/udp' with the following, but it didn’t help either:

    - target: 9987
      published: 9987
      protocol: udp
      mode: host

I have the same problem on Windows. If I restart my Windows server because of updates or similar, the clients won’t reconnect. As you said, restarting the client is the only way to fix it, so I think it’s a client-side problem.

Has someone tried bisecting the client version to see which version introduces the issue?

I just tried it with version 3.3.0 (3.2.x is incompatible with the current server) and it still happens. What I noticed though: The problem only affects servers I was connected to while the restart happened. I can connect to another instance on the same TS server without any issues while still being unable to reconnect.

Just confirmed this in the TS5 beta32 with a friend.

Support linked me to this thread: TeamSpeak - Official TeamSpeak Community Forum

It seems to be the same issue, and the suggestions there seem plausible. I’m still wondering why Docker Swarm makes this happen every time.

@dante696 I’ve seen that you responded to the previously mentioned forum.teamspeak.com thread. On September 9th, 2016 you said that you’d open a ticket for the developers; is there any progress or update on that? Is any more input needed?

Do you happen to use additional addresses on your interface? For me it seems to work only with the primary address (plus publishing the ports in host mode instead of through the ingress).

I don’t know why.

Can you elaborate on that?

Hey @pschichtel

I’ve resolved the exact same problem in my swarm by adding

    deploy:
      endpoint_mode: vip

to my config.

My original config also had endpoint mode vip. Do you use Docker’s userland proxy?
I think it works with endpoint mode vip (which is the default, by the way) and with the userland proxy enabled; the problem with that, however, is that the server doesn’t see the real client IPs, so you won’t be able to use IP bans.

I recently switched to Kubernetes (with MetalLB and Calico underlay networking and externalTrafficPolicy set to Local), and that works without any issues.
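For illustration, the Service exposing the voice port looks roughly like this; a sketch, assuming a LoadBalancer Service handled by MetalLB and a hypothetical app: teamspeak label selector:

    apiVersion: v1
    kind: Service
    metadata:
      name: teamspeak
    spec:
      type: LoadBalancer
      # Local keeps the real client source IP, so IP bans keep working
      externalTrafficPolicy: Local
      selector:
        app: teamspeak
      ports:
        - name: voice
          port: 9987
          targetPort: 9987
          protocol: UDP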

Yes, I am using the Docker Swarm ingress. Maybe the healthcheck I added fixed the problem:

    healthcheck:
      test: nc -zu localhost 9987
      interval: 3s
      retries: 3
      timeout: 3s

I also added

    deploy:
      update_config:
        order: start-first

to make the cutover as short as possible.
All I can tell is that with this combination, switching between servers while keeping the clients connected works.
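Putting those pieces together, the relevant part of my service definition looks roughly like this (a sketch; image, ports and the rest of the stack as in the original post):

    deploy:
      endpoint_mode: vip
      update_config:
        # start the new task before stopping the old one
        order: start-first
    healthcheck:
      test: nc -zu localhost 9987
      interval: 3s
      retries: 3
      timeout: 3s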

Kubernetes feels like overkill for my home setup. Not having IP bans was never a problem; since your ISP gives you a new IP whenever you reconnect, they are pretty much worthless anyway. For TeamSpeak I always recommend a whitelist-style group setup to force guests to stay in the lobby.