Idle HTTP connections in Scala on Kubernetes

22 Jan 2022

Kubernetes

Both the HTTP client (Scala) and HTTP server (Python) are running in Kubernetes. Depending on the input, the service may take even more than 10 minutes to produce a response. Message queues would be a better solution instead of long HTTP connections but that is not an option at this point.

Everything works for short requests of a couple of minutes. Once a request takes 5-6 or more minutes, strange things start to happen:

The server returns a response after 6 minutes
The client does not receive it and keeps waiting until read timeout occurs even when it is 15 minutes, 30 minutes, or 60 minutes.

So it looks like a timeout issue but where? And why the client does not detect the broken connection? Let’s investigate:

The service can be called from my laptop so it’s not a server issue
The service can be called even from the client’s container using curl so it is not a problem with connectivity between containers
There should be something that the Scala code and curl do differently

Running curl using my old friend strace reveals some socket configuration curl does:

socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPIDLE, [60], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0

What it does is set up TCP keep-alive to exchange packets at least once every 60 seconds. Why doesn’t Scala code do the same? Because that was added only to Java 11 and backported to Java 8 later. I haven’t seen much code using them and hours of Googling did not find a way how to set these values for the Scala HTTP library being used. So the client is using the operating system defaults. How bad are the default settings?

# more /proc/sys/net/ipv4/tcp_keepalive_* | cat
::::::::::::::
/proc/sys/net/ipv4/tcp_keepalive_intvl
::::::::::::::
75
::::::::::::::
/proc/sys/net/ipv4/tcp_keepalive_probes
::::::::::::::
9
::::::::::::::
/proc/sys/net/ipv4/tcp_keepalive_time
::::::::::::::
7200

tcp_keepalive_time value means that the first keep-alive packet will be sent after 2 hours. But can I change those Linux default values? I could do that for a VM or Linux running on bare metal easily. Kubernetes allows changing some kernel parameters but unfortunately the tcp_* keep-alive settings are considered unsafe. I would need some administrator superpowers for writing YAML and configuring kubelet on AWS EKS.

And then I found out about libkeepalive.so. I don’t have to dig through Java classes trying to inject Socket options after it is created but before it is used. Or to perform Kubernetes black mass and pray to AWS gods. Nobody can stop me from adding new files to the container and setting additional environment variables:

    env:
    - name: LD_PRELOAD
      value: /app/libkeepalive.so
    - name: KEEPIDLE
      value: "60"
    - name: KEEPINTVL
      value: "60"