Understanding and analyzing proxy server timeouts

Table of Contents

Intro
Backends side timeouts
Client side timeouts
Fixing backends side timeouts
Fixing client side timeouts

Intro #

A proxy service is software designed to transparently manage connections for clients to one or more services, delivering advanced data or connection handling at the application level (layer 7 in the OSI model). To achieve this, the proxy service establishes one connection with the client and another with the server, aiming to ensure seamless connectivity between them.

When implementing Load Balancing through a proxy service (effectively functioning as a reverse proxy), it is essential to customize timeouts to facilitate smooth connections. Default timeout values may not suffice, depending on the characteristics of clients or application services. Any timeout-related errors will be logged in the system logs at /var/log/syslog, making it crucial to review this file for potential issues.

This article explains how to analyze and identify common timeout problems with proxy servers. The key question is whether the timeouts occur on the backend side or on the client side. Once that is known, the appropriate timeout adjustments can be applied.

Backends side timeouts #

If timeouts are occurring on the backend side, corresponding messages are displayed as follows:

Aug 21 09:23:06 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.10:443, (7ff830b85700) error copy server cont: Connection timed out
Aug 21 09:23:06 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.11:443, (7ff832d8b700) error copy server cont: Connection timed out
Aug 21 09:23:16 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.10:443, (7ff799926700) error copy server cont: Connection timed out
Aug 21 09:23:18 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.10:443, (7ff830bc6700) error copy server cont: Broken pipe
Aug 21 09:23:19 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.10:444, (7f15f5a8c700) connect_nb: poll timed out
Aug 21 09:23:24 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.11:443, (7ff79a9a7700) error copy server cont: Connection timed out
Aug 21 09:23:24 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.10:443, (7ff79a28b700) error copy server cont: Connection timed out

These backend timeout errors specify the relevant farm, service, and backend associated with the error. This information clearly identifies the backend or backends linked to the problem. If multiple farms, services, or backends are implicated in the timeout issue, additional information may need to be gathered to investigate any potential networking issues.
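As a minimal sketch of how these messages could be summarized, the function below counts timeout lines per backend. The function name count_backend_timeouts is hypothetical, and the field layout is assumed to match the excerpt above.

```shell
# Hypothetical helper: count "timed out" messages per backend in a pound log.
# Assumes the line layout shown in the excerpt above
# ("..., backend IP:PORT, (thread) message").
count_backend_timeouts() {
    awk '/ pound: / && / backend / && /timed out/ {
            for (i = 1; i <= NF; i++)
                if ($i == "backend") {
                    b = $(i + 1)          # field after "backend" is IP:PORT,
                    sub(/,$/, "", b)      # drop the trailing comma
                    count[b]++
                    break
                }
         }
         END { for (b in count) print count[b], b }' "${1:-/var/log/syslog}" |
        sort -rn                          # most affected backend first
}

# Usage (reads /var/log/syslog when no file is given):
#   count_backend_timeouts /var/log/syslog
```

If one backend dominates the list, the problem is likely local to that server. If the counts are spread evenly, the network or the service as a whole is the more likely place to look.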

Enabling farm logs can reveal instances where a specific backend initially responds swiftly, but suddenly encounters a timeout issue, as illustrated in the log excerpt below:

Jan 25 19:57:04 noid-ee-01 pound: noid-proxy-farm-01, my.service.com 185.106.182.130 - - [25/Jan/2024:19:57:04 +0000] "GET /myserv/ HTTP/1.1" 200 9 "" "Mozilla/3.0 (compatible; ...)" (noid-service-01 -> 10.100.200.10:443) 0.039 sec
Jan 25 19:57:04 noid-ee-01 pound: noid-proxy-farm-01, my.service.com 88.111.111.111 - - [25/Jan/2024:19:57:04 +0000] "GET /myserv/ HTTP/1.1" 200 9 "" "Mozilla/3.0 (compatible; ...)" (noid-service-01 -> 10.100.200.10:443) 0.035 sec

Jan 25 19:57:04 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.10:443, (7fcd8eb0f700) connect_nb: poll timed out
Jan 25 19:57:04 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.10:443, (7fcd8eb0f700) backend 10.100.200.10:443 connect: Connection timed out
Jan 25 19:57:04 noid-ee-01 pound: noid-proxy-farm-01, (7fcd8eb0f700) BackEnd 10.100.200.10:443 dead (killed) in farm: 'noid-proxy-farm-01', service: 'noid-service-01'
Jan 25 19:57:04 noid-ee-01 pound: noid-proxy-farm-01, service noid-service-01, backend 10.100.200.10:443, (7fcd8eb0f700) BackEnd dead (killed)

This behavior may indicate that the backend has reached its connection limit, preventing additional connections. Alternatively, it could suggest that the backend is not releasing connections quickly enough, leading to a bottleneck. To address this issue, it is recommended to monitor the backend and implement optimizations such as allowing more connections or scaling the service by adding additional backends.
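To see how response times evolve before a backend is marked as dead, the farm log lines can be summarized per backend. The following is a sketch only: the function name backend_response_stats is hypothetical, and it assumes that request lines end with "(service -> IP:PORT) N.NNN sec" as in the excerpt above.

```shell
# Hypothetical helper: per-backend request count, average and maximum
# response time, taken from farm log lines ending in "... -> IP:PORT) T sec".
backend_response_stats() {
    awk '$NF == "sec" && /->/ {
            t = $(NF - 1) + 0             # response time in seconds
            b = $(NF - 2)                 # "IP:PORT)"
            sub(/\)$/, "", b)             # drop the closing parenthesis
            n[b]++
            sum[b] += t
            if (t > max[b]) max[b] = t
         }
         END {
            for (b in n)
                printf "%s requests=%d avg=%.3f max=%.3f\n", b, n[b], sum[b] / n[b], max[b]
         }' "${1:-/var/log/syslog}"
}

# Usage:
#   backend_response_stats /var/log/syslog
```

A low average combined with a high maximum is consistent with the pattern described above, where a backend answers quickly most of the time and then stalls.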

If only specific backends are experiencing timeout problems within the same service, it implies that those particular backends may have issues related to slow application delivery or networking problems. Solutions or mitigations for these errors are outlined below.

Client side timeouts #

Conversely, client timeouts manifest in the syslog in the following format:

Aug 18 07:31:38 noid-ee-01 pound: noid-proxy-farm-01, (7f8862187700) error read from 12.91.1.78: Connection timed out
Aug 18 07:31:43 noid-ee-01 pound: noid-proxy-farm-01, (7f8863c71700) error read from 12.2.1.105: Connection timed out
Aug 18 07:32:03 noid-ee-01 pound: noid-proxy-farm-01, (7f886275e700) error read from 12.41.1.58: Connection timed out
Aug 18 07:32:07 noid-ee-01 pound: noid-proxy-farm-01, (7f8880d84700) error read from 12.88.1.67: Connection timed out
Aug 18 07:32:16 noid-ee-01 pound: noid-proxy-farm-01, (7f8880933700) error read from 12.3.1.158: Connection timed out

These messages include the farm name but no service or backend, which indicates that the proxy never received a complete request from the client. The client is taking so long to send the HTTP request that the proxy cannot tell which service is being requested. The set of client IP addresses can help determine whether the issue affects an internal network, external clients, or clients arriving through a specific firewall.
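To get an overview of which clients are affected, the client-side messages can be grouped by source IP address. This is a minimal sketch with a hypothetical function name (count_client_timeouts), assuming the "error read from IP: Connection timed out" layout shown above.

```shell
# Hypothetical helper: count client read timeouts per source IP address.
# Assumes lines of the form "... error read from IP: Connection timed out".
count_client_timeouts() {
    awk '/ pound: / && /error read from/ && /timed out/ {
            for (i = 1; i <= NF; i++)
                if ($i == "from") {
                    ip = $(i + 1)         # field after "from" is "IP:"
                    sub(/:$/, "", ip)     # drop the trailing colon
                    count[ip]++
                    break
                }
         }
         END { for (ip in count) print count[ip], ip }' "${1:-/var/log/syslog}" |
        sort -rn                          # most frequent client first
}

# Usage:
#   count_client_timeouts /var/log/syslog
```

Many timeouts from a few addresses point to specific clients or a shared gateway, whereas a wide spread of unrelated addresses suggests a more general cause.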

Moreover, for external clients, it is crucial to ascertain their legitimacy. IP reputation services such as AbuseIPDB can help gather this information.

Additionally, verify that the proxy is not terminating connections prematurely. The sum of the backend connection timeout and the backend response timeout should be less than the client request timeout, so that the proxy does not drop a client while it is still waiting for the backend.

See the sections below for ways to address or mitigate these errors.

Fixing backends side timeouts #

Network layer verification #

To begin with, it is essential to confirm the stability of the networking layer and ensure the absence of duplicated packets, lost packets, or significant fluctuations in latency. Follow these steps to perform network layer verification:

1. Execute a ping from the load balancer to the backend and allow it to run for several minutes.

root@noid-ee-01:~# ping 10.100.200.10

2. Observe the ping responses during the execution:

PING 10.100.200.10 (10.100.200.10) 56(84) bytes of data.
64 bytes from 10.100.200.10: icmp_seq=1 ttl=64 time=0.395 ms
64 bytes from 10.100.200.10: icmp_seq=2 ttl=64 time=0.626 ms
64 bytes from 10.100.200.10: icmp_seq=3 ttl=64 time=0.178 ms
[...]
64 bytes from 10.100.200.10: icmp_seq=21 ttl=64 time=0.502 ms
64 bytes from 10.100.200.10: icmp_seq=22 ttl=64 time=0.638 ms
64 bytes from 10.100.200.10: icmp_seq=23 ttl=64 time=0.573 ms
^C
--- 10.100.200.10 ping statistics ---
23 packets transmitted, 23 received, 0% packet loss, time 140ms
rtt min/avg/max/mdev = 0.178/0.539/0.854/0.141 ms

This response indicates that the network is stable, with no lost packets or latency issues. Ensure the absence of any anomalies during the ping test to confirm the reliability of the networking layer.
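For longer test runs it can be convenient to extract only the key figures from the ping summary. The helpers below are a sketch with hypothetical names (ping_loss_percent, ping_rtt_mdev). They assume the Linux iputils summary format shown above and read the ping output from standard input.

```shell
# Hypothetical helpers: read ping output on stdin and print one figure each.
# They assume the iputils summary format shown in the sample output above.

# Packet loss percentage, e.g. "0" for "0% packet loss".
ping_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Round-trip time deviation (mdev) in milliseconds; high values mean jitter.
ping_rtt_mdev() {
    sed -n 's|^rtt min/avg/max/mdev = [0-9.]*/[0-9.]*/[0-9.]*/\([0-9.]*\) ms.*|\1|p'
}

# Usage (300 probes, one per second by default):
#   ping -c 300 10.100.200.10 | ping_loss_percent
```

Any loss above 0, or an mdev that is large compared with the average round-trip time, is worth investigating before changing proxy timeouts.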

Additionally, by executing a tcpdump when the issue is replicated, you can analyze the network traffic to pinpoint the specific moment when communication experiences delays or when certain packets are missing. Utilize the following command:

root@noid-ee-01:~# tcpdump -i any tcp port PORT and host BACKENDIP -w /tmp/capture.pcap

This command will generate a file named /tmp/capture.pcap, which can be analyzed using Wireshark. Be cautious, as this file may grow rapidly if capturing a substantial amount of traffic.
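One way to keep the capture file bounded is to stop after a fixed number of packets with the tcpdump -c option. The sketch below only prints the command so that it can be reviewed before it is run. The function name print_capture_cmd and the default of 10000 packets are assumptions made for illustration.

```shell
# Hypothetical helper: print (do not run) a tcpdump command that stops
# after a fixed number of packets, so the capture file stays bounded.
# Arguments: backend IP, backend port, optional packet limit (default 10000).
print_capture_cmd() {
    backend_ip=$1
    port=$2
    max_packets=${3:-10000}
    printf 'tcpdump -i any -c %s -w /tmp/capture.pcap tcp port %s and host %s\n' \
        "$max_packets" "$port" "$backend_ip"
}

# Usage:
#   print_capture_cmd 10.100.200.10 443 5000
```

The printed command can then be copied into a root shell on the load balancer while the issue is being reproduced.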

Proxy timeouts tuning #

Adjusting various timeouts in the Farm Advanced Configuration allows us to tailor the proxy behavior to the needs of our application servers, especially when they require additional time for each request or when network performance is sluggish. Consider the following recommendations:

Backend Connection Timeout: The maximum time allowed for a connect() operation against the selected backend. If messages like “connect_nb: poll timed out” appear, consider increasing this value from the default of 20 seconds to 30 or 40 seconds. Raise it gradually based on observed results, keeping in mind that a longer timeout may only be masking an underlying issue elsewhere.

Backend Response Timeout: Adjust this value if the message “error copy server cont: Connection timed out” appears. The default is 45 seconds, so consider raising it to 60 seconds or more, in small increments, until these messages decrease. Avoid raising it excessively, as that may mask underlying problems on the backend side.

Frequency to Check Resurrected Backends: When higher timeouts are needed because networking or application server issues cause intermittent HTTP 503 errors (no backend available for the service), consider decreasing this value from 10 seconds to 5. Backends that were marked as down because of a timeout are then rechecked sooner, which reduces the impact of such false positives.

Before changing any of these values, analyze whether the observed timeouts are normal for your application. The proxy load balancer can mitigate certain timeout-related issues.
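The relationship between log messages and settings described in this article can be summarized in a small lookup. This is a sketch with a hypothetical function name (suggest_timeout_setting). It only encodes the mappings stated in this article and is not an exhaustive list of pound messages.

```shell
# Hypothetical helper: map a log line to the timeout setting this article
# suggests reviewing. Only the messages discussed in the article are covered.
suggest_timeout_setting() {
    case "$1" in
        *"connect_nb: poll timed out"*)
            echo "Backend Connection Timeout" ;;
        *"error copy server cont: Connection timed out"*)
            echo "Backend Response Timeout" ;;
        *"error read from"*"Connection timed out"*)
            echo "Client Request Timeout" ;;
        *)
            echo "no timeout setting matched" ;;
    esac
}

# Usage:
#   suggest_timeout_setting "$(tail -n 1 /var/log/syslog)"
```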

For more detailed information, refer to the following article:

https://www.relianoid.com/resources/knowledge-base/lslb/enterprise-edition-v6-2-administration-guide-lslb-farms-update-http-profile/

Configure health checks #

Farm Guardian enables or disables backends based on their availability, so it can be used to determine whether the backends are actually available. It runs independently of the proxy and in parallel with it, which makes it possible to verify whether the backends themselves are the bottleneck. Additionally, checking the backend statistics at the moment a backend is marked as down shows how many connections that server was handling, which helps identify the connection limit of each backend.

A TCP Farm Guardian check detects problems with the TCP handshake to the backends, which could indicate a bottleneck at the system or web server layer. An HTTP Farm Guardian check detects problems at the application layer, pointing to potential bottlenecks in the application or database layer.

Fixing client side timeouts #

Network layer verification #

To validate the stability of the network, execute a ping test from another server or machine to the IP address of the Virtual IP configured for the load balancing farm. This ensures an external perspective and helps confirm the network’s reliability.

Legitimate clients verification #

Use trusted tools to determine whether the timeouts come from legitimate clients, bots, or potential attackers. If they come from unwanted sources, apply IPDS blacklists and/or DoS protections to safeguard your services and deliver them only to genuine users.

Proxy timeouts tuning #

Furthermore, consider adjusting the proxy timeouts to accommodate the nature of clients connecting to your services via the load balancer. If, for instance, mobile-based clients, firewall issues, or slow networks contribute to extended response times, fine-tune the Client Request Timeout option in the Advanced Settings of your LSLB Farm. The default value is 30 seconds; you may increase it to 60 seconds or more based on your specific requirements. The client timeout value should exceed the sum of the connection timeout and backend response timeout.
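The rule relating the three timeouts is simple arithmetic and can be checked before applying new values. The function below is a minimal sketch. The name client_timeout_ok is hypothetical, and all values are in whole seconds.

```shell
# Hypothetical helper: succeed (exit status 0) when the client timeout is
# greater than the backend connection timeout plus the backend response
# timeout. Arguments, in seconds: connection, response, client.
client_timeout_ok() {
    connect=$1
    response=$2
    client=$3
    [ "$client" -gt $((connect + response)) ]
}

# Usage:
#   client_timeout_ok 20 45 70 && echo "timeouts are consistent"
```

With the default values quoted in this article (20 and 45 seconds on the backend side), the rule is only satisfied by a client timeout above 65 seconds, so a call such as client_timeout_ok 20 45 30 fails.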

For more detailed information, please refer to the following article:

https://www.relianoid.com/resources/knowledge-base/lslb/enterprise-edition-v6-2-administration-guide-lslb-farms-update-http-profile/