Overview #
This guide provides a structured approach to validate and troubleshoot GSLB (Global Server Load Balancing / GTM) configurations in RELIANOID environments, particularly when services are expected to fail over automatically from on-premises to Disaster Recovery (DR) sites.
It also includes best practices for application-based public IPs and internal GSLB services.
Validation Scope #
This guide applies to:
- GSLB deployments with multiple sites (on-premises + DR)
- Services exposed via public IPs
- DNS-based failover using RELIANOID GSLB
- Automatic failover scenarios based on health checks
Key Components to Validate #
Before troubleshooting failover behavior, verify the following:
GSLB Configuration #
- GSLB service is properly configured with:
  - Multiple backend sites (on-prem + DR)
  - Correct resolution policies (priority, latency, etc.)
- DNS zone and records are correctly defined
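To make the priority resolution policy concrete, the sketch below models one common behavior: answer with the healthy site that has the best priority. It is a simplified model, not RELIANOID's implementation. The `Site` type, its fields, and the convention that a lower number means a more preferred site are assumptions for illustration, and the IPs are documentation addresses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    name: str
    ip: str
    priority: int  # assumption for this sketch: lower number = more preferred
    healthy: bool  # latest verdict of the site's health checks


def resolve_by_priority(sites: Sequence[Site]) -> Optional[str]:
    """Return the IP of the most preferred healthy site, or None if every site is down."""
    healthy = [s for s in sites if s.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda s: s.priority).ip


# Usage: while on-prem is healthy it wins; once it is DOWN, DR is returned.
onprem = Site("on-prem", "203.0.113.10", priority=1, healthy=True)
dr = Site("dr", "198.51.100.20", priority=2, healthy=True)
print(resolve_by_priority([onprem, dr]))  # 203.0.113.10
print(resolve_by_priority([Site("on-prem", "203.0.113.10", 1, False), dr]))  # 198.51.100.20
```

What this sketch returns when every site is down (here `None`) is a policy decision; check how your GSLB service is configured to answer in that case.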
Health Checks #
- Health checks are:
  - Enabled for all backend services
  - Properly targeting application endpoints (not just IP/port)
  - Configured with expected response codes or content validation
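The sketch below shows how an application-level check decides, independent of how RELIANOID itself configures its checks: the status code must be one of the expected values, and the body must match an expected pattern. The `/health` path and the `"status": "ok"` body are hypothetical examples; substitute whatever your application actually exposes.

```python
import re
import urllib.error
import urllib.request
from typing import Iterable, Optional


def is_healthy(status: int, body: str,
               expected_statuses: Iterable[int] = (200,),
               body_pattern: Optional[str] = None) -> bool:
    """Status code must be expected and, if a pattern is given, the body must match it."""
    if status not in set(expected_statuses):
        return False
    if body_pattern is not None and re.search(body_pattern, body) is None:
        return False
    return True


def check_url(url: str, expected_statuses: Iterable[int] = (200,),
              body_pattern: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Fetch `url` and apply is_healthy(); network errors count as unhealthy."""
    try:
        # Note: urlopen follows redirects, so the final response is evaluated.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read(65536).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the error still carries the status code.
        status, body = err.code, ""
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, ...
    return is_healthy(status, body, expected_statuses, body_pattern)


# Hypothetical usage:
# check_url("http://203.0.113.10/health", body_pattern=r'"status":\s*"ok"')
```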
DNS Configuration #
- TTL values are appropriately configured (low TTL recommended for failover)
- Authoritative DNS is pointing to RELIANOID GSLB
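Why low TTLs matter can be shown with a rough worst-case estimate: health-check detection time plus one full TTL, during which resolvers may keep serving a cached answer. The parameter names (check interval, number of consecutive failures, check timeout) are generic assumptions; map them to the equivalent settings in your health check configuration.

```python
def worst_case_failover_seconds(check_interval: float, fall_threshold: int,
                                check_timeout: float, dns_ttl: float) -> float:
    """Rough upper bound on how long some clients may still be sent to the failed site."""
    # Worst case, the site fails just after a successful check, so it takes
    # `fall_threshold` more checks to mark it DOWN, and the last of them can
    # run for the full check timeout before failing.
    detection = check_interval * fall_threshold + check_timeout
    # A resolver that cached the old answer just before the switch can keep
    # serving it for up to one full TTL.
    return detection + dns_ttl


# Example values (assumptions): 10 s interval, 3 failures, 5 s timeout.
print(worst_case_failover_seconds(10, 3, 5, 60))    # 95
print(worst_case_failover_seconds(10, 3, 5, 3600))  # 3635
```

With a one-hour TTL the cache term dominates everything else. Treat the result as an estimate rather than a hard bound, since some resolvers and clients clamp or ignore TTLs.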
Validating Failover from On-Prem to DR #
Step 1: Confirm Normal Operation (Primary Active) #
- Query DNS resolution:
  dig <your-domain>
- Verify that:
  - The resolved IP corresponds to the on-premises site
  - Application is accessible and healthy
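The "which site am I on?" check can be scripted. The sketch below resolves a name through the operating system's resolver (the same path most clients use, including any local caching) and classifies the answer against the known IPs of each site. The IP sets are placeholders you would fill in for your own deployment.

```python
import socket
from typing import Set


def resolve_ipv4(name: str) -> Set[str]:
    """Return the IPv4 addresses the system resolver gives for `name`."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def classify(ips: Set[str], onprem_ips: Set[str], dr_ips: Set[str]) -> str:
    """Label an answer as on-prem, dr, mixed (both sites), unknown, or no-answer."""
    if not ips:
        return "no-answer"
    if ips <= onprem_ips:
        return "on-prem"
    if ips <= dr_ips:
        return "dr"
    if ips <= onprem_ips | dr_ips:
        return "mixed"  # unexpected for a priority (active/passive) policy
    return "unknown"


# Hypothetical usage (placeholder name and addresses):
# classify(resolve_ipv4("app.example.com"), {"203.0.113.10"}, {"198.51.100.20"})
```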
Step 2: Simulate Failure #
Trigger a failure condition on the primary site:
- Stop backend services
- Block health check endpoint
- Disable farm or backend
Step 3: Validate Health Check Detection #
- Confirm RELIANOID marks the primary site as DOWN
- Check logs and monitoring to ensure:
  - Health checks are failing as expected
  - No false positives/negatives
Step 4: Validate DNS Failover #
- Re-run the DNS query:
  dig <your-domain>
- Expected result:
  - The domain should now resolve to the DR site's IP
Note: Resolvers and clients that cached the previous answer may keep returning the on-premises IP until that record's TTL expires.
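Re-running `dig` by hand can be replaced with a poll that measures how long clients actually take to see the DR answer. In this minimal sketch, `resolve` is any function returning the current set of IPs (for example a wrapper around `socket.getaddrinfo`), and the clock and sleep functions are injectable so the loop can be tested without waiting. The timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable, Optional, Set


def wait_for_failover(resolve: Callable[[], Set[str]], dr_ips: Set[str],
                      timeout: float = 300.0, interval: float = 5.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll `resolve` until every returned IP belongs to the DR site.

    Returns the elapsed seconds, or None if `timeout` expires first."""
    start = clock()
    while True:
        try:
            ips = resolve()
        except OSError:  # socket.gaierror is an OSError: treat as transient
            ips = set()
        if ips and ips <= dr_ips:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Start the poll right after triggering the failure in Step 2. The elapsed time then covers both health check detection and DNS caching, and you can compare it against the failover time you expected.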
Step 5: Validate Application Availability #
- Access the application using the resolved DR IP
- Confirm:
  - Application is fully functional
  - No dependency issues (DB, APIs, etc.)
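To test the DR site directly, even while some resolvers still return the on-premises IP, you can send requests to the DR IP while presenting the public hostname in the `Host` header. This sketch covers plain HTTP only. For HTTPS the hostname must also be sent via SNI (for example with `curl --resolve <your-domain>:443:<dr-ip>`). The paths you pass are hypothetical and should include pages that exercise the database and other dependencies.

```python
import http.client
from typing import Dict, Iterable


def smoke_test(ip: str, host_header: str, paths: Iterable[str],
               port: int = 80, timeout: float = 5.0) -> Dict[str, int]:
    """Request each path from one site IP, presenting the public hostname.

    Returns {path: HTTP status}; 0 means no response was received."""
    results: Dict[str, int] = {}
    for path in paths:
        conn = http.client.HTTPConnection(ip, port, timeout=timeout)
        try:
            # An explicit Host header replaces the one http.client would derive from `ip`.
            conn.request("GET", path, headers={"Host": host_header})
            resp = conn.getresponse()
            resp.read()
            results[path] = resp.status
        except (OSError, http.client.HTTPException):
            results[path] = 0
        finally:
            conn.close()
    return results


# Hypothetical usage:
# smoke_test("198.51.100.20", "app.example.com", ["/", "/login", "/api/orders"])
```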
Common Issues and Troubleshooting #
Failover Not Triggered #
- Health checks too permissive (e.g., TCP instead of HTTP validation)
- Incorrect health check endpoint
- Backend still partially responding
Fix: Use application-level checks (HTTP status, response body)
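The difference between a TCP check and an HTTP check shows up whenever a server still accepts connections but can no longer serve the application, for example when it is returning 503 during maintenance or because a backend dependency is down. A minimal sketch of both checks:

```python
import http.client
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Layer-4 check: passes as soon as anything accepts the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host: str, port: int, path: str = "/", timeout: float = 3.0) -> bool:
    """Application-level check: passes only on a 2xx response."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 300
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

Against a server that answers every request with 503, `tcp_check` still reports healthy and `http_check` does not, so only the HTTP check would trigger failover.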
DNS Still Resolves to Primary #
- TTL too high
- Client-side DNS caching
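- Recursive resolvers still serving cached answers
Fix:
- Lower the TTL (e.g., 30–60 seconds) ahead of time; a new TTL only applies once answers cached under the old TTL have expired
- Flush the local DNS cache
- Test with external resolvers (dig @8.8.8.8 <your-domain>)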
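To see which answer a specific resolver is handing out, and how much of its cached TTL remains, you can query it directly, which is what `dig @8.8.8.8 <your-domain>` does. The following is a minimal standard-library sketch of one A-record query over UDP in RFC 1035 wire format. It does not handle truncation, EDNS, or TCP fallback. A recursive resolver reports the remaining cache lifetime in the TTL field, so a non-zero TTL on an on-premises IP tells you roughly how long that resolver will keep serving it.

```python
import random
import socket
import struct
from typing import List, Tuple


def build_query(name: str, qid: int) -> bytes:
    """DNS query for the A record of `name` with recursion desired (RFC 1035 format)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # flags: RD=1; one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer: two bytes, name ends here
            return off + 2
        off += 1 + length


def parse_a_records(msg: bytes, qid: int) -> List[Tuple[str, int]]:
    """Return [(ipv4, remaining_ttl_seconds), ...] from a DNS response."""
    rid, flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    if rid != qid:
        raise ValueError("response ID does not match query ID")
    if flags & 0x000F:
        raise ValueError(f"resolver returned RCODE {flags & 0x000F}")
    off = 12
    for _ in range(qdcount):
        off = _skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    records = []
    for _ in range(ancount):
        off = _skip_name(msg, off)
        rtype, _rclass, ttl, rdlength = struct.unpack("!HHIH", msg[off:off + 10])
        off += 10
        if rtype == 1 and rdlength == 4:  # A record; CNAMEs and others are skipped
            records.append((socket.inet_ntoa(msg[off:off + 4]), ttl))
        off += rdlength
    return records


def query_a(name: str, resolver: str, timeout: float = 3.0) -> List[Tuple[str, int]]:
    """Send one UDP query straight to `resolver`, like `dig @resolver name`."""
    qid = random.getrandbits(16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (resolver, 53))
        data, _ = sock.recvfrom(4096)
    return parse_a_records(data, qid)
```

For example, running `query_a("app.example.com", r)` for several public resolvers `r` (the name is a placeholder) shows which of them still hold the old answer.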
DR Site Not Serving Traffic #
- DR backend not properly configured
- Missing dependencies (database, storage, authentication)
- Firewall or routing issues
Fix: Validate the readiness of the full DR stack, not just the load balancer
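A quick first pass on DR readiness is to confirm, from a host in the DR network, that every dependency is at least reachable. This catches firewall and routing gaps. The sketch below only tests TCP connectivity, which does not prove that a database or authentication service is actually working. The dependency names, hosts, and ports are hypothetical.

```python
import socket
from typing import Dict, Tuple


def check_dependencies(deps: Dict[str, Tuple[str, int]],
                       timeout: float = 3.0) -> Dict[str, str]:
    """Try a TCP connection to each dependency; return {name: reason} for failures."""
    failures: Dict[str, str] = {}
    for name, (host, port) in deps.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection established: reachable at the network level
        except OSError as exc:  # refused, timed out, no route, DNS failure, ...
            failures[name] = f"{host}:{port} unreachable ({exc.__class__.__name__})"
    return failures


# Hypothetical DR dependencies; replace with your own.
# check_dependencies({
#     "database": ("db.dr.example.internal", 5432),
#     "auth": ("ldap.dr.example.internal", 636),
#     "storage": ("nfs.dr.example.internal", 2049),
# })
```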
Intermittent Failover (Flapping) #
- Unstable health checks
- Network latency or packet loss
- Inconsistent backend responses
Fix:
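- Tune health check intervals and thresholds
- Increase failure tolerance: require several consecutive failed checks before marking a site DOWN, and several consecutive successful checks before marking it UP again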
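The effect of consecutive-failure and consecutive-success thresholds can be seen in a small state machine: isolated failed checks do not change the site's state, so a single dropped probe no longer causes a failover and fail-back. The names `fall` and `rise` and their default values are assumptions for this sketch, not RELIANOID parameter names.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Rise/fall hysteresis: a site changes state only after `fall` consecutive
    failures (UP -> DOWN) or `rise` consecutive successes (DOWN -> UP)."""
    fall: int = 3
    rise: int = 2
    up: bool = True
    _streak: int = 0  # consecutive results that contradict the current state

    def record(self, success: bool) -> bool:
        """Feed one check result and return whether the site is considered UP."""
        if success == self.up:
            self._streak = 0  # result agrees with the current state
        else:
            self._streak += 1
            needed = self.fall if self.up else self.rise
            if self._streak >= needed:
                self.up = not self.up
                self._streak = 0
        return self.up


state = HealthState(fall=3, rise=2)
print([state.record(r) for r in [False, True, False, False, False, True, True]])
# [True, True, True, True, False, False, True]
```

The isolated failure at the start is absorbed, the site goes DOWN only on the third consecutive failure, and it returns UP after two consecutive successes.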
Application-Based Public IP Considerations #
When using public IPs per site:
- Ensure each site advertises its own public IP
- GSLB should return the correct IP per site
- Validate:
  - NAT and firewall rules
  - SSL certificates per endpoint
  - Consistent application behavior across sites
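Because clients reach each site under the same public hostname, each site's certificate must be valid for that name. The sketch below connects to one site's public IP, sends the hostname via SNI, and lets Python's default TLS context verify the chain and hostname. A mismatch raises `ssl.SSLCertVerificationError`. It also reports how many days remain before expiry. Port 443 and the trust store being the system default are assumptions.

```python
import socket
import ssl
import time
from typing import Optional


def days_remaining(not_after: str, now: Optional[float] = None) -> int:
    """Whole days until a certificate's notAfter timestamp (format used by getpeercert)."""
    expiry = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expiry - current) // 86400)


def check_site_certificate(ip: str, hostname: str, port: int = 443,
                           timeout: float = 5.0) -> int:
    """Verify the certificate a site IP presents for `hostname`; return days to expiry.

    Raises ssl.SSLCertVerificationError on an untrusted chain or name mismatch."""
    ctx = ssl.create_default_context()  # verifies chain and hostname by default
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return days_remaining(cert["notAfter"])


# Hypothetical usage: the same public name checked against each site's IP.
# for site_ip in ("203.0.113.10", "198.51.100.20"):
#     print(site_ip, check_site_certificate(site_ip, "app.example.com"))
```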
Guidelines for Internal GSLB Services #
For internal-only services (private DNS / internal apps):
DNS Configuration #
- Use internal DNS servers integrated with RELIANOID GSLB
- Ensure clients resolve through the correct internal resolvers
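On Linux clients, a quick way to spot hosts that bypass the internal resolvers is to compare the `nameserver` entries in `/etc/resolv.conf` with an approved list. This is a minimal sketch assuming Linux-style configuration. On systems using systemd-resolved the file often lists only the local stub `127.0.0.53`, and the real upstream servers are shown by `resolvectl status`. The approved addresses are hypothetical.

```python
from typing import List, Set


def nameservers(resolv_conf_text: str) -> List[str]:
    """Extract `nameserver` entries from resolv.conf-style text (comments are ignored)."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def unexpected_resolvers(resolv_conf_text: str, approved: Set[str]) -> List[str]:
    """Return configured resolvers that are not in the approved internal set."""
    return [ns for ns in nameservers(resolv_conf_text) if ns not in approved]


# Hypothetical usage on a client:
# with open("/etc/resolv.conf") as f:
#     print(unexpected_resolvers(f.read(), {"10.0.0.53", "10.1.0.53"}))
```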
Network Considerations #
- Verify routing between sites (VPN/MPLS)
- Ensure DR site is reachable from all client networks
Health Checks #
- Use internal endpoints (private IPs)
- Validate application-layer responses
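When reviewing health check targets for internal services, a short check can flag any target that is not a private address, for example a public IP left over from an external service. The target names and addresses below are hypothetical. The check relies on Python's `ipaddress` notion of "private", which also counts some reserved ranges as private.

```python
import ipaddress
from typing import Dict, List


def non_private_targets(targets: Dict[str, str]) -> List[str]:
    """Return the names of health check targets whose address is not private."""
    return [name for name, addr in targets.items()
            if not ipaddress.ip_address(addr).is_private]


# Hypothetical targets:
print(non_private_targets({
    "onprem-app": "10.10.1.20",
    "dr-app": "172.16.5.20",
    "legacy-check": "8.8.8.8",
}))  # ['legacy-check']
```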
Split-Brain Avoidance #
- Ensure proper synchronization between GSLB nodes
- Avoid scenarios where both sites are incorrectly considered active at the same time
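One symptom of unsynchronized GSLB nodes is that they answer the same name differently. After querying each node directly (for example `dig @<node-address> <your-domain>` per node), you can group the nodes by answer. More than one group outside a failover window points to a synchronization or split-brain problem. Brief disagreement while a failover is in progress can be expected. The node names below are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def group_by_answer(answers: Dict[str, Set[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group GSLB nodes by the A-record set each one returned for the same name."""
    groups: Dict[FrozenSet[str], List[str]] = {}
    for node, ips in sorted(answers.items()):
        groups.setdefault(frozenset(ips), []).append(node)
    return groups


def nodes_agree(answers: Dict[str, Set[str]]) -> bool:
    """True when every node returned the same answer set."""
    return len(group_by_answer(answers)) <= 1


# Hypothetical results: one node still points at on-prem, the other at DR.
observed = {"gslb-a": {"203.0.113.10"}, "gslb-b": {"198.51.100.20"}}
print(nodes_agree(observed))  # False
```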
Best Practices #
- Use low TTL values for faster failover
- Always use application-level health checks
- Regularly perform failover drills
- Monitor DNS resolution from multiple locations and resolvers
- Ensure configuration parity between primary and DR
Validation Checklist #
[ ] GSLB service configured with all sites
[ ] Health checks validated and reliable
[ ] TTL configured appropriately
[ ] DR environment fully operational
[ ] DNS failover tested and confirmed
[ ] Application tested post-failover
[ ] Internal services validated (if applicable)
Summary #
Proper validation of RELIANOID GSLB helps ensure reliable automatic failover from on-premises to DR environments, minimizing downtime and maintaining service continuity.
A successful deployment requires coordination between:
- DNS configuration
- Health checks
- Application readiness
- Network design