Guide: Validating GSLB (GTM) Failover to DR in RELIANOID

Overview #

This guide provides a structured approach to validate and troubleshoot GSLB (Global Server Load Balancing / GTM) configurations in RELIANOID environments, particularly when services are expected to fail over automatically from on-premises to Disaster Recovery (DR) sites.

It also includes best practices for application-based public IPs and internal GSLB services.

Validation Scope #

This guide applies to:

- GSLB deployments with multiple sites (on-premises + DR)
- Services exposed via public IPs
- DNS-based failover using RELIANOID GSLB
- Automatic failover scenarios based on health checks

Key Components to Validate #

Before troubleshooting failover behavior, verify the following:

GSLB Configuration #

- GSLB service is properly configured with:
  - Multiple backend sites (on-prem + DR)
  - Correct resolution policies (priority, latency, etc.)
- DNS zone and records are correctly defined

Health Checks #

- Health checks are:
  - Enabled for all backend services
  - Properly targeting application endpoints (not just IP/port)
- Expected response codes or content validation is configured
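To illustrate what "response codes or content validation" means in practice, here is a minimal sketch of an application-level check. It is not RELIANOID's own health check mechanism; the function name `is_healthy`, the URL, and the marker string are assumptions chosen for illustration.

```shell
# Return 0 (healthy) only when the status code is 200 AND the body
# contains the expected marker string; return 1 otherwise.
is_healthy() {
    status=$1
    body=$2
    marker=$3
    [ "$status" = "200" ] || return 1
    case "$body" in
        *"$marker"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (hypothetical URL and marker; requires network access):
#   url=https://app.example.com/health
#   body=$(curl -s --max-time 5 "$url")
#   status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
#   is_healthy "$status" "$body" '"status":"ok"' && echo healthy || echo unhealthy
```

A check like this fails when the server answers but returns an error page, which a plain TCP or port check would miss.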

DNS Configuration #

- TTL values are appropriately configured (low TTL recommended for failover)
- Authoritative DNS is pointing to RELIANOID GSLB
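Both points can be spot-checked from the command line. The sketch below pulls the TTL out of `dig +noall +answer` output; the helper name `answer_ttl` and the domain `app.example.com` are placeholders, not part of RELIANOID.

```shell
# Print the TTL (second field) of the first A record found in
# `dig +noall +answer` output read from stdin.
answer_ttl() {
    awk '$4 == "A" { print $2; exit }'
}

# Example usage (hypothetical domain; requires network access):
#   dig +noall +answer A app.example.com | answer_ttl
#   dig +short NS example.com   # lists the name servers the zone is delegated to
```

Note that a recursive resolver reports the remaining cached TTL, which counts down between queries. To see the configured value, query the authoritative server directly with `dig @<server>`.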

Validating Failover from On-Prem to DR #

Step 1: Confirm Normal Operation (Primary Active) #

Query DNS resolution:
```
dig <your-domain>
```
Verify that:
- The resolved IP corresponds to the on-premises site
- Application is accessible and healthy
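When this check is repeated during a drill, it can help to label the resolved address automatically. The following is a small sketch under stated assumptions: the two addresses are taken from documentation ranges and stand in for your real site IPs, and `classify_site` is a hypothetical helper name.

```shell
# Hypothetical site addresses (documentation ranges); replace with your own.
PRIMARY_IP="203.0.113.10"
DR_IP="198.51.100.20"

# Map a resolved IP address to a site label.
classify_site() {
    case "$1" in
        "$PRIMARY_IP") echo "primary" ;;
        "$DR_IP")      echo "dr" ;;
        *)             echo "unknown" ;;
    esac
}

# Example usage (hypothetical domain; requires network access):
#   classify_site "$(dig +short A app.example.com | head -n 1)"
```

In normal operation the expected label is `primary`. An `unknown` result usually means an empty answer or an unexpected record, and is worth investigating before simulating a failure.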

Step 2: Simulate Failure #

Trigger a failure condition on the primary site using one of the following methods:

- Stop backend services
- Block the health check endpoint
- Disable the farm or backend

Step 3: Validate Health Check Detection #

- Confirm RELIANOID marks the primary site as DOWN
- Check logs and monitoring to ensure:
  - Health checks are failing as expected
  - No false positives/negatives

Step 4: Validate DNS Failover #

Re-run DNS query:
```
dig <your-domain>
```
Expected result:
- IP should now resolve to the DR site

Note: Resolvers and clients may keep returning the old IP until the record's TTL expires, so the change may not be visible immediately.
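Because the answer may change only after cached records expire, re-running the query by hand can be tedious. Here is a minimal polling sketch; `resolve_ip` and `wait_for_ip` are hypothetical helper names, the domain and address are placeholders, and the sketch assumes the name resolves directly to an A record (no CNAME chain).

```shell
# Resolve the first A record for a domain. Kept as a separate function so it
# can be swapped out (for example, to query a specific resolver).
resolve_ip() {
    dig +short A "$1" | head -n 1
}

# Poll until the domain resolves to the expected IP.
# Arguments: domain, expected IP, max attempts, seconds between attempts.
# Returns 0 on success, 1 if the answer never matched within the attempts.
wait_for_ip() {
    attempt=1
    while [ "$attempt" -le "$3" ]; do
        if [ "$(resolve_ip "$1")" = "$2" ]; then
            echo "resolved to $2 after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$4"
    done
    echo "still not resolving to $2 after $3 attempt(s)"
    return 1
}

# Example usage (hypothetical domain and DR address):
#   wait_for_ip app.example.com 198.51.100.20 12 10
```

Recording how many attempts were needed gives a rough measurement of the failover time observed from that client.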

Step 5: Validate Application Availability #

- Access the application using the resolved DR IP
- Confirm:
  - Application is fully functional
  - No dependency issues (DB, APIs, etc.)
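One way to reach the DR site directly without waiting for DNS is curl's `--resolve` option, which pins a hostname and port to a given address for a single request. The sketch below assumes an HTTPS application on port 443; the helper name `check_site`, the hostname, the address, and the `/health` path are placeholders.

```shell
# Request a path from one specific site while keeping the real hostname, so
# TLS (SNI) and virtual-host routing behave as they would for a real client.
# Arguments: hostname, site IP, path. Prints the HTTP status code.
check_site() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
        --resolve "$1:443:$2" "https://$1$3"
}

# Example usage (hypothetical hostname, DR address, and path):
#   check_site app.example.com 198.51.100.20 /health
```

Because the real hostname is preserved, a certificate mismatch at the DR site shows up as a failed request rather than being hidden by connecting to a bare IP.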

Common Issues and Troubleshooting #

Failover Not Triggered #

- Health checks too permissive (e.g., TCP instead of HTTP validation)
- Incorrect health check endpoint
- Backend still partially responding

Fix: Use application-level checks (HTTP status, response body)

DNS Still Resolves to Primary #

- TTL too high
- Client-side DNS caching
- Recursive DNS servers not refreshed

Fix:

- Lower TTL (e.g., 30–60 seconds)
- Flush local DNS cache
- Test with external resolvers (e.g., dig @8.8.8.8 <your-domain>)
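To see which resolvers are still handing out the old answer, the results can be collected as "resolver answer" pairs and filtered. This is a small sketch; `stale_resolvers` is a hypothetical helper name, and the domain and expected address are placeholders.

```shell
# Read "resolver answer" pairs from stdin and print the resolvers whose
# answer differs from the expected IP (first argument).
stale_resolvers() {
    awk -v want="$1" '$2 != want { print $1 }'
}

# Example usage (hypothetical domain; 8.8.8.8 and 1.1.1.1 are public resolvers):
#   for r in 8.8.8.8 1.1.1.1; do
#       echo "$r $(dig +short A app.example.com @"$r" | head -n 1)"
#   done | stale_resolvers 198.51.100.20
```

A resolver that returns no answer at all is also listed, since its second field is empty and does not match the expected IP.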

DR Site Not Serving Traffic #

- DR backend not properly configured
- Missing dependencies (database, storage, authentication)
- Firewall or routing issues

Fix: Validate full DR stack readiness, not just load balancer

Intermittent Failover (Flapping) #

- Unstable health checks
- Network latency or packet loss
- Inconsistent backend responses

Fix:

- Tune health check intervals and thresholds
- Increase failure tolerance
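Raising the failure tolerance reduces flapping but lengthens failover, so it helps to estimate the trade-off. The sketch below is a rough upper bound only. It assumes a site is marked DOWN after a fixed number of consecutive failed checks at a fixed interval, that resolvers honor the TTL, and it ignores check timeouts. The function name `worst_case_failover` is hypothetical.

```shell
# Rough upper bound, in seconds, on how long clients may keep using the
# failed site: detection time (interval x consecutive failures) plus the
# DNS TTL during which caches may still hold the old answer.
# Arguments: check interval (s), failures required, TTL (s).
worst_case_failover() {
    echo $(( $1 * $2 + $3 ))
}

# Example: 5 s interval, 3 failures required, 30 s TTL
worst_case_failover 5 3 30   # prints 45
```

Doubling the required failures from 3 to 6 in this example raises the estimate from 45 to 60 seconds, which may be an acceptable price for stability.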

Application-Based Public IP Considerations #

When using public IPs per site:

- Ensure each site advertises its own public IP
- GSLB should return the correct IP per site
- Validate:
  - NAT and firewall rules
  - SSL certificates per endpoint
  - Consistent application behavior across sites

Guidelines for Internal GSLB Services #

For internal-only services (private DNS / internal apps):

DNS Configuration #

- Use internal DNS servers integrated with RELIANOID GSLB
- Ensure clients resolve through the correct internal resolvers

Network Considerations #

- Verify routing between sites (VPN/MPLS)
- Ensure DR site is reachable from all client networks

Health Checks #

- Use internal endpoints (private IPs)
- Validate application-layer responses

Split-Brain Avoidance #

- Ensure proper synchronization between GSLB nodes
- Avoid scenarios where both sites are considered active incorrectly
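One simple consistency check is to query each GSLB node directly and compare the answers. The sketch below assumes a priority-based policy, where every node should return the same record set. With latency-based or round-robin policies, answers may legitimately differ. The helper name `answers_agree` and the node addresses are placeholders.

```shell
# Return 0 when every argument is identical (all nodes gave the same answer),
# 1 when any answer differs or no answers were supplied.
answers_agree() {
    [ "$#" -gt 0 ] || return 1
    first=$1
    for answer in "$@"; do
        [ "$answer" = "$first" ] || return 1
    done
    return 0
}

# Example usage (hypothetical node addresses 192.0.2.53 and 192.0.2.54):
#   a=$(dig +short A app.example.com @192.0.2.53 | sort)
#   b=$(dig +short A app.example.com @192.0.2.54 | sort)
#   answers_agree "$a" "$b" || echo "GSLB nodes disagree"
```

Sorting each answer before comparing avoids false alarms when nodes return the same records in a different order.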

Best Practices #

- Use low TTL values for faster failover
- Always use application-level health checks
- Regularly perform failover drills
- Monitor DNS resolution globally
- Ensure configuration parity between primary and DR

Validation Checklist #

[ ] GSLB service configured with all sites
[ ] Health checks validated and reliable
[ ] TTL configured appropriately
[ ] DR environment fully operational
[ ] DNS failover tested and confirmed
[ ] Application tested post-failover
[ ] Internal services validated (if applicable)

Summary #

Validating RELIANOID GSLB before an incident helps confirm that automatic failover from on-premises to DR environments works as intended, minimizing downtime and maintaining service continuity.

A successful deployment requires coordination between:

- DNS configuration
- Health checks
- Application readiness
- Network design
