Guide: Validating GSLB (GTM) Failover to DR in RELIANOID

Overview #

This guide provides a structured approach to validating and troubleshooting GSLB (Global Server Load Balancing, also referred to as GTM, Global Traffic Manager) configurations in RELIANOID environments, particularly when services are expected to fail over automatically from an on-premises site to a Disaster Recovery (DR) site.

It also includes best practices for application-based public IPs and internal GSLB services.

Validation Scope #

This guide applies to:

  • GSLB deployments with multiple sites (on-premises + DR)
  • Services exposed via public IPs
  • DNS-based failover using RELIANOID GSLB
  • Automatic failover scenarios based on health checks

Key Components to Validate #

Before troubleshooting failover behavior, verify the following:

GSLB Configuration #

  • GSLB service is properly configured with:
    • Multiple backend sites (on-prem + DR)
    • Correct resolution policies (priority, latency, etc.)
  • DNS zone and records are correctly defined

Health Checks #

  • Health checks are:
    • Enabled for all backend services
    • Properly targeting application endpoints (not just IP/port)
  • Expected response codes or content validation is configured
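
To illustrate the difference between a port-level check and an application-level check, here is a minimal Python sketch. It is not RELIANOID's health check implementation; the function names, the default timeout, and the expected text are assumptions chosen for the example.

```python
from typing import Optional
import urllib.error
import urllib.request


def evaluate_response(status: int, body: str,
                      expected_status: int = 200,
                      expected_text: Optional[str] = None) -> bool:
    """Return True only if the status code matches and, when given,
    the expected text appears in the response body."""
    if status != expected_status:
        return False
    if expected_text is not None and expected_text not in body:
        return False
    return True


def probe(url: str, expected_text: Optional[str] = None,
          timeout: float = 3.0) -> bool:
    """Fetch `url` and apply evaluate_response.

    Any connection error, timeout, or HTTP error status counts as DOWN.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body, 200, expected_text)
    except (urllib.error.URLError, OSError):
        return False
```

A check built this way reports a backend as unhealthy when it answers on the port but serves a maintenance page or an error status, which a plain TCP check would miss.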

DNS Configuration #

  • TTL values are low enough for failover (e.g., 30–60 seconds), because resolvers keep serving a cached answer until its TTL expires
  • Authoritative DNS is pointing to RELIANOID GSLB
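
The effect of TTL on failover time can be estimated with simple arithmetic. The sketch below assumes a simplified model in which a site is marked DOWN after a fixed number of consecutive failed checks, each taking at most one interval plus one timeout, and clients may hold the old answer for one full TTL. The parameter names are hypothetical and do not correspond to specific RELIANOID settings.

```python
def worst_case_failover_seconds(check_interval: float,
                                check_timeout: float,
                                failures_before_down: int,
                                ttl: float) -> float:
    """Rough upper bound on the time until clients reach the DR site."""
    # Time for the health check to declare the primary site DOWN.
    detection = failures_before_down * (check_interval + check_timeout)
    # Clients and resolvers may keep the cached record for one full TTL.
    return detection + ttl


# 10 s interval, 2 s timeout, 3 failures, 60 s TTL
print(worst_case_failover_seconds(10, 2, 3, 60))    # 96
# Same health check with a 1-hour TTL
print(worst_case_failover_seconds(10, 2, 3, 3600))  # 3636
```

Under these assumptions the TTL dominates the total once it exceeds a minute or so, which is why a low TTL matters more than aggressive check intervals.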

Validating Failover from On-Prem to DR #

Step 1: Confirm Normal Operation (Primary Active) #

  • Query DNS resolution:
    dig <your-domain>
  • Verify that:
    • The resolved IP corresponds to the on-premises site
    • Application is accessible and healthy
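
The check "the resolved IP corresponds to the on-premises site" can be automated. The following sketch maps a resolved address to a site name. The `SITES` ranges are placeholder documentation networks, not real addresses, and `resolve_a` uses the system resolver rather than querying the GSLB directly.

```python
import ipaddress
import socket
from typing import Dict, List

# Hypothetical site ranges (RFC 5737 documentation networks); replace with your own.
SITES: Dict[str, List[str]] = {
    "on-prem": ["203.0.113.0/24"],
    "dr": ["198.51.100.0/24"],
}


def classify_site(ip: str, sites: Dict[str, List[str]]) -> str:
    """Return the name of the site whose networks contain `ip`, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for name, networks in sites.items():
        if any(addr in ipaddress.ip_network(net) for net in networks):
            return name
    return "unknown"


def resolve_a(domain: str) -> List[str]:
    """Resolve IPv4 addresses for `domain` through the system resolver."""
    infos = socket.getaddrinfo(domain, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})
```

During normal operation, every address returned by `resolve_a` for the service name would be expected to classify as "on-prem".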

Step 2: Simulate Failure #

Trigger a failure condition on the primary site:

  • Stop the backend services
  • Block the health check endpoint
  • Disable the farm or backend

Step 3: Validate Health Check Detection #

  • Confirm RELIANOID marks the primary site as DOWN
  • Check logs and monitoring to ensure:
    • Health checks are failing as expected
    • No false positives (a healthy backend marked DOWN) or false negatives (a failed backend still marked UP)

Step 4: Validate DNS Failover #

  • Re-run DNS query:
    dig <your-domain>
  • Expected result:
    • IP should now resolve to the DR site

Note: Resolvers and clients may keep returning the cached on-premises record until its TTL expires, so the change can take up to one TTL to become visible.
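
Rather than re-running the query by hand, the failover can be timed with a small polling loop. This is a sketch under stated assumptions: `resolve` is any caller-supplied function that returns the currently resolved addresses, and `dr_ips` is the set of addresses you expect for the DR site. Both names are hypothetical.

```python
import time
from typing import Callable, Iterable, Optional, Set


def wait_for_failover(resolve: Callable[[], Iterable[str]],
                      dr_ips: Set[str],
                      timeout: float = 300.0,
                      interval: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic,
                      ) -> Optional[float]:
    """Poll until every resolved address belongs to the DR site.

    Returns the elapsed seconds on success, or None if `timeout` is reached.
    """
    start = clock()
    while True:
        answers = set(resolve())
        # Require a non-empty answer made up only of DR addresses.
        if answers and answers <= dr_ips:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting. The measured time can then be compared against the configured TTL and health check settings.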

Step 5: Validate Application Availability #

  • Access the application using the resolved DR IP
  • Confirm:
    • Application is fully functional
    • No dependency issues (DB, APIs, etc.)

Common Issues and Troubleshooting #

Failover Not Triggered #

  • Health checks too permissive (e.g., TCP instead of HTTP validation)
  • Incorrect health check endpoint
  • Backend still partially responding (e.g., the port accepts connections while the application returns errors)

Fix: Use application-level checks (HTTP status, response body)

DNS Still Resolves to Primary #

  • TTL too high
  • Client-side DNS caching
  • Recursive DNS servers still serving cached records

Fix:

  • Lower TTL (e.g., 30–60 seconds)
  • Flush local DNS cache
  • Test with external resolvers (e.g., dig @8.8.8.8 <your-domain>)
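
To see which resolvers still return the primary site, the same query can be sent to several of them and the answers compared. The sketch below shells out to `dig +short`, so `dig` must be installed. The function names are hypothetical, and `dig +short` may also print CNAME targets, which this example does not filter.

```python
import subprocess
from typing import Dict, List, Set


def dig_short(domain: str, resolver: str) -> List[str]:
    """Run `dig +short @<resolver> <domain> A` and return the output lines."""
    result = subprocess.run(
        ["dig", "+short", f"@{resolver}", domain, "A"],
        capture_output=True, text=True, timeout=10, check=False,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def stale_resolvers(answers: Dict[str, List[str]],
                    primary_ips: Set[str]) -> List[str]:
    """Return the resolvers whose answers still include a primary-site address."""
    return sorted(r for r, ips in answers.items() if primary_ips & set(ips))
```

For example, after a failover you might build `answers = {r: dig_short("<your-domain>", r) for r in ("8.8.8.8", "1.1.1.1")}` and pass it to `stale_resolvers` together with the on-premises addresses. An empty result suggests those resolvers have picked up the DR record.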

DR Site Not Serving Traffic #

  • DR backend not properly configured
  • Missing dependencies (database, storage, authentication)
  • Firewall or routing issues

Fix: Validate the readiness of the full DR stack, not just the load balancer

Intermittent Failover (Flapping) #

  • Unstable health checks
  • Network latency or packet loss
  • Inconsistent backend responses

Fix:

  • Tune health check intervals and thresholds
  • Increase failure tolerance (require several consecutive failed checks before marking a site DOWN)
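
The idea behind thresholds can be shown with a small state machine that changes state only after a run of consecutive contrary results. This is an illustrative sketch, not RELIANOID's internal logic. The `fall` and `rise` names are borrowed from common load balancer terminology and are assumptions here.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    fall: int = 3      # consecutive failures needed to mark DOWN
    rise: int = 2      # consecutive successes needed to mark UP again
    up: bool = True    # current state
    _streak: int = 0   # consecutive results that contradict the current state

    def observe(self, check_ok: bool) -> bool:
        """Record one check result and return the (possibly updated) state."""
        if check_ok == self.up:
            # Result agrees with the current state: reset the counter.
            self._streak = 0
            return self.up
        self._streak += 1
        needed = self.rise if check_ok else self.fall
        if self._streak >= needed:
            self.up = check_ok
            self._streak = 0
        return self.up


state = HealthState(fall=3, rise=2)
history = [state.observe(ok) for ok in
           [False, True, False, False, False, True, True]]
print(history)  # [True, True, True, True, False, False, True]
```

A single failed check surrounded by successes does not change the state, so a brief network blip does not trigger a DNS change. The trade-off is slower detection of real outages.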

Application-Based Public IP Considerations #

When using public IPs per site:

  • Ensure each site advertises its own public IP
  • GSLB should return the correct IP per site
  • Validate:
    • NAT and firewall rules
    • SSL certificates per endpoint
    • Consistent application behavior across sites

Guidelines for Internal GSLB Services #

For internal-only services (private DNS / internal apps):

DNS Configuration #

  • Use internal DNS servers integrated with RELIANOID GSLB
  • Ensure clients resolve through the correct internal resolvers

Network Considerations #

  • Verify routing between sites (VPN/MPLS)
  • Ensure DR site is reachable from all client networks

Health Checks #

  • Use internal endpoints (private IPs)
  • Validate application-layer responses

Split-Brain Avoidance #

  • Ensure proper synchronization between GSLB nodes
  • Avoid scenarios where both sites are incorrectly considered active at the same time

Best Practices #

  • Use low TTL values for faster failover
  • Always use application-level health checks
  • Regularly perform failover drills
  • Monitor DNS resolution globally
  • Ensure configuration parity between primary and DR

Validation Checklist #

[ ] GSLB service configured with all sites
[ ] Health checks validated and reliable
[ ] TTL configured appropriately
[ ] DR environment fully operational
[ ] DNS failover tested and confirmed
[ ] Application tested post-failover
[ ] Internal services validated (if applicable)

Summary #

Proper validation of RELIANOID GSLB helps ensure that automatic failover from on-premises to DR environments works as intended, minimizing downtime and maintaining service continuity.

A successful deployment requires coordination between:

  • DNS configuration
  • Health checks
  • Application readiness
  • Network design
