DNS failures are deceptive. Everything looks broken, but most issues are straightforward once you know where to look. Every issue here is one I’ve run into personally.
Misconfigured Records
The server is working - the data is wrong.
Symptoms
- Wrong Destination: Resolves, but to the wrong IP.
- Partial resolution: One record works, another doesn’t.
Common Causes
- Typos: A single digit wrong in an A record.
- Missing Trailing Dots: Without the trailing dot, BIND treats the name as relative and appends the zone name again. This is one of the most common “everything looks right” mistakes in BIND.
- CNAME Loops: A CNAME pointing to a record that points back to the original CNAME.
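The trailing-dot mistake is easier to see in code. Here is a minimal Python sketch of how a zone-file name gets qualified, assuming a simplified rule (no `@` shorthand, no `$ORIGIN` changes mid-file). `qualify` is a hypothetical helper, and the zone name `home.foundry81.com.` is assumed from the hostnames used elsewhere in this post.

```python
def qualify(name: str, origin: str) -> str:
    """Return the fully qualified name a zone-file parser would produce.

    Simplified model: a name ending in "." is already absolute; anything
    else is relative and gets the zone origin appended.
    """
    origin = origin.rstrip(".") + "."
    if name.endswith("."):
        return name
    return f"{name}.{origin}"


origin = "home.foundry81.com."  # assumed zone name

# Intended: absolute name, trailing dot present.
print(qualify("service.home.foundry81.com.", origin))

# Mistake: dot forgotten, so the origin is appended a second time.
print(qualify("service.home.foundry81.com", origin))
```

The second call yields `service.home.foundry81.com.home.foundry81.com.`, which is a valid name, so nothing errors out. It just isn't the name anyone is querying.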
Solutions
Query the authoritative server directly:
dig @ns1.home.foundry81.com service.home.foundry81.com
Serial Not Updated
This is a synchronization failure between primary and secondary servers.
Symptoms
- Inconsistent Answers: Some users (hitting the primary) see the new IP, while others (hitting the secondary) see the old IP.
- “Ghost” Data: Changes exist on the primary but not the secondary.
Common Causes
- Stale Serial Number: You updated a record in the zone file but forgot to increment the serial number in the SOA record (I do this all the time).
- Transfer Logic: Secondary servers only pull updates when the serial increases. If the number doesn’t go up, nothing happens.
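The rule a secondary applies can be sketched in a few lines of Python. This is an illustration, not BIND's code: `serial_is_newer` is a hypothetical helper, the serial values are made up, and the comparison follows the serial number arithmetic of RFC 1982, which SOA serials use.

```python
SERIAL_BITS = 32
MOD = 2 ** SERIAL_BITS          # serials wrap around at 2^32
HALF = 2 ** (SERIAL_BITS - 1)   # "newer" means ahead by less than half the range


def serial_is_newer(primary: int, secondary: int) -> bool:
    """True if the primary's serial counts as newer than the secondary's."""
    return 0 < (primary - secondary) % MOD < HALF


# Record edited and serial bumped: the secondary transfers the zone.
print(serial_is_newer(2024010102, 2024010101))  # True

# Record edited, serial forgotten: nothing happens.
print(serial_is_newer(2024010101, 2024010101))  # False
```

An equal serial is indistinguishable from "no change", which is why an edited zone with an untouched serial never reaches the secondary.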
Solutions
Increment Serial: Increment the serial. Every time. No exceptions.
Force Transfer: On the secondary, use the BIND control utility to force a fresh transfer: rndc retransfer <zone>.
Log Analysis: Check the secondary server’s logs for “zone transfer failed” errors to rule out a routing or firewall issue.
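One way to make "every time, no exceptions" harder to forget is to compute the next serial instead of typing it. The sketch below assumes the common YYYYMMDDnn serial convention, which this post does not prescribe; `next_serial` is a hypothetical helper.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next serial in YYYYMMDDnn form, always greater than `current`."""
    base = int(today.strftime("%Y%m%d")) * 100
    # The first change of the day starts at nn=00. Later changes, or a serial
    # that is already ahead of today's date, simply add one.
    return base if current < base else current + 1


print(next_serial(2024010105, date(2024, 1, 2)))  # 2024010200
print(next_serial(2024010200, date(2024, 1, 2)))  # 2024010201
```

Whatever scheme you use, the only property that matters to the secondary is that the new number is higher than the old one.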
Propagation Delays
“Propagation” is really cache expiration: each resolver keeps serving its cached answer until the TTL runs out.
Symptoms
- Inconsistent results: Different answers depending on where you query.
Common Causes
- High TTL (Time to Live): Resolvers keep serving the old record until its TTL expires.
Solutions
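Lower the TTL: Lower the TTL before making changes, not after. Do it at least one full old-TTL period ahead, so caches have already picked up the shorter value when the change goes live.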
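The timing works out as simple arithmetic. Here is a small Python sketch, assuming a made-up old TTL of 86400 seconds and a lowered TTL of 300 seconds; `change_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def change_window(lowered_at: datetime, old_ttl: int, new_ttl: int):
    """Return (earliest safe change time, worst-case cutover duration).

    After the TTL is lowered, caches may still hold the record with the old
    TTL for up to `old_ttl` seconds. Once that has passed, every cache holds
    the short TTL, so a change is picked up within `new_ttl` seconds.
    """
    earliest_change = lowered_at + timedelta(seconds=old_ttl)
    return earliest_change, timedelta(seconds=new_ttl)


lowered = datetime(2024, 1, 1, 9, 0)
change_at, cutover = change_window(lowered, old_ttl=86400, new_ttl=300)
print(change_at)  # 2024-01-02 09:00:00
print(cutover)    # 0:05:00
```

Lowering the TTL at the same moment as the change buys nothing: caches that fetched the record a minute earlier still hold it for the full old TTL.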
Caching Issues
Caching speeds things up - until it doesn’t.
Symptoms
- Local Stale Data: You can confirm that the record is correct on the server, but a specific machine or browser still sees the old IP.
- “Everyone but Me”: Only one device is wrong - everything else resolves correctly. If one device is wrong, it’s almost never DNS itself.
Common Causes
- OS Caching: Windows and macOS maintain a local DNS cache.
- Browser Caching: Chrome and Firefox often cache DNS internally to speed up page loads.
Solutions
Flush the OS cache:
- Windows: ipconfig /flushdns
- macOS: sudo killall -HUP mDNSResponder
Clear Browser Cache: Use “Incognito Mode” or clear the browser’s internal DNS cache.
Slow Queries
Slow queries are harder to diagnose because everything eventually works.
Symptoms
- Delayed Resolution: The browser status bar says “Looking up host…” for 2-5 seconds before the page suddenly loads.
- Timeouts: Some requests fail entirely with a “DNS Timeout” error.
Common Causes
- Dead Forwarders: The server waits for a dead forwarder to time out before trying the next one. Timeouts feel like slowness, but they’re usually failure + retry.
- Recursive Loops: Two DNS servers are configured to forward to each other, creating a loop.
- Poor Server Resources: The DNS server is running out of RAM or CPU, causing delays in processing requests.
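A forwarding loop is easy to miss when each server's config looks fine on its own. The Python sketch below walks a hand-written map of "server forwards to" relationships and reports the first loop it finds. The server names are made up, and `find_forward_loop` is a hypothetical helper that does not read named.conf.

```python
from typing import Dict, List, Optional


def find_forward_loop(forwarders: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Follow forwarder links from `start`; return the loop path, or None."""
    path: List[str] = []   # servers on the current chain, in order
    on_path = set()        # same servers, for fast membership checks
    done = set()           # servers fully explored without finding a loop

    def visit(server: str) -> Optional[List[str]]:
        if server in on_path:
            # We came back to a server already on the chain: that's the loop.
            return path[path.index(server):] + [server]
        if server in done:
            return None
        on_path.add(server)
        path.append(server)
        for nxt in forwarders.get(server, []):
            loop = visit(nxt)
            if loop:
                return loop
        on_path.discard(server)
        path.pop()
        done.add(server)
        return None

    return visit(start)


# Made-up setup: two internal servers forwarding to each other.
config = {"dns-a": ["dns-b"], "dns-b": ["dns-a"]}
print(find_forward_loop(config, "dns-a"))  # ['dns-a', 'dns-b', 'dns-a']
```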
Solutions
- Optimize Forwarders: Ensure forwarders are fast and reliable, e.g. Quad9 (9.9.9.9).
- Check Timeout Settings: Adjust the timeout and retry intervals in named.conf.
- Monitor Resource Usage: Use top or htop to ensure the named process isn’t hitting CPU limits.
- Enable Caching: Ensure that the resolver has enough memory allocated to its cache to reduce the need for recursive lookups.
Tools
When diagnosing DNS issues, you need visibility into how queries are being answered. Two tools are essential:
dig (Domain Information Groper)
dig is the most powerful and precise DNS troubleshooting tool available. It allows you to:
- Query specific servers directly
- See full responses
- Inspect TTL and caching
- Debug propagation and replication
Example:
dig @ns1.home.foundry81.com service.home.foundry81.com
This bypasses all intermediate resolvers and asks your authoritative server directly.
If you want to understand what DNS is actually doing, use dig.
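Each line in dig's answer section has the same layout: name, TTL, class, type, data. The TTL is the field worth watching. An authoritative server reports the configured value, while a caching resolver reports a value that counts down between queries. A minimal Python sketch for pulling those fields apart follows; the sample line, including its TTL and address, is made up for illustration, and `parse_answer_line` is a hypothetical helper.

```python
from typing import NamedTuple


class Answer(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    rdata: str


def parse_answer_line(line: str) -> Answer:
    """Split one answer-section line into its five fields."""
    # dig separates fields with tabs or spaces; rdata may itself contain spaces.
    name, ttl, rclass, rtype, rdata = line.strip().split(None, 4)
    return Answer(name, int(ttl), rclass, rtype, rdata)


sample = "service.home.foundry81.com.\t300\tIN\tA\t192.0.2.10"  # made-up line
answer = parse_answer_line(sample)
print(answer.ttl)    # 300
print(answer.rdata)  # 192.0.2.10
```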
nslookup
nslookup is simpler and more accessible.
It allows you to:
- Perform basic DNS queries
- Test name resolution
- Query specific servers
Example:
nslookup service.home.foundry81.com ns1.home.foundry81.com
While it lacks the depth of dig, it’s quick, accessible, and useful for basic checks.
Which Should You Use?
- Use dig when you need detail and accuracy
- Use nslookup when you need a quick answer
If you’re troubleshooting a real issue, start with dig.
Troubleshooting Workflow
When DNS breaks, don’t guess - follow a process.
1. Check if the record exists
Query the authoritative server directly:
dig @ns1.home.foundry81.com service.home.foundry81.com
If it’s not here, it doesn’t exist.
2. Query the correct server
Verify which server you’re querying. Assumptions here waste the most time. Clients may still be using:
- old DNS settings
- cached results
- a different resolver entirely
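On Linux, the client's resolver list normally lives in /etc/resolv.conf, so checking it is a quick way to test the assumption. Here is a minimal Python sketch that extracts the `nameserver` entries from that file's text. The sample content is made up, `nameservers` is a hypothetical helper, and on systems using systemd-resolved the file may only list the local stub (127.0.0.53) rather than the real upstream.

```python
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


sample = """\
# made-up example
search home.foundry81.com
nameserver 192.0.2.53
nameserver 9.9.9.9
"""
print(nameservers(sample))  # ['192.0.2.53', '9.9.9.9']
```

To check a real machine, pass in the file's contents, for example `nameservers(open("/etc/resolv.conf").read())`.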
3. Compare answers across servers
Query:
- primary
- secondary
- client-configured resolver
Differences usually mean:
- replication issues
- stale zones
- serial number problems
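The comparison itself is mechanical: collect what each server returned and group the servers by answer. A minimal Python sketch follows, assuming you have already gathered the answers (for example by running dig against each server). The addresses are made up, and `group_by_answer` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, List, Set


def group_by_answer(answers: Dict[str, Set[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group server names by the set of records each one returned."""
    groups: Dict[FrozenSet[str], List[str]] = {}
    for server, records in answers.items():
        groups.setdefault(frozenset(records), []).append(server)
    return groups


# Made-up results for one hostname queried against three servers.
answers = {
    "primary": {"192.0.2.20"},
    "secondary": {"192.0.2.10"},
    "client resolver": {"192.0.2.10"},
}

groups = group_by_answer(answers)
if len(groups) > 1:
    print("Servers disagree:")
for records, servers in groups.items():
    print(sorted(records), "<-", ", ".join(servers))
```

In this made-up case the primary stands alone with the new address, which points at replication (a stale serial or a failed transfer) rather than at the record itself.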
4. Eliminate caching
If everything looks correct on the server but wrong on the client:
- flush OS cache
- test in incognito
- try another device
DNS caching is often the culprit when “everything looks right.” Always prove it’s not cache before going deeper.
5. Check TTL and Timing
If changes aren’t showing up:
- the old TTL may not have expired yet
- resolvers may still be serving cached data
Follow this process and DNS stops being guesswork.
DNS has a reputation for being unpredictable, but most of that comes from not being able to see what it’s doing. Once you know where to look - the authoritative server, the cache, the path a query takes - it becomes far more mechanical than mysterious.
DNS is one of those systems that fades into the background when it’s working and takes the blame when it’s not. The difference now is that you can see what it’s doing - and more importantly, why. And with that, everything built on top of it becomes a lot easier to trust.