Friday took the prize for the “weirdest issue of the week” last week, which is funny for me since, like most people, Mondays are usually the day I dread. Or any day (or night) when I happen to be on call.

It was simple enough and wacky enough to make me take a time-out and actually write up a post. Beware – it’s a long one. It took me less time to figure out and fix the problem below than it will take you to read about it. I’m sorry.

There is a little backstory to this one that I wasn’t involved in, but I’m including it to set the stage and also to illustrate how some little thing can have a big impact. Actually, in this case, two little things. A customer’s solitary domain controller, happily humming away for the last 8 years, was in need of replacement. A pair of new servers were promoted to DCs, and they just never replicated properly. More than 60 hours of work and a Microsoft support case later (which I thankfully was not part of), they ended up building a third new DC after trashing the first one when no one could get it to replicate with its peers.

This network was also suffering from IP address exhaustion – this is the part where I came in this past Tuesday. They only have a single /24 subnet, so I figured we might start by setting up a guest wireless VLAN for all the smartphones and such. I ended up spending the next few hours crawling their network to figure out the topology, since there was no documentation for it. They’ve been a customer for at least 12 years, so I assume the MSP I work for is still in the planning stage of onboarding them. Anyway, along the way I documented five rack-mounted and two desktop switches that no one knew about and changed their factory-default passwords. While doing this, I noticed that their Internet connectivity was a little slow, so while downloading an IP scanner and PuTTY I ran a couple of speed tests and some nslookups to see if I could spot any trouble. I didn’t see anything of note, and since I was only working on the LAN, I carried on with my business.

It turned out that the wireless system I knew about (8 UniFi APs and a cloud controller) was located entirely in an outbuilding connected over a 14-year-old wireless bridge. That meant there was another system, or at least an AP, serving the main office space that I hadn’t found remotely, and it likely had the bulk of the BYOD devices. The Cloud Key turned out to be a standalone unit that failed to load its management page after accepting the credentials, so I was really jammed up, and it was late in the afternoon. A colleague was going to look into the Wi-Fi device situation and reboot the Cloud Key the next day while on-site.

As is customary for me, the next couple of days went by in a blink since I was dedicated to a phone system migration. So Friday rolls around, and my colleague calls to tell me the aforementioned customer was upset about their “network sluggishness” and that somehow it was my fault, I guess because I touched it last. I agreed to take a look at it with him and found that name lookups were taking in excess of 25 seconds. WOW.

When I connected to the server we were working from (one of the new DCs), I could see that he had already run some nslookups, and the last one really caught my eye. He had looked up the local domain name itself. It had timed out, and nslookup had appended the domain name to itself. That is to say, he looked up “mydomainname.local” and the name in the output came back as “mydomainname.local.mydomainname.local”.

I don’t recall having run into that behavior before, but you never know – I’ve been at this stuff a long time. The next piece of the output was also interesting: despite the DNS timeout, an IP address was returned for that lookup, and it was an address from the customer’s remote access VPN pool. I confirmed that by checking the pool configured on their Cisco ASA failover cluster (which I installed and configured about a decade ago).

So after some head-scratching, we found a static A record in their DNS that pointed *.mydomainname.local to that IP address in the RAVPN pool. Literally an asterisk in the host record to make it a wildcard, which I didn’t even know you could do in Windows DNS – at least I’ve never had a use case to try it.

I don’t know what effect whoever created that record intended, but after thinking it over for a few seconds I decided to take a screenshot of it and delete it. And no, there wouldn’t be another domain controller on the other end of that IP address. It was just an address from the pool that could be handed to someone’s home laptop or something.

After deleting that record, nslookups for the domain still showed a timeout, but then the DC’s IP address was returned below it.
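
For anyone curious why the VPN address showed up at all, here is a simplified Python model of the two behaviors in play. It assumes the resolver appends each suffix in its DNS search list before trying the bare name (roughly what Windows nslookup does for names without a trailing dot), and that a wildcard record answers names below the zone but not the zone apex itself. The zone contents and the 192.168.250.10 “VPN pool” address are made-up placeholders, and the sketch makes no attempt to model the timeouts.

```python
ZONE = "mydomainname.local"


def candidate_queries(name, search_list):
    """Names a resolver tries for `name`, in order (simplified model).

    A name ending in "." is treated as fully qualified and sent as-is;
    otherwise each search suffix is appended first and the bare name is
    tried last.
    """
    if name.endswith("."):
        return [name.rstrip(".")]
    return [f"{name}.{suffix}" for suffix in search_list] + [name]


def lookup(name, records, zone=ZONE):
    """Simplified A-record lookup: exact match first, then "*.zone".

    Real DNS servers apply the closest-encloser rules from RFC 4592,
    which this sketch ignores.
    """
    if name in records:
        return records[name]
    wildcard = "*." + zone
    if name.endswith("." + zone) and wildcard in records:
        return records[wildcard]
    return None


def resolve(name, records, search_list=(ZONE,)):
    """Return (query_name, ip) for the first candidate that gets an answer."""
    for candidate in candidate_queries(name, search_list):
        ip = lookup(candidate, records)
        if ip is not None:
            return candidate, ip
    return None, None


# Hypothetical zone data: the DC's record at the zone apex, plus the stray
# wildcard pointing into the RAVPN pool (placeholder address).
records = {
    ZONE: "10.10.33.201",
    "*." + ZONE: "192.168.250.10",
}
print(resolve("mydomainname.local", records))
# → ('mydomainname.local.mydomainname.local', '192.168.250.10')

# With the wildcard deleted, the suffixed name gets no answer, so the
# bare name is tried next and returns the DC.
del records["*." + ZONE]
print(resolve("mydomainname.local", records))
# → ('mydomainname.local', '10.10.33.201')
```

Adding a trailing dot (nslookup mydomainname.local.) tells nslookup the name is already fully qualified, so no suffix gets appended, which is a quick way to rule this behavior in or out.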

So that was the first simple but weird issue. Then came the matter of the slow lookups and timeouts. I saw the same thing from both domain controllers, and I didn’t see any issues with the DNS service itself – the event logs were pretty clean. I didn’t struggle with this one for long, since I always approach Windows domain DNS issues with a simple process that takes all of a minute. I eliminated dumb things like an incorrect or missing DNS server IP and got to the point (in a minute or less) where I wanted to look at the forwarder settings on the DNS server. And I had a BINGO!

Where you might normally see something like my old friend 4.2.2.2 set up as a forwarder (shown at right), the server I was on (let’s say its IP was 10.10.33.201) had its fellow DC at 10.10.33.202 set up as a forwarder. Hmm… that’s a bit strange. So of course the next thing I did was log onto 10.10.33.202, only to find that it had a forwarder pointing back to 10.10.33.201!
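
The forwarder check boils down to following a chain: each server hands the queries it can’t answer to its forwarder, so if the chain ever comes back to a server already in it, those queries just bounce between the servers. Below is a minimal Python sketch, assuming you have already collected each server’s forwarder into a dictionary by hand (or from something like Get-DnsServerForwarder run against each DC). Only one forwarder per server is modeled, and the 4.2.2.2 entries are just an example of a sane setup.

```python
def follow_forwarders(start, forwarders):
    """Trace the forwarder chain starting at the DNS server `start`.

    `forwarders` maps a server's IP to the IP it forwards to. A server
    missing from the map is treated as the end of the chain (for example,
    it resolves via root hints). Returns (path, looped).
    """
    path = [start]
    current = start
    while True:
        nxt = forwarders.get(current)
        if nxt is None:
            return path, False  # chain ends cleanly
        looped = nxt in path
        path.append(nxt)
        if looped:
            return path, True  # came back to a server already visited
        current = nxt


# The broken configuration: each DC forwards to the other.
broken = {"10.10.33.201": "10.10.33.202", "10.10.33.202": "10.10.33.201"}
print(follow_forwarders("10.10.33.201", broken))
# → (['10.10.33.201', '10.10.33.202', '10.10.33.201'], True)

# A sane configuration: both DCs forward to an outside resolver.
fixed = {"10.10.33.201": "4.2.2.2", "10.10.33.202": "4.2.2.2"}
print(follow_forwarders("10.10.33.202", fixed))
# → (['10.10.33.202', '4.2.2.2'], False)
```

Deleting the forwarders outright corresponds to an empty map, where the chain ends at the first server.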

As with most things in networking, loops are not your friend. I deleted the forwarder on both servers, and all the lookups and web browsing returned to normal. I think this might also explain the domain replication issues I opened with, the ones even Microsoft couldn’t figure out. DNS is really important. Comment below with what you think about forwarders versus root hints. Hahaha! I can’t believe I want to start that war on my own site.
