Another issue that a reader brought to my attention: on Windows Server 2019 domain controllers, the DNS service fails in the morning hours. This is not limited to one system, but occurs on several customer systems. I'm posting it on the blog to find out whether others are affected.
Reader reports DNS outages
German blog reader Patrick P. works as an IT supporter for various customers. In this role, he administers several Windows Server 2019 machines that also act as domain controllers. For some time, however, he has been observing a strange problem.
- For some of the managed Windows Server 2019 domain controllers, the DNS service always fails in the morning hours.
- Restarting Windows Server 2019 or the DNS service fixes this failure until the service fails again the following day.
If the DNS service is down, clients can no longer access the domain. This results in corresponding calls from customers whose systems are no longer running. As a special problem, Patrick also states that he can no longer access the terminal server via the gateway using RDP. Patrick has observed this strange behavior with four customers so far – so it is not a single system problem.
Event viewer returns event 404
Patrick has also checked the event logs of the relevant domain controllers on the Windows Server 2019 systems. There are entries with the event ID 404 that relate to the DNS service.
Event viewer
In the details it says that a TCP socket could not be bound to the IP address xx.xx.xx.xx. Somehow the resources seem to be running out. The recommendation is to restart the DNS server or the computer.
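To make the "could not bind" wording more concrete, here is a minimal Python sketch that attempts the same kind of TCP bind and reports the operating system's error. The helper name `try_bind_tcp` and the loopback address are illustrative assumptions, not anything the DNS service itself uses. Note that while the DNS service is running normally, a bind to port 53 on its address is expected to fail with "address in use"; the sketch is mainly of interest for telling that case apart from a resource-related error once the service is stopped or has failed.

```python
import errno
import socket


def try_bind_tcp(ip: str, port: int = 53) -> tuple[bool, str]:
    """Try to bind a TCP socket to ip:port and report the outcome.

    Hypothetical helper for illustration; it only mimics the bind step
    that event 404 complains about.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))
        return True, "bind succeeded"
    except OSError as exc:
        # Translate the numeric error into its symbolic name where known,
        # e.g. EADDRINUSE (port taken) or ENOBUFS (no buffer space).
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"bind failed: {name} ({exc})"
    finally:
        sock.close()


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port; this only checks whether
    # a TCP socket can be created and bound at all.
    print(try_bind_tcp("127.0.0.1", 0))
```

Calling `try_bind_tcp` with the server's own address and port 53 (from an elevated prompt, with the DNS service stopped) would show whether the bind fails for a reason other than the port being in use.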
A resource problem?
Based on the error description above, my first guess would be something like a memory leak: RAM gradually fills up until resources run low. Restarting the server or the DNS service frees the memory, so the machine can run for a few hours again.
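One way to test the memory-leak guess would be to record the memory usage of the DNS server process over the course of a night. The following Python sketch does this by parsing the CSV output of the Windows `tasklist` command. The function names are hypothetical, and the process name `dns.exe` is an assumption about how the DNS Server service appears in the task list.

```python
import csv
import io
import re
import subprocess


def parse_tasklist_csv(output: str) -> list[tuple[str, int, int]]:
    """Parse `tasklist /FO CSV /NH` output into (image, pid, mem_kb) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(output)):
        if len(fields) < 5:
            continue  # skips blank lines and the "INFO: No tasks ..." message
        # The memory column is localized ("123,456 K" or "123.456 K"),
        # so keep only the digits.
        digits = re.sub(r"\D", "", fields[4])
        if digits and fields[1].isdigit():
            rows.append((fields[0], int(fields[1]), int(digits)))
    return rows


def dns_memory_kb() -> list[tuple[str, int, int]]:
    """Query the memory usage of dns.exe via tasklist (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq dns.exe", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Run from a scheduled task every few minutes and appended to a log file with a timestamp, the values returned by `dns_memory_kb()` would show whether memory usage climbs steadily before the morning failure. A flat curve would point away from a leak in the DNS process itself.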
I'm reminded of the article Windows Server: April 2024 Update KB5036909 causes also LSASS crashes on DCs. The memory leak described there, which was caused by the March/April 2024 updates, has been addressed by an out-of-band update since May and should also have been fixed by the regular security updates in June 2024.
There is a Microsoft forum post from November 2020, DNS server errors 404, 407, 408 when windows server installed on SSD, in which someone describes the error pattern and attributes it to installation on an SSD. Microsoft also has a resource from 2010 that deals with event ID 404 on the DNS server. The suggestion there is to free up memory (occupied by applications or services) on the server in question.
There is also a fairly recent support article from May 2024, Event IDs 4016 and 4004 when DNS updates time out, but it deals with different event IDs. There, Microsoft offers hotfixes to fix the problem. At this point, the question: has anyone else observed the behavior described above? Is there any insight into the cause and how to fix the problem permanently?