Insecure defaults in Windows + NPS / Active Directory
Disclaimer: I have received comments that some descriptions are a bit "vague". That's on purpose. I don't want to (or can't) reveal too much information about the actual real-life setup. It's sometimes quite difficult to walk the fine line between too little and too much information. So I always try to capture the essence of the issue at hand, but I also understand that this quite often means some important information is missing. I may update some posts when enough years have passed and that particular information is no longer critical. But I won't promise anything! ;)
Microsoft Network Policy Server, NPS. *sigh* Another *sigh* Have I already convinced you that I don't love this product?
This text is about the annoying and problematic defaults you'll find everywhere in Microsoft products. I'm not sure whether they value backward compatibility over everything else, or whether the different silos inside the company just don't know what the others are using or doing.
Or actually, this is not a text about the issue in general. It's about one particular issue I was forced to face some moons ago. But the same thing seems to happen every now and then, especially when you try to make things more secure.
No, I'm not saying MS products are insecure. Yes, they have made huuuuge leaps in recent years. And still. Or perhaps it's because of those advancements that these insecure defaults feel so much more annoying?
We were doing the same migration for a customer of ours that quite many companies are doing these days. Older servers in their own or colocation datacenters are being shut down and migrated to the cloud, usually either Azure or AWS.
Basically the plan was to install new Win2019 servers in the cloud, promote them to domain controllers and then demote the older ones in the customer's own datacenter. Simple, huh? Yes, in theory. But there were quite a few bits you needed to get right, especially because this was my first time actually working with promoting/demoting domain controllers and this kind of migration. It's always nice to learn with customer infra, isn't it? ;)
Luckily we have Google! I just can't understand how anyone got anything done before we had it. Umm... have I already wondered about this topic??
We also agreed with the customer to apply some quite strict security hardening to these new servers, simply because I wanted to do this correctly from the start.
I installed the new servers, added them to the Active Directory domain as domain controllers, added the Network Policy Server role for RADIUS authentication, and things seemed to work without any (bigger) issues. I almost popped the champagne.
Some weeks passed. Since things seemed to work, I started demoting the old servers. First one. Then I waited a few days. Everything still worked. Logs and diagnostics looked fine. dcdiag and repadmin /showrepl became my friends during that process.
Oh, all of the FSMO roles were still located on the last standing old server (netdom query fsmo). It still held the PDC emulator role (PDC = Primary Domain Controller). Umm.... Could I first transfer these roles to one of the new servers and then verify things were working? Yes, there are multiple ways to do it, but I prefer PowerShell for most of my admin duties, and here's how. More explanation in the link, but this will do it.
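# Roles 0,1,2,3,4 = PDCEmulator, RIDMaster, InfrastructureMaster, SchemaMaster, DomainNamingMaster
PS> Import-Module ActiveDirectory
PS> Move-ADDirectoryServerOperationMasterRole -Identity "your.new.ad.server" -OperationMasterRole 0,1,2,3,4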
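To double-check that the move actually happened, the same information that netdom query fsmo shows can also be read with the ActiveDirectory module. A minimal sketch, assuming the module (RSAT) is available on the machine you run it from; after a successful transfer, every role below should point to the new server.

```powershell
# Assumes the ActiveDirectory module (RSAT) is installed, as for the commands above.
Import-Module ActiveDirectory

# Domain-wide FSMO roles: PDC emulator, RID master, infrastructure master
Get-ADDomain | Select-Object PDCEmulator, RIDMaster, InfrastructureMaster

# Forest-wide FSMO roles: schema master, domain naming master
Get-ADForest | Select-Object SchemaMaster, DomainNamingMaster
```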
Everything went well. Until....
Oh, you probably guessed that something popped up? Yes, the customer called furiously the next morning: 'Our VPN is broken, it needs to be fixed ASAP!'
I have to say that I was confused. I had done nothing the day before that could have broken the VPN authentication. Or so I thought. So, I started digging. Nothing. It wasn't working, but I had no ** idea why. For some reason the authentication part had broken down.
Basically, the VPN gateway used RADIUS to authenticate against NPS, which in turn used some protocol to authenticate against AD. The VPN server just said 'Access denied', and NPS was no better: 'User account or password is not correct, or user account is not found.'
oookay. But I was able to log in to Windows with the same account and password, so that part should be fine. If you have read my previous NPS article, you have a pretty good understanding of how I feel about NPS (or Microsoft logging in general). Let's just say I have seen better and easier-to-use logging. So, logging didn't reveal anything more.
I started to get a bit frustrated. I had other things to do as well, and the customer kept pressing about this issue. I totally understand that, but it still didn't help.... Just before I left work, I managed to find a "workaround": we lowered the security settings of the authentication between the VPN gateway and NPS. Not a great thing, but we could live with it for a couple of days in this environment. By this point I had already spent a few too many hours playing with Wireshark, trying to understand what was happening with the RADIUS traffic. I was almost 100% sure that the problem wasn't between the VPN and NPS. It was between NPS and AD, but I just couldn't figure out how to debug that part.
A night passed.
The next morning I once again typed some words into Google. They were probably a bit different from the ones I used the day before. One link to a blog post caught my eye. Could this be??
I clicked it. I read it. I started to feel it.
Yes. This was it.
NPS uses NTLM to communicate with AD, at least with the authentication method this setup was using. There are two versions, NTLMv1 and NTLMv2. The latter was introduced as early as Win NT 4.0 SP4, which was released on 25.10.1998. And it's not hard to guess that v1 has serious security issues, so using v2 is preferred.
Now comes the weird part. We had completely disabled NTLMv1 as part of the security hardening in this setup. And because it was disabled, NPS on the Win2019 server couldn't authenticate against AD. NPS defaults to NTLMv1, and NTLMv2 support must be enabled manually by adding a registry value:
- Open regedit
- Go to key: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\RemoteAccess\Policy
- Add a DWORD value named 'Enable NTLMv2 Compatibility' and set it to 1
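Since I prefer PowerShell anyway, the registry steps above can also be scripted. A minimal sketch, assuming an elevated PowerShell session on the NPS server. The service restart at the end is my own precaution (the NPS service is named IAS), not part of the fix described above.

```powershell
# Run in an elevated PowerShell session on the NPS server.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\RemoteAccess\Policy'

# Create the key if it does not exist yet
if (-not (Test-Path $path)) {
    New-Item -Path $path -Force | Out-Null
}

# Create (or overwrite, thanks to -Force) the DWORD value that lets NPS use NTLMv2
New-ItemProperty -Path $path -Name 'Enable NTLMv2 Compatibility' -PropertyType DWord -Value 1 -Force | Out-Null

# Read the value back to confirm it is set to 1
Get-ItemProperty -Path $path -Name 'Enable NTLMv2 Compatibility'

# Precaution (assumption): restart NPS, whose service name is IAS, so the change is picked up
Restart-Service -Name IAS
```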
Can someone tell me why NPS on Win2019 doesn't use the more secure NTLMv2 by default? It has been around for almost 22 years already.... Or, if it prefers v1 for some weird reason, couldn't it at least accept v2 automatically when v1 has been disabled by a security-conscious sysadmin?