This is a quick addendum to the recent posts on my home lab, showing what can happen when you don't think all the way through a migration project!

As part of building out the new lab, I needed to move all of my internal management VMs over. They had been residing on an HP DL360 G5 with an external storage shelf attached, so part of this process included a storage vMotion over to my trusty ReadyNAS NV+, which I've had for years. I didn't expect a lot of performance out of it, but it did well enough that I ended up moving ALL of my lab VMs over to it. I can definitely put it in the dirt if I'm not careful, but overall it does well, and it meant I didn't have to spring for new storage!

So the SvMotion goes fine, and everything is working perfectly. Both AD controllers, vCenter, vCOps, vMA: everything moves over smoothly. I even anticipate one issue and make sure the DNS name of the NAS array is in the hosts file on the host, since DNS may not be available when it first boots up. Once everything is over, I pull all of the old gear out, shut the new hosts down, rack and cable everything, and then boot the management host back up. It comes up, and there's no connection to the NAS. I make sure I can ping the array by name from the host, and there's no problem there. I start digging through the vmkwarning log, and I see that the host is getting a "permission denied" error when it tries to connect. It also looks like it's trying to query LDAP? Strange. Let's go look at the array.

The array looks fine. Nothing has changed and I didn't even reboot it, so I'm puzzled. I see that the share holding the VMs is set up for AD domain access and still allows the ESXi hosts root NFS access, so nothing strange there. Very perplexing. Some digging on Google finally reveals the issue, and it's a doozy…

The basic problem is that the ReadyNAS NV+ can be added to an AD domain, but it doesn't CACHE any of the credentials! This means that when the domain is unavailable, so are the files for every share that uses CIFS sharing, including all of the VMs located on it. Including the AD controller… #facepalm

Now I have two issues: one, how do I fix the permissions enough to get the files OFF the NAS, and two, once I do, how do I get the AD controller working? It took a little hacking, a download off the ReadyNAS community site and a reboot, but I was able to access the NAS via SSH. Then I was able to manually change all of the permissions on the files that corresponded to my AD controller. Then I enabled FTP and pulled all of the files to my desktop. Once there, I was able to boot my AD controller using a copy of VMware Workstation. After it booted, the host was able to mount the NFS share again, and I was able to bring the rest of the environment up. Once the secondary AD controller was up, I stopped the one on my desktop and booted it back where it belonged…

The moral of the story for me was to make sure my next array can cache AD credentials in case of an outage. And to keep a copy of my AD controller up-to-date on my desktop, just in case!

2 Responses to Despite My Best Intentions…

  1. I’ve been burned a few times by not having a physical AD system available. Far too much can go wrong, and in the long run the peace of mind it buys outweighs the flexibility of it being a VM. We have always had a VM DC, but we keep that physical one around just in case.

  2. In an enterprise environment, I’m completely with you. Even if it’s a tertiary controller that doesn’t hold any FSMO roles and it’s sitting on an old desktop PC that should have been thrown out years ago, having at least one physical DC is a good idea. In the lab, where physical boxes are hard to come by, it’s been OK until now. The NAS definitely bit me on this one.