Tag Archives: vmware

Issues with StoreVirtual volumes when using them without Network-RAID10?

A little mind-boggling thing has troubled me over the last few months, and I finally realised what the issue was. I’ve been troubleshooting one of my ESXi clusters, which uses HPE StoreVirtual VSA and Synology for its datastores.

During this troubleshooting (the trouble was the Synology, a looong sad story) I’ve had to reboot the hosts numerous times. Every time, I’ve been forced to shut down the switch ports that carry the iSCSI VMkernel ports; otherwise the hosts wouldn’t boot. They would just sit there, looking as if they were waiting for a timeout. That timeout might have occurred if I had waited long enough, but I found myself patiently waiting for what seemed like 10-20 minutes…

Suddenly I read and understood the message it was giving me. My setup is 3 StoreVirtual datastores with Network-RAID10, 1 without Network RAID, and 3 Synology datastores. When I wanted to reboot, I shut down the VSA appliance on the host and all my VMs were moved to the other host. The issue was that the 1 datastore without Network RAID goes offline whenever I shut down any one of my VSA appliances.

This means the ESXi host cannot tell that volume that it is going down for a reboot, and for some reason this is bad for the host, even though nothing on the host is using that datastore. The VMs that have a VMDK on that datastore simply shouldn’t read from or write to that VMDK while it’s offline, and then there is no issue.

The problem is that I cannot turn the network ports back on until the host has booted completely, because for some reason it also MUST connect to this offline volume during start-up; the ESXi boot screen just stays at starting up iSCSI. If I keep the ports offline, the host boots quickly. Then I can turn on the VSA appliance and the network ports and rescan the iSCSI adapter, and it’ll connect to the datastores, but it sometimes shows the error message “All shared datastores has failed” on the Summary page.
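To make the reasoning concrete, here is a minimal Python sketch of which datastores I would expect to drop offline when a single VSA appliance is shut down. It is a toy model, not anything StoreVirtual provides: the `Volume` class, the `copies` field and the volume names are all made up for illustration, and it assumes that a volume without Network RAID keeps only one copy of each block while Network-RAID10 keeps two. It ignores quorum and everything else a real cluster does. The Synology datastores are left out because they don’t depend on the VSA appliances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str    # hypothetical datastore name
    copies: int  # assumption: 1 = no Network RAID, 2 = Network-RAID10


def volumes_offline_when_one_node_down(volumes):
    """Return the names of volumes that cannot survive losing one VSA node.

    Toy rule: a volume needs at least two copies of its data to stay
    online while any single appliance is shut down.
    """
    return [v.name for v in volumes if v.copies < 2]


# My setup as described above: three mirrored volumes, one unmirrored.
volumes = [
    Volume("sv-raid10-a", 2),
    Volume("sv-raid10-b", 2),
    Volume("sv-raid10-c", 2),
    Volume("sv-noraid", 1),
]

print(volumes_offline_when_one_node_down(volumes))
```

Running a check like this before rebooting a host would at least flag the one volume that is going to disappear, so the hang at boot doesn’t come as a surprise.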

To me this seems like a design problem, and it would seem that a lot of people could run into this issue if they use volumes without Network-RAID10?

New book – “Learning Veeam Backup & Replication for VMware vSphere”

Disclaimer:
I have received this book as a free review sample, with the only requirement that I would write a review of it here on my blog and post short reviews of it on amazon.com, goodreads.com and a few other book sites. These should be unbiased and I was in no way obligated to write positive reviews.

The title of this book is spot on: it teaches the reader what they need to know, is very instructional, and is a great guide for starting up a Veeam project.

Starting off with a background understanding of backup concepts, it reads from the start like a self-teaching tutorial, with walkthrough guides for the basic concepts and great explanations of the more advanced topics. This book is a perfect match if you are starting up or investigating a backup solution for your virtual infrastructure.

I’ve been working with Veeam a bit over the last couple of months, and the book cleared up a few concepts for me. It also gave me ideas on how to get even more out of our current setup and what infrastructure is needed to make this happen. Thumbs up to Christian Mohn for this title, which will certainly be nearby when I’m messing around with the Veeam setup! 🙂

Here is a link to the publisher, Packt Publishing, where you can find it as an ebook or a physical copy:
http://www.packtpub.com/learning-veeam-backup-and-replication-for-vmware-vsphere/book

Virtual machine failing with STOP 0x0000007F error after upgrading VMware Tools

This came quite unexpectedly, and the machine also had a failing Office 2010 installation running twice on top of itself, so there were plenty of reasons why the machine could have failed after the reboot. After a lot of troubleshooting, I first upgraded the virtual hardware from version 4 to 7; then I could get into safe mode and run a check disk. This didn’t fix the problem, but afterwards I was able to get a look at the exact STOP error. That led me to the Microsoft support site, which said the cause could be either defective RAM or running the CPU at a higher clock speed than it is intended for.

Since this is a virtual machine, those causes suggested a couple of doable fixes. I set the CPU Reservation and Limit to the same number of megahertz (I’m not sure this pins the machine at the specified speed, but anyway). I also changed the RAM size, just to get that value rewritten in the .vmx file. After this the machine booted on the first try. The Office installation is still broken, but that’s got nothing to do with the STOP error.

Update: It seems the machine can still stall, but changing the RAM size makes it reboot fine…

VMware Update Manager errors

Today I wanted to upgrade an old ESX 3.5 host to the much newer ESX 4.1. I’ve done this through VMware Update Manager a couple of times over the last couple of days, but today showed a different side of VUM. It started with me not even being able to scan the host for updates; after quite a bit of troubleshooting, that was solved by re-installing the vCenter agent on the host. Then the next problem came: when I tried to remediate the upgrade to the host, it ran for just a minute or so, then failed and crashed the VUM service on the vCenter server… After another few hours of troubleshooting I found this in the VUM log directory:


'Activation' 5384 INFO] [activationValidator, 188] ManagedObjectNotFound. Leave Validate. Failed for vmodl.query.PropertyCollector.Filter.destroy on session private target: session[7BE206AA-16EA-4AD0-88F7-B8A0467D6800]95748830-9C71-4BBD-8488-B17934154A50
[2010-09-23 15:21:08:548 'Activation' 5384 INFO] [activationValidator, 188] ManagedObjectNotFound. Leave Validate. Failed for vmodl.query.PropertyCollector.Filter.destroy on session private target: session[7BE206AA-16EA-4AD0-88F7-B8A0467D6800]FB7A2330-9122-43C6-91F0-443012184115
[2010-09-23 15:21:08:551 'Activation.trace' 6252 DEBUG] [activationValidator, 1028]
------------------------------------------------------
Invoking logout on integrity.SessionManager:Integrity.SessionMgr session B4CAF89A-E6CB-4721-ADB3-10250114A699
------------------------------------------------------
[2010-09-23 15:21:08:551 'Activation.trace' 6252 DEBUG] [activationValidator, 1094] Throw vmodl.fault.SecurityError
Result:
(vmodl.fault.SecurityError) {
   dynamicType = <unset>,
   faultCause = (vmodl.MethodFault) null,
   msg = "",
}
[2010-09-23 15:21:08:551 'Activation' 6252 INFO] [activationValidator, 308] Leave Validate. No user logged in. Failed for integrity.SessionManager.logout on target: Integrity.SessionMgr
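Digging through a log like the excerpt above by eye is tedious, so here is a small Python sketch of how the entries could be pulled apart and filtered. It is only an illustration: it assumes every entry starts with a bracketed timestamp of the form shown above, the function names are my own, the keywords are simply the two faults that appear in this excerpt, and the sample text is an abbreviated copy of the excerpt rather than a full VUM log.

```python
import re

# Assumption: each entry begins with "[YYYY-MM-DD hh:mm:ss:mmm ",
# as in the excerpt above.
ENTRY_START = re.compile(r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{3} ")


def split_entries(blob):
    """Cut a blob of log text into entries at each timestamp."""
    starts = [m.start() for m in ENTRY_START.finditer(blob)]
    bounds = [0] + starts + [len(blob)]
    parts = (blob[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in parts if p]


def entries_matching(blob, keywords=("ManagedObjectNotFound", "SecurityError")):
    """Keep only the entries that mention one of the given keywords."""
    return [e for e in split_entries(blob) if any(k in e for k in keywords)]


# Abbreviated sample based on the excerpt above.
sample = (
    "'Activation' 5384 INFO] [activationValidator, 188] "
    "ManagedObjectNotFound. Leave Validate. "
    "[2010-09-23 15:21:08:548 'Activation' 5384 INFO] "
    "[activationValidator, 188] ManagedObjectNotFound. Leave Validate. "
    "[2010-09-23 15:21:08:551 'Activation' 6252 INFO] "
    "[activationValidator, 308] Leave Validate. No user logged in."
)

for entry in entries_matching(sample):
    print(entry)
```

Text before the first timestamp is kept as its own entry, so a line that was cut off at the start (like the first one above) is not lost.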


After searching for the error on vmware.com and plain google.com, I came across this post: http://www.techhead.co.uk/vmware-vsphere-update-manager-there-are-errors-during-the-remediation-operation-and-inaccessible-vms. It pointed me towards the storage. My case was not the same as in the post, as there were no VMs on the host and no orphaned VMs or anything, but I disabled the iSCSI adapter anyway, removed the dynamic discovery IPs and rebooted the server, after which the server could only see its local storage. Then the remediation worked on the first try, and approx. 20 minutes later I had an ESX 4.1 host up and running.

Installing OpenManage on a DELL 2900 ESX 3.5 server

Go to support.DELL.com

Find the server model by Service Tag -> Select Red Hat Enterprise Server 4 as OS -> Under System Administration -> Select

Download:

Then the file must be placed on the server. This must be on local storage and not a SAN volume, for example /var/log or /vmtools/ (using WinSCP).

Connect with PuTTY:

gunzip xxxx.tar.gz

tar -xvf xxxx.tar

sh linux/supportscripts/srvadmin-install.sh --express


sh linux/supportscripts/srvadmin-services.sh start

esxcfg-firewall -o 1311,tcp,in,OpenManageRequest

If the file fails at gunzip, it can be sent to a SAN volume and gunzipped there; afterwards the ***.tar file can be moved in PuTTY with:
mv "source" "destination", where destination is on local storage

After that, you can run the tar command and the subsequent ones as above, just in the local directory instead.

Posted via email from danevald’s posterous

Error when relocating virtual machine on VI ESX 3.5

I needed to move machines from one SAN LUN to another, but kept hitting an error where the vCenter server just says “Cannot connect to host” after I use the “Migrate” option, whether the guest machine is powered on or off. It seems to be waiting for some kind of timeout, as the error appears after approx. 5-10 minutes. The first ESX host is running 3.5.0 build 64607 and the other one is a lot newer, running 3.5.0 build 207095. The vCenter 4 server is running Update 1.

Luckily we have another ESX 3.5 server at the same location, so I tried adding the machine to the inventory under that other host. After this it was possible to move the machine. I then removed it from the inventory again, added it back to the original host and started it up.

I have not been able to find an explanation in any log files or configuration files, on the VMware community or anywhere else on the internet, but as far as I can see there are two possibilities. Either this version of ESX doesn’t support Storage vMotion, either “hot” or “cold”, or I am missing a setting somewhere in the configuration. The server was moved from one site to another, with a different IP segment, but I haven’t found many other things that don’t work, which I would expect if there were some major configuration error.