Virtual Machines (VMs) behaving abnormally after Storage Incident

Problem

When VMs experience underlying storage errors, they may stop behaving normally; for example, file systems may become read-only and the guest may stop responding.

On some Linux-based servers in particular, file systems are commonly configured to remount in read-only mode when errors are detected. The same behavior is expected in a native Linux environment.
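
To confirm this condition from inside a Linux guest, one option is to scan the mount table for file systems whose options include `ro`. The following is a minimal sketch, assuming a guest that exposes `/proc/mounts`. The function name `list_ro_mounts` is hypothetical, and some mounts (for example squashfs or ISO images) are read-only by design, so a match does not by itself indicate a storage error.

```shell
#!/bin/sh
# list_ro_mounts: print mount points whose option list contains "ro".
# The optional argument is a mounts table in /proc/mounts format
# (device, mount point, type, options, ...); it defaults to /proc/mounts.
# Usage: list_ro_mounts [mounts-file]
list_ro_mounts() {
    awk '{
        # Field 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") { print $2; break }
        }
    }' "${1:-/proc/mounts}"
}
```

Calling `list_ro_mounts` with no argument reads the live mount table. A root or data file system that appears in the output after a storage incident is a candidate for the recovery steps in the Solution section.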

Symptoms

Below are some common symptoms that VMs may exhibit after recovering from underlying storage problems.

Blue screens (older versions of Windows)
VM state is “powered on” but VMware Tools status shows “Not Running”
Unable to log in to the OS even with correct credentials
CPU utilization pegged at 100%

Solution

vSphere Client

If you are experiencing any of the symptoms above, try Reset VM (not “Restart Guest”) from the vSphere Client menu and check whether the problems are resolved.

“Restart Guest” may not work because it requires VMware Tools to be up and running.

VSS CLI

Using https://vss-cli.eis.utoronto.ca or a local VSS CLI install:

1. Look up the VM UUID:
   vss-cli compute vm ls -f name=<vm-name>
2. Submit a request to reboot the VM (requires VMware Tools to be running), or to reset (power cycle) it if it is not responsive:
   vss-cli compute vm set <uuid> state reset|reboot
   Host Name: hostname (Ubuntu Linux (64-bit))
   IP Address: 172.17.0.1, 192.168.2.31
   Are you sure you want to change the state from "running to reset" of the above VM? [y/N]:
3. Verify that the VM issues have been resolved.
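
The state change in the steps above can be wrapped in a small helper that validates the requested action before calling the CLI. This is a sketch only: the function name `vm_power_action` and the `VSS_BIN` variable are hypothetical, and it assumes the `compute vm set <uuid> state reset|reboot` subcommand shown in the steps, invoked through the `vss-cli` executable used for the lookup. The CLI still asks for confirmation before changing the state.

```shell
#!/bin/sh
# VSS_BIN: CLI executable to call (hypothetical override; defaults to vss-cli).
VSS_BIN="${VSS_BIN:-vss-cli}"

# vm_power_action <uuid> <reset|reboot>
#   reboot: guest restart, requires VMware Tools to be running
#   reset:  power cycle, for a VM that is not responsive
vm_power_action() {
    uuid="$1"
    action="$2"
    if [ -z "$uuid" ]; then
        echo "usage: vm_power_action <uuid> <reset|reboot>" >&2
        return 2
    fi
    case "$action" in
        reset|reboot) ;;
        *)
            # Reject anything other than the two actions described above.
            echo "unsupported action: $action" >&2
            return 2
            ;;
    esac
    "$VSS_BIN" compute vm set "$uuid" state "$action"
}
```

For example, `vm_power_action <uuid> reset` power cycles a VM that no longer responds, while `vm_power_action <uuid> reboot` is the gentler choice when VMware Tools is still running.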

VSS Portal

The VSS Portal only offers a way to power cycle the virtual machine. To do so, please do the following:

1. Log in to https://vss-portal.eis.utoronto.ca
2. Search for the VM to change and click on the button.
3. Toggle the "Power Status" button to OFF.
4. Click on Save.
5. Wait a few seconds and repeat steps 2 to 4, but this time toggle the button to ON.
6. Verify that the VM issues are resolved.
