VMware All Flash VSAN Implementation (Home Lab)

I’ve been waiting for a while to be able to implement an all flash VSAN in my lab and now that VSAN 6.2 has been announced, I thought it would be time to upgrade my capacity disks from HDD’s to SSD’s and get cracking..! (note: despite the announcement, VSAN 6.2 binaries are NOT YET available to download. I’m hearing it would be available in a week or two on My VMware though so until then, mine is based on VSAN 6.1 – ESXi 6.0U1 binaries)

As I already had a normal (Hybrid) VSAN implementation using SSD+HDD in my management vSphere cluster, the plan was to keep the existing SSD’s as caching tier and replace the current HDD’s with high capacity SSD drives. So I bought 3 new Samsung 850 EVO 256GB drives from Amazon (here)                                       Capture

All Flash VSAN Setup

Given below are the typical steps involved in the processes to implement All Flash VSAN within the VMware cluster (I’m using the 3 node management cluster within my lab for the illustration below)

  1. Install the SSD drives in the server – This should be easy enough. If you are doing this in a production environment, you need to ensure that the capacity SSD’s (similar to all other components in your VSAN ready nodes)  are in the VMware HCL
  2. Enable VSAN on the cluster – Need to be done on the web client      1 - Enable VSAN
  3. Verify the new SSDs are available & recognised within the web client – All SSD’s are recognised as caching disks by default.              0 - Default disk assignment  2 - all caching
  4. Manually tag the required SSD drives as capacity disks VIA COMMANDLINE for them to be recognised as capacity disks within VSAN configuration – This step MUST be carried out using one the ways explained below and until then, SSD disks WILL NOT be available to be used as capacity disks within an all flash VSAN otherwise. (There currently is no GUI option on the web client to achieve this and cli must be used)
    1. Use esxcli command on each ESXi server
      1. SSH in to the ESXi server shell
      2. Use the vdq -q command to get the T10 SCSI name for the capacity SSD drive (Also verify “IsCapacityFlash” option is set to 0) 3 SSH
      3. Use the “esxcli vsan storage tag add -d <SCSI T10 name of the disk> -t capacityFlash” command to mark the disk as capacity SSD.   4 ESXCLI
      4. Use the vdq -q command to query the disk status and ensure the disk is now marked as “1” for “IsCapacityFlash” 5 esxcli verify
      5. If you now look at the Web client UI, the capacity SSD disk will now have been correctly identified as capacity (note the drive type changed to HDD which is somewhat misleading as the drive type is still SSD) 8.1 GUI
    2. Use the “VMware Virtual SAN All-Flash Configuration Utility” software – This is a 3rd party tool and not an officially supported VMware tool but if you do not want to manually SSH in to the ESXi servers 1 by 1, this software could be quite handy as you can bulk tag on many ESXi servers all at the same time. I’ve used this tool to tag the SSD’s in the next 2 servers of my lab in the illustration below. xx - Use VMware Virtual SAN all-flash configuration utility
  5. Verify capacity SSD across all hosts – Now that all the capacity SSD’s have been tagged as capacity disks, verify that the web client sees all capacity SSD’s across all hosts                                                9 Disk group manual
  6. Create the disk groups on each host – I’m opting to create this manually as shown below 9 Disk group manual 10 - Verify disk groups
  7. Verify the VSAN datastore now being available and accessible 11 - VSAN datastore active

There you have it. Implementing all flash VSAN requires manually tagging the SSDs as capacity SSDs for the time being and this is how you do it. I may also add that since the all flash VSAN, my storage performance has gone through the roof in my home lab which is great too. However this is all done on Whitebox hardware and not all of them are fully on VMware HCL….etc which makes those performance figures far from optimal. It would be really good to see performance statistics if you have deployed all flash VSAN in your production environment.

Cheers

Chan

 

 

vSphere Troubleshooting Commands & Tools – Summary

I’ve attended the vSphere Troubleshooting Workshop (5.5) this week at EMC Brentford (delivered by VMware Education) and found the whole course and the content covered to be a good refresher in to some key vSphere troubleshooting commands & tools available that I have used often when troubleshooting issues. And since they say sharing is caring,  (with an ulterior motive of documenting it all in one place for my own future reference too), I thought I would summarise the key commands and tools covered in the course, with some additional information all in one place for the easy of reference.

First of all, a brief intro to the course…,

The vSphere Troubleshooting course is not really a course per se, but more a workshop which consist of 2 aspects.

  • 30% Theory content – Mostly consist of quick reminders of each major component that makes up vSphere, their architecture and what can possibly go wrong in their configuration & operational life.

 

  • 70% Actual troubleshooting of vSphere issues – A large number of lab based exercises where you have to troubleshoot a number of deliberately created issues (issues simulating real life configuration issues. Note that performance issues are NOT part of this course). Before each lab, there’s a pre-configured powerCLI script you need to run (provided by VMware) which breaks / deliberately mis-configure something in a functioning vSphere environment and it is then your job to work out what was the root cause and fix it.  Another powerCLI script is run at the end will verify that you’ve addressed the correct root cause and fixed it properly (as VMware intended)

A little more about the troubleshooting itself first, during the lab exercises, you are encouraged to use any method necessary to fix the given issues such as command line, GUI  (web client), VMware KB articles….. But I found the best experience to be to try and stick to command line where possible, which turned out to be a very good way of giving myself a refresher on all the various command line tools and logs available within VMware vSphere,  yet I don’t get to use often in my normal day to day life. I attended this course primarily because its supposed to aid towards the preparation of the VMware VCAP-DCA certification I’m planning to take soon and if you are planning for the same, unless you are in a dedicated 2nd line or 3rd line VMware support role in your daily life where you are bound to know most of the commandlets by heart, I’d encourage you to attend this course too. It wont give you very many silver bullets when it comes to ordinary troubleshooting but it makes you work over and over again with some of the command line tools and logs you previously would have used very occasionally at best.  (for example, I learnt a lot about the various use of esxcli command in the course which was real handy. Before the course, I was aware of the esxcli command, and have used in few times to do couple of tasks but never looked at the whole hierarchy and their application to troubleshoot and fix various vSphere issues)

It may also be important to mention that there’s a dedicated lab on setting up SSL Certificates for communication between all key vSphere components (a very tedious task by the way) which some may find quite useful.

So, the aim of this post is to summarise some key commands covered within the course, in a easy to read hierarchical format which you can use for troubleshooting VMware vSphere configuration issues, all in one place. (if you are an expert of vSphere troubleshooting, I’d advice on taking a rain check on the rest of this post)

The below commands can be run in ESXi shell, vCLI, SSH session or within the vMA (vSphere Management Assistant – highly recommend that you deploy this and integrate it with your active directory)

  • Generic Commands Available

    • vSphere Management Assistant appliance – Recommended, safest way to execute commands
      • vCLI commands
        • esxcli-* commands
          • Primary set of commands to be used for most ESXi host based operations
          • VMware online reference
            • esxcli device 
              • Lists descriptions of device commands.
            • esxcli esxcli
              • Lists descriptions of esxcli commands.
            • esxcli fcoe
              • FCOE (Fibre Channel over Ethernet) commands
            • esxcli graphics
              • Graphics commands
            • esxcli hardware
              • Hardware namespace. Used primarily for extracting information about the current system setup.
            • esxcli iscsi
              • iSCSI namespace for monitoring and managing hardware and software iSCSI.
            • esxcli network
              • Network namespace for managing virtual networking including virtual switches and VMkernel network interfaces.
            • esxcli sched
              • Manage the shared system-wide swap space.
            • esxcli software
              • Software namespace. Includes commands for managing and installing image profiles and VIBs.
            • esxcli storage
              • Includes core storage commands and other storage management commands.
            • esxcli system
              • System monitoring and management command.
            • esxcli vm
              • Namespace for listing virtual machines and shutting them down forcefully.
            • esxcli vsan
              • Namespace for VSAN management commands. See the vSphere Storage publication for details.
        • vicfg-* commands
          • Primarily used for managing Storage, Network and Host configuration
          • Can be run against ESXi systems or against a vCenter Server system.
          • If the ESXi system is in lockdown mode, run commands against the vCenter Server
          • Replaces most of the esxcfg-* commands. A direct comparison can be found here
          • VMware online reference
            • vicfg-advcfg
              • Performs advanced configuration including enabling and disabling CIM providers. Use this command as instructed by VMware.
            • vicfg-authconfig
              • Manages Active Directory authentication.
            • vicfg-cfgbackupBacks up the configuration data of an ESXi system and
              • Restores previously saved configuration data.
            • vicfg-dns
              • Specifies an ESX/ESXi host’s DNS configuration.
            • vicfg-dumppart
              • Manages diagnostic partitions.
            • vicfg-hostops
              • Allows you to start, stop, and examine ESX/ESXi hosts and to instruct them to enter maintenance mode and exit from maintenance mode.
            • vicfg-ipsec
              • Supports setup of IPsec.
            • vicfg-iscsi
              • Manages iSCSI storage.
            • vicfg-module
              • Enables VMkernel options. Use this command with the options listed, or as instructed by VMware.
            • vicfg-mpath
              • Displays information about storage array paths and allows you to change a path’s state.
            • vicfg-mpath35
              • Configures multipath settings for Fibre Channel or iSCSI LUNs.
            • vicfg-nas
              • Manages NAS file systems.
            • vicfg-nics
              • Manages the ESX/ESXi host’s NICs (uplink adapters).
            • vicfg-ntp
              • Specifies the NTP (Network Time Protocol) server.
            • vicfg-rescan
              • Rescans the storage configuration.
            • vicfg-route
              • Lists or changes the ESX/ESXi host’s route entry (IP gateway).
            • vicfg-scsidevs
              • Finds available LUNs.
            • vicfg-snmp
              • Manages the Simple Network Management Protocol (SNMP) agent.
            • vicfg-syslog
              • Specifies the syslog server and the port to connect to that server for ESXi hosts.
            • vicfg-user
              • Creates, modifies, deletes, and lists local direct access users and groups of users.
            • vicfg-vmknic
              • Adds, deletes, and modifies virtual network adapters (VMkernel NICs).
            • vicfg-volume
              • Supports resignaturing a VMFS snapshot volume and mounting and unmounting the snapshot volume.
            • vicfg-vswitch
              • Adds or removes virtual switches or vNetwork Distributed Switches, or modifies switch settings.
        • vmware-cmd commands
          • Commands implemented in Perl that do not have a vicfg- prefix.
          • Performs virtual machine operations remotely including creating a snapshot, powering the virtual machine on or off, and getting information about the virtual machine.
          • VMware online reference
            • vmware-cmd <path to the .vmx file> <VM operations>
        • vmkfstools command
          • Creates and manipulates virtual disks, file systems, logical volumes, and physical storage devices on ESXi hosts.
          • VMware online reference
    • ESX shell / SSH
      • esxcli-* commandlets
        • Primary set of commands to be used for most ESXi host based operations
        • VMware online reference
          • esxcli device 
            • Lists descriptions of device commands.
          • esxcli esxcli
            • Lists descriptions of esxcli commands.
          • esxcli fcoe
            • FCOE (Fibre Channel over Ethernet) commands
          • esxcli graphics
            • Graphics commands
          • esxcli hardware
            • Hardware namespace. Used primarily for extracting information about the current system setup.
          • esxcli iscsi
            • iSCSI namespace for monitoring and managing hardware and software iSCSI.
          • esxcli network
            • Network namespace for managing virtual networking including virtual switches and VMkernel network interfaces.
          • esxcli sched
            • Manage the shared system-wide swap space.
          • esxcli software
            • Software namespace. Includes commands for managing and installing image profiles and VIBs.
          • esxcli storage
            • Includes core storage commands and other storage management commands.
          • esxcli system
            • System monitoring and management command.
          • esxcli vm
            • Namespace for listing virtual machines and shutting them down forcefully.
          • esxcli vsan
            • Namespace for VSAN management commands. See the vSphere Storage publication for details.
      • esxcfg-* commands (deprecated but still works on ESXi 5.5)
        • VMware online reference
      • vmkfstools command
        • Creates and manipulates virtual disks, file systems, logical volumes, and physical storage devices on ESXi hosts.
        • VMware online reference

 

  • Log File Locations

    • vCenter Log Files
      • Windows version
        • C:\Documents and settings\All users\Application Data\VMware\VMware VirtualCenter\Logs
        • C:\ProgramData\Vmware\Vmware VirtualCenter\Log
      • Appliance version
        • /var/log
      • VMware KB for SSO log files
    • ESXi Server Logs
      • /var/log (Majority of ESXi log location)
      • /etc/vmware/vpxa/vpxa.cfg (vpxa/vCenter agent configuration file)
      • VMware KB for all ESXi log file locations
      • /etc/opt/VMware/fdm (FDM agent files for HA configuration)
    • Virtual Machine Logs
      • /vmfs/volumes/<directory name>/<VM name>/VMware.log (Virtual machine log file)
      • /vmfs/volumes/<directory name>/<VM name>/<*.vmdk files> (Virtual machine descriptor files with references to CID numbers of itself and parent vmdk files if snapshots exists)
      • /vmfs/volumes/<directory name>/<VM name>/<*.vmx files> (Virtual machine configuration settings including pointers to vmdk files..etc>

 

  • Networking commands (used to identify and fix network configuration issues)

    • Basic network troubleshooting commands
    • Physical Hardware Troubleshooting
      • lspci -p
    • Traffic capture commands
      • tcpdump-uw
        • Works with all versions of ESXi
        • Refer to VMware KB for additional information
      • pktcap-uw
        • Only works with ESXi 5.5
        • Refer to VMware KB for additional information
    • Telnet equivilent
      • nc command (netcat)
        • Used to verify that you can reach a certain port on a destination host (similar to telnet)
        • Run on the esxi shell or ssh
        • Example:  nc -z <ip address of iSCSI server> 3260 check if the iSCSI port can be reached from esxi to iSCSI server
        • VMware KB article
    • Network performance related commands
      • esxtop (ESXi Shell or SSH) & resxtop (vCli) – ‘n’ for networking

 

  • Storage Commands (used to identify & fix vaious storage issues)

    • Basic storage commands
    • VMFS metadata inconsistencies
      • voma command (VMware vSphere Ondisk Metadata Analyser)
        • Example: voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxx:y (where y is the partition number)
        • Refer to VMware KB article for additional information
    • disk space utilisation
      • df command
    • Storage performance related commands
      • esxtop (ESXi Shell or SSH) & resxtop (vCli) – ‘n‘ for networking

 

  • vCenter server commands (used to identify & fix vCenter, SSO, Inventory related issues)

    • Note that most of the commands available here are Windows commands that can be used to troubleshoot these issues which I wont mention here. Only few key VMware vSphere specific commands are mentioned below instead.
    • SSO
      • ssocli command (C:\Program Files\VMware\Infrastructure\SSOServer\utils\ssocli)
    • vCenter
      • vpxd.exe command (C:\Program Files\VMware\Infrastructure\VirtualCenter Server\vpxd.exe)
      • vpxd

 

  • Virtual Machine related commands (used to identify & fix VM related issues)
    • Generic VM commands
      • vmware-cmd commands (vCLI only)
      • vmkfstools command
    • File locking issues
      • touch command
      • vmkfstools -D command
        • Example: vmkfstools -D /vmfs/volumes/<directory name>/<VM name>/<VM Name.vmdk> (shows the MAC address of the ESXi server with the file lock. it its locked by the same esxi server as where the command was run, ‘000000000000’ is shown)
      • lsof command (identifies the process locking the file)
        • Example: lsof | grep <name of the locked file>
      • kill command (kills the process)
        • Example: kill <PID>
      • md5sum command (used to calculate file checksums)

 

Please note that this post (nor the vSphere Troubleshooting Course) does NOT cover every single command available for troubleshooting different vSphere components but only cover a key subset of the commands that are usually required 90% of the time. Hopefully having them all in one place within this post would be handy for you to look them up. I’ve provided direct links to VMware online documentation for each command above so you can delve further in to each command.

Good luck with your troubleshooting work..!!

Command line rules….!!

Cheers

Chan