VMware vRealize Automation part 2 – Deployment Architecture – Dedicated Management Cluster

Next: vRA Part 3 – vRA Appliance Deployment –>

Having had a look at the vRA support matrix, the next point to consider in a typical vRA deployment is the deployment architecture, which I’ll briefly explain below.

vRA is part of the VMware product set that recommends the use of a dedicated management cluster (along with vCD and NSX). The concept is that a dedicated management cluster isolates all the VMs that make up the management infrastructure, such as Active Directory VMs, vCenter & SQL server VMs, monitoring VMs…etc. This gives them a separate execution context from the virtual machines that provide end-user accessible resources, in other words the compute / production VMs that actually run the business critical workloads. Such a separation inherently provides a number of benefits to an enterprise.

  • Security & isolation of management workloads
  • Elimination of resource (and otherwise) contention between management & production workloads.
  • DR and Business Continuity without replicating unnecessary management components

An example would look like below

[Figure: Example management cluster]

Within a typical vRA deployment, the management cluster would host the following vRA components

  • vRA UI appliance (if using a distributed high availability deployment model, the vRA appliance cluster, PostgreSQL cluster & load balancers)
  • vRA Identity appliance (or, if using vSphere SSO, the vSphere SSO server/s)
  • IAAS Windows VMs (if using a distributed high availability deployment model, IAAS web servers, Model Manager web servers, MS SQL DB cluster, DEM Orchestrators, DEM Workers & Agents and the required load balancers)
  • vRO appliance (if using a distributed high availability deployment model, vRO cluster and the backend SQL DB cluster with relevant load balancers)

During the configuration of the IAAS components, vRA connects to various endpoints (such as a vCenter server instance that manages a number of resource clusters). Once an endpoint such as a vCenter instance is connected, a Fabric Administrator creates resource reservations for each cluster managed by that vCenter instance. Once these reservations are created, vRA effectively assumes complete control over those clusters (the resource reservations within them) and uses them as it sees fit. This can present problems if you run your management infrastructure VMs (such as the vCenter server and vRA appliances…etc.) in one of those same clusters, because vRA does not take into account the existence of other VMs in the cluster that it did not create itself. vRA could then deploy VMs (based on IAAS requests from users) that eat into the resources available to the management VMs, potentially affecting the performance of both the management VMs and the production VMs (created by vRA based on blueprints). It is therefore typically recommended that you keep all resource / compute clusters separate from the vRA management VMs and under the full control of vRA itself (no manual creation of VMs in the resource clusters).

If you have an existing vCloud Director or NSX deployment, you may already have a dedicated management ESXi cluster in place, as these products make it a mandatory requirement. However, even if you don’t and are considering a vRA deployment, I would highly encourage you to have a dedicated management cluster to host the vRA infrastructure components.

An example high level design where vRA along with VMware NSX is deployed using a Management cluster could look like below.

[Figure: Example high level design architecture]

 

Next: vRA Part 3 – vRA Appliance Deployment –>

VMware vRealize Automation Part 1 – vRA Support Matrix

Next: vRA part 2 – Deployment Architecture – Dedicated Management Cluster –>

The first step in deploying vRealize Automation, in anyone’s book, should be to refer to the support matrix PDF on the VMware web site. There are a number of strict support limitations which you must be aware of, and all the key information you need can be found within this document.

I’d encourage you to read the document for complete support details (and stay up to date with newer versions too) but given below is a high level summary of some key contents (based on the current vRA release of 6.2.1).

  • vRA IAAS server
    • Host OS (for IAAS components) – W2k8R2, W2k12 & W2k12R2 only (note that Windows 2008 is NOT supported)
    • IAAS DB: SQL 2008 R2 SP3 or higher (up to SQL 2014)
    • Web Server (for IAAS model manager…etc.): IIS 2008 R2 & IIS 2012 only

 

  • vRA Appliance
    • DB Support: vPostgres Appliance 9.2.4 / 9.2.9.x / 9.3.5.x, PostgreSQL 9.2.4 / 9.2.6 / 9.3.4
    • SSO / Authentication sources: vRA Identity Appliance v6.2, vSphere SSO 5.5 1b or above (up to PSC 1.0 with vSphere 6.0)

 

  • Hypervisor Support (for the vRA Hypervisor proxy agent):
    • VMware: ESX/ESXi 4.1 up to U2, ESXi 5.0 onwards (including ESXi 6.0) – note that Application Director only works with vSphere and NOT other hypervisors
    • Red Hat: KVM (RHEV) 3.1 only
    • Microsoft: Hyper-V 2008 R2 SP1 onwards (inc 2012 R2)
    • Citrix: XenServer 5.6 through to SP2, 6.0.2 & 6.2 through to SP1

 

  • Hypervisor management platform support (for vRA proxy agent and DEM worker compatibility)
    • VMware: vCenter 4.1 through to U2, vCenter 5.0 U3 onwards (till vCenter 6.0)
    • Microsoft: SCVMM 2012 (Hyper-V) only
    • Red Hat: RHEV-Manager 3.1 / 3.3

 

  • Network Virtualisation support
    • VMware vCNS 5.5.3 only, NSX 6.1 and above (up to 6.1.3)

 

  • Cloud Support (IAAS Endpoint compatibility)
    • VMware: vCD 5.1.x & 5.5.x, vCloud Air
    • Amazon: AWS
    • (Note that Azure is NOT supported as a cloud endpoint)

 

  • Image Deployment Methods (IAAS)
    • Microsoft: SCCM 2012 & SCVMM 2012 only, Windows WinPE & WIM imaging
    • NetApp: FlexClone on Data ONTAP 7.3.1.1, 8.0.1 & 8.1 (Note that this doesn’t state whether it’s cDOT or 7-Mode. Also, the latest ONTAP version, 8.3, is NOT supported yet)
    • BMC: Blade Logic Operations Manager 7.6 & 8.2
    • HP: Software Server Automation 7.7
    • Citrix: Provisioning Server 6.0 & 6.1
    • Linux: Red Hat Linux kickstart, SUSE AutoYaST
    • PXE boot

 

  • Guest OS
    • Microsoft: Windows 7, 8, 8.1, W2K8R2, W2K12 & W2K12R2
    • Red Hat: RHEL 5.9, 5.10, 6.1, 6.4, 6.5, 7.0
    • SUSE: SLES 11 SP2 & SP3
    • CentOS: CentOS 5.10, 6.4, 6.5, 7.0
    • Debian: 6 & 7.0
    • Ubuntu: 12.04 LTS & 13.10
    • Oracle: Oracle Enterprise Linux
    • VMware: ESX/i 4.1 U2, ESXi 5.1 and above (up to ESXi 6.0)

 

  • VDI Connection Broker support
    • Citrix: XenDesktop 5.5 and above (up to 7.6.x)
    • VMware: Horizon View 6.x only

 

  • Task Automation Engines / Scripting support
    • VMware: vCO 5.5.1 and above (up to vRO 6.0)
    • Microsoft: PowerShell 2.0

 

Next: vRA part 2 – Deployment Architecture – Dedicated Management Cluster –>

7. NSX L2 Bridging

Next Article: ->

NSX L2 bridging is used to bridge a VXLAN to a VLAN, enabling direct Ethernet connectivity between VMs in a logical switch and VMs in a VLAN-backed distributed port group (or between VMs and physical devices, through an uplink to the external physical network). This provides direct L2 connectivity rather than the L3 routed connectivity that could otherwise be achieved by attaching the VXLAN network to an internal interface of the DLR and attaching the VLAN tagged port group to another interface to route the traffic. While the use cases for this may be limited, it can be handy during P2V migrations where direct L2 access from the physical network to the VM network is required. Given below are some key points to note during design.

  • L2 bridging is enabled in a DLR using the “Bridging” tab (DLR is a pre-requisite)
  • Only VXLAN and VLAN bridging is supported (no VLAN & VLAN or VXLAN & VXLAN bridging)
  • All participants of the VXLAN and VLAN bridge must be in the same datacentre
  • Once configured on the DLR, the actual bridging takes place on the specific ESXi server, designated as the bridge instance (usually the host where DLR control VM runs).
  • If the ESXi host acting as the bridge instance fails, NSX controller will move the role to a different server and pushes a copy of the MAC table to the new bridge instance to keep it synchronised.
  • L2 bridging and distributed routing cannot be enabled on the same logical switch at present (meaning the VMs attached to the logical switch cannot use the DLR as their default gateway)
  • Bridge instances are limited to the throughput of a single ESXi server.
    • Since, for each bridge, the bridging happens on a single ESXi server, all the related traffic is hair-pinned to that server
    • Therefore, if deploying multiple bridges, it’s better to use multiple DLRs with the control VMs spread across multiple ESXi servers, to get aggregate throughput from multiple bridge instances
  • VXLAN & VLAN port groups must be on the same distributed virtual switch
  • Bridging to a VLAN id of 0 is NOT supported (similar to an uplink interface not being able to be mapped to a dvPG with no VLAN tag)

 

  • Given below is an illustration of the packet flow during an ARP request from the VXLAN to a physical device
    • 1. The ARP request from VM1 comes to the ESXi host with the IP address of a host on the physical network
    • 2. The ESXi host does not know the destination MAC address. So the ESXi host contacts NSX Controller to find the destination MAC address
    • 3. The NSX Controller instance is unaware of the MAC address. So the ESXi host sends a broadcast to the VXLAN segment 5001
    • 4. All ESXi hosts on the VXLAN segment receive the broadcast and forward it up to their virtual machines
    • 5. VM2 receives the request (because it is a broadcast) but, as it is not the intended destination, disregards and drops the frame
    • 6. The designated instance receives the broadcast
    • 7. The designated instance forwards the broadcast to VLAN 100 on the physical network
    • 8. The physical switch receives the broadcast on the VLAN 100 and forwards it out to all ports on VLAN 100 including the desired destination device.
    • 9. The Physical server responds

 

  • Given below is an illustration of the packet flow during the ARP response to the above, from the physical device in the VLAN to the VM in the VXLAN
    • 1. The physical host creates an ARP response for the machine. The source MAC address is the physical host’s MAC and the destination MAC is the virtual machine’s MAC address
    • 2. The physical host puts the frame on the wire
    • 3. The physical switch sends the packet out of the port where the ARP request originated
    • 4. The frame is received by the bridge instance
    • 5. The bridge instance examines the MAC address table, sends the packet to the VNI that contains the virtual machine’s MAC address, and sends the frame. The bridge instance also stores the MAC address of the physical server in the MAC address table
    • 6. The ESXi host receives the frame and stores the MAC address of the physical server in its own local MAC address table.
    • 7. The virtual machine receives the frame
  • Given below is an illustration of the packet flow from the VM to the physical server / device, after the initial ARP request has been resolved (above)
    • 1. The virtual machine sends a packet destined for the physical server
    • 2. The ESXi host locates the destination MAC address in its MAC address table
    • 3. The ESXi host sends the traffic to the bridge instance
    • 4. The bridge instance receives the packet and locates the destination MAC address
    • 5. The bridge instance forwards the packet to the physical network
    • 6. The switch on the physical server receives the traffic and forwards the traffic to the physical host.
    • 7. The physical host receives the traffic.
  • Given below is an illustration of the packet flow during an ARP request from the physical network (VLAN) to a VM on the VXLAN
    • 1. An ARP request is received from the physical server on the VLAN that is destined for a virtual machine on the VXLAN through broadcast
    • 2. The frame is sent to the physical switch where it is forwarded to all ports on VLAN 100
    • 3. The ESXi host receives the frame and passes it up to the bridge instance
    • 4. The bridge instance receives the frame and looks up the destination IP address in its MAC address table
    • 5. Because the bridge instance does not know the destination MAC address, it sends a broadcast on VXLAN 5001 to resolve the MAC address
    • 6. All ESXi hosts on the VXLAN receive the broadcast and forward the frame to their virtual machines
    • 7. VM2 drops the frame, but VM1 sends an ARP response

Deployment of the L2 bridge is pretty easy and given below are the high level steps involved (unfortunately, I cannot provide screenshots due to the known bug on NSX for vSphere 6.1.x, as documented in VMware KB article 2099414).

 

Prerequisites

  • An NSX logical router must be deployed in the environment.

 

High level deployment steps involved

  1. Log in to the vSphere Web Client.
  2. Click Networking & Security and then click NSX Edges.
  3. Double-click an NSX Edge.
  4. Click Manage and then click Bridging.
  5. Click the Add icon.
  6. Type a name for the bridge.
  7. Select the logical switch that you want to create a bridge for.
  8. Select the distributed virtual port group that you want to bridge the logical switch to.
  9. Click OK.
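
If you’d rather verify the result outside the Web Client, the bridge configuration can also be read back from the NSX Manager REST API. Treat the below as a rough sketch only: the NSX Manager address, credentials and edge ID (edge-1) are placeholders, and the bridging endpoint path is based on my reading of the NSX-v 6.x API guide, so do check it against your own environment.

    # Read the L2 bridging configuration of the DLR back from NSX Manager
    curl -k -u 'admin:<password>' https://<nsx-manager>/api/4.0/edges/edge-1/bridging/config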

Hope these make sense. In the next post of the series, we’ll look at other NSX edge services gateway functions available.

The slide credit goes to VMware….!!

Thanks

Chan

 

Next Article: ->

6. NSX Distributed Logical Router

Next: 7. NSX L2 Bridging ->

In the previous article of this VMware NSX tutorial, we looked at the VXLAN & logical switch deployment within NSX. It’s now time to look at another key component of NSX: the Distributed Logical Router, also known as the DLR.

VMware NSX provides the ability to route traffic (between two different L2 segments, for example) within the hypervisor, without ever having to send the packet out to a physical router. (It has always been the case with traditional vSphere solutions that, for example, if the application server VM in VLAN 101 needs to talk to the DB server VM in VLAN 102, the packet needs to go out of the VLAN 101 tagged port group via the uplink ports to the L3 enabled physical switch, which performs the routing and sends the packet back to the VLAN 102 tagged port group, even if both VMs reside on the same ESXi server.) This new ability to route within the hypervisor is made available as follows

  • East-West routing = Using the Distributed Logical Router (DLR)
  • North-South routing = NSX Edge Gateway device

This post aims to summarise the DLR architecture, its key points and its deployment.

DLR Architectural Highlights

  • DLR data plane capability is provided to each ESXi host during the host preparation stage, through the deployment of the DLR VIB component (as explained in a previous article of this topic)
  • DLR control plane capability is provided through a dedicated control virtual machine, deployed during the DLR configuration process by the NSX manager.
    • This control VM may reside on any ESXi host in the compute & edge clusters.
    • Due to the separation of data and control planes, a failure of the control VM doesn’t affect routing operations (no new routes are learnt during the unavailability, however)
    • Control VM doesn’t perform any routing
    • Can be deployed in high availability mode (active VM and a passive control VM)
    • Its main function is to establish routing protocol sessions with other routers
    • Supports OSPF and BGP routing protocols for dynamic route learning (within the virtual as well as physical routing infrastructure) as well as static routes which you can manually configure.
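
As a quick sanity check of the data plane described above, the DLR kernel module on an NSX-prepared host can be queried directly from the ESXi shell. This is a minimal sketch, assuming the net-vdr utility installed with the NSX VIBs (output format varies slightly between NSX versions):

    # List the DLR instances that have been pushed down to this ESXi host
    net-vdr --instance -l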

 

  • Management, Control and Data path communication looks like below

[Figure: Management, control and data path communication]

  • Logical Interfaces (LIF)
    • A single DLR instance can have a number of logical interfaces (LIF) similar to a physical router having multiple interfaces.
    • A DLR can have up to 1000 LIFs
    • A LIF connects to Logical Switches (VXLAN virtual wire) or distributed portgroups (tagged with a vlan)
      • VXLAN LIF – Connected to a NSX logical switch
        • A virtual MAC address (vMAC) assigned to the LIF is used by all the VMs that connect to that LIF as their default gateway MAC address, across all the hosts in the cluster.
      • VLAN LIF – Connected to a distributed portgroup with one or more vlans (note that you CANNOT connect to a dvPortGroup with no vlan tag or vlan id 0)
        • A physical MAC address (pMAC) assigned to an uplink through which the traffic flows to the physical network is used by the VLAN LIF.
        • Each ESXi host will maintain a pMAC for the VLAN LIF at any point in time, but only one host responds to ARP requests for the VLAN LIF and this host is called the designated host
          • Designated host is chosen by the NSX controller
          • All incoming traffic (from the physical world) to the VLAN LIF is received by the designated instance
          • All outgoing traffic from the VLAN LIF (to the physical world) is sent directly from the originating ESXi server rather than through the designated host.
        • One designated instance (an ESXi server) exists per VLAN LIF, and it answers ARP requests for that LIF on behalf of all the hosts in the cluster
    • The LIF configuration is distributed to each host
    • An ARP table is maintained per each LIF
    • DLR can route between 2 VXLAN LIFs (web01 VM on VNI 5001 on esxi01 server talking to app01 VM on VNI 5002 on the same or different ESXi hosts) or between physical subnets / VLAN LIFs (web01 VM on VLAN 101 on esxi01 server talking to app01 VM on VLAN 102 on the same or different ESXi hosts)
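
The per-host LIF information described above can be inspected in the same way. Again a hedged sketch using the net-vdr utility, where the instance name is whatever the previous command reports (default+edge-1 is just a typical example):

    # List the LIFs configured on a given DLR instance
    net-vdr --lif -l default+edge-1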

 

  • DLR deployment scenarios
    • One tier deployment with an NSX edge
    • Two tier deployment with an NSX edge

 

  • DLR traffic flows – Same host
    • 1. VM1 on VXLAN 5001 attempts to communicate with VM2 on VXLAN 5002 on the same host
    • 2. VM1 sends a frame with L3 IP on the payload to its default gateway. The default gateway uses the destination IP to determine that it is directly connected to that subnet.
    • 3. The default gateway checks its ARP table and sees the correct MAC for that destination
    • 4. VM2 is running on the same host. Default gateway passes the frame to VM2, packet never leaves the host.

 

  • DLR traffic flow – Different hosts
    • 1. VM1 on VXLAN 5001 attempts to communicate with VM2 on VXLAN 5002 on a different host. Since the VM2 is on a different host, VM1 sends the frame to the default gateway
    • 2. The default gateway sends the traffic to the router and the router determines that the destination IP is on a directly connected interface
    • 3. The router checks its ARP table to obtain the MAC address of the destination VM but the MAC is not listed. The router sends the frame to the logical switch for VXLAN 5002.
    • 4. The source and destination MAC addresses on the internal frame are changed. So the destination MAC is the address for the VM2 and the source MAC is the vMAC LIF for that subnet. The logical switch in the source host determines that the destination is on host 2
    • 5. The logical switch puts the Ethernet frame in a VXLAN frame and sends the frame to host 2
    • 6. Host 2 takes out the L2 frame, looks at the destination MAC and delivers it to the destination VM.

DLR Deployment Steps

Given below are the key deployment steps.

 

  1. Go to vSphere Web Client -> Networking & Security and click on NSX Edges in the left hand pane. Click the plus sign at the top, select Logical Router, provide a name & click next.
  2. Now provide the CLI (SSH) username and password. Note that the password here needs to be a minimum of 12 characters and must include a special character. Click on enable SSH to be able to putty on to it (note that you also need to enable the appropriate firewall rules later, without which SSH won’t work). Enabling HA will deploy a second VM as a standby. Click next.
  3. Select the cluster location and the datastore where the NSX Edge appliances (which provide the DLR capabilities) will be deployed. Note that both appliances will be deployed on the same datastore; if you require that they be placed on different datastores for HA purposes, you’d need to Storage vMotion one to a different datastore manually.
  4. In the next screen, you configure the interfaces (LIFs – explained above). There are two main types of interface here. The management interface is used to manage the NSX Edge device (i.e. SSH on to it) and is usually mapped to the management network. All other interfaces are mapped to either VXLAN networks or VLAN backed portgroups. First, we create the management interface by connecting it to the management distributed port group and providing an IP address on the management subnet for this interface.
  5. Then, click the plus sign at the bottom to create & configure the other interfaces, used to connect to VXLAN networks or VLAN tagged port groups. These interfaces fall into two types: uplink interfaces and internal interfaces. An uplink interface connects the DLR / Edge device to an external network, and often this would be connected to one of the “VM Network” portgroups to connect the internal interfaces to the outside world. Internal interfaces are typically mapped to NSX virtual wires (VXLAN networks) or a dvPortGroup. Below, we create two internal interfaces and map them to the two VXLAN networks called App-Tier and Web-Tier (created in a previous post of this series during the VXLAN & logical switch deployment). For each interface you create, an interface subnet must be specified along with an IP address for the interface (often this would be the default gateway IP address for all the VMs belonging to the VXLAN network / dvPortGroup mapped to that interface). In total we create 3 different interfaces, as below.
    1. Uplink interface – This uplink interface maps to the “VM network” dvPortGroup and provides external connectivity from the internal interfaces of the DLR to the outside world. It has an IP address from the external network IP subnet and is reachable from the outside world using this IP (once the appropriate firewall rules are in place). Note that this dvPG needs to have a VLAN tag other than 0 (a VLAN ID must be defined on the connected portgroup)
    2. We then create 2 internal interfaces, one for the Web-Tier (192.168.2.0/24) and another for the App-Tier (192.168.1.0/24). The interface IP would be the default gateway for the VMs on each network.
  6. Once all 3 interfaces are configured, verify the settings and click next.
  7. The next screen allows you to configure a default gateway. If the uplink interface has been configured correctly, it would be selected as the vNIC here and the gateway IP would be the default gateway of the external network. In my example, I’m not configuring this as I do not need my VM traffic (on the VXLAN networks) to go out to the outside world.
  8. In the final screen, review all settings and click finish for the NSX DLR (edge devices) to be deployed as appliances. These are the control VMs referred to earlier in the post.
  9. Once the appliances have been deployed on to the vSphere cluster (compute & edge clusters), you can see the Edge devices under the NSX Edges section.
  10. You can double click on the edge device to go to its configuration details.
  11. You can make further configurations here, including adding additional interfaces or removing existing ones…etc.
  12. By default, all NSX Edge devices contain a built in firewall which blocks all traffic due to a global deny rule. If you need to be able to ping the management address / external uplink interface address, or putty in to the management IP from the outside network, you’d need to enable the appropriate firewall rules within the firewall section of the DLR.
  13. That is it. You now have a DLR with 3 interfaces deployed as follows
    1. Uplink interface – Connected to the external network using a VLAN tagged dvPortGroup
    2. Web-Interface (internal) – An internal interface connected to a VXLAN network (virtual wire) where all the Web server VMs on IP subnet 192.168.2.0/24 reside. The interface IP of 192.168.2.1 is set as the default gateway for all the VMs on this network
    3. App-Interface (internal) – An internal interface connected to a VXLAN network (virtual wire) where all the App server VMs on IP subnet 192.168.1.0/24 reside. The interface IP of 192.168.1.1 is set as the default gateway for all the VMs on this network
  14. The App VMs and Web VMs could not communicate with each other before, as there was no way to route between the 2 networks. Once the DLR has been deployed and connected to the interfaces listed above, each VM can now talk to the VMs on the other subnet (a quick way to verify this is sketched below).
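
As mentioned in step 14, verifying the end result is straightforward. A minimal sketch, assuming a Web-Tier VM and an App-Tier VM with hypothetical addresses 192.168.2.10 and 192.168.1.10 on the subnets used above:

    # From the Web-Tier VM: confirm the DLR internal interface (its default gateway) and an App-Tier VM respond
    ping 192.168.2.1
    ping 192.168.1.10

    # From the ESXi shell: confirm both connected subnets appear in the DLR instance's route table
    # (net-vdr assumed present on NSX-prepared hosts)
    net-vdr --route -l <dlr-instance-name>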

 

That’s it, and it’s as simple as that. Now obviously you can configure these DLRs to use dynamic routing via OSPF or BGP…etc. should you deploy them in an enterprise network with external connectivity, which I’m not going to go in to, but the above should give you a decent, high level understanding of how to deploy the DLR and get things going to begin with.

In the next post, we’ll look at Layer 2 bridging.

Slide credit goes to VMware…!!

Cheers

Chan

Next: 7. NSX L2 Bridging ->

 

vSphere Troubleshooting Commands & Tools – Summary

I attended the vSphere Troubleshooting Workshop (5.5) this week at EMC Brentford (delivered by VMware Education) and found the whole course and the content covered to be a good refresher on some key vSphere troubleshooting commands & tools that I have often used when troubleshooting issues. And since they say sharing is caring (with an ulterior motive of documenting it all in one place for my own future reference too), I thought I would summarise the key commands and tools covered in the course, with some additional information, all in one place for ease of reference.

First of all, a brief intro to the course…,

The vSphere Troubleshooting course is not really a course per se, but more a workshop which consists of 2 aspects.

  • 30% Theory content – Mostly consists of quick reminders of each major component that makes up vSphere, their architecture and what can possibly go wrong in their configuration & operational life.

 

  • 70% Actual troubleshooting of vSphere issues – A large number of lab based exercises where you have to troubleshoot a number of deliberately created issues (issues simulating real life configuration problems; note that performance issues are NOT part of this course). Before each lab, there’s a pre-configured PowerCLI script you need to run (provided by VMware) which deliberately breaks / misconfigures something in a functioning vSphere environment, and it is then your job to work out the root cause and fix it. Another PowerCLI script run at the end verifies that you’ve addressed the correct root cause and fixed it properly (as VMware intended).

A little more about the troubleshooting itself: during the lab exercises, you are encouraged to use any method necessary to fix the given issues, such as the command line, the GUI (web client) or VMware KB articles. I found the best approach was to try and stick to the command line where possible, which turned out to be a very good way of giving myself a refresher on the various command line tools and logs available within VMware vSphere that I don’t get to use often in my normal day to day work. I attended this course primarily because it’s supposed to aid preparation for the VMware VCAP-DCA certification, which I’m planning to take soon. If you are planning for the same, unless you are in a dedicated 2nd or 3rd line VMware support role where you are bound to know most of the commands by heart, I’d encourage you to attend this course too. It won’t give you many silver bullets when it comes to ordinary troubleshooting, but it makes you work over and over again with some of the command line tools and logs you previously would have used only occasionally at best. (For example, I learnt a lot about the various uses of the esxcli command, which was really handy. Before the course I was aware of esxcli and had used it a few times for a couple of tasks, but had never looked at the whole hierarchy and its application to troubleshooting and fixing various vSphere issues.)

It may also be important to mention that there’s a dedicated lab on setting up SSL Certificates for communication between all key vSphere components (a very tedious task by the way) which some may find quite useful.

So, the aim of this post is to summarise some key commands covered within the course, in an easy to read, hierarchical format which you can use for troubleshooting VMware vSphere configuration issues, all in one place. (If you are an expert in vSphere troubleshooting, I’d advise taking a rain check on the rest of this post.)

The below commands can be run in the ESXi shell, vCLI, an SSH session or within the vMA (vSphere Management Assistant – I highly recommend that you deploy this and integrate it with your Active Directory).

  • Generic Commands Available

    • vSphere Management Assistant appliance – Recommended, safest way to execute commands
      • vCLI commands
        • esxcli-* commands
          • Primary set of commands to be used for most ESXi host based operations
          • VMware online reference
            • esxcli device 
              • Lists descriptions of device commands.
            • esxcli esxcli
              • Lists descriptions of esxcli commands.
            • esxcli fcoe
              • FCOE (Fibre Channel over Ethernet) commands
            • esxcli graphics
              • Graphics commands
            • esxcli hardware
              • Hardware namespace. Used primarily for extracting information about the current system setup.
            • esxcli iscsi
              • iSCSI namespace for monitoring and managing hardware and software iSCSI.
            • esxcli network
              • Network namespace for managing virtual networking including virtual switches and VMkernel network interfaces.
            • esxcli sched
              • Manage the shared system-wide swap space.
            • esxcli software
              • Software namespace. Includes commands for managing and installing image profiles and VIBs.
            • esxcli storage
              • Includes core storage commands and other storage management commands.
            • esxcli system
              • System monitoring and management command.
            • esxcli vm
              • Namespace for listing virtual machines and shutting them down forcefully.
            • esxcli vsan
              • Namespace for VSAN management commands. See the vSphere Storage publication for details.
        • vicfg-* commands
          • Primarily used for managing Storage, Network and Host configuration
          • Can be run against ESXi systems or against a vCenter Server system.
          • If the ESXi system is in lockdown mode, run commands against the vCenter Server
          • Replaces most of the esxcfg-* commands. A direct comparison can be found here
          • VMware online reference
            • vicfg-advcfg
              • Performs advanced configuration including enabling and disabling CIM providers. Use this command as instructed by VMware.
            • vicfg-authconfig
              • Manages Active Directory authentication.
            • vicfg-cfgbackup
              • Backs up the configuration data of an ESXi system and restores previously saved configuration data.
            • vicfg-dns
              • Specifies an ESX/ESXi host’s DNS configuration.
            • vicfg-dumppart
              • Manages diagnostic partitions.
            • vicfg-hostops
              • Allows you to start, stop, and examine ESX/ESXi hosts and to instruct them to enter maintenance mode and exit from maintenance mode.
            • vicfg-ipsec
              • Supports setup of IPsec.
            • vicfg-iscsi
              • Manages iSCSI storage.
            • vicfg-module
              • Enables VMkernel options. Use this command with the options listed, or as instructed by VMware.
            • vicfg-mpath
              • Displays information about storage array paths and allows you to change a path’s state.
            • vicfg-mpath35
              • Configures multipath settings for Fibre Channel or iSCSI LUNs.
            • vicfg-nas
              • Manages NAS file systems.
            • vicfg-nics
              • Manages the ESX/ESXi host’s NICs (uplink adapters).
            • vicfg-ntp
              • Specifies the NTP (Network Time Protocol) server.
            • vicfg-rescan
              • Rescans the storage configuration.
            • vicfg-route
              • Lists or changes the ESX/ESXi host’s route entry (IP gateway).
            • vicfg-scsidevs
              • Finds available LUNs.
            • vicfg-snmp
              • Manages the Simple Network Management Protocol (SNMP) agent.
            • vicfg-syslog
              • Specifies the syslog server and the port to connect to that server for ESXi hosts.
            • vicfg-user
              • Creates, modifies, deletes, and lists local direct access users and groups of users.
            • vicfg-vmknic
              • Adds, deletes, and modifies virtual network adapters (VMkernel NICs).
            • vicfg-volume
              • Supports resignaturing a VMFS snapshot volume and mounting and unmounting the snapshot volume.
            • vicfg-vswitch
              • Adds or removes virtual switches or vNetwork Distributed Switches, or modifies switch settings.
        • vmware-cmd commands
          • Commands implemented in Perl that do not have a vicfg- prefix.
          • Performs virtual machine operations remotely including creating a snapshot, powering the virtual machine on or off, and getting information about the virtual machine.
          • VMware online reference
            • vmware-cmd <path to the .vmx file> <VM operations>
        • vmkfstools command
          • Creates and manipulates virtual disks, file systems, logical volumes, and physical storage devices on ESXi hosts.
          • VMware online reference
    • ESX shell / SSH
      • esxcli-* commandlets
        • Primary set of commands to be used for most ESXi host based operations
        • VMware online reference
          • Same esxcli namespaces as listed under the vCLI section above (device, esxcli, fcoe, graphics, hardware, iscsi, network, sched, software, storage, system, vm, vsan)
      • esxcfg-* commands (deprecated but still works on ESXi 5.5)
        • VMware online reference
      • vmkfstools command
        • Creates and manipulates virtual disks, file systems, logical volumes, and physical storage devices on ESXi hosts.
        • VMware online reference
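
To make the hierarchy above a little more concrete, here are a handful of example invocations. Treat these as a sketch rather than gospel: the vCLI / vMA connection options (--server, --vihost) and exact flags are worth double checking against the VMware references linked above.

    # esxcli examples (ESXi shell, SSH or vCLI)
    esxcli system version get                  # ESXi version & build
    esxcli network ip interface ipv4 get       # VMkernel port IPv4 configuration
    esxcli storage core device list            # storage devices visible to the host
    esxcli software vib list                   # installed VIBs

    # vicfg-* example, run from the vMA against a host managed by vCenter
    vicfg-nics --server <vcenter> --vihost <esxi-host> -l

    # vmware-cmd examples (vCLI / vMA)
    vmware-cmd --server <esxi-host> -l                                  # list registered VMs (.vmx paths)
    vmware-cmd --server <esxi-host> <path to the .vmx file> getstate    # power state of a VM

    # vmkfstools example – create a 10 GB thin provisioned virtual disk
    vmkfstools -c 10G -d thin /vmfs/volumes/<datastore>/<VM name>/test.vmdk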

 

  • Log File Locations

    • vCenter Log Files
      • Windows version
        • C:\Documents and settings\All users\Application Data\VMware\VMware VirtualCenter\Logs
        • C:\ProgramData\Vmware\Vmware VirtualCenter\Log
      • Appliance version
        • /var/log
      • VMware KB for SSO log files
    • ESXi Server Logs
      • /var/log (Majority of ESXi log location)
      • /etc/vmware/vpxa/vpxa.cfg (vpxa/vCenter agent configuration file)
      • VMware KB for all ESXi log file locations
      • /etc/opt/VMware/fdm (FDM agent files for HA configuration)
    • Virtual Machine Logs
      • /vmfs/volumes/<directory name>/<VM name>/VMware.log (Virtual machine log file)
      • /vmfs/volumes/<directory name>/<VM name>/<*.vmdk files> (Virtual machine descriptor files with references to CID numbers of itself and parent vmdk files if snapshots exists)
      • /vmfs/volumes/<directory name>/<VM name>/<*.vmx files> (Virtual machine configuration settings including pointers to vmdk files..etc>
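
A couple of quick ways to work with these logs from the ESXi shell or an SSH session (the datastore and VM names below are just example paths):

    # Follow the VMkernel and hostd logs live while reproducing an issue
    tail -f /var/log/vmkernel.log
    tail -f /var/log/hostd.log

    # Search a virtual machine's log for errors
    grep -i error /vmfs/volumes/<datastore>/<VM name>/vmware.log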

 

  • Networking commands (used to identify and fix network configuration issues)

    • Basic network troubleshooting commands
    • Physical Hardware Troubleshooting
      • lspci -p
    • Traffic capture commands
      • tcpdump-uw
        • Works with all versions of ESXi
        • Refer to VMware KB for additional information
      • pktcap-uw
        • Only works with ESXi 5.5
        • Refer to VMware KB for additional information
    • Telnet equivalent
      • nc command (netcat)
        • Used to verify that you can reach a certain port on a destination host (similar to telnet)
        • Run on the esxi shell or ssh
        • Example: nc -z <ip address of iSCSI server> 3260 – checks whether the iSCSI port can be reached from the ESXi host
        • VMware KB article
    • Network performance related commands
      • esxtop (ESXi Shell or SSH) & resxtop (vCli) – ‘n’ for networking
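
For reference, typical capture invocations look like the below (vmk0 and vmnic0 are assumed interface names; see the KB articles above for the full option list):

    # Capture traffic on a VMkernel interface with tcpdump-uw
    tcpdump-uw -i vmk0 -nn

    # Capture frames on a physical uplink with pktcap-uw (ESXi 5.5) and write them to a pcap file
    pktcap-uw --uplink vmnic0 -o /tmp/vmnic0.pcap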

 

  • Storage Commands (used to identify & fix various storage issues)

    • Basic storage commands
    • VMFS metadata inconsistencies
      • voma command (VMware vSphere Ondisk Metadata Analyser)
        • Example: voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxx:y (where y is the partition number)
        • Refer to VMware KB article for additional information
    • disk space utilisation
      • df command
    • Storage performance related commands
      • esxtop (ESXi Shell or SSH) & resxtop (vCLI) – ‘d’ for disk adapter, ‘u’ for disk device & ‘v’ for VM disk
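
A few example storage checks combining the above with the esxcli storage namespace (device and partition names are placeholders):

    # Datastore space utilisation
    df -h

    # List VMFS extents and the storage paths behind them
    esxcli storage vmfs extent list
    esxcli storage core path list

    # Check VMFS metadata consistency on a partition (as per the voma example above)
    voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxx:1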

 

  • vCenter server commands (used to identify & fix vCenter, SSO, Inventory related issues)

    • Note that most of the commands available here are generic Windows commands that can be used to troubleshoot these issues, which I won’t mention here. Only a few key VMware vSphere specific commands are mentioned below instead.
    • SSO
      • ssocli command (C:\Program Files\VMware\Infrastructure\SSOServer\utils\ssocli)
    • vCenter
      • vpxd.exe command (C:\Program Files\VMware\Infrastructure\VirtualCenter Server\vpxd.exe)
      • vpxd

 

  • Virtual Machine related commands (used to identify & fix VM related issues)
    • Generic VM commands
      • vmware-cmd commands (vCLI only)
      • vmkfstools command
    • File locking issues
      • touch command
      • vmkfstools -D command
        • Example: vmkfstools -D /vmfs/volumes/<directory name>/<VM name>/<VM Name.vmdk> (shows the MAC address of the ESXi server holding the file lock; if it’s locked by the same ESXi server as where the command was run, ‘000000000000’ is shown)
      • lsof command (identifies the process locking the file)
        • Example: lsof | grep <name of the locked file>
      • kill command (kills the process)
        • Example: kill <PID>
      • md5sum command (used to calculate file checksums)
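
Putting the file locking commands above together, a typical sequence looks like this (the datastore and VM names are example paths):

    # 1. Confirm the lock – touch fails with a "device or resource busy" style error on a locked file
    touch /vmfs/volumes/<datastore>/<VM name>/<VM name>-flat.vmdk

    # 2. Identify which ESXi host owns the lock (all zeros = the host you are running the command on)
    vmkfstools -D /vmfs/volumes/<datastore>/<VM name>/<VM name>-flat.vmdk

    # 3. If this host owns the lock, find the process holding it and kill it
    lsof | grep <VM name>-flat.vmdk
    kill <PID>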

 

Please note that this post (like the vSphere Troubleshooting course) does NOT cover every single command available for troubleshooting the different vSphere components, but only a key subset of the commands that are usually required 90% of the time. Hopefully having them all in one place within this post will make them handy to look up. I’ve provided direct links to the VMware online documentation for each command above so you can delve further into each one.

Good luck with your troubleshooting work..!!

Command line rules….!!

Cheers

Chan