Reporting on RPO violations from vSphere Replication

vSphere Replication was introduced with vSphere 5.0 and SRM 5.0. It gave customers who could not leverage storage array replication a way to use Site Recovery Manager (SRM) as the disaster recovery solution in their environments.

With the release of vSphere 5.1, vSphere Replication also became available as a standalone replication solution included with Essentials Plus and higher vSphere license editions. Customers now have the option of deploying vSphere Replication on its own to provide basic per-VM replication for use cases where SRM’s advanced DR orchestration capabilities are not applicable.

As adoption of vSphere Replication grows amongst the vSphere 5.1 customer base, so does the number of requests for ideas, hints and tips on what kind of reporting could be added to provide usage information for the various events that can occur across a collection of replicated VMs.

Some examples would be:

  • highlight when a given VM violates its RPO
  • detect when the violated RPO state is restored
  • compute the time duration for the RPO violation
  • collate the total amount of data replicated for a given VM over a given time period

The number of customers asking for this kind of information led to this blog post being written and the scripts being created in conjunction with Lee Dilworth, who is a Principal Systems Engineer with VMware and an SRM expert.

In this initial article we have included some scripts that give the administrator a means to produce a CSV file that contains, for each replicated VM, a history of its RPO violations: when they started, when they ended and how long each lasted.

Working with events

PowerCLI has a great cmdlet for working with vSphere events: Get-VIEvent. With it we can programmatically display and export the information we need by retrieving the list of events that vSphere has produced. More information on its usage can be found here.
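
As a rough illustration of the filtering idea used throughout this post, here is a minimal sketch that keeps only events whose EventTypeId matches a pattern. The helper name Select-EventByTypeId and the sample event type IDs are hypothetical and only there to make the example runnable without a vCenter connection. Against a live system the input would come from Get-VIEvent, as shown in the comment.

```powershell
# Hypothetical helper (not part of PowerCLI): passes through only the
# pipeline objects whose EventTypeId matches the given regex pattern.
function Select-EventByTypeId {
    param(
        [Parameter(ValueFromPipeline = $true)] $InputObject,
        [Parameter(Mandatory = $true)] [string] $Pattern
    )
    process {
        # -match is case-insensitive; objects without an EventTypeId never match
        if ($InputObject.EventTypeId -match $Pattern) { $InputObject }
    }
}

# Against a live vCenter (not run here), limited to the last 7 days:
# Get-VIEvent -Start (Get-Date).AddDays(-7) -MaxSamples ([int]::MaxValue) |
#     Select-EventByTypeId -Pattern "hbr|rpo"

# Mock events standing in for Get-VIEvent output; the type IDs are made up.
$SampleEvents = @(
    [pscustomobject]@{ EventTypeId = "example.rpoViolatedEvent"; CreatedTime = [datetime]"2013-06-13 10:00" },
    [pscustomobject]@{ EventTypeId = "example.VmPoweredOnEvent"; CreatedTime = [datetime]"2013-06-13 10:05" },
    [pscustomobject]@{ EventTypeId = "example.hbr.syncCompleted"; CreatedTime = [datetime]"2013-06-13 10:10" }
)

$Filtered = @($SampleEvents | Select-EventByTypeId -Pattern "hbr|rpo")
$Filtered | Format-Table EventTypeId, CreatedTime
```

Narrowing the time window with -Start and -Finish is usually worth doing on a busy vCenter Server, since retrieving every event can take a long time.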

Retrieving all vSphere Replication events

For our first example, once connected to the source replication vCenter Server in PowerCLI through the Connect-VIServer cmdlet, we can easily pull any events which have been generated by vSphere Replication and export them straight into a CSV file on the current user's desktop by using the following one-liner:

Get-VIEvent -MaxSamples ([int]::MaxValue) | Where { $_.EventTypeId -match "hbr|rpo" } | Select CreatedTime, FullFormattedMessage, @{Name="VMName";Expression={$_.Vm.Name}} | export-csv -NoTypeInformation -Path ([Environment]::GetFolderPath("Desktop") + "\HBR-RPOEvents.csv")

Example formatted output:

[Screenshot: TinyGrab Screen Shot 13-06-2013 15.45.59]
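
Once the events are in a CSV file, a quick per-VM summary is easy to produce. The sketch below is only an illustration: the rows are invented sample data parsed inline with ConvertFrom-Csv so that it runs on its own, and the column names are assumed to match the Select statement in the one-liner above. With the real export you would read the file with Import-Csv instead.

```powershell
# Invented sample rows standing in for HBR-RPOEvents.csv.
$Csv = @"
"CreatedTime","FullFormattedMessage","VMName"
"2013-06-13 10:00:00","Sample message: RPO violated","VM01"
"2013-06-13 10:30:00","Sample message: RPO restored","VM01"
"2013-06-13 11:00:00","Sample message: RPO violated","VM02"
"@

# With the real file (not run here):
# $Rows = Import-Csv -Path ([Environment]::GetFolderPath("Desktop") + "\HBR-RPOEvents.csv")
$Rows = $Csv | ConvertFrom-Csv

# Count how many replication events were logged for each VM
$Summary = @($Rows | Group-Object VMName | Select-Object Name, Count | Sort-Object Name)
$Summary | Format-Table -AutoSize
```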

Creating the RPO Report

Now that we are able to retrieve the event information for RPO events, we can format the data to give us exactly what we need from the report. Again, this will output a file to the user's desktop. Make sure you are connected to the source vCenter Server and then run this script:

Write-Host "[$(Get-Date)] Retrieving VMs"
$VMs = Get-VM

$Results = @()
Foreach ($VM in $VMs) {
	Write-Host "[$(Get-Date)] Retrieving events for $($VM.Name)"
	$Events = Get-VIEvent -MaxSamples ([int]::MaxValue) -Entity $VM
	Write-Host "[$(Get-Date)] Filtering RPO events for $($VM.Name)"
	# @() keeps the result an array even when only one event matches
	$RPOEvents = @($Events | Where { $_.EventTypeId -match "rpo" } | Where { $_.Vm.Name -eq $VM.Name } | Select EventTypeId, CreatedTime, FullFormattedMessage, @{Name="VMName";Expression={$_.Vm.Name}} | Sort CreatedTime)
	if ($RPOEvents.Count -gt 0) {
		$Count = 0
		Write-Host "[$(Get-Date)] Finding replication results for $($VM.Name)"
		while ($Count -lt $RPOEvents.Count) {
			if ($RPOEvents[$Count].EventTypeId -match "Violated") {
				$Details = "" | Select VMName, ViolationStart, ViolationEnd, Mins
				$Details.VMName = $RPOEvents[$Count].VMName
				$Details.ViolationStart = $RPOEvents[$Count].CreatedTime
				# Skip forward to the next "Restored" event, or to the end of the list
				do {
					$Count++
				} until (($Count -ge $RPOEvents.Count) -or ($RPOEvents[$Count].EventTypeId -match "Restored"))
				if ($Count -lt $RPOEvents.Count) {
					$Details.ViolationEnd = $RPOEvents[$Count].CreatedTime
					$Time = $Details.ViolationEnd - $Details.ViolationStart
					$Details.Mins = "{0:N2}" -f $Time.TotalMinutes
				} else {
					# No "Restored" event was logged, so the violation is still open
					$Details.ViolationEnd = "No End Date"
					$Details.Mins = "N/A"
				}
				# Record a row only when a violation was actually found
				$Results += $Details
			}
			$Count++
		}
	}
}
$Results | Export-Csv -NoTypeInformation -Path ([Environment]::GetFolderPath("Desktop") + "\ViolationReport.csv")

Example formatted output:

[Screenshot: TinyGrab Screen Shot 14-06-2013 14.31.37]
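
If you want to experiment with the pairing logic without a vCenter connection, it can also be written as a single pass over the sorted events. The sketch below is an alternative formulation, not the script above: the function name Get-ViolationWindow and the mock event type IDs are hypothetical, and it assumes, as the script does, that event type IDs contain "Violated" and "Restored".

```powershell
# Hypothetical helper: pairs each "Violated" event with the next "Restored" event.
function Get-ViolationWindow {
    param([object[]] $Events)

    $Sorted = @($Events | Sort-Object CreatedTime)
    $Start = $null
    foreach ($Item in $Sorted) {
        if (($Item.EventTypeId -match "Violated") -and ($null -eq $Start)) {
            # First violation of a new window; repeated violations are ignored
            $Start = $Item.CreatedTime
        } elseif (($Item.EventTypeId -match "Restored") -and ($null -ne $Start)) {
            [pscustomobject]@{
                ViolationStart = $Start
                ViolationEnd   = $Item.CreatedTime
                Mins           = ($Item.CreatedTime - $Start).TotalMinutes
            }
            $Start = $null
        }
    }
    if ($null -ne $Start) {
        # A violation with no matching "Restored" event is still open
        [pscustomobject]@{ ViolationStart = $Start; ViolationEnd = $null; Mins = $null }
    }
}

# Mock events; the type IDs and timestamps are made up for illustration.
$Sample = @(
    [pscustomobject]@{ EventTypeId = "example.rpoViolatedEvent"; CreatedTime = [datetime]"2013-06-14 10:00" },
    [pscustomobject]@{ EventTypeId = "example.rpoViolatedEvent"; CreatedTime = [datetime]"2013-06-14 10:05" },
    [pscustomobject]@{ EventTypeId = "example.rpoRestoredEvent"; CreatedTime = [datetime]"2013-06-14 10:30" },
    [pscustomobject]@{ EventTypeId = "example.rpoViolatedEvent"; CreatedTime = [datetime]"2013-06-14 11:00" }
)

$Windows = @(Get-ViolationWindow -Events $Sample)
$Windows | Format-Table -AutoSize
```

Keeping Mins as a number here, rather than a formatted string, makes it easier to total or average the violation time per VM afterwards with Measure-Object.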

Alan

Alan Renouf is Automation Frameworks Product Manager at VMware, responsible for providing the architects and operators of the cloud infrastructure with the toolkits, frameworks and command-line interfaces they require to build a fully automated software-defined datacenter. Alan is a frequent blogger at http://blogs.vmware.com/vipowershell and has a personal blog at http://virtu-al.net. You can follow Alan on Twitter as @alanrenouf.

8 Responses

  1. Brian Frank says:

    Diego, I had never thought about scripting that action but that would be fantastic!!! Wonder if it’s possible?

    • Diego says:

      I was playing around with vim-cmd hbrsvc/ yesterday and although there are subcommands to pause/resume/reconfig a replication session, I could not make a paused session be resumed. I am kind of giving up for now and have already requested this feature to be implemented by VMware in a future release.

  2. Diego says:

    Hi Alan,

    Nice work here (and with vCheck as well), thanks for that!

    I was looking for a script which would return VMs with replication status = paused, due to disks being added/removed from a VM. I would then need to invoke a task to basically ignore those messages and resume replication. Do you know whether that could be done?

    Thanks!

  3. Dan Sylvester says:

    Alan, interesting script.. one of the examples you mentioned was “collate the total amount of data replicated for a given VM over a given time period.” Is there an existing script, or report I can run that can show me this metric? Thanks a ton!

  4. That’s a cool report Alan. I think an important concept is Recovery Point Estimate (RPE) which means what is the recovery point that is available at any given time. Watching this dynamic value over time can show up any potential risks before they become an issue (with associated SLA breach). I talk more extensively about this on a recent blog which you can read here – http://bit.ly/1etd3Qr

  5. Brian Frank says:

    Alan this is an awesome report. Do you think it would be possible to show the amount of data transferred while the RPO was failed? From there it wouldn’t be very difficult to calculate the average transfer speed. This would be very helpful when troubleshooting RPO. Thanks!!
