Reporting on RPO violations from vSphere Replication

vSphere Replication was introduced with vSphere 5.0 and SRM 5.0, giving customers who cannot use storage array replication a way to use Site Recovery Manager (SRM) as their disaster recovery solution.

With the release of vSphere 5.1, vSphere Replication also became available as a standalone replication solution included with Essentials Plus and higher vSphere license editions. Customers now have the option of deploying vSphere Replication on its own to provide basic per-VM replication for use cases where SRM’s advanced DR orchestration capabilities are not needed.

As adoption of vSphere Replication grows among vSphere 5.1 customers, so has the number of requests for ideas, hints and tips on reporting usage information for the various events that can occur across a collection of replicated VMs.

Some examples would be:

  • highlight when a given VM violates its RPO
  • detect when the violated RPO state is restored
  • compute the time duration for the RPO violation
  • collate the total amount of data replicated for a given VM over a given time period

The number of customers asking for this kind of information led to this blog post and to the scripts below, which were created in conjunction with Lee Dilworth, a Principal Systems Engineer at VMware and SRM expert.

In this initial article we have included some scripts that give the administrator a way to produce a CSV file containing, for each replicated VM, a history of its RPO violations: when each one started, when it ended and how long it lasted.

Working with events

PowerCLI has a great cmdlet for working with vSphere events, which means we can programmatically display and export the information we need from the list of events that vSphere has produced. This cmdlet is called Get-VIEvent; more information on its usage can be found here.

Retrieving all vSphere Replication events

In our first example, once connected to the source replication vCenter Server in PowerCLI through the Connect-VIServer cmdlet, we can easily pull any events generated by vSphere Replication and export them straight into a CSV file on the current user's desktop with the following one-liner:

Get-VIEvent -MaxSamples ([int]::MaxValue) | Where { $_.EventTypeId -match "hbr|rpo" } | Select CreatedTime, FullFormattedMessage, @{Name="VMName";Expression={$_.Vm.Name}} | export-csv -NoTypeInformation -Path ([Environment]::GetFolderPath("Desktop") + "\HBR-RPOEvents.csv")

Example formatted output:

[Screenshot: example CSV output]

Creating the RPO Report

Now that we can retrieve the RPO events, we can format the data to give us exactly what we need from the report. Again, this will output a file to the user's desktop. Make sure you are connected to the source vCenter Server and then run this script:

Write-Host "[$(Get-Date)] Retrieving VMs"
$VMs = Get-VM

$Results = @()
Foreach ($VM in $VMs) {
	Write-Host "[$(Get-Date)] Retrieving events for $($VM.Name)"
	$Events = Get-VIEvent -MaxSamples ([int]::MaxValue) -Entity $VM
	Write-Host "[$(Get-Date)] Filtering RPO events for $($VM.Name)"
	# Wrap in @() so a single matching event is still treated as an array
	$RPOEvents = @($Events | Where { $_.EventTypeId -match "rpo" } | Where { $_.Vm.Name -eq $VM.Name } | Select EventTypeId, CreatedTime, FullFormattedMessage, @{Name="VMName";Expression={$_.Vm.Name}} | Sort CreatedTime)
	if ($RPOEvents.Count -gt 0) {
		$Count = 0
		Write-Host "[$(Get-Date)] Finding replication results for $($VM.Name)"
		while ($Count -lt $RPOEvents.Count) {
			if ($RPOEvents[$Count].EventTypeId -match "Violated") {
				# Start a new report row at each violation event
				$Details = "" | Select VMName, ViolationStart, ViolationEnd, Mins
				$Details.VMName = $RPOEvents[$Count].VMName
				$Details.ViolationStart = $RPOEvents[$Count].CreatedTime
				# Skip forward to the next restore event, or to the end of the list
				do {
					$Count++
				} until (($Count -ge $RPOEvents.Count) -or ($RPOEvents[$Count].EventTypeId -match "Restored"))
				if ($Count -lt $RPOEvents.Count) {
					$Details.ViolationEnd = $RPOEvents[$Count].CreatedTime
					$Time = $Details.ViolationEnd - $Details.ViolationStart
					$Details.Mins = "{0:N2}" -f $Time.TotalMinutes
				} else {
					# The violation was still open when the events were collected
					$Details.ViolationEnd = "No End Date"
					$Details.Mins = "N/A"
				}
				# Only violations produce a row, so no blank rows reach the CSV
				$Results += $Details
			}
			$Count++
		}
	}
}
$Results | Export-Csv -NoTypeInformation -Path ([Environment]::GetFolderPath("Desktop") + "\ViolationReport.csv")

Example formatted output:

[Screenshot: example CSV output]
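
The pairing of violation and restore events can also be tried without a vCenter connection. The following is a minimal sketch rather than part of the original scripts: the function name Get-RpoViolationWindow and the sample events are made up for illustration, and the event type IDs in the sample data are placeholders that only need to contain "Violated" or "Restored". It assumes PowerShell 3.0 or later and input objects with EventTypeId, CreatedTime and VMName properties, like those produced by the Select statement in the report script.

```powershell
# Hypothetical helper (not a PowerCLI cmdlet): pairs each "Violated" event
# with the next "Restored" event and emits one row per violation.
function Get-RpoViolationWindow {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [object[]]$RpoEvents
    )

    $sorted = @($RpoEvents | Sort-Object CreatedTime)
    $open = $null   # violation event still waiting for a restore event

    foreach ($evt in $sorted) {
        if ($evt.EventTypeId -match "Violated") {
            # Keep the earliest violation if several arrive in a row
            if (-not $open) { $open = $evt }
        }
        elseif (($evt.EventTypeId -match "Restored") -and $open) {
            $duration = $evt.CreatedTime - $open.CreatedTime
            [pscustomobject]@{
                VMName         = $open.VMName
                ViolationStart = $open.CreatedTime
                ViolationEnd   = $evt.CreatedTime
                Mins           = [math]::Round($duration.TotalMinutes, 2)
            }
            $open = $null
        }
    }

    # A violation with no later restore event is reported as still open
    if ($open) {
        [pscustomobject]@{
            VMName         = $open.VMName
            ViolationStart = $open.CreatedTime
            ViolationEnd   = $null
            Mins           = $null
        }
    }
}

# Made-up sample data: one violation that is restored, one that is still open
$t0 = [datetime]"2013-06-14T10:00:00"
$sample = @(
    [pscustomobject]@{ EventTypeId = "example.rpoViolatedEvent"; CreatedTime = $t0;                 VMName = "VM01" }
    [pscustomobject]@{ EventTypeId = "example.rpoRestoredEvent"; CreatedTime = $t0.AddMinutes(45);  VMName = "VM01" }
    [pscustomobject]@{ EventTypeId = "example.rpoViolatedEvent"; CreatedTime = $t0.AddMinutes(120); VMName = "VM01" }
)

$windows = @(Get-RpoViolationWindow -RpoEvents $sample)
$windows | Format-Table VMName, ViolationStart, ViolationEnd, Mins
```

With this sample data the function returns two rows for VM01: a 45 minute violation and one that is still open. Unlike the report script, this sketch leaves ViolationEnd and Mins empty for an open violation instead of writing "No End Date" and "N/A", which keeps the columns typed if the rows are processed further before export.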

16 thoughts on “Reporting on RPO violations from vSphere Replication”

  1. usersc

    Hello by executing the command mentioned above will I get replication status of vm’s in vcenters??I want to fetch status of the consistency groups like whether it is Enabled- High load,Enabled-Active,NA.Will i get dat uisng the above command??

  2. james

    Hi , Creating the RPO Report script is fine. Is there any option to add one more column for Size.

  3. Riaan

    Hi Leo, did you ever get something to do this? The RPO and Sync Point columns are no longer in vSphere Replication v6

  4. Leo Ching

    Hi Alan, I have check the script that you posted. Can powerCLI extract out “Last sync duration, Last instance sync point and Last sync Size” into Creating the RPO Report?
    can you give some example for the scripting?

  5. Pingback: A private cloud – all for myself » Quick Post: One-Liner PowerCLI to get vSphere Replication Statistics

  6. imvivek

    Sorry, I am naive to SRM and have it running in our environment. To get this report, we have to connect to vCenter?

  7. Pingback: Plugin per Nagios | enricodurani

  8. Diego

    I was playing around with vim-cmd hbrsvc/ yesterday and although there are subcommands to pause/resume/reconfig a replication session, I could not make a paused session be resumed. I am kind of giving up for now and have already requested this feature to be implemented by VMware in a future release.

  9. Diego

    Hi Alan,

    Nice work here (and with vCheck as well), thanks for that!

    I was looking for a script which would return VMs with replication status = paused, due to disks being added/removed from a VM. I would then need to invoke a task to basically ignore those messages and resume replication. Do you know whether that could be done?

    Thanks!

  10. Dan Sylvester

    Alan, interesting script.. one of the examples you mentioned was “collate the total amount of data replicated for a given VM over a given time period.” Is there an existing script, or report I can run that can show me this metric? Thanks a ton!

  11. Douglas Hanley

    That’s a cool report Alan. I think an important concept is Recovery Point Estimate (RPE) which means what is the recovery point that is available at any given time. Watching this dynamic value over time can show up any potential risks before they become an issue (with associated SLA breach). I talk more extensively about this on a recent blog which you can read here – http://bit.ly/1etd3Qr

  12. Brian Frank

    Alan this is an awesome report. Do you think it would be possible to show the amount of data transferred while the RPO was failed? From there it wouldn’t be very difficult to calculate the average transfer speed. This would be very helpful when troubleshooting RPO. Thanks!!

  13. Pingback: VMware vSphere Blog: Reporting on RPO Violations From vSphere Replication | System Knowledge Base
