Alarm expressions - Part 1: Metric alarms

In a previous entry (Scripts for Yellow Bricks’ advise: Thin Provisioning alarm & eagerZeroedThick) I showed how you can use performance metrics to fire an alarm. The MetricAlarmExpression in that script requires a PerfMetricId to specify which performance metric the alarm should monitor. The counterId in that object is an integer, and it is not obvious which value corresponds to which metric.

This blog entry shows how you can quickly get a list of the permitted counterIds (and instances) for a specific entity. It also shows how to create some “impossible” alarms!

In the vSphere environment all things “performance” are managed by the PerformanceManager. In the PerformanceManager object the perfCounter property contains a list of all the performance counters the system supports.

But not every entity supports every performance counter. That is why I will use the QueryAvailablePerfMetric method to obtain the list of metrics for a specific entity.

The following script will produce a CSV file with the available metrics per entity type (cluster, host, virtual machine, datastore…) per interval.

Update September 8th 2010: This is a complete rewrite of the “Metrics” script. There was in fact no need to query the metrics for each historical interval; it was sufficient to query the metrics for “Current” (realtime) and “Summary” (historical). The new script also contains some tests that avoid error messages when a specific entity is not present on the vCenter.

The metrics

# Build a lookup table: counter Id -> composite name, statistics level and summary
function Get-PerfCounterList{
    param($pm)

    $list = @{}
    $pm.PerfCounter | ForEach-Object {
        $obj = "" | Select-Object Name, Level, Summary
        # Composite name: group.name.rollup
        $obj.Name = $_.GroupInfo.Key + "." + $_.NameInfo.Key + "." + $pm.Description.CounterType[$_.RollupType].Key
        $obj.Level = $_.Level
        $obj.Summary = $_.NameInfo.Summary
        $list[$_.Key] = $obj
    }
    $list
}

# Return the available metrics (aggregate and realtime) for one entity
function Get-StatId{
    param($entity, $pm, $perfCounterList)

    $result = @()
    # The provider summary tells us which kinds of metrics the entity supports
    $perfProvider = $pm.QueryPerfProviderSummary($entity.MoRef)

    # Aggregate (historical) metrics
    if($perfProvider.SummarySupported){
        $perfMetrics = $pm.QueryAvailablePerfMetric($entity.MoRef,$null,$null,$null)
        if($perfMetrics){
            foreach($metric in $perfMetrics){
                $row = "" | Select-Object Entity, Interval, CounterId, Stat, Instance, Level, Summary
                $row.Entity = $entity.GetType().Name
                $row.Interval = "Aggregate"
                $row.CounterId = $metric.CounterId
                $row.Stat = $perfCounterList[$metric.CounterId].Name
                $row.Instance = $metric.Instance
                $row.Level = $perfCounterList[$metric.CounterId].Level
                $row.Summary = $perfCounterList[$metric.CounterId].Summary
                $result += $row
            }
        }
    }

    # Realtime metrics
    if($perfProvider.CurrentSupported){
        $perfMetrics = $pm.QueryAvailablePerfMetric($entity.MoRef,$null,$null,$perfProvider.RefreshRate)
        if($perfMetrics){
            foreach($metric in $perfMetrics){
                $row = "" | Select-Object Entity, Interval, CounterId, Stat, Instance, Level, Summary
                $row.Entity = $entity.GetType().Name
                $row.Interval = "RealTime"
                $row.CounterId = $metric.CounterId
                $row.Stat = $perfCounterList[$metric.CounterId].Name
                $row.Instance = $metric.Instance
                $row.Level = $perfCounterList[$metric.CounterId].Level
                $row.Summary = $perfCounterList[$metric.CounterId].Summary
                $result += $row
            }
        }
    }
    $result
}

$pm = Get-View (Get-View ServiceInstance).Content.PerfManager
$perfCounterList = Get-PerfCounterList $pm
$report = @()

# Datacenter
$dc = Get-Datacenter | Select-Object -First 1 | Get-View
if($dc){
    $report += (Get-StatId $dc $pm $perfCounterList)
}

# Datastore
$ds = Get-Datastore | Select-Object -First 1 | Get-View
if($ds){
    $report += (Get-StatId $ds $pm $perfCounterList)
}

# VirtualMachine (powered-on guests only, otherwise not all metrics are returned)
$vm = Get-VM | Where-Object {$_.PowerState -eq 'PoweredOn'} | Select-Object -First 1 | Get-View
if($vm){
    $report += (Get-StatId $vm $pm $perfCounterList)
}

# HostSystem (connected hosts only, otherwise not all metrics are returned)
$vmhost = Get-VmHost | Where-Object {$_.State -eq 'Connected'} | Select-Object -First 1 | Get-View
if($vmhost){
    $report += (Get-StatId $vmhost $pm $perfCounterList)
}

# ClusterComputeResource
$cluster = Get-Cluster | Select-Object -First 1 | Get-View
if($cluster){
    $report += (Get-StatId $cluster $pm $perfCounterList)
}

# ResourcePool
$respool = Get-ResourcePool | Select-Object -First 1 | Get-View
if($respool){
    $report += (Get-StatId $respool $pm $perfCounterList)
}

$report | Export-Csv "C:\Stat-Ids.csv" -NoTypeInformation -UseCulture

Annotations

Get-PerfCounterList: this function creates a hash table where the key is the performance counter Id and the value is an object that holds specific information about the metric: the composite name (group.name.rollup), the statistics level at which it is available, and the summary description of the metric.

QueryPerfProviderSummary: the PerfProvider summary for an entity tells us whether the entity has “realtime” and/or “aggregate” metrics.

SummarySupported block: this part of the Get-StatId function retrieves the available “aggregate” metrics, if they exist.

QueryAvailablePerfMetric: this method is where the script finds out which metrics are available for a specific entity.

CurrentSupported block: this part of the Get-StatId function retrieves the available “realtime” metrics, if they exist.

Main part of the script: the metrics are collected for all entities that have performance metrics. You can leave out specific entities if you don’t want those metrics in the list.

Where-clauses on Get-VM and Get-VmHost: a powered-off guest or a disconnected host will not return all the possible metrics. Adding a where-clause avoids this problem.

The resulting list makes it easy to define MetricAlarmExpression objects. In fact, you can now create alarms that you cannot create via the vSphere client!

As an example, suppose you want to be alerted when one of your ESX servers is receiving network traffic above a specific watermark.

In the vSphere client you can create an alarm for Network Usage, but that covers the total of incoming and outgoing network traffic. Here I want to create an alarm for incoming network traffic only.

From the CSV file it’s simple to find that the script will need to use performance counter Id 102.
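The lookup in the CSV file can also be scripted instead of done by eye. The following is only a sketch, not part of the original script: Find-CounterId is a hypothetical helper name, it assumes the column names written by the metrics script (Entity, CounterId, Stat), and it assumes that the receive-rate counter shows up under the composite name net.received.average. The Id found in your own CSV is the one to use.

```powershell
# Hypothetical helper: look up counter Ids in the rows exported by the metrics script.
function Find-CounterId {
    param(
        [object[]]$Rows,    # rows as returned by Import-Csv
        [string]$Entity,    # entity type, e.g. "HostSystem"
        [string]$Stat       # composite name (assumed), e.g. "net.received.average"
    )
    $Rows |
        Where-Object { $_.Entity -eq $Entity -and $_.Stat -eq $Stat } |
        ForEach-Object { [int]$_.CounterId } |   # Import-Csv returns strings
        Sort-Object -Unique                      # one Id, even with several instances
}

# Possible usage against the file written by the metrics script:
# $rows = Import-Csv "C:\Stat-Ids.csv" -UseCulture
# Find-CounterId -Rows $rows -Entity "HostSystem" -Stat "net.received.average"
```

The result can then be assigned to the CounterId property of the PerfMetricId object rather than typing the number by hand.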

Note that the instances you see in this CSV file are not necessarily all the possible instances you can encounter in your vSphere environment.

For example, there could be a virtual machine with 10 hard disks connected. If that guest was not the one used to compile the available metrics for the VirtualMachine object, you would not see all the available instances in the virtualdisk group!

In the screenshot you can see that the script used a guest with 2 virtual hard disks to compile the CSV.
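One way to widen the instance coverage mentioned above is to collect the rows from several guests and keep only one row per entity, interval, counter and instance. This is a sketch under assumptions, not part of the original script: Merge-StatRow is a hypothetical helper name, and the rows are assumed to have the properties produced by Get-StatId.

```powershell
# Hypothetical helper: merge rows from several Get-StatId calls and drop duplicates.
function Merge-StatRow {
    param([object[]]$Rows)
    # With -Property, -Unique keeps one row per distinct combination of the sort keys
    $Rows | Sort-Object -Property Entity, Interval, CounterId, Instance -Unique
}

# Possible usage with the functions from the metrics script (needs a vCenter connection):
# $rows = Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' } | Get-View |
#     ForEach-Object { Get-StatId $_ $pm $perfCounterList }
# $report += Merge-StatRow $rows
```

Querying every guest takes longer than querying a single one, so it may be worth limiting the selection to a handful of representative guests.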

The alarm

The following script will create the alarm.

$esxName = "esx1.lucd.info"
$mailTo = "luc.dekens@lucd.info"
$alarmMgr = Get-View AlarmManager
$entity = Get-VmHost $esxName | Get-View
# AlarmSpec
$alarm = New-Object VMware.Vim.AlarmSpec
$alarm.Name = "Net received rate"
$alarm.Description = "Testing network related alarms"
$alarm.Enabled = $TRUE
# Action - Send email
$alarm.action = New-Object VMware.Vim.GroupAlarmAction
$trigger = New-Object VMware.Vim.AlarmTriggeringAction
$trigger.action = New-Object VMware.Vim.SendEmailAction
$trigger.action.ToList = $mailTo
$trigger.action.Subject = "Net received alarm"
$trigger.Action.CcList = ""
$trigger.Action.Body = ""
# Transition a - yellow --> red
$transa = New-Object VMware.Vim.AlarmTriggeringActionTransitionSpec
$transa.StartState = "yellow"
$transa.FinalState = "red"
# Transition b - red --> yellow
$transb = New-Object VMware.Vim.AlarmTriggeringActionTransitionSpec
$transb.StartState = "red"
$transb.FinalState = "yellow"
$trigger.TransitionSpecs += $transa
$trigger.TransitionSpecs += $transb
$alarm.action.action += $trigger
# Expression - Network data receive rate
$expression = New-Object VMware.Vim.MetricAlarmExpression
$expression.Metric = New-Object VMware.Vim.PerfMetricId
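$expression.Metric.CounterId = 102   # counter Id taken from the CSV file; verify it in your own environment
$expression.Metric.Instance = ""
$expression.Operator = "isAbove"
$expression.Red = 300                # red watermark in KBps (demonstration value)
$expression.Yellow = 150             # yellow watermark in KBps (demonstration value)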
$expression.Type = "HostSystem"
$alarm.expression = New-Object VMware.Vim.OrAlarmExpression
$alarm.expression.expression += $expression
$alarm.setting = New-Object VMware.Vim.AlarmSetting
$alarm.setting.reportingFrequency = 0
$alarm.setting.toleranceRange = 0
# Create alarm.
$alarmMgr.CreateAlarm($entity.MoRef, $alarm)

Annotations

Red and Yellow properties: the watermarks are defined in KBps. The values in the script are for demonstration purposes only. Adjust them for your own environment.

In the vSphere client you will see the following warning when you edit the settings of the alarm and select the Triggers tab.

But rest assured, this alarm works and will fire when the metric goes over the yellow and red watermarks.

9 Comments

Alex

May 4, 2015 at 17:40

Thank you again Luc!
I have given up counting how many hours your amazing scripts have saved me!

LucD

May 4, 2015 at 19:15

Well, thank you, much appreciated.

Ziof3ster

November 15, 2011 at 15:09

Hi LucD,

this script is fantastic and I’ve modified it for my environment (Datacenter level) to monitor the disk latency.

One question. If I want to modify the email’s subject using the {eventDescription} variable, it doesn’t work, and simply prints “eventDescription”.

I’ve used the environment variable successfully if I create the AlarmAction using the “New-AlarmAction” cmdlet. Any idea how I can use the environment variables inside your script?

Thanks

LucD

November 15, 2011 at 20:52

Thanks.
I just changed the line where I specify the subject to
$trigger.action.Subject = "{eventDescription}"
The email I received got the correct subject.

[Screenshot: Alarm email action]

If that doesn’t work for you, send me the script you are actually using (to lucd (at) lucd (dot) info)

Tom

November 18, 2010 at 16:06

Luc,

Have you been able to create an alarm using the perfcounter with counterId = 2 “CPU Usage” on a “ClusterComputeResource”? When you perform a QueryAvailablePerfMetric on a cluster it returns the perfMetric with counter Id 2, so I expect it should work, however I keep on getting a “specified parameter is incorrect” error…

Any ideas ?

Thanks,
Tom

jrob24

August 18, 2010 at 20:54

Every time I run the script to gather all of the metrics for each entity I keep receiving the following error:

Exception calling "QueryAvailablePerfMetric" with "4" argument(s): "entity"

Any ideas? Running against a 4.0 U2 host and vCenter. Using PowerCLI 4.1.

LucD

August 19, 2010 at 10:26

Hi Jason, thanks for discovering this bug.
The new version of the script should solve the problem.

Tim

March 30, 2010 at 12:39

Hi,

I’ve modified the above to take parameters for VM, alarm name, description, counterid and red and yellow thresholds. Fairly simple modification really, however the alarm (alarm 129) is always triggering and the e-mail shows:

Alarm Definition:
[Yellow metric Is above 0%; Red metric Is above 0%]

Despite me passing in 85/95 as yellow/red. When I try and drill down into the alarm settings for the created alarm:
$alarm = get-view ($alarms | where-object {$_.Value -eq "alarm-50"})
$alarm.info.expression.expression.metric

I don’t get anything back from the metric. The level above ($alarm.info.expression.expression) shows:
Operator : isAbove
Type : VirtualMachine
Metric : VMware.Vim.PerfMetricId
Yellow : 85
YellowInterval : 0
Red : 95
RedInterval : 0
DynamicType :
DynamicProperty :

Any idea?

Cheers,

LucD

March 30, 2010 at 20:46

Hi Tim, when you use a metricId that returns percentages you have to multiply the desired threshold values by 100.
For example, if you want the alarm to trigger at 75% you would have to specify 7500 in the yellow or red property.
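The multiplication described here can be wrapped in a small helper so the script keeps readable percentages. This is a minimal sketch, not part of the original script: ConvertTo-AlarmThreshold is a hypothetical function name, and it only applies to metrics that return percentages.

```powershell
# Hypothetical helper: convert a percentage into the value expected
# in the Yellow and Red properties of a MetricAlarmExpression.
function ConvertTo-AlarmThreshold {
    param(
        [ValidateRange(0.0, 100.0)]
        [double]$Percent
    )
    # Percentage thresholds are multiplied by 100, so 75% becomes 7500
    [int][math]::Round($Percent * 100)
}

# Possible usage in the alarm script:
# $expression.Yellow = ConvertTo-AlarmThreshold 85
# $expression.Red    = ConvertTo-AlarmThreshold 95
```

For metrics that are not percentages, such as the network receive rate in KBps used in the post, the watermark values are passed as they are.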
