Alarm expressions – Part 1 : Metric alarms

In a previous entry (Scripts for Yellow Bricks’ advise: Thin Provisioning alarm & eagerZeroedThick) I showed how you could use performance metrics to fire an alarm. The MetricAlarmExpression in that script requires a PerfMetricId to specify which performance metric the alarm should monitor. The counterId in that object is an integer, and it is not obvious which value corresponds to which metric.

[Screenshot: metric-alarm]

This blog entry shows how you can quickly get a list of permitted counterIds (and instances) for a specific entity. It also shows how to create some “impossible” alarms!

In the vSphere environment all things “performance” are managed by the PerformanceManager. In the PerformanceManager object the perfCounter property contains a list of all the performance counters the system supports.

But not every entity supports every performance counter. That is why I will use the QueryAvailablePerfMetric method to obtain the list of metrics for a specific entity.
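As a minimal sketch of that call, assuming PowerCLI is loaded, Connect-VIServer has already been run, and a host whose name matches the hypothetical value `esx01` exists, the available realtime metrics for one entity could be listed like this:

```powershell
# Assumes an existing Connect-VIServer session.
# 'esx01' is a hypothetical host name; replace it with one from your environment.
$si      = Get-View ServiceInstance
$perfMgr = Get-View $si.Content.PerfManager
$esx     = Get-View -ViewType HostSystem -Filter @{ 'Name' = 'esx01' } |
           Select-Object -First 1

# The provider summary says whether realtime ("current") metrics exist
# for this entity and at which refresh rate (in seconds) they are sampled.
$provider = $perfMgr.QueryPerfProviderSummary($esx.MoRef)

if ($provider.CurrentSupported) {
    # Arguments: entity, begin time, end time, interval Id.
    $metrics = $perfMgr.QueryAvailablePerfMetric($esx.MoRef, $null, $null, $provider.RefreshRate)

    # Each returned PerfMetricId carries a CounterId and an Instance.
    $metrics | Sort-Object CounterId, Instance | Select-Object CounterId, Instance
}
```

The begin and end times are left empty here, so the call is not restricted to a time window.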

The following script will produce a CSV file with the available metrics per entity type (cluster, host, virtual machine, datastore…) per interval.

Update September 8th 2010: This is a complete rewrite of the “Metrics” script. There was in fact no need to query the metrics for each historical interval; it was sufficient to query the metrics for “Current” (realtime) and “Summary” (historical). The new script also contains some tests that avoid error messages when a specific entity is not present on the vCenter.

The metrics

Annotations

Line 1-13: in this function I create a hash table where the key is the performance counter Id and the value is an object that contains specific information about the metric: the composite name (group-name-rollup), the statistics level at which it is available, and the summary description of the metric.

Line 21: In the PerfProvider for an entity we can find out if the entity has “realtime” and/or “aggregate” metrics

Line 23-38: this part of the function retrieves the available “aggregate” metrics if they exist

Line 24: the QueryAvailablePerfMetric method is where the script finds out which metrics are available for a specific entity.

Line 40-53: this part of the function retrieves the available “realtime” metrics if they exist

Line 62-92: the metrics are collected for all entities that have performance metrics. You can leave out specific entities if you don’t want those metrics in the list.

Line 73,79: a powered-off guest or a disconnected host will not return all the possible metrics. By adding a where-clause this problem is avoided.
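The lookup table described in the first annotation can be sketched as follows. This is an illustration rather than the original function: the name `New-CounterTable` is hypothetical, and the dot used to join group, name and rollup is an assumption.

```powershell
# Builds a lookup table keyed by performance counter Id.
# $PerfCounter is expected to hold PerfCounterInfo objects, i.e. the
# PerfCounter property of a PerformanceManager view.
# New-CounterTable is a hypothetical name, not part of PowerCLI.
function New-CounterTable {
    param(
        [Parameter(Mandatory = $true)]
        [object[]]$PerfCounter
    )

    $table = @{}
    foreach ($counter in $PerfCounter) {
        $table[$counter.Key] = New-Object PSObject -Property @{
            # Composite name: group, name and rollup type joined with dots.
            Name    = '{0}.{1}.{2}' -f $counter.GroupInfo.Key, $counter.NameInfo.Key, $counter.RollupType
            # Statistics level at which the counter is collected.
            Level   = $counter.Level
            # Short description of the metric.
            Summary = $counter.NameInfo.Summary
        }
    }
    $table
}
```

With a live session, `New-CounterTable -PerfCounter $perfMgr.PerfCounter` would fill the table, where `$perfMgr` is assumed to be a PerformanceManager view obtained with Get-View. A counterId returned by QueryAvailablePerfMetric can then be translated with `$table[$id].Name`.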

The resulting list makes it easy to define MetricAlarmExpression objects. In fact, you can now create alarms that you cannot create via the vSphere client!

As an example, suppose you want to be alerted when one of your ESX servers is receiving network traffic over a specific watermark.

In the vSphere client you can create an alarm for Network Usage, but that will be the total of incoming and outgoing network traffic. And here I want to create an alarm for incoming network traffic only.

From the CSV file it’s simple to find that the script will need to use performance counter Id 102.

[Screenshot: Perf-Id-102]

Note that the instances you see in this CSV file are not necessarily all the possible instances you can encounter in your vSphere environment.

For example, there could be a virtual machine with 10 hard disks connected. If that guest was not the one used to compile the available metrics for the VirtualMachine object, you would not see all the available instances in the virtualdisk group!

In the screenshot you can see that the script used a guest with 2 virtual hard disks to compile the CSV.

The alarm

The following script will create the alarm.

Annotations

Line 44-45: the watermarks are defined in KBps. The values in the script are values for demonstration purposes only. Correct these for your own environment.
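A minimal sketch of such an alarm definition follows. It is not the original script: the host name `esx01`, the alarm name, the choice to define the alarm on a single host, and the threshold values are all assumptions for demonstration, and an existing Connect-VIServer session is assumed.

```powershell
# Assumes PowerCLI is loaded and Connect-VIServer has already been run.
# Host name, alarm name and thresholds are hypothetical demo values.
$si       = Get-View ServiceInstance
$alarmMgr = Get-View $si.Content.AlarmManager
$esx      = Get-View -ViewType HostSystem -Filter @{ 'Name' = 'esx01' } |
            Select-Object -First 1

# The metric to watch: counter Id 102; an empty instance means the aggregate.
$metric = New-Object VMware.Vim.PerfMetricId
$metric.CounterId = 102
$metric.Instance  = ''

# Yellow and red watermarks, in KBps for this counter.
$expression = New-Object VMware.Vim.MetricAlarmExpression
$expression.Metric   = $metric
$expression.Operator = [VMware.Vim.MetricAlarmOperator]::isAbove
$expression.Type     = 'HostSystem'
$expression.Yellow   = 1000
$expression.Red      = 2000

# Wrap the metric expression in an OR expression with a single member.
$or = New-Object VMware.Vim.OrAlarmExpression
$or.Expression = [VMware.Vim.AlarmExpression[]]@($expression)

$spec = New-Object VMware.Vim.AlarmSpec
$spec.Name        = 'Incoming network traffic (demo)'
$spec.Description = 'Fires when received network traffic exceeds the watermarks'
$spec.Enabled     = $true
$spec.Expression  = $or
$spec.Setting     = New-Object VMware.Vim.AlarmSetting
$spec.Setting.ToleranceRange     = 0
$spec.Setting.ReportingFrequency = 0

# Define the alarm on the selected host.
$alarmMgr.CreateAlarm($esx.MoRef, $spec)
```

No alarm action is attached in this sketch, so the alarm only changes state; an email or SNMP action would go in the Action property of the AlarmSpec.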

In the vSphere client you will see the following warning when you edit the settings of the alarm and select the Triggers tab.

[Screenshot: trigger-not-visible]

But rest assured, this alarm works and will fire when the metric goes over the yellow and red watermarks.

9 Comments

    Alex

    Thank you again Luc!
    I have given up counting how many hours your amazing scripts have saved me!

      LucD

      Well, thank you, much appreciated.

    Ziof3ster

    Hi LucD,

    this script is fantastic and I’ve modified it for my environment (Datacenter level) to monitor the disk latency.

    One question. If I want to modify the email’s subject using the {eventDescription} variable, it doesn’t work, and simply prints “eventDescription”.

    I’ve used the environment variable successfully if I create the AlarmAction using the “new-AlarmAction” cmdlet. Any idea how I can use the environment variables inside your script?

    Thanks

      LucD

      Thanks.
      I just changed the line where I specify the subject to

      $trigger.action.Subject = "{eventDescription}"

      The email I received got the correct subject.

      [Screenshot: Alarm email action]

      If that doesn’t work for you, send me the script you are actually using (to lucd (at) lucd (dot) info)

    Tom

    Luc,

    Have you been able to create an alarm using the perfcounter with counterID = 2 “CPU Usage” on a “ClusterComputeResource”? When you perform a QueryAvailablePerfMetric on a cluster it returns the perfMetric with counter ID 2, so I expect it should work. However, I keep on getting a specified parameter is incorrect…

    Any ideas?

    Thanks,
    Tom

    jrob24

    Luc,

    Every time I run the script to gather all of the metrics for each entity I keep receiving the following error:

    Exception calling "QueryAvailablePerfMetric" with "4" argument(s): "entity"

    Any ideas? Running against a 4.0 U2 host and vCenter. Using PowerCLI 4.1.

      LucD

      Hi Jason, thanks for discovering this bug.
      The new version of the script should solve the problem.

    Tim

    Hi,

    I’ve modified the above to take parameters for VM, alarm name, description, counterid and red and yellow thresholds. Fairly simple modification really, however the alarm (alarm 129) is always triggering and the e-mail shows:

    Alarm Definition:
    [Yellow metric Is above 0%; Red metric Is above 0%]

    Despite me passing in 85/95 as yellow/red. When I try and drill down into the alarm settings for the created alarm:
    $alarm = get-view ($alarms | where-object {$_.Value -eq "alarm-50"})
    $alarm.info.expression.expression.metric

    I don’t get anything back from the metric. The level above ($alarm.info.expression.expression) shows:
    Operator : isAbove
    Type : VirtualMachine
    Metric : VMware.Vim.PerfMetricId
    Yellow : 85
    YellowInterval : 0
    Red : 95
    RedInterval : 0
    DynamicType :
    DynamicProperty :

    Any idea?

    Cheers,

    Tim

      LucD

      Hi Tim, when you use a metricId that returns percentages you have to multiply the desired threshold values by 100.
      For example, if you want the alarm to trigger at 75% you would have to specify 7500 in the yellow or red property.
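      A small helper, not part of the original reply, could make that conversion explicit. The function name `ConvertTo-AlarmPercent` is hypothetical:

```powershell
# Converts a percentage (for example 75) to the value expected in the
# Yellow/Red properties of a MetricAlarmExpression for percentage-based
# counters, which are expressed in hundredths of a percent.
# ConvertTo-AlarmPercent is a hypothetical name, not a PowerCLI cmdlet.
function ConvertTo-AlarmPercent {
    param(
        [Parameter(Mandatory = $true)]
        [ValidateRange(0, 100)]
        [double]$Percent
    )
    [int][math]::Round($Percent * 100)
}

# Example: thresholds of 85% (yellow) and 95% (red).
$yellow = ConvertTo-AlarmPercent -Percent 85
$red    = ConvertTo-AlarmPercent -Percent 95
```

      The results can be assigned directly to the Yellow and Red properties of the expression.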
