
Best Current Practices for good plugin graphs

Attention: Content of this page should be moved to the Munin-Guide --> Visit the Guide now.

These are some guidelines that will make it easier to understand the graphs produced by your plugins.

Graph labeling

  • The different labels should be short enough to fit the graph
  • The label should be specific: "transaction volume" is better than "volume", "5 min load average" is better than "load average".
  • If the measure is a rate, the time unit should be given in the vertical label: "bytes / ${graph_period}" is better than "bytes", and much better than "throughput".
  • All the graph_* values specified by the plugin can be used in the title and vlabel values.

${graph_*} in plugin output will be magically replaced with the correct value by Munin.
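
As a rough illustration of the labeling guidelines above, here is a minimal sketch of the config output a shell plugin might print. The function name labels_config, the field name transactions and the title text are hypothetical. The single quotes matter: they stop the shell from expanding ${graph_period} itself, so the literal text reaches Munin.

```shell
#!/bin/sh
# Sketch only: field name and titles are made up for illustration.
labels_config() {
    # Specific title rather than just "volume".
    echo 'graph_title Mail transaction volume'
    # Single quotes keep ${graph_period} literal so that Munin,
    # not the shell, fills in the value.
    echo 'graph_vlabel transactions / ${graph_period}'
    # Short label that fits under the graph.
    echo 'transactions.label transactions'
}

labels_config
```

With double quotes the shell would most likely replace ${graph_period} with an empty string before Munin ever saw it.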

This is a good example of all this:

http://munin.ping.uio.no/ping.uio.no/rossum.ping.uio.no-exim_mailstats.html

Values

Plugins that measure rates should strive to report absolute counters (COUNTER, DERIVE) rather than averages (GAUGE) calculated by an OS tool. For example, iostat on Solaris will output counters rather than short-term averages when given the right options. Counters are more accurate because Munin can average the measure over its own sample interval; this will, for example, pick up short peaks in load that Munin might otherwise miss.
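
A minimal sketch of a plugin that reports a raw counter and leaves the rate calculation to Munin. Everything specific here is an assumption made for illustration: the function name plugin, the field name requests, and the COUNTER_FILE path, which is imagined to hold a single ever-increasing integer.

```shell
#!/bin/sh
# Sketch only: COUNTER_FILE and the field name "requests" are hypothetical.
plugin() {
    case "$1" in
    config)
        echo 'graph_title Requests served'
        echo 'graph_vlabel requests / ${graph_period}'
        echo 'requests.label requests'
        # DERIVE asks Munin to turn successive counter readings into a rate.
        echo 'requests.type DERIVE'
        ;;
    *)
        # Report the raw counter value, not a pre-computed average.
        echo "requests.value $(cat "${COUNTER_FILE:-/var/run/myapp/requests.count}")"
        ;;
    esac
}

# A real plugin file would end with:  plugin "$1"
```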

Spikes and wraparound

To avoid spikes in the graph when counters are reset (as opposed to wrapping), use ${name}.type DERIVE and ${name}.min 0. Note that this causes a lost data point whenever the counter wraps, so it should not be used with plugins whose counters are expected to wrap more often than they are reset (or more often than they are sampled). An example of this is the Linux if_ plugin on 32-bit machines with a busy (100 Mbps) network.
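
The arithmetic behind this trade-off can be sketched as follows. The helper names derive_rate and wrap_seconds are hypothetical, and the script only imitates the effect of DERIVE with a minimum of 0; it is not how Munin or RRDtool is implemented. It prints U to stand for an unknown (lost) sample.

```shell
#!/bin/sh
# derive_rate OLD NEW INTERVAL
# Imitates DERIVE with min 0: a negative difference is dropped, not plotted.
derive_rate() {
    delta=$(( $2 - $1 ))
    if [ "$delta" -lt 0 ]; then
        echo U                      # reset or wrap: data point is lost
    else
        echo $(( delta / $3 ))      # average rate per second
    fi
}

# Seconds until a 32-bit byte counter wraps at 100 Mbps (12500000 bytes/s).
# Needs a shell with 64-bit arithmetic.
wrap_seconds() {
    echo $(( 4294967296 / 12500000 ))
}

derive_rate 90000 120000 300    # normal interval
derive_rate 120000 500 300      # counter went backwards
wrap_seconds
```

At a sustained 100 Mbps the wrap time works out to a little under six minutes, which is close to Munin's default five-minute sample interval. That is why a busy 32-bit interface loses data points often with this setting.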

Graph scaling

graph_args --base 1000 --lower-limit 0
graph_scale no

See graph_args for its documentation.

Choosing a base:

  • For disk bytes use 1024 as base (df and other Unix tools still use this though disks are sold assuming 1K=1000)
  • For RAM bytes use 1024 as base (RAM is sold that way and always accounted for that way)
  • For network bits or bytes use 1000 as base (ethernet and line vendors use this)
  • For anything else use 1000 as base

The key is to choose the base that people are used to dealing with the units in. Of the four points above, what units to use for disk storage is most in doubt: the sale of disks the last 10-15 years with 1K=1000 and the recent addition of --si options to GNU tools tell us that people are starting to think of disks that way too. But 1024 is very basic to the design of disks and filesystems on a low level so the 1024 is likely to remain.
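
The choice of base described above could be captured in a small helper like this. The function names base_for and graph_args_line, and the kind keywords disk, ram and network, are hypothetical and exist only for this sketch.

```shell
#!/bin/sh
# base_for KIND: pick the --base value following the list above.
base_for() {
    case "$1" in
        disk|ram) echo 1024 ;;   # disk and RAM bytes
        *)        echo 1000 ;;   # network traffic and everything else
    esac
}

# graph_args_line KIND: print a graph_args line for the given kind of data.
graph_args_line() {
    echo "graph_args --base $(base_for "$1") --lower-limit 0"
}

graph_args_line ram
graph_args_line network
```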

In addition, most people want to see network speeds in bits, not bytes. If your readings are in bytes, you can multiply the number by 8 yourself to get bits, or you can leave it to Munin (strictly speaking, to RRDtool). If the throughput number is reported in down.value, the config output may specify down.cdef down,8,* to multiply the down number by 8 (this syntax is known as Reverse Polish Notation).
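
A minimal sketch of config output using the cdef approach for the down field mentioned above. The function names bits_config and bytes_to_bits and the label text are assumptions. The cdef line is single-quoted so that the shell cannot treat the "*" as a filename wildcard.

```shell
#!/bin/sh
# Sketch only: labels are illustrative.
bits_config() {
    echo 'graph_vlabel bits / ${graph_period}'
    echo 'down.label received'
    echo 'down.type DERIVE'
    echo 'down.min 0'
    # RPN reads left to right: push down, push 8, multiply.
    echo 'down.cdef down,8,*'
}

# The same conversion done in the plugin itself instead of via cdef.
bytes_to_bits() {
    echo $(( $1 * 8 ))
}

bits_config
bytes_to_bits 1500
```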

Direction

For a rate measurement plugin that can report on data going both in and out, such as the if_(eth0) plugin that reports on bytes (or packets) going in and out, it makes sense to graph incoming and outgoing on the same graph. The convention in Munin has become that outgoing data is graphed above the x-axis (i.e., positive) and incoming data is graphed below the x-axis, like this:

http://www.linpro.no/projects/munin/example/ping.uio.no/cappuccino.ping.uio.no-if_eth0-day.png

This is achieved by using the following field attributes. This example assumes that your plugin generates two fieldnames inputrate and outputrate. The input rate goes under the x-axis so it needs to be manipulated:

inputrate.graph no
outputrate.negative inputrate

The first disables normal graphing of inputrate. The second activates a hack in Munin to get the input and output graphs in the same color and on opposite sides of the x-axis.
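
Put together, the config output for such a plugin might look like the sketch below. The field names inputrate and outputrate come from the example above. The function name inout_config, the title, the labels and the DERIVE settings are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch only: title and labels are illustrative.
inout_config() {
    echo 'graph_title Traffic on a hypothetical interface'
    echo 'graph_vlabel bytes in (-) / out (+) per ${graph_period}'
    # Incoming field: not drawn on its own.
    echo 'inputrate.label received'
    echo 'inputrate.type DERIVE'
    echo 'inputrate.min 0'
    echo 'inputrate.graph no'
    # Outgoing field: drawn above the x-axis, with inputrate mirrored below it.
    echo 'outputrate.label bytes'
    echo 'outputrate.type DERIVE'
    echo 'outputrate.min 0'
    echo 'outputrate.negative inputrate'
}

inout_config
```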

Legends

As of version 1.2 Munin supports explanatory legends on both the graph and field level. Many plugins - even the CPU use plugin - should make use of this. The CPU "io wait" number, for example, will only get larger than 0 if the CPU has nothing else to do in the time interval. Many (nice) graphs will only be completely clear once a rather obscure man page has been read (or in the Linux case perhaps even the kernel source). Using the legend possibilities Munin supports helps with this.

Graph legends are added by using the graph_info attribute, while field legends use the ${name}.info attribute.
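
A minimal sketch of both kinds of legend in config output, using the "io wait" case described above. The function name legend_config, the field name iowait and the wording of the texts are illustrative assumptions, not copied from any shipped plugin.

```shell
#!/bin/sh
# Sketch only: field name and legend texts are made up for illustration.
legend_config() {
    echo 'graph_title CPU usage'
    # Graph-level legend.
    echo 'graph_info This graph shows how CPU time is spent.'
    echo 'iowait.label iowait'
    # Field-level legend explaining a number that is easy to misread.
    echo 'iowait.info CPU time spent waiting for I/O while there was nothing else to do.'
}

legend_config
```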

Category

If the plugin outputs a graph_category attribute the graph will be grouped with other graphs of the same category. Please consult the category list for a list of some categories currently in use.

Legal characters

The legal characters in a field name are documented in Notes on Field Names.

Documentation

In Munin 1.3.4 and 1.2.6 there is a munindoc command that can be used to view POD documentation in plugins, or in separate .pod files if the plugin language is not suited for embedded POD. The apache_ and irqstats plugins in these versions are examples of what information should be included in a plugin POD. For SNMP plugins please see snmp__uptime -- in particular a MIB INFORMATION header has been added. Please note that documenting the plugins in the distribution is an ongoing project. Volunteers would be appreciated.

POD documents originate from the world of Perl, but it's possible to embed POD in other language scripts as well. In a shell script you can go like this:

: <<=cut
=head1 NAME

multips - Munin plugin ...

...

=cut

The ":" starts a null command with the text up to the line "=cut" as standard input (a "here document"). In other languages the POD could be embedded in strings, comment blocks or something similar. You can also write a separate .pod file for the plugin: when munindoc multips is run, the command first looks for POD in multips.pod and then in multips.
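
One variation that may be worth considering, which is not part of the example above, is to quote the here-document delimiter. With an unquoted delimiter the shell still performs variable and command substitution on the documentation text. Quoting it keeps the text literal. The plugin name example_ and the fetch function below are hypothetical.

```shell
#!/bin/sh
# The quoted delimiter '=cut' stops the shell from expanding anything
# inside the documentation text.
: <<'=cut'
=head1 NAME

example_ - hypothetical plugin used only to illustrate embedded POD

=head1 CONFIGURATION

Text such as $HOME or `date` stays literal here.

=cut

# The plugin body continues as usual after the POD block.
fetch() {
    echo 'example.value 1'
}
```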

Last modified at 2016-10-21T15:46:29+02:00