Debugging Munin plugins

0. Restart munin-node. A new plugin is only registered when munin-node is restarted.

# sudo /etc/init.d/munin-node restart

1. On the host where Munin runs, run munin-update as the Munin user account.

This step will tell you whether munin (the server) is able to communicate with munin-node (the agent).

# su -s /bin/bash munin
# /usr/share/munin/munin-update --debug --nofork --stdout --host foo.example.com --service df

You should get a line like this:

Aug 11 22:39:51 - [6846] Updating /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd with 57

After this, replace df with the service you want to check (e.g. hddtemp_smartctl). If df updates correctly but your service does not, something is probably wrong with the plugin or with how munin-node talks to the plugin.

2. On the host where munin-node runs, check to see whether the plugin runs through munin-run. Test with and without config, and with and without --debug.

Regular run:

# munin-run df
_dev_hda1.value 83

Config run:

# munin-run df config
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /
_dev_hda1.info / (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
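
The four combinations from step 2 can be checked in one go. This is only a sketch: check_plugin is a made-up helper name, and MUNIN_RUN is a hypothetical override for the command name (useful for a dry run), not something Munin itself knows about.

```shell
# Run a plugin through munin-run with and without "config",
# and with and without --debug, reporting each result.
check_plugin() {
    plugin="$1"
    for debug in "" "--debug"; do
        for mode in "" "config"; do
            # $debug and $mode are unquoted on purpose: empty values add no argument.
            if ${MUNIN_RUN:-munin-run} $debug "$plugin" $mode >/dev/null 2>&1; then
                status="ok"
            else
                status="FAILED"
            fi
            echo "$status: munin-run${debug:+ $debug} $plugin${mode:+ $mode}"
        done
    done
}

# Usage (as root, on the host where munin-node runs):
#   check_plugin df
```

Any line marked FAILED tells you which combination to rerun by hand to see the actual error output.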

3. If not, does the plugin run when executed directly? If it runs when executed directly as root but not through munin-run (as described in step 2), the plugin has a permission problem. See the article on environment files.

4. Does the plugin run through munin-node, with and without config? Hint: Telnet to port 4949.

Regular run:

# telnet foo.example.com 4949
Trying foo.example.com...
Connected to foo.example.com.
Escape character is '^]'.
# munin node at foo.example.com
fetch df
_dev_hda1.value 83
[...]
.

With config:

# telnet foo.example.com 4949
Trying foo.example.com...
Connected to foo.example.com.
Escape character is '^]'.
# munin node at foo.example.com
config df
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /
_dev_hda1.info / (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
[...]
.
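
The same two sessions can be run non-interactively. A minimal sketch, assuming nc (netcat) is installed; node_commands is a hypothetical helper that just prints the commands typed in the sessions above, followed by quit so the node closes the connection.

```shell
# Print the munin-node commands that exercise one plugin, then end the session.
node_commands() {
    printf 'config %s\nfetch %s\nquit\n' "$1" "$1"
}

# Usage (assumes the node listens on the default port 4949):
#   node_commands df | nc foo.example.com 4949
```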

If the plugin runs with munin-run but not through telnet, you probably have a PATH problem. Tip: Set env.PATH for the plugin in the plugin's environment file.
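
As a sketch of what such a setting can look like: plugin settings usually live in a file under /etc/munin/plugin-conf.d/, in a section named after the plugin. The directory and the PATH value below are assumptions; adjust both to your installation.

```
[df]
env.PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
```

Restart munin-node afterwards (see step 0) so the new environment should be picked up.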

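5. Does the plugin output contain too few, too many and/or illegal characters? A data line should look like the examples above: the field name, .value, a single space, and a number (or U for unknown), with nothing else on the line.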
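
A rough way to spot stray characters is to filter the fetch output through a pattern. The sketch below is deliberately strict and is an assumption about what "legal" means here: it only accepts plain "field.value number" lines (or U), so it will also flag things like multigraph lines that are legitimate in some plugins. check_fetch_output is a made-up name.

```shell
# Read plugin output on stdin and print (with line numbers) every line that is
# not "<field>.value <number|U>". Returns 0 only if no such line was found.
check_fetch_output() {
    if grep -Evn '^[A-Za-z_][A-Za-z0-9_]*\.value (U|-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?)$'; then
        return 1    # grep printed at least one offending line
    fi
    return 0
}

# Usage:
#   munin-run df | check_fetch_output
```

Trailing spaces, units glued to the value (83%) and carriage returns all show up as offending lines.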

6. Does Munin (munin-cron and its children) write values into RRD files? Hint: rrdtool fetch [rrd file] AVERAGE
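
To see at a glance whether real values are arriving, the output of rrdtool fetch can be summarised. A small sketch, assuming the usual fetch output of one "timestamp: value" row per interval; count_nan_rows is a hypothetical helper name, and the RRD path in the usage line is the one from step 1.

```shell
# Summarise `rrdtool fetch` output read from stdin: how many data rows are nan.
count_nan_rows() {
    awk '/^[0-9]+:/ { rows++; if ($2 ~ /nan/) nan++ }
         END { printf "%d of %d rows are nan\n", nan, rows }'
}

# Usage (last hour of data):
#   rrdtool fetch /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd AVERAGE --start -1h | count_nan_rows
```

A few nan rows at the very end are normal, since the current interval has not been filled in yet.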

7. Does the plugin use legal field names? See Notes on Field names.
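
As a quick illustration of the usual rule (first character a letter or underscore, remaining characters letters, digits or underscores), the following sketch turns an arbitrary string into a legal field name. sanitize_fieldname is a made-up name for this example; it is how a device such as /dev/hda1 ends up as _dev_hda1 in the output above.

```shell
# Replace an illegal first character, then every other illegal character, with "_".
sanitize_fieldname() {
    printf '%s\n' "$1" | sed -e 's/^[^A-Za-z_]/_/' -e 's/[^A-Za-z0-9_]/_/g'
}

# Usage:
#   sanitize_fieldname /dev/hda1
```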

8. If you borrow data from other graphs, check that {fieldname}.type is set properly. See Munin file names for a quick reference on what error messages in the logs might indicate.

Cases

SELinux sometimes breaks Munin plugins

munin-node seems to show sane values, but RRD files are filled with 0

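  • The plugin's output values are GAUGE values, but the plugin's config declares them as COUNTER or DERIVE. The RRD file then stores the rate of change, which is close to 0 for a value that barely moves. Note that values are treated as GAUGE by default when no type is declared.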
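
To make the type question concrete, here is a minimal sketch of a plugin that reports an instantaneous percentage. Everything in it is hypothetical (the plugin function, the usage field, the fixed value 83); the point is only where the .type line goes.

```shell
#!/bin/sh
# Hypothetical plugin body, wrapped in a function so it is easy to try out.
plugin() {
    case "$1" in
        config)
            echo "graph_title Example usage (in %)"
            echo "graph_vlabel %"
            echo "usage.label usage"
            # GAUGE stores the value as reported. Declaring COUNTER or DERIVE
            # here would store the change between updates instead.
            echo "usage.type GAUGE"
            ;;
        *)
            echo "usage.value 83"
            ;;
    esac
}

plugin "$@"
```

Since GAUGE is the default, the usage.type line could be dropped; a stray COUNTER or DERIVE in its place is what to look for.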

munin-node seems to show sane values, but RRD files are filled with 'nan'

  • Check that there are no invalid characters in the plugin's output.
  • For new plugins, let Munin gather data for about 20 minutes. The first values are often 'nan' and should give way to real data after a few updates.

munin-node is configured properly, but won't give any data

  • Check that every field name in the plugin's output carries the .value suffix, as in _dev_hda1.value 83 (yes, I managed to forget that recently).

munin-node sometimes returns valid data, sometimes not

  • Check that no race conditions occur. A typical race condition is updating a file with crontab while the plugin is trying to read the file.
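
One common way to avoid that particular race is to have the cron job write to a temporary file and rename it into place, so the plugin sees either the old or the new file and never a half-written one. A sketch only; atomic_write is a made-up helper and the paths in the usage line are placeholders.

```shell
# Write stdin to the target file via a temporary file in the same directory.
atomic_write() {
    target="$1"
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if cat > "$tmp"; then
        chmod 644 "$tmp"        # mktemp creates the file as 0600; the plugin must be able to read it
        mv "$tmp" "$target"     # a rename within one directory replaces the file in a single step
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage in the cron job (placeholder command and path):
#   some_slow_command | atomic_write /var/tmp/plugin-data
```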

The graphs are empty

  • The plugin's output values are GAUGE values, but the plugin's config declares them as COUNTER or DERIVE. Note that values are treated as GAUGE by default when no type is declared.
  • The files to be updated by Munin are owned by root or another user account, so the Munin user cannot write to them.
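
Wrongly owned files can be listed with find. A small sketch, assuming the data lives under /var/lib/munin as in step 1 and the account is called munin; not_owned_by is a hypothetical helper name.

```shell
# List everything below a directory that is not owned by the given user.
not_owned_by() {
    dir="$1"
    owner="$2"
    find "$dir" ! -user "$owner" -print
}

# Usage:
#   not_owned_by /var/lib/munin munin
```

No output means every file there belongs to the expected account.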

Other mumbo-jumbo

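  • Run the individual stages of munin-cron manually with --debug, --nofork and --stdout. The location of munin-update depends on the installation (/usr/share/munin in step 1, /usr/lib/munin here). Something like this: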
    su - munin -c "/usr/lib/munin/munin-update --debug --nofork --stdout --host foo.example.com --service df"