Debugging Munin plugins

Note that all the commands shown here must be run as the root user. A common way to become root is the sudo command, but refer to your local documentation for more specific information.

  1. Restart munin-node, as it only reads the plugin list at startup. Until it is restarted, a newly added plugin can be tested with munin-run without being active in munin-node.
    # /etc/init.d/munin-node restart
    
  1. On the host where munin-node runs, check to see whether the plugin runs through munin-run.

Try it with and without the config argument. Neither run should emit an error message. You can also pass --debug to munin-run, which shows whether the configuration file is parsed correctly, mainly the UID and environment variables.

Regular run:

# munin-run df
_dev_hda1.value 83

Config run:

# munin-run df config
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /
_dev_hda1.info / (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
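
The two runs above can also be checked by a script. The following is a minimal sketch, not part of Munin: check_quiet is a hypothetical helper that runs any command, discards its standard output, and fails if the command exits non-zero or writes anything to standard error.

```shell
# check_quiet is a hypothetical helper, not a Munin command.
# It runs the given command, discards stdout, and succeeds only if
# the command exits with status 0 and writes nothing to stderr.
check_quiet() {
    # "2>&1 >/dev/null" captures stderr and throws stdout away.
    err=$("$@" 2>&1 >/dev/null)
    status=$?
    if [ "$status" -ne 0 ] || [ -n "$err" ]; then
        echo "FAILED ($status): $*"
        if [ -n "$err" ]; then
            printf '%s\n' "$err"
        fi
        return 1
    fi
    echo "ok: $*"
}
```

With the df plugin used on this page, it would be called as check_quiet munin-run df and check_quiet munin-run df config.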

  1. Does the plugin run through munin-node, with and without config? Hint: use netcat to connect to port 4949.

Telnet used to be the recommended tool because it was installed almost everywhere. It is no longer recommended: netcat is now nearly as ubiquitous and gives you a raw TCP connection, which telnet does not. socat works just as well, but is less common.

Regular run:

# nc foo.example.com 4949
# munin node at foo.example.com
fetch df
_dev_hda1.value 83
[...]
.

With config:

# nc foo.example.com 4949
# munin node at foo.example.com
config df
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /boot
_dev_hda1.info /boot (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
[...]
.
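
The same exchange can be run non-interactively by piping the commands into netcat. This is a minimal sketch under a few assumptions: node_request and query_node are hypothetical helper names, nc is on the PATH, and the node accepts the quit command to close the connection (which lets nc exit).

```shell
# Hypothetical helpers, not part of Munin.

# node_request COMMAND PLUGIN
# Prints the lines to send: one command, then "quit" so that
# munin-node closes the connection.
node_request() {
    printf '%s %s\nquit\n' "$1" "$2"
}

# query_node HOST COMMAND PLUGIN [PORT]
# COMMAND is "fetch" or "config"; PORT defaults to 4949.
query_node() {
    node_request "$2" "$3" | nc "$1" "${4:-4949}"
}
```

For the examples above this would be query_node foo.example.com fetch df and query_node foo.example.com config df.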

If the plugin runs with munin-run but not through munin-node, you might have a $PATH problem. Tip: set env.PATH for the plugin in the plugin's environment file.
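
As an illustration of what such a setting looks like, here is a small sketch. It assumes the common layout where plugin environment files live in /etc/munin/plugin-conf.d/. path_stanza is a hypothetical helper that only prints the stanza, and the PATH value and file name in the usage line are examples, not recommendations.

```shell
# path_stanza PLUGIN PATHVALUE
# Hypothetical helper: prints a plugin configuration stanza that sets
# PATH for one plugin. It does not write any file by itself.
path_stanza() {
    printf '[%s]\nenv.PATH %s\n' "$1" "$2"
}
```

For example, path_stanza df /usr/local/bin:/usr/bin:/bin > /etc/munin/plugin-conf.d/df_path writes a two-line stanza for the df plugin. Restart munin-node afterwards so that the new setting is picked up.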

  1. On the host where Munin runs, run munin-update as the munin user account.

This step tells you whether munin-update (the server) is able to communicate with munin-node (the agent).

# su -s /bin/bash munin
$ /usr/share/munin/munin-update --debug --nofork --stdout --host foo.example.com --service df

You should get a line like this:

Aug 11 22:39:51 - [6846] Updating /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd with 57

After this, replace df with the service you want to check, such as hddtemp_smartctl. If one of these steps does not work, something is probably wrong with the plugin or how munin-node talks to the plugin.

  1. If not, does the plugin run when executed directly? If it runs when executed directly as root but not through munin-run (see the munin-run step above), the plugin has a permission problem. See the article on environment files.
  1. Does the plugin output contain too few, too many and/or illegal characters?
  1. Does Munin (munin-cron and its children) write values into RRD files? Hint: rrdtool fetch [rrd file] AVERAGE
  1. Does the plugin use legal field names? See Notes on Field names.
  1. If you borrow data from other graphs, check that the {fieldname}.type is set properly. See Munin file names for a quick reference on what any error messages in the logs might indicate.
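
The field name check in the list above can be partly automated. The sketch below assumes the rule that a field name starts with a letter or underscore and continues with letters, digits or underscores. Treat that rule as an assumption and check Notes on Field names for the authoritative version. Both function names are hypothetical.

```shell
# is_legal_fieldname NAME
# Assumed rule: first character [A-Za-z_], the rest [A-Za-z0-9_].
# Character ranges depend on the locale; run with LC_ALL=C if in doubt.
is_legal_fieldname() {
    case "$1" in
        ''|[!A-Za-z_]*) return 1 ;;   # empty, or bad first character
        *[!A-Za-z0-9_]*) return 1 ;;  # bad character somewhere
    esac
    return 0
}

# check_fieldnames reads plugin output on stdin and reports every
# "field.attribute" line whose field part breaks the rule above.
check_fieldnames() {
    status=0
    while read -r key _rest; do
        case "$key" in
            .|'') ;;                  # protocol terminator or blank line
            *.*)
                field=${key%%.*}
                if ! is_legal_fieldname "$field"; then
                    echo "illegal field name: $field"
                    status=1
                fi
                ;;
        esac
    done
    return "$status"
}
```

Typical use would be munin-run df | check_fieldnames and munin-run df config | check_fieldnames. Lines without a dot in the first word, such as graph_title, are ignored.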

Cases

SELinux sometimes breaks Munin plugins

munin-node seems to show sane values, but RRD files are filled with 0

  • The plugin's output values are GAUGE values, but the plugin's config declares them as COUNTER or DERIVE through {fieldname}.type. Note that a field without a declared type is treated as GAUGE by default.
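
To see which type each field actually declares, the config output can be summarised. This is a minimal sketch using awk. field_types is a hypothetical helper, and it assumes plain field.attribute lines (no multigraph prefixes).

```shell
# field_types reads "config" output on stdin and prints one line per
# field: the field name and its declared type, or GAUGE if the plugin
# prints no .type line for that field.
field_types() {
    awk '
        $1 ~ /^[^.]+\.[^.]+$/ {
            split($1, parts, ".")
            field = parts[1]
            attr = parts[2]
            if (!(field in kind)) {
                kind[field] = "GAUGE"   # default when no .type is given
                order[++n] = field      # remember first-seen order
            }
            if (attr == "type") {
                kind[field] = $2
            }
        }
        END {
            for (i = 1; i <= n; i++) {
                print order[i], kind[order[i]]
            }
        }
    '
}
```

For the df plugin this would be run as munin-run df config | field_types. A field listed as COUNTER or DERIVE whose values are really absolute readings is a likely cause of the symptom above.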

munin-node seems to show sane values, but RRD files are filled with 'NaN'

  • Check that there are no invalid characters in the plugin's output.
  • For new plugins, let Munin gather data for about 20 minutes and things will settle.
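
To tell whether an RRD file holds anything other than NaN, the output of rrdtool fetch can be counted. A minimal sketch, assuming the usual output format of a header line followed by "timestamp: value" rows. count_known is a hypothetical helper name.

```shell
# count_known reads "rrdtool fetch ... AVERAGE" output on stdin and
# prints how many rows carry a real value (anything except nan / -nan).
count_known() {
    awk -F': ' '
        NF == 2 && tolower($2) !~ /nan/ { n++ }
        END { print n + 0 }
    '
}
```

Using the file name from the munin-update example on this page, that is rrdtool fetch /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd AVERAGE | count_known. A result of 0 after the first 20 minutes or so suggests the values never reach the RRD file.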

munin-node is configured properly, but won't give any data

  • Check that the plugin prints a .value line for each field name (yes, I managed to forget that recently).

munin-node sometimes returns valid data, sometimes not

  • Check that no race conditions occur. A typical race condition is updating a file with crontab while the plugin is trying to read the file.
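
One common way to avoid this kind of race is to have the cron job write to a temporary file and then rename it over the real one, since a rename within one filesystem replaces the file in a single step. The sketch below is only an illustration. atomic_write is a hypothetical helper, and the chmod assumes the plugin runs as a different user that needs read access.

```shell
# atomic_write TARGET
# Hypothetical helper: copies stdin to a temporary file next to TARGET,
# then renames it into place, so a reader sees either the old or the
# new content, never a half-written file.
atomic_write() {
    target=$1
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if cat > "$tmp"; then
        # mktemp creates the file with mode 0600; assumption: the plugin
        # runs as another user and must be able to read it.
        chmod 644 "$tmp"
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

In the cron job, a command that used to redirect its output straight into the data file would pipe it into atomic_write with that file's path instead.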

The graphs are empty

  • The plugin's output values are GAUGE values, but the plugin's config declares them as COUNTER or DERIVE through {fieldname}.type. Note that a field without a declared type is treated as GAUGE by default.
  • The files to be updated by Munin are owned by root or another user account, so the munin user cannot write to them.
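
Ownership problems can be spotted with find. A minimal sketch: not_owned_by is a hypothetical helper, and the user name and directory in the usage line are taken from the examples on this page and may differ on your system.

```shell
# not_owned_by USER DIR
# Lists everything under DIR that is not owned by USER.
not_owned_by() {
    find "$2" ! -user "$1" -print
}
```

Running not_owned_by munin /var/lib/munin should print nothing. Any file it lists is a candidate for the empty-graph symptom.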

Other mumbo-jumbo

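  • Run the different stages of munin-cron manually with --debug, --nofork and --stdout, for example as shown here. The location of munin-update depends on the installation; this page shows both /usr/share/munin and /usr/lib/munin.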
# su - munin -c "/usr/lib/munin/munin-update \
    --debug --nofork --stdout \
    --host foo.example.com \
    --service df"
Last modified on 04/17/13 16:11:56