Debugging Munin plugins

Attention: The content of this page has been moved to the Munin Guide Troubleshooting. This wiki page has therefore been set to "Read only" and will be purged later.

Note that all the commands shown here need to be run as the root user. A common way of becoming root is the sudo command, but refer to your local documentation for specifics.

  1. Restart munin-node, because it reads the plugin list only at startup. This also lets you add a new plugin and test it with munin-run without enabling it right away.
    # /etc/init.d/munin-node restart

  1. On the host where munin-node runs, check whether the plugin runs through munin-run.

Try with and without the config plugin argument. Neither run should emit an error message. You can also pass the --debug argument to munin-run; it shows whether the configuration file is parsed correctly, mainly the UID and the environment variables.

Regular run:

# munin-run df
_dev_hda1.value 83

Config run:

# munin-run df config
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /
_dev_hda1.info / (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
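
The two runs above can be scripted. The following is a minimal sketch, not part of Munin: the function name check_plugin and the MUNIN_RUN variable are assumptions made for illustration. It runs the plugin in both modes and reports a non-zero exit status or anything written to stderr.

```shell
#!/bin/sh
# Sketch: run a plugin through munin-run with and without "config".
# MUNIN_RUN is a hypothetical override so the command can be swapped out.
MUNIN_RUN=${MUNIN_RUN:-munin-run}

check_plugin() {
    plugin=$1
    status=0
    for mode in "" config; do
        errfile=$(mktemp "${TMPDIR:-/tmp}/check_plugin.XXXXXX") || return 2
        # $mode is unquoted on purpose: an empty mode means "no argument".
        if ! "$MUNIN_RUN" "$plugin" $mode >/dev/null 2>"$errfile"; then
            echo "FAIL: $plugin ${mode:-fetch} exited non-zero"
            status=1
        fi
        # Any output on stderr counts as a problem, even with exit status 0.
        if [ -s "$errfile" ]; then
            echo "FAIL: $plugin ${mode:-fetch} wrote to stderr:"
            cat "$errfile"
            status=1
        fi
        rm -f "$errfile"
    done
    return $status
}

# Example (as root): check_plugin df
```

A silent return with exit status 0 means both runs were clean.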

  1. Does the plugin run through munin-node, with and without config? Hint: use netcat to connect to port 4949.

Telnet used to be the recommended tool because it was installed almost everywhere. We no longer recommend it: netcat is now almost as ubiquitous as telnet and offers a plain TCP connection, whereas telnet does not. socat also works perfectly, but it is not as mainstream.

Regular run:

# nc foo.example.com 4949
# munin node at foo.example.com
fetch df
_dev_hda1.value 83
[...]
.

With config:

# nc foo.example.com 4949
# munin node at foo.example.com
config df
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /boot
_dev_hda1.info /boot (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
[...]
.

If the plugin runs with munin-run but not through munin-node (the netcat session above), you might have a $PATH problem. Tip: set env.PATH for the plugin in the plugin's configuration file.
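
The interactive session can also be run non-interactively. This is a minimal sketch: the function name node_query and the NC variable are assumptions made for illustration, and some netcat variants close the connection differently once stdin ends, so treat it as a starting point.

```shell
#!/bin/sh
# Sketch: send one command to munin-node and print the reply.
# NC is a hypothetical override so another client can be substituted.
NC=${NC:-nc}

node_query() {
    host=$1
    request=$2
    port=${3:-4949}
    # "quit" asks munin-node to close the connection after it has answered.
    printf '%s\nquit\n' "$request" | "$NC" "$host" "$port"
}

# Examples:
#   node_query foo.example.com "config df"
#   node_query foo.example.com "fetch df"
```

Comparing this output with the output of munin-run on the node shows quickly whether the problem lies in the environment munin-node gives the plugin.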

  1. On the host where Munin runs, run munin-update as the munin user account.

This step tells you whether munin-update (the server) is able to communicate with munin-node (the agent).

# su -s /bin/bash munin
$ /usr/share/munin/munin-update --debug --nofork --stdout --host foo.example.com --service df

You should get a line like this:

Aug 11 22:39:51 - [6846] Updating /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd with 57

After this, replace df with the service you want to check, such as hddtemp_smartctl. If one of these steps does not work, something is probably wrong with the plugin or how munin-node talks to the plugin.

  1. If not, does the plugin run when executed directly? If it runs when executed directly as root but not through munin-run (as described above), the plugin has a permission problem. See the notes on plugin configuration.
  1. Does the plugin output contain too few, too many and/or illegal characters?
  1. Does Munin (munin-cron and its children) write values into RRD files? Hint: rrdtool fetch [rrd file] AVERAGE
  1. Does the plugin use legal field names? See Notes on Field names.
  1. If you borrow data from other graphs, check that the {fieldname}.type is set properly. See Munin file names for a quick reference on what any error messages in the logs might indicate.
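
The field-name check in the list above can be automated. The sketch below assumes the commonly documented rule that a field name consists of the characters [A-Za-z0-9_] and does not start with a digit; the function name invalid_field_names is made up for illustration. It reads plugin output on stdin and prints each offending field name once.

```shell
#!/bin/sh
# Sketch: list field names that fall outside [A-Za-z_][A-Za-z0-9_]*.
# Assumption: the field name is everything before the last dot of the key.
invalid_field_names() {
    awk '
        {
            key = $1
            n = split(key, parts, "[.]")
            # Global attributes such as graph_title contain no dot; skip them.
            if (n < 2) next
            field = substr(key, 1, length(key) - length(parts[n]) - 1)
            if (field !~ /^[A-Za-z_][A-Za-z0-9_]*$/ && !(field in seen)) {
                seen[field] = 1
                print field
            }
        }'
}

# Example (as root): munin-run df config | invalid_field_names
```

No output means every field name passed the check.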

Cases

SELinux sometimes breaks Munin plugins

munin-node seems to show sane values, but RRD files are filled with 0

  • The plugin's output values are GAUGE values, but the plugin declares them as COUNTER or DERIVE. Note that the default type is GAUGE.

munin-node seems to show sane values, but RRD files are filled with 'NaN'

  • Check that there are no invalid characters in the plugin's output.
  • For new plugins, let Munin gather data for about 20 minutes and things will sort themselves out.
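
To see how much of an RRD file is actually NaN, the output of rrdtool fetch can be summarised. This is a minimal sketch: the function name count_nan_rows is hypothetical, and it assumes data rows of the form "timestamp: value" with unknown values printed as some spelling of nan (the exact spelling varies between rrdtool versions).

```shell
#!/bin/sh
# Sketch: count NaN rows in "rrdtool fetch" output read from stdin.
count_nan_rows() {
    awk '
        # Data rows start with "<timestamp>:"; header and blank lines are skipped.
        $1 ~ /^[0-9]+:$/ {
            total++
            # Only the first data column is inspected.
            if (tolower($2) ~ /nan/) nan++
        }
        END { printf "%d of %d rows are NaN\n", nan, total }'
}

# Example:
#   rrdtool fetch /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd AVERAGE | count_nan_rows
```

A few NaN rows at the end are normal for a file that was updated recently; a file that is NaN throughout points at the plugin output or at munin-update.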

munin-node is configured properly, but won't give any data

  • Check that every field name in the plugin's output carries the .value suffix (yes, I managed to forget that recently).
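
For reference, a minimal shell plugin might look like the sketch below. The graph title, the field name entries and the measured value (the number of entries in /) are placeholders chosen for illustration; only the overall shape, config attributes on a config run and fieldname.value lines otherwise, is the point.

```shell
#!/bin/sh
# Minimal plugin sketch; all names and the measured value are hypothetical.
plugin_main() {
    case "${1:-}" in
        config)
            echo "graph_title Entries in the root directory"
            echo "graph_vlabel entries"
            echo "graph_category disk"
            echo "entries.label entries"
            # GAUGE is the default type; it is spelled out here for clarity.
            echo "entries.type GAUGE"
            ;;
        *)
            # Every field is reported as <fieldname>.value <number>.
            echo "entries.value $(ls / | wc -l | tr -d ' ')"
            ;;
    esac
}

plugin_main "$@"
```

Leaving out the .value suffix in the last echo is exactly the mistake described above: the config run still looks fine, but no data is recorded.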

munin-node sometimes returns valid data, sometimes not

  • Check that no race conditions occur. A typical race condition is updating a file with crontab while the plugin is trying to read the file.
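
One common way to avoid this kind of race is to have the cron job write to a temporary file and rename it into place, so the plugin sees either the old file or the new one, never a partial one. The sketch below assumes this approach fits your setup; the function name write_atomically and the example path are hypothetical.

```shell
#!/bin/sh
# Sketch: replace a file atomically with the data read from stdin.
write_atomically() {
    target=$1
    # The temp file lives in the target directory so that mv is a rename
    # on the same filesystem rather than a copy.
    tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
    # mktemp creates the file readable by its owner only; the plugin usually
    # runs as another user, so open it up for reading.
    if cat >"$tmp" && chmod 644 "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Example cron job body (hypothetical command and path):
#   some_slow_command | write_atomically /var/tmp/myplugin.cache
```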

The graphs are empty

  • The plugin's output values are GAUGE values, but the plugin declares them as COUNTER or DERIVE. Note that the default type is GAUGE.
  • The files to be updated by Munin are owned by root or another user account instead of the munin user.
  • The local user browser cache may be corrupt, especially if "most" graphs are displayed correctly and "some" graphs are blank. In Firefox (or your browser of choice) go to tools and clear recent history, then check to see if the graphs are now properly displayed.
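
The ownership check from the list above can be done with find. This is a minimal sketch: the function name list_foreign_files is hypothetical, and the defaults assume the data directory /var/lib/munin and a user named munin, which may differ on your system.

```shell
#!/bin/sh
# Sketch: list files under a directory that are not owned by the given user.
list_foreign_files() {
    dir=${1:-/var/lib/munin}
    owner=${2:-munin}
    # "!" negates the -user test, so only files with another owner are printed.
    find "$dir" ! -user "$owner" -print
}

# Example (as root): list_foreign_files /var/lib/munin munin
```

Any file printed here is a candidate for the empty-graph problem, because munin-update may not be able to write to it.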

Other mumbo-jumbo

  • Run the different stages in munin-cron manually, using --debug, --nofork, --stdout, something like this:
# su - munin -c "/usr/lib/munin/munin-update \
    --debug --nofork --stdout \
    --host foo.example.com \
    --service df"
Last modified on 2016-10-21T16:08:18+02:00