Project ideas for Munin:

Munin is a working system, network, and environment monitoring tool that graphs resource utilization and trends over time. It is written in Perl and uses RRD (RRDtool) to produce the graphs. Graph data is produced by plugins, which in the common case are quite easy to write.

  • Interface: Munin needs an operator dashboard that shows "interesting" graphs. "Interesting" could be detected by the anomaly detection in RRD (the graphing tool Munin uses). Another easy criterion is a graph exceeding the warning or critical limits set by its plugin. The project would need to identify further criteria for "interesting" and produce a dashboard interface that alerts operators to these conditions so they can be evaluated and acted on. What counts as interesting also depends on context: detecting that a disk will fill up within 1 day is interesting on any ordinary Tuesday, whereas a disk filling up within 3 days is interesting on the day before Christmas but not on an ordinary Tuesday. (Linpro probably does not have the expertise needed to discover "interesting" patterns and trends; anyone who takes on this idea will need to work that out themselves.)
  • Interface (configuration and web interface): enhancements that make powerful existing features accessible and easier to use:
    • Munin supports gathering data from many hosts, each with many plugins. Plugins may set warning and critical levels, but not many do. The levels can be customized by the system operator, but this often requires reading the plugin's source code and setting an environment variable in a munin-node configuration file. In most scenarios the operator would rather edit a central configuration file, or even use a web interface. Such a configuration file and syntax exist, but they are not very expressive; for example, there are no wildcards. And while Munin supports host groups, graph categories, and so on, there is no way to apply configuration by these groupings.
    • Munin supports compound graphs that combine data from multiple other graphs. The syntax for this is obscure, and in most cases it would be easier to pick the data sources visually in a web interface.
    • Munin supports centrally overriding graph creation settings, for example making a graph logarithmic. To make this really useful, the wildcards proposed above would be needed, as well as a web interface for manipulating these settings that shows immediately how the graph changes as they are edited.

This project would need to introduce syntax extensions and an arbitrarily cool/nice web (or "web 2.0") interface for editing the configuration.
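For the dashboard idea, here is a minimal sketch of two "interesting" tests, assuming disk usage arrives as (day, percent used) samples. The first test checks the plugin's warning/critical limits. The second compares a predicted fill-up time against a horizon that widens before holidays. The trend is a plain least-squares line, not RRD's own anomaly detection. `Sample`, `days_until_full`, `horizon_days`, `is_interesting` and the 1-day/3-day policy are hypothetical names and values chosen to mirror the Tuesday/Christmas example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass
class Sample:
    day: float   # time of the sample, in days
    used: float  # disk usage, in percent of capacity


def days_until_full(samples: Sequence[Sample], capacity: float = 100.0) -> Optional[float]:
    """Fit a least-squares line through the samples and extrapolate it.

    Returns the number of days after the last sample until the fitted line
    reaches `capacity`, or None if usage is not growing."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(s.day for s in samples) / n
    mean_u = sum(s.used for s in samples) / n
    var_t = sum((s.day - mean_t) ** 2 for s in samples)
    if var_t == 0:
        return None
    slope = sum((s.day - mean_t) * (s.used - mean_u) for s in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    return max(0.0, (capacity - intercept) / slope - samples[-1].day)


def horizon_days(today: date, holidays: Sequence[date]) -> float:
    """Hypothetical policy: a fill-up is 'interesting' if predicted within
    1 day, or within 3 days when a holiday falls in the next 3 days."""
    if any(today < h <= today + timedelta(days=3) for h in holidays):
        return 3.0
    return 1.0


def is_interesting(samples: Sequence[Sample], today: date, holidays: Sequence[date] = (),
                   warning: Optional[float] = None, critical: Optional[float] = None) -> bool:
    last = samples[-1].used
    # Criterion 1: the plugin's warning or critical limit is exceeded.
    if any(limit is not None and last >= limit for limit in (warning, critical)):
        return True
    # Criterion 2: the disk is predicted to fill up within the current horizon.
    eta = days_until_full(samples)
    return eta is not None and eta <= horizon_days(today, holidays)
```

With usage growing 10 points a day from 60%, `days_until_full([Sample(0, 60.0), Sample(1, 70.0), Sample(2, 80.0)])` predicts a full disk 2 days after the last sample. That is not interesting on an ordinary Tuesday such as `date(2007, 3, 6)`, but it is interesting on `date(2007, 12, 24)` with `date(2007, 12, 25)` listed as a holiday.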
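For the configuration item, this sketch shows one way wildcard rules could work. It assumes that every data field is addressed as `group/host/plugin.field` and matched with shell-style patterns through Python's `fnmatch`. The rule list and `settings_for` are hypothetical; this is not Munin's existing configuration syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical central rules: (pattern, setting, value). Patterns are matched
# against "group/host/plugin.field" with shell-style wildcards. Rules apply
# in order, so a later, more specific rule overrides an earlier general one.
RULES = [
    ("*/*/df.*", "warning", "90"),
    ("*/*/df.*", "critical", "98"),
    ("db/*/df.*", "warning", "80"),  # tighter disk limit for the "db" group
]


def settings_for(group: str, host: str, plugin: str, field: str, rules=RULES) -> dict:
    """Collect the settings that apply to one data field of one host."""
    key = f"{group}/{host}/{plugin}.{field}"
    result = {}
    for pattern, setting, value in rules:
        if fnmatchcase(key, pattern):
            result[setting] = value
    return result
```

Letting later rules win keeps the semantics simple. Note that `fnmatch`'s `*` also matches `/`, so a real syntax would probably want wildcards that stop at group and host boundaries.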

  • Scalability (refactoring, refinement and extension): All of these scalability extensions, and the protocol extensions needed to realize them, are already specified in some form, at least conceptually. We have a pretty good idea of how they could be implemented.
    • Munin currently polls nodes every 5 minutes and generates graphs in a batch-like way. This scales pretty well but is very limiting. If nodes scheduled plugins themselves, plugins could easily run more often and could also take longer to run; a typical IPMI scan of a host (IPMI: the environment sensor interface specified by Intel) can take up to a minute. The node would have to spool the results and play them back the next time a server contacts it. The server should support polling some nodes more frequently in order to produce real-time or near-real-time graphs. Playing back a spool is also so fast and lightweight that Munin could easily scale to a hundred- to thousand-fold increase in the number of hosts it can practically gather data from.
    • Currently a plugin can produce data for only one graph (which can contain multiple lines/curves, etc.). This does not scale well. The overhead of graphing the traffic on one switch port is relatively high, since SNMP can be slow. If the plugin instead did an SNMP walk of the entire port table of a switch or router (network, SAN, or otherwise), the overhead would go down, and given the right protocol extensions the Munin server could produce one "headline" graph and multiple detailed graphs from it. This would make Munin equal or superior to MRTG in ease of graphing network devices. (This task is to produce the server side of this, not the plugins; the plugin-side extensions are relatively trivial.)
    • Munin produces a simple one-page presentation of all graphs relating to a host. Since one host in the previous point can amount to 50 ports, this does not scale well either: a single switch would produce on the order of 50-60 graphs (times 4 for the different time scales). Some hierarchical presentation is needed. If the "headline" graph were shown on the main page, and clicking it led to a new page with the single-port graphs, the whole thing would be much easier to overview.
    • All this data, produced and updated at a higher frequency, will require a lot more work from RRD. Munin already runs into problems with tens of thousands of graphs updated every 5 minutes, and the extensions above would make RRD a serious bottleneck. There is a proposal in the RRD Trac for an RRD API or socket interface that would buffer data and, as a result of this and other details, require fewer graph redraws and save resources. Spooling was already introduced above. If the munin-update process simply saved the retrieved logs to disk in the same form as they were retrieved, munin-update would be fast. Consecutive retrievals would append to the same log file if it had not been read (and rotated) since the last update. A daemon process would then pick the logs that have been neglected longest and update the corresponding RRD files and graphs. There could be two log queues, one slow and one fast; the fast one would serve graphs that users want updated in real time. This strategy could be combined with a CGI (or Apache mod_$language) handler that lazily updates graphs when users request them (AJAX?).
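For the node-side scheduling in the first scalability item, here is a minimal sketch. It assumes each plugin has its own run interval in seconds and that the server tells the node the newest timestamp it already holds. `NodeScheduler` and its methods are hypothetical, and the spooled lines only imitate plugin `field.value` output.

```python
class NodeScheduler:
    """Hypothetical node-side scheduler: each plugin runs on its own interval
    and every result is spooled with a timestamp until the server fetches it."""

    def __init__(self, intervals):
        self.intervals = dict(intervals)           # plugin -> seconds between runs
        self.next_run = {p: 0 for p in intervals}  # plugin -> earliest next run time
        self.spool = []                            # (timestamp, plugin, output line)

    def due(self, now):
        """Plugins whose next run time has arrived, in name order."""
        return sorted(p for p, t in self.next_run.items() if t <= now)

    def record(self, now, plugin, line):
        """Spool one plugin result and schedule the plugin's next run."""
        self.spool.append((now, plugin, line))
        self.next_run[plugin] = now + self.intervals[plugin]

    def playback(self, since):
        """Return spooled results newer than `since` (the newest timestamp the
        server already has), dropping entries the server has acknowledged."""
        newer = [entry for entry in self.spool if entry[0] > since]
        self.spool = newer
        return list(newer)
```

A slow IPMI plugin can be given a 60-second interval while `load` runs every 10 seconds. Neither blocks the server, which only collects whatever was spooled since its last visit.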
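For the multi-graph plugin and the hierarchical presentation, this sketches the server side. It assumes one plugin run delivers per-port traffic rates, already converted from SNMP counters, as a `{port: (in, out)}` mapping. `split_walk` and `children` are hypothetical. One walk yields a headline graph summing all ports plus one detail graph per port, and the page paths give the drill-down structure.

```python
from typing import Dict, List, Tuple


def split_walk(host: str, walk: Dict[int, Tuple[float, float]]) -> Dict[str, Dict[str, float]]:
    """Turn one walk of a switch's port table into a 'headline' graph for the
    whole device plus one detail graph per port, keyed by a page path."""
    graphs = {f"{host}/traffic": {
        "in": sum(rx for rx, _ in walk.values()),
        "out": sum(tx for _, tx in walk.values()),
    }}
    for port, (rx, tx) in sorted(walk.items()):
        graphs[f"{host}/traffic/port{port}"] = {"in": rx, "out": tx}
    return graphs


def children(graphs: Dict[str, Dict[str, float]], path: str) -> List[str]:
    """Graphs exactly one level below `path`: what clicking it would show."""
    prefix = path + "/"
    return [p for p in graphs if p.startswith(prefix) and "/" not in p[len(prefix):]]
```

The host page would list only `children(graphs, host)`, the single headline graph. Each port graph appears one click further down instead of all 50-60 sitting on one page.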
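For the spool-and-queue idea for munin-update, here is a minimal in-memory sketch. `append` is all munin-update would do, and a daemon would loop over `next_host` and `drain`. Files, log rotation, and the actual RRD updates are left out. `SpoolQueue` and its fast/slow split are a hypothetical illustration, not an existing Munin component.

```python
import time
from collections import defaultdict


class SpoolQueue:
    """Hypothetical server-side spool: munin-update only appends fetched lines
    per host; a separate daemon drains the longest-neglected spool, serving
    the 'fast' (real-time) hosts before the 'slow' ones."""

    def __init__(self, fast_hosts=()):
        self.fast_hosts = set(fast_hosts)
        self.spools = defaultdict(list)  # host -> unread lines
        self.waiting_since = {}          # host -> arrival time of oldest unread line

    def append(self, host, lines, now=None):
        """munin-update's part: append to the unread spool and return at once."""
        now = time.time() if now is None else now
        self.spools[host].extend(lines)
        self.waiting_since.setdefault(host, now)  # keep the oldest unread time

    def next_host(self):
        """Most neglected host, taking the fast queue first; None when idle."""
        fast = [h for h in self.waiting_since if h in self.fast_hosts]
        candidates = fast or list(self.waiting_since)
        if not candidates:
            return None
        return min(candidates, key=self.waiting_since.__getitem__)

    def drain(self, host):
        """'Read and rotate' a host's spool. A real daemon would feed these
        lines to RRD updates and redraw the affected graphs."""
        self.waiting_since.pop(host, None)
        return self.spools.pop(host, [])
```

Strict fast-before-slow priority is the simplest policy, but it could starve the slow queue under load. A real daemon might instead reserve a share of its time for each queue.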
Last modified at 2007-03-09T23:03:31+01:00