
RFC spool protocol

This text uses RFC-style requirement language. The keywords MUST, MUST NOT, SHOULD and MAY are to be read as described in RFC 2119.

Design goals:

  • Better scaling for
    • More nodes
    • More frequent sampling
  • Gather data locally on nodes while the network is down
  • Multi-master model

Existing model

The master (process: munin-update) asks the node (process: munin-node) for data. The node executes the plugin synchronously and returns the data to the master when the plugin has finished. The master therefore waits for every plugin to execute before it continues.

Advantages

  • Lightweight node (agent)

Disadvantages

  • Client state is unknown if network is down
  • Server needs constant access to agent to form a complete picture

Proposed new model

Let the node include a scheduler that executes plugins at the appropriate times. Data readings are stored in spool files on the node for a defined period of time, configured in munin-node.conf. The stored data MUST NOT be averaged or otherwise manipulated.
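The scheduling decision can be sketched as below. This is only an illustration under stated assumptions, not Munin code: the function name due_plugins, the dictionaries it takes, and the use of plain epoch seconds are all hypothetical.

```python
def due_plugins(intervals, last_run, now):
    """Return the names of plugins whose sampling interval has elapsed.

    intervals: plugin name -> interval in seconds (hypothetical structure)
    last_run:  plugin name -> epoch of the last execution, if any
    now:       current epoch in seconds
    """
    return sorted(
        name
        for name, interval in intervals.items()
        # A plugin that has never run is always due.
        if now - last_run.get(name, float("-inf")) >= interval
    )
```

A node loop would call this periodically, run each returned plugin, and append the unmodified output to the spool.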

Using RRD files for storage has been suggested. Nicolai will not allow this because compiling and installing RRD on each node is a burden in the install process, and it serves no purpose.

munin-update will still be present on the server.

Plugins SHOULD be able to suggest possible reading intervals. Intervals MAY be overridden in munin-node.conf. In the absence of a suggestion, 5 minutes SHOULD be used; this ensures backward compatibility.
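The precedence between these three sources can be sketched as follows. The function name effective_interval and its arguments are hypothetical; how a plugin would actually express its suggestion, and how munin-node.conf would express the override, is not defined by this text.

```python
DEFAULT_INTERVAL = 300  # 5 minutes, the backward-compatible default


def effective_interval(plugin_suggestion=None, conf_override=None):
    """Return the sampling interval in seconds for one plugin.

    Precedence: munin-node.conf override, then the plugin's own
    suggestion, then the 5 minute default.
    """
    if conf_override is not None:
        return conf_override
    if plugin_suggestion is not None:
        return plugin_suggestion
    return DEFAULT_INTERVAL
```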

Below is a proposed protocol between server and node:

With new model server and new model agent

This is still sketchy.

  1. Initiated by cron, the master connects to munin-node for data.
  2. Server MUST present itself with the capability command.
  3. The node MUST respond with the spool capability.
  4. A server that does not recognize the spool capability will continue its interrogation of the node in the normal way.
  5. If the server understands the spool capability, it MUST start with the command spoolfetch timestamp. The node responds with the spooled 'config' and 'fetch' output recorded since the given timestamp. The server remembers the latest timestamp seen in the spool and uses that plus one second for the next spoolfetch command. It is probably a good idea if this timestamp appears at the start of the output. Because the server only reuses timestamps that came from the node, this avoids timestamp problems caused by the server and node having unsynchronized clocks. The server could also use the timestamp to calculate the time offset of the node and "timeshift" the spooled output so that it appears at the correct times in the graphs. I'm not sure if this timeshifting is needed.
  6. The node plays back the spool in "multigraph+dirtyconfig" format and the master updates the RRD files in the normal manner. Values are timestamped in the normal manner (timeseries.value epoch:value).
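The server-side bookkeeping in steps 3 to 5 can be sketched as below. This is a minimal illustration, not the munin-update implementation: the function names are hypothetical, and falling back to list when the node does not advertise spool is an assumption about the case of a new server talking to an old node.

```python
def first_command(cap_response, last_timestamp):
    """Pick the server's first data command after the cap exchange.

    cap_response is the node's reply line, e.g. "cap spool".
    """
    words = cap_response.split()
    if words[:1] == ["cap"] and "spool" in words[1:]:
        return f"spoolfetch {last_timestamp}"
    return "list"  # no spool support: interrogate the node the normal way


def next_timestamp(epochs_seen, previous):
    """Return the timestamp to send with the next spoolfetch command."""
    if not epochs_seen:
        return previous  # nothing new was spooled; ask from the same point
    return max(epochs_seen) + 1  # latest timestamp seen plus one second
```

Since only timestamps that came from the node are ever sent back to it, the server's own clock plays no part in selecting what is fetched.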

With old model server and new model agent

  1. Initiated by cron, the master connects to munin-node for data.
  2. Server MUST present its capabilities.
  3. An old server will fail to issue the correct commands to retrieve the spool and will fall back to the old behavior. This may cause additional load on the node, as an old server's queries may happen at the same time as the scheduled queries. An alternative is to let the agent simply return the last set of values from the spool, to avoid the overhead of re-running the plugins, at least if the values are less than, for example, 1 minute old.
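The alternative in step 3 can be sketched as follows. The names answer_fetch, last_spooled and run_plugin are hypothetical, and the 60 second threshold is only the "less than 1m old" example from the text, not a decided value.

```python
MAX_AGE = 60  # seconds; the example threshold mentioned in the text


def answer_fetch(last_spooled, now, run_plugin):
    """Answer an old-style fetch from the spool when it is fresh enough.

    last_spooled: (epoch, lines) of the newest spooled reading, or None
    now:          current epoch in seconds
    run_plugin:   callable that executes the plugin and returns its lines
    """
    if last_spooled is not None:
        epoch, lines = last_spooled
        if now - epoch < MAX_AGE:
            return lines  # reuse the spooled values, no plugin run needed
    return run_plugin()  # spool empty or stale: fall back to running the plugin
```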

Advantages

  • Node stores data for later pick-up, protects from network outage
  • Node will be able to decide on data gathering frequency. E.g., the admin may choose to gather data at 5-minute intervals during work hours but only every hour at night.
  • Server will be able to handle a larger number of nodes, as the server need not wait while the node runs plugins; the node now only spools data to the server.

Disadvantages

  • Slightly heavier node

Node issues

  • Need capability protocol (we have that as of 1.4.0)
  • Need multigraph capability (ditto)
  • Need dirtyconfig capability (we will have that in 1.5)
  • Should retain reasonable compatibility with old servers. I think this is taken care of by the above specification.
  • Need atomic write for large data-sets. OS level open-to-append and write guarantees this on Unix. Could also pipe results back to parent process for writing there. Locking should be avoided at almost any cost.
  • Introduces requirement to rotate and/or remove spool files.
  • How long to keep spool?
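The lock-free append mentioned in the list above can be sketched as below. The function name append_record is hypothetical. The sketch relies on the Unix O_APPEND behavior of positioning every write at the current end of the file; how large a single write can be before it risks being interleaved with another writer depends on the operating system and filesystem, so treat that part as an assumption to verify.

```python
import os


def append_record(path, record):
    """Append one complete record to a spool file without locking.

    The record is written with a single write call on a descriptor
    opened with O_APPEND, so the write starts at the end of the file
    even if other processes append to the same file.
    """
    data = record.encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)
    finally:
        os.close(fd)
    return written == len(data)  # False signals a short write
```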

Spoolfile proposals:

  • Avoid rotation and locking by using the Maildir way: store results in $plugin/$node/$timestamp. This uses inodes (and space rounding overhead) but avoids rotation entirely and makes cleanup as easy as a find command.
  • Halfway: $plugin/$node.$dateandhour.log avoids eating as many inodes, avoids locking and rotation, and is as easy to clean up.
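The "halfway" layout and its cleanup can be sketched as follows. The function names are hypothetical, and the format of $dateandhour is not specified above; YYYYMMDDHH in UTC is assumed here purely for illustration.

```python
import os
import time


def spool_path(root, plugin, node, epoch):
    """Build $plugin/$node.$dateandhour.log below the spool root.

    Assumption: $dateandhour is rendered as YYYYMMDDHH in UTC.
    """
    stamp = time.strftime("%Y%m%d%H", time.gmtime(epoch))
    return os.path.join(root, plugin, f"{node}.{stamp}.log")


def expire(root, max_age, now):
    """Remove spool files not modified within max_age seconds.

    Plays the role of the find command: no rotation, just deletion.
    Returns the sorted list of removed paths.
    """
    removed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if now - os.path.getmtime(full) > max_age:
                os.remove(full)
                removed.append(full)
    return sorted(removed)
```

Because each hour gets its own file, a file that is old enough is no longer being appended to and can be deleted without coordinating with the writer.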

Server issues

  • Should be very straightforward on the server.
  • _Must_ calculate node-to-server time-offset to normalize graph times. +/-1 second precision should be both sufficient and very satisfactory.
  • When the sampling interval is changed the RRD file needs to be tuned or maybe even copied to a new file.
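The time-offset normalization can be sketched as below. The function names are hypothetical, and the sketch assumes the server somehow learns the node's current clock reading (for instance from a timestamp at the start of the spoolfetch output, which this text only floats as an idea). Network latency is ignored, which seems acceptable given the +/-1 second target.

```python
def node_offset(server_now, node_now):
    """Seconds to add to a node timestamp to express it in server time."""
    return server_now - node_now


def timeshift(readings, offset):
    """Shift (epoch, value) readings from node time to server time."""
    return [(epoch + offset, value) for epoch, value in readings]
```

For example, if the node's clock is 10 seconds behind the server's, the offset is 10 and every spooled reading moves 10 seconds later before it is written to the RRD file.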

Protocol example

S: cap multigraph dirtyconfig spool
N: cap spool
S: list
N: load
S: spoolfetch 1271157838
N: multigraph load
N: graph_title Load average
N: graph_args --base 1000 -l 0
N: graph_vlabel load
N: graph_scale no
N: graph_category system
N: load.label load
N: graph_info The load average of the machine describes how many processes are in the run-queue (scheduled to run "immediately").
N: load.info 1 minute load average
N: load.value 1271157848:0.11
N: multigraph load
N: graph_title Load average
N: graph_args --base 1000 -l 0
N: graph_vlabel load
N: graph_scale no
N: graph_category system
N: load.label load
N: graph_info The load average of the machine describes how many processes are in the run-queue (scheduled to run "immediately").
N: load.info 1 minute load average
N: load.value 1271157858:0.11
N: multigraph load
N: graph_title Load average
N: graph_args --base 1000 -l 0
N: graph_vlabel load
N: graph_scale no
N: graph_category system
N: load.label load
N: graph_info The load average of the machine describes how many processes are in the run-queue (scheduled to run "immediately").
N: load.info 1 minute load average
N: load.value 1271157908:0.11
N: .
S: quit
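A server could pull the readings out of such a playback roughly as sketched below. This is an illustration only, with a hypothetical function name; it expects the node's lines with the "N: " prefix of the transcript already removed, ignores the config lines, and keeps values as strings rather than assuming they are always numeric.

```python
def parse_spool(lines):
    """Return (graph, field, epoch, value) tuples from a spool playback.

    lines: the node's output lines, without the "N: " transcript prefix.
    Parsing stops at the terminating "." line.
    """
    graph = None
    readings = []
    for line in lines:
        line = line.strip()
        if line == ".":
            break
        key, _, rest = line.partition(" ")
        if key == "multigraph":
            graph = rest  # following lines belong to this graph
        elif key.endswith(".value"):
            epoch, sep, value = rest.partition(":")
            if sep and epoch.isdigit():
                field = key[: -len(".value")]
                readings.append((graph, field, int(epoch), value))
    return readings
```

The largest epoch in the result is the "latest timestamp seen in the spool" that the server uses, plus one second, in its next spoolfetch command.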
Last modified on 02/02/13 08:56:45