diff --git a/README.pod b/README.pod deleted file mode 120000 index abe0c14..0000000 --- a/README.pod +++ /dev/null @@ -1 +0,0 @@ -bin/feedgnuplot \ No newline at end of file diff --git a/README.pod b/README.pod new file mode 100644 index 0000000..d3b8261 --- /dev/null +++ b/README.pod @@ -0,0 +1,1043 @@ +=head1 TALK + +I just gave a talk about this at L. Presentation lives +L. + +=head1 NAME + +feedgnuplot - General purpose pipe-oriented plotting tool + +=head1 SYNOPSIS + +Simple plotting of piped data: + + $ seq 5 | awk '{print 2*$1, $1*$1}' + 2 1 + 4 4 + 6 9 + 8 16 + 10 25 + + $ seq 5 | awk '{print 2*$1, $1*$1}' | + feedgnuplot --lines --points --legend 0 "data 0" --title "Test plot" --y2 1 + --unset grid --terminal 'dumb 80,40' --exit + + Test plot + + 10 +-----------------------------------------------------------------+ 25 + | + + + + + + + *##| + | data 0 ***A*#* | + | ** # | + 9 |-+ ** ## | + | ** # | + | ** # | + | ** ## +-| 20 + 8 |-+ A # | + | ** # | + | ** ## | + | ** # | + | ** B | + 7 |-+ ** ## | + | ** ## +-| 15 + | ** # | + | ** ## | + 6 |-+ *A ## | + | ** ## | + | ** # | + | ** ## +-| 10 + 5 |-+ ** ## | + | ** #B | + | ** ## | + | ** ## | + 4 |-+ A ### | + | ** ## | + | ** ## +-| 5 + | ** ## | + | ** ##B# | + 3 |-+ ** #### | + | **#### | + | #### | + |## + + + + + + + | + 2 +-----------------------------------------------------------------+ 0 + 1 1.5 2 2.5 3 3.5 4 4.5 5 + +Here we asked for ASCII plotting, which is useful for documentation. + +Simple real-time plotting example: plot how much data is received on the wlan0 +network interface in bytes/second (uses bash, awk and Linux): + + $ while true; do sleep 1; cat /proc/net/dev; done | + gawk '/wlan0/ {if(b) {print $2-b; fflush()} b=$2}' | + feedgnuplot --lines --stream --xlen 10 --ylabel 'Bytes/sec' --xlabel seconds + +=head1 DESCRIPTION + +This is a flexible, command-line-oriented frontend to Gnuplot. 
It creates plots +from data coming in on STDIN or given in a filename passed on the commandline. +Various data representations are supported, as are hardcopy output and streaming +display of live data. For a tutorial and a gallery please see the guide at +L + +A simple example: + + $ seq 5 | awk '{print 2*$1, $1*$1}' | feedgnuplot + +You should see a plot with two curves. The C<awk> command generates some data to +plot and C<feedgnuplot> reads it in from STDIN and generates the plot. The +C<awk> invocation is just an example; more interesting things would be plotted +in normal usage. No commandline options are required for the most basic +plotting. Input parsing is flexible; every line need not have the same number of +points. New curves will be created as needed. + +The most commonly used functionality of gnuplot is supported directly by the +script. Anything not directly supported can still be done with options such as +C<--set>, C<--extracmds>, C<--style>, etc. Arbitrary gnuplot commands can be +passed in with C<--extracmds>. For example, to turn off the grid, you can pass +in C<--extracmds 'unset grid'>. The C<--set> and C<--unset> options exist to +provide nicer syntax, so this is equivalent to passing C<--unset grid>. As many +of these options as needed can be passed in. To add arbitrary curve styles, use +C<--style curveID extrastyle>. Pass these more than once to affect more than one +curve. + +To apply an extra style to I<all> the curves that lack an explicit C<--style>, +pass in C<--styleall extrastyle>. In the most common case, the extra style is +C<with something>. To support this more simply, you can pass in C<--with +something> instead of C<--styleall 'with something'>. C<--styleall> and +C<--with> are mutually exclusive. Furthermore, any curve-specific C<--style> +overrides the global C<--styleall> or C<--with> setting. 
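As a rough sketch of how these style options combine, the following applies one style to all curves with C<--with> and overrides it for a single curve with C<--style>. It assumes the C<dumb> terminal from the synopsis is available, and the choice of C<with points> for curve 1 is only an illustration; the plot is skipped if feedgnuplot is not installed.

```shell
# Generate the two-column sample data used throughout: two curves, IDs 0 and 1.
data=$(seq 5 | awk '{print 2*$1, $1*$1}')

# Only attempt the plot if feedgnuplot is actually installed.
if command -v feedgnuplot >/dev/null 2>&1; then
  # --with lines applies to every curve that lacks an explicit --style.
  # The curve-specific --style for curve 1 overrides it
  # (the 'with points' style here is an assumed example).
  printf '%s\n' "$data" |
    feedgnuplot --with lines --style 1 'with points' --unset grid \
                --terminal 'dumb 80,40' --exit
fi
```

Curve 0 should then be drawn with lines and curve 1 with points, since the curve-specific style takes precedence over the global one.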
+ +=head2 Data formats + +By default, each value present in the incoming data represents a distinct data +point, as demonstrated in the original example above (we had 10 numbers in the +input and 10 points in the plot). If requested, the script supports more +sophisticated interpretation of input data. + +=head3 Domain selection + +If C<--domain> is passed in, the first value on each line of input is +interpreted as the I<x>-value for the rest of the data on that line. Without +C<--domain> the I<x>-value is the line number, and the first value on a line is +a plain data point like the others. The default is C<--nodomain>. Thus the original +example above produces 2 curves, with B<1,2,3,4,5> as the I<x>-values. If we run +the same command with C<--domain>: + + $ seq 5 | awk '{print 2*$1, $1*$1}' | feedgnuplot --domain + +we get only 1 curve, with B<2,4,6,8,10> as the I<x>-values. As many points as +desired can appear on a single line, but all points on a line are associated +with the I<x>-value at the start of that line. + +=head3 Curve indexing + +We index the curves in one of 3 ways: sequentially, explicitly with +C<--dataid>, or by C<--vnlog> headers. + +By default, each column represents a separate curve. The first column (after any +domain) is curve C<0>. The next one is curve C<1> and so on. This is fine unless +sparse data is to be plotted. With the C<--dataid> option, each point is +represented by 2 values: a string identifying the curve, and the value itself. +If we add C<--dataid> to the original example: + + $ seq 5 | awk '{print 2*$1, $1*$1}' | feedgnuplot --dataid --autolegend + +we get 5 different curves with one point in each. The first column, as produced +by C<awk>, is B<2,4,6,8,10>. These are interpreted as the IDs of the curves to +be plotted. + +If we're plotting C<vnlog> data (L) then we +can get the curve IDs from the vnlog header. Vnlog is a trivial data format +where lines starting with C<#> are comments and the first comment contains +column labels. 
If we have such data, C can interpret these +column labels if the C perl modules are available. + +The C<--autolegend> option adds a legend using the given IDs to +label the curves. The IDs need not be numbers; generic strings are accepted. As +many points as desired can appear on a single line. C<--domain> can be used in +conjunction with C<--dataid> or C<--vnlog>. + +=head3 Multi-value style support + +Depending on how gnuplot is plotting the data, more than one value may be needed +to represent the range of a single point. Basic 2D plots have 2 numbers +representing each point: 1 domain and 1 range. But if plotting with +C<--circles>, for instance, then there's an extra range value: the radius. Many +other gnuplot styles require more data: errorbars, variable colors (C), variable sizes (C), labels and so on. +The feedgnuplot tool itself does not know about all these intricacies, but they +can still be used, by specifying the specific style with C<--style>, and +specifying how many values are needed for each point with any of +C<--rangesizeall>, C<--tuplesizeall>, C<--rangesize>, C<--tuplesize>. These +options are required I for styles not explicitly supported by feedgnuplot; +supported styles do the right thing automatically. + +Specific example: if making a 2d plot of y error bars, the exact format can be +queried by running C and invoking C. This tells us +that there's a 3-column form: C and a 4-column form: C. With 2d plots feedgnuplot will always output the 1-value domain C, so +the rangesize is 2 and 3 respectively. Thus the following are equivalent: + + $ echo '1 2 0.3 + 2 3 0.4 + 3 4 0.5' | feedgnuplot --domain --rangesizeall 2 --with 'yerrorbars' + + $ echo '1 2 0.3 + 2 3 0.4 + 3 4 0.5' | feedgnuplot --domain --tuplesizeall 3 --with 'yerrorbars' + + $ echo '1 2 1.7 2.3 + 2 3 2.6 3.4 + 3 4 3.5 4.5' | feedgnuplot --domain --rangesizeall 3 --with 'yerrorbars' + +=head3 3D data + +To plot 3D data, pass in C<--3d>. 
C<--domain> MUST be given when plotting 3D +data to avoid domain ambiguity. If 3D data is being plotted, there are by +definition 2 domain values instead of one (I as a function of I and I +instead of I as a function of I). Thus the first 2 values on each line are +interpreted as the domain instead of just 1. The rest of the processing happens +the same way as before. + +=head3 Time/date data + +If the input data domain is a time/date, this can be interpreted with +C<--timefmt>. This option takes a single argument: the format to use to parse +the data. The format is documented in 'set timefmt' in gnuplot, although the +common flags that C understands are generally supported. The backslash +sequences in the format are I supported, so if you want a tab, put in a tab +instead of \t. Whitespace in the format I supported. When this flag is +given, some other options act a little bit differently: + +=over + +=item + +C<--xlen> is an I in seconds + +=item + +C<--xmin> and C<--xmax> I use the format passed in to C<--timefmt> + +=back + +Using this option changes both the way the input is parsed I the way the +x-axis tics are labelled. Gnuplot tries to be intelligent in this labelling, but +it doesn't always do what the user wants. The labelling can be controlled with +the gnuplot C command, which takes the same type of format string as +C<--timefmt>. Example: + + $ sar 1 -1 | + awk '$1 ~ /..:..:../ && $8 ~/^[0-9\.]*$/ {print $1,$8; fflush()}' | + feedgnuplot --stream --domain + --lines --timefmt '%H:%M:%S' + --set 'format x "%H:%M:%S"' + +This plots the 'idle' CPU consumption against time. + +Note that while gnuplot supports the time/date on any axis, I +currently supports it I as the x-axis domain. This may change in the +future. + +=head2 Real-time streaming data + +To plot real-time data, pass in the C<--stream [refreshperiod]> option. Data +will then be plotted as it is received. The plot will be updated every +C seconds. 
If the period isn't specified, a 1Hz refresh rate is +used. To refresh at specific intervals indicated by the data, set the +refreshperiod to 0 or to 'trigger'. The plot will then I<only> be refreshed when +a data line 'replot' is received. This 'replot' command works in both triggered +and timed modes, but in triggered mode, it's the only way to replot. Look in +the "Special data commands" section for more information. + +To plot only the most recent data (instead of I<all> the data), C<--xlen +windowsize> can be given. This will create a constantly-updating, scrolling +view of the recent past. C<windowsize> should be replaced by the desired length +of the domain window to plot, in domain units (passed-in values if C<--domain> +or line numbers otherwise). If the domain is a time/date via C<--timefmt>, then +C<windowsize> is an I<integer> in seconds. If we're plotting a histogram, then +C<--xlen> causes a histogram over a moving window to be computed. The subtlety +here is that with a histogram you don't actually I<see> the domain since only +the range is analyzed. But the domain is still there, and can be utilized with +C<--xlen>. With C<--xlen> we can plot I histograms or I +I-histograms. + +=head3 Special data commands + +If we are reading streaming data, the input stream can contain special commands +in addition to the raw data. Feedgnuplot looks for these at the start of every +input line. If a command is detected, the rest of the line is discarded. These +commands are + +=over + +=item C<replot> + +This command refreshes the plot right now, instead of waiting for the next +refresh time indicated by the timer. This command works in addition to the timed +refresh, as indicated by C<--stream [refreshperiod]>. + +=item C<clear> + +This command clears out the current data in the plot. The plotting process +continues, however, with any data following the C<clear>. + +=item C<exit> + +This command causes feedgnuplot to exit. + +=back + +=head2 Hardcopy output + +The script is able to produce hardcopy output with C<--hardcopy outputfile>. 
The +output type can be inferred from the filename, if B<.ps>, B<.eps>, B<.pdf>, +B<.svg>, B<.png> or B<.gp> is requested. If any other file type is requested, +C<--terminal> I be passed in to tell gnuplot how to make the plot. If +C<--terminal> is passed in, then the C<--hardcopy> argument only provides the +output filename. + +The B<.gp> output is special. Instead of asking gnuplot to plot to a particular +terminal, writing to a B<.gp> simply dumps a self-executable gnuplot script into +the given file. This is similar to what C<--dump> does, but writes to a file, +and makes sure that the file can be self-executing. + +=head2 Self-plotting data files + +This script can be used to enable self-plotting data files. There are several +ways of doing this: with a shebang (#!) or with inline perl data. + +=head3 Self-plotting data with a #! + +A self-plotting, executable data file C is formatted as + + $ cat data + #!/usr/bin/feedgnuplot --lines --points + 2 1 + 4 4 + 6 9 + 8 16 + 10 25 + 12 36 + 14 49 + 16 64 + 18 81 + 20 100 + 22 121 + 24 144 + 26 169 + 28 196 + 30 225 + +This is the shebang (#!) line followed by the data, formatted as before. The +data file can be plotted simply with + + $ ./data + +The caveats here are that on Linux the whole #! line is limited to 127 +characters and that the full path to feedgnuplot must be given. The 127 +character limit is a serious limitation, but this can likely be resolved with a +kernel patch. I have only tried on Linux 2.6. + +=head3 Self-plotting data with gnuplot + +Running C will create a self-executable +gnuplot script in C + +=head3 Self-plotting data with perl inline data + +Perl supports storing data and code in the same file. 
This can also be used to +create self-plotting files: + + $ cat plotdata.pl + #!/usr/bin/perl + use strict; + use warnings; + + open PLOT, "| feedgnuplot --lines --points" or die "Couldn't open plotting pipe"; + while( ) + { + my @xy = split; + print PLOT "@xy\n"; + } + __DATA__ + 2 1 + 4 4 + 6 9 + 8 16 + 10 25 + 12 36 + 14 49 + 16 64 + 18 81 + 20 100 + 22 121 + 24 144 + 26 169 + 28 196 + 30 225 + +This is especially useful if the logged data is not in a format directly +supported by feedgnuplot. Raw data can be stored after the __DATA__ directive, +with a small perl script to manipulate the data into a useable format and send +it to the plotter. + +=head1 ARGUMENTS + +=over + +=item + +--C<[no]domain> + +If enabled, the first element of each line is the domain variable. If not, the +point index is used + +=item + +--C<[no]dataid> + +If enabled, each data point is preceded by the ID of the data set that point +corresponds to. This ID is interpreted as a string, NOT as just a number. If not +enabled, the order of the point is used. + +As an example, if line 3 of the input is "0 9 1 20" then + +=over + +=item + +C<--nodomain --nodataid> would parse the 4 numbers as points in 4 different +curves at x=3 + +=item + +C<--domain --nodataid> would parse the 4 numbers as points in 3 different +curves at x=0. Here, 0 is the x-variable and 9,1,20 are the data values + +=item + +C<--nodomain --dataid> would parse the 4 numbers as points in 2 different +curves at x=3. Here 0 and 1 are the data IDs and 9 and 20 are the +data values + +=item + +C<--domain --dataid> would parse the 4 numbers as a single point at +x=0. Here 9 is the data ID and 1 is the data value. 20 is an extra +value, so it is ignored. If another value followed 20, we'd get another +point in curve ID 20 + +=back + +=item + +C<--vnlog> + +Vnlog is a trivial data format where lines starting with C<#> are comments and +the first comment contains column labels. 
Some tools for working with such data +are available from the C project: L. +With the C perl modules installed, we can read the vnlog column headers +with C. This replaces C<--dataid>, and we can do all the +normal things with these headers. For instance C will generate plot legends for each column in the vnlog, using the +vnlog column label in the legend. + +=item + +C<--[no]3d> + +Do [not] plot in 3D. This only makes sense with C<--domain>. Each domain here is +an (x,y) tuple + +=item + +--C + +Interpret the X data as a time/date, parsed with the given format + +=item + +C<--colormap> + +Show a colormapped xy plot. Requires extra data for the color. zmin/zmax can be +used to set the extents of the colors. Automatically sets the +C<--rangesize>/C<--tuplesize>. + +=item + +C<--stream [period]> + +Plot the data as it comes in, in realtime. If period is given, replot every +period seconds. If no period is given, replot at 1Hz. If the period is given as +0 or 'trigger', replot I when the incoming data dictates this. See the +L section of the man page. + +=item + +C<--[no]lines> + +Do [not] draw lines to connect consecutive points + +=item + +C<--[no]points> + +Do [not] draw points + +=item + +C<--circles> + +Plot with circles. This requires a radius be specified for each point. +Automatically sets the C<--rangesize>/C<--tuplesize>. C supported for 3d +plots. + +=item + +C<--title xxx> + +Set the title of the plot + +=item + +C<--legend curveID legend> + +Set the label for a curve plot. Use this option multiple times for multiple +curves. With C<--dataid>, curveID is the ID. Otherwise, it's the index of the +curve, starting at 0 + +=item + +C<--autolegend> + +Use the curve IDs for the legend. Titles given with C<--legend> override these + +=item + +C<--xlen xxx> + +When using C<--stream>, sets the size of the x-window to plot. Omit this or set +it to 0 to plot ALL the data. Does not make sense with 3d plots. Implies +C<--monotonic>. 
If we're plotting a histogram, then C<--xlen> causes a histogram +over a moving window to be computed. The subtlety here is that with a histogram +you don't actually I<see> the domain since only the range is analyzed. But the +domain is still there, and can be utilized with C<--xlen>. With C<--xlen> we can +plot I histograms or I I-histograms. + + +=item + +C<--xmin/xmax/x2min/x2max/ymin/ymax/y2min/y2max/zmin/zmax xxx> + +Set the range for the given axis. The x-axis bounds are ignored in a streaming +plot. The x2/y2-axis bounds do not apply in 3d plots. The z-axis bounds apply +I<only> to 3d plots or colormaps. Note that there is no C<--xrange> to set both +sides at once or C<--xinv> to flip the axis around: anything more than the +basics supported in this option is obtainable by talking to gnuplot directly, for +instance C<--set 'xrange [20:10]'> to set the given inverted bounds. + +=item + +C<--xlabel/x2label/ylabel/y2label/zlabel xxx> + +Label the given axis. The x2/y2-axis labels do not apply to 3d plots while the +z-axis label applies I<only> to 3d plots. + +=item + +C<--x2/--y2/--x1y2/--x2y1/--x2y2 xxx> + +By default data is plotted against the x1 and y1 axes (the bottom and left ones, +respectively). If we want a particular curve plotted against a different axis, +we can specify that with these options. You pass C<--AXIS ID> where C<AXIS> +defines the axis (C<x2> or C<y2> or C<x1y2> or C<x2y1> or C<x2y2>) and the C<ID> +is the curve ID. C<--x2> is a synonym for C<--x2y1> and C<--y2> is a synonym for +C<--x1y2>. The curve ID is an ordered 0-based index or a specific ID if +C<--dataid> or C<--vnlog> is given. None of these apply to 3d plots. Can be passed +multiple times for different curve IDs; multiple IDs can also be passed in as a +comma-separated list. By default the curves plotted against the various axes +are not drawn in any differentiated way: the viewer of the resulting plot has +to be told which is which via an axes label, legend, colors, etc. 
Prior to +version 1.25 of C<feedgnuplot> the curves plotted on the y2 axis were drawn with +a thicker line. This is no longer the case, but that behavior can be brought +back by passing something like + + --y2 curveid --style curveid 'linewidth 3' + +=item + +C<--histogram curveID> + +Set up this specific curve to plot a histogram. The bin width is given with +the C<--binwidth> option (assumed 1.0 if omitted). If a drawing style is not +specified for this curve (C<--curvestyle>) or all curves (C<--with>, +C<--curvestyleall>) then the default histogram style is set: filled boxes with +borders. This is what the user generally wants. This works with C<--domain> +and/or C<--stream>, but in those cases the x-value is used I<only> to cull old +data because of C<--xlen> or C<--monotonic>. I.e. the domain values are I<not> +drawn in any way. Can be passed multiple times, or passed a comma-separated +list. + +=item + +C<--xticlabels> + +If given, the x-axis tic labels are not numerical, but are read from the data. +This changes the interpretation of the input data: with C<--domain>, each line +begins with C. Without C<--domain>, each line begins with C