Update go dependencies

2018-12-05 13:27:09 -03:00 · 2018-12-05 13:27:09 -03:00 · f4a4daed84
commit f4a4daed84
parent 432f534383
1299 changed files with 71186 additions and 91183 deletions
--- a/vendor/github.com/ncabatoff/process-exporter/README.md
+++ b/vendor/github.com/ncabatoff/process-exporter/README.md
@@ -1,64 +1,111 @@
 # process-exporter
 Prometheus exporter that mines /proc to report on selected processes.

-The premise for this exporter is that sometimes you have apps that are
-impractical to instrument directly, either because you don't control the code
-or they're written in a language that isn't easy to instrument with Prometheus.
-A fair bit of information can be gleaned from /proc, especially for
-long-running programs.
+[release]: https://github.com/ncabatoff/process-exporter/releases/latest

-For most systems it won't be beneficial to create metrics for every process by
-name: there are just too many of them and most don't do enough to merit it.
-Various command-line options are provided to control how processes are grouped
-and the groups are named.  Run "process-exporter -man" to see a help page
-giving details.
+[![Release](https://img.shields.io/github/release/ncabatoff/process-exporter.svg?style=flat-square)][release]
+[![Build Status](https://travis-ci.org/ncabatoff/process-exporter.svg?branch=master)](https://travis-ci.org/ncabatoff/process-exporter)
+[![Powered By: GoReleaser](https://img.shields.io/badge/powered%20by-goreleaser-green.svg?branch=master)](https://github.com/goreleaser)

-Metrics available currently include CPU usage, bytes written and read, and
-number of processes in each group.  
+Some apps are impractical to instrument directly, either because you
+don't control the code or they're written in a language that isn't easy to
+instrument with Prometheus.  We must instead resort to mining /proc.

-Bytes read and written come from /proc/[pid]/io in recent enough kernels.
-These correspond to the fields `read_bytes` and `write_bytes` respectively.
-These IO stats come with plenty of caveats, see either the Linux kernel 
-documentation or man 5 proc.
+## Installation

-CPU usage comes from /proc/[pid]/stat fields utime (user time) and stime (system
-time.)  It has been translated into fractional seconds of CPU consumed.  Since
-it is a counter, using rate() will tell you how many fractional cores were running
-code from this process during the interval given.
+Either grab a package for your OS from the [Releases][release] page, or
+install via [docker](https://hub.docker.com/r/ncabatoff/process-exporter/).

-An example Grafana dashboard to view the metrics is available at https://grafana.net/dashboards/249
+## Running

-## Instrumentation cost
+Usage:

-process-exporter will consume CPU in proportion to the number of processes in
-the system and the rate at which new ones are created.  The most expensive
-parts - applying regexps and executing templates - are only applied once per
-process seen.  If you have mostly long-running processes process-exporter
-should be lightweight: each time a scrape occurs, parsing of /proc/$pid/stat
-and /proc/$pid/cmdline for every process being monitored and adding a few
-numbers.
+```
+  process-exporter [options] -config.path filename.yml
+```

-## Config
+or via docker:
+
+```
+  docker run -d --rm -p 9256:9256 --privileged -v /proc:/host/proc -v `pwd`:/config ncabatoff/process-exporter --procfs /host/proc -config.path /config/filename.yml
+
+```
+
+Important options (run process-exporter --help for full list):
+
+-children (default:true) makes it so that any process that otherwise
+isn't part of its own group becomes part of the first group found (if any) when
+walking the process tree upwards.  In other words, resource usage of
+subprocesses is added to their parent's usage unless the subprocess identifies
+as a different group name.
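
The upward walk described for -children can be sketched roughly as follows. This is an illustration in Python rather than the exporter's own Go code; `resolve_group`, `ppid_of` and `group_of` are hypothetical names, and the process table is assumed to have been read from /proc already.

```python
from typing import Dict, Optional


def resolve_group(pid: int,
                  ppid_of: Dict[int, int],
                  group_of: Dict[int, str]) -> Optional[str]:
    """Return the group of pid, or of its nearest ancestor that has one."""
    seen = set()          # guards against cycles in a malformed table
    current = pid
    while current not in seen:
        if current in group_of:
            return group_of[current]   # own group, or first one found upward
        seen.add(current)
        if current not in ppid_of:
            return None                # reached the top without finding a group
        current = ppid_of[current]
    return None


# Hypothetical process table: pid -> parent pid, and pid -> group name.
ppid_of = {1: 0, 100: 1, 101: 100, 102: 101, 200: 1}
group_of = {100: "bash", 102: "gvim"}

print(resolve_group(101, ppid_of, group_of))  # bash (inherited from parent 100)
print(resolve_group(102, ppid_of, group_of))  # gvim (has its own group)
print(resolve_group(200, ppid_of, group_of))  # None (no grouped ancestor)
```

A process that matches a group of its own keeps it; only otherwise-ungrouped processes are folded into an ancestor's group.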
+
+-recheck (default:false) means that on each scrape the process names are
+re-evaluated. This is disabled by default as an optimization, but since
+processes can choose to change their names, leaving it disabled may result
+in a process falling into the wrong group if we happen to see it for the
+first time before it has assumed its proper name.
+
+-procnames is intended as a quick alternative to using a config file.  Details
+in the following section.
+
+## Configuration and group naming

 To select and group the processes to monitor, either provide command-line
 arguments or use a YAML configuration file. 

-To avoid confusion with the cmdline YAML element, we'll refer to the
-null-delimited contents of `/proc/<pid>/cmdline` as the array `argv[]`.
+The recommended option is to use a config file via -config.path, but for
+convenience and backwards compatibility the -procnames/-namemapping options
+exist as an alternative.
+
+### Using a config file
+
+The general format of the -config.path YAML file is a top-level
+`process_names` section, containing a list of name matchers:
+
+```
+process_names:
+  - matcher1
+  - matcher2
+  ...
+  - matcherN
+```
+
+The default config shipped with the deb/rpm packages is:
+
+```
+process_names:
+  - name: "{{.Comm}}"
+    cmdline: 
+    - '.+'
+```
+
+A process may only belong to one group: even if multiple items would match, the
+first one listed in the file wins.
+
+(Side note: to avoid confusion with the cmdline YAML element, we'll refer to
+the command-line arguments of a process, as found in `/proc/<pid>/cmdline`,
+as the array `argv[]`.)
+
+#### Using a config file: group name

 Each item in `process_names` gives a recipe for identifying and naming
 processes.  The optional `name` tag defines a template to use to name
 matching processes; if not specified, `name` defaults to `{{.ExeBase}}`.

 Template variables available:
+- `{{.Comm}}` contains the basename of the original executable, i.e. 2nd field in `/proc/<pid>/stat`
 - `{{.ExeBase}}` contains the basename of the executable
 - `{{.ExeFull}}` contains the fully qualified path of the executable
+- `{{.Username}}` contains the username of the effective user
 - `{{.Matches}}` map contains all the matches resulting from applying cmdline regexps

+#### Using a config file: process selectors
+
 Each item in `process_names` must contain one or more selectors (`comm`, `exe`
 or `cmdline`); if more than one selector is present, they must all match.  Each
 selector is a list of strings to match against a process's `comm`, `argv[0]`,
-or in the case of `cmdline`, a regexp to apply to the command line.  
+or in the case of `cmdline`, a regexp to apply to the command line.  The cmdline
+regexp uses the [Go syntax](https://golang.org/pkg/regexp).

 For `comm` and `exe`, the list of strings is an OR, meaning any process
 matching any of the strings will be added to the item's group.  
@@ -67,10 +114,7 @@ For `cmdline`, the list of regexes is an AND, meaning they all must match.  Any
 capturing groups in a regexp must use the `?P<name>` option to assign a name to
 the capture, which is used to populate `.Matches`.
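
As a small illustration of how a named capture could end up in `.Matches`, the snippet below applies the README's own `-config.path` pattern using Python's `re` module, which accepts the same `(?P<name>...)` notation as Go's regexp package. Joining `argv[]` with single spaces before matching is an assumption made for this sketch, and the sample path is made up.

```python
import re

# Pattern from the README example; "Cfgfile" becomes the key in the map.
pattern = re.compile(r"-config.path\s+(?P<Cfgfile>\S+)")

# Hypothetical argv[] for a running process-exporter.
argv = ["/usr/local/bin/process-exporter", "-config.path", "/etc/pe.yml"]
cmdline = " ".join(argv)  # assumption: arguments joined by single spaces

match = pattern.search(cmdline)
matches = match.groupdict() if match else {}
print(matches)  # {'Cfgfile': '/etc/pe.yml'}
```

A name template such as `{{.ExeBase}}:{{.Matches.Cfgfile}}` could then refer to the captured value.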

-A process may only belong to one group: even if multiple items would match, the
-first one listed in the file wins.
-
-Other performance tips: give an exe or comm clause in addition to any cmdline
+Performance tip: give an exe or comm clause in addition to any cmdline
 clause, so you avoid executing the regexp when the executable name doesn't
 match.
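
The selector rules above (OR within `comm` and `exe`, AND across `cmdline` regexps, first matching item wins, cheap checks before regexps) can be modelled roughly like this. It is a simplified Python sketch, not the exporter's implementation: `matcher_applies` and `first_match` are hypothetical helpers, `exe` is compared to `argv[0]` by exact string equality, and `argv[]` is joined with spaces before the regexps run.

```python
import re
from typing import Dict, List, Optional, Tuple


def matcher_applies(matcher: Dict[str, List[str]],
                    comm: str,
                    argv: List[str]) -> Optional[Dict[str, str]]:
    """Return the named captures if the matcher selects the process, else None."""
    # comm and exe are cheap list lookups (OR), so check them first.
    if "comm" in matcher and comm not in matcher["comm"]:
        return None
    if "exe" in matcher and (not argv or argv[0] not in matcher["exe"]):
        return None
    # cmdline regexps are only reached if the cheap checks passed (AND).
    captures: Dict[str, str] = {}
    cmdline = " ".join(argv)
    for rx in matcher.get("cmdline", []):
        m = re.search(rx, cmdline)
        if m is None:
            return None
        captures.update(m.groupdict())
    return captures


def first_match(matchers: List[Dict[str, List[str]]],
                comm: str,
                argv: List[str]) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (index, captures) of the first matcher that applies."""
    for index, matcher in enumerate(matchers):
        captures = matcher_applies(matcher, comm, argv)
        if captures is not None:
            return index, captures
    return None
```

With a list of two items, one requiring a specific `exe` plus a `cmdline` regexp and one catch-all `cmdline: ['.+']`, a process satisfying both is assigned to whichever item comes first.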

@@ -95,8 +139,7 @@ process_names:
    exe: 
    - /usr/local/bin/process-exporter
    cmdline: 
-    - -config.path\\s+(?P<Cfgfile>\\S+)
-    
+    - -config.path\s+(?P<Cfgfile>\S+)

 ```

@@ -118,43 +161,195 @@ process_names:

 ```

-## Docker
+### Using -procnames/-namemapping instead of config.path

-A docker image can be created with
+Every name in the procnames list becomes a process group. The default name of
+a process is the value found in the second field of /proc/<pid>/stat
+("comm"), which is truncated at 15 chars.  Usually this is the same as the
+name of the executable.
+
+If -namemapping isn't provided, every process with a comm value present
+in -procnames is assigned to a group based on that name, and any other 
+processes are ignored.
+
+The -namemapping option is a comma-separated list of alternating 
+name,regexp values. It allows assigning a name to a process based on a
+combination of the process name and command line. For example, using
+
+  -namemapping "python2,([^/]+)\.py,java,-jar\s+([^/]+)\.jar"
+
+will make it so that each different python2 and java -jar invocation will be
+tracked with distinct metrics. Processes whose remapped name is absent from
+the procnames list will be ignored. On an Ubuntu Xenial machine being used as
+a workstation, here's a good way of tracking resource usage for a few
+different key user apps:
+
+  process-exporter -namemapping "upstart,(--user)" \
+    -procnames chromium-browse,bash,gvim,prometheus,process-exporter,upstart:-user
+
+Since upstart --user is the parent process of the X11 session, this will
+make all apps started by the user fall into the group named "upstart:-user",
+unless they're one of the others named explicitly with -procnames, like gvim.
+
+## Group Metrics
+
+There's no meaningful way to choose a name that will only ever refer to a
+single process, so process-exporter assumes that every metric will be attached
+to a group of processes - not a
+[process group](https://en.wikipedia.org/wiki/Process_group) in the technical
+sense, just one or more processes that meet a configuration's specification
+of what should be monitored and how to name it.
+
+All these metrics start with `namedprocess_namegroup_` and have at minimum
+the label `groupname`.
+
+### num_procs gauge
+
+Number of processes in this group.
+
+### cpu_user_seconds_total counter
+
+CPU usage based on /proc/[pid]/stat field utime(14) i.e. user time.
+A value of 1 indicates that the processes in this group have been scheduled
+in user mode for a total of 1 second on a single virtual CPU.
+
+### cpu_system_seconds_total counter
+
+CPU usage based on /proc/[pid]/stat field stime(15) i.e. system time.
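
To make the field numbers concrete, here is a rough Python sketch of turning utime(14) and stime(15) into seconds. The kernel reports both in clock ticks, so they are divided by the tick rate; on a real system that rate comes from `os.sysconf("SC_CLK_TCK")`, while the default of 100 and the sample stat line below are assumptions for illustration. `parse_stat_fields` and `cpu_seconds` are hypothetical helpers, not exporter functions.

```python
from typing import List, Tuple


def parse_stat_fields(stat_line: str) -> List[str]:
    """Split a /proc/[pid]/stat line so that fields[n - 1] is field n."""
    # comm (field 2) is wrapped in parentheses and may contain spaces or ')',
    # so split on the last ')' instead of on whitespace alone.
    head, _, tail = stat_line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return [pid, comm] + tail.split()


def cpu_seconds(stat_line: str, ticks_per_second: int = 100) -> Tuple[float, float]:
    """Return (user_seconds, system_seconds) for one process."""
    fields = parse_stat_fields(stat_line)
    utime_ticks = int(fields[14 - 1])  # field 14: utime
    stime_ticks = int(fields[15 - 1])  # field 15: stime
    return utime_ticks / ticks_per_second, stime_ticks / ticks_per_second


# Made-up sample covering fields 1 to 24 (pid ... rss).
sample = ("1234 (my proc) S 1 1234 1234 0 -1 4194304 500 0 3 0 "
          "250 50 0 0 20 0 4 0 100000 1000000 200")
print(cpu_seconds(sample))  # (2.5, 0.5)
```

The group-level counters would then be the sum of these per-process values over all processes in the group.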
+
+### read_bytes_total counter
+
+Bytes read based on /proc/[pid]/io field read_bytes.  The man page
+says 
+
+> Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer.  This is accurate for block-backed filesystems.
+
+but I would take it with a grain of salt.
+
+### write_bytes_total counter
+
+Bytes written based on /proc/[pid]/io field write_bytes.  As with
+read_bytes, somewhat dubious.  May be useful for isolating which processes
+are doing the most I/O, but probably not for measuring just how much I/O is happening.
+
+### major_page_faults_total counter
+
+Number of major page faults based on /proc/[pid]/stat field majflt(12).
+
+### minor_page_faults_total counter
+
+Number of minor page faults based on /proc/[pid]/stat field minflt(10).
+
+### context_switches_total counter
+
+Number of context switches based on /proc/[pid]/status fields voluntary_ctxt_switches
+and nonvoluntary_ctxt_switches.  The extra label `ctxswitchtype` can have two values:
+`voluntary` and `nonvoluntary`.
+
+### memory_bytes gauge
+
+Number of bytes of memory used.  The extra label `memtype` can have three values:
+
+*resident*: Field rss(24) from /proc/[pid]/stat, whose doc says:
+
+> This is just the pages which count toward text, data, or stack space.  This does not include pages which have not been demand-loaded in, or which are swapped out.
+
+*virtual*: Field vsize(23) from /proc/[pid]/stat, virtual memory size.
+
+*swapped*: Field VmSwap from /proc/[pid]/status, translated from KB to bytes.
+
+### open_filedesc gauge
+
+Number of file descriptors, based on counting how many entries are in the directory
+/proc/[pid]/fd.
+
+### worst_fd_ratio gauge
+
+Worst ratio of open filedescs to filedesc limit, amongst all the procs in the
+group. The limit is the fd soft limit based on /proc/[pid]/limits. 
+
+Normally Prometheus metrics ought to be as "basic" as possible (i.e. the raw
+values rather than a derived ratio), but we use a ratio here because nothing
+else makes sense. Suppose there are 10 procs in a given group, each with a
+soft limit of 4096, and one of them has 4000 open fds and the others all have
+40, their total fdcount is 4360 and total soft limit is 40960, so the ratio
+is 1:10, but in fact one of the procs is about to run out of fds. With
+worst_fd_ratio we're able to know this: in the above example it would be
+0.97, rather than the 0.10 you'd see if you computed sum(open_filedesc) /
+sum(limit_filedesc).
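
The arithmetic in that example can be checked with a few lines of Python; the variable names are made up for the illustration.

```python
# Ten processes with a soft limit of 4096 each: one has 4000 open fds,
# the other nine have 40.
limit = 4096
open_fds = [4000] + [40] * 9

# Ratio computed from the summed raw values hides the outlier.
aggregate_ratio = sum(open_fds) / (limit * len(open_fds))

# worst_fd_ratio takes the maximum per-process ratio instead.
worst_ratio = max(n / limit for n in open_fds)

print(f"{aggregate_ratio:.3f}")  # 0.106
print(f"{worst_ratio:.3f}")      # 0.977
```

The aggregate stays near 0.1 no matter which single process is close to its limit, which is why the per-process maximum is exported.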
+
+### oldest_start_time_seconds gauge
+
+Epoch time (seconds since 1970/1/1) at which the oldest process in the group
+started.  This is derived from field starttime(22) from /proc/[pid]/stat, added
+to boot time to make it relative to epoch.
+
+### num_threads gauge
+
+Sum of number of threads of all processes in the group.  Based on field num_threads(20)
+from /proc/[pid]/stat.
+
+### states gauge
+
+Number of threads in the group in each of various states, based on the field
+state(3) from /proc/[pid]/stat.
+
+The extra label `state` can have these values: `Running`, `Sleeping`, `Waiting`, `Zombie`, `Other`.
+
+## Group Thread Metrics
+
+All these metrics start with `namedprocess_namegroup_` and have at minimum
+the labels `groupname` and `threadname`.  `threadname` is field comm(2) from
+/proc/[pid]/stat.  Just as groupname breaks the set of processes down into
+groups, threadname breaks a given process group down into subgroups.
+
+### thread_count gauge
+
+Number of threads in this thread subgroup.
+
+### thread_cpu_seconds_total counter
+
+Same as cpu_user_seconds_total and cpu_system_seconds_total, but broken down
+per-thread subgroup.  Unlike cpu_user_seconds_total/cpu_system_seconds_total,
+the label `cpumode` is used to distinguish between `user` and `system` time.
+
+### thread_io_bytes_total counter
+
+Same as read_bytes_total and write_bytes_total, but broken down
+per-thread subgroup.  Unlike read_bytes_total/write_bytes_total, 
+the label `iomode` is used to distinguish between `read` and `write` bytes.
+
+### thread_major_page_faults_total counter
+
+Same as major_page_faults_total, but broken down per-thread subgroup.
+
+### thread_minor_page_faults_total counter
+
+Same as minor_page_faults_total, but broken down per-thread subgroup.
+
+### thread_context_switches_total counter
+
+Same as context_switches_total, but broken down per-thread subgroup.
+
+## Instrumentation cost
+
+process-exporter will consume CPU in proportion to the number of processes in
+the system and the rate at which new ones are created.  The most expensive
+parts - applying regexps and executing templates - are only applied once per
+process seen, unless the command-line option -recheck is provided.
+
+If you have mostly long-running processes process-exporter overhead should be
+minimal: each time a scrape occurs, it will parse /proc/$pid/stat and
+/proc/$pid/cmdline for every process being monitored and add a few numbers.
+
+## Dashboards
+
+An example Grafana dashboard to view the metrics is available at https://grafana.net/dashboards/249
+
+## Building
+
+Install [dep](https://github.com/golang/dep), then:

 ```
-make docker
+dep ensure
+make
 ```
-
-Then run the docker, e.g.
-
-```
-docker run --privileged --name pexporter -d -v /proc:/host/proc -p 127.0.0.1:9256:9256 process-exporter:master -procfs /host/proc -procnames chromium-browse,bash,prometheus,gvim,upstart:-user -namemapping "upstart,(-user)"
-```
-
-This will expose metrics on http://localhost:9256/metrics.  Leave off the
-`127.0.0.1:` to publish on all interfaces.  Leave off the --priviliged and
-add the --user docker run argument if you only need to monitor processes
-belonging to a single user.
-
-## History
-
-An earlier version of this exporter had options to enable auto-discovery of
-which processes were consuming resources.  This functionality has been removed.
-These options were based on a percentage of resource usage, e.g. if an
-untracked process consumed X% of CPU during a scrape, start tracking processes
-with that name.  However during any given scrape it's likely that most
-processes are idle, so we could add a process that consumes minimal resources
-but which happened to be active during the interval preceding the current
-scrape.  Over time this means that a great many processes wind up being
-scraped, which becomes unmanageable to visualize.  This could be mitigated by
-looking at resource usage over longer intervals, but ultimately I didn't feel
-this feature was important enough to invest more time in at this point.  It may
-re-appear at some point in the future, but no promises.
-
-Another lost feature: the "other" group was used to count usage by non-tracked
-procs.  This was useful to get an idea of what wasn't being monitored.  But it
-comes at a high cost: if you know what processes you care about, you're wasting
-a lot of CPU to compute the usage of everything else that you don't care about.
-The new approach is to minimize resources expended on non-tracked processes and
-to require the user to whitelist the processes to track.