16. Logfile monitoring/analysis

This option is available with samhain version 2.5.0 and higher. To compile with support for this option, use the configure option

./configure --enable-logfile-monitor

[Note] PCRE library required

This option requires the PCRE (Perl Compatible Regular Expressions) library. Many Linux distributions split library packages into a runtime package (required to run a dependent executable) and a development package (required to compile an executable). At least on the build host where samhain is compiled, the development package is required if you use this option.

This module enables samhain to monitor and analyze the logfiles of other applications. Currently (samhain 2.5.0) the following logfile formats are supported:

  • Syslog

  • Apache (access and error log)

  • Samba

  • 'pacct' BSD-style process accounting (also available on Linux)

Logfile analysis always resumes at the point where the previous run ended; the position in the file is stored persistently on disk. Logfile rotation is handled automatically as long as the rotated logfile remains in the same directory and is not compressed. (Log rotation tools can usually be configured to compress only after the second rotation, which is advisable for an unrelated reason as well: the logging application may still have the file open after rotation.)

Logfile entries can be filtered with Perl-style regular expressions (filter rules). Regular expressions must match the whole logfile record. For efficiency, regular expressions can be grouped under a common regular expression, i.e. if the group expression fails to match, no RE in the group is tried. Furthermore, (groups of) regular expressions can be grouped by host, if the logfile(s) contain host information (such as host information in centralized syslog server logfiles, or virtual host information in Apache logfiles). Note that host->group->rule is supported (just as host->rule or group->rule), while group->host->rule isn't.

Each filtering rule (regular expression) is assigned to an output queue. Currently (samhain 2.5.0) queues only differ in the assigned severity of an event, but more options (per-queue mail addresses for alerts) are under development.

Filtering rules are processed in the order given in the configuration file, i.e. the first match wins.
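Because the first match wins, a specific rule has to precede a more general rule that would also match the same entry. The fragment below is only a sketch: the queue labels q_crit and q_info are assumed to have been defined with LogmonQueue, the sshd messages are illustrative, and LogmonHidePID is assumed to be enabled so that the patterns need not account for the PID.

```
# Specific rule first: failed root logins go to q_crit
LogmonRule = q_crit:sshd: Failed password for root.*

# General rule second: every other sshd message goes to q_info
LogmonRule = q_info:sshd:.*
```

With the two rules in the opposite order, the general rule would match first and the specific rule would never be reached.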

[Note] Blacklisting vs. whitelisting, and the 'trash' output queue

Output queues are labelled. The label 'trash' is reserved and refers to the trash bin (no output, throw away log entries if the matching rule is assigned to the 'trash' queue).

If a logfile entry does not match any rule, it is reported (i.e. the default is whitelisting known-good entries). To turn this into a blacklisting policy, simply add a catch-all rule at the end and assign it to the 'trash' queue.

16.1. Event Correlation

Sometimes it is desirable to report that several events happened at about the same time, possibly in a particular order. As of version 2.6.1, samhain supports this in the following way:

16.1.1. Marking individual events to be correlated

First, the individual events to be correlated need to be marked so that they are kept, under an arbitrary user-defined label, for an arbitrary user-defined time. To do so, modify the rule that matches the event like this:

LogmonRule=KEEP(seconds,label):queue_label:(perl)regex

This matches a logfile entry against the provided regular expression, and keeps it for the specified time in seconds under the specified label. In other words, the rule is processed like any other rule, except that a record of the event is also kept in memory for the specified time. So if you do not want a separate report for the individual event, just assign it to the 'trash' queue.

16.1.2. Correlating the marked events

To correlate events labelled label_one, label_two, etc., build a regular expression that matches the labels in the temporal order you want to check for. If the temporal order is irrelevant, you may want to match (label_one.*label_two)|(label_two.*label_one). Use this expression in a rule marked as CORRELATE(description), like this:

LogmonRule=CORRELATE(description):queue_label:(perl)regex

[Note] Old records in existing logfiles

Because the 'keep' timeout is relative to the current time, correlation of old entries in logfiles (i.e. when, at startup, an existing logfile with old entries is scanned) will only work if you specify 'keep' timeouts that are long enough to cover the whole timespan from the first logfile record until now.
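As a sketch of an order-sensitive correlation, the following fragment reports only if a failed root login is followed by a successful one within five minutes. The labels (fail, ok), the description, the queue label q1 and the sshd message patterns are assumptions chosen for illustration, and LogmonHidePID is assumed to be enabled.

```
# Keep both events for 300 seconds, without reporting them individually
LogmonRule = KEEP(300,fail):trash:sshd: Failed password for root.*
LogmonRule = KEEP(300,ok):trash:sshd: Accepted password for root.*

# Report to queue q1 if 'fail' is followed by 'ok'
LogmonRule = CORRELATE(root_login_after_failure):q1:fail.*ok
```

Compared with the order-independent expression (label_one.*label_two)|(label_two.*label_one), this expression lists the labels in one order only, so a successful login followed by a failed one should not trigger a report.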

16.2. Reporting non-occurrence of an event

To check whether a given event occurs at least once within some given interval, the rule for matching an event can be modified like this:

LogmonRule=MARK(seconds,description):queue_label:(perl)regex

This matches a logfile entry against the provided regular expression, and checks whether it occurs at least once within the specified interval (seconds).

Otherwise, the rule is processed like any other rule, so if you only want a report when the event is missing, just assign it to the 'trash' queue. In that case, however, the severity for reporting the missing event must be set separately with the LogmonMarkSeverity directive, because the 'trash' queue has no severity assigned:

LogmonMarkSeverity= severity — Severity for reports on missing heartbeat messages if the messages themselves are assigned to the 'trash' queue (default: crit).
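A minimal sketch of a heartbeat check follows. The description backup_heartbeat and the message pattern are hypothetical; substitute whatever message your application is expected to log regularly.

```
# Expect the message at least once per hour (3600 seconds).
# The message itself is discarded via the 'trash' queue.
LogmonRule = MARK(3600,backup_heartbeat):trash:backup: run completed.*

# Severity of the report issued when the message is missing
LogmonMarkSeverity = crit
```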

16.3. Reporting bursts of similar, repeated events

Samhain can automatically detect and report bursts of similar, repeated events in the monitored logfiles. Here, 'similar, repeated events' refers to events that differ only in details that can be expected to differ between events of the same kind: IP addresses, FQDNs, email addresses, and numbers. The event history goes back 12 minutes, so a report is triggered if the number of similar events within the last 12 minutes exceeds a given threshold (default: 24).

This feature is off by default. In order to switch it on, you need to set a reporting queue:

LogmonBurstQueue= queue — Set the reporting queue for reporting bursts of similar log messages (default: don't report).

In addition, there are two more configurable parameters: one sets the triggering threshold (i.e. the number of messages within 12 minutes that must be exceeded to raise an alert), and the other indicates whether messages from the cron daemon should be considered as well (default: no):

LogmonBurstThreshold= number — The number of repeated messages within 12 minutes that must be exceeded to report a burst of repeated messages (default: 24).

LogmonBurstCron= boolean — Whether to report also on bursts of repeated cron messages (default: false).
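Taken together, the three directives might be used as in the following sketch. The queue label q_burst and the threshold of 50 are arbitrary choices for illustration.

```
# Queue that receives the burst reports
LogmonQueue = q_burst:10:report:crit

# Switch on burst detection by assigning the reporting queue
LogmonBurstQueue = q_burst

# Report once more than 50 similar messages occur within 12 minutes
LogmonBurstThreshold = 50

# Also consider messages from the cron daemon
LogmonBurstCron = yes
```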

16.4. Options

LogmonActive= boolean switches this module on or off (default: off).

LogmonSaveDir= /absolute/path sets the directory where checkpoint data for logfiles is stored (default: same as for database file).

LogmonClean= boolean deletes old checkpoint data that has been unmodified for 30 days or more (default: off).

LogmonInterval= seconds sets the interval for logfile checking (default: 10 seconds).

LogmonMarkSeverity= severity sets the severity for reports on missing heartbeat messages if the messages themselves are assigned to the 'trash' queue (default: crit).

LogmonBurstThreshold= number sets the number of repeated messages within 12 minutes that must be exceeded to report a burst of repeated messages (default: 24).

LogmonBurstQueue= queue sets the reporting queue for reporting bursts of similar log messages (default: don't report).

LogmonBurstCron= boolean determines whether bursts of repeated cron messages are reported as well (default: false).

LogmonDeadtime= seconds suppresses repeated reports of a correlated event within the given time (default: 60 seconds).

LogmonWatch= TYPE:path[:format] advises the module to monitor the logfile with the specified path, which is of type 'TYPE' (logfile types are uppercase). Some logfile types (e.g. Apache access logs) can be customized, and hence some format information must be provided.

[Note] Do not quote the format

Please note that it's neither required nor supported to add quotes around the format string. Likewise, quotes within the format should not be escaped. Wrong:

LogmonWatch= APACHE:/var/log/apache/access.log:"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\""

Correct:

LogmonWatch= APACHE:/var/log/apache/access.log:%h %l %u %t "%r" %>s %b "%{Referer}i"

Currently (samhain 2.6.4) the following logfile types are supported:

SYSLOG

Standard UNIX style syslog files. Matching starts at the command (i.e. after the hostname). To select certain hostnames, place the rule under a LogmonHost directive (see below). If the LogmonHidePID option is used, the RE should not account for the process PID.

APACHE

Apache (or compatible) webserver access and/or error logs. Required format information: either one of combined, common, or error (for the error log), or the Apache custom log format specification used ('%{X-Forwarded-For}i' is recognized as well). The whole log line is matched. If there are virtual hosts (%v), then the LogmonHost directive will match the virtual host.

In addition to the Apache format specifications, it is possible to insert a literal regular expression as RE{regex} (samhain 2.8.4+).

SAMBA

Samba logfile format (multiline, timestamp and origin within samba source code on first line, log message on continuation lines). The RE will match the continuation line (with the log message) only.

PACCT

BSD style process accounting (also available on Linux). This is a binary logfile. The module will build a text line like the 'lastcomm' command does, and match it against the RE.

What is pacct good for? Note that pacct records contain only the executable name, not the arguments. This may look somewhat useless for shell accounts, but is quite useful for servers: how many different commands can e.g. postfix legitimately execute? Just a handful, indeed, and certainly none of them is /bin/sh! So if pacct says that the 'postfix' user has executed a shell, then this would be rather alarming...

SHELL

A shell command. The full output on stdout will be read and matched. The PATH environment variable will be set to /sbin:/bin:/usr/sbin:/usr/bin:/usr/ucb, and the SHELL, IFS, and TZ variables will be defined. The command is executed via /bin/sh -c command.

LogmonHidePID= boolean is an option that only affects logfiles of type SYSLOG. It causes the PID to be stripped from the log line (before matching against the RE).

LogmonQueue= label:[interval]:(sum|report):severity[:alias] defines an output queue. Here, label is an arbitrary name which is used to assign rules to this queue; interval is the timespan over which messages are summarized if the queue is of type 'sum'; sum (summarize over some interval) and report (report each event separately and immediately) are the two queue types supported; and severity is the severity assigned to an event. Optionally, an alias (which must be defined in the email configuration) can be specified to direct email for this queue to a specific list of recipients.

[Note] Email

If you specify a list alias, email will still go to all defined email recipients unless filtered, e.g. with

	    SetMailFilterNot = \[Logfile\]
	  

I.e. you may want to define recipients, filter them as above, and then define list aliases to be used in an event queue. See Section 4 for more information.
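The two queue types and the optional alias can be sketched as follows. The queue labels and the alias name webadmins are hypothetical; the alias is assumed to be defined in the email configuration.

```
# Summarize matching events over one hour (3600 seconds)
LogmonQueue = q_sum:3600:sum:crit

# Report each event immediately, and direct email to the list alias 'webadmins'
LogmonQueue = q_web:10:report:alert:webadmins
```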

LogmonHost= (perl)regex causes the following rules to be applied only to entries for the matching host(s). It is ended implicitly by another LogmonHost directive, or explicitly by a LogmonEndHost directive.

LogmonEndHost explicitly ends a preceding LogmonHost directive.

LogmonGroup= group_label:(perl)regex causes the following rules to be applied only if the group regex matches (i.e. rules within the group are skipped if the group regex doesn't match). This can be used to improve the speed and efficiency of matching, e.g. by grouping regexes under a common prefix. A group is ended implicitly by another LogmonGroup directive, or explicitly by a LogmonEndGroup directive.

LogmonEndGroup explicitly ends a preceding LogmonGroup directive.
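The supported host->group->rule nesting might look like the following sketch. The host pattern, the group label g_sshd, the queue label q1 and the sshd messages are assumptions for illustration; LogmonHidePID is assumed to be enabled.

```
# Rules below apply only to entries from hosts web1, web2, ...
LogmonHost = web[0-9]+
  # Rules in this group are tried only for sshd messages
  LogmonGroup = g_sshd:sshd.*
    LogmonRule = q1:sshd: Failed password.*
    LogmonRule = trash:sshd: Connection closed.*
  LogmonEndGroup
LogmonEndHost
```

Note that the group is nested inside the host block; the reverse nesting (group->host->rule) is not supported.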

LogmonRule= queue_label:(perl)regex matches a logfile entry against the provided regular expression. If the expression matches, then captured subexpressions are replaced by '___', and the logfile entry is reported as specified for the queue referenced by queue_label. Non-captured subexpressions (i.e. subexpressions where the opening bracket is followed by '?:') are not replaced by '___', but reported literally.
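As an illustration of the difference between captured and non-captured subexpressions, consider the following sketch (the queue label q1 and the sshd message are assumptions, and LogmonHidePID is assumed to be enabled):

```
# (\S+) is captured, so the user name is replaced by '___' in the report;
# (?:\S+) is non-capturing, so the source address is reported literally.
LogmonRule = q1:sshd: Failed password for (\S+) from (?:\S+) port .*
```

Capturing the parts that vary from entry to entry keeps the reported text uniform, while non-capturing groups preserve the details you want to see in the report.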

LogmonRule=KEEP(seconds,label):queue_label:(perl)regex as above, but additionally keeps the event, under the given label, for the given number of seconds so that it can be used for event correlation.

LogmonRule=CORRELATE(description):queue_label:(perl)regex performs event correlation by matching the labels (as specified in KEEP rules) of a sequence of events against the given regular expression.

LogmonRule=MARK(seconds,description):queue_label:(perl)regex matches a logfile entry against the provided regular expression, and checks whether it occurs at least once within the specified interval (seconds).

16.5. Example configuration

	  [Logmon]
	  
	  #
	  # Switch on the module
	  #
	  LogmonActive = yes
	  
	  # Check every second
	  #
	  LogmonInterval = 1
	  
	  # Strip PIDs from syslog messages
	  #
	  LogmonHidePID = true
	  
	  # Define a queue with severity 'crit'.
	  # This is a 'report' queue, hence 'interval' (10)
	  # will be ignored.
	  #
	  LogmonQueue = q1:10:report:crit
	  
	  # Define a second queue with severity 'alert'
	  # 
	  LogmonQueue = q2:10:report:alert
	  
	  # Monitor /var/log/messages, which is a syslog file
	  #
	  LogmonWatch = SYSLOG:/var/log/messages
	  
	  # Monitor /var/log/samba/log.nmbd, which is a samba
	  # logfile
	  #
	  LogmonWatch = SAMBA:/var/log/samba/log.nmbd
	  
	  # Monitor /var/log/apache2/access.log, which is
	  # an Apache logfile in 'combined' format
	  #
	  LogmonWatch = APACHE:/var/log/apache2/access.log:combined
	  
	  # Monitor disks to check for full /dev/sda1
	  #
	  LogmonWatch = SHELL:df -h
	  
	  # Syslog messages for the pppd daemon
	  #
	  LogmonGroup = g1:pppd.*
	    #
	    # Rules in this group
	    #
	    LogmonRule     = q1:pppd:\s+primary.*
	    LogmonRule     = q1:pppd:\s+secondary.*
	    #
	  LogmonEndGroup

	  # Warn about disk /dev/sda1 nearly full (80% to 99%). Use a 
	  # non-capturing subexpression [the (?:8|9)] for the percentage full.
	  #
	  LogmonRule = q1:/dev/sda1\s+[0-9GM.]+\s+[0-9GM.]+\s+[0-9GM.]+\s+(?:8|9).%.*

	  # Messages starting with WARNING (some samba stuff)
	  #
	  LogmonGroup = g2:WARNING.*
	  LogmonRule     = q2:.*interfaces.*
	  LogmonEndGroup

	  # Report on these events if happening within 120 seconds.
	  # Set LogmonDeadtime to 120 seconds to avoid multiple reports.
	  # Use the 'trash' queue for the keep rules to avoid reports on
	  #   the individual events.
	  #
	  LogmonRule = KEEP(120,event1):trash:sshd: Accepted publickey for root.*
	  LogmonRule = KEEP(120,event2):trash:sshd: pam_unix\(sshd:session\).*
	  LogmonRule = CORRELATE(root_login):q1:(event1.*event2)|(event2.*event1)

	  LogmonDeadtime = 120

	  # Throw away all non-matching entries. This amounts
	  # to a blacklist policy (only report known bad).
	  #
	  # Usually considered bad practice!!! Use whitelisting!
	  #
	  # 'trash' is a built in queue, no definition needed.
	  #
	  LogmonRule = trash:.*