This option is available with samhain version 2.5.0 and higher. To compile with support for this option, use the configure option
./configure --enable-logfile-monitor
PCRE library required | |
---|---|
This option requires the PCRE (Perl Compatible Regular Expressions) library. Many Linux distributions split library packages into a runtime package (required to run a dependent executable) and a development package (required to compile an executable). At least on the build host where samhain is compiled, the development package is required if you use this option. |
This module enables samhain to monitor/analyze logfiles of other applications. Currently (samhain 2.5.0) the following logfile formats are supported:
- Syslog
- Apache (access and error log)
- Samba
- 'pacct' BSD-style process accounting (also available on Linux)
Logfile analysis always resumes from the point where the previous run ended; the pointer into the file is stored persistently on disk. Logfile rotation is handled automatically as long as the rotated logfile remains in the same directory and is not compressed (log rotation tools can usually be configured to compress only after the second rotation, which is advisable anyway, because the logging application may still have an open file pointer after logfile rotation).
Logfile entries can be filtered with Perl-style regular expressions (filter rules). Regular expressions must match the whole logfile record. For efficiency, regular expressions can be grouped under a common regular expression, i.e. if the group expression fails to match, no RE in the group is tried. Furthermore, (groups of) regular expressions can be grouped by host, if the logfile(s) contain host information (such as host information in centralized syslog server logfiles, or virtual host information in Apache logfiles). Note that host->group->rule is supported (just as host->rule or group->rule), while group->host->rule isn't.
Each filtering rule (regular expression) is assigned to an output queue. Currently (samhain 2.5.0) queues only differ in the assigned severity of an event, but more options (per-queue mail addresses for alerts) are under development.
Filtering rules are processed in the order given in the configuration file, i.e. the first match wins.
Blacklisting vs. whitelisting, and the 'trash' output queue | |
---|---|
Output queues are labelled. The label 'trash' is reserved and refers to the trash bin (no output, throw away log entries if the matching rule is assigned to the 'trash' queue). If a logfile entry does not match any rule, it is reported (i.e. the default is whitelisting known-good entries). To turn this into a blacklisting policy, simply add a catch-all rule at the end and assign it to the 'trash' queue. |
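The rule-processing order described above (whole-record match, first match wins, unmatched entries are reported, a trailing catch-all sends the rest to 'trash') can be illustrated with a minimal Python sketch. This is not samhain's implementation: Python's `re` module stands in for PCRE, and the rule table `RULES` and the function `classify` are hypothetical names chosen for this example.

```python
import re

# Hypothetical rule table: (queue_label, regex) pairs in configuration-file order.
RULES = [
    ("q1", r"pppd:\s+primary.*"),
    ("trash", r"sshd: Accepted publickey for .*"),
]

def classify(record, rules):
    """Return the queue label of the first rule matching the whole record.

    Returns None if no rule matches, which in the scheme described above
    means the entry is reported.
    """
    for queue_label, pattern in rules:
        # fullmatch mirrors the requirement that a rule matches the whole record
        if re.fullmatch(pattern, record):
            return queue_label
    return None

# Appending a catch-all rule assigned to 'trash' discards everything
# that no earlier rule has claimed.
DISCARD_UNMATCHED = RULES + [("trash", r".*")]
```

With `RULES` alone, an entry such as `kernel: oops` matches nothing and would be reported; with `DISCARD_UNMATCHED` it ends up in 'trash'.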
Sometimes it is desirable to report on the fact that several events happened at a similar time, possibly in a particular order. As of version 2.6.1, samhain supports this in the following way:
First, individual events to be correlated need to be marked for keeping them, under an arbitrary user-defined label, for an arbitrary user-defined time. So the rule for matching an event has to be modified like this:
LogmonRule=KEEP(seconds,label):queue_label:(perl)regex
Matches a logfile entry against the provided regular expression, AND keeps it for the specified time in seconds, with the specified label. In other words, this rule is processed like any other rule, except that a memory of the event is also kept for the specified amount of time. So if you e.g. don't want a separate report for this individual event, just assign it to the trash queue.
To correlate events labelled label_one, label_two, etc., build a regular expression that matches the labels in the temporal order you want to check for. E.g. if the temporal order is irrelevant, you may want to match (label_one.*label_two)|(label_two.*label_one). Use this expression in a rule marked as CORRELATE(description), like this:
LogmonRule=CORRELATE(description):queue_label:(perl)regex
Old records in existing logfiles | |
---|---|
Because the 'keep' timeout is relative to the current time, correlation of old entries in logfiles (i.e. when, at startup, an existing logfile with old entries is scanned) will only work if you specify 'keep' timeouts that are long enough to cover the whole timespan from the first logfile record until now. |
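To make the interplay of 'keep' timeouts and correlation expressions concrete, here is a small Python sketch. It rests on assumptions that the text above does not confirm: that the labels of events still being kept are joined, in temporal order, into one string, and that the CORRELATE expression is searched within that string. The function name `correlate` and the tuple layout are hypothetical.

```python
import re

def correlate(kept, now, pattern):
    """Check a correlation regex against the labels of events still kept.

    kept: list of (timestamp, keep_seconds, label) tuples (hypothetical layout).
    now: current time in seconds; timeouts are relative to this value.
    """
    # Drop events whose 'keep' timeout has expired relative to the current time
    live = sorted((ts, label) for ts, keep, label in kept if now - ts <= keep)
    # Assumption: labels are matched as one string, in temporal order
    sequence = " ".join(label for _, label in live)
    return re.search(pattern, sequence) is not None

kept = [(100, 120, "event1"), (150, 120, "event2")]
either_order = r"(event1.*event2)|(event2.*event1)"
print(correlate(kept, 200, either_order))  # both events are still kept
print(correlate(kept, 230, either_order))  # event1 has expired by now
```

The second call shows why old logfile records need long timeouts: an event whose timestamp lies further back than its 'keep' time is no longer available for correlation.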
To check whether a given event occurs at least once within some given interval, the rule for matching an event can be modified like this:
LogmonRule=MARK(seconds,description):queue_label:(perl)regex
Matches a logfile entry against the provided regular expression, AND checks whether it occurs at least once within the specified interval (seconds).
Processing of this rule will be no different than other rules otherwise, so if you e.g. only want a report for this event if it is missing, just assign it to the trash queue. However, in the latter case the severity for reporting the messages must be set separately with the LogmonMarkSeverity directive, because the 'trash' queue has no severity assigned:
LogmonMarkSeverity=severity
Severity for reports on missing heartbeat messages if the messages themselves are assigned to the 'trash' queue (default: crit).
Samhain can automatically detect and report bursts of similar, repeated events in the monitored logfiles. Here, similar, repeated events are events that differ only in details that can be expected to differ for events of the same kind: IP addresses, FQDNs, email addresses, and numbers. The event history goes back 12 minutes, and thus a report is triggered if the number of similar events within the last 12 minutes exceeds a given threshold (default: 24).
This feature is off by default. In order to switch it on, you need to set a reporting queue:
LogmonBurstQueue=queue
Sets the reporting queue for reporting bursts of similar log messages (default: don't report).
In addition, there are two more configurable parameters, one to set the triggering threshold (i.e. the number of messages within 12 minutes that need to be exceeded to raise an alert), and another one to indicate whether messages from the cron daemon should be considered as well (default: no):
LogmonBurstThreshold=number
The number of repeated messages within 12 minutes that must be exceeded to report a burst of repeated messages (default: 24).
LogmonBurstCron=boolean
Whether to report also on bursts of repeated cron messages (default: false).
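The idea behind burst detection can be sketched in Python as follows. This is only an illustration of the concept, not samhain's algorithm: the normalization patterns, the class `BurstDetector`, and the function `normalize` are assumptions made for this example, and FQDN handling is left out for brevity. Only the 12-minute window and the default threshold of 24 are taken from the text above.

```python
import re
from collections import defaultdict, deque

WINDOW = 12 * 60  # event history of 12 minutes, in seconds

# Hypothetical normalization: replace details expected to vary between
# events of the same kind (order matters: emails and IPs before numbers).
_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w.-]+"), "<email>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\d+"), "<num>"),
]

def normalize(message):
    """Reduce a log message to a key shared by all similar messages."""
    for regex, token in _PATTERNS:
        message = regex.sub(token, message)
    return message

class BurstDetector:
    def __init__(self, threshold=24, window=WINDOW):
        self.threshold = threshold
        self.window = window
        self.history = defaultdict(deque)  # normalized key -> timestamps

    def add(self, timestamp, message):
        """Record an event; return True if the threshold is now exceeded."""
        times = self.history[normalize(message)]
        times.append(timestamp)
        # Forget events that have fallen out of the history window
        while timestamp - times[0] > self.window:
            times.popleft()
        return len(times) > self.threshold
```

For example, a series of `Failed password for root from 10.0.0.N port M` messages with varying addresses and ports all map to the same key, so the 25th such message within 12 minutes would exceed the default threshold.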
LogmonActive=boolean
Switches this module on or off (default: off).
LogmonSaveDir=/absolute/path
Sets the directory where checkpoint data for logfiles is stored (default: same as for database file).
LogmonClean=boolean
Deletes old checkpoint data that has been unmodified for 30 days or more (default: off).
LogmonInterval=seconds
Sets the interval for logfile checking (default: 10 seconds).
LogmonMarkSeverity=severity
Severity for reports on missing heartbeat messages if the messages themselves are assigned to the 'trash' queue (default: crit).
LogmonBurstThreshold=number
The number of repeated messages within 12 minutes that must be exceeded to report a burst of repeated messages (default: 24).
LogmonBurstQueue=queue
Sets the reporting queue for reporting bursts of similar log messages (default: don't report).
LogmonBurstCron=boolean
Whether to report also on bursts of repeated cron messages (default: false).
LogmonDeadtime=seconds
Do not report a correlated event again within the given time (default: 60 seconds).
LogmonWatch=TYPE:path[:format]
Advises the module to monitor the logfile with the specified path, which is of type 'TYPE' (logfile types are uppercase). Some logfile types (e.g. Apache access logs) can be customized, and hence some format information must be provided.
Do not quote the format | |
---|---|
Please note that it's neither required nor supported to add quotes around the format string. Likewise, quotes within the format should not be escaped. Wrong:
LogmonWatch=
Correct:
LogmonWatch=
|
Currently (samhain 2.6.4) the following logfile types are supported:
- SYSLOG
Standard UNIX style syslog files. Matching starts at the command (i.e. after the hostname). To select certain hostnames, place the rule under a LogmonHost directive (see below). If the LogmonHidePID option is used, the RE should not account for the process PID.
- APACHE
Apache (or compatible) webserver access and/or error logs. Required format information: either one of combined, common, or error (error log), or the Apache custom log format specification used (also '%{X-Forwarded-For}i' is recognized). The whole log line is matched. If there are virtual hosts (%v), then the LogmonHost directive will match the virtual host. In addition to the Apache format specifications, it is possible to insert a literal regular expression as RE{regex} (samhain 2.8.4+).
- SAMBA
Samba logfile format (multiline, timestamp and origin within samba source code on first line, log message on continuation lines). The RE will match the continuation line (with the log message) only.
- PACCT
BSD style process accounting (also available on Linux). This is a binary logfile. The module will build a text line like the 'last' command does, and match it against the RE.
What is pacct good for? Note that pacct records contain only the executable name, not the arguments. This may look somewhat useless for shell accounts, but is quite useful for servers: how many different commands can e.g. postfix legitimately execute? Just a handful, indeed, and certainly none of them is /bin/sh! So if pacct says that the 'postfix' user has executed a shell, then this would be rather alarming...
- SHELL
A shell command. The full output on stdout will be read and matched. The PATH environment variable will be set to /sbin:/bin:/usr/sbin:/usr/bin:/usr/ucb, and the SHELL, IFS, and TZ variables will be defined. The command is executed via /bin/sh -c command.
LogmonHidePID=boolean
An option that only affects logfiles of type SYSLOG. It causes the PID to be stripped from the log line (before matching against the RE).
LogmonQueue=label:[interval]:(sum|report):severity[:alias]
Defines an output queue. Here, label is an arbitrary name which is used to assign rules to this queue; interval is the timespan over which messages are summarized if the queue is of type 'sum'; sum (summarize over some interval) and report (report each event separately and immediately) are the two queue types supported; and severity is the severity assigned to an event. Optionally, an alias (which must be defined in the email configuration) can be specified to direct email for this queue to a specific list of recipients.
If you specify a list alias, email will still go to all defined email recipients unless filtered, e.g. with SetMailFilterNot = \[Logfile\]. That is, you may want to define recipients, filter them as above, and then define list aliases to be used in an event queue. See Section 4 for more information.
LogmonHost=(perl)regex
Causes the following rules to be applied only to entries for the matching host(s). It is ended implicitly by another LogmonHost directive, or explicitly by a LogmonEndHost directive.
LogmonEndHost explicitly ends a preceding LogmonHost directive.
LogmonGroup=group_label:(perl)regex
Causes the following rules to be applied only if the group regex matches (i.e. rules within the group are skipped if the group regex doesn't match). This can be used to improve speed/efficiency of matching, i.e. you can group regexes by a common prefix. A group is ended implicitly by another LogmonGroup directive, or explicitly by a LogmonEndGroup directive.
LogmonEndGroup explicitly ends a preceding LogmonGroup directive.
LogmonRule=queue_label:(perl)regex
Matches a logfile entry against the provided regular expression. If the expression matches, then captured subexpressions are replaced by '___', and the logfile entry is reported as specified for the queue referenced by queue_label. Non-captured subexpressions (i.e. subexpressions where the opening bracket is followed by '?:') are not replaced by '___', but reported literally.
LogmonRule=KEEP(seconds,label):queue_label:(perl)regex
As above, but additionally keeps the event, under the given label, for the given number of seconds so that it can be used for event correlation.
LogmonRule=CORRELATE(description):queue_label:(perl)regex
Performs event correlation by matching the labels (as specified in KEEP rules) of a sequence of events against the given regular expression.
LogmonRule=MARK(seconds,description):queue_label:(perl)regex
Matches a logfile entry against the provided regular expression, AND checks whether it occurs at least once within the specified interval (seconds).
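The difference between captured and non-captured subexpressions in a plain LogmonRule can be illustrated with a short Python sketch. It uses Python's `re` module in place of PCRE and is not taken from samhain's source; the function `mask_captures` is a hypothetical helper, and the sketch assumes that capturing groups are not nested.

```python
import re

def mask_captures(pattern, record, mask="___"):
    """Return the record with every captured subexpression replaced by mask.

    Returns None if the pattern does not match the whole record.
    Assumption: capturing groups are not nested inside each other.
    """
    match = re.fullmatch(pattern, record)
    if match is None:
        return None
    # Spans of all groups that took part in the match, rightmost first,
    # so that earlier offsets stay valid while replacing.
    spans = sorted(
        {match.span(i) for i in range(1, match.re.groups + 1)
         if match.span(i) != (-1, -1)},
        reverse=True,
    )
    for start, end in spans:
        record = record[:start] + mask + record[end:]
    return record

# Capturing groups: user name and address are masked in the report
print(mask_captures(r"sshd: Accepted publickey for (\w+) from ([0-9.]+).*",
                    "sshd: Accepted publickey for root from 10.0.0.2 port 22"))
# Non-capturing group (?:...): the text is reported literally
print(mask_captures(r"user (?:alice|bob) logged in", "user bob logged in"))
```

Using `(?:...)` is therefore the way to group alternatives in a rule while still seeing the matched text in the report.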
    [Logmon]
    #
    # Switch on the module
    #
    LogmonActive = yes

    # Check every second
    #
    LogmonInterval = 1

    # Strip PIDs from syslog messages
    #
    LogmonHidePID = true

    # Define a queue with severity 'crit'.
    # This is a 'report' queue, hence 'interval' (10)
    # will be ignored.
    #
    LogmonQueue = q1:10:report:crit

    # Define a second queue with severity 'alert'
    #
    LogmonQueue = q2:10:report:alert

    # Monitor /var/log/messages, which is a syslog file
    #
    LogmonWatch = SYSLOG:/var/log/messages

    # Monitor /var/log/samba/log.nmbd, which is a samba
    # logfile
    #
    LogmonWatch = SAMBA:/var/log/samba/log.nmbd

    # Monitor /var/log/apache2/access.log, which is
    # an Apache logfile in 'combined' format
    #
    LogmonWatch = APACHE:/var/log/apache2/access.log:combined

    # Monitor disks to check for full /dev/sda1
    #
    LogmonWatch = SHELL:df -h

    # Syslog messages for the pppd daemon
    #
    LogmonGroup = g1:pppd.*
    #
    # Rules in this group
    #
    LogmonRule = q1:pppd:\s+primary.*
    LogmonRule = q1:pppd:\s+secondary.*
    #
    LogmonEndGroup

    # Warn about disk /dev/sda1 nearly full (80% or more). Use a
    # non-capturing subexpression [the (?:8|9)] for the percentage full.
    #
    LogmonRule = q1:/dev/sda1\s+[0-9GM.]+\s+[0-9GM.]+\s+[0-9GM.]+\s+(?:8|9).%.*

    # Messages starting with WARNING (some samba stuff)
    #
    LogmonGroup = g2:WARNING.*
    LogmonRule = q2:.*interfaces.*
    LogmonEndGroup

    # Report on these events if happening within 120 seconds.
    # Set LogmonDeadtime to 120 seconds to avoid multiple reports.
    # Use the 'trash' queue for the keep rules to avoid reports on
    # the individual events.
    #
    LogmonRule = KEEP(120,event1):trash:sshd: Accepted publickey for root.*
    LogmonRule = KEEP(120,event2):trash:sshd: pam_unix\(sshd:session\).*
    LogmonRule = CORRELATE(root_login):q1:(event1.*event2)|(event2.*event1)
    LogmonDeadtime = 120

    # Throw away all non-matching entries. This amounts
    # to a blacklist policy (only report known bad).
    #
    # Usually considered bad practice!!! Use whitelisting!
    #
    # 'trash' is a built in queue, no definition needed.
    #
    LogmonRule = trash:.*