Monday, March 23, 2009

Features that fell off the radar

I have long pondered the problem that it is quite tricky to enter regexps into the syslog-ng configuration file: if you enclose the string in double quotes, the backslash character needs to be escaped.

Since backslash is used in regexps quite often, it can become cumbersome to enter regexps like:

match("[a-z\\-]+");

Note that the backslash is doubled because otherwise the syslog-ng string parser would pass the sequence to the regexp compiler as "[a-z-]+", which certainly means something different from the expression above.

I always remembered that syslog-ng also supports single quotes (aka apostrophes), but I thought they behaved exactly like double quotes. Therefore I was thinking about a 3rd string format, one that would not require escaping.

However I was reading the related code the other day, and found that apostrophes work exactly the way I planned this 3rd string syntax to behave: not to get in the way when entering regexps. In fact it behaves just like apostrophes in the UNIX shells. It does not care about escaping, it only cares about the terminating apostrophe.
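To make the difference concrete, here is a minimal Python sketch that models the two quoting styles. It is not the syslog-ng lexer: the function names are made up for illustration, and the double-quote model only assumes that a backslash is dropped and the following character is kept, ignoring any special escapes the real parser may handle.

```python
def parse_double_quoted(body: str) -> str:
    """Simplified model (assumption): a backslash escapes the next character
    and is itself dropped from the result."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # keep the escaped character only
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_single_quoted(body: str) -> str:
    """Simplified model: everything up to the closing apostrophe is literal."""
    return body


# The arguments are the characters as typed between the quotes.
print(parse_double_quoted(r"[a-z\-]+"))   # [a-z-]+
print(parse_double_quoted(r"[a-z\\-]+"))  # [a-z\-]+
print(parse_single_quoted(r"[a-z\-]+"))   # [a-z\-]+
```

In this model the single-quoted form hands the regexp to the compiler exactly as typed, which is the behaviour described above.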

I have dealt with regexp-related questions on the mailing list a lot, and most of the time the root cause of the problems was this escaping stuff. I never knew that the proper answer and behaviour was already in syslog-ng; I had just forgotten about it completely.

And now that I check the documentation for syslog-ng, it does not mention this syntax either, even though it was already present in the 1.6.x days.

So if you had trouble writing lots of regexps in your syslog-ng configuration, and I told you to properly escape your regexps, please forgive me. syslog-ng is better than I thought :)

Monday, March 16, 2009

Newborn baby

After being about two weeks late, my son was born yesterday evening at 22:45 CET. He weighs 3270 g and is 56 cm long. Both the mother and the child are fine and I'm a proud new father.

I guess this starts a new chapter in my life, hopefully for the better.

Saturday, March 14, 2009

syslog-ng OSE binary packages

I'm happy to announce that BalaBit has decided to make the binary packages for syslog-ng OSE available for free.

As you may know, BalaBit has various syslog-ng support packages, and as a part of this service it prepared binary installation packages for different platforms. Access to these packages required a support contract, or could be purchased separately for a yearly fee.

With syslog-ng 3.0, the binary packages for syslog-ng OSE will become freely accessible.

Since syslog-ng is an open source project, BalaBit planned to finish this task in the Open Source spirit: open and visible to all community members. This also means that the set of packages published with this e-mail is NOT yet release grade; it is more of a development snapshot of the current state of affairs. So please don't ruin your production systems with these packages. It is more advisable to try them in a test environment (chroot or a dedicated test machine).

With all that said, here is the link:

https://www.balabit.com/network-security/syslog-ng/opensource-logging-system/upgrades/

Please pick the release named "3.0HEAD". This contains a source snapshot (effectively git from two days ago), and a set of packages for SUSE 10, RHEL4/5, FreeBSD 6.x, Debian etch, and Linux generic.

The binary packages contain all runtime dependencies needed to run syslog-ng, so no further packages are required: each one is an all-in-one package. The rpm/deb packages are prepared the same way. They install syslog-ng in /opt/syslog-ng in order to avoid clashes with a system-supplied syslog-ng daemon.

There are two install kits for each platform:
  • one that includes database drivers (dubbed as "server")
  • one that does not include database drivers (dubbed as "client")

Currently there are no other differences between the packages, but later on there might be.

With the current infrastructure in place, I'm confident that with each syslog-ng OSE release, I can publish the source AND binary packages at the same time.

I'd really appreciate success/failure reports and also any kind of comment you may have.

I'd like to release 3.0.2 together with its binary packages, let's hope that I get enough feedback on these packages so that I can do that.

Enjoy!

Wednesday, March 11, 2009

First IETF syslog-protocol related question

I'm happy as I've received the first question about the new IETF specified syslog-protocol support. There's a need for that after all :)

Next event on the horizon

I didn't realize it is already that time of the year, but I was reminded that I'm going to give a talk on syslog-ng 3.0 at the Open Source Data Center conference in Nürnberg, Germany at the end of April. I'm going to talk about the nifty new features of syslog-ng 3.0.

It would be very nice to meet syslog-ng users there. :)

Tuesday, March 03, 2009

An introduction to db-parser()

As promised on the mailing list, here comes a short description of the new db-parser functionality of syslog-ng. For an introduction to parsers in general see my previous blog post here.

The aim for db-parser is two-fold:
  • extract interesting information from a log message
  • attach tags to a log message for later classification.
For instance here's a log sample (lines broken for readability):

Feb 24 11:55:22 bzorp sshd[4376]: Accepted password for bazsi \
from 10.50.0.247 port 42156 ssh2


This message states that a user named "bazsi" has logged into the host named "bzorp" using SSH2 from the quoted IP and port. When you read this message as a human, the event that happened is perfectly clear. However if it is not a human, but a piece of software that has to make out the meaning of the message, you need to identify the event (e.g. that a user login has happened) and the additional information associated with the event (e.g. that he used 10.50.0.247 as the client).

If I wanted to express this as name-value pairs, it would be something like this:

event="user login", protocol="ssh2", \
client="10.50.0.247:42156", method="password"

Surely this latter form is easier to analyze than the first. So the first step of all kinds of log analysis is to extract information from messages. At first glance, the easiest way to extract this information is the use of regular expressions. For example:

^\w{3} [ :0-9]{11} [._[:alnum:]-]+ sshd\[[0-9]+\]: Accepted \
(gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam) \
for [^[:space:]]+ from [^[:space:]]+ port [0-9]+( (ssh|ssh2))?

Once you match with the regular expression above (courtesy of the logcheck project), the parentheses mark the variable parts of the information, which you can reference as $1, $2 and so on.
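As a rough illustration of the capture-group approach, here is a Python sketch run against the sample message. Python's re module has no POSIX bracket classes such as [[:alnum:]], so the expression below is a simplified translation of my own rather than the logcheck rule itself, and the groups are numbered differently.

```python
import re

# The sample message from above, joined into a single line.
LINE = ("Feb 24 11:55:22 bzorp sshd[4376]: Accepted password for bazsi "
        "from 10.50.0.247 port 42156 ssh2")

# Simplified pattern (assumption): each parenthesized group captures one
# variable part of the message.
PATTERN = re.compile(
    r"^\w{3} [ :0-9]{11} [._A-Za-z0-9-]+ sshd\[[0-9]+\]: "
    r"Accepted (\S+) for (\S+) from (\S+) port ([0-9]+)(?: (ssh2?))?$"
)

match = PATTERN.match(LINE)
method, user, client, port, protocol = match.groups()
print(method, user, client, port, protocol)
# password bazsi 10.50.0.247 42156 ssh2
```

Even in this trimmed form, the meaning of each group is only recoverable by counting parentheses, which is part of the problem described next.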

The problems with regular expressions are several:
  • they are difficult to write (just look at the example above)
  • they are even more difficult to understand, once written (again, please look at the example)
  • they are slow and they scale poorly with the number of regexps that we need to match against the incoming message stream.
Projects like logcheck use regular expressions, but with the number of patterns increasing, the time needed to analyze logs skyrockets, which makes the whole thing unfeasible. Also, logcheck does not aim at extracting information from messages, it merely classifies them.

Clearly a different approach is needed. And that's what db-parser in syslog-ng is.

The db-parser() functionality of syslog-ng has the following objectives:
  • use a database to match various messages (and not filters embedded in the configuration file)
  • classify events into logcheck-like classes (cracking, violation, ignore, unknown)
  • extract variable information from messages, and place those into name-value pairs
  • be fast, scale to a high number of events/sec and high number of patterns
  • integrate well with the rest of syslog-ng
db-parser() is a generic parser that fits nicely into the parser framework inside syslog-ng. You can use it just like csv-parser():

...
parser p_db { db-parser(); };
...
log { source(src); parser(p_db); destination(d_parsed); };
...

The database used by db-parser is an XML file that is read during syslog-ng startup. Here is an example entry from the db-parser() database:

<patterndb>
<ruleset name='sshd'>
<pattern>sshd</pattern>
<rules>
<rule provider='balabit' id='1' class='system'>
<patterns>
<pattern>Accepted rsa for@QSTRING:username: @from\
@QSTRING:client_addr: @port @NUMBER:port:@ ssh2</pattern>
</patterns>
</rule>
...
</rules>
</ruleset>
</patterndb>



As you can see, the database is structured, and the first selection criterion to apply is the name of the application (i.e. the value of $PROGRAM). Then each rule matches against the message payload (i.e. the value of $MESSAGE) with the syslog header stripped off. The rule specifies the classification (e.g. 'system' in the example above) and lists one or more patterns. If any of the patterns match, the rule is considered a match.
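The two-level structure (program name first, then rules) can be sketched by reading such a file with Python's standard XML parser. This only illustrates the layout of the database, not how syslog-ng loads it. The index_rules helper is hypothetical, the "..." placeholder from the example is left out, and the two pattern lines are assumed to join with nothing in between.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<patterndb>
 <ruleset name='sshd'>
  <pattern>sshd</pattern>
  <rules>
   <rule provider='balabit' id='1' class='system'>
    <patterns>
     <pattern>Accepted rsa for@QSTRING:username: @from@QSTRING:client_addr: @port @NUMBER:port:@ ssh2</pattern>
    </patterns>
   </rule>
  </rules>
 </ruleset>
</patterndb>"""


def index_rules(xml_text):
    """Map program name -> list of (rule id, class, [message patterns])."""
    root = ET.fromstring(xml_text)
    index = {}
    for ruleset in root.findall("ruleset"):
        # The direct <pattern> child of a ruleset names the program.
        program = ruleset.findtext("pattern")
        rules = []
        for rule in ruleset.findall("rules/rule"):
            patterns = [p.text for p in rule.findall("patterns/pattern")]
            rules.append((rule.get("id"), rule.get("class"), patterns))
        index[program] = rules
    return index


rules_by_program = index_rules(SAMPLE)
for rule_id, rule_class, patterns in rules_by_program["sshd"]:
    print(rule_id, rule_class, len(patterns))
```

A lookup would first pick the entry for the program name and only then try the message patterns listed under it.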

The variable part of the pattern is specified using special sequences, starting and ending with a '@' character. Within the enclosing '@' characters a colon-separated list of parameters is given:
  • the parser to apply (QSTRING and NUMBER in the example above)
  • the name of the value to be extracted from this position
  • additional arguments to be passed to the parser
The available parsers are currently not really documented, but here is a list of them (you can find these in the radix.c source file):
  • IPv4: to parse an IPv4 address
  • NUMBER: to parse a number
  • STRING: to parse a word
  • ESTRING: to parse a sequence of characters ending with a specific character
  • QSTRING: to parse a string enclosed within quotes
Of course further parsers can be added to the code easily. You don't have to specify monstrous regexps to match an IPv4 address anymore. Not to mention IPv6 :)
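To show how the '@' sequences break down, here is a small Python sketch that splits a pattern into literal text and parser references. It is not the radix.c implementation. It assumes that every sequence has exactly the three colon-separated fields listed above, and that the two lines of the example pattern join with nothing in between.

```python
import re

# @PARSER:name:args@ (assumption: all three fields are always present)
TOKEN = re.compile(r"@([A-Za-z0-9]+):([^:@]*):([^@]*)@")


def split_pattern(pattern):
    """Return a list of ("literal", text) and ("parser", type, name, args)."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        if m.start() > pos:
            parts.append(("literal", pattern[pos:m.start()]))
        parts.append(("parser", m.group(1), m.group(2), m.group(3)))
        pos = m.end()
    if pos < len(pattern):
        parts.append(("literal", pattern[pos:]))
    return parts


EXAMPLE = ("Accepted rsa for@QSTRING:username: @from"
           "@QSTRING:client_addr: @port @NUMBER:port:@ ssh2")

parts = split_pattern(EXAMPLE)
for part in parts:
    print(part)
```

In the example, both QSTRING references carry a single space as their argument and NUMBER has an empty one. Presumably the space acts as the quote character, which would explain why no literal spaces surround the user name and the client address in the pattern.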

If a message matches a rule, db-parser() defines the following values for the given message:
  • .classifier.class: logcheck-like classification
  • .classifier.rule_id: the ID of the database entry that matched
  • pattern specific values: the variable parts extracted from the message by the patterns
Each of the values defined previously can be referenced inside syslog-ng using a macro, e.g. you can do things like:

# You can use them in a filter:
filter f_class {
    match("system" value(".classifier.class"));
};

# but you can also use them in the names of files:
destination d_parsed {
    file("/var/log/messages/${.classifier.class}.log");
};

That's a rough skeleton of what db-parser() is. If you are interested, you can find the db-parser() implementation in syslog-ng OSE 3.0:

http://www.balabit.com/network-security/syslog-ng/opensource-logging-system/

You can also find some example pattern databases here:

http://www.balabit.com/downloads/files/patterndb/

We are also thinking about further ideas to enhance db-parser() and make it the foundation of an Open Source log analysis framework. Stay tuned!