Thursday, December 11, 2008

include file support implemented

I've implemented basic include file/directory functionality in syslog-ng, using the format numbered second in my previous post.

I've now pushed an experimental implementation of include files to the syslog-ng OSE 3.0 repository, in a separate branch called 'include'.

In order to test the code, first clone the syslog-ng 3.0 repository:

$ git clone git://git.balabit.hu/bazsi/syslog-ng-3.0.git

Then check out the 'include' branch:

$ git checkout --track -b include origin/include

Then compile as usual. I didn't want to integrate it right into the syslog-ng OSE 3.0 tree, as I'd like to release that first as 3.0.1.

Wednesday, December 10, 2008

include syntax

I'm about to implement configuration file includes, and although the implementation is quite straightforward, the syntax to be used deserves a thought or two.

Currently the syslog-ng configuration file consists of statements, each with the following basic format:

stmt [id] { ... };

The "id" gives a unique identifier of the statement, and the braces enclose the contents. Currently only the ID part is optional, the braces are always there.

To make the include statement consistent with that, it'd have to look something like:

include { "filename" };

Obviously I don't like this too much, as it is way different from how all other applications handle include statements. What about this:

include "filename";

That is, use the ID part as the name of the file to be included. I like this better. A third option might be the use of 'pragma' directives, currently only used to specify the file format compatibility in the case of syslog-ng 3.0:

@version: 3.0

This'd mean that include statements would look like this:

@include: filename

The problem with this last option is that pragmas are currently only processed at the beginning of the configuration file, so that code would also need to be generalized.

I think I'd go with the second option: it's not completely inconsistent, and it's still the most intuitive to use.
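For illustration, the mechanics of such an include statement can be sketched in Python. This is my own sketch, not syslog-ng's actual implementation; the `read_config` helper and the sorted directory-splicing behaviour are assumptions:

```python
import os, re

# Matches a hypothetical  include "filename";  statement (my assumption
# about the final syntax, mirroring the second option above).
INCLUDE_RE = re.compile(r'^\s*include\s+"([^"]+)"\s*;\s*$')

def read_config(path):
    """Return the configuration lines with include statements expanded.
    Including a directory splices in every file inside it, sorted by name."""
    lines = []
    with open(path) as f:
        for line in f:
            m = INCLUDE_RE.match(line)
            if m is None:
                lines.append(line)
            elif os.path.isdir(m.group(1)):
                for name in sorted(os.listdir(m.group(1))):
                    lines.extend(read_config(os.path.join(m.group(1), name)))
            else:
                lines.extend(read_config(m.group(1)))
    return lines
```

Recursion handles nested includes for free; a real implementation would also need to guard against include loops.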

What do you think?

Sunday, November 23, 2008

syslog-ng 3.0 and SNMP traps

Last time I wrote about how syslog-ng is able to change message contents. I thought it'd be useful to give you a more practical example instead of a generic description.

It is quite common to convert SNMP traps to syslog messages. The easiest implementation is to run snmptrapd and have it create a log message based on the trap. There's a small issue though: snmptrapd uses the UNIX syslog() API, and as such it is not able to propagate the originating host of the SNMP trap to the hostname portion of the syslog message. This means that all traps are logged as messages coming from the host running snmptrapd, and the hostname information is part of the message payload.

Of course it'd be much easier to process syslog messages, if this were not the case.

A solution would be to patch snmptrapd to send complete syslog frames, but that would require changing snmptrapd source. The alternative is to use the new parse and rewrite features of syslog-ng 3.0.

First, you need to filter snmptrapd messages:

filter f_snmptrapd { program("snmptrapd"); };

Then we'd need to grab the first field of the message payload, where snmptrapd is configured to put it:

rewrite r_snmptrapd {
    subst("^([^ ]+) (.*)$", "${2}");
    set("${1}" value("HOST"));
};


Both rewrite expression kinds are demonstrated here:
  • subst() has two arguments: the first is a regexp to search for, the second is a template to be substituted if there's a match
  • set() has a single argument: a template to be used as the new value
Rewrite rules operate on the contents of the $MESSAGE value by default, which holds the message payload. This can of course be changed by specifying the value() option. The notion of a 'value' in syslog-ng 3.0 refers to a name-value pair: every message is composed of a set of name-value pairs. The names of standard values match the names of the corresponding macros, without the '$' sign.

Please note that the new value is a template, which makes it possible to use macros such as $HOST or $MESSAGE defined by syslog-ng.
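To make the two steps concrete, here is a Python sketch of what the rewrite rule does to a snmptrapd message. The sample payload and host name are invented for the example; the regexp is the one used above:

```python
import re

# Hypothetical snmptrapd payload: the first space-separated field is the
# originating host, the rest is the real message.
message = "192.168.1.10 SNMPv2-MIB::sysUpTime.0 = Timeticks: (1234)"
host = "logserver"  # snmptrapd runs here, so syslog() stamps this host

m = re.match(r"^([^ ]+) (.*)$", message)
if m:
    message = m.group(2)  # subst(...): drop the leading host field
    host = m.group(1)     # set(...): promote it to the HOST value

print(host)     # → 192.168.1.10
print(message)  # → SNMPv2-MIB::sysUpTime.0 = Timeticks: (1234)
```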

Now let's wire the complete configuration together:

filter f_snmptrapd { program("snmptrapd"); };

rewrite r_snmptrapd {
    subst("^([^ ]+) (.*)$", "${2}");
    set("${1}" value("HOST"));
};

log {
    source(s_all);
    filter(f_snmptrapd);
    rewrite(r_snmptrapd);
    destination(d_all);
    flags(final);
};

log {
    source(s_all);
    destination(d_all);
    flags(final);
};

Of course this is only an example of the power of what syslog-ng is now capable of doing. Please let me know if you can think of other uses.

The current 3.0 branch of syslog-ng has not been released yet, it is available in the git repository at git.balabit.hu, and also as nightly snapshots.

I'd be grateful for any kind of feedback you might have, please post it either as comments on this blog, or to the mailing list.

Saturday, November 08, 2008

syslog-ng statistics

For a long time I meant to give the "log statistics" feature of syslog-ng an overhaul, and finally with the advent of syslog-ng 3.0, this was done.

I'm not sure all of you know, but even earlier syslog-ng versions (2.1 and 2.0) did collect some per-source and per-destination statistics. These were reported periodically in the system log. The problem with this approach is that it didn't really scale: with a large configuration the statistics message could become kilobytes long, and parsing this information out of a file possibly several gigabytes in size is daunting.

syslog-ng 3.0 has two important changes in this area: it adds several new kinds of counters (like per-host counters), and a UNIX domain socket where you can query the current status of these counters.

As counters certainly have an overhead, you can now control how much statistics you want to gather. The new stats_level() option has three levels for now:
  1. stats_level(0) is basically the same as earlier syslog-ng versions, per-source and per-destination statistics are kept here. This is the default.
  2. stats_level(1) adds new counters without a big overhead: for example, it adds counters for TCP connections, but does not keep per-host counters
  3. stats_level(2) adds counters that can have a measurable performance impact; it adds for example per-host (as in $HOST) counters and also keeps track of the time the last message was received from a given host. These counters usually require a hash table lookup in the fast path.
Once you have the counters, you can still use the venerable "log statistics" message, by setting the stats_freq() option which defaults to 10 minutes, just like in earlier versions.

However, if you don't want to dig through the logs produced by syslog-ng, you can also use the new UNIX domain socket at /var/run/syslog-ng/syslog-ng.ctl (the path might depend on the compilation options).

If you connect to this socket using netcat (some netcat versions do support UNIX domain sockets), and you send a "STATS" command to it, you get the list of counters.

There is no proper command line client for the UNIX domain channel yet, but if you have some scripting ability, you can start gathering statistics easily, without the hassle of parsing log files, right after installing a syslog-ng 3.0 snapshot. :)
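As a starting point for such scripting, here is a Python sketch of a STATS client. The semicolon-separated, six-column response layout is an assumption on my part, so treat `parse_stats()` as illustrative only:

```python
import socket

def query_stats(path="/var/run/syslog-ng/syslog-ng.ctl"):
    """Send the STATS command over the control socket, return the raw reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    s.sendall(b"STATS\n")
    chunks = []
    while True:
        data = s.recv(4096)
        if not data:
            break
        chunks.append(data)
    s.close()
    return b"".join(chunks).decode()

def parse_stats(text):
    """Turn counter lines into (component, id, instance, state, type, value)."""
    counters = []
    for line in text.splitlines():
        fields = line.split(";")
        if len(fields) == 6:
            counters.append(tuple(fields[:5]) + (int(fields[5]),))
    return counters

# Canned response, so the parsing works without a running syslog-ng:
sample = "source;s_all;;a;processed;12345\ndestination;d_files;;a;processed;12001\n"
for counter in parse_stats(sample):
    print(counter)
```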

Thursday, October 30, 2008

syslog-ng message parsing

Earlier this month, I announced the new syslog-ng 3.0 git tree, adding a lot of new features to syslog-ng Open Source Edition. I thought it'd be useful to describe the new features in more detail, so this time I'll write about message parsing.

First of all, the message structure has been generalized a bit in syslog-ng. Earlier it encapsulated a syslog message and had little room for anything beyond that. That is, every log message that syslog-ng handled had date, host, program and message fields, but syslog-ng didn't care about the message contents.

This has changed: a LogMessage is now a set of name-value pairs, with some "built-in" pairs that correspond to the parts of a syslog message.

The aim of this change: new name-value pairs can be associated with messages through the use of parsers. It is now possible to parse non-syslog logs and use the resulting columns the same way you would use syslog fields: in the names of files, SQL tables, or columns in an SQL table.

Here is an example:


parser p_parse_apache_logs { ... };

destination d_peruser {
    file("/var/log/apache/${APACHE.USER_NAME}.log");
};

log {
    source(s_local);
    parser(p_parse_apache_logs);
    destination(d_peruser);
};


This means that you can "extract" information from the payload and use this information for naming destination files or SQL tables, basically anywhere where you can use a template.

There are currently two parsers implemented in syslog-ng:
  • a generic CSV (comma-separated values) parser, which can be parameterized to accept basically any kind of regularly formatted input (so tab/space separated data is also ok)
  • a database-based parser, which uses a log pattern database to recognize messages belonging to specific applications and extract information from them.
Since the database-based parser is quite complex, it deserves its own post; I'll skip it for now. The CSV parser has the following options:

  • template: defines the input to be used for parsing, can use macros
  • columns: list of strings, the names to be associated with the columns parsed
  • delimiters: the set of characters that delimit columns
  • quotes or quote_pairs: the quote characters to support, quote_pairs makes it possible to use different start and end quote (like enclosing fields in braces)
  • null: the null value which, if found, is substituted with an empty string
  • flags: see the documentation
The CSV parser is capable of parsing real CSV data, e.g. it knows about quoting rules. So if you have an application that logs to files using space or comma separated data, you can be almost sure that you can process it with the CSV parser.
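Python's standard csv module follows similar quoting rules, so the idea is easy to demonstrate outside syslog-ng (the log line below is an invented Apache-style sample):

```python
import csv

# A space-delimited line where one field is quoted and contains spaces.
line = '127.0.0.1 - frank "GET /index.html HTTP/1.0" 200 2326'
fields = next(csv.reader([line], delimiter=" ", quotechar='"'))

print(fields)
# The quoted request stays a single field despite the embedded spaces:
# ['127.0.0.1', '-', 'frank', 'GET /index.html HTTP/1.0', '200', '2326']
```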

Here is an example that parses Apache logs, so that each field in the message becomes a name-value pair:


parser p_apache {
    csv-parser(columns("APACHE.CLIENT_IP",
                       "APACHE.IDENT_NAME",
                       "APACHE.USER_NAME",
                       "APACHE.TIMESTAMP",
                       "APACHE.REQUEST_URL",
                       "APACHE.REQUEST_STATUS",
                       "APACHE.CONTENT_LENGTH",
                       "APACHE.REFERER",
                       "APACHE.USER_AGENT",
                       "APACHE.PROCESS_TIME",
                       "APACHE.SERVER_NAME")
               # flags:
               # escape-none,escape-backslash,escape-double-char,
               # strip-whitespace
               flags(escape-double-char,strip-whitespace)
               delimiters(" ")
               quote-pairs('""[]')
    );
};

parser p_apache_timestamp {
    csv-parser(columns("APACHE.TIMESTAMP.DAY",
                       "APACHE.TIMESTAMP.MONTH",
                       "APACHE.TIMESTAMP.YEAR",
                       "APACHE.TIMESTAMP.HOUR",
                       "APACHE.TIMESTAMP.MIN",
                       "APACHE.TIMESTAMP.SEC",
                       "APACHE.TIMESTAMP.ZONE")
               delimiters("/: ")
               flags(escape-none)
               template("${APACHE.TIMESTAMP}"));
};


The first parser splits the major fields, and the second splits the timestamp to manageable pieces. You can then bind this parser to a log path of your choosing:


log {
    source(s_apache);
    parser(p_apache);
    parser(p_apache_timestamp);
    destination(d_apache);
};


As you can see, the second parser uses a value created by the first one, via its template() option. Once this parsing is done, you can use any of the values created this way in your d_apache destination, be it in the name of the file or as a column in an SQL table.
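The timestamp-splitting step is easy to emulate: split on any of the characters in delimiters("/: "). The sample timestamp below is invented, and the SEC column name is my own label for the seconds field:

```python
import re

# A hypothetical Apache timestamp as extracted by the first parser
timestamp = "10/Oct/2008:13:55:36 +0200"

# delimiters("/: ") — split on any of '/', ':' or ' '
parts = re.split(r"[/: ]", timestamp)
names = ["DAY", "MONTH", "YEAR", "HOUR", "MIN", "SEC", "ZONE"]
pairs = dict(zip(["APACHE.TIMESTAMP." + n for n in names], parts))

print(pairs["APACHE.TIMESTAMP.MONTH"])  # → Oct
print(pairs["APACHE.TIMESTAMP.ZONE"])   # → +0200
```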

Wednesday, October 08, 2008

6th Netfilter workshop

I spent last week in Paris, where this year's Netfilter Workshop was held. I want to take this opportunity to thank Eric of INL for the organization. It was a wonderful and useful event, and I enjoyed it a lot. It is always nice to meet these guys.

Finally, we got Transparent Proxying merged; it is now queued for 2.6.28.

Wednesday, October 01, 2008

syslog-ng OSE 3.0 git tree published

I could finally publish my syslog-ng 3.0 OSE tree at git.balabit.hu. There are no nightly snapshots yet, and I still have to prepare a formal announcement for the mailing list, but for those I teased with features from the 3.0 branch, here it comes.

Off the top of my head, OSE 3.0 supports:
  • TLS encrypted channels,
  • syslog message rewrite,
  • parsing parts of the syslog message and using the parsed parts in macros,
  • PCRE and glob filters (in addition to POSIX regexps),
  • support for the new IETF syslog protocols,
  • program sources,
  • a new statistics framework that can be queried using UNIX domain sockets,
  • etc.
I just wanted to get the word out. Success/failure reports would be appreciated.

Monday, June 30, 2008

Migrate over to PCRE?

The development of the generic rewrite feature is now complete in one of my private git repositories. The new code uses PCRE, and I'm somewhat undecided how to move forward with it.

For those who might not know: PCRE ("Perl Compatible Regular Expressions") is an implementation of regular expressions that adds a lot more features and seems to perform better than its POSIX equivalent.

So the situation is as follows:
  • various filters use POSIX regexps
  • rewrite uses PCRE
This is not a very consistent combination, so I'm planning to add PCRE support to filters too. The only question is whether two independent regexp styles are needed in syslog-ng in the long run.

If I decide that one of them is enough, I'd deprecate POSIX-style regexps in filters and not implement POSIX in rewrite rules. This would yield a syslog-ng that warns when POSIX-style regular expressions are in use; in a forthcoming release I'd change the default regexp style to PCRE, and yet another release later, I'd phase out POSIX completely.

If the decision is to keep both in the long run, I'd need to implement POSIX-style regexps for rewrite rules as well. This would probably be the least intrusive option for users, but also a lot more work. It would, however, allow adding other filtering options like globbing or prefix search.
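To illustrate what PCRE adds over POSIX regexps, here are two such features shown with Python's PCRE-style re module (the sample strings are made up): named capture groups and non-greedy quantifiers, neither of which POSIX regexps offer.

```python
import re

log = "user=alice action=login result=ok"

# Named groups: (?P<name>...) is PCRE syntax, absent from POSIX regexps
m = re.search(r"user=(?P<user>\w+).*result=(?P<result>\w+)", log)
print(m.group("user"), m.group("result"))  # → alice ok

# Non-greedy matching: grab only the first quoted field, not the longest span
text = '"one" and "two"'
print(re.search(r'"(.*?)"', text).group(1))  # → one
```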

What do you think? Is the addition of modular search algorithms worth it?

Please send your opinions to the mailing list: syslog-ng@lists.balabit.hu

Sunday, June 08, 2008

Nordic Nagios Meet '08

I spent a good part of last week in Stockholm at a gathering of Nagios users and developers. I was invited to give a talk on syslog-ng and the security issues around collecting syslog data centrally, as syslog-ng is often used hand-in-hand with Nagios.

This was my first time in Sweden, and I can say the Swedes are the most hospitable people I have met so far, and Stockholm is a very nice city. Also, op5, the company organizing the event, did their best to make us - the speakers - feel very welcome.

Thanks op5.

If you have a chance to visit Stockholm, I'd recommend doing so.

Monday, April 07, 2008

First incarnation of LogStore

I've disappeared from this blog in the past month, but I've not been idle: I've implemented initial support for LogStore in the Premium Edition of syslog-ng.

LogStore is a binary log file format, semantically very similar to a plain log file. But the format allows much more:
  • on-line compression via gzip,
  • encryption via AES and X.509 certificates,
  • integrity protection via hmac-sha1.
And furthermore: it is indexed by time, which makes it quite efficient to look for a specific time range in GBs of log data. I'm quite satisfied, although there is some work left to be done; for instance, the query interface for the time-based indexing is not finished.
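The compression and integrity-protection ingredients are standard building blocks. Here is a toy Python sketch of the idea; the key and record are invented, and the real LogStore format (including the AES/X.509 encryption layer and the time index) is certainly more involved:

```python
import gzip, hmac, hashlib

key = b"secret-key"                         # hypothetical integrity key
record = b"Oct 10 13:55:36 host app[1]: hello\n"

compressed = gzip.compress(record)          # on-line compression
mac = hmac.new(key, compressed, hashlib.sha1).hexdigest()  # hmac-sha1

# On read-back, verify the MAC before trusting the data:
assert hmac.new(key, compressed, hashlib.sha1).hexdigest() == mac
print(gzip.decompress(compressed).decode(), end="")
```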

Using it is quite simple: replace the "file" destination with "logstore" and you are done. More or less the same set of options is supported: macro-based file names, template-based formatting, etc.

I'm still pondering the idea of storing the complete internal representation of the log record in serialized form, which would make it possible to perform template() based formatting off-line.

This code will be released as an experimental part of syslog-ng PE 2.1 and will be finalized in syslog-ng PE 2.2.

Thursday, February 28, 2008

libdbi patches online

I've published our set of dbi and dbi-drivers patches in a git repository to push changes upstream. The patches were updated against the latest libdbi versions.

You can find these repositories on BalaBit's git server, more precisely:
  • git://git.balabit.hu/bazsi/libdbi.git
  • git://git.balabit.hu/bazsi/libdbi-drivers.git
The "master" branch contains the direct import of the libdbi CVS tree, our fixes are in the 'upstream-fixes' branch. This setup will make it easier for me to publish patches and regularly rebase the not-yet-merged set against the latest upstream.

Among other small things, you can find a quite important patch against the Oracle driver. Without this patch Oracle 10.2 (the server!) segfaults and dumps core. So beware.

Monday, February 18, 2008

SFTP proxy

I installed Google Analytics on this blog, and it seems a number of people come here looking for "SFTP proxy", because of an old article I posted last July. Those interested primarily in my syslog-ng related articles may skip this post, as it contains completely unrelated information; others, please read on. :)

For those who don't know: SFTP is a file-system sharing protocol running on top of SSH. It is not yet an IETF standard, but more and more enterprises replace the aging FTP protocol with SFTP. The reasons are numerous:
  • FTP uses plain text passwords,
  • FTP uses multiple TCP connections for file transfer,
  • FTP has inherent problems like bounce attacks,
  • FTP does not encrypt traffic,
  • FTP only supports filesystem metadata (last modification time, etc.) via extensions
  • and others.
All-in-all, SFTP is newer, shinier and better designed. There's one problem though: SFTP runs over SSH, and SSH is encrypted. But wait, didn't I just list lack of encryption as a drawback of FTP? Right: encryption is good and bad at the same time. Good, because it prevents eavesdropping; bad, because it cannot be controlled by security devices at the network perimeter.

Sometimes it is quite useful to see what's going on in traffic crossing the network border: you can restrict the use of SFTP to a set of trustworthy clients instead of everyone, and even those clients can be audited by enabling a full transaction log.

If your enterprise allows FTP traffic, there are tools to log FTP transfers and, in extreme cases, the actual data. For SFTP this is not so simple: once you permit outgoing port 22 (used for SSH), complete file-system sharing can cross your firewall without you noticing. Scary, eh?

There are currently two solutions for this problem:
  1. Disable SSH and use FTP instead. This has the drawback that passwords travel unencrypted, and the traffic itself is easily sniffed.
  2. Use something like our Shell Control Box product. It is based on Zorp, with a complete SSH man-in-the-middle implementation: it controls the various SSH channels, limits what can get through, can log transaction data, and furthermore, at the end of the day, the transmitted data is still encrypted on untrusted networks.
SCB does not use any of the OpenSSH code; it is a complete reimplementation of the SSH protocol stack, and thanks to Zorp all of it can run transparently (even in bridge mode), working in concert with your other firewalls/security devices.

So if you need to install proper SFTP controls, be sure to check it out.

Wednesday, February 13, 2008

syslog-ng feature sheet

We were asked to publish a more detailed "syslog-ng feature sheet". Although it might go into syslog-ng specific details, we tried to be as generic as possible. And certainly everyone writing such feature sheets is biased, just as we were :)

It is available at http://www.balabit.com/network-security/syslog-ng/features/detailed/.

Friday, February 08, 2008

Redesigning syslog-ng internals

As promised earlier on the mailing list, I am designing the new message rewrite capabilities in syslog-ng.

As you probably know, syslog-ng currently supports message templates for each destination, and this template can be used to rewrite the message payload. Each template may contain literal text and macro references. Macros can either expand to parts of the original message or parts that were matched using a regexp.

Here's an example:

destination d_file { file("/var/log/messages" template("<$PRI> $HOST $MSG -- literal text $1\n")); };

The example above uses the format string specified in template() to define the log file structure. The words starting with '$' are macros and expand to well-defined parts of the original message. Numbered macros like $1 above are substituted with the last regular expression matches; all other characters are copied to the result intact.
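The expansion logic can be sketched in a few lines of Python. The macro values below are invented for the example, and real syslog-ng macro handling (braces, escaping) is more elaborate:

```python
import re

values = {"PRI": "38", "HOST": "webserver", "MSG": "session opened", "1": "match"}
template = "<$PRI> $HOST $MSG -- literal text $1\n"

def expand(template, values):
    # Replace each $NAME or $N with its value; copy everything else intact
    return re.sub(r"\$(\w+)", lambda m: values.get(m.group(1), ""), template)

print(expand(template, values))
# → <38> webserver session opened -- literal text match
```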

While this functionality is indeed useful, it is somewhat limited: you cannot use sed-like search-and-replace functions that some of the users requested.

My problem with rewriting message contents was somewhat fundamental: my original intention was to keep message pipelines independent of each other. If the message were changed while traversing one pipe, the change would propagate to the pipelines processed later.

This behaviour is sometimes desirable, sometimes downright unwanted. In the case of anonymization the changes would have to be global, e.g. all log paths would receive the anonymized messages; but if you want to store an unanonymized version of the logs for troubleshooting, you want the original message, not a stripped version.

The solution I came up with is to generalize the log pipeline concept. Currently a pipe connects one or more sources with one or more destinations with some filtering added. In the new concept everything becomes a pipe element:
  • a filter is a pipe that either drops or forwards messages
  • a destination is a pipe that sends the message to a specific destination and then forwards it to the next node
The current log statement becomes a pipeline:

source -> filter1 -> filter2 -> ... -> filterN -> destination1 -> destination2 -> ... -> destinationN

Each pipeline may fork into several pipes, e.g. it is possible to do the following:


                                                destination1 -> destination2 -> ... -> destinationN
                                               /
source -> filter1 -> filter2 -> ... -> filterN -
                                               \
                                                destination1' -> destination2' -> ... -> destinationN'



This is still nothing new, but consider this:


                                            destination1 -> destination2 -> ... -> destinationN
                                           /
source -> filter1 -> ... -> ... -> rewrite -
                                           \
                                            destination1' -> destination2' -> ... -> destinationN'


This means that the rewrite happens before the fork to the two sets of destinations, so both receive the rewritten message. However, if the user had another top-level pipeline in her configuration, it would start with the original, unchanged message.

In syslog-ng configuration file speak, this would be something like this:


log { source(s_all); rewrite(r_anonimize);
    log { filter(f_anonimized_files); destination(d_files); flags(final); };
    log { filter(f_anonimized_rest); destination(d_rest_log); };
};

log { source(s_all); destination(d_troubleshoot_logs); };


That is, you can have log statements embedded in another log statement; log statements at the same level receive the same log message, and you retain the power of filters and log pipe construction at each level.
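As a thought experiment, the embedded-log-statement semantics can be modelled in Python. The class names are mine, not syslog-ng internals; the key point is that the fork hands each branch its own copy of the message, so a rewrite in one branch cannot leak into a sibling:

```python
class Filter:
    """A pipe that either drops or forwards messages."""
    def __init__(self, predicate, next_pipe):
        self.predicate, self.next_pipe = predicate, next_pipe
    def queue(self, msg):
        if self.predicate(msg):
            self.next_pipe.queue(msg)

class Destination:
    """A pipe that consumes the message and forwards it onward."""
    def __init__(self, name, next_pipe=None):
        self.name, self.next_pipe, self.received = name, next_pipe, []
    def queue(self, msg):
        self.received.append(msg)
        if self.next_pipe:
            self.next_pipe.queue(msg)

class Fork:
    """Hands each branch its own copy, keeping branches independent."""
    def __init__(self, *branches):
        self.branches = branches
    def queue(self, msg):
        for branch in self.branches:
            branch.queue(dict(msg))

d_files, d_rest = Destination("d_files"), Destination("d_rest")
pipeline = Filter(lambda m: m["program"] == "sshd", Fork(d_files, d_rest))
pipeline.queue({"program": "sshd", "msg": "login"})
pipeline.queue({"program": "cron", "msg": "job"})
print(len(d_files.received), len(d_rest.received))  # → 1 1
```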

Not to mention that message pipelines are a natural place for parallelization: e.g. each log statement could be processed by a separate thread, which becomes necessary if the message transformations become CPU intensive.

Whew, this was a long post. Expect another post about the message parsing capability, which I have basically finished already.

Monday, January 28, 2008

syslog-ng OSE 2.1 released

I have just uploaded the first release in the syslog-ng Open Source Edition 2.1 branch to our website. It is currently only available in source format at this location:

http://www.balabit.com/downloads/files/syslog-ng/sources/2.1/src

This release synchronizes the core of syslog-ng to the latest PE version and adds the SQL destination driver.

This is an alpha release and thus might be rough around the edges, but it basically contains only code already tested in the context of the Premium Edition. The SQL functionality requires a patched libdbi package, which is available at the same link. We're going to work on integrating all our libdbi related patches into the upstream package.

If you want to know how the SQL logging works, please see the Administrator's Guide or our latest white paper Collecting syslog messages into an SQL database with syslog-ng. The latter describes the Premium Edition, but it applies to the Open Source one equally well.

Friday, January 11, 2008

syslog-ng roadmap 2.1 & 2.2

We had a meeting on the syslog-ng roadmap today where we decided some important things, and I thought I'd use this channel to tell you about it.

The Open Source Edition will see a 2.1 release incorporating all core changes currently in the Premium Edition and additionally the SQL destination driver. We are going to start development on the 2.2 PE features, but some of those will also be incorporated in the open source version:
  • support for the latest work of IETF syslog protocols
  • unique sequence numbering for messages
  • support for parsing message contents
Previously, syslog-ng followed the odd/even version numbering scheme to denote development/stable releases. I'm abandoning this numbering now: the next syslog-ng OSE release will have a 2.1 version number and will basically come out with tested code changes only.

The current feature set in PE was developed in a closed manner, and I don't want to repeat this mistake. The features that were decided to be part of the Open Source version will be developed as openly as possible: the features listed above are going to be developed and published in the open source branch with version number 2.2.x. The "alpha" and "beta" releases will be numbered 2.2alpha1, 2.2beta1 etc.; the final stable release will be called 2.2.1.

The aim is to avoid the mess of PE and OSE having a different version number.