Thursday, August 12, 2010

blog moved

I'm moving this blog to a WordPress instance deployed on our company web server. The URL should be unchanged, but the RSS/Atom feeds have changed, so please update your bookmarks.

Thanks and sorry for the confusion.

Sunday, August 08, 2010

LWN: syslog-ng rotten to the (Open) Core?

This was first posted as a comment under an article on lwn.net, but I thought it was important enough to post here for those who don't read LWN. Please go ahead and read the original article, which is about the "Open Core" business model and its problems from the Free Software community's point of view.

A commenter suggested that syslog-ng was such an example, one that exists only as a marketing tool for the company's commercial offering. Anyway, here's my post:

First of all, I want to make it clear that I'm biased on the syslog-ng case, but still wanted to express my opinion here. I'm biased as I'm the primary author of syslog-ng.

I think syslog-ng is a completely different case from the one described by Neary. The GPL version is not crippleware and it was never published for marketing purposes only; for the majority of syslog-ng's existence only the Open Source version existed. The Premium Edition is only about 3 years old, while syslog-ng started in 1998.

We never removed features from the OSE version. The Premium Edition only included _additional_ features, and a lot of those are already available in the OSE.

Some examples:
* TLS support (became available in 3.0, almost 2 years ago)
* SQL destination (became available in 2.1, 2.5 years ago)
* performance improvements (3.0)
* etc.

In the other direction, what we usually receive are bugfixes, and we used to require copyright assignment for a purely technical reason: I wanted to keep the two branches as close as possible (failing to do so is the number one reason Open Core products quickly become crippleware). _And_ since we have invested heavily in automatic testing and our customers report bugs directly to us, we fix far more bugs in the OSE version than the community does.

But anyway, I didn't think that the dual license model was so problematic at the time we made this decision 3 years ago. Our efforts have never been "Rotten to the Open Core". If you don't believe that, check out the git repository or read the mailing list archive and see for yourself.

And this whole mess is in the past: OSE 3.2 has been relicensed. It is true that we're going to publish non-free plugins, but anyone else is welcome to join and do the same.

Saturday, August 07, 2010

syslog-ng 3.2alpha2 released

I've just uploaded syslog-ng 3.2alpha2 to the release directory. The last alpha release didn't compile on all supported platforms and the automatic test-suite was disabled, because it only worked if syslog-ng got installed first.

These obstacles have been overcome, and together with some fixes and a couple of new features, 3.2alpha2 is now available. I've also forward-ported all bugfixes from syslog-ng 3.1.2.

For those who are starting to experiment with the 3.2 branch, here's the list of new features compared to 3.1. For those who tried 3.2alpha1, the list of changes since that release is at the end of this post.

Since the documentation of syslog-ng is not yet up-to-date with the new features introduced, I've tried to also include URLs for the best known descriptions. The references may not be 100% accurate, but should give anyone interested an idea how to start experimenting.

Also, please note that although this is an alpha release, the bulk of the changes are in the configuration parser, so once your configuration has been parsed properly and syslog-ng starts up, almost unchanged code is processing it. This means that this release should be good enough to start playing with. Feedback about the syslog-ng.conf parsing errors you encounter on real-life configuration files is more than welcome.

In terms of code quality and functionality, this could be a beta release. I only expect "procedural" changes, like cleaning up the plugin names, which wouldn't be nice to do in a beta release (though not unheard of :)

New features in 3.2:
  • Plugins: the new architecture replaces the old monolithic one, and all syslog-ng functionality is loaded from external plugins when needed. It is possible to write plugins to extend syslog-ng functionality in the following areas: sources, destinations, filter expressions, parsers, rewrite ops, message formats.
  • The framework for a "syslog-ng configuration library" (aka SCL), a collection of configuration snippets installed along with syslog-ng, simplifying the authoring of syslog-ng configuration files.
  • pdbtool match is now able to read a file containing syslog messages and apply patterndb and a filter expression on the contents.
  • pdbtool test is now able to perform pattern testing automatically based on the supplied example log message.
  • Persistent state containing the current file position for file sources is now continuously updated during runtime, instead of being updated only at exit, which makes it much more reliable in case syslog-ng doesn't terminate normally.
  • Better syntax error reporting in the configuration file.
  • Support for reusable configuration snippets, similar to macros with parameters, named "blocks".
  • Added a confgen plugin that includes the output of a program into the configuration file, making it possible to generate configuration file snippets dynamically.
  • Support for BSD-style process accounting logs via the pacct() source driver defined by SCL and the underlying pacctformat plugin.
  • Support for explicit COMMITs in the SQL driver. This speeds up the SQL INSERT rate significantly if flush_lines() is non-zero.
  • It is now possible to supply a filter to rewrite expressions and only apply the rewrite rule in case the filter matches.
  • It is now possible to use multiple parser expressions in a single parser object, similar to rewrite rules.
  • Added support for using the include statement from anywhere in the configuration file, instead of only at top-level. Also introduced syslog-ng "global values" that can be defined and then substituted anywhere in the configuration file.
  • Default configuration file supplied as part of SCL.

Incompatible changes:
  • syslog-ng traditionally expected an optional hostname field even when a syslog message was received on a local transport (e.g. /dev/log). However, no UNIX version is known to include this field, and expecting it caused problems when the application creating the log message had a space in its program name field. This behaviour has been changed for the unix-stream/unix-dgram/pipe drivers if the config version is 3.2, and can be restored by using an explicit 'expect-hostname' flag for the specific source.
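
To see why the optional hostname is a problem, here is a toy Python model of the ambiguity. It is not syslog-ng's parser: the function name and the simplified input (only the header text between the timestamp and the colon) are assumptions made for illustration.

```python
def split_header(header: str, expect_hostname: bool):
    """Split the header text that sits between the timestamp and ': '.

    Toy model only: when a hostname is expected, the first space is taken
    as the boundary between the hostname and the program name.
    """
    if expect_hostname and " " in header:
        host, program = header.split(" ", 1)
        return host, program
    # No hostname expected: the whole header is the program name.
    return None, header


# A local application whose program name contains a space.
print(split_header("my app", expect_hostname=True))
print(split_header("my app", expect_hostname=False))
```

With the hostname expected, the first word of the program name is mistaken for a host; without it, the program name survives intact.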

Changes since 3.2alpha1:
  • Now compiles on all platforms and the unit/functional tests also run. (tested: AIX, HP-UX, Solaris, FreeBSD, Linux, Tru64)
  • Fixed pdbtool match --debug-pattern output for ESTRING parsers.
  • Fixed a possible memory leak in the lexer, which would accumulate over repeated SIGHUPs.
  • Fixed Solaris STREAMS device support.
  • Forward ported all bugfixes from syslog-ng OSE 3.0 & 3.1
  • Disabled the process accounting module by default, as it doesn't compile on non-Linux platforms.
  • Added "pdbtool match --file" option to read and parse an existing logfile.
  • Added "pdbtool test" to check the log samples in the patterndb file.
  • Added "dont-create-tables" flag for the SQL destination to inhibit automatic table creation.
  • Added "condition()" support for rewrite expressions, which makes it possible to skip rewrite rules that do not match a filter expression.
  • Added "--module-path" command line option to control where modules are loaded from.

Happy logging!

Friday, August 06, 2010

syslog-ng name-value pair naming

I have been giving a lot of thought recently to the topic of naming name-value pairs in syslog-ng. Until now the only documented rule stated, somewhat vaguely, that whenever you use a parser you should choose a name that has at least one dot in it, and this dot must not be the initial character. This means that names like MSG or .SDATA.meta.sequenceId are reserved for syslog-ng, and APACHE.CLIENT_IP is reserved for users.
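
The rule can be written down as a small check. This is only a Python sketch of the rule as worded above, not code from syslog-ng; the function name is made up for illustration.

```python
def is_user_name(name: str) -> bool:
    """Return True if `name` falls in the user namespace.

    Rule as stated in the post: a user-chosen name contains at least one
    dot, and the name does not start with a dot. Everything else is
    treated as reserved for syslog-ng.
    """
    return "." in name and not name.startswith(".")


for name in ("MSG", ".SDATA.meta.sequenceId", "APACHE.CLIENT_IP"):
    owner = "user" if is_user_name(name) else "syslog-ng"
    print(f"{name}: {owner}")
```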

However things became more complex with syslog-ng OSE 3.2. Let's see what sources generate name-value pairs:
  • traditional macros (e.g. $DATE); these are not name-value pairs per se, but behave much like them, except that they are read-only
  • syslog message fields (e.g. $MSG) if the message is coming from a syslog source
  • filters whenever the 'store-matches' flag is set and the regexp contains groups
  • rewrite rules, whenever the rewrite rule specifies a thus far unknown name-value pair, e.g. set("something" value("name-value.pair"));
  • and of course parsers when you tell syslog-ng to parse an input as a CSV, or use db-parser together with the patterns produced by the patterndb project

The latest addition to generate name-value pairs is the support for process accounting logs. In this case even the syslog-related fields are missing, and only things like "pacct.ac_comm" (containing the program name) are defined.

So I was wondering whether it should be "pacct.ac_comm" or ".pacct.ac_comm". With the quoted rule it should be simple: it is generated by syslog-ng itself, thus it should be in the syslog-ng namespace and should start with a dot. However, in the era of syslog-ng plugins, what counts as syslog-ng itself?

First, I wanted to use "pacct.ac_comm" (i.e. without a leading dot), because I liked this name better. I tried to explain to myself why it would not violate the rule above. The explanation I came up with was: I'm going to "register" names such as this in the patterndb SCHEMAS.txt file. With this - not yet published - explanation, I committed a patch to convert the pacctformat plugin to use a dotless prefix.

Next, I realized that although process accounting creates name-value pairs without going through patternization, nothing ensures that these name-value pairs are directly usable when analysing the logs. The patterndb concept uses tags and schemas to convert the incoming unstructured data into a consistent structure. However, pacct may not completely match what the user needs. In the future, when SNMP traps or SQL table polling are supported, this is going to be even more true: these name-value pairs may need a conversion from the SNMP/pacct structure to the structure described by the patterndb schema, in order to handle these message sources consistently with regular syslog (and to make it easy to correlate them).

So in the end, I've committed another patch, this time going back to ".pacct" as a prefix and leaving the original naming rule intact. The "pacct" prefix is left for users: they may want the same information in a "pacct" schema, but that may come from data not directly tied to process accounting (e.g. from syslog messages).

So this post is about doing nothing with regards to the naming policy, but I thought it'd be important to shed some light on what goes on behind the scenes. Giving such decisions enough thought and coming up with a long-term plan makes our lives much easier in the future.

This post may be a bit more involved than the others, but feel free to ask me to elaborate, if you are interested.

Monday, August 02, 2010

syslog-ng & distributions

The syslog-ng 1.6.x and 2.0.x versions have been around for quite a long time. A lot of distributions used these versions and never upgraded to the newer ones.

This has changed recently: Peter Czanik has been busy helping maintainers get to the latest versions.

Already available in the latest release:
  • openSUSE
  • FreeBSD ports
  • Mandriva
  • Gentoo portage
  • OpenBSD ports

In development branches:
  • Debian
  • Ubuntu
  • Fedora

These all carry 3.1.1, which is quite recent (and a successful release too). There are some fixes accumulated in the git tree though, so I hope to get 3.1.2 out of the door soon.

Thursday, July 29, 2010

syslog-ng and process accounting

In one of my previous posts, I mentioned that syslog-ng is not just for syslog anymore: we aim to support other log formats too, preferably those that have some kind of structure.

In fact syslog-ng is trying to convert all incoming messages (be they unstructured syslog messages, process accounting messages or SNMP traps) into the same, common format. Information coming in from different sources can then be stored and processed with the same infrastructure. Correlation between SNMP traps and syslog messages or netflow records should be possible.

I probably don't need to mention that we use patterndb to extract information from syslog messages. But structured information sources contain name-value pairs in the first place, so why not use them natively?

This is what the experimental process accounting feature of syslog-ng demonstrates. With this module, syslog-ng is able to read the process accounting file produced by the Linux kernel directly (this is currently Linux-only, but should be easy to port to other platforms) and produce a set of name-value pairs mimicking the structure of the accounting record.

This is how it works:
  • the Linux kernel writes an accounting record containing process-related information (exit code, execution time, etc.) to the /var/log/account/pacct file (the path is distro-dependent) whenever a process terminates
  • syslog-ng uses the file() source driver, and periodically polls this file for changes (once per second by default)
  • instead of processing this as a plain text file, the "pacctformat" plugin tells syslog-ng to fetch binary records
  • the pacctformat plugin then transforms account record members into syslog-ng name-value pairs

Each name-value pair produced by the pacct plugin has a prefix of "pacct", and the members are described in the header file or in the acct(5) manual page.
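
The record-to-pairs step can be sketched in Python. This is not the pacctformat plugin's code: it assumes the version 3 record layout (struct acct_v3) as listed in acct(5), little-endian byte order, and a "pacct." prefix passed as a parameter; the names ACCT_V3, FIELDS and record_to_pairs are made up for illustration.

```python
import struct

# struct acct_v3 as listed in acct(5): 64 bytes, fields naturally aligned.
# Little-endian byte order is an assumption of this sketch.
ACCT_V3 = struct.Struct("<BBHIIIIIIf8H16s")
FIELDS = (
    "ac_flag", "ac_version", "ac_tty", "ac_exitcode", "ac_uid", "ac_gid",
    "ac_pid", "ac_ppid", "ac_btime", "ac_etime", "ac_utime", "ac_stime",
    "ac_mem", "ac_io", "ac_rw", "ac_minflt", "ac_majflt", "ac_swaps",
    "ac_comm",
)


def record_to_pairs(record: bytes, prefix: str = "pacct.") -> dict:
    """Turn one binary accounting record into prefixed name-value pairs."""
    pairs = {}
    for field, value in zip(FIELDS, ACCT_V3.unpack(record)):
        if field == "ac_comm":
            # Command name is a NUL-padded fixed-size character array.
            value = value.split(b"\0", 1)[0].decode("ascii", "replace")
        # comp_t members (ac_utime etc.) are left in their raw encoded form.
        pairs[prefix + field] = str(value)
    return pairs


# Build a synthetic record instead of reading a real pacct file.
sample = ACCT_V3.pack(0, 3, 0, 0, 1000, 1000, 4242, 1, 1281000000, 1.5,
                      0, 0, 0, 0, 0, 0, 0, 0, b"ls")
pairs = record_to_pairs(sample)
print(pairs["pacct.ac_comm"], pairs["pacct.ac_pid"])
```

Reading a real accounting file would mean reading it in chunks of ACCT_V3.size bytes and passing each chunk to record_to_pairs.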

In order to try this feature, you need to tell syslog-ng to compile the "pacctformat" plugin by passing the --enable-pacct command line option to configure.

Also, there's support for the pacct module in the SCL, so in order to fetch process accounting records, you only need a very small configuration file:

@version: 3.2
@include "scl.conf"

source s_pacct {
    pacct();
};

log { source(s_pacct); destination(...); };

After that, you only need to enable Linux accounting by issuing an accton command.

That's all.

Monday, July 26, 2010

patterndb status update

I thought I'd post a quick update on the patterndb project status. Our first aim was to draft a basic policy which governs how patterns should be created. This is available in the patterndb git repository as a README.txt file.

Although not completely finished, I feel the current description is enough for some basic work to start, to gather more experience. Here is the current version:

http://git.balabit.hu/?p=bazsi/syslog-ng-patterndb.git;a=blob;f=README.txt;hb=HEAD

Also, after discussing the policy, we've set a target to cover login/logout events from all parts of a generic Linux system. Currently sshd is quite nicely covered, su is coming along and I still have some submitted log samples that need marking up.

With the sshd/su patterns, quite a nice percentage of my "auth.log" file is covered, and with pdbtool's "grep on steroids" feature the marked-up patterns are already quite useful.

Further log samples and a helping hand with marking up the patterns would be appreciated.