Thursday, August 12, 2010

blog moved

I'm moving this blog to a wordpress instance deployed on our company webserver. The URL should be unchanged but the RSS/Atom feeds have changed, so please update your bookmarks.

Thanks and sorry for the confusion.

Sunday, August 08, 2010

LWN: syslog-ng rotten to the (Open) Core?

This was first posted as a comment under an article on lwn.net, but I thought it was important enough to post it here for others not reading lwn. Please go ahead and read the original article which is about the "Open Core" business model and its problems from the Free Software community point of view.

A commenter thought that syslog-ng was an example, which only exists as a marketing tool for the company's commercial offering. Anyway, here's my post:

First of all, I want to make it clear that I'm biased on the syslog-ng case, but still wanted to express my opinion here. I'm biased as I'm the primary author of syslog-ng.

I think syslog-ng is a completely different case from the one described by Neary. The GPL version is not crippleware, it was never published for marketing purposes only and for the majority of syslog-ng's existence only the Open Source stuff existed. The Premium Edition is only about 3 years old and syslog-ng started in 1998.

We never removed features from the OSE version, the Premium Edition only included _additional_ features, and a lot of those are already available in the OSE.

Some examples:
* TLS support (became available in 3.0, almost 2 years ago)
* SQL destination (became available in 2.1, 2.5 years ago)
* performance improvements (3.0)
* etc.

In the other direction, we usually receive bugfixes and it is a pure technical reason that we used to require copyright assignment: I wanted to keep the two branches as close as possible (which if not done is the reason #1 why Open Core products become crippleware fast). _And_ since we heavily invested in automatic testing and our customers report bugs directly to us, we fix way more bugs in the OSE version than the community.

But anyway, I didn't think that the dual license model was so problematic at the time we made this decision 3 years ago. Our efforts have never been "Rotten to the Open Core". If you don't believe that, check out the git repository or read the mailing list archive and see it yourself.

And this whole mess is the past, OSE 3.2 has been relicensed, and it is true that we're going to publish non-free plugins, but anyone else is welcome to join and do the same.

Saturday, August 07, 2010

syslog-ng 3.2alpha2 released

I've just uploaded syslog-ng 3.2alpha2 to the release directory. The last alpha release didn't compile on all supported platforms and the automatic test-suite was disabled, because it only worked if syslog-ng got installed first.

These obstacles have been overcome and together with some fixes and a couple of new features, 3.2alpha2 is now available. I've also forward ported all bugfixes from syslog-ng 3.1.2.

For those who are starting to experiment with the 3.2 branch, here's the list of new features compared to 3.1. Those who tried 3.2alpha1, the list of changes compared to 3.2alpha1 is at the end of this post.

Since the documentation of syslog-ng is not yet up-to-date with the new features introduced, I've tried to also include URLs for the best known descriptions. The references may not be 100% accurate, but should give anyone interested an idea how to start experimenting.

Also, please note that although this is an alpha release, the bulk of the changes are in the configuration parser, so once your configuration was parsed properly and syslog-ng starts up, an almost unchanged code is processing it. This means that this release should be good enough to start playing with. And feedback about what kind of syslog-ng.conf parsing errors you encounter on real-life configuration files is more than welcome.

Code quality & functionality wise, this could be a beta release, I only expect "procedural" changes, like cleaning up the plugin names, which wouldn't be nice to do in a beta release (though not unheard of :)

New features in 3.2:
  • Plugins: the new architecture replaces the old monolithic one, all syslog-ng functionality is loaded from external plugins when needed. It is possible to write plugins to extend syslog-ng functionality in the following areas: sources, destinations, filter expression, parsers, rewrite ops, message format.
  • The framework for a "syslog-ng configuration library" (aka SCL) a collection of configuration snippets installed along syslog-ng, simplifying the authoring of syslog-ng configuration files.
  • pdbtool match is now able to read a file containing syslog messages and apply patterndb and a filter expression on the contents.
  • pdbtool test is now able to perform pattern testing automatically based on the supplied example log message.
  • Persistent state containing the current file position for file sources is now continously updated during runtime, instead of updating it only at exit, which makes it much more reliable in case syslog-ng doesn't terminate normally.
  • Better syntax error reporting in the configuration file.
  • Support for reusable configuration snippets, similar to macros with parameters, named "blocks".
  • Added a confgen plugin that includes the output of a program into the configuration file, making it possible to generate configuration file snippets dynamically.
  • Support for BSD-style process accounting logs via the pacct() source driver defined in by SCL and the underlying pacctformat plugin.
  • Support for explicit COMMITs in the SQL driver, this speeds up SQL INSERT rate significantly if flush_lines() is non-zero.
  • It is now possible to supply a filter to rewrite expressions and only apply the rewrite rule in case the filter matches.
  • It is now possible to use multiple parser expressions in a single parser object, similar to rewrite rules.
  • Added support for using the include statement from anywhere in the configuration file, instead of only at top-level. Also introduced syslog-ng "global values" that can be defined and the substituted anywhere in the configuration file.

  • Default configuration file supplied as part of SCL.

Incompatible changes:
  • syslog-ng traditionally expected an optional hostname field even when a syslog message is received on a local transport (e.g. /dev/log). However no UNIX version is known to include this field. This caused problems when the application creating the log message has a space in its program name field. This behaviour has been changed for the unix-stream/unix-dgram/pipe drivers if the config version is 3.2 and can be restored by using an explicit 'expect-hostname' flag for the specific source.

Changes since 3.2alpha1:
  • Now compiles on all platforms and the unit/functional tests also run. (tested: AIX, HP-UX, Solaris, FreeBSD, Linux, Tru64)
  • Fixed pdbtool match --debug-pattern output for ESTRING parsers.
  • Fixed a possible memory leak in the lexer, which would accumulate in case SIGHUPs.
  • Fixed Solaris STREAMS device support.
  • Forward ported all bugfixes from syslog-ng OSE 3.0 & 3.1
  • Disable process accounting module by default as it doesn't compile on non-Linux platforms.
  • Added "pdbtool match --file" option to read and parse an existing logfile.
  • Added "pdbtool test" to check the log samples in the patterndb file.
  • Added "dont-create-tables" flag for the SQL destination to inhibit automatic table creation.
  • Added "condition()" support for rewrite expressions, which makes it possible to skip rewrite rules that do not match a filter expression.
  • Added "--module-path" command line option to control where modules are loaded from from the command line.

Happy logging!

Friday, August 06, 2010

syslog-ng name-value pair naming

I was giving a lot of thought recently to the topic of naming name-value pairs in syslog-ng. Until now the only documented rule is stating somewhat vaguely that whenever you use a parser you should choose a name that has at least one dot in it, and this dot must not be the initial character. This means that names like MSG or .SDATA.meta.sequenceId are reserved for syslog-ng, and APACHE.CLIENT_IP is reserved for users.

However things became more complex with syslog-ng OSE 3.2. Let's see what sources generate name-value pairs:
  • traditional macros (e.g. $DATE); these are not name-value pairs per-se, but behave much like them, except that they are read-only
  • syslog message fields (e.g. $MSG) if the message is coming from a syslog source
  • filters whenever the 'store-matches' flag is set and the regexp contains groups
  • rewrite rules, whenever the rewrite rule specifies a thus far unknown name-value pair, e.g. set("something" value("name-value.pair"));
  • and of course parsers when you tell syslog-ng to parse an input as a CSV, or use db-parser together with the patterns produced by the patterndb project
The latest stuff generating name-value pairs is the support for process accounting logs, in this case even the syslog related fields are missing and only things like "pacct.ac_comm" (to contain the program name) are defined.

So I was thinking whether it should be "pacct.ac_comm" or ".pacct.ac_comm". With the quoted rule it should be simple: it is generated by syslog-ng itself, thus it should be in the syslog-ng namespace and should start with a dot. However in the era of syslog-ng plugins, what consists of syslog-ng at all?

First, I wanted to use "pacct.ac_comm" (e.g. without a dot), because I liked this name better. I was trying to explain myself why it would not violate the rule above. The explanation I had for myself was: I'm going to "register" names such as this in the patterndb SCHEMAS.txt file. With this - not yet published - explanation, I've committed a patch to convert the pacctformat plugin to use a dotless prefix.

Next, I was figuring that it is true that process accounting creates name-value pairs without going through patternization, but I've felt, that nothing ensures that these name-value pairs would be directly usable, when trying to analyse the logs. The patterndb concept uses tags and schemas to convert the incoming unstructured data into a consistent structure. However, pacct may not completely match what the user needs. And, in the future, when SNMP traps or SQL table polling are going to be supported, it is going to be even more true: these name-value pairs may need a conversion: from the SNMP/pacct structure to the patterndb schema described structure in order to handle these message sources consistently with regular syslog (and to make it easy to correllate these).

So at the end, I've committed another patch, this time going back to ".pacct" as a prefix and leaving the original naming rule intact. The "pacct" prefix is up to the users to use, they may want the same information in a "pacct" schema, but that may come from data not directly tied from process accounting (e.g. from syslog messages).

So this post is about doing nothing with regards to the naming policy, but I thought it'd be important to shed a light behind the scenes. Giving such decisions enough thought and coming up a with a long-term plan makes our lives much easier in the future.

This post may be a bit more involved than the others, but feel free to ask me to elaborate, if you are interested.

Monday, August 02, 2010

syslog-ng & distributions

syslog-ng 1.6.x and 2.0.x versions had lived quite long. A lot of distributions used these versions and never upgraded to the newer ones.

This has changed recently, Peter Czanik was busy to help maintainers get to the latest versions.

Already available in the latest release:
  • openSUSE
  • FreeBSD ports
  • Mandriva
  • Gentoo portage
  • OpenBSD ports

In development branches:
  • Debian
  • Ubuntu
  • Fedora
These all carry 3.1.1, which is quite recent (and a successful release too). There are some fixes accumulated in the git tree though, so I hope to get 3.1.2 out of the door soon.

Thursday, July 29, 2010

syslog-ng and process accounting

In one of my previous posts, I've mentioned that syslog-ng is not for syslog anymore, we aim to support other log formats too, preferably those that have some kind of structure.

In fact syslog-ng is trying to convert all incoming messages (be them unstructured syslog messages, process accounting messages or SNMP traps) into the same, common format:
This information coming in from different sources can be stored and processed with the same infrastructure. Correllation between SNMP traps and syslog messages or netflow records should be possible.

I probably don't need to mention, that we use patterndb to extract information from syslog messages. But structured information sources contain name-value pairs in the first place, so why not use them natively?

This is what the experimental process accounting feature of syslog-ng demonstrates. With this module, syslog-ng is able to read the process accounting file produced by the Linux kernel directly (this is currently Linux-only, but should be easy to port to other platforms) and produce a set of name-value pairs mimicing the structure of the accounting record.

This is how it works:
  • the Linux kernel writes an accounting record to /var/log/account/pacct file (distro dependant though) whenever a process terminates and writes process related information to this record (exit code, execution time, etc)
  • syslog-ng uses the file() source driver, and periodically polls this file for changes (once per second by default)
  • instead of processing this as a plain text file, the "pacctformat" plugin tells syslog-ng to fetch binary records
  • the pacctformat plugin then transforms account record members into syslog-ng name-value pairs
Each name-value pair produced by the pacct plugin has a prefix of "pacct", and the members are described in the header file or in acct(5) manual page.

In order to try this feature, you need to tell syslog-ng to compile the "pacctformat" plugin by passing the --enable-pacct command line option to configure.

Also, there's support for the pacct module in the SCL, so in order to fetch process accounting records, you only need a very small configuration file:

@version: 3.2
@include "scl.conf"

source s_pacct {
pacct();
};

log { source(s_pacct); destination(...); };

After that, you only need to enable Linux accounting by issuing an accton command.

That's all.

Monday, July 26, 2010

patterndb status update

I thought I'd post a quick update on the patterndb project status. Our first aim was to draft a basic policy which governs how patterns should be created. This is available in the patterndb git repository as a README.txt file.

Although not completely finished, I feel the current description is enough for some basic work to start, to gather more experience. Here is the current version:

http://git.balabit.hu/?p=bazsi/syslog-ng-patterndb.git;a=blob;f=README.txt;hb=HEAD

Also, after discussing the policy we've set a target to cover login/logout events from all parts of a generic Linux system. Currently sshd is quite nicely covered, su is coming along and I still have some submitted log samples that need marking up.

With the sshd/su patterns a quite nice percentage of my "auth.log" file is covered and using pdbtool "grep on steroids" feature, the marked up patterns are already quite useful.

Further log samples and a hand in helping me out to mark up the patterns would be appreciated.

Tuesday, July 20, 2010

patterndb: grep on steroids

You may have heard of my last project to collect log samples from various applications, in order to convert log data from free-form human readable strings into structured information.

The first round to collect login/logout messages from sshd is now complete.

You could ask: ok, but what is the immediate benefit? You supposedly have a lot of unprocessed log files, and syslog-ng's db-parser() has not been used to process them, thus they are stored as good-old plain text files.

I spent a couple of hours to add a "grep"-like functionality to pdbtool which makes it easy to process already existing log files, giving you immediate benefit for each and every sample added to patterndb.

For example, if you are interested in login failure events, you could say:

zcat logfile.gz | pdbtool match -p access/sshd.pdb \
--file - \
--filter 'tags("usracct") and match('REJECT' type(string) value("secevt.verdict"));' \
--template '${usracct.type},${secevt.verdict},${usracct.username}\n'


What the command above does is the following:
  • reads a compressed logfile from logfile.gz
  • tells pdbtool to use access/sshd.pdb (in the patterndb git repo) as its pattern database file
  • tells pdbtool to read its stdin as a logfile, and
  • apply the db-parser() for each log message
  • apply the syslog-ng filter specified above
  • and print matching messages using the template also specified above
As a combination, it results in a CSV file, containing login failure records found in the logfile. Also please note that as long there's a pattern in the pdb file, it doesn't really matter how that originally looked like, the fact that ssh can use 3-5 different messages for the same meaning is hidden nicely under the hood.

And imagine we'd have patterns for all common applications running on our computers: this would mean that the same command above would produce login-failure reports independently from the application/OS combination being used.

Try that with grep. :)

This pdbtool is in the OSE 3.2 tree, clone the tree from: git://git.balabit.hu/bazsi/syslog-ng-3.2.git

syslog-ng OSE 3.2 caveats

Starting with syslog-ng OSE 3.2, syslog-ng became plugin based, which has some consequences that even experienced syslog-ng users may stumble into.

The most obvious one, is that syslog-ng now produces a series of .so files loaded at runtime, instead of being a monolithic executable. If a given .so is not not or not loaded, some of the functionality may be missing. This usually manifests itself by a syntax error when parsing the configuration file.

Second: if you compile syslog-ng from source, the unit/functional test programs also want to load plugins, and they try to do that from the install directory. This means that you first have to install syslog-ng using "make install" before running the testsuite. This is not an ideal solution, but should work for now.

Plugins are loaded from $prefix/lib/syslog-ng by default, however this can be changed using the `module-path` global, which contains the list of directories where syslog-ng should look for modules. You can change this using the syntax:

@define module-path `module-path`:/usr/local/lib/syslog-ng-plugins

I think that's it for now.

Wednesday, July 14, 2010

syslog-ng contributions redefined

syslog-ng has been around for about 12 years now, but I think the biggest change in the project's life is imminent: with the upcoming release of syslog-ng OSE 3.2, syslog-ng will become an independent entity.

Until now, syslog-ng was primarily maintained & developed by BalaBit, copyrights needed to be reassigned in order to grant BalaBit special privileges. BalaBit used her privileges to create a dual-licensed fork of syslog-ng, named "syslog-ng Premium Edition". The value we offer over the Open Source Edition of syslog-ng are things that larger enterprises require:
  • support on a large number of UNIX platforms (27 as of 3.1),
  • smaller and larger feature differences (like the encrypted/digitally signed logfile feature)
  • better test coverage and release management
  • longer term support
Although perfectly legal, this business model was not welcome in various Free Software communities, and has caused friction and harm, because BalaBit has enjoyed a privilege that no others could get. We plan to fix this situation and we've worked hard in the past months to make this possible.

We're letting syslog-ng go: no signed Contributory License Agreements will be required in order to contribute to syslog-ng in the future.

This is not just a matter of policy: while BalaBit wants to be a true citizen of the Free Software world, we also need to ensure the continued revenue stream that the Premium Edition provides. The adjusted business model allows us to deliver what our customers need, and at the same time make it possible for anyone else in the community to have the same privileges as BalaBit has.

The syslog-ng changes in 3.2 that I feel are most important are detailed below.

Plugins

syslog-ng was transformed from being a monolithic executable, to a core and a set of plugins.

Plugins range from source and destination drivers, filters, parsers, rewrite objects, anything can be extended with the use of a relatively simple plugin.

And this change didn't transform the configuration file format, it remained the same as before: readable and flexible. If you don't want to look under the hood, no functionality has changed, except plugins are loaded at runtime, and as a side effect, the syntax error reports are way better.

The core is licensed under the LGPL and plugins are licensed under the GPL. This legal framework allows BalaBit to deliver value to its customers in non-free plugins and syslog-ng related services.

syslog-ng Configuration Library

syslog-ng is an infrastructure element: very flexible, but sometimes intimidating. For example, having to specify the pad_size() option for an HP-UX /dev/log device explicitly makes syslog-ng flexible, but not very user-friendly.

It is common to share working configuration snippets on the mailing list or elsewhere in the syslog-ng community. We try to concentrate this effort with the creation of the syslog-ng Configuration Library (aka SCL). This library is a set of config snippets that can be included into a syslog-ng configuration file, using proper defaults but still allowing customization.

For example: we've created a source driver named "system()" which automatically expands to the local log devices as needed by the current Operating System syslog-ng runs on, but it is also possible to create application specific configuration snippets (for example apache source?) and ship it as a ready to use config block. See this post for more information.

This makes it possible to create a configuration file that will run on each of your platforms. In fact, we did that too: syslog-ng now comes with a default configuration file.

Support for non-syslog messages

Since syslog-ng is now plugin based and even the "syslog" message format as such is a plugin, it is now quite easy to add support for non-syslog message sources. As an example, we've added a plugin to parse Linux process accounting records, which makes it trivial to collect this data as well and possibly use it as a source of information when correllating data.

Future plugins like creating a MIB aware SNMP listener, or possibly processing netflow data, but I'd like to create a generic SQL source as well.

Support for these formats doesn't mean that syslog-ng would transform them into textual messages: proper name-value structure is kept and it is possible to put them into nicely structured SQL tables. Of course if all you want to transform them to text for readability, that's possible too with the use of a template().

patterndb project

Like I announced before, we're starting a parallel project to create a set of message patterns directly usable with syslog-ng's db-parser().

Current state

The changes are happening in the syslog-ng 3.2 repository at:

http://git.balabit.hu/

We have created a source-only 3.2alpha1 release of the current state. It runs our automated test harness on Linux, but of course it is not yet recommended in production environments. The release can be downloaded from www.balabit.com.

There were significant changes in how syslog-ng is compiled, thus it is expected that we have build issues on non-Linux systems. We expect to address these issues as they are found and create a 3.2beta1 release, for a longer beta testing.

Future

Parallel to fixing up the remaining issues on syslog-ng 3.2, we're going to open new ways to improve syslog-ng in the 3.3 branch. The most important item on our product backlog is to address the scalability problems and improve performance of syslog-ng.

Summary

With the upcoming changes in syslog-ng OSE 3.2 possible contributions will be greatly expanded:
  • write a pattern for your favourite application and process log data faster + easier
  • write an SCL configuration snippet, make your application easier to integrate with syslog-ng
  • write a plugin for your favourite NoSQL database,
  • write a plugin for a transformation that syslog-ng is not capable of doing right now (what about facility / severity rewrite rules? easy as a piece of cake)
  • write a plugin for things that you do with an external script
  • or contribute to the core of syslog-ng.
And all this without having to sign paperwork.

Friday, June 25, 2010

patterndb project

By now probably most of you know about patterndb, a powerful framework in syslog-ng that lets you extract structured information from log messages and perform classification at a high speed:

http://www.balabit.com/dl/html/syslog-ng-ose-v3.1-guide-admin-en.html/concepts_pattern_databases.html

Until now, syslog-ng offered the feature, but no release-quality patterns were produced by the syslog-ng developers. Some samples based on the logcheck database were created, but otherwise every syslog-ng user had to create her samples manually, possibly repeating work performed by others.

Since this calls out to be a community project, I'm hereby starting one.

Goals

Create release-quality pattern databases that can simply be deployed to an existing syslog-ng installation. The goal of the patterns is to extract structured information from the free-form syslog messages, e.g. create name-value pairs based on the syslog message.

Since the key factor when doing something like this is the naming of fields, we're going to create our generic naming guidelines that can be applied to any application in the industry.

It is not our goal to implement correllation or any other advanced form of analysis, although we feel that with the results of this project, event correllation and analysis can be performed much easier than without it.

Related projects

I know there are other efforts in the field, why not simply join them?

CEF - is the log message format for a proprietary log analysis engine, primarily meant to be used to hold IP security device logs (firewalls, IPSs, virus gateways etc). The patterndb project aims to create patterns for a wider range of device logs and be more generic in the approach. On the other hand we feel that it might be useful to create a solution for converting db-parser output to the CEF format.

CEE - Common Event Expression project by Mitre has a focus on creating a nv pair dictionary for all kinds of devices/log messages out there. Although I might be missing something, but I didn't find the concrete results so far, apart from a nicely looking white paper. If the CEE delivers something, then patterndb would probably adapt the naming/taxonomy structure. But I guess not all devices will start logging in the new shiny format, thus the existing devices would need
their logs converted, so the patterndb work wouldn't be wasted.

Infrastructure

Our original patterndb related plans were to create an easy to use web based interface for editing patterns, but since that project is progressing slowly, I'm calling for a minimalist approach: git based version control of simple plain text files. Of course once the nice web based interface is finished, we're going to be ready to use it.

First steps

I have created a git repository at:

git://git.balabit.hu/bazsi/syslog-ng-patterndb.git

This contains the initial version of the naming policy document and a simple schema for SIEM-style and a user login-logout naming schema.

If you are interested please read the file README.txt in the git archive, or if you prefer a web browser, use this link:

http://git.balabit.hu/?p=bazsi/syslog-ng-patterndb.git;a=blob;f=README.txt;h=9bbfeaead0c21dcf6171e12e311ae8612f572bfc;hb=6061e22221a72d35238b35f82b04afd436341b5c

Licensing

I do not have a decision yet, but for sure this is going to use one of the open source licenses or Creative Commons. Let me know if you have a preference in this area.

Getting involved

Join the syslog-ng mailing list, a start discussing! If you have existing patterns, great. If you don't, it is not late to join.

http://lists.balabit.hu/mailman/listinfo/syslog-ng

The posting address of the mailing list (to subscribers only) is:

syslog-ng@lists.balabit.hu

Monday, May 03, 2010

small incompatible change for 3.1

I've just commited a small incompatible change for syslog-ng 3.1, even though theoreticaly I shouldn't have.

The change is not big, simply the 'store-legacy-msghdr' flag became default for all sources, whereas earlier you had to specify that explicitly.

In order to understand why I did that, a short description of the flag follows below.

syslog-ng processes all incoming messages into fields (things like $PROGRAM and $DATE) and then reconstructs the message based on this parsed information when it has to write the message to a file.

Before syslog-ng 3.0 a message was split into the macros: "$DATE $HOST $MSG", which expanded to the actual log message. "$MSG" above was expanded to a line like:

"program[pid]: message"

With syslog-ng 3.0 and the integrated handling of RFC5424 and RFC3164 this format was changed and an $MSGHDR macro was created for the "program[pid]: " part and I got rid of this part from $MSG. (of course if you are running syslog-ng in compatibility mode, you get the old behaviour). The reason is simple: RFC5424 has separate fields for program/pid.

The contents of $MSGHDR is constructed programmatically, e.g. the punctuation characters '[' and ']' around the pid and the colon, is added to the format by syslog-ng, based on the available information in $PROGRAM and $PID.

However (and here comes the magic) there are programs that do not adhere to this format and omit the space after the colon character. E.g. if syslog-ng received:

"program:value"

as the syslog message, it added an explicit space character, and you'd get this in your log file:

"program: value"

NOTE the added space. This resulted in the workaround called "store-legacy-msghdr", which made syslog-ng remember the original formatting of the MSGHDR macro. However this proved to be a performance issue, thus it didn't become default, and I let my users discover this problem and add the flag explicitly if they cared about the extra space.

syslog-ng 3.1 however solves the performance issue (with the NVTable refactorization), and more and more people run into the very same issue, who are migrating from 2.1 or earlier.

Therefore I've decided to make 'store-legacy-msghdr' the default, and added a 'dont-store-legacy-msghdr' flag. My hope is that
  • people who cared: they already had the store-legacy-msghdr, for them, nothing is changed
  • people who didn't notice: they don't have the flag, but should be better of with the original formatting
  • people who changed their parsing scripts: well, those are who I address this message to as a HEADS up.
I hope this post makes things clearer.

Thursday, April 15, 2010

syslog-ng 3.2 changes

I've just pushed a round of updates to the syslog-ng 3.2 repository, featuring some interesting stuff, such as:
  • SQL reorganization: Patrick Hemmer sent in a patch to implement explicit transaction support instead of the previous auto-commit mode used by syslog-ng. I threw in some fixes and refactored the code somewhat.
  • Configuration parser changes: the syntax errors produced by syslog-ng became much more user-friendly: not only the column is displayed, but also the erroneous line is printed and the error location is also highlighted.
  • Additional plugin modules were created: afsql for the SQL destination, and afstreams for Solaris STREAMS devices. Creating a new plugin from core code takes about 15 minutes. I'm quite satisfied. With the addition of these two modules, it is now possible to use syslog-ng without any kind of runtime dependency except libc.
  • The already existing afsocket module (providing tcp/udp sources & destinations) is compiled twice: once with and once without SSL support, so it is now possible to choose which one to use at runtime.
And since a blog post is not complete without a "screenshot", here's how the new error message looks like:

Error parsing plugin unix-stream, syntax error in etc/syslog-ng-null.conf at line 3, column 34:

source s_log { unix-stream("log" default-facilitz(syslog)); };
                                 ^^^^^^^^^^^^^^^^

Neat, eh?

One more thing I need to think about is how to configure module loading. I guess it'd be less than user friendly if I'd rely on the user to load the modules that so far were the core functionality of syslog-ng.

E.g. right now if you want udp() support, you'd need something like this in the syslog-ng.conf header:

@module: afsocket

Having to remember all the module names is cumbersome and irritating. On the other hand I'd like to make it possible to run a bare-bones syslog-ng without socket support, so autoloading modules without any configurability is also out of the question.

In the current implementation syslog-ng automatically loads core modules, if the configuration version is below 3.2, and does nothing if it is 3.2 (e.g. you need an explicit @module directive for the current configuration format).

If you have an idea about how you think configuring modules should look like, just drop me an email/comment.

Monday, April 12, 2010

Explicit transaction support in SQL

The SQL destination in syslog-ng so far assumed that databases automatically start a new transaction for each INSERT statement that syslog-ng issues. This works fine, however there is a significant overhead of starting new transactions, with sqlite I've measured about 20 times performance increase on my development notebook and my debug build.

With explicit-commits:
bazsi@bzorp:~/.zwa/install/syslog-ng-ose-3.2$ loggen -x -r 1000000 -I 10 -S log
average rate = 9377.28 msg/sec, count=93776, time=10.003, msg size=256, bandwidth=2344.32 kB/sec

With per-statement (automatic) commits:
bazsi@bzorp:~/.zwa/install/syslog-ng-ose-3.2$ loggen -x -r 1000000 -I 10 -S log
average rate = 529.46 msg/sec, count=5299, time=10.083, msg size=256, bandwidth=132.36 kB/sec

So this really seem to matter.

In order to configure it you can use the following options in an SQL destination:
  • flush_lines/flush_timeout controls how much messages get into the same transaction, similar to what these parameters mean for standard log files
  • flags("explicit-commits") enables explicit transaction handling
Also an option named "session_statements" was added where you can list initial SQL commands, issued right after the connection is established.

This work has been started by Patrick Hemmer (thanks again Patrick). I had to do some work on it though, since in order to avoid races the timer code of the main loop couldn't be used.

You can find all this in the syslog-ng OSE 3.2 branch. I'd love to hear success/failure stories and performance numbers you can measure with your favourite database.

Monday, April 05, 2010

syslog-ng 3.2 opened, experimental "blocks" branch opened

After last the stable syslog-ng 3.1.0 release last week, I've opened the 3.2 branch to receive the new stuff. The first bits are already in the repository: the basic plugin framework and the conversion of the socket related stuff (tcp, udp, unix-dgram, unix-stream, syslog drivers) into a separate plugin.

The reason of the afsocket plugin conversion is to help moving the OpenSSL dependency to a separate package in distributions where this dependency cannot be associated with core packages like syslog-ng.

But I see a lot of potential behind the plugin framework, and I still have a lot of ideas, just check my last blog post in the topic.

Also, there's an even more experimental feature in the "blocks" branch right now, it is still incomplete, the naming and the syntax is still vague. The aim is there to provide syslog-ng.conf C++ template-like functionality in order to make the configuration easier by using pre-configured config snippets.

For example:

block source tomcat<root_dir="/opt/tomcat"> {
   file("<<root_dir>>/var/tomcat.log" follow_freq(1));
};

source s_local {
   tomcat(root_dir("/usr/local/tomcat"));
};

The idea is that if "tomcat" is referenced in a source statement, it'll be expanded to the content defined before that, all with proper argument passing and default values. This example actually works fine with the "blocks" branch right now.

Combining this feature with include files I foresee something like a "syslog-ng configuration library", which would contain pre-configured syslog-ng sources containing settings needed in order for a specific application to log properly, this way avoiding the hassle of integrating each and every application to syslog separately.

Another (not-yet-working) example would be to get rid off platform specific /dev/log declarations:

block source system<> {
@if (OS == "solaris")
@  if (OS_VERSION >= "10")
    sun-streams("/dev/log" door("/var/run/syslog_door"));
@  else
    sun-streams("/dev/log" door("/etc/.syslog_door"));
@  endif
@elif (OS == "linux")
    unix-dgram("/dev/log");
    file("/proc/kmsg" program_override("kernel"));
@elif (OS == ...)
    ...
@endif
};

source s_local { system(); };

This would make the initial configuration of syslog-ng much easier, and would allow you to use the same configuration file everywhere.

As I said this is still very experimental, I'm not yet sure about:
  • the name (I would have called these templates, but that is already taken in syslog-ng, just like "macro"). so if you have an idea of a better name I'd love hear about it
  • argument passing, whether to allow positional arguments
  • argument expansion syntax, "<<arg>>", and "$$arg" both occurred to me, the latter seems to be used more universally. I'd want to include both in strings (like in the example above) and in non-string arguments, e.g. follow-freq($$arg)
If you have comments, ideas for names, please post them as comments, or send an email to the syslog-ng mailing list.

Also, while I was there changing the parser/lexer framework for pluginization, I added more details to the error message to make it easier to locate the syntax error in the configuration file.

As for the next things, I'm going to start integrating the external contributions I've received so far, so stay tuned.

Monday, March 22, 2010

syslog-ng 3.1 final release

I'm proud to announce that both the Open Source and the Premium editions of syslog-ng 3.1 was published and are available on our website.

This is an important milestone in multiple ways:
  • the new feature/stable release schema is making its debut
  • the patterndb got significant improvements: new parsers, pdbtool, tagging support
  • the ability to change/add RFC5424 style structured data to messages
  • even more supported platforms (Tru64 on alpha, HP-UX 11iv2 on Itanium and older Linux versions)
  • the diverging developments of syslog-ng Open Source Edition, Premium Edition and syslog-ng Store Box was merged into a new base,
Some interesting (ok, for us developers :) statistics follow:

Premium Edition:
  • 586 commits
  • 200 files changed, 23479 insertions(+), 5513 deletions(-)
Open Source Edition:
  • 189 commits
  • 115 files changed, 9020 insertions(+), 3225 deletions(-)
The reason for the big difference is the merger of the currently propriatery log indexer engine used in SSB into the current Premium Edition tree, otherwise the two should be in sync.

The binaries/source packages can be downloaded via the usual URL:

http://www.balabit.com/downloads/upgrades/

Changelogs for the two releases:

Premium Edition:
http://www.balabit.com/downloads/files/syslog-ng/premium-edition/3.1.0/changelog-en.txt

Open Source Edition:
http://www.balabit.com/downloads/files/syslog-ng/open-source-edition/3.1.0/changelog-en.txt

And of course the OSE source is also available in our public git repository:

http://git.balabit.hu/

Happy logging!

Saturday, March 06, 2010

plugins branch updated

Since the last post, I could hack a couple of hours on the plugins branch, which now compiles. The plugin framework is capable for supporting a quite important core functionality: all socket like sources/destinations are now found in an external plugin called "afsocket".

The reason I've started with afsocket is to make syslog-ng a bit less dependant on OpenSSL. A couple of distributions didn't include syslog-ng 3.0 in their current releases, because it uses OpenSSL from /usr, while syslog-ng should remain in the root directory.

By separating afsocket from the syslog-ng core, I can compile afsocket with and without TLS support, which can be put into separate packages. Thus syslog-ng can operate without OpenSSL.

And the same plugin framework will enable me to create a wide variety of plugins. My ideas:
  • Plugins for all syslog-ng components (source, destination, filter, rewrite, parser)
  • Python scriptability (a simple correllation engine in Python?)
  • macro transformation functions, for example: $(stripslashes $macro), usable anywhere in templates and stripslashes a plugin that is invoked whenever such an expansion occurs
  • Hooks for transforming the log message as it enters syslog-ng (to fix parsing errors for example),
Do you have other ideas? Please post them as comments or as emails to the mailing list.

Again, this functionality is experimental, and I'm still going to rebase the current code and will probably be integrated to syslog-ng 3.2. I got to release 3.1 final first though. :)

plugins preview

Things have been a little rough last couple of months, that's why I haven't posted here. I'm in a rush right now as well, but I just wanted to let you know that I have started working on modularizing syslog-ng.

It is only a preliminary prototype, and as of now it doesn't compile, but the way it's going to work is already visible: each plugin will have its own plugin and with some trickery the large syslog-ng.conf parser will call out to the plugin parser. The user will recognize such a plugin as an integral part of syslog-ng.

E.g. this is a sample configuration file:

@version: 3.0
@module: dummy

...

destination d_dummy { dummy(dummy_opt(yes)); };

...

See the dummy plugin code in my git repository, in the "plugins" branch. Please note that that branch is going to be rebased a couple of times yet, I've released it in the spirit of "release early, release often".

I hope to get some of the recent contributions into plugins, instead of bloating the core syslog-ng code. For example output colorization. I'm also thinking about adding built-in scripting support via Python.