
Redesigning syslog-ng internals

As promised earlier on the mailing list, I am designing the new message rewrite capabilities in syslog-ng.

As you probably know, syslog-ng currently supports message templates for each destination, and this template can be used to rewrite the message payload. Each template may contain literal text and macro references. Macros can either expand to parts of the original message or parts that were matched using a regexp.

Here's an example:

destination d_file { file("/var/log/messages" template("<$PRI> $HOST $MSG -- literal text $1\n")); };

The example above uses the format string given in template() to define the structure of the log file. Words starting with '$' are macros and expand to well-defined parts of the original message. Numbered macros such as $1 are replaced with the groups captured by the most recent regular expression match; all other characters are copied to the result unchanged.
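The expansion rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not syslog-ng's implementation: the function name expand_template, the sample field values and the regular expression are all made up, and expanding unknown macros to an empty string is an assumption.

```python
import re

def expand_template(template, fields, matches):
    """Expand $NAME macros from `fields` and $1, $2, ... from `matches`.

    Assumption: unknown macros and out-of-range match numbers expand to "".
    """
    def substitute(m):
        name = m.group(1)
        if name.isdigit():
            index = int(name) - 1
            return matches[index] if 0 <= index < len(matches) else ""
        return fields.get(name, "")

    # Everything that is not a macro reference is left untouched by re.sub.
    return re.sub(r"\$([A-Z_]+|[0-9]+)", substitute, template)

# Hypothetical message fields and a hypothetical earlier regexp match.
fields = {"PRI": "13", "HOST": "web1", "MSG": "login failed for bob"}
match = re.search(r"for (\w+)", fields["MSG"])
matches = list(match.groups()) if match else []

line = expand_template("<$PRI> $HOST $MSG -- literal text $1\n", fields, matches)
print(line, end="")  # <13> web1 login failed for bob -- literal text bob
```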

While this functionality is indeed useful, it is somewhat limited: you cannot use the sed-like search-and-replace functions that some users have requested.

My problem with rewriting message contents was rather fundamental: my original intention was to keep message pipelines independent of each other. If a message were changed while traversing one pipe, the change would propagate to the pipelines processed later.

This behaviour is sometimes desirable and sometimes plainly unwanted. For anonymization the changes would have to be global, i.e. all log paths would receive the anonymized messages; but if you want to store an unanonymized copy of the logs for troubleshooting, you need the original message, not a stripped version.

The solution I came up with is to generalize the log pipeline concept. Currently a pipe connects one or more sources with one or more destinations with some filtering added. In the new concept everything becomes a pipe element:
  • a filter is a pipe that either drops or forwards messages
  • a destination is a pipe that sends the message to a specific destination and then forwards the message to the next node
The current log statement becomes a pipeline:

source -> filter1 -> filter2 -> ... -> filterN -> destination1 -> destination2 -> ... -> destinationN
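To make the "everything is a pipe element" idea concrete, here is a small Python sketch of the linear pipeline above. The class names (Pipe, Filter, Destination), the chain() helper and the dictionary-based message are hypothetical names chosen for illustration; they are not syslog-ng internals.

```python
class Pipe:
    """A pipeline element: processes a message, then hands it to the next node."""

    def __init__(self):
        self.next = None

    def process(self, msg):
        return msg  # returning None drops the message

    def push(self, msg):
        out = self.process(msg)
        if out is not None and self.next is not None:
            self.next.push(out)


class Filter(Pipe):
    """Either drops or forwards the message."""

    def __init__(self, predicate):
        super().__init__()
        self.predicate = predicate

    def process(self, msg):
        return msg if self.predicate(msg) else None


class Destination(Pipe):
    """Stores the message (a stand-in for writing it out), then forwards it."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.written = []

    def process(self, msg):
        self.written.append(msg["text"])
        return msg


def chain(*pipes):
    """Link the pipes in order and return the head of the pipeline."""
    for current, following in zip(pipes, pipes[1:]):
        current.next = following
    return pipes[0]


d1, d2 = Destination("destination1"), Destination("destination2")
source = chain(
    Pipe(),  # stands in for the source
    Filter(lambda m: m["host"] == "web1"),
    Filter(lambda m: "error" in m["text"]),
    d1,
    d2,
)
source.push({"host": "web1", "text": "disk error"})
source.push({"host": "db1", "text": "disk error"})  # dropped by the first filter
source.push({"host": "web1", "text": "all fine"})   # dropped by the second filter
print(d1.written, d2.written)
```

Because a destination forwards the message after storing it, several destinations can simply be placed one after another in the chain.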

Each pipeline may fork into several pipes, e.g. it is possible to do the following:


                                                  destination1 -> destination2 -> ... -> destinationN
                                                /
source -> filter1 -> filter2 -> ... -> filterN -
                                                \
                                                  destination1' -> destination2' -> ... -> destinationN'



This is still nothing new, but consider this:


                                              destination1 -> destination2 -> ... -> destinationN
                                            /
source -> filter1 -> ... -> ... -> rewrite -
                                            \
                                              destination1' -> destination2' -> ... -> destinationN'


This means that the rewrite happens before forking to the two sets of destinations, so both receive the rewritten message. However, if the user had another global pipeline in her configuration, it would start with the original, unchanged message.

In syslog-ng configuration file speak, this would be something like this:


log { source(s_all); rewrite(r_anonimize);
    log { filter(f_anonimized_files); destination(d_files); flags(final); };
    log { filter(f_anonimized_rest); destination(d_rest_log); };
};

log { source(s_all); destination(d_troubleshoot_logs); };


That is, you can embed log statements in another log statement; log statements at the same level receive the same log message, and you retain the power of filters and log pipe construction at each level.

Not to mention that message pipelines are a natural place for parallelization: each log statement could be processed by a separate thread, which becomes necessary if message transformations become CPU-intensive.

Whew, this was a long post. Expect another one about the message parsing capability, which I have basically finished already.
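The intended semantics of the configuration above can be modelled roughly as follows. This is a sketch under several assumptions: the LogStatement class is hypothetical, the rewrite rule (masking IPv4 addresses) and both filters are invented because the post does not define them, and flags(final) is not modelled; the two embedded filters are simply made mutually exclusive.

```python
import re

class LogStatement:
    """Hypothetical model of a log statement that may embed other log statements."""

    def __init__(self, filter=None, rewrite=None, destination=None, children=()):
        self.filter = filter
        self.rewrite = rewrite
        self.destination = destination
        self.children = list(children)

    def process(self, msg):
        if self.filter is not None and not self.filter(msg):
            return
        # Work on a private copy, so a rewrite is visible to this statement
        # and its embedded statements only, never to sibling statements.
        msg = dict(msg)
        if self.rewrite is not None:
            self.rewrite(msg)
        if self.destination is not None:
            self.destination.append(msg["text"])
        for child in self.children:
            child.process(msg)


def anonymize(msg):
    # Invented rewrite rule: mask anything that looks like an IPv4 address.
    msg["text"] = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", "x.x.x.x", msg["text"])


d_files, d_rest_log, d_troubleshoot_logs = [], [], []

top_level = [
    LogStatement(rewrite=anonymize, children=[
        LogStatement(filter=lambda m: m["facility"] == "auth", destination=d_files),
        LogStatement(filter=lambda m: m["facility"] != "auth", destination=d_rest_log),
    ]),
    LogStatement(destination=d_troubleshoot_logs),
]

original = {"facility": "auth", "text": "login from 10.1.2.3"}
for statement in top_level:
    statement.process(original)

print(d_files)              # ['login from x.x.x.x']
print(d_troubleshoot_logs)  # ['login from 10.1.2.3']
```

Copying the message when it enters a statement is one simple way to keep sibling pipelines independent; a real implementation would more likely copy only when a rewrite actually modifies the message.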

Comments

Anonymous said…
I try this:

program("/root/test.sh $MSG");

In test.sh, I read the message through $1, but it doesn't work.

Is it possible to pass macro values to an external program?
Bazsi said…
program() in syslog-ng launches your program _once_ during startup, and then feeds the log messages to its standard input.

So you cannot pass macros as arguments.
Anonymous said…
It's not possible to pass macros as arguments, but there is a way to get them into my program:

template t_essai { template("$HOSTµ$FACILITYµ$PRIORITYµ$LEVELµ$TAGµ$YEAR-$MONTH-$DAY $HOUR:$MIN:$SECµ$PROGRAMµ$MSG'\n"); };

program("/root/getIdPostfix.sh " template(t_essai));

In my getIdPostfix.sh, I use the 'read' command to get the macros from STDIN. It works!

But I can't receive all the logs with this method. Out of 14 actual log messages, I lost 8.
Bazsi said…
It's probably because your shell script is not fast enough.

You could increase the buffer size (log_fifo_size) if your script only fails to keep up during peaks, or you could enable flow-control, although that could reduce the performance of your applications, or you could write the script in something other than shell.
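As an illustration of the last option, a reader for the t_essai format could look roughly like the Python sketch below. It assumes the field order of the template posted above and that the input is decoded with an encoding in which 'µ' survives; the function names and the sample line are made up. Note that the template as posted also emits a ' before the newline, which this sketch would leave at the end of the message field. Whether this is fast enough for a given message rate would still have to be measured.

```python
import io
import sys

# Field order taken from the t_essai template; the date and time macros are
# treated as a single "timestamp" field.
FIELDS = ["host", "facility", "priority", "level", "tag",
          "timestamp", "program", "message"]

def parse_line(line, sep="µ"):
    """Split one template-formatted line into a dict, or return None if malformed."""
    # Limit the number of splits so that a separator inside the message is kept.
    parts = line.rstrip("\n").split(sep, len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))

def read_records(stream):
    """Yield one record per input line, for as long as the stream stays open."""
    for line in stream:
        record = parse_line(line)
        if record is not None:
            yield record

# Made-up sample line; under program() the stream would be sys.stdin instead,
# e.g. `for record in read_records(sys.stdin): ...`
sample = io.StringIO(
    "mail1µmailµinfoµinfoµ16µ2008-01-15 10:20:30µpostfixµqueue id 4F2A1 accepted\n"
)
records = list(read_records(sample))
print(records[0]["program"], records[0]["message"])
```

Since program() starts the process once and keeps feeding it, the loop runs for the lifetime of syslog-ng and no new process is started per message.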
