
syslog-ng message parsing

Earlier this month, I announced the new syslog-ng 3.0 git tree, which adds a lot of new features to syslog-ng Open Source Edition. I thought it'd be useful to describe these features in more detail, so this time I'll write about message parsing.

First of all, the message structure in syslog-ng has been generalized. Previously it encapsulated a syslog message and had little room for anything beyond that: every log message that syslog-ng handled had date, host, program and message fields, but syslog-ng didn't care about the message contents.

This has changed: a LogMessage is now a set of name-value pairs, with some "built-in" pairs that correspond to the parts of a syslog message.

The aim of this change is that new name-value pairs can be associated with messages through parsing. It is now possible to parse non-syslog logs and use the resulting columns the same way you could use syslog fields: in file names, in SQL table names or as columns in an SQL table.

Here is an example:


parser p_parse_apache_logs { ... };

destination d_peruser {
    file("/var/log/apache/${APACHE.USER_NAME}.log");
};

log {
    source(s_local);
    parser(p_parse_apache_logs);
    destination(d_peruser);
};


This means that you can "extract" information from the payload and use it to name destination files or SQL tables, basically anywhere you can use a template.
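As an illustration, the sketch below uses parsed values in an SQL destination, both in the table name and in a column value. It assumes that `p_parse_apache_logs` (elided above) produces `APACHE.USER_NAME` and `APACHE.CLIENT_IP`, like the Apache parser shown later in this post; the destination name, database type, host, credentials, table and column names are all hypothetical placeholders.

```
# Hypothetical SQL destination: one table per Apache user.
destination d_sql_peruser {
    sql(type(mysql)
        host("localhost")
        username("syslog-ng")
        password("changeme")                  # placeholder credentials
        database("logs")
        table("apache_${APACHE.USER_NAME}")   # table name built from a parsed value
        columns("client_ip", "user_name", "message")
        values("${APACHE.CLIENT_IP}", "${APACHE.USER_NAME}", "${MSG}"));
};
```

Parsed values and built-in ones such as ${MSG} can be mixed freely here, since both are just name-value pairs of the message.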

There are currently two parsers implemented in syslog-ng:
  • a generic CSV (comma-separated values) parser, which can be parameterized to accept basically any kind of regularly formatted input (so tab- or space-separated data is also fine)
  • a database-based parser, which uses a log pattern database to recognize messages belonging to specific applications and extract information from them.
Since the database-based parser is complex enough to deserve its own post, I'll skip it for now. The CSV parser has the following options:

  • template: defines the input to be parsed; it can contain macros
  • columns: a list of strings, the names to associate with the parsed columns
  • delimiters: the set of characters that delimit columns
  • quotes or quote_pairs: the quote characters to support; quote_pairs makes it possible to use different start and end quote characters (like enclosing fields in brackets)
  • null: the null value, which is substituted with an empty string wherever it is found
  • flags: parsing flags such as escape-none, escape-backslash, escape-double-char and strip-whitespace; see the documentation for details
The CSV parser is capable of parsing real CSV data; for example, it knows about quoting rules. So if you have an application that logs into files using space- or comma-separated data, you can be fairly sure that the CSV parser can process it.
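To see several of these options together, here is a sketch for a hypothetical application that writes space-separated records such as `alice 'file upload' -` into the message payload. The application, the `APP.*` names and the sample line are made up for illustration.

```
# Hypothetical parser for records like:  alice 'file upload' -
parser p_app {
    csv-parser(columns("APP.USER", "APP.ACTION", "APP.RESULT")
               delimiters(" ")          # columns are separated by spaces
               quotes("'")              # 'file upload' stays one column despite the space
               null("-")                # a lone dash becomes an empty value
               flags(escape-backslash)  # \' escapes a quote inside a quoted field
               template("${MSG}"));     # parse the message part of the log
};
```

With the sample input, ${APP.USER} should be `alice`, ${APP.ACTION} should be `file upload` and ${APP.RESULT} should be empty. Without the quotes() option, the space inside the second field would split it into two columns.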

Here is an example that parses Apache logs, so that each field in the message becomes a name-value pair:


parser p_apache {
    csv-parser(columns("APACHE.CLIENT_IP",
                       "APACHE.IDENT_NAME",
                       "APACHE.USER_NAME",
                       "APACHE.TIMESTAMP",
                       "APACHE.REQUEST_URL",
                       "APACHE.REQUEST_STATUS",
                       "APACHE.CONTENT_LENGTH",
                       "APACHE.REFERER",
                       "APACHE.USER_AGENT",
                       "APACHE.PROCESS_TIME",
                       "APACHE.SERVER_NAME")
               # flags:
               #   escape-none, escape-backslash, escape-double-char,
               #   strip-whitespace
               flags(escape-double-char,strip-whitespace)
               delimiters(" ")
               quote-pairs('""[]')
    );
};

parser p_apache_timestamp {
    # splits a timestamp like 10/Oct/2000:13:55:36 -0700 into seven columns
    csv-parser(columns("APACHE.TIMESTAMP.DAY",
                       "APACHE.TIMESTAMP.MONTH",
                       "APACHE.TIMESTAMP.YEAR",
                       "APACHE.TIMESTAMP.HOUR",
                       "APACHE.TIMESTAMP.MIN",
                       "APACHE.TIMESTAMP.SEC",
                       "APACHE.TIMESTAMP.ZONE")
               delimiters("/: ")
               flags(escape-none)
               template("${APACHE.TIMESTAMP}"));
};


The first parser splits the major fields, and the second splits the timestamp into manageable pieces. You can then bind these parsers to a log path of your choosing:


log {
    source(s_apache);
    parser(p_apache);
    parser(p_apache_timestamp);
    destination(d_apache);
};


As you can see, the second parser uses a value created by the first one, through its template() option. Once this parsing is done, you can use any of the values created this way in your d_apache destination, be it in the name of the file or as a column in an SQL table.
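The post leaves d_apache undefined. One possible definition, sketched here as an assumption rather than a recommendation, sorts the access log into per-virtual-host directories with one file per day, using output from both parsers. The directory layout and the output template are hypothetical; create_dirs(yes) lets syslog-ng create the missing directories.

```
# Hypothetical d_apache: one directory per server name, one file per day,
# e.g. /var/log/apache/www.example.com/2000-Oct-10.log
destination d_apache {
    file("/var/log/apache/${APACHE.SERVER_NAME}/${APACHE.TIMESTAMP.YEAR}-${APACHE.TIMESTAMP.MONTH}-${APACHE.TIMESTAMP.DAY}.log"
         create_dirs(yes)
         # write only selected parsed columns instead of the whole line
         template("${APACHE.CLIENT_IP} ${APACHE.REQUEST_STATUS} ${APACHE.REQUEST_URL}\n"));
};
```

Since Apache logs the month as an abbreviated name, ${APACHE.TIMESTAMP.MONTH} will contain values like `Oct`, not a number.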

