Skip to main content

patterndb project

By now probably most of you know about patterndb, a powerful framework in syslog-ng that lets you extract structured information from log messages and perform classification at a high speed:

http://www.balabit.com/dl/html/syslog-ng-ose-v3.1-guide-admin-en.html/concepts_pattern_databases.html

Until now, syslog-ng offered the feature, but no release-quality patterns were produced by the syslog-ng developers. Some samples based on the logcheck database were created, but otherwise every syslog-ng user had to create her samples manually, possibly repeating work performed by others.

Since this calls out to be a community project, I'm hereby starting one.

Goals

Create release-quality pattern databases that can simply be deployed to an existing syslog-ng installation. The goal of the patterns is to extract structured information from the free-form syslog messages, e.g. create name-value pairs based on the syslog message.

Since the key factor when doing something like this is the naming of fields, we're going to create our generic naming guidelines that can be applied to any application in the industry.

It is not our goal to implement correllation or any other advanced form of analysis, although we feel that with the results of this project, event correllation and analysis can be performed much easier than without it.

Related projects

I know there are other efforts in the field, why not simply join them?

CEF - is the log message format for a proprietary log analysis engine, primarily meant to be used to hold IP security device logs (firewalls, IPSs, virus gateways etc). The patterndb project aims to create patterns for a wider range of device logs and be more generic in the approach. On the other hand we feel that it might be useful to create a solution for converting db-parser output to the CEF format.

CEE - Common Event Expression project by Mitre has a focus on creating a nv pair dictionary for all kinds of devices/log messages out there. Although I might be missing something, but I didn't find the concrete results so far, apart from a nicely looking white paper. If the CEE delivers something, then patterndb would probably adapt the naming/taxonomy structure. But I guess not all devices will start logging in the new shiny format, thus the existing devices would need
their logs converted, so the patterndb work wouldn't be wasted.

Infrastructure

Our original patterndb related plans were to create an easy to use web based interface for editing patterns, but since that project is progressing slowly, I'm calling for a minimalist approach: git based version control of simple plain text files. Of course once the nice web based interface is finished, we're going to be ready to use it.

First steps

I have created a git repository at:

git://git.balabit.hu/bazsi/syslog-ng-patterndb.git

This contains the initial version of the naming policy document and a simple schema for SIEM-style and a user login-logout naming schema.

If you are interested please read the file README.txt in the git archive, or if you prefer a web browser, use this link:

http://git.balabit.hu/?p=bazsi/syslog-ng-patterndb.git;a=blob;f=README.txt;h=9bbfeaead0c21dcf6171e12e311ae8612f572bfc;hb=6061e22221a72d35238b35f82b04afd436341b5c

Licensing

I do not have a decision yet, but for sure this is going to use one of the open source licenses or Creative Commons. Let me know if you have a preference in this area.

Getting involved

Join the syslog-ng mailing list, a start discussing! If you have existing patterns, great. If you don't, it is not late to join.

http://lists.balabit.hu/mailman/listinfo/syslog-ng

The posting address of the mailing list (to subscribers only) is:

syslog-ng@lists.balabit.hu

Comments

Popular posts from this blog

syslog-ng fun with performance

I like christmas for a number of reasons: in addition to the traditional "meet and have fun with your family", eat lots of delicious food and so on, I like it because this is the season of the year when I have some time to do whatever I feel like. This year I felt like doing some syslog-ng performance analysis. After reading Ulrich Deppert's series about stuff "What every programmer should know about memory" on LWN, I thought I'm more than prepared to improve syslog-ng performance. Before going any further, I'd recommend this reading to any programmer, it's a bit long but every second reading it is worth it. As you need to measure performance in order to improve it, I wrote a tool called "loggen". This program generates messages messages at a user-specifyable rate. Apart from the git repository you can get this tool from the latest syslog-ng snapshots. Loggen supports TCP, UDP and UNIX domain sockets, so really almost everything can be me...

syslog-ng contributions redefined

syslog-ng has been around for about 12 years now, but I think the biggest change in the project's life is imminent: with the upcoming release of syslog-ng OSE 3.2, syslog-ng will become an independent entity. Until now, syslog-ng was primarily maintained & developed by BalaBit, copyrights needed to be reassigned in order to grant BalaBit special privileges. BalaBit used her privileges to create a dual-licensed fork of syslog-ng, named "syslog-ng Premium Edition". The value we offer over the Open Source Edition of syslog-ng are things that larger enterprises require: support on a large number of UNIX platforms (27 as of 3.1), smaller and larger feature differences (like the encrypted/digitally signed logfile feature) better test coverage and release management longer term support Although perfectly legal, this business model was not welcome in various Free Software communities, and has caused friction and harm, because BalaBit has enjoyed a privilege that no others cou...

An introduction to db-parser()

As promised on the mailing list here comes a short description of the new db-parser functionality of syslog-ng. For an introduction to parsers in general see my previous blog post here . The aim for db-parser is two-fold: extract interesting information from a log message attach tags to a log message for later classification. For instance here's a log sample (lines broken for readability): Feb 24 11:55:22 bzorp sshd[4376]: Accepted password for bazsi \ from 10.50.0.247 port 42156 ssh2 This message states that a user named "bazsi" has logged into the host named "bzorp" using SSH2 from the quoted IP and port. When you read this message as a human, the event that happened is perfectly clear. However if it is not a human, but a piece of software that has to make out the meaning of the message, you need to identify the event (e.g. that a user login has happened) and the additional information associated with the event (e.g. that he used 10.50.0.247 as the cl...