Thursday, February 28, 2008

libdbi patches online

I've published our set of dbi and dbi-drivers patches in a git repository to push changes upstream. The patches were updated against the latest libdbi versions.

You can find these repositories at the BalaBit's git server, more precisely:
  • git://
  • git://
The "master" branch contains the direct import of the libdbi CVS tree, our fixes are in the 'upstream-fixes' branch. This setup will make it easier for me to publish patches and regularly rebase the not-yet-merged set against the latest upstream.

Among other small things, you can find a quite important patch against the Oracle driver. Without this patch Oracle 10.2 (the server!) segfaults and dumps core. So beware.

Monday, February 18, 2008

SFTP proxy

I installed Google analytics on this blog, and as it seems a number of people come here looking for "SFTP proxy", because of an old article I posted last July. Those interested primarily in my syslog-ng related articles may skip this post as this contains completely unrelated information, others please read on. :)

For those who don't know: SFTP is a file-system sharing protocol running on top of SSH. It is not yet an IETF standard, however more and more enterprises replaces the aging FTP protocol for SFTP. The reasons are numerous:
  • FTP uses plain text passwords,
  • FTP uses multiple TCP connections for file transfer,
  • FTP has inherent problems like bounce attacks,
  • FTP does not encrypt traffic,
  • FTP only supports filesystem metadata (last modification time, etc.) via extensions
  • and others.
All-in-all SFTP is newer, shinier and designed better. There's one problem though: SFTP uses SSH and SSH is encrypted. But wait, I said this is a drawback for FTP. Right, using encryption is good and bad at the same time. Good, because it prevents eavesdropping, bad because it cannot be controlled by security devices at the network perimeter.

Sometimes is it quite useful to see what's going on in a traffic crossing the network borders: you can restrict the usage of SFTP to a set of trustworthy clients, not for everyone. And even them can be controlled by enabling a full transaction log.

If your enterprise allows FTP traffic, there are tools to log FTP transfers, and in extreme cases to log actual data. For SFTP this is not so simple, once you permit outgoing port 22 (used for SSH), complete file system sharing can cross your firewall without you noticing. Scary, eh?

There are currently two solutions for this problem:
  1. Disable SSH and use FTP instead. This has the drawback that passwords travel in unencrypted form, and the traffic itself is easily sniffable.
  2. Use something like our Shell Control Box product, it is based on Zorp, with a complete SSH man-in-the-middle implementation, controls various SSH channels, limits what can get through, can log transaction data, and furthermore: at the end of the day the transmitted data is still encrypted on untrusted networks.
SCB is not using any of the OpenSSH code, it is a complete reimplementation of the SSH protocol stack, and because of Zorp all of it can run transparently (even in bridge mode) working in concert with your other firewalls/security devices.

So if you need to install proper SFTP controls, be sure to check it out.

Wednesday, February 13, 2008

syslog-ng feature sheet

We were asked to publish some more detailed "syslog-ng feature sheet". Albeit it might go into syslog-ng specific details we tried to be as generic as possible. And certainly everyone doing such feature sheets is biased, just as we were :)

It is available at

Friday, February 08, 2008

Redesigning syslog-ng internals

As promised earlier on the mailing list, I am designing the new message rewrite capabilities in syslog-ng.

As you probably know, syslog-ng currently supports message templates for each destination, and this template can be used to rewrite the message payload. Each template may contain literal text and macro references. Macros can either expand to parts of the original message or parts that were matched using a regexp.

Here's an example:

destination d_file { file("/var/log/messages" template("<$PRI> $HOST $MSG -- literal text $1\n")); };

The example above uses the format string specified at template to define the log file structure. The words starting with '$' are macros and expand to well defined parts of the original message. Numbered macros like $1 above are substituted to the last regular expression matches, all other characters are put into the result intact.

While this functionality is indeed useful, it is somewhat limited: you cannot use sed-like search-and-replace functions that some of the users requested.

My problem with rewriting message contents was somewhat fundamental: my original intention was to keep message pipelines independent from each other. If the message would be changed while traversing one pipe, this change would be propagated to the pipelines processed later.

This behaviour is sometimes desirable, sometimes directly unwanted. In the case of anonymization the changes would have to be global, e.g. all log paths would receive the anonimized messages, but if you want to store an unanomized version of the logs for troubleshooting, you want the original message, not a stripped version.

The solution I came up with is to generalize the log pipeline concept. Currently a pipe connects one or more sources with one or more destinations with some filtering added. In the new concept everything becomes a pipe element:
  • a filter is a pipe that either drops or forwards messages
  • a destination is a pipe that sends the message to a specific destination and the forwards the message to the next node
The current log statement becomes a pipeline:

source -> filter1 -> filter2 -> ... -> filterN -> destination1 -> destination2 -> ... -> destinationN

Each pipeline may fork into several pipes, e.g. it is possible to do the following:

destination1 -> destination2 -> ... -> destinationN
source -> filter1 -> filter2 -> ... -> filterN -
destination1' -> destination2' -> ... -> destinationN'

This is still nothing new, but consider this:

destination1 -> destination2 -> ... -> destinationN
source -> filter1 -> ... -> ... -> rewrite -
destination1' -> destination2' -> ... -> destinationN'

This means that rewrite happens before forking to the two set of destinations, they both receive the rewritten message. However if the user had another global pipeline in her configuration, it would start with the original, unchanged message.

In syslog-ng configuration file speak, this would be something like this:

log { source(s_all); rewrite(r_anonimize);
log { filter(f_anonimized_files); destination(d_files); flags(final); };
log { filter(f_anonimized_rest); destination(d_rest_log); };

log { source(s_all); destination(d_troubleshoot_logs); };

E.g. you can have log statements embedded into another log statement, log statements at the same level receive the same log message, and have retain the power of filters and log pipe construction at each level.

Not to mention that message pipelines are a natural place for paralellization, e.g. each log statement could be processed by a separate thread, which becomes necessary if the message transformations become CPU intensive.

Whew, this was a long post, expect another post about the message parsing capability I basically finished already.