Monday, April 30, 2007

syslog-ng database support and other fixes

As the readers of this blog might know I've been working on persistent disk-buffering and SQL support recently. The configuration interface of the SQL destination became quite close to the description I gave in my last post.

I think from the user side it is pretty neat, no need to rely on the mysql client program or to create "buffer files" that are later fed to the database server. You simply define an SQL destination and tables are created automagically with proper (possibly even disk-based) queueing, flow control and error handling. And by using libdbi we immediately have support for 4 or 5 different database servers.

The implementation side was somewhat more intrusive however, the client libraries for various database servers use a blocking I/O model and syslog-ng was completely non-blocking until now. I had to create a separate thread for inserting records to tables, and there came some required changes to syslog-ng to support that: various reference counters and the internal acknowledge mechanism had to be made thread-aware.

This was not without benefit though, a change of this scale required a thorough review of the code involved and I have found and fixed several bugs that also affected the 2.0.x tree. I have committed some of these to my public git tree already, so it should be synchronized to our public webserver real soon now.

My next thing to do is to prepare 2.0.4 and probably a first public release of the 2.1.x open source tree. Stay tuned. :)

Tuesday, April 24, 2007

Persistent disk-buffering in syslog-ng

Two blog spots in a row on two consecutive days, wow :) Things are happening fast these times. Because of my company's efforts to create a commercial fork of syslog-ng, I have somewhat more time to do syslog-ng hacking. This time I've finished persistent disk-buffering, an often requested feature to be released in the commercial version.

It means that if a target server is down, then in addition to the in-memory buffers, syslog-ng is able to store messages in a disk-based queue until the connection is restored. What's more, this queue is persistent and syslog-ng keeps its contents accross restarts.

One less reason to keep logs locally. :)

Next on my list is native SQL support, combined with this disk-buffer feature to cover times when the database is too slow processing INSERTs. I'm thinking along the lines of:

destination d_sql {
sql(type(pgsql) host("loghost") user("syslog-ng") password("secret-password")
columns("date", "host", "program", "pid", "message")
values("$R_DATE", "$HOST", "$PROGRAM", "$PID", "$MSGONLY"));


Tables would be created/altered automatically on-demand, just like destination files. Values might refer to builtin macros, or matches within the message (like in sed, e.g. $1 refers to the first match).

The actual implementation might be somewhat different though.

Sunday, April 22, 2007

Switching version control systems

We have been using GNU arch the last couple of years as a version control system, however Tom Lords' implementation does not scale well, and some of our software packages have 10 thousands of commits. This means that a single commit operation may take _minutes_. It is awful to wait so much time for a single commit, and it really degrades productivity.

I was considering Mercurial, Bazaar-NG and git, however this was not an easy decision, as the "modern" version control systems promote the use of branches over anything else, and our current version control model relied on cherry-picking heavily:
  • developer commits the solution for each bug separately to his/her branch
  • QA people pick patches from developer branches and integrate them to a 'test' branch, once the test was successful,
  • release manager picks patches from the 'test' branch and integrates to mainline, if he doesn't find anything odd during review
This worked wonderfully in GNU arch, but new VC systems lack in this area. Bazaar has no cherry picking support at all, Mercurial has some incomplete support with a plugin named transplant, git has cherry picking, but that relies on heuristics (it guesses whether a patch was integrated by using a checksum of the patch).

I was considering to change the process I outlined above, but I'm not sure how that would work out. We sometimes need to work with people not really experienced with VC systems at all, asking them to manage their own branches for each bugfix/problem group seems to raise the bar a bit too high.

Nevertheless git seemed to have solutions for both worlds (e.g. picking patches AND merging branches), so I choose git over the other two, and now I converted some of the syslog-ng history to git in order to gather some real-life experience.

I like what I see so far, git 1.5.x is really way better than older versions on the usability and documentation front. I now feel comfortable enough with git as I could finally understand the working model and the structure of the git history.And git is fast like lightning :)