Monitoring the event log can quickly become straining for both the computer as well as the administrator as the event log grows and grows. To make this simpler for both the administrator and the computer NSClient++ 0.4.0 introduced real-time event log monitoring. This means we no longer scan the event log instead we simply scan events as they come in. The benefit, in addition to lowering the resources required, is that we can also get notified instantly when an error occurs instead of every 5 minutes or however often we check the log. Another addition is a simple client o generate event log message to help administrators debug event log filters. This is a quick introduction to event log monitoring and real-time event log monitoring showing how to set up real-time event log monitoring both for active and passive use via NSCA and NRPE.
Event Log
So to start this off first we need to understand how the event log works and what we can do with it. The first thing to notice is that the event log is a bit “odd”. The core event log message is an EVENTLOGRECORD which is we simplify a bit has the following fields:![]()
- RecordNumber
Sequence number - TimeGenerated
The time at which this entry was submitted. - TimeWritten
The time at which this entry was received by the service to be written to the log. - EventID
See below - EventType
The type of message is one of the following:- Error
- Failure Audit
- Success Audit
- Information
- Warning
- Strings
A list of strings for rendering the message. - EventCategory
A token with which the developer can provide filtering and such. - UserSid
Security identifier of the “active user” when the event was logged.
EventID
Now event id seems to me like something which is not really used as intended or was badly designed. The general idea is that each message (read text) has a unique identifier provided by the implementing application. Thus far it is even pretty smart as it means that the event log does not contain any actual messages only an id and some metadata. The oddities if you ask me is that they decided to use the event-id for a lot of other things as well. The field contained in this number (and since it is a number many of these are just bits) are:
- Severity. The severity is defined as follows:
- 00 – Success
- 01 – Informational
- 10 – Warning
- 11 – Error
- Customer bit. This bit is defined as follows:
- 0 – System code
- 1 – Customer code
- Facility
Facility code (I think of this as a component within the application). - Code
Number identifying the message
The first thing to notice is that if you look at modern windows the event viewer will not show you any of this information unless you “click advanced” and then only as a number next to the name Qualifiers.
The second thing is that some of these fields are a bit odd. Severity sounds a lot like “type” or level as it is called now. Facility sounds a lot like source. None of these fields will help provide any information for the end user and is mainly intended for the developer to help them organize message files or so it seems to me.
Back to NSClient++
So what can we do in NSClient++?
Well, first off we have two commands:
- check_eventlog
- check_eventlog_cache
There is also one command line command available used to generate event log messages (mainly useful when debugging filters).
So why two commands?
Well, the first command is the “old normal” check command when every time you run the check the entire event log is scanned which means if you have a few hundred thousand items it will quickly use up a lot of resources.
To combat this in 0.4.0 I introduced something called “real-time” event log monitoring. The idea is that we listen for events and as they happen we scan them and match them against the filters. The up side of this is obviously a much reduced CPU footprint. as we instead of every 5 minutes have to scan hundreds of thousands of entry’s we only have to scan the hand full your server generates. The other benefit is that we can now get real-time errors if we use passive checking. So you no longer have to wait “for 5 minutes” instead once the error is triggered you get the red light in your monitoring tool almost instantly.
But lets start with understanding filters.
Filters
Filters was introduced in 0.3.8 of NSClient++ and consists of something similar to SQL statements (or WQL statements for that matter) where you can filter information you want. As of 0.4.0 we have the following keywords on which we can filter:
- Numbers:
- id
The event id - category
The event category - qualifier
Qualifier part of event id - facility
Facility part of event id - customer
Customer bit part of event id - rawid
The entire event id including any qualifier/facility/customer/* parts
This is obviously faster than separating them into multiple statements.
- id
- Strings
- source
Event source - file / log
Log file - message
The renderd message - strings
The string data (used to render the message)
This is much much faster and yields the same result as message. - computer
- source
- Dates
- written
- generated
- Keywords
- type (level)
- severity
So essentially we can filter on anything except sid which for some strange reason I missed
. Writing filters is pretty straight forward as well and if you know some basic SQL you should not have too much trouble. This article is not about writing event log queries so I will only give you an example here (let me know if you want me to write an article about writing filters):
generated gt -2d AND severity NOT IN ('success', 'informational')
Real time event log
Real time event log monitoring is almost simpler to set up then using server-side checks. The drawback though is that it requires up-front and client-side configuration which is obviously not for everyone.
[/modules] CheckEventLog=1 [/settings/eventlog/real-time] enabled=true filter=id = 1000 and category = 0
The above configuration will do two things, first enable CheckEventLog module and then enable real-time checking. We also configure a random filter I was using when testing something. So configuring this is pretty straight forward and simple. Apart from this there are a hand full of options you can tweak as well. The only problem here is what happens when you hit a message?
Well, since we have not configured a destination nothing will happen so thus far this is a bit useless. Unfortunately this blog post is not about configuring NSCA so the next section will only skim through it a bit. But first lets look at the other options we can tweak:
- enabled
Spawns a background thread which detects issues and reports them back instantly. - destination
The destination for intercepted messages - maximum age
How long before reporting ”ok” (if this is set to off no ok will be reported only errors) - filter
The initial age to scan when starting NSClient++ - syntax
The initial age to scan when starting NSClient++ - log
Coma separated list of logs to check - debug
Log missed records (useful to detect issues with filters) not useful in production as it is a bit of a resource hog. - enable active
This will store all matches so you can use real-time filters from active monitoring (use CheckEventlogCache). - ok message
This is the message sent periodically when no error is discovered. - alias
The alias to use for this event (in NSCA this is the service name).
NSCA
If we are planning to use this with NSCA (and we are) there are a few things to extra to configure. First we have a concept called service_name which we need to define. A service name is the way Nagios (or Icinga) associates the result with a check. So if you have multiple checks you need to be able to configure multiple service names for different filters. The way this is handled is using aliases. Aliases can be configured in two ways:
- Locally by settings the key for the filter query under […/filters].
- Globally in the […/real-time] section
All filters without alias will use this alias.
So if you in Nagios have two different service_checks (eventlog_1 and eventlog_2) you need to set this using the […/filters] section like so:
[/settings/eventlog/real-time/filters] eventlog_1=id = 1000 and category = 1 eventlog_1=id = 1000 and category = 0
The other thing we need to configure is the NSCA client itself which for simple scenarios is pretty straight forward to configure as well:
[/modules] ; ... NSCAClient = 1 [/settings/NSCA/client/targets/default] address=nsca://127.0.0.1:5667 encryption=aes256 password=YL04nBb14stIgCjZxcudGtMqz4E6NN3W
Finally we need to tell CheckEventlog to send messages to NSCA which is done by specifying the destination:
[/settings/eventlog/real-time] ; ... destination=NSCA
Causing problems
Now, how can we test this (apart from sitting around waiting for something to happen?
Well, NSClient++ provides a rather nifty (and dangerous) command which allow you to inject messages into the event log.
nscp eventlog
CheckEventLog Command line syntax:
Allowed options:
-h [ --help ] Show help screen
-s [ --source ] arg (=Application Error)
source to use
-t [ --type ] arg Event type
-l [ --level ] arg Event level (type)
-f [ --facility ] arg Facility/Qualifier
-q [ --qualifier ] arg Facility/Qualifier
--severity arg Event severity
-c [ --category ] arg Event category
--customer arg Customer bit 0,1
-a [ --arguments ] arg Message arguments (strings)
--eventlog-arguments arg Message arguments (strings)
--event-arguments arg Message arguments (strings)
-i [ --id ] arg Event ID
In our case since we filter on event id 1000 we can use the following command to insert an application error.
nscp eventlog --exec insert-eventlog --source "Application Error" --id 1000 --level error --category 0
Which will trigger the following message to be sent to NSCA: “Felet uppstod i programmet med namn: %1, version %2, tidsstämpel 0x%3…” all the %1, %2 represent insert points which is where your event log message strings would normally end up. To add this we can add a series of –eventlog-argument options to insert some strings into these markers like so:
nscp eventlog --exec insert-eventlog --source "Application Error" --id 1000 --level error --category 0 --eventlog-argument a --eventlog-argument b ...
Active monitoring and real time
So maybe you are thinking, darn I just configure everything to use active monitoring via NRPE, cant I get any real-time goodness? Well fortunately the answer is: Yes you can!
UPDATE The details in this section is deprecated as in 0.4.1 the check_eventlog_cache command was replaced by the more generic check_cache command in the SimpleCache module. It works the same though so the concepts are still valid. For details about the SimpleCache module please refer to the following blog posts:
If you recall we had two commands: check_eventlog and check_eventlog_cache and the latter does just that. This is done by configuring active monitoring to store all matches for you and then you can check the results cache using the check_eventlog_cache command.
To enable this we need to add one more option to the configuration file.
[/settings/pytest_eventlog/real-time] ;... enable active=true
And then we need to run the following command:
check_eventlog_cache warn=gt:0 crit=gt:0
Which will give you a critical message when you have more then 0 items in the cache. Be advised though that when you check the cache the cache is emptied meaning you will only get this warning ”the first time” if you only have a single problem. So be sure you configure your Nagios server accordingly. Currently this is a bit simple and not very powerful so while you can use it it might not be what you are looking for and if that is the case please let me know so I can improve it in the future.
Summary
So I think with the introduction of filters in 0.3.8 and real-time filters in 0.4.0 as well as the ability to inject errors into the event log makes CheckEventlog a both powerful and simple tool for checking for errors in your event log. That’s pretty much it for now, please let me know what you want me to write about next!
UPDATE: Added information about check_eventlog_cache being replaced by the more generic check_cache.

Pingback/Trackback
Descriptive and granular Event Log Monitoring with NSClient++ and Nagios « mbrownnyc
Matt Brown says:
Happy to see this feature move along.
Thanks!
andreipoltoratsky says:
Very useful article!
I would really appreciate seeing some filtering examples from ‘real life’ here. You are doing a good job!
Thanks!
jth says:
Hi Michael,
if my applicaton creates an event (Error) when an error occurs, and a new event (Information) when the error is gone, how do I check this with Nagios?
The first event (Error) should trigger an alarm, and the next event should recover this alarm.
I need a method to check if the latest event of a specific type is an Information or an Error, even if the latest events are several hours or days old.
Michael Medin says:
Interesting problem.
This can be achieved using the new new (0.4.1) real-time eventlog facilities. It requires a bit of configuring essentially creating ”two real-time filters” one for the error and one for the recovery. Then set those to both submit the result back to Nagios. So it would (currently) require NSCA (passive monitoring) there are some plans to create a ”generic cache module” which could help facilitate this for NRPE (active) checks.
I think it is time for me to write a quick blog post about how to set this scenario up (Ill try to make some time this weekend).
Michael Medin says:
To get ahead of myself a bit this is a short extract from doing something similar. the id defines the various states if id is 100 it is good, 110 a warning and 120 is an error. In reality you would most likely have more complex filters (also note that this is pseudo configuration and have not been tested).
[settings/eventlog/real-time/filters/shopping_cart]
command=cart_checkout
destination=NSCA
is template=true
parent=default
syntax=hello world: %message%
[/settings/eventlog/real-time/filters/cart_checkout_ok]
filter=id = 100 AND source = ‘Shopping Basket’
severity = OK
parent=shopping_cart
[/settings/eventlog/real-time/filters/cart_checkout_warn]
filter=id = 110 AND source = ‘Shopping Basket’
severity = WARNING
parent=shopping_cart
[/settings/eventlog/real-time/filters/cart_checkout_crit]
filter=id = 120 AND source = ‘Shopping Basket’
severity = CRITICAL
parent=shopping_cart
David says:
Hi Michael,
I have run into a dead end, do you know of anyway that I can track all user load (remote,ssh,local,etc) on a Windows 2008 Enterprise R2 Server in python or C? I have tried everything but cannot think of anything else. One big letdown was that we dont have an active directory domain controller. Any Help will be appreciated. Thanks.
Michael Medin says:
What do you mean by ”all user load”?
All loads produced by a given user or all loads produced by all users? I would think (haven’t tried) the former could be achieved by grouping load counters on user (the way task manager does it) and the latter by using the system load?
David says:
By user load I mean, the amount of Disk Queue Length (read and write requests), CPU load, and etc. There seems to be no way to monitor that on the 80 or more users that use our servers.
Michael Medin says:
Much of the statistics can be received on a per-process level which can then be aggregated to the owner of that process which you could (again) aggregate as a global value across many servers. It would most likely not be 100% accurate as it would be samples but could very well give you the information you need (if you are looking for trends).
For an idea of some of the metrics you can look at the following page: http://msdn.microsoft.com/en-us/library/windows/desktop/aa394277(v=vs.85).aspx. These values can also be checked using normal performance counters so WMI is not really required (just they are better documented there).
You would also have to create a script to harvest (sample) the data and do the aggregation.
David says:
But how can I track the user that are accessing the server accross network, when i find the owner of a process, it gives me system or the currently logged on user….???
Michael Medin says:
Aha, then I see your conundrum. I guess your best bet then is the various ”servers” your users are using and see if they provide their own statistics modules you can use.
But that would much depend on which servers your users are using and what statistics they provide. My guess is that it would not be an easy task though…
But exchange for instance has a lot of statistics you can user same for sql server and file shares and such… But it is difficult to give a generic answer and whatever solution wold probably be totally rewritten for each server and many would probably not be able to provide the information.
Pingback/Trackback
Self-resetting event log alerts | Michael Medin
Ben says:
I am currently passively sending EventLog events to Nagios via NSCA. Now that I can capture the message and send to nagios. How do I format the message that is sent. My filter looks like the following:
; A set of options to configure the real time checks
[/settings/eventlog/real-time]
enabled=true
;filter=id = 1000 and category = 0
; A set of filters to use in real-time mode
[/settings/eventlog/real-time/filters]
eventlog_0=id = 1000 and category = 1
eventlog_3=id = 1000 and category = 0 and level = ‘warning’
eventlog_1=level = ‘error’
Michael Medin says:
The short answer is to use the syntax=%type% %source%: %message% key.
In your case you are using the ”short hands” (ie alias=filter) way of configuring them so you cant ”add that” as-is.
But fortunetly there are ways to arrange that. The simplest would be to create a default (template) which your filters will automagically inherit and place it there.
[/settings/eventlog/real-time/filters/default]
syntax=FOO %type% %source%: %message% BAR
Another option is to use the longer way to configure them (but it will be more typing):
[/settings/eventlog/real-time/filters/eventlog_0]
filter=id = 1000 and category = 1
syntax=FOO %type% %source%: %message% BAR
…
// Michael Medin
Pingback/Trackback
Real-time log file monitoring (both event log and text files) | Michael Medin
Pingback/Trackback
Supervision temps réel de fichier de log avec NSClient++ | Communauté Francophone de la Supervision Libre