Real-time trace sessions with TraceEvent

Spread the love

ETW log files are great for post-mortem debugging and analysis. But did you know that you can actually process ETW event streams in (near) real-time without log files at all? It’s true!

Real-time event consumption has historically been accessible only to native developers through low-level Win32 APIs. However, thanks to Vance Morrison of the CLR team, there is now a readymade solution for reading ETW events in .NET projects.

I have created a sample solution “TraceSample” to demonstrate this functionality — check out the commit history on GitHub. The solution includes the source release for TraceEvent 1.2.7.1 from the .NET Base Class Libraries project on CodePlex. (I made very minor edits to the project file to modify output paths and fix some compiler warnings.) I should note that a binary release of this project was made available via a NuGet package back in August 2013. For now I’ve opted to stick with the BCL source release so I can maintain low-level access to the TraceEvent payload for this code sample. (Presumably when the documentation matures, I can properly wire up a compile-time generated trace parser which would hopefully eliminate this low-level code.)

Now for a bit more explanation about TraceSample and to provide a motivating example of how real-time tracing can be helpful in testing. Let’s say you are monitoring processes on a machine and want to know reasonably quickly whenever an application has abnormally terminated. Depending on the application and the cause for termination, there may be an event log entry signifying this type of event. Some programs may have their own custom reporting mechanism; others may simply hook into Windows Error Reporting. Some apps may just silently exit.

As it turns out, there is really only one general purpose data source that is likely to be an accurate and real-time source of process termination info — the Microsoft-Windows-Kernel-Process ETW trace provider. First, we will collect a log file containing some process trace events to get an idea of the data we can gather. In a command prompt, run logman.exe query providers Microsoft-Windows-Kernel-Process to get the basic info:
Provider GUID ------------------------------------------------------------------------------- Microsoft-Windows-Kernel-Process {22FB2CD6-0E7B-422B-A0C7-2FAD1FD0E716}


Value               Keyword              Description

-------------------------------------------------------------------------------

0x0000000000000010  WINEVENT_KEYWORD_PROCESS

0x0000000000000020  WINEVENT_KEYWORD_THREAD

0x0000000000000040  WINEVENT_KEYWORD_IMAGE

0x0000000000000080  WINEVENT_KEYWORD_CPU_PRIORITY

0x0000000000000100  WINEVENT_KEYWORD_OTHER_PRIORITY

0x8000000000000000  Microsoft-Windows-Kernel-Process/Analytic
Value               Level                Description

-------------------------------------------------------------------------------

0x04                win:Informational    Information

PID Image ------------------------------------------------------------------------------- 0x00000000
The relevant pieces of data here are the provider ID ({22FB2CD6-0E7B-422B-A0C7-2FAD1FD0E716}) and the WINEVENT_KEYWORD_PROCESS keyword (value 0x10) which is the one we are interested in for process-related events. Also note that the only trace level shown is win:Informational which means we can use level 4 and gather all the trace events we need.

Now we can open an elevated command prompt and create a trace session (logman.exe create trace ProcessTest -p {22FB2CD6-0E7B-422B-A0C7-2FAD1FD0E716} 0x10 4 -o ProcessTest.etl), start the session (logman.exe start ProcessTest), and then open and close a few apps. For this example, try opening cmd.exe and typing exit 256. Finally, stop the session (logman.exe stop ProcessTest) and delete it (logman.exe delete ProcessTest). You should now have a file ProcessTest_000001.etl (or similar) containing a few process start and stop events. We can use tracerpt.exe ProcessTest_000001.etl -o ProcessTest.xml to decode the events for further inspection. Let’s zoom in on the process events for cmd.exe:

<!-- . . . -->
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Process" Guid="{22fb2cd6-0e7b-422b-a0c7-2fad1fd0e716}" />
    <EventID>1</EventID>
    <Version>0</Version>
    <!-- . . . -->
  </System>
  <EventData>
    <Data Name="ProcessID">  240148</Data>
    <Data Name="CreateTime">2014-01-03T00:01:04.021866700Z</Data>
    <Data Name="ParentProcessID">    2620</Data>
    <Data Name="SessionID">       1</Data>
    <Data Name="ImageName">\Device\HarddiskVolume2\Windows\System32\cmd.exe</Data>
  </EventData>
  <!-- . . . -->
</Event>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Process" Guid="{22fb2cd6-0e7b-422b-a0c7-2fad1fd0e716}" />
    <EventID>2</EventID>
    <Version>1</Version>
    <!-- . . . -->
  </System>
  <EventData>
    <Data Name="ProcessID">  240148</Data>
    <Data Name="CreateTime">2014-01-03T00:01:04.021866700Z</Data>
    <Data Name="ExitTime">2014-01-03T00:01:05.994369500Z</Data>
    <Data Name="ExitCode">     256</Data>
    <!-- . . . -->
  </EventData>
    <!-- . . . -->
</Event>
<!-- . . . -->

The first event (ID=1, version=0) was emitted when the app was started. The second (ID=2, version=1) was emitted when it exited (with code 256, just as we planned it). To get more detailed information about the format of these events, we will need the ETW manifest. Luckily, tracerpt.exe ProcessTest_000001.etl -export ProcessTest.man can export a manifest for us! The ProcessTest.man file will come in handy later when we need to programmatically decode the events.

Armed with this knowledge, we can devise a strategy to solve the above problem of monitoring processes for abnormal termination. We’ll simply create a real-time trace session for kernel process events, read start and stop events from the session, and from these events gather process IDs, image names, start/stop times, and exit codes. Deciding what is abnormal termination can be left as an exercise for the reader — in this sample, we’ll just track the information.

First, we’ll create the KernelProcessSession which uses TraceEvent to manipulate real-time sessions and parse trace data. (Vance Morrison illustrates the basics in a blog post.) The start and stop events are processed using the template information from the manifest file. We use the low-level GetXxxAt methods to read the trace payload items based on byte offsets (hand-calculated from the manifest). On each start or stop event from the ETW stream, we raise a CLR event which can be consumed by another class for further processing.

The class that does further processing in this sample is ProcessTracker. On each started event, it tracks the process by ID until a subsequent stopped event, at which point it publishes its own ProcessStopped event combining all relevant info — exit code, image name, etc. The evolution of this class is best viewed via the commit history which shows each piece coming together as the unit tests are written.

The main app shows a sample use case of wiring up the ProcessTracker and KernelProcessSession. Real-time sessions require administrative privileges so make sure to run the app elevated to see it in action. Try running it and opening and closing a few apps. The output should look something like this:
[000.002] Starting session... [000.063] Running. [010.061] Stopping session... [010.064] Stopped. [010.069] Dumping event data... Process ID 325724 with image name 'cmd.exe' started at 1/2/2014 4:29:28 PM and exited at 1/2/2014 4:29:30 PM with code 256. Process ID 365740 with image name 'conhost.exe' started at 1/3/2014 4:29:28 PM and exited at 1/2/2014 4:29:30 PM with code 0.

WriteAsync .NET

Testing, coding, in that order.

Real-time trace sessions with TraceEvent

Leave a Reply Cancel reply