Tuesday, 18 December 2012

Updating Enterprise Logging Block config dynamically

We have a ClickOnce application that uses the Microsoft Enterprise Library Logging Block. It's fairly easy to set up the block from XML in the app.config, but we wanted our log path to be dynamic so that we could change it at runtime.

Here is our sample app.config.

  <!-- =================================== -->  
  <!-- logging configuration               -->  
  <!-- =================================== -->  
  <exceptionHandling>  
    <exceptionPolicies>  
      <add name="LoggingPolicy">  
        <exceptionTypes>  
          <add name="All Exceptions" type="System.Exception" postHandlingAction="ThrowNewException">  
            <exceptionHandlers>  
              <add name="Logging Exception Handler" type="Microsoft.Practices.EnterpriseLibrary.ExceptionHandling.Logging.LoggingExceptionHandler, Microsoft.Practices.EnterpriseLibrary.ExceptionHandling.Logging"  
               logCategory="General" eventId="100" severity="Error" title="Enterprise Library Exception Handling"  
               formatterType="Microsoft.Practices.EnterpriseLibrary.ExceptionHandling.TextExceptionFormatter, Microsoft.Practices.EnterpriseLibrary.ExceptionHandling"  
               priority="0" />  
            </exceptionHandlers>  
          </add>  
        </exceptionTypes>  
      </add>  
    </exceptionPolicies>  
  </exceptionHandling>  
  <loggingConfiguration name="" tracingEnabled="true" defaultCategory="General">  
    <listeners>  
      <add name="Rolling Flat File Trace Listener" type="Microsoft.Practices.EnterpriseLibrary.Logging.TraceListeners.RollingFlatFileTraceListener, Microsoft.Practices.EnterpriseLibrary.Logging"  
       listenerDataType="Microsoft.Practices.EnterpriseLibrary.Logging.Configuration.RollingFlatFileTraceListenerData, Microsoft.Practices.EnterpriseLibrary.Logging"  
       fileName="./Logs/Client.log" footer="" formatter="Text Formatter"  
       header="" rollSizeKB="20000" />  
      <add name="XML Trace Listener" type="Microsoft.Practices.EnterpriseLibrary.Logging.TraceListeners.XmlTraceListener, Microsoft.Practices.EnterpriseLibrary.Logging"  
       listenerDataType="Microsoft.Practices.EnterpriseLibrary.Logging.Configuration.XmlTraceListenerData, Microsoft.Practices.EnterpriseLibrary.Logging"  
       fileName="trace-xml.log" traceOutputOptions="None" filter="Verbose" />  
      <add name="Event Log Trace Listener" type="Microsoft.Practices.EnterpriseLibrary.Logging.TraceListeners.FormattedEventLogTraceListener, Microsoft.Practices.EnterpriseLibrary.Logging"  
       listenerDataType="Microsoft.Practices.EnterpriseLibrary.Logging.Configuration.FormattedEventLogTraceListenerData, Microsoft.Practices.EnterpriseLibrary.Logging"  
       source="Application Error" formatter="Text Formatter" log="Application" filter="Information" />  
    </listeners>  
    <formatters>  
      <add type="Microsoft.Practices.EnterpriseLibrary.Logging.Formatters.TextFormatter, Microsoft.Practices.EnterpriseLibrary.Logging"  
       template="{timestamp(local)} : {severity} : {keyvalue(namespace)} :{message}&#xA;"  
       name="Text Formatter" />  
    </formatters>  
    <categorySources>  
      <add switchValue="Information" name="General">  
        <listeners>  
          <!-- <add name="Rolling Flat File Trace Listener" /> -->  
          <add name="Event Log Trace Listener" />  
        </listeners>  
      </add>  
    </categorySources>  
    <specialSources>  
      <allEvents switchValue="All" name="All Events" />  
      <notProcessed switchValue="All" name="Unprocessed Category">  
        <listeners>  
          <add name="Rolling Flat File Trace Listener" />  
        </listeners>  
      </notProcessed>  
      <errors switchValue="Verbose" name="Logging Errors &amp; Warnings">  
        <listeners>  
          <add name="Rolling Flat File Trace Listener" />  
          <add name="Event Log Trace Listener" />  
        </listeners>  
      </errors>  
    </specialSources>  
  </loggingConfiguration>  

We can then dynamically update the config to add our new path :-

     private static DictionaryConfigurationSource ConfigureLoggingLocationForClickOnce()  
     {  
       string logFileName = @".\Logs\Client.log";  
       if (ApplicationDeployment.IsNetworkDeployed)  
       {  
         logFileName = ApplicationDeployment.CurrentDeployment.DataDirectory + @"\Logs\Client.log";  
       }  
       // Create the config builder for the fluent API.  
       var source = new FileConfigurationSource(AppDomain.CurrentDomain.SetupInformation.ConfigurationFile);  
       ConfigurationSourceBuilder configBuilder = new ConfigurationSourceBuilder();  
       // Get the existing logging config section.  
       var loggingConfigurationSection = (LoggingSettings)source.GetSection("loggingConfiguration");  
       loggingConfigurationSection.RevertImpersonation = false;  
       var rollingFileListener = new RollingFlatFileTraceListenerData(  
                "Rolling Flat File Trace Listener", logFileName, "", "",  
                20000, "MM/dd/yyyy", RollFileExistsBehavior.Increment,  
                RollInterval.Day, TraceOptions.Callstack | TraceOptions.None,  
                "Text Formatter", SourceLevels.Information);  
       rollingFileListener.MaxArchivedFiles = 1;  
       // Add the trace listener to the current config.  
       loggingConfigurationSection.TraceListeners.Add(rollingFileListener);  
       // Configure the category source section of config for the rolling file.  
       var rollingFileCategorySource = loggingConfigurationSection.TraceSources.Get("General");  
       // Must be named exactly the same as the flat file trace listener above.  
       rollingFileCategorySource.TraceListeners.Add(new TraceListenerReferenceData("Rolling Flat File Trace Listener"));  
       // Add the loggingConfiguration section to the config.  
       configBuilder.AddSection("loggingConfiguration", loggingConfigurationSection);  
       // Required code to update the EntLib configuration with the settings set above.  
       var configSource = new DictionaryConfigurationSource();  
       configBuilder.UpdateConfigurationWithReplace(configSource);  
       // Set the Enterprise Library container for the inner workings of EntLib to use when logging.  
       EnterpriseLibraryContainer.Current = EnterpriseLibraryContainer.CreateDefaultContainer(configSource);  
       return configSource;  
     }  

Finally, if using Unity and Prism you can set up both loggers like so :-

       var configurator = new UnityContainerConfigurator(Container);  
       // Read the configuration files and set up the container.  
       var configSource = ConfigureLoggingLocationForClickOnce();  
       EnterpriseLibraryContainer.ConfigureContainer(configurator, configSource);  
       // Set up logging with both IoC containers...  
       var oldLogger = EnterpriseLibraryContainer.Current.GetInstance<LogWriter>();  
       if (oldLogger != null)  
       {  
         var newLogger = Container.Resolve<LogWriter>();  
       }  
       EnterpriseLibraryContainer.Current = ServiceLocator.Current;  
       this.Container.RegisterType<ILoggerFacade, LoggingService>(new ContainerControlledLifetimeManager());  
       var logger = (LoggingService)this.Container.Resolve<ILoggerFacade>();  
       this.Container.RegisterInstance<ILoggingService>(logger, new ContainerControlledLifetimeManager());  

Monday, 17 December 2012

Forcing WCF generated proxies to share POJO's

We had a recent problem at work that seemed quite tough to solve: we have multiple SOAP endpoints exposed from a Java Enterprise Edition (JEE) container. The services were generated from Java code using Axis2, and they shared many POJOs.

We wanted to consume these services in a .NET client. Client proxies were generated in the standard manner, but this caused an issue: we ended up with the same POJO objects converted into multiple POCOs in different namespaces on the client.

I was resorting to svcutil without any success until I tried the following approach.

Open the Reference.svcmap file and add multiple endpoints to one service definition.
   <MetadataSource Address="http://mycompany.com/BananaService?wsdl" Protocol="http" SourceId="1" />  
   <MetadataSource Address="http://mycompany.com/OrangeService?wsdl" Protocol="http" SourceId="2" />  
   <MetadataSource Address="http://mycompany.com/AppleService?wsdl" Protocol="http" SourceId="3" />  
   <MetadataFile FileName="BananaService.xsd" MetadataType="Schema" ...  
   <MetadataFile FileName="BananaService.wsdl" MetadataType="Wsdl" ...  
   <MetadataFile FileName="OrangeService.xsd" MetadataType="Schema" ...  
   <MetadataFile FileName="OrangeService.wsdl" MetadataType="Wsdl" ...  
   <MetadataFile FileName="AppleService.wsdl" MetadataType="Wsdl" ...  
   <MetadataFile FileName="AppleService.xsd" MetadataType="Schema" ...  

Now 'Update Service Reference', and bingo: one C# source file with all the client proxies in one namespace, all sharing the same POCOs!

Making WPF DataGrid Keyboard friendly

The out-of-the-box DataGrid does a great job of making data available, but the standard default configuration leaves a little to be desired for those stuck with a keyboard.

I aim to show you how to make it a little friendlier with the help of a custom behavior.

First we create a custom behavior.
   public class DataGridListBehavior : Behavior<DataGrid>  
   {  
     private bool _endingEdit;  
     protected IList Model { get; private set; }  
     protected override void OnAttached()  
     {  
       AssociatedObject.DataContextChanged += HandleDataContextChanged;  
       AssociatedObject.MouseDoubleClick += HandleMouseDoubleClick;  
       AssociatedObject.CellEditEnding += HandleCellEditEnding;  
       AssociatedObject.PreviewKeyDown += PreviewKeyDown;  
       AssociatedObject.PreviewMouseLeftButtonDown += HandleLeftButtonDown;  
       AssociatedObject.GotFocus += CellGotFocus;  
     }  
     protected override void OnDetaching()  
     {  
       // Unsubscribe the handlers added in OnAttached.  
       AssociatedObject.DataContextChanged -= HandleDataContextChanged;  
       AssociatedObject.MouseDoubleClick -= HandleMouseDoubleClick;  
       AssociatedObject.CellEditEnding -= HandleCellEditEnding;  
       AssociatedObject.PreviewKeyDown -= PreviewKeyDown;  
       AssociatedObject.PreviewMouseLeftButtonDown -= HandleLeftButtonDown;  
       AssociatedObject.GotFocus -= CellGotFocus;  
     }  
   }  

This allows us to hook important grid events and add in our custom code.

Most of the work is done by CellGotFocus().

     private void CellGotFocus(object sender, RoutedEventArgs args)  
     {  
       var gridCell = args.OriginalSource as DataGridCell;  
       if (gridCell != null)  
       {  
         DataGridHelper.EnterEdit(gridCell, AssociatedObject);  
       }  
     }  

Inside here we do the major work of ensuring we enter edit mode on the cell and then focus the control automatically.

     public static void EnterEdit(DataGridCell gridCell, DataGrid grid, bool bFocus = true)  
     {  
       if (gridCell != null && !gridCell.IsEditing)  
       {  
         // Enables editing on single click.  
         if (!gridCell.IsFocused)  
         {  
           gridCell.Focus();  
         }  
         if (!gridCell.IsSelected && grid.SelectionUnit == DataGridSelectionUnit.Cell)  
         {  
           gridCell.IsSelected = true;  
         }  
         if (bFocus)  
         {  
           var control = FindChildElement<Control>(gridCell);  
           if (control != null)  
           {  
             control.Focus();  
           }  
         }  
       }  
     }  

You can get the code here.

Tuesday, 2 October 2012

Monitor Pulse/Wait, ResetEvent and SpinWait

After some lively discussion about the merits of Monitor Pulse/Wait as a thread synchronization primitive in C#, I decided to test the results for myself.

One argument was that Monitor Pulse/Wait performs better than ManualResetEvent because it runs in user mode without the need for kernel objects, and therefore should be the preferred thread synchronization mechanism in C# for performance reasons.

Counter arguments were :-

1. The code would be less maintainable.
2. Pulses can be missed in real-world production code.
3. If keen on optimum speed through micro-optimizations, then maybe 'lock free programming' should be considered.
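The missed-pulse hazard in point 2 is worth illustrating. Here is a minimal gate built on Monitor Wait/Pulse (a sketch for illustration, not the benchmark code from this post): Pulse only wakes threads that are already waiting, so a gate needs a flag checked inside the lock, otherwise a signal that arrives before the waiter is silently lost.

```csharp
using System.Threading;

// Sketch of a manual-reset-style gate built on Monitor Wait/Pulse.
// The _signalled flag is what prevents a missed pulse: a waiter that
// arrives after Set() can still see that the signal already happened.
public sealed class MonitorGate
{
    private readonly object _sync = new object();
    private bool _signalled;

    public void Wait()
    {
        lock (_sync)
        {
            // Loop, don't 'if': guards against spurious wake-ups.
            while (!_signalled)
            {
                Monitor.Wait(_sync);
            }
        }
    }

    public void Set()
    {
        lock (_sync)
        {
            _signalled = true;
            Monitor.PulseAll(_sync); // wakes only the threads currently waiting
        }
    }
}
```

Without the `_signalled` flag, a `Set` that runs before any `Wait` would be lost and the waiter would block forever.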

It's hard to draw real conclusions on a system with only 4 cores; to really determine the effect of massive parallelism you would need a NUMA machine with many more cores.

However I tried to test the theory anyway :-

Here is a run with more iterations but the same amount of concurrency and contention. (Small runs are unreliable due to general background multitasking on the machine. There is also a CLR/JIT warm-up cost, but I've ignored this to keep things simple.)

AMD Processor (4 cores)

ManualResetEvent suffers a lot, showing the performance advantage of MonitorWaitAndPulse.

Kernel objects do become expensive if you use a LOT of them, which can be seen here.

What would make more sense would be to raise the concurrency; however, I cannot increase the true concurrency on my hardware, only the total number of threads, which would then be time-sliced.

The second run was with 10,000 iterations of 120 (40x3) tasks. In real-world use I'd recommend using ManualResetEvent until you find you need more performance; premature optimization is usually a bad idea.

MonitorWaitAndPulse offers good mid performance as long as you can ensure you don't miss pulses and have low contention.

As you can see from this example with mild contention MonitorWaitAndPulse is not a clear winner over lock free code.

As contention rises, the locks in MonitorWaitAndPulse mean its performance degrades, favoring lock-free code like MySpinLock or the flag with memory fence and SpinWait.

MySpinLock makes use of 'lock free programming' using CompareAndSwap (CAS).

 #pragma warning disable 0420  
 namespace ThreadCommunication  
 {  
   struct MySpinLock  
   {  
     private volatile int _taken;  
  
     public void Enter()  
     {  
       // Spin until we atomically change _taken from 0 to 1.  
       while (Interlocked.CompareExchange(ref _taken, 1, 0) != 0)  
       {  
         Thread.Yield(); /* spin */  
       }  
     }  
  
     public void Exit()  
     {  
       _taken = 0;  
     }  
   }  
 }  

This allows us to create a spin lock without using CLR or kernel locking primitives. Normally you should only spin for short periods; a more typical use would be to simulate a critical section. When using spin locks for short periods you would not want to yield, but in this case without the yield we would effectively waste a thread's quantum every time slice until the work is done.

We should be able to do better with SpinWait. SpinWait lets us wait for a condition variable to be set, and uses a 'backing off' algorithm to yield or sleep for longer periods if the wait becomes 'too busy'.
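As a sketch of that backing-off behaviour (illustrative names, not the benchmark code), waiting on a flag with SpinWait looks like this:

```csharp
using System.Threading;

// Sketch: wait for a flag using SpinWait's backing-off algorithm.
// SpinOnce() busy-spins at first, then starts yielding the time
// slice, and eventually sleeps, so a long wait stops burning CPU.
public static class FlagWait
{
    private static volatile bool _done;

    public static void Signal()
    {
        _done = true;
    }

    public static void WaitForFlag()
    {
        var spinner = new SpinWait();
        while (!_done)
        {
            spinner.SpinOnce(); // backs off more aggressively each call
        }
    }
}
```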

Note :- The .NET framework also comes with its own SpinLock class; in production code you should use the framework class where possible, as the MySpinLock code is provided only to demonstrate CAS programming principles. The framework SpinLock is specifically designed for short running operations and favors thread affinity.

Code can be downloaded here.

Monday, 24 September 2012

Which WCF binding is best?

Following on from Rick Rainey's excellent WCF binding article I decided to have a quick look myself.

Obviously you should choose a binding that suits your transport needs; for example, if you need security or transaction support then you will need to add that. If you want peak performance it's also likely that you would go for a lighter-weight communications stack than WCF, and maybe not use the TCP/IP protocol at all in the most extreme cases.

The following results were obtained by running a modified version of Rick's code :-

 Payload Size: 35000, Iterations: 1  
   Binding Elements:  
 Results (milliseconds): 10  
  
 CustomBinding (HTTP/Binary/ReliableSession(ordered)/NoSecurity)  
 Payload Size: 35000, Iterations: 1  
   Binding Elements:  
 Results (milliseconds): 9  
  
 CustomBinding (TCP/Binary/NoSecurity)  
 Payload Size: 35000, Iterations: 1  
   Binding Elements:  
 Results (milliseconds): 6  

It appears, at least in this simple case, that the number of BindingElements is the limiting factor on performance.
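For reference, the fastest TCP/Binary/NoSecurity custom binding above can be assembled from just two binding elements. This is a sketch (not Rick's code) showing how short that element stack is:

```csharp
using System.ServiceModel.Channels;

// Sketch: a minimal TCP/Binary/NoSecurity CustomBinding - just a
// message encoder element and a transport element, nothing else
// in the channel stack.
public static class Bindings
{
    public static CustomBinding CreateTcpBinaryNoSecurity()
    {
        return new CustomBinding(
            new BinaryMessageEncodingBindingElement(),
            new TcpTransportBindingElement());
    }
}
```

Each extra element (reliable sessions, security, and so on) adds another layer every message passes through, which is consistent with the timings above.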

Example code here.

Based on Rick Rainey's original work.

Actors in C# with NAct

The Actor Model is a popular way to get reliable concurrency into your application through isolated immutability.

Here we will see how to create a basic Actor and use that Actor to calculate primes just as in the previous Java and C# examples.

private static void Main(string[] args)  
{  
    if (args.Length < 2)  
    {  
        Console.WriteLine("Usage: number numberOfParts");  
        return;  
    }  
    int primesBound = int.Parse(args[0]);  
    int concurrency = int.Parse(args[1]);  
    Stopwatch stopwatch = Stopwatch.StartNew();  
    int count = CountPrimes(primesBound, concurrency);  
    Console.WriteLine("Found {0} primes under {1}", count, primesBound);  
    Console.WriteLine("It took {0} seconds", stopwatch.ElapsedMilliseconds / 1.0e3f);  
}  

Here we simply take in the upper bound of the number range in which we want to find primes, and the level of concurrency, just as in our earlier Java example.

Here we will see how to use the open source library NAct to create Actors in C#. First we must define an actor interface :-

public interface IPrimesActor : IActor  
{  
    Task<int> CountPrimesInRange(int lower, int upper);  
}  

Then we must implement our actor interface :-

public class PrimesActor : IPrimesActor  
{  
    async Task<int> IPrimesActor.CountPrimesInRange(int lower, int upper)  
    {  
        return await Task.Factory.StartNew(() => PrimesComputation.CountPrimesInRange(lower, upper));  
    }  
}  

And finally, you need to tell NAct to turn your object into an actor as you create it, by using the ActorWrapper.
The object you get back from the wrapper is an NAct proxy: whenever you call a method on it, NAct will call the same method on your object at some point on the right thread.

Here is CountPrimes implemented using NAct Actors.
internal static int CountPrimes(int bound, int concurrency)  
{  
    var task = CountPrimesAsync(bound, concurrency);  
    var sums = task.GetAwaiter().GetResult();  
    return sums.Sum();  
}  
  
private static async Task<int[]> CountPrimesAsync(int bound, int concurrency)  
{  
    int chunks = bound / concurrency;  
    var actors = new List<Task<int>>();  
    for (int i = 0; i < concurrency; i++)  
    {  
        int lower = i * chunks;  
        int upper = (i + 1) * chunks;  
        var actor = ActorWrapper.WrapActor<IPrimesActor>(() => new PrimesActor());  
        actors.Add(actor.CountPrimesInRange(lower, upper));  
    }  
    var primeCounts = await Task.WhenAll(actors);  
    return primeCounts;  
}  

And here is the result :-
 Found 664579 primes under 10000000  
 It took 3.636 seconds  

Source code here.

You can find out more about NAct here.
You can install NAct using NuGet using this command.

Debugging Concurrent Code in VS 2012

Visual Studio 2012 comes with some enhanced debugger features that help when debugging parallel code.

We can see this by running the C# Blender example.

You can see Threads and Tasks using the debug windows.

Here is the Parallel Tasks view, you can get it with Debug > Windows > Parallel Tasks

You can also launch a Parallel Watch window with Debug > Windows > Parallel Watch
You can enter watch expressions and see their evaluation under multiple tasks, here we examine num.

Another feature is the Parallel Stacks, here you can see the main thread stack along with the fork join example and its spawned threads.

There is also a concurrency visualizer that lets you see whether you are making good use of your machine's cores; it mainly shows occupancy, but you can also see events and locks.

The following picture shows the result of running the Thread pool blender followed by the sequential blender. You can see that we get good occupancy with the thread pool followed by a period of inactivity, followed by the sequential version which fully occupies just one core.

This image shows the four threads in green that are occupying the four cores of the machine, along with 9 blocked threads in red, one UI thread in brown, and one thread sleeping. 

That concludes a brief overview of the concurrency features in the Visual Studio 2012 debugger.

Saturday, 22 September 2012

Quickstart Task-based WCF service

This post shows you how to create a task based WCF Service using Visual Studio 2012.

1. Create a new project using File > New > Project.
In the wizard select the Visual C# \ WCF \ WCF Service Application

2. Alter the service definition to return Task
public async Task<string> GetData(int value)  
{  
   Thread.Sleep(4); // Simulate a slow call...  
   return await Task.Factory.StartNew(  
       () => string.Format("You entered: {0}", value));  
}  

You will also need to alter the service interface to return Task<string> too.

We use a sleep to simulate a lengthy operation and make the asynchronous behavior very obvious; this would not exist in real code.

3. Right click the solution select Add > New Project
In the wizard select Visual C# \ Windows \ WPF Application

4. Right click the WPF application
Select Add Service Reference
Click Discover, select Service1
Click Advanced, ensure 'Allow generation of asynchronous operations' and 'Generate task-based operations' are enabled.

5. Edit the MainWindow.xaml for the WPF application, make it look like the following :-
<Window x:Class="WpfApplication1.MainWindow"  
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"  
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"  
    Title="MainWindow" Height="100" Width="200">  
  <Grid>  
    <Grid.RowDefinitions>  
      <RowDefinition />  
      <RowDefinition />  
    </Grid.RowDefinitions>  
    <TextBlock x:Name="txtMessage" Grid.Row="0">Service Not Yet Called</TextBlock>  
    <Button x:Name="btnAsync" Click="btnAsync_Click" Grid.Row="1">Call Service</Button>  
  </Grid>  
</Window>  

6. Now implement the button event handler in MainWindow.xaml.cs
private async void btnAsync_Click(object sender, RoutedEventArgs e)
{
  using (var client = new ServiceReference1.Service1Client("BasicHttpBinding_IService1"))
  {
    txtMessage.Text = await client.GetDataAsync(5);
  }
}

7. Now right click the solution and select Set Startup projects
Select 'Multiple startup projects' and set both projects to 'Start'.

8. Now press F5 to launch the debugger

The WCF service will be started and hosted by IISExpress, you will see the root folder listing appear in your browser.
At the same time the client will be launched and you will see the WPF user interface.
Click the 'Call Service' button and you should see the message change to 'You entered: 5' indicating a successful service call!
To prove your client really is making an async call and not blocking the UI thread, try moving the window with the mouse after clicking the 'Call Service' button.

Well done! You've created your first Task based WCF service...

WCF Serialization performance test

This is an application I wrote some time ago to try to determine the performance benefits of protobuf-net serialization when used with WCF.

(If running the program in Visual Studio be sure to use 'Run as Administrator' to start VS.)

You can configure various aspects like the number of client proxies and the number of calls to make to each proxy.

 WcfClient /?  
 Wcf Test Client...  
 WcfClient [Verbose=False] [NamedPipes=True] [ServerAddress=localhost] [ProxyCount=1] [RunCount=1]  
 Possible arguments:  
 Help      Display Usage Help information.  
 Verbose   Turn on verbose logging to console.  
 NamedPipes   Use NamedPipes to talk to a local server process.  
 ServerAddress Servername / IP Address to use for remote server when not using NamedPipes  
 ProxyCount   The number of client proxies to run in parallel.  
 RunCount    The total number of test WCF calls to make across all proxies.  

Here is a basic run with just one proxy and one call.

 Wcf Test Client...  
 Running with ProxyCount=1, RunCount=1...  
 ---------- Run Basic Orders Tests ----------  
 Testing NetTcp Protobuf v2.4.0a ...  
 Setup test endpoint net.tcp://localhost:8754/WcfLoadTest/Proto  
 Time: 383ms, Bytes sent: 406, Bytes received: 585,507  
 Testing NetTcp ServiceStack v2.05 TypeSerializer ...  
 Setup test endpoint net.tcp://localhost:8755/WcfLoadTest/ServiceType  
 Time: 842ms, Bytes sent: 416, Bytes received: 1,080,166  
 Testing DataContract NetTcp ...  
 Setup test endpoint net.tcp://localhost:8752/WcfLoadTest/Tuned  
 Time: 90ms, Bytes sent: 406, Bytes received: 585,507  
 Testing NetDataContract NetTcp ...  
 Setup test endpoint net.tcp://localhost:8753/WcfLoadTest/Tuned  
 Time: 223ms, Bytes sent: 495, Bytes received: 415,674  
 Testing BasicHttp ...  
 Setup test endpoint http://localhost:8750/WcfLoadTest/Vanilla  
 Time: 159ms, Bytes sent: 365, Bytes received: 2,050,499  
 Testing Json WebHttp ...  
 Setup test endpoint http://localhost:8751/WcfLoadTest/Json  
 Time: 764ms, Bytes sent: 277, Bytes received: 1,207,660  
 ---------- Run Complex Graph Tests ----------  
 Testing NetTcp Protobuf v2.4.0a ...  
 Setup test endpoint net.tcp://localhost:8754/WcfLoadTest/Proto  
 Time: 234ms, Bytes sent: 416, Bytes received: 1,102,680  
 Testing DataContract NetTcp ...  
 Setup test endpoint net.tcp://localhost:8752/WcfLoadTest/Tuned  
 Time: 138ms, Bytes sent: 416, Bytes received: 1,102,680  
 Testing NetDataContract NetTcp ...  
 Setup test endpoint net.tcp://localhost:8753/WcfLoadTest/Tuned  
 Time: 130ms, Bytes sent: 501, Bytes received: 150,334  
 Testing BasicHttp ...  
 Setup test endpoint http://localhost:8750/WcfLoadTest/Vanilla  
 Time: 92ms, Bytes sent: 344, Bytes received: 1,662,004  
 Press any key to exit...  

Here is a run with 20 proxies and 80 service calls.

 Wcf Test Client...  
 Running with ProxyCount=20, RunCount=80...  
 ---------- Run Basic Orders Tests ----------  
 Testing NetTcp Protobuf v2.4.0a ...  
 Setup test endpoint net.tcp://localhost:8754/WcfLoadTest/Proto  
 Time: 3505ms, Bytes sent: 21,620, Bytes received: 46,814,400  
 Testing NetTcp ServiceStack v2.05 TypeSerializer ...  
 Setup test endpoint net.tcp://localhost:8755/WcfLoadTest/ServiceType  
 Time: 7010ms, Bytes sent: 22,360, Bytes received: 86,407,760  
 Testing DataContract NetTcp ...  
 Setup test endpoint net.tcp://localhost:8752/WcfLoadTest/Tuned  
 Time: 3510ms, Bytes sent: 21,620, Bytes received: 46,814,400  
 Testing NetDataContract NetTcp ...  
 Setup test endpoint net.tcp://localhost:8753/WcfLoadTest/Tuned  
 Time: 4992ms, Bytes sent: 24,240, Bytes received: 33,217,140  
 Testing BasicHttp ...  
 Setup test endpoint http://localhost:8750/WcfLoadTest/Vanilla  
 Time: 4923ms, Bytes sent: 27,376, Bytes received: 164,039,900  
 Testing Json WebHttp ...  
 Setup test endpoint http://localhost:8751/WcfLoadTest/Json  
 Time: 8056ms, Bytes sent: 20,360, Bytes received: 96,612,800  
 ---------- Run Complex Graph Tests ----------  
 Testing NetTcp Protobuf v2.4.0a ...  
 Setup test endpoint net.tcp://localhost:8754/WcfLoadTest/Proto  
 Time: 4002ms, Bytes sent: 21,820, Bytes received: 88,195,440  
 Testing DataContract NetTcp ...  
 Setup test endpoint net.tcp://localhost:8752/WcfLoadTest/Tuned  
 Time: 3577ms, Bytes sent: 21,820, Bytes received: 88,195,440  
 Testing NetDataContract NetTcp ...  
 Setup test endpoint net.tcp://localhost:8753/WcfLoadTest/Tuned  
 Time: 2931ms, Bytes sent: 24,360, Bytes received: 11,999,420  
 Testing BasicHttp ...  
 Setup test endpoint http://localhost:8750/WcfLoadTest/Vanilla  
 Time: 2249ms, Bytes sent: 27,520, Bytes received: 132,960,300  
 Press any key to exit...  

There are two types of payload tested: basic flat non-cyclic data, and cyclic graphs. It can be seen from these results that in real-world scenarios there is very little advantage to using Protobuf over the standard Microsoft serializers.

Some people have reported an improvement of 1 millisecond per call with large basic payloads using protobuf v1 (without graph support); however, you would need to consider your system carefully to determine whether this applies to you, and for most people the MS serializers should be more suitable.

The actual test is rather more complicated than many similar tests because it tries to mimic a real-world scenario more closely. There are separate processes for the client and the server, and the client passes the binding to the server using a management WCF service. The server (WCF host) will create and tear down services as requested by the management service. In this manner the client can automate and control the server in order to vary the number of endpoints and the binding types.

One issue with this approach is that not all WCF bindings are serializable; however, most are, and I believe the test covers enough of the standard bindings to still be useful.

You can get the code here.

Friday, 21 September 2012

Simple Calculators

Basic calculator apps that demonstrate how to write simple C++ MFC applications and perform simple GUI I/O. The second example also demonstrates basic recursive descent parsing and expression tree evaluation. The operators supported are binary addition and subtraction; integral number operands in decimal and hexadecimal are supported.

Download Source

Download Executables


Ruminations on C++ by Andrew Koenig & Barbara Moo, ISBN: 0-201-42339-1, Addison Wesley.
The C++ Standard Library by Nicolai M. Josuttis, ISBN: 0-201-37926-0, Addison Wesley.
Recursive descent parsing - Html parsing article.

Thermodynamic Equations for Water and Steam

A C library to perform thermodynamic water calculations. The code is based on FORTRAN 77 code from the steam book referenced below.
Quantities measured in units other than temperature, pressure or density are stored in dimensionless units until output time.

Download Source


NBS/NRC STEAM TABLES: Thermodynamic and Transport Properties and Computer Programs for Vapor and Liquid States of Water in SI Units by Haar L, Gallagher JS, Kell GS, ISBN: 0-89116-353-0, Hemisphere Publishing Co.

OpenGL molecule viewer

This is another old project of mine that demonstrates basic OpenGL; it also makes use of WTL and the Boost Graph Library.

Main Interface

Molecule Colour Key


Atilla - ATL/WTL framework by The Sells Brothers.
OpenGL Superbible by Richard S. Wright, Jr. & Michael Sweet, ISBN: 1-57169-164-2, Waite Group Press.
OpenGL III: Building an OpenGL C++ Class by Dale Rogerson, MSDN.
OpenGL VI: Rendering on DIBs with PFD_DRAW_TO_BITMAP by Dale Rogerson, MSDN.
Animation Techniques for Win32 by Nigel Thompson, ISBN: 1-55615-669-3, Microsoft Press.
(Nigel's Animate Library).
The Boost Graph Library by Jeremy G. Siek, Lie-Quan Lee, Andrew Lumsdaine, ISBN: 0-201-72914-8, Addison Wesley.
WTL Makes UI Programming a Joy, Part 1: The Basics Doc with samples, DevelopMentor.
WTL Makes UI Programming a Joy, Part 2: The bells and whistles Doc with samples, DevelopMentor.

The source is here.

PID Controller in C++

This is an old project I did to teach myself basic control theory; it's a PID controller written in Visual C++ 6 using MFC.

The project also contains its own Graph MFC control which is based on MFC GDI primitives.

It's a simulation of a basic PID controller.

PID Control Theory

Proportional-Integral-Derivative (PID) is a control algorithm used to set and maintain a measurement at a set-point. The PID algorithm calculates its output by summing three terms.

  • Proportional - A value proportional to the error of the measurement. 
  • Integral - A value proportional to the integral of the error over time. 
  • Derivative - A value proportional to the rate of change of the error.

The general form of the PID equation is thus:

m(t) = Kc * ( e(t) + Ki * ∫ e(t) dt + Kd * de(t)/dt )
              proportional  integral      derivative


m(t) = controller output
Kc = proportional gain
Ki = integral time constant
Kd = derivative time constant
e(t)  = error as a function of time = S(t) - X(t)
S(t) = current process setpoint.
X(t) = measured input

Variables Kc, Ki and Kd are adjustable for a given application of the controller.
Computer PID algorithms are based on data sampled on a cyclic basis.

The sample-based PID equation is thus:
m(i) = Kc * ( e(i) + T*Ki*Σe(k) + (Kd/T)*(e(i) - e(i-1)) )
              proportional  integral       derivative


T = sampling interval
e(i) = error at the ith sampling interval = S(i) - X(i)
e(i-1) = error at the previous sampling interval
e(k) = error at k where k increments by 1 through the range 0 <= k <= i
m(i) = controller output
Kc = proportional gain
Ki = integral action time
Kd = derivative action time
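The sampled equation above translates almost line for line into code. Here is an illustrative sketch (not the code from the original MFC project) that keeps the running error sum and the previous error between samples; ramp limiting and anti-reset-windup are omitted for clarity:

```csharp
// Sketch of the sampled PID equation:
//   m(i) = Kc * ( e(i) + T*Ki*sum(e(k)) + (Kd/T)*(e(i) - e(i-1)) )
public sealed class PidController
{
    private readonly double _kc, _ki, _kd, _t;
    private double _errorSum;      // sum of e(k) for k = 0..i
    private double _previousError; // e(i-1)

    public PidController(double kc, double ki, double kd, double sampleInterval)
    {
        _kc = kc; _ki = ki; _kd = kd; _t = sampleInterval;
    }

    // setpoint = S(i), measured = X(i); returns m(i)
    public double Step(double setpoint, double measured)
    {
        double error = setpoint - measured; // e(i)
        _errorSum += error;
        double output = _kc * (error
            + _t * _ki * _errorSum
            + (_kd / _t) * (error - _previousError));
        _previousError = error;
        return output;
    }
}
```

With Ki and Kd set to zero this reduces to a pure proportional controller, which is a handy sanity check when tuning.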

Complex Error

In practice, the first-order difference term, (e(i) - e(i-1)), is very susceptible to noise. In most practical systems this term is replaced by the more stable, higher-order expression:

                Δe = ( e(i) + 3*e(i-1) - 3*e(i-2) - e(i-3) ) / 6
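In code, the smoothed difference is a direct transcription of the expression above. A sketch (illustrative only; `errors` holds the last four error samples, newest first):

```csharp
// Sketch: higher-order error difference used in place of e(i)-e(i-1).
// errors[0] is e(i), errors[1] is e(i-1), errors[2] is e(i-2), errors[3] is e(i-3).
public static class ErrorFilter
{
    public static double SmoothedDifference(double[] errors)
    {
        if (errors == null || errors.Length < 4)
            throw new System.ArgumentException("need e(i)..e(i-3)", nameof(errors));
        return (errors[0] + 3.0 * errors[1] - 3.0 * errors[2] - errors[3]) / 6.0;
    }
}
```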

Ramp Limit

Many systems implement a limit on the maximum rate at which the output value can change in order to stop any physical output devices from being ‘thrashed’.


Practical systems stop the summation of error terms if the current PID output level is outside a user specified range of high and low output values. This limiting of the summation term is commonly referred to as anti-reset-windup.


The general flow of a PID control will be as follows:

PID Control Diagram

An individual PID control will function as follows (automatic PID control): a desired value or setpoint is entered; the input is read; the error (the difference between the input and the setpoint) is calculated; an output value is calculated from this information using the PID algorithm; the output is optionally limited by various methods; and the output ramping is optionally limited.

Ramp Limit

The ramp limiting is optional while the PID calculation is active; if the user sets the output manually, ramp limiting is always applied. The ramp limit is specified in degrees, representing the angle the output makes with the horizontal on a graph. The time axis resolution is therefore critical to the significance of this parameter. A limit of zero degrees would make output ramping impossible, and a limit of 90 degrees or greater would allow maximum ramping.
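One plausible reading of the degree-based limit is that the maximum allowed change per sample is tan(angle) multiplied by the time-axis resolution. This is a hedged sketch of that interpretation; the method name and the tan() assumption are mine, not from the original source:

```java
// Illustrative sketch: clamp the output change per sample to the slope
// implied by a ramp limit given in degrees. Names are assumptions.
public class RampLimiter {
    public static double apply(double previous, double proposed,
                               double limitDegrees, double sampleTime) {
        // maximum allowed change = tan(limit angle) * time resolution
        double maxDelta = Math.tan(Math.toRadians(limitDegrees)) * sampleTime;
        double delta = proposed - previous;
        if (delta > maxDelta) return previous + maxDelta;
        if (delta < -maxDelta) return previous - maxDelta;
        return proposed;
    }
}
```

This matches the limits described in the text: at 0 degrees tan is 0, so the output cannot move; near 90 degrees tan is effectively unbounded, so ramping is unrestricted.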


Anti-reset-windup is only applied when selected and the PID calculation is active. The anti-reset-windup upper and lower bounds are specified as a percentage of the entire output range. A limit of zero percent would make output ARW impossible and a limit of 50 percent or greater would allow maximum ARW.
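The ARW band test described above could look like the following sketch, under my own naming (the original source may implement it differently). Error summation continues only while the output lies inside the band:

```java
// Illustrative sketch of the anti-reset-windup band check.
// The error sum is updated only while insideBand() returns true.
public class AntiResetWindup {
    public static boolean insideBand(double output, double outMin,
                                     double outMax, double arwPercent) {
        double margin = arwPercent / 100.0 * (outMax - outMin);
        return output >= outMin + margin && output <= outMax - margin;
    }
}
```

At 0 percent the band covers the full output range, so summation never stops (no ARW); at 50 percent the band collapses to a single point, freezing the sum almost everywhere (maximum ARW), consistent with the limits described above.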


Tuning and control loop performance
by Gregory K. McMillan
ISBN: 1-55617-49261
ISA - The Instrumentation, Systems, and Automation Society.

Tuning of industrial control systems
by Armando B. Corripio
ISBN: 1-55617-6996
ISA - The Instrumentation, Systems, and Automation Society.

Quinn-Curtis, Real-time graphics tools for windows manual.


Source code here.
Executable here.

Thursday, 20 September 2012

Troubleshooting Kerberos

Here are some tools that can be used to diagnose Kerberos issues.


This is the main tool to use on Windows to display ticket information for a given computer running the Kerberos protocol. It displays information from the Windows ticket cache.


This command-line tool allows you to manage the Service Principal Names (SPN) property for an Active Directory™ directory service account. SPNs are used to locate a target principal name for running a service.
Useful when creating test accounts for use with Kerberos on Windows.

Process Monitor

Process Monitor is an advanced monitoring tool for Windows that shows real-time file system, Registry and process/thread activity. It combines the features of two legacy Sysinternals utilities, Filemon and Regmon, and adds an extensive list of enhancements including rich and non-destructive filtering, comprehensive event properties such as session IDs and user names, reliable process information, full thread stacks with integrated symbol support for each operation, simultaneous logging to a file, and much more. Its uniquely powerful features will make Process Monitor a core utility in your system troubleshooting and malware hunting toolkit.

Process Explorer

Ever wondered which program has a particular file or directory open? Now you can find out. Process Explorer shows you information about which handles and DLLs processes have opened or loaded.
The Process Explorer display consists of two sub-windows. The top window always shows a list of the currently active processes, including the names of their owning accounts, whereas the information displayed in the bottom window depends on the mode that Process Explorer is in: if it is in handle mode you'll see the handles that the process selected in the top window has opened; if Process Explorer is in DLL mode you'll see the DLLs and memory-mapped files that the process has loaded. Process Explorer also has a powerful search capability that will quickly show you which processes have particular handles opened or DLLs loaded.
The unique capabilities of Process Explorer make it useful for tracking down DLL-version problems or handle leaks, and provide insight into the way Windows and applications work.
You should be able to see what security tokens a process has by using properties on a process and clicking the security tab. (Make sure you are running process explorer as administrator.)


This allows you to authenticate with a principal and generate tickets.
This can be used on Windows and Linux. Its use on Windows is limited, however, as it works off its own file cache, not the Windows ticket cache.


Views and deletes the Kerberos tickets granted to the current logon session.
This can be used on Windows and Linux. Its use on Windows is limited, however, as it works off its own file cache, not the Windows ticket cache.

Enable Kerberos Logging

Windows offers the capability of tracing detailed Kerberos events through the event log mechanism. You can use this information to troubleshoot Kerberos. This article describes how to enable Kerberos event logging.

Insight for Active Directory v1.01

ADInsight is an LDAP (Light-weight Directory Access Protocol) real-time monitoring tool aimed at troubleshooting Active Directory client applications. Use its detailed tracing of Active Directory client-server communications to solve Windows authentication, Exchange, DNS, and other problems.
ADInsight uses DLL injection techniques to intercept calls that applications make to the Wldap32.dll library, which is the standard library underlying Active Directory APIs such as the LDAP API and ADSI. Unlike network monitoring tools, ADInsight intercepts and interprets all client-side APIs, including those that do not result in transmission to a server. ADInsight monitors any process into which it can load its tracing DLL, which means that it does not require administrative permissions; however, if run with administrative rights, it will also monitor system processes, including Windows services.


Logman creates and manages Event Trace Session and Performance logs and supports many functions of Performance Monitor from the command line.  Filters can be added to log Kerberos events.

MIT Kerberos Client

Network Identity Manager (NetIdMgr) is a graphical tool designed to simplify the management of network identities and their credentials which are used by network authentication protocols while providing secure access to network services.
When NetIDMgr is used with Kerberos v5 each network identity is a unique Kerberos principal name and the credentials are Kerberos v5 tickets. Kerberos v5 tickets can be used by NetIDMgr to obtain Andrew File System (AFS) tokens and X.509 public key certificates if the appropriate plug-ins are installed.


This is an ASP.NET application used to help troubleshoot and configure IIS and Active Directory to allow Kerberos and delegating Kerberos credentials.

Kerberos SPN Viewer

Simplifies listing ServicePrincipalNames (SPNs) and includes an integrated helper tool that can work out which SPN should be set for a given configuration.


Open source packet sniffer for Windows and Unix.

Microsoft Network Monitor

Microsoft's packet sniffer; allows capturing and protocol analysis of network traffic.


This tool will compute the maximum token size and is used to test whether a system may exhibit the issue described in KB article 327825.


This GUI tool is a Lightweight Directory Access Protocol (LDAP) client that allows users to perform operations (such as connect, bind, search, modify, add, delete) against any LDAP-compatible directory, such as Active Directory. LDP is used to view objects stored in Active Directory along with their metadata, such as security descriptors and replication metadata.

Active Directory Users and Computers

Active Directory® Users and Computers is a Microsoft Management Console (MMC) snap-in that you can use to administer and publish information in the directory.

NTP commands

These can be useful if you are experiencing NTP issues, which can be common with some virtual machines.
w32tm /resync
net start w32time

Troubleshooting Kerberos Problems


Programming for Multicore with Java 7

With the age of Amdahl's Law almost at an end, multicore processors are set to become commonplace, and concurrent programming is going to become important to utilize those cores.

In this article we will look at how you can program for the multicore era in Java.


First we will code an algorithm in a sequential fashion and then see how various approaches can be used to get speedup.
public class PrimesSequential extends PrimesComputation {
    public Boolean[] computePrimes(int upto) {
        Boolean[] results = new Boolean[upto];
        for (int x = 0; x < results.length; x++)
            results[x] = isPrime(x);
        return results;
    }
}

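All the versions here call an isPrime(x) helper that is not shown in the post. A plausible trial-division implementation (the actual downloadable source may differ) is:

```java
// Hypothetical isPrime helper by trial division up to sqrt(x).
// Not from the post's source; shown only so the examples are self-contained.
public class Primes {
    public static boolean isPrime(int x) {
        if (x < 2) return false;
        for (int i = 2; (long) i * i <= x; i++)
            if (x % i == 0) return false;
        return true;
    }
}
```

Trial division is deliberately expensive for large x, which is what makes the workload worth parallelising in the first place.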

Here we can see the basic Java Threads implementation; we have created one thread per processor.
We (roughly) partition the work across the threads, then wait for all the threads to complete by calling join() on each of them from the main thread.
public Boolean[] computePrimes(int upto) throws InterruptedException {
    int processorCount = Runtime.getRuntime().availableProcessors();
    PrimesComputationThread[] threads = new PrimesComputationThread[processorCount];
    Boolean[] results = new Boolean[upto];
    for (int i = 0; i < processorCount; i++) {
        threads[i] = new PrimesComputationThread(i * upto / processorCount,
                (i + 1) * upto / processorCount, results);
        threads[i].start();
    }
    for (int i = 0; i < processorCount; i++) {
        threads[i].join();
    }
    return results;
}

static class PrimesComputationThread extends Thread {
    private final int from, to;
    private final Boolean[] results;
    PrimesComputationThread(int from, int to, Boolean[] results) {
        this.from = from; this.to = to; this.results = results;
    }
    public void run() {
        for (int x = from; x < to; x++)
            results[x] = isPrime(x);
    }
}


Here you can see a similar approach to our threads version but this time using a thread pool. Thread pools give us better control over a collection of threads, as well as allowing us to manage threads over time and amortize thread overheads.
public Boolean[] computePrimes(int upto) throws InterruptedException {
    int processorCount = Runtime.getRuntime().availableProcessors();
    ExecutorService threadPool = Executors.newFixedThreadPool(processorCount);
    int noTasks = 100;
    Boolean[] results = new Boolean[upto];
    for (int i = 0; i < noTasks; i++) {
        threadPool.execute(new PrimesComputationTask(i * results.length
                / noTasks, (i + 1) * results.length / noTasks, results));
    }
    threadPool.shutdown();
    threadPool.awaitTermination(100, TimeUnit.SECONDS);
    return results;
}


One problem with our partitioning scheme is that we do not divide the work up evenly, because calculating some primes is harder than others; worse, we cannot easily predict the difficulty of finding a given prime. There is however a well known solution: a scheduling algorithm called work stealing, in which idle threads steal work from busy threads, helping to spread the load.
Another advantage of the ForkJoin pattern is it maps well to certain types of real world divide and conquer algorithms.
public class PrimesForkJoin extends PrimesComputation {
    public Boolean[] computePrimes(int upto) throws Exception {
        ForkJoinPool pool = new ForkJoinPool();
        Boolean[] results = new Boolean[upto];
        pool.invoke(new PrimesTask(0, results.length, results));
        return results;
    }

    static class PrimesTask extends RecursiveAction {
        private final int from, to;
        private final Boolean[] results;
        PrimesTask(int from, int to, Boolean[] results) {
            this.from = from; this.to = to; this.results = results;
        }
        public void compute() {
            if (to - from < 1000) {
                for (int x = from; x < to; x++)
                    results[x] = isPrime(x);
            } else {
                PrimesTask left = new PrimesTask(from, (to + from) / 2, results);
                PrimesTask right = new PrimesTask((to + from) / 2, to, results);
                invokeAll(left, right);
            }
        }
    }
}


Java 8 will add support for ParallelArrays; you can get a glimpse of what this might look like by using the jsr166y library jars. The idea is to eventually allow simple parallel loops like C#'s Parallel.For.

The code is functional in nature, replaceWithMappedIndex() replaces each element in the array with the results of applying the given mapping. In this case our mapping is simply the primality status of the number.

public Boolean[] computePrimes(int upto) {
    ParallelArray<Boolean> p = ParallelArray.create(upto, Boolean.class,
            ParallelArray.defaultExecutor());
    p.replaceWithMappedIndex(new IntToObject<Boolean>() {
        public Boolean op(int x) {
            return isPrime(x);
        }
    });
    return p.getArray();
}

 Sequential version  
 #   primes: 476648  
 Time: 15916ms  

 Threads version  
 #   primes: 476648  
 Time: 4818ms  
 Speed-up: 3.30  

 Pool version  
 #   primes: 476648  
 Time: 3450ms  
 Speed-up: 4.61  

 Fork-join version  
 #   primes: 476648  
 Time: 3380ms  
 Speed-up: 4.71  

 Parallel version  
 #   primes: 476648  
 Time: 4888ms  
 Speed-up: 3.26  

Here you can see the relative speed-up of the different approaches and how there is a trade off between implementation complexity and speed-up. In this case the sweet spot seems to be with the Pool version which is not too complicated but also gives good performance.

Parallel arrays ideally should provide the simplest implementation if there is sufficient language support, but it seems likely they will not quite match a hand crafted approach.

Correctly working with shared mutable state

Next we will look at the basic sorts of issues that arise with multi threaded applications with shared mutable state.

We are going to try to count the number of primes using a noLivePrimes integer field.

public class PrimesLive extends PrimesComputation implements
        LiveResults<Integer[]> {
    Integer[] livePrimes;
    Integer noLivePrimes;

    public Boolean[] computePrimes(int upto) throws InterruptedException {
        // Compute in Parallel ...
    }

    class PrimesComputationTask implements Runnable {
        public void run() {
            for (int x = from; x < to; x++) {
                a[x] = isPrime(x);
                if (a[x])
                    livePrimes[noLivePrimes++] = x;
            }
        }
    }
}

There is a problem with this code: the new functionality has introduced mutable shared state, and with it a data race.

A first attempt with synchronization...

The standard solution to this is use locks to serialise access to the shared state, in Java we normally do this with the synchronized keyword.

public void run() {
    for (int x = from; x < to; x++) {
        a[x] = isPrime(x);
        if (a[x]) {
            int tmp;
            synchronized (livePrimes) {
                tmp = noLivePrimes;
                noLivePrimes = tmp + 1;
            }
            livePrimes[tmp] = x;
        }
    }
}

Contended locks however hurt performance, we lose parallelism as threads have to wait for free access to the shared state.

 Live version  
 #   primes: 476648  
 # live primes: 471033  
 Time: 4147ms  
 Speed-up: 3.84  

 Live version with synchronization  
 #   primes: 476648  
 # live primes: 476648  
 Time: 3879ms  
 Speed-up: 4.10  

Comparing against the Pool and Fork-join runs above, we can see the performance lost to the locks. Note also that without synchronization the data race silently dropped results: 471033 live primes counted instead of 476648.

Atomics to the rescue...

AtomicInteger noLivePrimes;

class PrimesComputationTask implements Runnable {
    public void run() {
        for (int x = from; x < to; x++) {
            a[x] = isPrime(x);
            if (a[x]) {
                int tmp = noLivePrimes.getAndIncrement();
                livePrimes[tmp] = x;
            }
        }
    }
}

There is a very simple solution in this case, since the shared state is a single primitive we can use an AtomicInteger to guard the  shared state.

Atomics are actually a form of lock free programming using special Compare and Swap (CAS) instructions supported in hardware. This works similarly to optimistic locking or STM where the processor proceeds on the basis of no conflict and then 'rolls back' if a conflict occurs.
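Conceptually, getAndIncrement can be pictured as a CAS retry loop like the following. This is a sketch for illustration only; the JDK's real implementation uses intrinsics rather than a method like this.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual sketch of a CAS-based getAndIncrement.
public class CasCounter {
    public static int getAndIncrement(AtomicInteger a) {
        while (true) {
            int current = a.get();                     // read optimistically
            if (a.compareAndSet(current, current + 1)) // commit if unchanged
                return current;                        // success: old value
            // another thread won the race; loop and retry
        }
    }
}
```

This is the 'proceed and roll back on conflict' behaviour described above: no thread ever blocks, but a thread that loses the race simply retries.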

 Live version with atomic
 #   primes: 476648  
 # live primes: 476648  
 Time: 3686ms  
 Speed-up: 4.32  

Here you can see we got our performance back.

You can find the source code here.

Based on the I2PC Summer School on Multicore Programming course held at the University of Illinois at Urbana-Champaign.

Wednesday, 19 September 2012

Useful programs 2012

Here is a list of general free software that I use to make my life easier.

Ubuntu / Mint / Wubi - Popular Linux distros
VLC - Media player with many built-in codecs
WinScp - SFTP client
WinWget - GUI based Wget
Wireshark - Packet sniffer
Fiddler - HTTP debug proxy
Http Ping - URL probing
Microsoft Security Essentials - Virus Scanner
Chrome - Web browser
Adblock - Ad blocker
Firefox  - Web browser
Firebug - DHTML debugger
Windows Live Essentials - Windows Extras
TeamViewer - Remote desktop
Skype - Voice Over IP client
PEBrowse - Portable Executable browser
IDAPro Free - Disassembler
WinDbg - Kernel mode debugger
Notepad2 - Notepad replacement
7-Zip - Archiver
SqlServer Express - RDBMS
Visual Studio Express - Integrated Development Environment
IIS Express - Web server
Eclipse -  Integrated Development Environment
Depends - DLL dependency viewer
Just Decompile - CIL decompiler
Java Decompiler - Java decompiler
Microsoft Windows SDK - Windows libraries and tools
ReferenceSource NET - Reference source for .NET
Microsoft Windows Performance Toolkit
XamlPad - XAML Editor
Kaxaml - XAML Editor
Tomcat - Web server
Tortoise SVN - Version Control client
Slik Subversion - Version Control client
AnkSvn - Version Control IDE plugin
CCleaner - Disk Cleanup
WinDirStat - Disk usage
SysInternals Suite - Utilities
ExactFile - Checksum tool
CCProxy - Personal proxy
Paint .NET - Paint package
EasyThumbnails - Thumbnail creator
Virtual CloneDrive - ISO mount tool
Microsoft Mathematics - Programmable Calculator
Speccy - System information
DiffMerge - Diff tool
EasyBCD - Boot editor
VirtualBox - Virtualization
VMware Player - Virtualization
CPU-Z - System information
doPDF - PDF Converter
HDTune - Hard disk benchmark
Windows Product Key Finder Professional - License key finder
RevoUninstaller - Uninstaller
LockHunter - File unlocker
ImgBurn - DVD Burner
Start8 - Get the 'start' button back on Windows 8 and Windows Server 2012

TDD with Mockito

You may not be getting the most out of your TDD and JUnit; Mockito is an effective mocking framework that works great with both.

Mockito Verify

Verify allows you to set up expectations on the mock methods that are 'probed' by the test case in a whitebox manner.
public void testMockitoVerify() {
    //mock creation
    List<String> mockedList = mock(List.class);
    //using mock object
    mockedList.add("one");
    //verification
    verify(mockedList).add("one");
}

Mockito Stubbing (toReturn, toThrow)

Mockito also supports stubs by allowing you to specify return values when certain methods are called. Once stubbed, the method will always return the stubbed value regardless of how many times it is called.
public void testMockitoStubbing() {
    //You can mock concrete classes, not only interfaces
    LinkedList mockedList = mock(LinkedList.class);
    //stubbing
    stub(mockedList.get(0)).toReturn("first");
    stub(mockedList.get(1)).toThrow(new RuntimeException());
    //following prints "first"
    System.out.println(mockedList.get(0));
    //following prints "null" because get(999) was not stubbed
    System.out.println(mockedList.get(999));
}

Mockito Invocations (times, atLeast, atMost)

You can verify a certain number of calls by using times for an exact number of calls, atMost for up to a given number, and atLeast for at least a given number.
public void testMockitoInvocations() {
    //mock creation
    List<String> mockedList = mock(List.class);
    //using mock
    mockedList.add("once");
    mockedList.add("three times");
    mockedList.add("three times");
    mockedList.add("three times");
    //verification, times(1) is used by default
    verify(mockedList).add("once");
    //verification using never(). never() is an alias to times(0)
    verify(mockedList, never()).add("never happened");
    //verification using atLeast()
    verify(mockedList, atLeastOnce()).add("three times");
    verify(mockedList, times(3)).add("three times");
    //verify using atMost
    verify(mockedList, atMost(3)).add("three times");
}

Mockito Whitebox (when / thenReturn)

You can specify mock return values using when and thenReturn.
public void testMockitoWhitebox() {
    //mock creation
    Calendar mock = mock(Calendar.class);
    TimeZone tz = mock(TimeZone.class);
    //stub the chained calls
    when(tz.getID()).thenReturn("PST");
    when(mock.getTimeZone()).thenReturn(tz);
    String result = mock.getTimeZone().getID();
    assertEquals("PST", result);
}

Mockito Throws

You can set up exception-throwing behaviour for your mocks using doThrow.
@Test(expected = RuntimeException.class)
public void testMockitoThrows() {
    //mock creation
    List<String> mockedList = mock(List.class);
    doThrow(new RuntimeException()).when(mockedList).clear();
    //following throws RuntimeException
    mockedList.clear();
}

Mockito Verification Equals

You can verify mock method parameters using eq.
public void testMockitoEq() {
    //mock creation
    List mock = mock(List.class);
    mock.add(1, "one");
    //verify the exact argument values using eq
    verify(mock).add(eq(1), eq("one"));
}

Mockito Verification Any

If you do not care to verify the specific parameter values being passed to a mock you can use 'any' matchers.
public void testMockitoAny() {
    //mock creation
    List<String> mockedList = mock(List.class);
    mockedList.add("something");
    verify(mockedList).add(anyString());
}

Mockito Verification ArgThat

If you want to use a custom matcher to verify an argument then you can use argThat.
public void testMockitoArgThat() throws IOException {
    OutputStream mock = mock(OutputStream.class);
    OutputStreamWriter osw = new OutputStreamWriter(mock);
    osw.write('a');
    osw.flush();
    // can't do this as we don't know how long the array is going to be
    // verify(mock).write(new byte[]{'a'},0,1);
    BaseMatcher arrayStartingWithA = new BaseMatcher() {
        public void describeTo(Description description) {
            // nothing
        }
        // check that first character is 'a'
        public boolean matches(Object item) {
            byte[] actual = (byte[]) item;
            return actual[0] == 'a';
        }
    };
    // check that first character of the array is 'a',
    // and that the other two arguments are 0 and 1
    verify(mock).write((byte[]) argThat(arrayStartingWithA), eq(0), eq(1));
}

You can get Mockito here.
Example source here.