Sunday, June 30, 2013

Optimizing and Debugging the Entity Framework

Having used the Entity Framework in multiple projects for a while, the object-relational mapping it provides in combination with LINQ makes it a fantastic framework for building data-driven applications. My field of activity is mainly the development of web applications, often based on ASP.NET MVC, Entity Framework and MySQL. Usually the database modelling and usage is rather standard back-end work and nothing to worry about further. Lately I was working on a project with a bigger set of data to deal with, around 500k rows.

That was the first time I had to dig into the optimization of SQL queries to keep the application usable. While looking to optimize my LINQ structure, I had to go a level deeper into the SQL statements created by the framework. I came across two major "bugs" that I find worth sharing and knowing about. The first one might cause logical mistakes in your program if you are not aware of it. The "bug" is related to the Contains function and surfaces in the following example:

dbcontext.Table.Where(r => stringList.Contains(r.Name));

While you would expect the generated SQL to really check whether the string is contained in the string list, it actually does the equivalent of this:

dbcontext.Table.Where(r => string.Join("", stringList).IndexOf(r.Name) >= 0);

In other words, it concatenates the string list and only checks whether the concatenation contains the name, which yields a considerably different result than the intended one.
Consider a string list with "1", "2", "3": the check contains("23") will return true. I found a quick workaround for my case by concatenating the string myself with

var concatString = "-" + string.Join("-", stringList) + "-";
dbcontext.Table.Where(r => concatString.IndexOf("-" + r.Name + "-") >= 0);

This way the separators make sure that the start and end of each string match the intended value. If you are not working with numbers, you should consider other separator characters than the ones I used here.
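To make the difference tangible, here is the same logic sketched in plain JavaScript (the function names buggyContains and separatedContains are mine, purely for illustration; the first mimics what the generated SQL effectively checks, the second the separator workaround above):

```javascript
var stringList = ["1", "2", "3"];

// What the generated SQL effectively does: indexOf on the bare concatenation.
function buggyContains(name) {
  return stringList.join("").indexOf(name) >= 0;
}

// The workaround: wrap every entry (and the probe) in separators.
function separatedContains(name) {
  var concatString = "-" + stringList.join("-") + "-";
  return concatString.indexOf("-" + name + "-") >= 0;
}

buggyContains("23");      // true, although "23" is not in the list
separatedContains("23");  // false, as intended
separatedContains("2");   // true
```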

The second bug is related to the new Entity Framework 5. The LINQ statement

var q = dbcontext.Table.Where(r => r.Name == "1");

would be translated into SQL:

SELECT Extent1.name FROM (
    SELECT name FROM table
) AS Extent1
WHERE Extent1.name = '1';

On small tables with simple structures this does not impact performance notably, but querying a 500k-row table definitely does. The only workaround I could think of was to move back to the Entity Framework 4 type of model creation. This can be achieved without any further hassle by changing the Code Generation Strategy from Default to None.

To retrieve the actual SQL statements created by the Entity Framework, you can either check the database log or debug directly in .NET.

EF 4:
var query = dbcontext.Table.Where(r => r.Id == searchId).Select(r => r);
Console.WriteLine(((ObjectQuery)query).ToTraceString());

EF 5:
var query = dbcontext.Table.Where(r => r.Id == searchId).Select(r => r);
Console.WriteLine(((DbQuery)query).ToTraceString());

Sunday, September 30, 2012

Boost jquery datatables performance

I have been working on a project which included displaying 2000 rows of data, which is in the end not a very high number. My preferred JavaScript library for tables is jQuery DataTables. Since the project's server-side component is ASP.NET MVC, the easiest way to configure the final rendering of the HTML is to generate the HTML with ASP and then, when the page is loaded, call the datatables initializer on it.
In the latest Firefox (version 15) this took 10 seconds to finish. Not acceptable, of course. The time split roughly 50-50 between server-side and client-side processing. Server-side processing could be dramatically reduced by the following optimizations:
  1. Reduce the amount of database queries by adding Include("Related objects") to the initial Entity Framework query, so the objects and related objects of interest are fetched together instead of one by one through lazy loading.
  2. Create the table using type reflection, to stay flexible when object properties or the table display change.
Client-side optimization was a bit trickier. The call to the dataTables() function on a table that already contained 2000 rows with 5 columns each needed around 5 seconds to return, mainly the cost of analyzing all the data and building the internal data structures for sorting and filtering. To avoid the heavy DOM operations for building these internal structures, the obvious move is to go the other way around: hand the data over as JSON and let the plugin build the HTML from the data source. This reduced the time needed in Firefox to a negligible amount. So I happily used the mRender property to create slightly more advanced table content like links and a div with content. The rude awakening came when testing other browsers: everything seemed fine, except that the old lady Internet Explorer 8 (I haven't tested 7, but it could only be worse) needed a full 10 seconds to create the whole table.
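Initializing from a JSON array instead of pre-rendered HTML might look like the following sketch (tableData and the column titles are placeholders; the option names follow the legacy 1.9 datatables API used throughout this post):

```javascript
// Sketch: feeding datatables a JSON array instead of letting it parse
// an existing HTML table. tableData stands in for the rows delivered
// by the server, e.g. from an MVC action returning JSON.
var tableData = [
  ["http://example.com/1", "Name 1", "2012-09-30"],
  ["http://example.com/2", "Name 2", "2012-09-29"]
];

$("#myTable").dataTable({
  "aaData": tableData,          // row data as arrays, one per row
  "aoColumns": [                // one entry per column
    { "sTitle": "Link" },
    { "sTitle": "Name" },
    { "sTitle": "Date" }
  ]
});
```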

The dataTables plugin basically creates each and every row and cell when the table is initialized. Thinking about it, I only ever want to see 15, 30 or 50 rows of content at a time, since with 2000 rows you need pagination anyhow. Fortunately, you can assign a function to mRender, which allows you to separate a cell's content for display from its content for sorting. But this alone did not lead to the wished-for result: since pagination is brought to datatables via a plugin, the core library is not aware of any such functionality. Therefore the only way to achieve this is to use the rowCallback of dataTables, which is called whenever a row is shown. This allowed me to leave the table cells basically empty (at least the ones with a bit more than just text inside of them) and create the more complex content only at the point in time the table row is actually shown. In the end it looks something like:

"aoColumns": [
  {
    "mRender": function (data, type, full) {
      // return nothing for display; the real content is created
      // in fnRowCallback once the row actually becomes visible
      if (type == "display")
        return '';
      // the plain data is used for sorting and filtering
      return data[0];
    }
  }
],
"fnRowCallback": function (tr, data, index) {
  var tds = $(tr).find("td");

  // only build the content once per row
  if (tds.children().length == 0) {
    tds.eq(0).html(
      '<a href="' + data[0] + '"><img src="globe.png" /></a>'
    );
  }
}

Now the content of the cell is converted into a link, represented by an image, at the moment it is added to the DOM and not when the table initializes. This reduced the processing time in IE8 to less than one second, a performance boost of more than factor 10.

--- edit 28.12.2012
Datatables also offers an option for deferred rendering which partly achieves the same optimization. The problem with that option is that the sorting and filtering operations did not work properly, at least in my tests.
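For completeness, switching it on in the legacy 1.9 API is a single option (tableData is again a placeholder for the JSON rows; in datatables 1.10+ the option was renamed to deferRender):

```javascript
// Sketch: the built-in deferred rendering switch. tr/td nodes are then
// created only when a row is actually displayed.
$("#myTable").dataTable({
  "aaData": tableData,
  "bDeferRender": true
});
```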

Sunday, February 19, 2012

Javascript - How to keep it responsive?

One remark before you start reading: the described problem and solution work very similarly in Flash, as shown in a rather complex scenario here.

In case you do not want to read the background story click: shut up and show me the solution.

In my last project I was assigned to create a dashboard presenting the key financial numbers of our company. In order to present these numbers not only on desktop but also on mobile devices (every developer knows what is coming: mainly the iPad), the application needed to be done in HTML for compatibility reasons.

Since Internet Explorer (7 and above) is one of the officially supported browsers in our company, I was facing a particular problem. Whenever I opened a page with a few dashboards and tables it ran just fine. But when I navigated to a page showing around 15 dashboards, rendering them took 3 to 5 seconds. Since not all versions of Internet Explorer support the Canvas element (which is desperately needed here), I had to use a JavaScript library that emulates the Canvas element in IE 7 and 8. This is a very nice library and worth a try: http://excanvas.sourceforge.net/ Whether it is the JavaScript engine performance, the Canvas emulation, or a combination of the two, the result stands: it takes 3 to 5 seconds, and with that amount of loading time people tend to say "It's broken". All other browsers were significantly faster, and even IE handled most pages fine, just not all of them.


So let's get to the real problem. The first thought is the easy approach: let people know something is going on and put a loading indicator there. Nice idea, but the way JavaScript works, this does not cut it. As long as JavaScript code is running, the page does not refresh, so no matter how many loading indicators you add, none of them will move. A static text message or picture is fine, but anything animated, like an animated GIF, is not.


Everybody used to desktop application development, as I am, e.g. with .NET, Java or Objective-C, knows the concept of multi-threading, which is the usual way out of the problem "my UI is not responsive". JavaScript is an in-browser language, and for the sake of simplicity it runs in a single thread, the UI thread. Everything that happens is executed there and cannot be scheduled to run in parallel or in the background. Normally that is fine, as some components of the browser already work asynchronously, e.g. data loading from the server, hence called AJAX (the first A for asynchronous, meaning it runs in the background). But what can you do if you have longer-running JavaScript code and want the user to see something moving in the UI?


Here comes the solution: the setTimeout function. It allows you to delay the execution of code by a certain amount of time. Since the timer is handled by the browser environment rather than the JavaScript environment, the browser considers the JavaScript execution done, at least for a short timeframe, and uses the opportunity to update the UI. After the delay has elapsed, your longer-running operation can continue. Having said that, you can already imagine the trouble with this approach: you need to split your operation into shorter-running pieces and store the computed information so that you can continue with the next part. In case you are using jQuery, you can structure the control flow nicely. The last thing you need to define is an end condition so that it does not turn into an endless loop. In the following I created a little example which just counts to a certain number. Without any timeout you would count like this.

<html>
<head>
<script type="text/javascript">
  function count() {
    for (var i = 0; i < 2000000000; i++) {}
    alert("counting done");
  }
</script>
</head>
<body>
<input type="button" value="Button" onClick="count();">
</body>
</html>

The execution freezes the website until it is done; depending on the number it might even freeze so long that the browser warns you and stops the script from executing further. Restructuring the code as follows solves this problem.

<html>
<head>

<script type="text/javascript">
  var number = 0;
  var interval = 10000000;

  function count(target) {
    var nextTarget = number + interval;
    // end of counting is reached
    if (target < nextTarget) {
      nextTarget = target;
    }
    for(; number < nextTarget; number++) {}
    // stop the recursion
    if (nextTarget >= target) {
      alert("counting done");
    }
    else {
      setTimeout(function() { count(target); }, 5);
    }
  }
</script>

</head>
<body>
<input type="button" value="Button" onClick="count(2000000000);">
</body>
</html>


Ok, I did not say this comes for free :) The code is now a little more complex, and you have to consider what the user might do while the operation is ongoing. Especially when the user leaves the page or triggers some other operation, it would be beneficial to cancel the current operation, so that the browser is not kept busy with long-running, unnecessary work. This can be achieved with an additional variable that stops the processing when something happens; it just requires extra work and attention from the developer.
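A minimal sketch of such a cancellation variable, generalized from the counting example above (the names cancelled and processChunked are mine; any event handler, e.g. for leaving the page, could set the flag to true):

```javascript
// Flag that other event handlers can set to abandon the remaining work.
var cancelled = false;

function processChunked(items, chunkSize, onDone) {
  var index = 0;
  function step() {
    if (cancelled) {
      return;                        // drop the remaining chunks
    }
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      // ... per-item work goes here ...
    }
    if (index < items.length) {
      setTimeout(step, 5);           // yield so the UI can update
    } else {
      onDone();
    }
  }
  step();
}
```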

Sunday, December 18, 2011

QR Codes - What for?

QR codes, or Quick Response codes, are codes designed to be easily readable by computer programs via a camera. The purpose of QR codes, or actually any barcode, is the link they create between the physical and the digital world.

The codes vary from simple to complex (details on Wikipedia) and contain some sort of data, usually a webpage or a simple code. The real power of QR codes comes from the combination with personal mobile devices: an advertisement poster, for example, often holds a QR code linking to the ad campaign's website with follow-up information.

The problem for most use cases is that, today, most people do not know what a QR code is and, more importantly, how they work. So as soon as you decide to go with QR codes, you should be aware that you lose most people unless you also provide the information in plain text (in this example, the URL of the ad campaign).

Some people put QR codes on their business cards, which already hold all the information; as the receiver you can simply take your mobile device and scan the code rather than typing everything from the card once back in the office.

In the same way, today you find share links for Facebook, Twitter and the like on every website, but how do you bring that onto a printed poster or a static screen? With QR codes you can bridge this barrier and take advantage of both worlds, digital and physical.

So whenever you are working on something and facing the question of how to bridge from physical to digital, you might consider QR codes as one possible answer.