Sunday, June 30, 2013

Optimizing and Debugging the Entity Framework

Using the Entity Framework for a while in multiple project and the object relational mapping that it provides in combination with LINQ makes it a fantastic framework for building data driven applications. My field of activity is mainly the development of web applications that are often based on ASP.NET MVC, Entity Framework and MySQL. Usually the database modelling and usage is the rather standard back end and not further to worry. Lately i was working on a project including a bigger set of data to deal with around 500k rows of data.

That was the first time I had to dig into optimization of SQL queries to maintain usability of the application. While I was looking to optimizing my LINQ structure, I had to go a level deeper into the SQL statements created by the framework. I came across two major "bugs" that i find worth sharing and knowing about. The first one might cause logical mistakes in your program, if you are not aware. The "bug" is related to the contains function which surfaces in the following example:

dbcontext.table.select(r => stringList.Contains(r.name));

While you would expect that the generated SQL would really check if the string is contained in the string list it actually does this:

dbcontext.table.select(r => string.join("", stringList).IndexOf(r.name) >= 0);

Or in words it concats the string list and only checks if that contains the name, which has a considerable different result than the intended one.
If you consider a string list with "1", "2", "3" the check - contains("23") - will have the result - true. I found a quick work around for my case by concatenating the string myself with

var concatString = "-" + string.Join("-", stringList) + "-";
dbcontext.table.select(r => concatString.IndexOf("-" + r.name + "-") >= 0);

This way the separators make sure that the string start and endings are matching with the intended string. If you are not using numbers you should consider using other separation chars than i did here.

The second bug is related to the new version of the entity framework 5. The LINQ statement

var q = dbcontext.table.Where(r => r.name = "1");

would be translated into SQL:

Select Extent1.name From (
Select name From table
) as Extent1
Where name = 1;

In small tables and easy structures this is not impacting the performance notably, but querying a 500k table definitely does. The only way to work around that problem i could think of was to move back to the Entity Framework version 4 type of model creation. By changing the Code Generation Strategy from Default to None this can be achieved without any further hastle.

To retrieve the actual sql statements created by the entity franework you could wether check on the database log or directly debug in the .net.

EF 4:
var query = dbcontext.Table.Where(r => r.Id == searchId) .Select(r => r);
Console.WriteLine(((ObjectQuery)query).ToTraceString());

EF 5:
var query = dbset.Table.Where(r => r.Id == searchId) .Select(r => r);
Console.WriteLine(((DbQuery)query).ToTraceString());

No comments: