Sep 22

My links of the week – September 22, 2013

This week provided quite a few interesting reads, so it wasn’t easy to pick just a few of them. I’m not really able to pick a favorite, as there were quite a few good posts and no clear “winner”. Anyway, here we go.

  • Jnan Dash’s RDBMS vs. NoSQL: How do you pick? brings back the issue of choosing between RDBMS and NoSQL technologies, providing insightful advice on making such a choice. Although brief, it covers the most relevant criteria to consider, from the nature of the data to operational issues like performance, scale and availability. A very interesting read.
  • Continuing with NoSQL / Big Data articles, Chris Stucchio’s Don’t use Hadoop – your data isn’t that big is a witty article on how “hot” keywords can cloud anyone’s decision making. It analyses a few scenarios where using Hadoop doesn’t make much sense, even if Hadoop’s increasing popularity might lead you to think otherwise.
  • J.D. Meier’s Cloud Scenarios at your fingertips provides a few decision-making points, and links to further reading, on evaluating Cloud Computing as a viable option for enterprises.
  • Moving to SQL Server performance-related articles, the SQL Server Customer Advisory Team blog post When To Break Down Complex Queries provides a few anti-patterns to watch for when writing queries, along with suggestions for solving the associated performance issues. A very interesting article.
  • Grant Fritchey’s Why the Lazy Spool is Bad analyzes the lazy spool operator, concluding that it is not bad, after all, and includes links to more information on spools.
  • Danny Dover’s The Web Developer’s SEO Cheat Sheet 2.0 is a very comprehensive SEO cheat sheet that can be downloaded as a PDF file to keep within easy reach, as it can be very useful.
  • Bruce Schneier’s How to Remain Secure Against the NSA provides a detailed description of several strategies that anyone can use to foil the NSA’s eavesdropping abilities. A must read.

That’s it for the week.

Mar 25

Is performance an issue with SQL Server on Azure?

I inherited the development of a web app that was meant to run on Azure, using SQL Azure as the data store. Right from my first contact with the app, it was clear that it had performance issues. When running against a local SQL Server, performance could be deemed endurable, but it was totally unacceptable on Azure, where operations took three times longer. A performance analysis pointed to database access as the main factor, though there were other things that could be improved upon as well.

The identification of the performance issues coincided with some changes in requirements, which led to the decision to take a totally different approach to how the app’s data was handled. This new approach was motivated mainly by the need to support the new requirements, but it was also a chance to see whether it could help with the performance issues. A model to represent the data was developed and implemented on a SQL Server database, then tested and refined until worst-case performance on a local database server was deemed acceptable. The model requires some calculations to be performed, and this is done through a stored procedure. Each of the stored procedure’s queries was optimized with the overall goal of minimizing the execution time of the whole procedure, not of the individual queries. This involved, for example, leaving out some indexes that improved individual queries but degraded overall performance. As stated before, we reached a point where performance on any of our local SQL Server 2012 instances was deemed good enough.

Having solved the performance issue on local instances of SQL Server 2012, we set out to see how SQL Azure, the intended data store, handled the new way of processing the data. We rebuilt the entire database, populated it with the exact same data we had in our local testing scenario, and tested the new stored procedure. Performance was much worse – in the worst-case scenario, the stored procedure took almost three times as long to complete as on any of our local servers. Let me be clear here – we are not even talking about the time needed to retrieve data from the database; this is just the time needed to process records inside the database and store the results in a table.

To determine whether SQL Azure was to blame, we decided to compare performance across different hosting scenarios: Azure SQL VM, a standard hosting offering running SQL Server 2008 Web edition, and Amazon RDS. Testing started with Azure SQL VM. Using the Azure Virtual Machine preview, we created several instances of a VM running SQL Server 2012 SP1 Evaluation edition on Windows Server 2008 R2 SP1. To rule out virtual machine size as a factor, we had a VM of each size – small, medium, large and extra large (an extra large machine offers 8 computing cores and 14 GB of memory). On all of these VMs, used with their pre-configured settings, performance was pretty consistent and didn’t change much compared with SQL Azure. The stored procedure’s execution time was very similar across all VM sizes – and too high in all of them.

We then tried the traditional hosting solution: a Windows 2K3 server running SQL Server 2008 Web edition on two Xeon processors with 4 GB of RAM. Surprisingly, or maybe not, performance was pretty similar to that of SQL Azure and Azure SQL VM. Almost believing that hosted SQL Server solutions were simply not up to the task, we decided to try Amazon RDS. We had never tried Amazon’s cloud solution before, so we had to check the options offered and create a new account. There are multiple options for VM features and size, and we decided to test a Large DB Instance – a VM running SQL Server 2008 R2 with 7.5 GB of memory and 4 ECUs (2 virtual cores with 2 ECUs each; 1 ECU, according to Amazon Web Services, is equivalent to a 1.0–1.2 GHz 2007 Opteron or Xeon processor). Setting up the VM was as easy as on Azure, and a few minutes later I was creating the database and uploading the test data. Once this was completed, the test stored procedure was executed multiple times … and execution times were, on average, less than half those on Azure, and almost as good as on our local SQL Server instances.
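The comparison across environments boils down to running the same stored procedure repeatedly in each environment and comparing average execution times. The sketch below shows a minimal timing harness for that kind of test; the `run_procedure` callable and the sleep-based workloads are placeholders for whatever actually invokes the stored procedure through a database driver, not part of the original tests.

```python
import statistics
import time


def benchmark(run_procedure, runs=10):
    """Execute a callable several times and summarize wall-clock timings.

    run_procedure stands in for the real stored-procedure call
    (e.g. via a database driver). Returns mean, spread, min and max
    in seconds.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_procedure()
        timings.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings) if runs > 1 else 0.0,
        "min": min(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Dummy workloads standing in for the procedure on two environments.
    local = benchmark(lambda: time.sleep(0.01), runs=5)
    remote = benchmark(lambda: time.sleep(0.03), runs=5)
    print(f"local mean:  {local['mean']:.3f}s")
    print(f"remote mean: {remote['mean']:.3f}s")
    print(f"slowdown:    {remote['mean'] / local['mean']:.1f}x")
```

Averaging over several runs, rather than trusting a single execution, matters in shared cloud environments where timings can vary noticeably between runs.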

All this testing was both reassuring and worrying. On one hand, it’s clear that there are cloud-based offerings that can provide performance pretty similar to what can be obtained from in-house SQL Server solutions. In our case, however, it seems that Azure is not one of them. We still need to decide what we will do, but the cost/performance ratio of Azure-based SQL Server solutions is not looking good. Right now SQL Azure is the only non-preview offering, and its cost is rather high for databases larger than 100 MB. Azure SQL VM won’t be a reality until this summer, and while it may provide a more cost-effective solution, it’s not clear that it can be a competitive offering, performance-wise. Of course, we are considering a rather specific use case, but overall performance before we changed our model was not that good either, and this experience, while not definitive in any way, does seem to raise the question: are Azure-based SQL Server solutions good enough, performance-wise? The answer is not clear right now, but it is a bit unsettling to find a competitor’s cloud offerings a better choice than Microsoft’s for a Microsoft product as relevant as SQL Server.