Sep 15

My links of the week – September 15, 2013


As a way to at least keep me writing, I am starting a “links of the week” category of posts, which I expect to become a weekly staple. Links can cover just about any topic I find interesting, from technology to politics. So, let’s start.

  • My favorite link of the week is Derek Colley’s Compare Big Data Platforms vs. SQL Server.

    NoSQL is a big word these days, and although there are good reasons to pick a NoSQL platform, it should obviously be picked only when it is actually the best choice for the situation at hand. Derek makes a good case for why relational databases are still a great solution when they fit the bill. Every so often a new type of technology is hyped beyond reason, and that is surely the case with NoSQL. Derek’s article brings some sense into the discussion and even points out some of the issues in NoSQL technologies that may cause problems in the future.

  • Rob Farley’s Not-So-Dirty SQL Hacks presents a couple of interesting hacks for string concatenation in T-SQL.
  • David Kushner’s The Geeks on the Front Lines is an interesting account of how hackers are in short supply and are being hired by the US government to shore up cyber defenses.

That’s all for this week. 

Image courtesy of photoexplorer / FreeDigitalPhotos.net

Mar 25

Is performance an issue with SQL Server on Azure?

I inherited the development of a web app that was meant to run on Azure, using SQL Azure as the data store. Right from my first contact with the app, it was clear that it had performance issues. When running against a local SQL Server, performance could be deemed tolerable, but it was totally unacceptable on Azure, where everything took 3 times longer. A performance analysis pointed to database access as the main factor, but there were other things that could be improved as well.

The identification of the performance issues coincided with some changes in requirements, which led to the decision to take a totally different approach to handling the data the app needed. This new approach was motivated mainly by the need to support the new requirements, but it was also meant to see if it could help with the performance issues. A model to represent the data was developed and implemented on a SQL Server database. This model was tested and improved until its performance in a worst case scenario, on a local database server, was deemed acceptable. The model requires some calculations to be performed, and this is done through a stored procedure. Each of the stored procedure’s queries was optimized with the overall goal of minimizing the time needed to execute the whole stored procedure, not the individual queries themselves. This involved, for example, leaving out some indexes that would have improved individual queries but degraded overall performance. As stated before, we arrived at a point where performance on any of our local SQL Server 2012 databases was deemed good enough.

Having solved the performance issue with local instances of SQL Server 2012, we set out to see how SQL Azure, the intended data store, handled the new way of processing the data. We rebuilt the entire database, populated it with the exact same data we had in our local testing scenario and tested the new stored procedure. Performance was much worse – the time needed to complete the stored procedure in the worst case scenario was almost 3 times the time it took to execute on any of our local servers. Let me be clear here – we are not even talking about the time needed to retrieve data from the database – it’s just the time needed to process records inside the database and store the results in a database table.
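For readers who want to run a similar comparison, here is a minimal sketch in Python of the kind of timing loop that can be used. It is not the harness we used; `time_runs` and `fake_procedure` are hypothetical names, and `fake_procedure` only sleeps as a stand-in for the real stored procedure call, which would go through whatever database driver you have available.

```python
import statistics
import time
from typing import Callable, Dict


def time_runs(run_once: Callable[[], None], repeats: int = 5) -> Dict[str, float]:
    """Call run_once `repeats` times and summarize wall-clock durations in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run_once()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_procedure() -> None:
    # Hypothetical stand-in for executing the stored procedure; it only sleeps.
    time.sleep(0.01)


if __name__ == "__main__":
    print(time_runs(fake_procedure, repeats=3))
```

Running the same loop against each server and comparing the medians, rather than single runs, helps keep one slow execution from skewing the picture.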

Trying to determine if SQL Azure was to blame for this, we decided to compare the performance in different hosting scenarios: Azure SQL VM, a standard hosting offering running SQL Server 2008 Web edition, and Amazon RDS. The testing started with Azure SQL VM. Using the Azure Virtual Machine preview, we created several instances of a VM running SQL Server 2012 SP1 Evaluation edition on Windows Server 2008 R2 SP1. To rule out the virtual machine size as a factor, we had a VM of each size – small, medium, large and extra large (an extra large machine offers 8 computing cores and 14 GB of memory). On all these VMs, used with their pre-configured settings, performance was pretty consistent and didn’t really differ much from SQL Azure. The execution time for the stored procedure was very similar across all VM sizes – and too high in all of them.

We then tried the traditional hosting solution, a Windows Server 2003 machine running SQL Server 2008 Web edition on two Xeon processors with 4 GB of RAM. Surprisingly, or maybe not, performance was pretty similar to that of SQL Azure and Azure SQL VM. Almost believing that hosted SQL Server solutions were somehow not up to the task, we decided to try Amazon RDS. We had never tried Amazon’s cloud solution before, so we had to check the options offered and create a new account. There are multiple options regarding VM features and size, and we decided to test a Large DB Instance – a VM running SQL Server 2008 R2 with 7.5 GB of memory and 4 ECUs (2 virtual cores with 2 ECUs each – 1 ECU, according to Amazon Web Services, is equivalent to a 1.0-1.2 GHz 2007 Opteron or Xeon processor). Setting up the VM was as easy as on Azure and, a few minutes afterwards, I was creating the database and uploading the test data. Once this was completed, the test stored procedure was executed multiple times … and execution times were, on average, less than half of those on Azure and almost as good as with our local SQL Server instances.
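To put the rough figures above side by side, here is a small Python sketch that normalizes everything to the local execution time. It uses only the approximate ratios quoted in this post ("almost 3 times" and "less than half"), treated as upper bounds; no absolute timings are implied, and the function names are just illustrative.

```python
def rds_upper_bound_vs_local(azure_vs_local: float, rds_vs_azure: float) -> float:
    """Upper bound for RDS time as a multiple of the local time."""
    return azure_vs_local * rds_vs_azure


def total_ecus(virtual_cores: int, ecus_per_core: int) -> int:
    """Total ECUs of an instance, given its cores and ECUs per core."""
    return virtual_cores * ecus_per_core


if __name__ == "__main__":
    # Azure took "almost 3 times" the local time; RDS took "less than half" of Azure.
    print(rds_upper_bound_vs_local(3.0, 0.5))  # prints 1.5
    # Large DB Instance: 2 virtual cores with 2 ECUs each.
    print(total_ecus(2, 2))  # prints 4
```

In other words, the RDS runs landed below roughly 1.5 times the local time, which fits the "almost as good as local" impression.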

All this testing was both reassuring and worrying. On the one hand, it’s clear that there are cloud-based offerings that can provide performance pretty similar to what can be obtained from in-house SQL Server solutions. For our case, however, it seems that Azure is not one of those. We still need to decide what we will do, but the cost / performance ratio for Azure-based SQL Server solutions is not looking too good. Right now SQL Azure is the only non-preview offer, and its cost is rather high for databases larger than 100 MB. Azure SQL VM won’t be a reality until this summer, and while it may provide a more cost-effective solution, it’s not clear that it can be a competitive offer, performance-wise. Of course, we are considering a rather specific use case, but the overall performance before we changed our model was not that good either, and this experience, while not definitive in any way, does seem to raise the question – are Azure-based SQL Server solutions good enough, performance-wise? The answer is not clear right now, but it is a bit unsettling to find a competitor’s cloud offering a better choice than Microsoft’s for a Microsoft product as relevant as SQL Server.