My links of the week – November 17, 2013


Here are the links for the last week. It is a rather extensive list, resulting from the difficulties in choosing from so many interesting articles.

  • SQL Server:
    • Paul Randal’s Are I/O latencies killing your performance? addresses latency in tempdb and log files access, starting with the results of a survey that collected such data from around 1100 servers. The post suggests values for what can be considered good or bad I/O latency and includes recommendations and additional links that can help determine the causes and correct the problems, when I/O latency is not good.
    • Jes Schulz Borland’s What You Can (and Can’t) Do With Filtered Indexes addresses filtered indexes and how they can help improve performance when a query includes a WHERE clause that returns just a fraction of the total number of records in the table. The article shows examples of what you can and can’t do in the WHERE clause for a filtered index to be used and includes links to further resources on filtered indexes.
    • Jen McCown’s Compare tables to find missing rows presents several T-SQL solutions for comparing two tables to determine which records are missing from one of them.
    • Tim Wiseman’s SQL Server Operations from Set Theory presents SQL Server’s UNION, EXCEPT and INTERSECT operators, explains their use in terms of set theory, and suggests situations where they can help.
    • Rick Dobson’s Masking Personal Identifiable SQL Server Data presents a way to mask confidential data while still allowing data professionals access to it. At a time when data privacy seems to be a forgotten concept, this article presents an interesting technique to protect personal and confidential data from regular database users.
    • Francesco Cogno’s Backup to Azure using TSQL (and CLR) presents a way to perform a backup to Azure Blob Storage using T-SQL and the Microsoft SQL Server To Windows Azure helper library.
    • Troy Hunt’s Using high-spec Azure SQL Server for short term intensive data processing shows how a SQL Server VM was used by the author to process 153 million records of passwords obtained from the infamous Adobe data breach. While not covering the details of how SQL Server was used to process the records, it is an excellent example of how the cloud (Azure, in this case), can be used to access very powerful hardware resources, for just the time these resources are needed, at a very low cost.
    • Microsoft’s SQL Server Team’s How Does SQL Server Parallel Data Warehouse (PDW) Deliver the Performance that it Does? presents a whitepaper that describes how PDW works and the performance gains it can offer in typical data warehouse scenarios.
    • The SQL Server 2014 links page saw a few additions this week.
  • Big Data:
    • Matthew Mombrea’s Essential reading for choosing a NoSQL database provides an interesting set of resources to get started with NoSQL databases and to obtain information on how to choose the database best suited to the project at hand.
    • Michael Hausenblas’s Applying the Big Data Lambda Architecture presents the Lambda Architecture, a generic architecture addressing common big data requirements, designed by Nathan Marz. The article presents, in some detail, a way to use the architecture to design an example social network. It also provides links to additional resources on the architecture and possible components to some of its layers. A most interesting read.
    • Brian Rinaldi’s Current Trends in NoSQL – Q&A with Peter Bell addresses several aspects of NoSQL databases, including adoption trends, NewSQL products and immutable data stores.
    • Igor Pagliai’s How to create a SQL Server Linked Server to HDInsight HIVE using Microsoft Hive ODBC Driver shows, through a very detailed example, how to create a SQL Server linked server to HIVE, using the recently released Windows Azure HDInsight Service and how to issue queries to the HIVE tables using SSMS. A very good read.
  • Web Design and Development:
    • Wilson Page’s An Introduction To DOM Events presents an excellent introduction to DOM events, covering multiple aspects of the subject, from event listening to a detailed presentation of the Event object, custom event creation and delegate event listeners, ending with a listing of useful events.
    • Bryson Meunier’s SEO For Responsive Websites presents an audit of a responsive website, reviewing the multiple aspects a responsive website should address in order to be SEO friendly.
    • Paula Borowska’s The Next Big Thing: Responsive Icons presents responsive icons, explaining what they are and how they can be created and implemented.
    • Dan Tao’s Bootstrap without all the debt describes some of the issues that may result from the use of the framework (tight coupling, non-semantic markup) and suggests a way to avoid them. A good read.
    • The Visual Studio 2013 Resources page also saw the addition of several new articles.
  • Professional Development:
    • Grant Fritchey’s Cargo Cult Data Professionals is a very interesting article addressing “cargo cultists”, a term coined by Eric Lippert to describe, loosely, people who have no clear idea of why things work the way they do, who keep doing their work from ready-made recipes without caring whether those recipes are right, and who feel no need to learn and improve their knowledge and skills. It’s a very good read.
    • Nicholas Carr’s Is software de-skilling programmers? is a rather short, but nonetheless interesting, article on how modern IDEs can lead to the de-skilling of software developers, drawing parallels with similar past de-skilling trends brought on by the introduction of mechanization. A thought-provoking article.
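As a quick illustration of the table-comparison and filtered-index techniques covered in the SQL Server links above, here is a minimal T-SQL sketch. The table and column names are invented purely for the example:

```sql
-- Hypothetical tables for illustration: Customers_Prod and Customers_Backup
-- share the same schema (CustomerID, Name, Email, IsActive).

-- Rows present in production but missing from the backup:
SELECT CustomerID, Name, Email FROM Customers_Prod
EXCEPT
SELECT CustomerID, Name, Email FROM Customers_Backup;

-- Rows present in both tables:
SELECT CustomerID, Name, Email FROM Customers_Prod
INTERSECT
SELECT CustomerID, Name, Email FROM Customers_Backup;

-- A filtered index only covers rows matching its WHERE clause -- useful
-- when queries consistently target a small subset of a large table:
CREATE NONCLUSTERED INDEX IX_Customers_Active
ON Customers_Prod (Name)
WHERE IsActive = 1;
```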

That’s all for this week. Thanks for reading.

Mar 25

Is performance an issue with SQL Server on Azure?

I inherited the development of a web app that was meant to run on Azure, using SQL Azure as the data store. Right from my first contact with the app, it was clear that it had performance issues. When running from a local SQL Server, performance could be deemed endurable, but it was totally unacceptable on Azure, where it took 3 times longer. A performance analysis pointed to database access as the main factor, but there were other things that could be improved upon as well.

The identification of the performance issues coincided with some changes in requirements, which led to the decision to take a totally different approach to the way the data the app needed was handled. The new approach was motivated mainly by the need to support the new requirements, but it was also meant to test whether it could help with the performance issues. A model to represent the data was developed and implemented on a SQL Server database, then tested and improved up to the point where performance in a worst-case scenario was deemed acceptable on a local database server. The model requires some calculations, performed through a stored procedure. Each of the stored procedure’s queries was optimized with the overall goal of minimizing the time needed to execute the whole procedure, rather than the individual queries themselves. This involved, for example, leaving out some indexes that improved individual queries but degraded the overall performance. As stated before, we reached a point where performance on any of our local SQL Server 2012 databases was deemed good enough.
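The whole-procedure measurements described above can be done with a simple wall-clock harness in T-SQL. This is a hedged sketch; the procedure name is a placeholder, not the app’s actual code:

```sql
-- Hypothetical: usp_RecalculateModel stands in for the app's
-- calculation stored procedure.
DECLARE @start datetime2 = SYSDATETIME();

EXEC dbo.usp_RecalculateModel;

DECLARE @elapsed_ms int = DATEDIFF(MILLISECOND, @start, SYSDATETIME());
PRINT 'Elapsed: ' + CAST(@elapsed_ms AS varchar(20)) + ' ms';

-- For a per-statement breakdown, SET STATISTICS TIME ON before the EXEC
-- reports CPU and elapsed time for each query inside the procedure.
```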

Having solved the performance issue with local instances of SQL Server 2012, we set out to see how SQL Azure, the intended data store, handled the new way to process the data. We rebuilt the entire database, populated it with the exact same data we have in our local testing scenario and tested the new stored procedure. Performance was much worse – the time needed to complete the stored procedure in the worst case scenario was almost 3 times the time it took to execute it in any of our local servers. Let me be clear here – we are not even talking about the time needed to retrieve data from the database – it’s just the time needed to process records inside the database and store the results in a database table.
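One way to check where a procedure’s time is going is the procedure-stats DMV, available on full SQL Server instances (2008 and later). This is an illustrative sketch, not the exact query we used:

```sql
-- Aggregate timings for cached stored procedures; the DMV reports
-- times in microseconds, converted to milliseconds here.
SELECT
    OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
    ps.execution_count,
    ps.total_worker_time  / 1000 AS total_cpu_ms,
    ps.total_elapsed_time / 1000 AS total_elapsed_ms,
    ps.total_elapsed_time / ps.execution_count / 1000 AS avg_elapsed_ms
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_elapsed_time DESC;
```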

Trying to determine if SQL Azure was to blame for this, we decided to compare the performance in different hosting scenarios: Azure SQL VM, a standard hosting offering running SQL Server 2008 Web edition, and Amazon RDS. The testing started with Azure SQL VM. Using the Azure Virtual Machine preview, we created several instances of a VM running SQL Server 2012 SP1 Evaluation edition on Windows Server 2008 R2 SP1. To rule out virtual machine size as a factor, we had a VM of each size – small, medium, large and extra large (an extra large machine offers 8 computing cores and 14 GB of memory). On all these VMs, used with their pre-configured settings, performance was pretty consistent across sizes and did not really improve on SQL Azure’s. The execution time for the stored procedure was very similar in all VM sizes – and too high in all of them.

We then tried the traditional hosting solution, a Windows 2K3 server running SQL Server 2008 Web edition on two Xeon processors with 4 GB of RAM. Surprisingly, or maybe not, performance was pretty similar to that of SQL Azure and Azure SQL VM. Almost believing that, somehow, hosted SQL Server solutions were not up to the task, we decided to try Amazon RDS. We had never tried Amazon’s cloud solution before, so we had to check the options offered and create a new account. There are multiple options regarding VM features and size, and we decided to test a Large DB Instance – a VM running SQL Server 2008 R2 with 7.5 GB of memory and 4 ECUs (2 virtual cores with 2 ECUs each – 1 ECU, according to Amazon Web Services, is equivalent to a 1.0-1.2 GHz 2007 Opteron or Xeon processor). Setting up the VM was as easy as on Azure, and a few minutes afterwards I was creating the database and uploading the test data. Once this was completed, the test stored procedure was executed multiple times … and execution times were, on average, less than half of those from Azure and almost as good as with our local SQL Server instances.

All this testing was both reassuring and worrying. For one, it’s clear that there are cloud-based offerings that can provide performance pretty similar to what can be obtained from in-house SQL Server solutions. In our case, however, it seems that Azure is not one of them. We still need to decide what we will do, but the cost / performance factor for Azure-based SQL Server solutions is not looking too good. Right now SQL Azure is the only non-preview offer, and its cost is rather high for databases larger than 100 MB. Azure SQL VM won’t be a reality until this summer, and while it may provide a more cost-effective solution, it’s not clear that it can be a competitive offer, performance-wise. Of course, we are considering a rather specific use case, but the overall performance before we changed our model was not that good either, and this experience, while not definitive in any way, does seem to raise the question – are Azure-based SQL Server solutions good enough, performance-wise? The answer is not clear right now, but it is a bit unsettling to find a competitor’s cloud offerings a better choice than Microsoft’s for a Microsoft product as relevant as SQL Server.