Oct 06

My links of the week – October 6, 2013

Last week’s reading turned up a lot of interesting posts, so choosing which ones to include was quite hard. Again, some posts were not originally published this week, but they are interesting enough to include. So let’s begin.

  • Richard Morris’ Developing for Delivery, a Practical Example addresses the difficulties of keeping a database current when it is deployed at multiple sites, in a wide range of versions, and how such difficulties were addressed by Calvi, a provider of telecom invoicing software. From personal experience, this is not an easy thing to achieve, even at a smaller scale, and the article provides interesting advice on how changes in processes and the use of adequate software tools can help reduce the difficulties involved.
  • Alex Bolenok’s NULL in SQL: explaining its behavior is a very good article on the idiosyncrasies of NULL behavior in SQL. The behavior of NULLs is not at all obvious, especially for developers less familiar with the workings of databases, so the article is almost mandatory reading. Most interesting.
  • Microsoft’s SQL Server Development Customer Advisory Team’s SQLCAT’s Guide to: Relational Engine is a free ebook that includes relevant posts from SQLCAT’s blog, from 2005 to 2012. Recommended.
  • Rob Farley’s Spooling in SQL execution plans is a post from a few months ago, but one that clearly shows why spools are used in SQL Server execution plans and why they are not that bad.
  • Alex Popescu’s The premature return to SQL is a response to an article by Jack Clark that was included in last week’s links. The author explains why the premature return to “SQL” is wrong: in his view, this “premature” return is motivated by an attempt to capture financial gains, ignores the fact that many NoSQL products, in spite of not yet having reached technical maturity, have already opened valuable new doors to data, and results, basically, from pressure from database vendors.
  • Uncle Bob Martin’s Dance you Imp’s is a humorous but no less interesting article on Object Relational Mappers and the impedance mismatch between OO and the RDBMS storage used to persist objects. In a very funny way, the author concludes that there is actually no object to relational mapping. A very interesting read.
  • Jimmy Bogard’s Scaling lessons learned–from 0 to 15 million users describes the lessons learned while building a system that has grown to handle up to 15 million users over the last 3 years. It is an excellent read and the lessons can be of use to anyone who develops systems that need to be able to scale (and even to those who don’t have such a need).
  • Sean Hull’s 20 Obstacles to Scalability addresses some key points to consider when designing a web application that will need to scale. Although based on a MySQL-based web app, the advice is general enough to apply to any RDBMS-based web application.
  • James Turner’s What Developers Can Learn from healthcare.gov addresses some of the issues exhibited by the healthcare.gov website, to draw more general lessons regarding load testing, good looks vs. functionality, and validation, that can be of value to any website. A very good read.
  • Chris Andrè Dale’s Why it’s easy being a hacker – A SQL injection case study, although from last January, addresses the issue of SQL injection vulnerabilities and the fact that many easily available teaching materials used by developers may actually contribute to the persistence of such vulnerabilities. It is a very good read and it draws attention to one issue that can, indeed, have negative consequences – the influence of teaching materials on developers and their work. This is an issue that will deserve a future post here.
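
As a small illustration of the SQL injection issue mentioned in the last link, here is a minimal sketch in Python, using the standard sqlite3 module. It is not taken from the linked article: the users table, its contents and the two helper functions are hypothetical, chosen only to contrast the string concatenation often seen in teaching materials with a parameterized query.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the input is passed as data, never parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
payload = "x' OR '1'='1"

leaked = find_user_unsafe(conn, payload)   # the OR clause matches every row
blocked = find_user_safe(conn, payload)    # no user has this literal name

print(len(leaked), "rows leaked by the concatenated query")
print(len(blocked), "rows returned by the parameterized query")
```

With the concatenated query, the payload rewrites the WHERE clause and every row comes back; with the parameterized query, the same payload is just an unusual name that matches nothing.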

 That’s it for this week. Thanks for reading.

Mar 17

Always chasing the next (software) utopia

 

The techniques, methodologies and tools at the disposal of those who build software are always improving. New tools are made available, new approaches are presented as the next best thing and often promoted as such. There is nothing wrong with new techniques, methodologies or tools, quite the contrary. We all benefit from such developments, if and when we can properly determine when to use them, considering their advantages and disadvantages.

Most software systems need to use some kind of data store. Traditionally, this data store was a relational database, but the picture has changed a bit in recent years, with the rise of NoSQL databases. NoSQL databases gained rapid acceptance, especially in high scalability scenarios, but they also seem a very interesting option when addressing another common problem faced by software developers – the object relational impedance mismatch.

Accessing the data store, especially a relational data store, has always been a source of problems for developers. Before NoSQL became another option, Object Relational Mappers, such as NHibernate or Entity Framework, were offered as possible solutions to the impedance mismatch problem, while offering other advantages. ORMs are still used and their use will surely rise, and some ORMs now support NoSQL databases as well. The overall goal of lightening the burden on software developers when accessing data stores remains a valid one, of course.

Some time ago, ORMs started offering Code First Design. While before the ORM would deal with an existing database, Code First Design offered the possibility of leaving the details of choosing and implementing the database structure to the ORM. In theory, this was just another step to set developers free from the tyranny of the storage layer. The problem is that, when using relational databases, the mapping is basically a generic one that does not take into account the specifics of the relational database management system. This shows not only in the missing indexes required for the best possible performance, but also in poor choices of data types for some table fields – it’s common to find situations where basically all string properties are implemented as nvarchar(max) fields in a SQL Server database, which means these fields will be slightly more costly to access than length-limited nvarchar or varchar fields. The fact is that each relational database has specific characteristics that make generic design by a tool less adequate than a design that takes those characteristics into account. This starts with the database design (in Code First strategies) and continues with the usual issues associated with ORM tools when querying the database – that is, with the non-optimal code they generate.
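
To make the data type point concrete, here is a minimal sketch in Python of what a generic mapping does compared with one that is given explicit hints. It is not how Entity Framework or NHibernate work internally: the Customer class, the max_length and unicode metadata keys, and the helper functions are all hypothetical, and the output is simply SQL Server style DDL text.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Customer:
    # Hypothetical entity; the metadata keys are invented for this sketch.
    id: int
    name: str = field(metadata={"max_length": 100})
    country_code: str = field(metadata={"max_length": 2, "unicode": False})
    notes: str = ""


def generic_column(f):
    # Generic mapping: every string becomes NVARCHAR(MAX), hints are ignored.
    sql_type = {int: "INT", str: "NVARCHAR(MAX)"}[f.type]
    return f"{f.name} {sql_type}"


def tuned_column(f):
    # Tuned mapping: use the declared length and encoding when available.
    if f.type is str and "max_length" in f.metadata:
        base = "NVARCHAR" if f.metadata.get("unicode", True) else "VARCHAR"
        return f"{f.name} {base}({f.metadata['max_length']})"
    return generic_column(f)


def create_table(cls, column_fn):
    # Build a CREATE TABLE statement from the dataclass fields.
    cols = ",\n  ".join(column_fn(f) for f in fields(cls))
    return f"CREATE TABLE {cls.__name__} (\n  {cols}\n)"


generic_ddl = create_table(Customer, generic_column)
tuned_ddl = create_table(Customer, tuned_column)

print(generic_ddl)
print(tuned_ddl)
```

The generic version declares name, country_code and notes all as NVARCHAR(MAX), while the tuned version only falls back to it for notes. In SQL Server this matters beyond access cost, since an nvarchar(max) column cannot be used as an index key column.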

Of course, we can always argue that ORMs and Code First Design, like any other tool or strategy, need to be applied in circumstances where their disadvantages are less noticeable and where their advantages recommend them. This is undoubtedly true. However, if we look at beginners’ books on topics such as .Net MVC, or tutorial material offered by Microsoft on its web development technologies, examples are almost always given using Code First approaches. I think this conveys quite a wrong idea to programmers starting out with such technologies, and sometimes the consequences can be serious. I have seen them more than once in real production systems, and they are not always easy to overcome, at least not without some effort.

If there were no performance penalties from Code First Design, or if you could establish some degree of control over the generated code, I confess I would be inclined to take advantage of keeping the data store totally transparent to the code. The reality is totally different, though. I understand all technologies need to start, evolve and mature, but I would also like to see less promotion of technologies that are not yet mature, at least without a clear indication of their caveats. ORMs will improve and Code First implementations will, too. At this time, however, the scope of apps where either is appropriate is more limited than we would be led to think, in my opinion.

Most systems have to be designed, implemented and deployed in scenarios where we have certain performance requirements and limited computational and financial resources. This is quite common in web applications, where I would think most apps are not cloud scale and need not be. The systems we develop need to be able to work within those limitations, and that can be a bit incompatible with our utopias. Naturally, we always need to have utopias we can chase, and they can be a driving force for evolution, but it should never be forgotten that we apply technologies in real systems, and the negative consequences of our own utopias should, preferably, be avoided.

 

Note: The image used in this post was obtained from here.