Clean Code with SQL Views
One thing that can be both frustrating and freeing is that there is no single correct way to optimally do things in software.
After reading the book Clean Code: A Handbook of Agile Software Craftsmanship by software expert Robert C. Martin, I have noticed when code isn’t “clean” it can be confusing to read and difficult to make changes if there are bugs or there are additional features desired.
I first read this book when learning Java for software applications that connect to a database, and now working as a data scientist I realize that a lot of these clean code principles can apply to the data itself. This is an example of how using views in databases can help create clean code.
Challenges of Messy SQL Queries
There was an application that I was asked to help with to validate results and fix duplicate records, and the way the queries were set up made it difficult to make updates efficiently. For data confidentiality, the following descriptions of the application and databases are fabricated from mock data. The application is designed to select a particular brand of vehicle and show current and historical statistics such as maintenance history for an individual vehicle or component orders across all vehicles of the same brand. With each brand having its own database, the schemas are similar but not the same, and the business logic rules are different for some statistics, such as filtering out specific test groups of vehicles within a brand, or a specific number of days down for maintenance.
These differences meant that for every component on the front-end, a different query for each brand was written. With roughly 30 queries for each of the 6 brands, there were approximately 180 queries for this application to use. A change to a front-end component would require a manual change across 6 different queries, and a change within a brand would require a manual change across 30 different queries. SQL queries can be confusing to read and hard to debug if they have too many joins or filters.
One principle of clean code is to avoid needless repetition of code, as a simpler solution is always better. It would therefore help if the data schemas were the same for each brand and a single query per front-end component was used.
SQL Query Simplification with Views
My interpretation of clean code for SQL queries in the back-end of an application is that there shouldn’t be any specialized knowledge required to write the queries because they should be very simple select statements from descriptive tables, possibly with simple aggregate statistics like minimum and maximum. The application should be able to simply query a table or a view without aliasing or filtering for specific statuses that are particular to each brand.
With the use of views on top of each brand’s set of tables, I was able to easily fix the duplicate records and validate the data because I put all the logic and calculations into these views. The actual view queries are different for each brand but the view names and output schemas are identical between them.
This is one of the most helpful things for the application queries because it consolidated them into 30 simple, brand-agnostic queries. All complicated joins and filters occur once at the view level and don’t need to be duplicated in any application query. The queries are easier to read because the view has a schema with descriptive columns like maintenance_completion_date instead of a combination of logic rules where X is null, for example. The queries are also much easier to maintain because the changes can happen in the view, like a new test group to be filtered out or a different format for maintenance categories comes in, and then no code changes are needed in the actual application. And finally, if a new brand of vehicle is added to the application, the only requirements are to build the view using the same schema as the others and the existing queries will already work.
Overall Benefits
Creating views provides better readability, simpler application queries, easier maintainability, and an overall easier interface between the data and the application which benefits both data experts and software developers.
Learning to write clean code is essential for all data professionals and has been incredibly helpful here at Freya Systems!