How to Guides

4 easy ways to Optimize SQL Queries and Enhance Security

Barry

The fastest way of improving response times is to Optimize your SQL Queries. Your backend SQL server be it MYSQL or Microsoft is the powerhouse of application performance. Poorly designed tables and queries destroy performance, which can lead to a two-second query taking 30 minutes or more to execute. Whilst a complete rebuild may be on your roadmap in a lot of cases some simple tweaks may radically improve your response times. With poorly designed database elements, your organization’s productivity will be critically reduced, and your applications will run noticeably slower for customers and staff. With some tweaking to Optimize your SQL Queries, you can potentially turn a 30-minute query into a few seconds or less of execution time, you can also enhance the security by reducing the quantity of data returned. Below we have listed the top 4 tweaks you should try to Optimize your SQL Queries.

1. Always Specify Columns to Return

A common mistake for new SQL developers is to use the asterisk to return all columns. The following SQL query returns all rows in the Customer table:

SELECT * FROM Customer;

If there are 20 columns in the Customer table, all 20 columns return to the application calling the query. If the column isn’t needed in the front-end application, you could be transferring several unnecessary gigabytes across the network. You can improve performance by specifying only the columns that you need using the following example:

SELECT first_name, last_name, address FROM Customer;

With the above query, only the customer’s first name, last name and address are returned. Another advantage of this type of optimization is that it adds some cybersecurity to the system. Should an attacker be able to manipulate the query with SQL injection or any other hack, using the asterisks in your query could give the attacker access to the entire table and its data.

2. Optimize SQL Queries by avoiding DISTINCT When Possible

The DISTINCT directive removes duplicates from a data result. For instance, you might have a query that pulls customer data from the Customer table, but you want to remove any duplicates from the data result to avoid inaccurate counts. This often happens when the proper logic hasn’t been applied to a query. For instance, if you query a customer table for all customers with the name “John Smith” in London, you could get several records that match the request. Some developers use DISTINCT to remove duplicates, but this puts unnecessary overhead on the database engine.

The following query would return duplicate customer data that would look like duplicates:

SELECT first_name, last_name, state FROM Customer WHERE last_name = ‘smith’;

You could use the DISTINCT directive and remove all duplicates. The following optimized query is an example:

SELECT DISTINCT first_name, last_name, state FROM Customer WHERE last_name = ‘smith’;

Instead of using DISTINCT, you can add more columns to the result data set to distinguish between each “smith” record:

SELECT DISTINCT first_name, last_name, state, address, phone FROM Customer WHERE last_name = ‘smith’;

Now you have aN optimized SQL query that distinguishes between a customer with a last name of “smith” based on phone and address.

3. Use JOIN on Indexed Columns

You can over-index a table, which will hurt performance, but generally every column used in a major query should have an index. If you query a lot on the last_name field, then this field should have an index. Indexes should also be configured on table columns commonly used in JOIN statements. This issue is often overlooked when building new tables.

In the previous SELECT query, the WHERE clause included a filter based on the last_name column. If this column is used often in queries, it should have an index. The following query is an example of a JOIN statement where a customer is joined to associated orders:

SELECT first_name, last_name, state FROM Customer c JOIN Order o ON c.customer_id = o.customer_id WHERE c.last_name = ‘smith’;

In this JOIN statement, the two tables are joined on the customer_id column. The customer ID is likely the primary key or indexed on the Customer table, but the customer_id column in the Order table is used to join the two tables and should be indexed. JOIN statements on columns that aren’t indexed can affect performance severely. It might not be noticeable initially when the table has very few records stored, but when data grows and more queries JOIN to the customer_id column in the Order table, it could take several minutes for the database to respond with a data result.

4. Use JOIN Instead of Subqueries when optimizing

A subquery is one of the worst designs in a SQL query. With a subquery, the database first performs a lookup running the subquery. Then, it runs the main query and filters out records using the subquery. Always use a JOIN statement instead (See. Using INNER JOIN versus CROSS JOIN). The following query is an example of a SELECT statement using a subquery as a filter:

SELECT first_name, last_name, address FROM Customer c WHERE last_name = ‘smith’ c.customer_id IN (SELECT o.customer_id FROM Order o) ;

The above query is an alternative way of finding all customers with an order. The subquery in parenthesis is first run and returns all records in the Order table. This query by itself is inefficient because the Order table could reach millions of records. With this query, you’re returning potentially millions of records with no filter.

After the Order query runs, the outer query on the Customer table executes and then filters are applied to return the right data result.  This is an inefficient query that can be optimized by turning it into a single query with a JOIN statement. The following query accomplishes the same result but executes faster:

SELECT first_name, last_name, state FROM Customer c JOIN Order o ON c.customer_id = o.customer_id WHERE c.last_name = ‘smith’;

Since the JOIN statement joins tables based on the customer_id column, if no orders are found in the Order table for a customer, no records will be returned for the customer. With this query, only customers with orders will be returned. You can also return all customer records regardless if they have an order using the following query:

SELECT first_name, last_name, state FROM Customer c OUTER JOIN Order o ON c.customer_id = o.customer_id WHERE c.last_name = ‘smith’;

In the above query, any customer that does not have an order will display the customer data but columns in the Order table will return NULL. When optimizing queries, avoid IN subqueries as much as possible.

Conclusion

Database optimization should be performed at the time of design and programming SQL queries. If you need to reevaluate and optimize SQL queries due to poor performance, it can be much more difficult and time-consuming to review and reprogram SQL queries. It also adds the risk of introducing bugs. When optimizing queries, test them thoroughly to ensure there are no bugs introduced, and use monitoring applications to track performance gains.

Take a look at DoordaHost

How did we learn so much about Optimizing SQL Queries? By creating our own hosted data solution! DoordaHost allows you to interrogate our data products in situ, this includes information on all UK addresses via  DoordaProperty and data on over 12 million organisations held in DoordaBiz. We do all the data processing all you need to do is query and extract!