Subquery in SQL | Correlated Subquery + Complete SQL Subqueries Tutorial

techTFQ

7 chapters7 takeaways12 key terms5 questions

Overview

This video provides a comprehensive tutorial on SQL subqueries, explaining what they are, how they function, and their various applications. It covers the basic concept of nesting one query within another, detailing how SQL processes these statements. The tutorial then delves into three main types of subqueries: scalar, multiple-row, and correlated subqueries, illustrating each with practical examples. It also explores where subqueries can be used within SQL commands, specifically in SELECT, FROM, WHERE, and HAVING clauses, and discusses the performance implications, particularly with correlated subqueries. Finally, it touches upon nested subqueries and the use of the WITH clause for better readability.

How was this?

Save this permanently with flashcards, quizzes, and AI chat

Chapters

A subquery is an SQL query nested inside another SQL query.
It's also known as an inner query, while the outer query is the main query.
SQL processes subqueries by executing the inner query first, then using its result in the outer query.
This allows for dynamic query building without hardcoding values.

Understanding subqueries is crucial for writing more complex and dynamic SQL queries that can retrieve specific data based on intermediate results.

Finding employees whose salary is greater than the average salary of all employees by using a subquery to calculate the average salary within the WHERE clause.

A scalar subquery returns exactly one row and one column.
The previous example of finding employees earning more than the average salary uses a scalar subquery.
Scalar subqueries can be used in WHERE, FROM, and SELECT clauses.
When used in the FROM clause, the subquery's result set is treated like a temporary table.

Scalar subqueries simplify queries by allowing you to use a single calculated value from one query as a condition or data point in another.

Using a subquery that calculates the average salary in the FROM clause, aliasing its result, and then joining it back to the employee table to filter employees earning more than this average.

Multiple-row subqueries return more than one row, potentially with multiple columns.
They are often used with operators like IN, ANY, or ALL.
A common use case is finding records that match a set of values returned by the subquery.
An example is finding employees who earn the highest salary in their respective departments.

These subqueries are essential for comparing a value against a list of results, enabling complex filtering based on aggregated data across different groups.

Finding employees whose department and salary combination matches the maximum salary for each department, using the IN operator with a subquery that returns department and max salary pairs.

This is a specific type of multiple-row subquery that returns only one column but multiple rows.
It's useful for checking if a value exists within a list of possibilities.
An example is finding departments that do not have any employees.
This is achieved by selecting department names from a 'departments' table that are NOT IN the list of department names found in the 'employees' table.

This type of subquery is powerful for identifying discrepancies or missing data between related tables, such as finding entities that exist in one table but not in another.

Selecting department names from a 'department' table where the department name is NOT IN the distinct list of department names from the 'employee' table.

A correlated subquery's execution depends on the outer query; it's related to the outer query's current row.
The subquery is executed once for each row processed by the outer query.
This dependency makes them potentially slower than non-correlated subqueries.
They are useful for comparing a row's value against an aggregate calculated specifically for that row's group (e.g., average salary within the employee's own department).

Correlated subqueries allow for row-by-row comparisons against dynamically calculated values, enabling sophisticated conditional logic that depends on the context of each record.

Finding employees whose salary is greater than the average salary of their specific department by correlating the subquery's department filter with the outer query's current employee's department.

SQL allows for subqueries within subqueries (nested subqueries).
While functional, deeply nested queries can become hard to read and maintain.
The WITH clause (Common Table Expression or CTE) provides a cleaner way to manage complex queries with multiple subqueries.
A CTE defines a temporary, named result set that can be referenced within a single SQL statement.

Using CTEs improves query readability and maintainability by breaking down complex logic into named, manageable steps, especially when the same subquery is used multiple times.

Finding stores whose total sales are better than the average sales across all stores, using a CTE to first calculate total sales per store and then referencing this CTE in the main query to calculate the overall average and filter.

Subqueries can be used in the SELECT, FROM, WHERE, and HAVING clauses.
Using subqueries in the SELECT clause is generally discouraged due to potential performance issues and readability concerns.
Subqueries in the FROM clause are treated as temporary tables (derived tables).
Subqueries in the HAVING clause are useful for filtering aggregated results based on subquery conditions.

Understanding where subqueries can be applied allows you to leverage them in various parts of your SQL statements to construct precise and efficient queries.

Finding stores that have sold more units than the average units sold by all stores, using a subquery within the HAVING clause to compare the sum of units sold per store against the overall average.

Key takeaways

1Subqueries are powerful tools for breaking down complex data retrieval problems into smaller, manageable parts.
2The type of subquery (scalar, multiple-row, correlated) dictates how it's used and processed by the SQL engine.
3Correlated subqueries offer flexibility for row-by-row analysis but can impact performance if not used judiciously.
4Always consider performance implications; often, joins or CTEs can be more efficient alternatives to complex or deeply nested subqueries.
5Subqueries can be placed in WHERE, FROM, SELECT, and HAVING clauses, each serving a different purpose.
6The WITH clause (CTE) is a best practice for improving the readability and maintainability of queries involving multiple subqueries or repeated subquery logic.
7Understanding how SQL processes subqueries (executing inner first vs. row-by-row) is key to debugging and optimizing queries.

Key terms

SubqueryInner QueryOuter QueryScalar SubqueryMultiple-Row SubqueryCorrelated SubqueryNested SubqueryCommon Table Expression (CTE)WITH ClauseAggregate FunctionGROUP BYHAVING Clause

Test your understanding

1What is the fundamental difference in execution between a scalar subquery and a correlated subquery?
2How can a subquery used in the FROM clause be conceptually understood in terms of database objects?
3Why is it generally recommended to avoid using subqueries within the SELECT clause, and what is a common alternative?
4Describe a scenario where a multiple-row subquery would be more appropriate than a scalar subquery.
5What is the primary performance consideration when using correlated subqueries, and how can this be mitigated?