attheoaks.com

Mastering the HAVING Clause in MySQL for Enhanced Query Performance

Written on

Chapter 1: Understanding the HAVING Clause

The HAVING clause in SQL is a familiar concept for many database users, often used alongside the GROUP BY clause to impose conditions on aggregate data. Typically, we think of the WHERE clause for filtering conditions; however, the WHERE clause is limited to filtering rows rather than groups. This is where HAVING comes into play, allowing us to specify conditions for aggregated results.

To illustrate this, let’s consider an example with a table named tb_employee_department, which captures the link between employees and their respective departments. Below is the creation of the table and some sample data:

DROP TABLE IF EXISTS tb_employee_department;

CREATE TABLE tb_employee_department (

id int(8) unsigned NOT NULL AUTO_INCREMENT,

employee_no varchar(12) NOT NULL,

department_no varchar(5) NOT NULL,

department_name varchar(50) NOT NULL,

PRIMARY KEY (id)

) ENGINE=InnoDB;

INSERT INTO tb_employee_department(employee_no, department_no, department_name)

VALUES ('e_001', '001', 'd_1'),

('e_002', '001', 'd_1'),

('e_003', '002', 'd_2'),

('e_004', '002', 'd_2'),

('e_005', '003', 'd_3'),

('e_006', '003', 'd_3'),

('e_007', '003', 'd_3');

To find departments with exactly three employees, we would use the HAVING clause as follows:

SELECT department_no, COUNT(*) AS nums

FROM tb_employee_department

GROUP BY department_no

HAVING nums = 3;

In contrast, without the HAVING clause, the query would return data for all departments:

SELECT department_no, COUNT(*) AS nums

FROM tb_employee_department

GROUP BY department_no;

This shows that the WHERE clause filters rows first, followed by the GROUP BY clause, and finally the HAVING clause, which filters the grouped results.

Visualization of HAVING clause functionality

The HAVING clause can only utilize three specific elements: constants, aggregate functions, and the aggregate keys defined in the GROUP BY clause. If you attempt to use a column not included in the GROUP BY clause within HAVING, it will result in an error. For example:

SELECT department_no, COUNT(*) AS nums

FROM tb_employee_department

GROUP BY department_no

HAVING department_name = "d_1";

This will throw an error indicating that department_name is unknown in the HAVING clause.

Chapter 2: The Power of HAVING

The HAVING clause is essential for understanding SQL's set-oriented nature. Let's delve into specific examples that showcase its usefulness.

2.1 Identifying Missing Identifiers

Consider the tb_employee_department table where some identifiers have been deleted. To check for missing IDs, you could use the following query:

SELECT 'miss' AS is_miss

FROM tb_employee_department

HAVING COUNT(*) <> MAX(id) - MIN(id) + 1;

This query does not require a GROUP BY clause, as it aggregates the entire table into one group.

2.2 Calculating the Mode

Let's examine another table, tb_employee_salary, which records employee salaries:

DROP TABLE IF EXISTS tb_employee_salary;

CREATE TABLE tb_employee_salary (

id int(8) unsigned NOT NULL AUTO_INCREMENT,

name varchar(5) NOT NULL,

salary DECIMAL(15,2) NOT NULL,

PRIMARY KEY (id)

) ENGINE=InnoDB;

INSERT INTO tb_employee_salary(name, salary)

VALUES ('e_001', 100000),

('e_002', 1000),

('e_003', 2000),

('e_004', 3000),

('e_005', 3000),

('e_006', 3000),

('e_007', 9000),

('e_008', 9000),

('e_009', 9000),

('e_010', 2000);

To find the most frequently occurring salary, you can use:

SELECT salary, COUNT(*) AS counts

FROM tb_employee_salary

GROUP BY salary

HAVING COUNT(*) >= (

SELECT MAX(cnt)

FROM (

SELECT COUNT(*) AS cnt

FROM tb_employee_salary

GROUP BY salary

) TMP

);

2.3 Calculating the Median

Calculating the median is vital when averages may be misleading. The median is the middle value of a dataset, and its SQL calculation can be achieved through:

SELECT AVG(DISTINCT salary)

FROM (

SELECT T1.salary

FROM tb_employee_salary T1, tb_employee_salary T2

GROUP BY T1.salary

HAVING SUM(CASE WHEN T2.salary >= T1.salary THEN 1 ELSE 0 END) >= COUNT(*) / 2

AND SUM(CASE WHEN T2.salary <= T1.salary THEN 1 ELSE 0 END) >= COUNT(*) / 2

) TMP;

The key here is ensuring that subsets share a common element, which is accomplished by including the equals sign in the condition.

2.4 Querying Non-NULL Sets

Suppose we have a log table, tb_employee_log, where we want to identify departments with complete submissions. The query would be:

SELECT department_name

FROM tb_employee_log

GROUP BY department_name

HAVING COUNT(*) = COUNT(submit_date);

This approach ensures that we only include departments where all entries have a submission date.

Ownership of Conditions

Consider these two queries:

SELECT department_no, COUNT(*) AS nums

FROM tb_employee_department

GROUP BY department_no

HAVING department_no = "001";

SELECT department_no, COUNT(*) AS nums

FROM tb_employee_department

WHERE department_no = "001"

GROUP BY department_no;

Both yield the same results, but it's recommended to use the WHERE clause for conditions related to row attributes for clarity and performance.

In conclusion, understanding the HAVING clause enhances your ability to leverage SQL's set-oriented capabilities, moving away from procedural methods. If you find these insights valuable, please consider showing your support!

Share the page:

Twitter Facebook Reddit LinkIn

-----------------------

Recent Post:

Navigating Rejection: Key Steps for a Healthier Mindset

Discover essential strategies for dealing with rejection and establishing personal boundaries to foster healthier relationships.

# The Case for Electric Vehicles: Addressing Long-Distance Concerns

A discussion on the practicality of electric vehicles for long trips and their environmental benefits.

Exploring Chamarel Waterfall and the Seven Coloured Earths

Discover the stunning Chamarel Waterfall and the unique Seven Coloured Earths in Mauritius, a must-visit during your stay.