Mastering the HAVING Clause in MySQL for Enhanced Query Performance
Written on
Chapter 1: Understanding the HAVING Clause
The HAVING clause in SQL is a familiar concept for many database users, often used alongside the GROUP BY clause to impose conditions on aggregate data. Typically, we think of the WHERE clause for filtering conditions; however, the WHERE clause is limited to filtering rows rather than groups. This is where HAVING comes into play, allowing us to specify conditions for aggregated results.
To illustrate this, let’s consider an example with a table named tb_employee_department, which captures the link between employees and their respective departments. Below is the creation of the table and some sample data:
DROP TABLE IF EXISTS tb_employee_department;
CREATE TABLE tb_employee_department (
id int(8) unsigned NOT NULL AUTO_INCREMENT,
employee_no varchar(12) NOT NULL,
department_no varchar(5) NOT NULL,
department_name varchar(50) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB;
INSERT INTO tb_employee_department(employee_no, department_no, department_name)
VALUES ('e_001', '001', 'd_1'),
('e_002', '001', 'd_1'),
('e_003', '002', 'd_2'),
('e_004', '002', 'd_2'),
('e_005', '003', 'd_3'),
('e_006', '003', 'd_3'),
('e_007', '003', 'd_3');
To find departments with exactly three employees, we would use the HAVING clause as follows:
SELECT department_no, COUNT(*) AS nums
FROM tb_employee_department
GROUP BY department_no
HAVING nums = 3;
In contrast, without the HAVING clause, the query would return data for all departments:
SELECT department_no, COUNT(*) AS nums
FROM tb_employee_department
GROUP BY department_no;
This shows that the WHERE clause filters rows first, followed by the GROUP BY clause, and finally the HAVING clause, which filters the grouped results.
The HAVING clause can only utilize three specific elements: constants, aggregate functions, and the aggregate keys defined in the GROUP BY clause. If you attempt to use a column not included in the GROUP BY clause within HAVING, it will result in an error. For example:
SELECT department_no, COUNT(*) AS nums
FROM tb_employee_department
GROUP BY department_no
HAVING department_name = "d_1";
This will throw an error indicating that department_name is unknown in the HAVING clause.
Chapter 2: The Power of HAVING
The HAVING clause is essential for understanding SQL's set-oriented nature. Let's delve into specific examples that showcase its usefulness.
2.1 Identifying Missing Identifiers
Consider the tb_employee_department table where some identifiers have been deleted. To check for missing IDs, you could use the following query:
SELECT 'miss' AS is_miss
FROM tb_employee_department
HAVING COUNT(*) <> MAX(id) - MIN(id) + 1;
This query does not require a GROUP BY clause, as it aggregates the entire table into one group.
2.2 Calculating the Mode
Let's examine another table, tb_employee_salary, which records employee salaries:
DROP TABLE IF EXISTS tb_employee_salary;
CREATE TABLE tb_employee_salary (
id int(8) unsigned NOT NULL AUTO_INCREMENT,
name varchar(5) NOT NULL,
salary DECIMAL(15,2) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB;
INSERT INTO tb_employee_salary(name, salary)
VALUES ('e_001', 100000),
('e_002', 1000),
('e_003', 2000),
('e_004', 3000),
('e_005', 3000),
('e_006', 3000),
('e_007', 9000),
('e_008', 9000),
('e_009', 9000),
('e_010', 2000);
To find the most frequently occurring salary, you can use:
SELECT salary, COUNT(*) AS counts
FROM tb_employee_salary
GROUP BY salary
HAVING COUNT(*) >= (
SELECT MAX(cnt)
FROM (
SELECT COUNT(*) AS cnt
FROM tb_employee_salary
GROUP BY salary
) TMP
);
2.3 Calculating the Median
Calculating the median is vital when averages may be misleading. The median is the middle value of a dataset, and its SQL calculation can be achieved through:
SELECT AVG(DISTINCT salary)
FROM (
SELECT T1.salary
FROM tb_employee_salary T1, tb_employee_salary T2
GROUP BY T1.salary
HAVING SUM(CASE WHEN T2.salary >= T1.salary THEN 1 ELSE 0 END) >= COUNT(*) / 2
AND SUM(CASE WHEN T2.salary <= T1.salary THEN 1 ELSE 0 END) >= COUNT(*) / 2
) TMP;
The key here is ensuring that subsets share a common element, which is accomplished by including the equals sign in the condition.
2.4 Querying Non-NULL Sets
Suppose we have a log table, tb_employee_log, where we want to identify departments with complete submissions. The query would be:
SELECT department_name
FROM tb_employee_log
GROUP BY department_name
HAVING COUNT(*) = COUNT(submit_date);
This approach ensures that we only include departments where all entries have a submission date.
Ownership of Conditions
Consider these two queries:
SELECT department_no, COUNT(*) AS nums
FROM tb_employee_department
GROUP BY department_no
HAVING department_no = "001";
SELECT department_no, COUNT(*) AS nums
FROM tb_employee_department
WHERE department_no = "001"
GROUP BY department_no;
Both yield the same results, but it's recommended to use the WHERE clause for conditions related to row attributes for clarity and performance.
In conclusion, understanding the HAVING clause enhances your ability to leverage SQL's set-oriented capabilities, moving away from procedural methods. If you find these insights valuable, please consider showing your support!