In several RDBMS databases, including MySQL, subqueries are often one of the causes for performance issues. Therefore, we have an incentive to avoid them whenever we can and to find alternative ways to implement our requirements.
One of the most popular uses for subselects in SQL is when one needs to fetch the first, last, maximum or minimum row for each group in a table. For example, how would you implement an SQL query that should fetch the employees with the maximum salary for each department from the employees table? Actually, fetching the salary itself is pretty easy, but it becomes more complicated when you want to fetch the employee name (the row data) along with the maximum salary.
Let's look at the table:
Name | Salary | Role |
David | 130,000 | Software Engineer |
John | 140,000 | DevOps Engineer |
Bob | 120,000 | Software Engineer |
Sarah | 130,000 | DevOps Engineer |
Alice | 110,000 | Software Engineer |
Steve | 95,000 | DevOps Engineer |
The common practice will be to write the solution to this problem with a subselect. To avoid that practice, we need to rephrase the problem and deep dive into it.
The new problem: find all employees where there is no less earning employee of the same role.
Now, let's look at the smaller parts of this requirement:
Find all employees:
select * from employees
For each employee, find all less earning people with the same role – here we need to perform two actions: 1) left join the table with itself using the role field. 2) add a condition to make sure the salary is the highest.
SELECT empl1.*, empl2.salary FROM employees AS empl1 LEFT OUTER JOIN employees AS empl2 ON empl2.role = empl1.role AND empl2.salary > empl1.salary
Now, let's look at the result of the second step. You'll notice that the rows with max salary contain a NULL record for the joint salary column.
Name | Salary | Role | Salary (joint) |
David | 130,000 | Software Engineer | NULL |
John | 140,000 | DevOps Engineer | NULL |
Bob | 120,000 | Software Engineer | 130,000 |
Sarah | 130,000 | DevOps Engineer | 140,000 |
Alice | 110,000 | Software Engineer | 120,000 |
Alice | 110,000 | Software Engineer | 130,000 |
Steve | 95,000 | DevOps Engineer | 130,000 |
Steve | 95,000 | DevOps Engineer | 140,000 |
So the next trivial step will be just to filter all other rows by adding another condition:
SELECT empl1.*, empl2.salary FROM employees AS empl1 LEFT OUTER JOIN employees AS empl2 ON empl2.role = empl1.role AND empl2.salary > empl1.salary WHERE empl2.salary IS NULL;
And the final result:
David | 130,000 | Software Engineer | NULL |
John | 140,000 | DevOps Engineer | NULL |
Please note - for this solution to work, you need to make sure you have the correct index in place. In this example, you'll need to create an index that includes the role and salary columns in order to avoid full table scans.
Great query for getting max data in same table...Thanks for your information