This article guides SQL developers on how to use aggregate functions to perform quick calculations in SQL.
An aggregate function allows you to perform a quick calculations on values being returned from a SELECT query. A big requirement for an aggregate function is the inclusion of a GROUP BY clause in the SELECT query.
Aggregate functions are COUNT(), MAX(), MIN(), SUM() and AVG(). With the exception of AVG (which finds the average), the names of the functions indicate what each is used for. Below we will explore how we use them.
Let us assume we have a database that has tables storing Students and their assignment submissions and Grades. The following queries will create and populate these tables:
create database student_papers
go -- If using MSSQL Server. Remove for MySQL.
use student_papers;
-- Create tables
CREATE TABLE students (
id INT PRIMARY KEY,
[name] NVARCHAR(100)
);
CREATE TABLE papers (
id INT PRIMARY KEY,
title NVARCHAR(100),
grade INT,
student_id INT,
FOREIGN KEY (student_id) REFERENCES students(id)
);
-- Populate Tables
INSERT INTO students (id,[name]) VALUES
(1,'Caleb'),
(2,'Samantha'),
(3,'Raj'),
(4,'Carlos'),
(5,'Lisa');
INSERT INTO papers (id,student_id, title, grade ) VALUES
(1,1, 'My First Book Report', 60),
(2,1, 'My Second Book Report', 75),
(3,2, 'Russian Lit Through The Ages', 94),
(4,2, 'De Montaigne and The Art of The Essay', 98),
(5,4, 'Borges and Magical Realism', 89);
The GROUP BY clause introduces the ability for us to literally group data and remove repetitions. This is not a replacement for the DISTINCT (which basically just ignores repeated values) keyword though, as it does a double function of keeping track of other values that are being grouped, allowing us to conduct quick math.
In either case, columns that cause variations in the result set will negate the effect. For best results, ensure that only columns with repeating data are selected, in order to truly see how repetitions are eliminated. In the following examples, we will select ‘name’ from the Students table, while joining (Read More About Joins) the Papers table. Then, we will observe the same query with DISTINCT and GROUP BY clauses.
-- SELECT all Records (will show duplicate names)
SELECT [name] FROM Students INNER JOIN Papers ON Students.id = Papers.student_id
-- SELECT with DISTINCT (Remove duplicates)
SELECT DISTINCT [name] FROM Students INNER JOIN Papers ON Students.id = Papers.student_id
-- SELECT with GROUP BY (Remove duplicates)
SELECT [name] FROM Students INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
End-to-End Database Development Courses
Scenario: We want to know the number of students in the class. We don’t need to see each record and count them manually, but just need a number value.
-- Note: The count function can actually be used without a GROUP BY and will return a total count of all records in a single table
SELECT count(name) AS [Number of Students] FROM Students
Result:
Number of Students |
---|
5 |
Scenario: We want to know the number of assignments submitted, per student in the class.
-- Note: By Joining the Papers table, we are seeing each paper that a student turned in. Then, GROUPING BY name will 'clump' all the duplicates, allowing us to count the number of records in the group. We are also using a column alias so that the output column gets a name.
SELECT [name], count(name) AS [Number of Submissions] FROM Students
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
Result:
name | Number of Submissions |
---|---|
Caleb | 2 |
Carlos | 1 |
Samantha | 2 |
Scenario: We want to know each student’s average grade, from all their submissions.
-- Note: We should group by as many columns as we select. This should only be done when we are sure that we have values that can be grouped. The difference in grades will defeat the purpose of grouping and so we only group by the names (which will repeat and thus get grouped) and aggregate the grade column for a single value in the amount of a grade average.
SELECT [name], avg(grade) AS [Average Grade] FROM Students
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
Result:
name | Average Grade |
---|---|
Caleb | 67 |
Carlos | 89 |
Samantha | 96 |
-- The following query depicts how the differing grades will break the grouping and not lead to an average grade
SELECT [name], avg(grade) AS [Bad - Average Grade] FROM Students
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name], grade
Result:
name | Bad - Average Grade |
---|---|
Caleb | 60 |
Caleb | 75 |
Carlos | 89 |
Samantha | 94 |
Samantha | 98 |
Scenario: We want to know each student’s highest grade
SELECT [name], max(grade) AS [Highest Grade] FROM Students
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
Results:
name | Highest Grade |
---|---|
Caleb | 75 |
Carlos | 89 |
Samantha | 98 |
Scenario: We want to know each student’s lowest grade
SELECT [name], min(grade) AS [Lowest Grade] FROM Students
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
Result:
name | Highest Grade |
---|---|
Caleb | 60 |
Carlos | 89 |
Samantha | 94 |
Scenario: We want to know the sum of the grades per student. (An improbable scenario, but worth the example)
SELECT [name], sum(grade) AS [Total Grade] FROM Students
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
Result:
name | Total Grade |
---|---|
Caleb | 135 |
Carlos | 89 |
Samantha | 192 |
End-to-End Database Development Courses
Scenario: We want to know the average, highest and lowest grades per student, as well as how many submissions they made
-- The following query depicts using multiple aggregate functions in a single select query. Notice, it almost doesn't matter which column is placed inside the aggregate function, as long as the column(s) being grouped are correct.
SELECT [name],
avg(grade) AS [Average Grade],
max(grade) AS [Highest Grade],
min(grade) AS [Lowest Grade],
count(name) AS [Number of Submissions]
FROM Students
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
Result:
name | Average Grade | Highest Grade | Lowest Grade | number of Submissions |
---|---|---|---|---|
Caleb | 67 | 75 | 60 | 2 |
Carlos | 89 | 89 | 89 | 1 |
Samantha | 96 | 98 | 94 | 2 |
Scenario: We want to know the average, highest and lowest grades for all students (two students have no grades at all), as well as how many submissions they made. We also want to order by grades in descending order.
-- Building on the previous query, we do a LEFT JOIN instead, to bring back all students and then throw in an order by to sort the by the average grade.
SELECT [name],
avg(grade) AS [Average Grade],
max(grade) AS [Highest Grade],
min(grade) AS [Lowest Grade],
count(name) AS [Number of Submissions]
FROM Students
LEFT JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
ORDER BY [Average Grade] desc
Result:
name | Average Grade | Highest Grade | Lowest Grade | Number of Submissions |
---|---|---|---|---|
Caleb | 67 | 75 | 60 | 2 |
Carlos | 89 | 89 | 89 | 1 |
Samantha | 96 | 98 | 94 | 2 |
Lisa | NULL | NULL | NULL | 1 |
Raj | NULL | NULL | NULL | 1 |
Aggregate functions allow us to conduct quick mathematical operations against data coming from one or more tables. It will take some getting used to and trial and error, to fully grasp when and what to GROUP, but this powerful functionality will help us to produce reports.
To learn more on database development and writing more complex queries, you can check out my courses in MySQL Database Development and Microsoft SQL Server Database Development.