Jamaica trevoir.williams@gmail.com

SQL GROUP BY and Aggregate Functions

Read Post
Share on:

This article guides SQL developers on how to use aggregate functions to perform quick calculations in SQL.

An aggregate function allows you to perform a quick calculations on values being returned from a SELECT query. A big requirement for an aggregate function is the inclusion of a GROUP BY clause in the SELECT query.

Aggregate functions are COUNT(), MAX(), MIN(), SUM() and AVG(). With the exception of AVG (which finds the average), the names of the functions indicate what each is used for. Below we will explore how we use them.

Let us assume we have a database that has tables storing Students and their assignment submissions and Grades. The following queries will create and populate these tables:

Create and Populate Database

create database student_papers
go -- If using MSSQL Server. Remove for MySQL. 
use student_papers;
-- Create tables
CREATE TABLE students (
    id INT PRIMARY KEY,
    [name] NVARCHAR(100)
);  
CREATE TABLE papers (
    id INT PRIMARY KEY,
    title NVARCHAR(100),
    grade INT,
    student_id INT,
    FOREIGN KEY (student_id) REFERENCES students(id)
);
-- Populate Tables
INSERT INTO students (id,[name]) VALUES
(1,'Caleb'), 
(2,'Samantha'), 
(3,'Raj'), 
(4,'Carlos'), 
(5,'Lisa');
INSERT INTO papers (id,student_id, title, grade ) VALUES
(1,1, 'My First Book Report', 60),
(2,1, 'My Second Book Report', 75),
(3,2, 'Russian Lit Through The Ages', 94),
(4,2, 'De Montaigne and The Art of The Essay', 98),
(5,4, 'Borges and Magical Realism', 89);

Using GROUP BY

The GROUP BY clause introduces the ability for us to literally group data and remove repetitions. This is not a replacement for the DISTINCT (which basically just ignores repeated values) keyword though, as it does a double function of keeping track of other values that are being grouped, allowing us to conduct quick math.

In either case, columns that cause variations in the result set will negate the effect. For best results, ensure that only columns with repeating data are selected, in order to truly see how repetitions are eliminated. In the following examples, we will select ‘name’ from the Students table, while joining (Read More About Joins) the Papers table. Then, we will observe the same query with DISTINCT and GROUP BY clauses.

-- SELECT all Records (will show duplicate names)
SELECT [name] FROM Students INNER JOIN Papers ON Students.id = Papers.student_id
-- SELECT with DISTINCT (Remove duplicates)
SELECT DISTINCT [name] FROM Students INNER JOIN Papers ON Students.id = Papers.student_id
-- SELECT with GROUP BY (Remove duplicates)
SELECT [name] FROM Students INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]

End-to-End Database Development Courses

Aggregate Functions

COUNT

Scenario: We want to know the number of students in the class. We don’t need to see each record and count them manually, but just need a number value.

-- Note: The count function can actually be used without a GROUP BY and will return a total count of all records in a single table
SELECT count(name) AS [Number of Students] FROM Students

Result:

Number of Students
5

Scenario: We want to know the number of assignments submitted, per student in the class.

-- Note: By Joining the Papers table, we are seeing each paper that a student turned in. Then, GROUPING BY name will 'clump' all the duplicates, allowing us to count the number of records in the group. We are also using a column alias so that the output column gets a name. 
SELECT [name], count(name) AS [Number of Submissions] FROM Students 
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]

Result:

name Number of Submissions
Caleb 2
Carlos 1
Samantha 2

AVG

Scenario: We want to know each student’s average grade, from all their submissions.

-- Note: We should group by as many columns as we select. This should only be done when we are sure that we have values that can be grouped. The difference in grades will defeat the purpose of grouping and so we only group by the names (which will repeat and thus get grouped) and aggregate the grade column for a single value in the amount of a grade average. 
SELECT [name], avg(grade) AS [Average Grade] FROM Students 
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]

Result:

name Average Grade
Caleb 67
Carlos 89
Samantha 96
-- The following query depicts how the differing grades will break the grouping and not lead to an average grade
SELECT [name], avg(grade) AS [Bad - Average Grade] FROM Students 
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name], grade

Result:

name Bad - Average Grade
Caleb 60
Caleb 75
Carlos 89
Samantha 94
Samantha 98

MAX

Scenario: We want to know each student’s highest grade

SELECT [name], max(grade) AS [Highest Grade] FROM Students 
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]

Results:

name Highest Grade
Caleb 75
Carlos 89
Samantha 98

MIN

Scenario: We want to know each student’s lowest grade

SELECT [name], min(grade) AS [Lowest Grade] FROM Students 
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]

Result:

name Highest Grade
Caleb 60
Carlos 89
Samantha 94

SUM

Scenario: We want to know the sum of the grades per student. (An improbable scenario, but worth the example)

SELECT [name], sum(grade) AS [Total Grade] FROM Students 
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]

Result:

name Total Grade
Caleb 135
Carlos 89
Samantha 192

End-to-End Database Development Courses

AVG, MAX, MIN, COUNT

Scenario: We want to know the average, highest and lowest grades per student, as well as how many submissions they made

-- The following query depicts using multiple aggregate functions in a single select query. Notice, it almost doesn't matter which column is placed inside the aggregate function, as long as the column(s) being grouped are correct. 
SELECT [name], 
    avg(grade) AS [Average Grade],
    max(grade) AS [Highest Grade],
    min(grade) AS [Lowest Grade],
    count(name) AS [Number of Submissions]
FROM Students 
INNER JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]

Result:

name Average Grade Highest Grade Lowest Grade number of Submissions
Caleb 67 75 60 2
Carlos 89 89 89 1
Samantha 96 98 94 2

Scenario: We want to know the average, highest and lowest grades for all students (two students have no grades at all), as well as how many submissions they made. We also want to order by grades in descending order.

-- Building on the previous query, we do a LEFT JOIN instead, to bring back all students and then throw in an order by to sort the by the average grade. 
SELECT [name], 
    avg(grade) AS [Average Grade],
    max(grade) AS [Highest Grade],
    min(grade) AS [Lowest Grade],
    count(name) AS [Number of Submissions]
FROM Students 
LEFT JOIN Papers ON Students.id = Papers.student_id
GROUP BY [name]
ORDER BY [Average Grade] desc

Result:

name Average Grade Highest Grade Lowest Grade Number of Submissions
Caleb 67 75 60 2
Carlos 89 89 89 1
Samantha 96 98 94 2
Lisa NULL NULL NULL 1
Raj NULL NULL NULL 1

Conclusion

Aggregate functions allow us to conduct quick mathematical operations against data coming from one or more tables. It will take some getting used to and trial and error, to fully grasp when and what to GROUP, but this powerful functionality will help us to produce reports.

To learn more on database development and writing more complex queries, you can check out my courses in MySQL Database Development and Microsoft SQL Server Database Development.

Trevoir Williams

Jamaican Software Engineer and Lecturer.