Title: SQL Database Management
Date: 2025-10-24 22:09
Status: Child
Tag: SQL Data Analyst Data Analysis
Mandatory and Optional SQL Clauses
Previously Covered Topics
Mandatory Clauses:
SELECT- specifies columns to retrieveFROM- specifies the table
Optional Clauses Learned:
WHERE- filters dataAND- combines multiple conditionsORDER BY- sorts results
GROUP BY Clause
Concept Overview
GROUP BY functions similarly to Pivot Tables in Excel. Just as you would:
- Select data in Excel
- Insert a Pivot Table
- Drag fields to rows/columns/values
- Apply aggregation functions
In SQL, GROUP BY performs the same aggregation operations.
Aggregate Functions
Common aggregate functions include:
COUNT()- counts number of recordsSUM()- calculates totalMIN()- finds minimum valueMAX()- finds maximum valueAVG()- calculates average
Basic GROUP BY Syntax
SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
GROUP BY column_name;Example 1: Rating-wise Film Count
Business Question: How many films do we have for each rating?
SELECT rating, COUNT(film_id) AS number_of_films
FROM film
GROUP BY rating;Important Rule: When you use an aggregate function (COUNT, SUM, MIN, MAX, AVG), you must include a GROUP BY clause, or you will get an error:
Aggregate query without group by expression
Example 2: Multi-level Grouping (Drill-Down)
Business Question: Show film count by rating and rental duration.
SELECT rating,
rental_duration,
COUNT(film_id) AS number_of_films
FROM film
GROUP BY rating, rental_duration;Key Points:
- When grouping by multiple columns, list all grouping columns in
GROUP BY - Order matters: first column creates primary groups, second column creates sub-groups
- Use commas to separate columns (but not after the last column)
- All non-aggregated columns in
SELECTmust appear inGROUP BY
Example 3: Multiple Aggregate Functions
Business Question: For each rating, show:
- Number of films
- Minimum length
- Maximum length
- Total sum of length
- Average length
SELECT rating,
COUNT(film_id) AS number_of_films,
MIN(length) AS min_length,
MAX(length) AS max_length,
SUM(length) AS total_length,
AVG(length) AS avg_length
FROM film
GROUP BY rating;Commenting Out Columns
To temporarily hide a column without deleting it, add two dashes before the column name:
SELECT rating,
COUNT(film_id) AS number_of_films,
MIN(length) AS min_length,
-- MAX(length) AS max_length,
-- SUM(length) AS total_length,
AVG(length) AS avg_length
FROM film
GROUP BY rating;This is similar to hiding columns in Excel.
HAVING Clause
Purpose and Usage
HAVING is used to filter after grouping has occurred. It’s different from WHERE:
WHEREfilters rows before groupingHAVINGfilters groups after aggregation
Key Rules:
HAVINGmust be used withGROUP BYHAVINGcomes afterGROUP BYin the query- Can also be used with
ORDER BY
Syntax
SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
GROUP BY column_name
HAVING condition;Example: Customer Rental Analysis
Business Question: Show customers who have made fewer than 15 rentals.
SELECT customer_id,
COUNT(rental_id) AS total_rentals
FROM rental
GROUP BY customer_id
HAVING COUNT(rental_id) < 15;Explanation:
- First, group all rentals by customer
- Then, filter to show only customers with less than 15 rentals
- This is like applying a filter to a pivot table result
Complex Example with Multiple Clauses
SELECT customer_id,
COUNT(rental_id) AS total_rentals
FROM rental
WHERE rental_date > '2024-01-01'
GROUP BY customer_id
HAVING COUNT(rental_id) >= 15
ORDER BY total_rentals DESC;Clause Order:
SELECT- choose columnsFROM- specify tableWHERE- filter raw dataGROUP BY- aggregate dataHAVING- filter aggregated resultsORDER BY- sort final results
ORDER BY Clause
Purpose
ORDER BY sorts query results in ascending or descending order.
Syntax
SELECT columns
FROM table_name
ORDER BY column_name [ASC|DESC];ASC- ascending order (smallest to largest) - defaultDESC- descending order (largest to smallest)
Examples
Ascending order (lowest to highest):
SELECT customer_id, amount
FROM payment
ORDER BY amount ASC;Descending order (highest to lowest):
SELECT customer_id, amount
FROM payment
ORDER BY amount DESC;Combining with Other Clauses
SELECT customer_id,
COUNT(rental_id) AS total_rentals
FROM rental
GROUP BY customer_id
HAVING COUNT(rental_id) >= 15
ORDER BY total_rentals DESC;CASE Statement
Overview
CASE statement is SQL’s equivalent to Excel’s IF function. It performs logical tests and returns different values based on conditions.
Basic Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN condition3 THEN result3
ELSE default_result
END AS column_aliasMandatory Components:
CASE- starts the statementWHEN- specifies conditionTHEN- specifies result if condition is trueEND- closes the statementAS- provides alias for the new column
Optional Component:
ELSE- default value if no conditions match
Example 1: Film Length Buckets
Business Question: Categorize films by length into time buckets.
Logic:
- Less than 60 minutes: “Under 1 hour”
- 60-90 minutes: “1 to 1.5 hours”
- Greater than 90 minutes: “Over 1.5 hours”
SELECT length,
CASE
WHEN length < 60 THEN 'Under 1 hour'
WHEN length BETWEEN 60 AND 90 THEN '1 to 1.5 hours'
WHEN length > 90 THEN 'Over 1.5 hours'
ELSE 'Check logic'
END AS length_bucket
FROM film;Important Notes:
- The
ELSE 'Check logic'helps identify any values that don’t meet the conditions - Use
DISTINCTto see unique categories:
SELECT DISTINCT
CASE
WHEN length < 60 THEN 'Under 1 hour'
WHEN length BETWEEN 60 AND 90 THEN '1 to 1.5 hours'
WHEN length > 90 THEN 'Over 1.5 hours'
END AS length_bucket
FROM film
ORDER BY length_bucket;Example 2: Customer Store Status
Business Question: Show customer details with their store status (active/inactive) for each store.
Table Structure:
store_id: 1 or 2active: 1 (active) or 0 (inactive)
SELECT first_name,
last_name,
email,
CASE
WHEN store_id = 1 AND active = 1 THEN 'Store 1 Active'
WHEN store_id = 1 AND active = 0 THEN 'Store 1 Inactive'
WHEN store_id = 2 AND active = 1 THEN 'Store 2 Active'
WHEN store_id = 2 AND active = 0 THEN 'Store 2 Inactive'
ELSE 'Check logic'
END AS customer_status
FROM customer;Key Points:
- Can use
ANDto combine multiple conditions - Each
WHENcreates a different category - Results appear as a new calculated column
Practice Questions
Question 1: Staff Directory
Business Question: We need a list of all staff members including their first name, last name, email address, and store identification number where they work.
SELECT first_name,
last_name,
email,
store_id
FROM staff;Question 2: Inventory Count by Store
Business Question: We need separate counts of inventory items held at each of our two stores.
SELECT store_id,
COUNT(inventory_id) AS inventory_count
FROM inventory
GROUP BY store_id;Question 3: Active Customers by Store
Business Question: We need a count of active customers for each store separately.
SELECT store_id,
COUNT(customer_id) AS active_customers
FROM customer
WHERE active = 1
GROUP BY store_id;Explanation:
WHERE active = 1filters for only active customers before grouping- This is more efficient than filtering after grouping
Question 4: Email Count
Business Question: To assess liability in case of data breach, provide a count of all customer email addresses stored in the database.
SELECT COUNT(email) AS total_emails
FROM customer;Key Takeaways
-
Aggregate Functions Require GROUP BY: Whenever you use
COUNT(),SUM(),MIN(),MAX(), orAVG(), you must includeGROUP BY -
Column Rules in GROUP BY: All non-aggregated columns in
SELECTmust appear inGROUP BY -
WHERE vs HAVING:
WHEREfilters before aggregationHAVINGfilters after aggregation
-
CASE Statement Structure: Always include
CASE,WHEN,THEN,END, andAS -
Query Clause Order:
SELECT FROM WHERE GROUP BY HAVING ORDER BY

Next Steps
- Practice the provided questions
- Review GROUP BY with multiple columns
- Experiment with different CASE statement scenarios
- Prepare for the next class on Saturday at 9:30 AM
References