Title: SQL Database Management

Date: 2025-10-24 22:09

Status: Child

Tag: SQL Data Analyst Data Analysis

Mandatory and Optional SQL Clauses

Previously Covered Topics

Mandatory Clauses:

  • SELECT - specifies columns to retrieve
  • FROM - specifies the table

Optional Clauses Learned:

  • WHERE - filters data
  • AND - combines multiple conditions
  • ORDER BY - sorts results

GROUP BY Clause

Concept Overview

GROUP BY functions similarly to Pivot Tables in Excel. Just as you would:

  1. Select data in Excel
  2. Insert a Pivot Table
  3. Drag fields to rows/columns/values
  4. Apply aggregation functions

In SQL, GROUP BY performs the same aggregation operations.

Aggregate Functions

Common aggregate functions include:

  • COUNT() - counts number of records
  • SUM() - calculates total
  • MIN() - finds minimum value
  • MAX() - finds maximum value
  • AVG() - calculates average

Basic GROUP BY Syntax

SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
GROUP BY column_name;

Example 1: Rating-wise Film Count

Business Question: How many films do we have for each rating?

SELECT rating, COUNT(film_id) AS number_of_films
FROM film
GROUP BY rating;

Important Rule: When you use an aggregate function (COUNT, SUM, MIN, MAX, AVG), you must include a GROUP BY clause, or you will get an error:

Aggregate query without group by expression

Example 2: Multi-level Grouping (Drill-Down)

Business Question: Show film count by rating and rental duration.

SELECT rating, 
       rental_duration, 
       COUNT(film_id) AS number_of_films
FROM film
GROUP BY rating, rental_duration;

Key Points:

  • When grouping by multiple columns, list all grouping columns in GROUP BY
  • Order matters: first column creates primary groups, second column creates sub-groups
  • Use commas to separate columns (but not after the last column)
  • All non-aggregated columns in SELECT must appear in GROUP BY

Example 3: Multiple Aggregate Functions

Business Question: For each rating, show:

  • Number of films
  • Minimum length
  • Maximum length
  • Total sum of length
  • Average length
SELECT rating,
       COUNT(film_id) AS number_of_films,
       MIN(length) AS min_length,
       MAX(length) AS max_length,
       SUM(length) AS total_length,
       AVG(length) AS avg_length
FROM film
GROUP BY rating;

Commenting Out Columns

To temporarily hide a column without deleting it, add two dashes before the column name:

SELECT rating,
       COUNT(film_id) AS number_of_films,
       MIN(length) AS min_length,
       -- MAX(length) AS max_length,
       -- SUM(length) AS total_length,
       AVG(length) AS avg_length
FROM film
GROUP BY rating;

This is similar to hiding columns in Excel.

HAVING Clause

Purpose and Usage

HAVING is used to filter after grouping has occurred. It’s different from WHERE:

  • WHERE filters rows before grouping
  • HAVING filters groups after aggregation

Key Rules:

  1. HAVING must be used with GROUP BY
  2. HAVING comes after GROUP BY in the query
  3. Can also be used with ORDER BY

Syntax

SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
GROUP BY column_name
HAVING condition;

Example: Customer Rental Analysis

Business Question: Show customers who have made fewer than 15 rentals.

SELECT customer_id, 
       COUNT(rental_id) AS total_rentals
FROM rental
GROUP BY customer_id
HAVING COUNT(rental_id) < 15;

Explanation:

  • First, group all rentals by customer
  • Then, filter to show only customers with less than 15 rentals
  • This is like applying a filter to a pivot table result

Complex Example with Multiple Clauses

SELECT customer_id,
       COUNT(rental_id) AS total_rentals
FROM rental
WHERE rental_date > '2024-01-01'
GROUP BY customer_id
HAVING COUNT(rental_id) >= 15
ORDER BY total_rentals DESC;

Clause Order:

  1. SELECT - choose columns
  2. FROM - specify table
  3. WHERE - filter raw data
  4. GROUP BY - aggregate data
  5. HAVING - filter aggregated results
  6. ORDER BY - sort final results

ORDER BY Clause

Purpose

ORDER BY sorts query results in ascending or descending order.

Syntax

SELECT columns
FROM table_name
ORDER BY column_name [ASC|DESC];
  • ASC - ascending order (smallest to largest) - default
  • DESC - descending order (largest to smallest)

Examples

Ascending order (lowest to highest):

SELECT customer_id, amount
FROM payment
ORDER BY amount ASC;

Descending order (highest to lowest):

SELECT customer_id, amount
FROM payment
ORDER BY amount DESC;

Combining with Other Clauses

SELECT customer_id,
       COUNT(rental_id) AS total_rentals
FROM rental
GROUP BY customer_id
HAVING COUNT(rental_id) >= 15
ORDER BY total_rentals DESC;

CASE Statement

Overview

CASE statement is SQL’s equivalent to Excel’s IF function. It performs logical tests and returns different values based on conditions.

Basic Syntax

CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    WHEN condition3 THEN result3
    ELSE default_result
END AS column_alias

Mandatory Components:

  • CASE - starts the statement
  • WHEN - specifies condition
  • THEN - specifies result if condition is true
  • END - closes the statement
  • AS - provides alias for the new column

Optional Component:

  • ELSE - default value if no conditions match

Example 1: Film Length Buckets

Business Question: Categorize films by length into time buckets.

Logic:

  • Less than 60 minutes: “Under 1 hour”
  • 60-90 minutes: “1 to 1.5 hours”
  • Greater than 90 minutes: “Over 1.5 hours”
SELECT length,
       CASE
           WHEN length < 60 THEN 'Under 1 hour'
           WHEN length BETWEEN 60 AND 90 THEN '1 to 1.5 hours'
           WHEN length > 90 THEN 'Over 1.5 hours'
           ELSE 'Check logic'
       END AS length_bucket
FROM film;

Important Notes:

  • The ELSE 'Check logic' helps identify any values that don’t meet the conditions
  • Use DISTINCT to see unique categories:
SELECT DISTINCT
       CASE
           WHEN length < 60 THEN 'Under 1 hour'
           WHEN length BETWEEN 60 AND 90 THEN '1 to 1.5 hours'
           WHEN length > 90 THEN 'Over 1.5 hours'
       END AS length_bucket
FROM film
ORDER BY length_bucket;

Example 2: Customer Store Status

Business Question: Show customer details with their store status (active/inactive) for each store.

Table Structure:

  • store_id: 1 or 2
  • active: 1 (active) or 0 (inactive)
SELECT first_name,
       last_name,
       email,
       CASE
           WHEN store_id = 1 AND active = 1 THEN 'Store 1 Active'
           WHEN store_id = 1 AND active = 0 THEN 'Store 1 Inactive'
           WHEN store_id = 2 AND active = 1 THEN 'Store 2 Active'
           WHEN store_id = 2 AND active = 0 THEN 'Store 2 Inactive'
           ELSE 'Check logic'
       END AS customer_status
FROM customer;

Key Points:

  • Can use AND to combine multiple conditions
  • Each WHEN creates a different category
  • Results appear as a new calculated column

Practice Questions

Question 1: Staff Directory

Business Question: We need a list of all staff members including their first name, last name, email address, and store identification number where they work.

SELECT first_name,
       last_name,
       email,
       store_id
FROM staff;

Question 2: Inventory Count by Store

Business Question: We need separate counts of inventory items held at each of our two stores.

SELECT store_id,
       COUNT(inventory_id) AS inventory_count
FROM inventory
GROUP BY store_id;

Question 3: Active Customers by Store

Business Question: We need a count of active customers for each store separately.

SELECT store_id,
       COUNT(customer_id) AS active_customers
FROM customer
WHERE active = 1
GROUP BY store_id;

Explanation:

  • WHERE active = 1 filters for only active customers before grouping
  • This is more efficient than filtering after grouping

Question 4: Email Count

Business Question: To assess liability in case of data breach, provide a count of all customer email addresses stored in the database.

SELECT COUNT(email) AS total_emails
FROM customer;

Key Takeaways

  1. Aggregate Functions Require GROUP BY: Whenever you use COUNT(), SUM(), MIN(), MAX(), or AVG(), you must include GROUP BY

  2. Column Rules in GROUP BY: All non-aggregated columns in SELECT must appear in GROUP BY

  3. WHERE vs HAVING:

    • WHERE filters before aggregation
    • HAVING filters after aggregation
  4. CASE Statement Structure: Always include CASE, WHEN, THEN, END, and AS

  5. Query Clause Order:

    SELECT
    FROM
    WHERE
    GROUP BY
    HAVING
    ORDER BY

Next Steps

  • Practice the provided questions
  • Review GROUP BY with multiple columns
  • Experiment with different CASE statement scenarios
  • Prepare for the next class on Saturday at 9:30 AM

References