Quick SQL Guide

A. Notes from Alex The Analyst Video

This reference covers the following topics and I am taking notes of the non trivial syntaxes and concepts:

Basic:

  • Intro
  • Installing MySQL and Setting up Database
  • Select Statement
  • Where Clause
  • Group By
  • Having vs Where
  • Limit and Aliasing

Intermediate:

  • Joins
  • Unions
  • String Functions
  • Case Statements
  • Subqueries
  • Window Functions

Advanced:

  • CTEs
  • Temp Tables
  • Stored Procedures
  • Triggers and Events
  • Data Cleaning Project
  • Exploratory Data Analysis Project

1. Conditional in SQL: the CASE statement

Value of each row on a column is based on a condition. (Just like assigning a variable using if condition.)

SELECT
column1, column2,
CASE
    WHEN column1 <some condition> THEN
    WHEN age <some other condition> THEN
    ELSE <something> -- gives null for if default case/else is not present
END as new_column
FROM some_table;

2. Aggregation - GROUP BY

SELECT column1, column2, sum(column3)
FROM table
GROUP BY column1, column2
HAVING sum(column3) > somethinng;

3. Subqueries

Subqueries can be done in different ways:

3.1 Subquery in WHERE

SELECT
column1, column2
FROM some_table
WHERE
some_column IN (
    select any_column from any_table
);

3.2 Subquery with SELECT

SELECT
column1, column2,
(select something from any_table) -- must produce same number of columns
FROM some_table;

3.3 Subquery with FROM statement

I think this is more common usage pattern. Get a result and then create a table out of it.

SELECT intermediate_table.any_column 
FROM (
    select --usually some intermediate operations are performed here
    any_column, another_column 
    from any_table 
) as intermediate_table;

This can also be done in better way using CTEs and temp tables.

4. Window Functions

SELECT column1, column2, 
       aggregate_function(column3) OVER (PARTITION BY column1 ORDER BY column2)
FROM table_name;
  • Rolling sum using window function
SELECT i, SUM(i)
OVER(ORDER BY i) AS j FROM
generate_series(1,10) AS i;
i j
1 1
2 3
3 6
4 10
5 15
6 21
7 28
8 36
9 45
10 55

Interestingly, removing the ORDER BY won’t generate the same result:

SELECT i, SUM(i)
OVER() AS j FROM
generate_series(1,10) AS i;
i j
1 55
2 55
3 55
4 55
5 55
6 55
7 55
8 55
9 55
10 55

Now lets try to find rolling sum of odds and evens:

SELECT i,
CASE 
    WHEN i % 2 = 1 THEN 'odd'
    ELSE 'even'
END AS odd_even,
SUM(i) OVER (
    PARTITION BY 
    (CASE WHEN i % 2 = 1 THEN 'odd' ELSE 'even' END)
    ORDER BY i
) AS rolling_sum
FROM generate_series(1,10) AS i;

Rolling Sum

  • GROUP BY collapses rows → returns one row per group.
  • Window Functions keep all rows → each row gets its own calculation.
  • ROW_NUMBER, RANK, DENSE_RANK are commonly used as window functions apart from other aggregation functions

5. CTEs

  • Improves readability
  • You can do more than one intermediate tables on CTE

Example:

WITH cte_table(AVG_COL2) AS (
    SELECT column1, avg(column2) AS avg_column2 
    FROM some_table
    GROUP BY column1
)
SELECT avg(avg_column2) 
FROM cte_table;

Note that CTE results are still not reusable in a different query.

6. Temp Tables

  • Storing intermediate results
  • Just use TEMPORARY keyword to create the table. the CREATE TEMPORARY TABLE
  • Reuse the table (within the same session) as many times as you want

B. From (NEON | PostgreSQL Tutorial)[#neon-postgresql]

1. CREATE FUNCTION statement

  • To create user definted function Syntax:
CREATE [or REPLACE] FUNCTION function_name(param_list)
    RETURNS return_type
    LANGUAGE plpgsql
    AS
$$
DECLARE
    -- variable declaration
BEGIN
    --logic
END;
$$

Example:

SELECT i,
       CASE 
           WHEN i % 2 = 1 THEN 'odd'
           ELSE 'even'
       END AS odd_even,
       SUM(i) 
       OVER (
       		PARTITION BY 
       		(CASE WHEN i % 2 = 1 THEN 'odd' ELSE 'even' END)
       		ORDER BY i
       		) AS rolling_sum
FROM generate_series(1,10) AS i order by i;
create function odd_even(n bigint)
	returns text
	language plpgsql
	as
$$
declare
	result text;
begin
	if n % 2 = 0 then
		result := 'even';
	else
		result := 'odd';
	end if;
	return result;
end;
$$;

SELECT i,
       odd_even(i),
       SUM(i)
       OVER (
       		PARTITION BY 
       		odd_even(i)
       		ORDER BY i
       		) AS rolling_sum
FROM generate_series(1,10) AS i order by i;

The variables in the param list can be IN mode (default), OUT mode or `INOUT’ mode and respectively used to pass a value to function, return a value from a function or pass a value to a function and return an updated value.

Following example creates a function to generate random numbers between 0 and 100 and finds the min, max and avg of it.

CREATE OR REPLACE FUNCTION find_distribution(
    sample_size int,
    OUT max_val int,
    OUT min_val int,
    OUT avg_val int
)
LANGUAGE plpgsql
AS $$
BEGIN
    SELECT
        max(rand_n), min(rand_n), avg(rand_n)
    INTO max_val, min_val, avg_val
    FROM (
        SELECT (random() * 100)::INT AS rand_n
        FROM generate_series(1, sample_size)
    ) AS generated_numbers;
END;
$$;

SELECT * from find_distribution(100);

2. CREATE PROCEDURE statement

You need them because functions can’t execute (start, commit or rollback) a transaction. Copying an example straight out of their tutorial:

create or replace procedure transfer(
   sender int,
   receiver int,
   amount dec
)
language plpgsql
as $$
begin
    -- subtracting the amount from the sender's account
    update accounts
    set balance = balance - amount
    where id = sender;

    -- adding the amount to the receiver's account
    update accounts
    set balance = balance + amount
    where id = receiver;

    commit;
end;$$;

References:


Last Updated: Sunday, 16 March, 2025 14:25 AEDT
Author: Madhav Om