SQL
Quick SQL Guide
A. Notes from Alex The Analyst Video
This reference covers the following topics and I am taking notes of the non trivial syntaxes and concepts:
Basic:
- Intro
- Installing MySQL and Setting up Database
- Select Statement
- Where Clause
- Group By
- Having vs Where
- Limit and Aliasing
Intermediate:
- Joins
- Unions
- String Functions
- Case Statements
- Subqueries
- Window Functions
Advanced:
- CTEs
- Temp Tables
- Stored Procedures
- Triggers and Events
- Data Cleaning Project
- Exploratory Data Analysis Project
1. Conditional in SQL: the CASE
statement
Value of each row on a column is based on a condition. (Just like assigning a variable using if condition.)
SELECT
column1, column2,
CASE
WHEN column1 <some condition> THEN
WHEN age <some other condition> THEN
ELSE <something> -- gives null for if default case/else is not present
END as new_column
FROM some_table;
2. Aggregation - GROUP BY
SELECT column1, column2, sum(column3)
FROM table
GROUP BY column1, column2
HAVING sum(column3) > somethinng;
3. Subqueries
Subqueries can be done in different ways:
3.1 Subquery in WHERE
SELECT
column1, column2
FROM some_table
WHERE
some_column IN (
select any_column from any_table
);
3.2 Subquery with SELECT
SELECT
column1, column2,
(select something from any_table) -- must produce same number of columns
FROM some_table;
3.3 Subquery with FROM statement
I think this is more common usage pattern. Get a result and then create a table out of it.
SELECT intermediate_table.any_column
FROM (
select --usually some intermediate operations are performed here
any_column, another_column
from any_table
) as intermediate_table;
This can also be done in better way using CTEs
and temp tables.
4. Window Functions
SELECT column1, column2,
aggregate_function(column3) OVER (PARTITION BY column1 ORDER BY column2)
FROM table_name;
- Rolling sum using window function
SELECT i, SUM(i)
OVER(ORDER BY i) AS j FROM
generate_series(1,10) AS i;
i | j |
---|---|
1 | 1 |
2 | 3 |
3 | 6 |
4 | 10 |
5 | 15 |
6 | 21 |
7 | 28 |
8 | 36 |
9 | 45 |
10 | 55 |
Interestingly, removing the ORDER BY
won’t generate the same result:
SELECT i, SUM(i)
OVER() AS j FROM
generate_series(1,10) AS i;
i | j |
---|---|
1 | 55 |
2 | 55 |
3 | 55 |
4 | 55 |
5 | 55 |
6 | 55 |
7 | 55 |
8 | 55 |
9 | 55 |
10 | 55 |
Now lets try to find rolling sum of odds and evens:
SELECT i,
CASE
WHEN i % 2 = 1 THEN 'odd'
ELSE 'even'
END AS odd_even,
SUM(i) OVER (
PARTITION BY
(CASE WHEN i % 2 = 1 THEN 'odd' ELSE 'even' END)
ORDER BY i
) AS rolling_sum
FROM generate_series(1,10) AS i;
- GROUP BY collapses rows → returns one row per group.
- Window Functions keep all rows → each row gets its own calculation.
ROW_NUMBER
,RANK
,DENSE_RANK
are commonly used as window functions apart from other aggregation functions
5. CTEs
- Improves readability
- You can do more than one intermediate tables on CTE
Example:
WITH cte_table(AVG_COL2) AS (
SELECT column1, avg(column2) AS avg_column2
FROM some_table
GROUP BY column1
)
SELECT avg(avg_column2)
FROM cte_table;
Note that CTE results are still not reusable in a different query.
6. Temp Tables
- Storing intermediate results
- Just use
TEMPORARY
keyword to create the table. theCREATE TEMPORARY TABLE
- Reuse the table (within the same session) as many times as you want
B. From (NEON | PostgreSQL Tutorial)[#neon-postgresql]
1. CREATE FUNCTION
statement
- To create user definted function Syntax:
CREATE [or REPLACE] FUNCTION function_name(param_list)
RETURNS return_type
LANGUAGE plpgsql
AS
$$
DECLARE
-- variable declaration
BEGIN
--logic
END;
$$
Example:
SELECT i,
CASE
WHEN i % 2 = 1 THEN 'odd'
ELSE 'even'
END AS odd_even,
SUM(i)
OVER (
PARTITION BY
(CASE WHEN i % 2 = 1 THEN 'odd' ELSE 'even' END)
ORDER BY i
) AS rolling_sum
FROM generate_series(1,10) AS i order by i;
create function odd_even(n bigint)
returns text
language plpgsql
as
$$
declare
result text;
begin
if n % 2 = 0 then
result := 'even';
else
result := 'odd';
end if;
return result;
end;
$$;
SELECT i,
odd_even(i),
SUM(i)
OVER (
PARTITION BY
odd_even(i)
ORDER BY i
) AS rolling_sum
FROM generate_series(1,10) AS i order by i;
The variables in the param list can be IN
mode (default), OUT
mode or `INOUT’ mode and respectively used to pass a value to function, return a value from a function or pass a value to a function and return an updated value.
Following example creates a function to generate random numbers between 0 and 100 and finds the min, max and avg of it.
CREATE OR REPLACE FUNCTION find_distribution(
sample_size int,
OUT max_val int,
OUT min_val int,
OUT avg_val int
)
LANGUAGE plpgsql
AS $$
BEGIN
SELECT
max(rand_n), min(rand_n), avg(rand_n)
INTO max_val, min_val, avg_val
FROM (
SELECT (random() * 100)::INT AS rand_n
FROM generate_series(1, sample_size)
) AS generated_numbers;
END;
$$;
SELECT * from find_distribution(100);
2. CREATE PROCEDURE
statement
You need them because functions can’t execute (start, commit or rollback) a transaction. Copying an example straight out of their tutorial:
create or replace procedure transfer(
sender int,
receiver int,
amount dec
)
language plpgsql
as $$
begin
-- subtracting the amount from the sender's account
update accounts
set balance = balance - amount
where id = sender;
-- adding the amount to the receiver's account
update accounts
set balance = balance + amount
where id = receiver;
commit;
end;$$;
References:
Last Updated: Sunday, 16 March, 2025 14:25 AEDT
Author: Madhav Om