SQL (Structured Query Language) is the standard language for interacting with relational databases. PostgreSQL (often called "Postgres") is an advanced, open-source relational database system that extends the SQL language while adding many features that safely store and scale complex data workloads.
SQL is a domain-specific language designed for managing data held in relational database management systems. It was developed in the 1970s by IBM researchers and has since become the standard language for database interaction. SQL allows you to:
- Create database structures (tables, views, indexes)
- Insert, update, and delete data
- Query data with sophisticated selection criteria
- Control access to the database and its objects
PostgreSQL is a powerful, open-source object-relational database system with over 30 years of active development. It has earned a strong reputation for reliability, feature robustness, and performance.
Key features of PostgreSQL include:
- ACID (Atomicity, Consistency, Isolation, Durability) compliance for reliable transactions
- Support for various data types beyond standard SQL (arrays, JSON, geometric data)
- Extensibility through user-defined types, functions, and procedural languages
- Multi-version concurrency control (MVCC) for efficient handling of concurrent operations
- Robust security features including SSL support and column/row-level security
- Full text search capabilities
- Spatial data support through PostGIS extension
The SQL language is defined by ISO/IEC standards, with the most recent major revision being SQL:2016. While PostgreSQL aims to conform to the SQL standard, it also offers many extensions and additional features beyond the standard.
Some PostgreSQL-specific features include:
- Custom data types
- Table inheritance
- Rules system
- Event triggers
- Foreign data wrappers
- Advanced indexing methods
Learning SQL with PostgreSQL provides several advantages:
- Transferable skills: Core SQL knowledge is transferable across different database systems.
- Advanced features: PostgreSQL's rich feature set allows you to learn beyond basic SQL.
- Open-source ecosystem: Access to a wealth of community resources, extensions, and tools.
- Industry relevance: PostgreSQL is widely used in industry, from startups to large enterprises.
- Career opportunities: Skills in SQL and PostgreSQL are in high demand in the job market.
PostgreSQL is used by organizations of all sizes and across various industries:
- Technology companies (Apple, Cisco, Fujitsu, Red Hat)
- Financial services (Mastercard, Nasdaq)
- Government agencies
- Research institutions
- Streaming services (Netflix)
- Geospatial applications
- Educational institutions
As we progress through this training, you'll develop practical skills that are directly applicable to real-world database development and management using PostgreSQL.
Before diving into SQL queries and database concepts, we need to set up PostgreSQL on your system. This guide covers installation and initial configuration across different operating systems.
- Download the installer from the official PostgreSQL website
- Run the installer and follow the setup wizard
- Components to install:
- PostgreSQL Server
- pgAdmin (graphical administration tool)
- Command Line Tools
- Stack Builder
- Choose a password for the postgres superuser account
- Keep the default port (5432) unless you have a specific reason to change it
- Select your locale or keep the default
- Click Install to begin the installation process
After installation, you can verify it by:
- Opening the Start menu and finding PostgreSQL in the program list
- Running pgAdmin to connect to your local server
- Using the command prompt to run
psql -U postgres
If you don't have Homebrew installed, install it first:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then install PostgreSQL:
# Install PostgreSQL
brew install postgresql
# Start the PostgreSQL service
brew services start postgresql
# Verify installation
postgres --version
Alternatively, you can download and install Postgres.app which provides a simple, user-friendly installation:
- Download Postgres.app from the official website
- Drag to your Applications folder
- Double-click to start the app
- Click "Initialize" to create a PostgreSQL database
# Update package lists
sudo apt update
# Install PostgreSQL and contrib package with additional utilities
sudo apt install postgresql postgresql-contrib
# Verify installation
psql --version
# Start the service
sudo systemctl start postgresql
# Enable PostgreSQL to start on boot
sudo systemctl enable postgresql
# Install PostgreSQL
sudo dnf install postgresql-server postgresql-contrib
# Initialize the database
sudo postgresql-setup --initdb
# Start the service
sudo systemctl start postgresql
# Enable PostgreSQL to start on boot
sudo systemctl enable postgresql
After installation, you can access PostgreSQL through:
-
psql command-line interface:
# Login as postgres user (on Linux) sudo -u postgres psql # Or on Windows psql -U postgres
-
pgAdmin: A graphical administration tool included with PostgreSQL
-
DBeaver: A cross-platform database tool that supports PostgreSQL
Using psql:
-- Connect to the default postgres database
psql -U postgres
-- Create a new database
CREATE DATABASE mydb;
-- Connect to your new database
\c mydb
-- Verify connection
SELECT current_database();
Using pgAdmin:
- Right-click on "Databases" in the browser panel
- Select "Create" > "Database..."
- Enter a name for your database
- Click "Save"
PostgreSQL uses role-based access control. Roles can be users (with login privileges) or groups.
-- Create a new user (role with login privilege)
CREATE USER myuser WITH PASSWORD 'securepassword';
-- Grant database access
GRANT ALL PRIVILEGES ON DATABASE mydb TO myuser;
-- Connect as the new user
\c mydb myuser
PostgreSQL comes with several command-line utilities:
-
psql: Interactive terminal for working with PostgreSQL
# Connect to a database psql -U username -d database_name -h hostname -p port
-
pg_dump: Backup utility
# Backup a database to a file pg_dump -U username -d database_name -f backup.sql
-
pg_restore: Restore from backup
# Restore from a backup file pg_restore -U username -d database_name -f backup.sql
-
createdb/dropdb: Create or drop databases
# Create a database createdb -U username database_name # Drop a database dropdb -U username database_name
Once connected to PostgreSQL through psql, you can use these helpful commands:
Command | Description |
---|---|
\? |
Show psql command help |
\h [command] |
Get syntax help on a specific SQL command |
\l |
List all databases |
\c database_name |
Connect to a specific database |
\dt |
List all tables in the current database |
\d table_name |
Describe a table's structure |
\du |
List all users and their roles |
\i filename |
Execute SQL commands from a file |
\e |
Open text editor to write multiline queries |
\q |
Quit psql |
PostgreSQL has several important configuration files:
- postgresql.conf: Main configuration file that controls database server parameters
- pg_hba.conf: Client authentication configuration
- pg_ident.conf: User name mapping for external authentication methods
These files are typically located in the PostgreSQL data directory:
- Windows:
C:\Program Files\PostgreSQL\[version]\data
- macOS (Homebrew):
/usr/local/var/postgres
- Linux:
/etc/postgresql/[version]/main
or/var/lib/pgsql/data
- Ensure the PostgreSQL service is running
- Check
pg_hba.conf
for proper client authentication settings - Verify firewall settings aren't blocking port 5432
- Confirm you're using the correct username and password
- Check that the role has appropriate login privileges
- Ensure
pg_hba.conf
is configured correctly
- Change the PostgreSQL port in
postgresql.conf
if port 5432 is already in use - Restart the PostgreSQL service after changing the port
Now that PostgreSQL is installed and configured, you're ready to start creating databases, tables, and writing SQL queries. The subsequent sections will guide you through these fundamental database operations.
A database is an organized collection of structured data, typically stored electronically in a computer system. A database management system (DBMS) like PostgreSQL is software that interacts with the database, applications, and users to capture and analyze data.
PostgreSQL follows the relational model, which organizes data into tables (relations) with rows and columns:
- Tables: Represent entities or concepts (e.g., students, courses)
- Rows: Individual records in a table (e.g., a specific student)
- Columns: Attributes or properties of the entity (e.g., student name, ID)
- Primary Keys: Unique identifiers for each row
- Foreign Keys: References to primary keys in other tables, establishing relationships
A schema is a logical container for database objects (tables, views, functions, etc.). Schemas help organize database objects and control access permissions. PostgreSQL creates a default schema called public
.
-- Create a new schema
CREATE SCHEMA school;
-- Create a table in the schema
CREATE TABLE school.students (
student_id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
-- List schemas in the current database
\dn
Before creating tables, you need to create a database. In PostgreSQL, you can create a database using the CREATE DATABASE
command:
-- Create a new database
CREATE DATABASE student_records;
-- Connect to the database
\c student_records
You can also specify additional parameters:
CREATE DATABASE student_records
WITH
OWNER = postgres
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
TEMPLATE = template0
CONNECTION LIMIT = -1;
PostgreSQL supports a rich set of data types. Understanding these types is crucial for designing efficient database schemas.
Data Type | Description | Size | Range |
---|---|---|---|
SMALLINT | Small integer | 2 bytes | -32,768 to 32,767 |
INTEGER | Regular integer | 4 bytes | -2,147,483,648 to 2,147,483,647 |
BIGINT | Large integer | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
DECIMAL(p,s) | Exact decimal with precision p and scale s | Variable | Up to 131,072 digits before decimal point; up to 16,383 after |
NUMERIC(p,s) | Same as DECIMAL | Variable | Same as DECIMAL |
REAL | Single precision floating-point | 4 bytes | 6 decimal digits precision |
DOUBLE PRECISION | Double precision floating-point | 8 bytes | 15 decimal digits precision |
SERIAL | Auto-incrementing integer | 4 bytes | 1 to 2,147,483,647 |
BIGSERIAL | Auto-incrementing big integer | 8 bytes | 1 to 9,223,372,036,854,775,807 |
Data Type | Description | Size |
---|---|---|
CHAR(n) | Fixed-length character string | n characters |
VARCHAR(n) | Variable-length character string with limit | Up to n characters |
TEXT | Variable-length character string, unlimited length | Unlimited |
Data Type | Description | Size | Range |
---|---|---|---|
DATE | Calendar date (year, month, day) | 4 bytes | 4713 BC to 5874897 AD |
TIME | Time of day (without time zone) | 8 bytes | 00:00:00 to 24:00:00 |
TIMESTAMP | Date and time (without time zone) | 8 bytes | 4713 BC to 294276 AD |
TIMESTAMPTZ | Date and time with time zone | 8 bytes | 4713 BC to 294276 AD |
INTERVAL | Time interval | 16 bytes | -178,000,000 years to 178,000,000 years |
Data Type | Description | Size | Values |
---|---|---|---|
BOOLEAN | Logical Boolean (true/false) | 1 byte | true, false, null |
Data Type | Description | Size |
---|---|---|
UUID | Universally unique identifier | 16 bytes |
JSON | JSON data stored as text | Variable |
JSONB | JSON data stored in binary format | Variable |
ARRAY | Array of values | Variable |
CIDR | IPv4 or IPv6 network address | 7 or 19 bytes |
INET | IPv4 or IPv6 host address | 7 or 19 bytes |
MACADDR | MAC address | 6 bytes |
Data Type | Description |
---|---|
POINT | Point on a plane |
LINE | Infinite line |
LSEG | Line segment |
BOX | Rectangular box |
PATH | Closed and open paths |
POLYGON | Polygon |
CIRCLE | Circle |
Tables are the fundamental structure for storing data in PostgreSQL. The CREATE TABLE
statement defines a new table structure.
CREATE TABLE students (
student_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
date_of_birth DATE,
email VARCHAR(100) UNIQUE,
enrollment_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Constraints enforce rules on data in tables:
CREATE TABLE courses (
course_id SERIAL PRIMARY KEY,
course_code VARCHAR(10) NOT NULL UNIQUE,
course_name VARCHAR(100) NOT NULL,
credits INTEGER CHECK (credits > 0 AND credits <= 6),
department VARCHAR(50),
active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Temporary tables exist only for the duration of a session:
CREATE TEMPORARY TABLE temp_students (
id INTEGER,
name VARCHAR(100)
);
Constraints enforce rules on data to maintain integrity.
Uniquely identifies each record in a table:
CREATE TABLE departments (
department_id SERIAL PRIMARY KEY,
department_name VARCHAR(100) NOT NULL
);
Establishes a relationship between tables:
CREATE TABLE enrollments (
enrollment_id SERIAL PRIMARY KEY,
student_id INTEGER NOT NULL,
course_id INTEGER NOT NULL,
enrollment_date DATE DEFAULT CURRENT_DATE,
FOREIGN KEY (student_id) REFERENCES students (student_id),
FOREIGN KEY (course_id) REFERENCES courses (course_id)
);
Ensures all values in a column are different:
CREATE TABLE instructors (
instructor_id SERIAL PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
phone VARCHAR(20) UNIQUE
);
Ensures values meet a specific condition:
CREATE TABLE grades (
grade_id SERIAL PRIMARY KEY,
student_id INTEGER REFERENCES students(student_id),
course_id INTEGER REFERENCES courses(course_id),
grade NUMERIC(3,1) CHECK (grade >= 0 AND grade <= 100)
);
Ensures a column cannot have NULL values:
CREATE TABLE assignments (
assignment_id SERIAL PRIMARY KEY,
course_id INTEGER NOT NULL REFERENCES courses(course_id),
title VARCHAR(200) NOT NULL,
due_date DATE NOT NULL
);
Specifies a default value for a column:
CREATE TABLE attendance (
attendance_id SERIAL PRIMARY KEY,
student_id INTEGER REFERENCES students(student_id),
course_id INTEGER REFERENCES courses(course_id),
attendance_date DATE DEFAULT CURRENT_DATE,
present BOOLEAN DEFAULT TRUE
);
After creating a table, you might need to modify its structure.
ALTER TABLE students ADD COLUMN phone VARCHAR(20);
ALTER TABLE students DROP COLUMN phone;
ALTER TABLE students RENAME COLUMN email TO contact_email;
ALTER TABLE courses ALTER COLUMN credits TYPE NUMERIC(3,1);
ALTER TABLE students ADD CONSTRAINT email_format CHECK (email LIKE '%@%.%');
ALTER TABLE students DROP CONSTRAINT email_format;
When you no longer need a database object, you can remove it.
DROP TABLE IF EXISTS temp_students;
-- Must be connected to a different database first
DROP DATABASE IF EXISTS old_database;
-
Normalization: Organize data to reduce redundancy and improve integrity
- First Normal Form (1NF): No repeating groups, each column contains atomic values
- Second Normal Form (2NF): 1NF plus all non-key attributes depend on the entire primary key
- Third Normal Form (3NF): 2NF plus all attributes depend directly on the primary key
-
Naming Conventions:
- Use lowercase with underscores for table and column names
- Choose descriptive names
- Be consistent with plural/singular naming (usually singular for entities)
-
Data Types:
- Choose appropriate data types for columns
- Use fixed-length types (CHAR, INTEGER) when values have fixed size
- Use variable-length types (VARCHAR, TEXT) for variable content
-
Constraints:
- Always define primary keys
- Use foreign keys to maintain referential integrity
- Apply NOT NULL constraints where appropriate
- Use CHECK constraints to enforce business rules
-
Documentation:
- Add comments to document tables and columns
COMMENT ON TABLE students IS 'Students enrolled in courses'; COMMENT ON COLUMN students.email IS 'Student email address used for communications';
Data Manipulation Language (DML) consists of SQL commands that handle data operations in a database. These commands allow you to insert, update, delete, and retrieve data from database tables.
The INSERT statement adds new rows to a table.
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);
INSERT INTO students (first_name, last_name, email, date_of_birth)
VALUES ('John', 'Smith', 'john.smith@example.com', '1998-05-15');
If you're inserting values for all columns in the table, you can omit the column list:
INSERT INTO students
VALUES (DEFAULT, 'Maria', 'Garcia', 'maria.garcia@example.com', '1999-08-23', CURRENT_TIMESTAMP);
The DEFAULT
keyword uses the default value for the column (useful for auto-incrementing IDs).
PostgreSQL allows inserting multiple rows in a single statement:
INSERT INTO students (first_name, last_name, email)
VALUES
('Robert', 'Johnson', 'robert.j@example.com'),
('Emily', 'Williams', 'emily.w@example.com'),
('Michael', 'Brown', 'michael.b@example.com');
The RETURNING clause returns data from the rows affected by the INSERT:
INSERT INTO students (first_name, last_name, email)
VALUES ('David', 'Wilson', 'david.w@example.com')
RETURNING student_id, first_name, last_name;
This is useful for getting auto-generated values like IDs without making a separate query.
You can insert data from another table or query:
INSERT INTO active_students (student_id, first_name, last_name, email)
SELECT student_id, first_name, last_name, email
FROM students
WHERE status = 'active';
This is powerful for copying or migrating data between tables.
PostgreSQL supports "upsert" operations (insert or update if exists) using the ON CONFLICT clause:
INSERT INTO students (student_id, first_name, last_name, email)
VALUES (101, 'Thomas', 'Anderson', 'thomas.a@example.com')
ON CONFLICT (student_id)
DO UPDATE SET
first_name = EXCLUDED.first_name,
last_name = EXCLUDED.last_name,
email = EXCLUDED.email;
You can also choose to do nothing if a conflict occurs:
INSERT INTO students (student_id, first_name, last_name, email)
VALUES (101, 'Thomas', 'Anderson', 'thomas.a@example.com')
ON CONFLICT (student_id) DO NOTHING;
The UPDATE statement modifies existing data in a table.
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
UPDATE students
SET email = 'new.email@example.com'
WHERE student_id = 101;
UPDATE students
SET
last_name = 'Smith-Jones',
status = 'inactive',
updated_at = CURRENT_TIMESTAMP
WHERE student_id = 102;
UPDATE courses
SET credits = 4
WHERE department = 'Computer Science' AND credits = 3;
PostgreSQL allows updating data based on values from another table:
UPDATE enrollments
SET grade = new_grades.grade
FROM new_grades
WHERE enrollments.student_id = new_grades.student_id
AND enrollments.course_id = new_grades.course_id;
You can use subqueries in the SET clause and WHERE conditions:
UPDATE students
SET total_credits = (
SELECT SUM(courses.credits)
FROM enrollments
JOIN courses ON enrollments.course_id = courses.course_id
WHERE enrollments.student_id = students.student_id
)
WHERE status = 'active';
Like INSERT, UPDATE can return data from modified rows:
UPDATE students
SET status = 'graduated'
WHERE credits_earned >= 120
RETURNING student_id, first_name, last_name, status;
The DELETE statement removes rows from a table.
DELETE FROM table_name
WHERE condition;
DELETE FROM students
WHERE student_id = 103;
DELETE FROM enrollments
WHERE enrollment_date < '2020-01-01';
DELETE FROM students
WHERE student_id IN (
SELECT student_id
FROM enrollments
GROUP BY student_id
HAVING COUNT(*) = 0
);
This removes students who have no enrollments.
To delete all rows without removing the table structure:
DELETE FROM temporary_logs;
Return information about the deleted rows:
DELETE FROM inactive_accounts
WHERE last_login < CURRENT_DATE - INTERVAL '1 year'
RETURNING account_id, username, last_login;
Delete rows based on a join condition:
DELETE FROM enrollments
USING students
WHERE enrollments.student_id = students.student_id
AND students.status = 'withdrawn';
TRUNCATE removes all rows from a table quickly. It's faster than DELETE for removing all rows because it doesn't scan the table.
TRUNCATE TABLE table_name;
TRUNCATE TABLE temporary_logs, audit_trails;
If tables are connected by foreign keys, you might need the CASCADE option:
TRUNCATE TABLE courses CASCADE;
This truncates the courses table and any table with foreign key references to it.
Reset auto-incrementing columns:
TRUNCATE TABLE log_entries RESTART IDENTITY;
PostgreSQL 15 added support for the SQL standard MERGE statement, which combines INSERT, UPDATE, and DELETE operations.
MERGE INTO target_table AS t
USING source_table AS s
ON t.key_column = s.key_column
WHEN MATCHED THEN
UPDATE SET column1 = s.column1, column2 = s.column2
WHEN NOT MATCHED THEN
INSERT (column1, column2) VALUES (s.column1, s.column2);
MERGE INTO students AS t
USING student_updates AS s
ON t.student_id = s.student_id
WHEN MATCHED AND s.status = 'inactive' THEN
DELETE
WHEN MATCHED THEN
UPDATE SET
first_name = s.first_name,
last_name = s.last_name,
email = s.email,
status = s.status
WHEN NOT MATCHED THEN
INSERT (student_id, first_name, last_name, email, status)
VALUES (s.student_id, s.first_name, s.last_name, s.email, s.status);
This performs different actions based on whether the record exists and its status value.
DML statements operate within transactions, which ensure data integrity.
-- Start a transaction
BEGIN;
-- Perform operations
INSERT INTO students (first_name, last_name) VALUES ('Jane', 'Doe');
UPDATE courses SET active = false WHERE course_id = 101;
-- Commit changes
COMMIT;
-- Or roll back changes
-- ROLLBACK;
For more granular control, you can set savepoints within a transaction:
BEGIN;
INSERT INTO students (first_name, last_name) VALUES ('Alex', 'Smith');
SAVEPOINT after_insert;
UPDATE courses SET active = false WHERE course_id = 101;
-- Oops, wrong update - roll back to savepoint
ROLLBACK TO after_insert;
-- Perform correct update
UPDATE courses SET credits = 4 WHERE course_id = 101;
COMMIT;
-
Use Transactions: Wrap related DML operations in transactions to maintain data integrity.
-
Include WHERE Clauses: Always include a WHERE clause in UPDATE and DELETE statements unless you genuinely want to modify all rows.
-
Test with SELECT First: Before running UPDATE or DELETE, test your WHERE condition with a SELECT to verify which rows will be affected.
-
Use RETURNING for Visibility: The RETURNING clause provides visibility into affected rows without requiring a separate query.
-
Consider Performance for Bulk Operations: For large data operations, consider batch processing to avoid locking issues and memory constraints.
-
Prefer Parameterized Queries: When executing DML from application code, use parameterized queries to prevent SQL injection attacks.
-
Use ON CONFLICT for Upserts: Instead of separate EXISTS checks and conditionals, use PostgreSQL's ON CONFLICT for more efficient upsert operations.
-
Add Constraints: Utilize constraints like UNIQUE, NOT NULL, and FOREIGN KEY to prevent invalid data operations at the database level.
SQL queries allow you to retrieve, filter, and manipulate data from your database. Learning to write effective queries is essential for working with relational databases like PostgreSQL.
The SELECT
statement is the most common SQL command, used to retrieve data from one or more tables.
SELECT column1, column2, ...
FROM table_name;
To retrieve all columns from a table, use the asterisk (*) wildcard:
-- Retrieve all columns and all rows from the students table
SELECT * FROM students;
While convenient for exploration, using SELECT *
in production code is generally not recommended as it:
- Returns unnecessary data, impacting performance
- Makes code less maintainable when table structures change
- Can expose sensitive data
Instead, specify only the columns you need:
-- Retrieve only student names and emails
SELECT first_name, last_name, email FROM students;
You can rename columns in the result set using aliases:
-- Rename columns in the result set
SELECT
first_name AS "First Name",
last_name AS "Last Name",
email AS "Contact Email"
FROM students;
The AS
keyword is optional:
SELECT first_name "First Name", last_name "Last Name" FROM students;
You can include expressions in your SELECT statement:
-- Concatenate first and last names
SELECT
student_id,
first_name || ' ' || last_name AS full_name,
EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM date_of_birth) AS approximate_age
FROM students;
To eliminate duplicates from your results, use the DISTINCT
keyword:
-- Get a list of unique departments
SELECT DISTINCT department FROM courses;
-- Get unique combinations of values
SELECT DISTINCT department, credits FROM courses;
The WHERE
clause filters rows based on specified conditions.
SELECT column1, column2, ...
FROM table_name
WHERE condition;
PostgreSQL supports standard comparison operators:
-- Equal to
SELECT * FROM students WHERE first_name = 'John';
-- Not equal to
SELECT * FROM students WHERE first_name <> 'John'; -- or !=
-- Greater than
SELECT * FROM courses WHERE credits > 3;
-- Less than
SELECT * FROM courses WHERE credits < 3;
-- Greater than or equal to
SELECT * FROM courses WHERE credits >= 3;
-- Less than or equal to
SELECT * FROM courses WHERE credits <= 3;
Combine multiple conditions using logical operators:
-- AND: Both conditions must be true
SELECT * FROM students
WHERE date_of_birth >= '2000-01-01' AND date_of_birth <= '2005-12-31';
-- OR: At least one condition must be true
SELECT * FROM courses
WHERE department = 'Computer Science' OR department = 'Mathematics';
-- NOT: Negates a condition
SELECT * FROM students
WHERE NOT email LIKE '%@gmail.com';
The BETWEEN
operator tests if a value is within a range (inclusive):
-- Equivalent to value >= lower_bound AND value <= upper_bound
SELECT * FROM students
WHERE date_of_birth BETWEEN '2000-01-01' AND '2005-12-31';
SELECT * FROM courses
WHERE credits BETWEEN 3 AND 5;
The IN
operator tests if a value matches any value in a list:
-- Equivalent to value = value1 OR value = value2 OR ...
SELECT * FROM courses
WHERE department IN ('Computer Science', 'Mathematics', 'Physics');
-- Can also use with subqueries (covered later)
SELECT * FROM students
WHERE student_id IN (SELECT student_id FROM enrollments WHERE course_id = 101);
The LIKE
operator performs pattern matching using wildcards:
-- % represents zero or more characters
SELECT * FROM students WHERE email LIKE '%@gmail.com';
SELECT * FROM students WHERE first_name LIKE 'J%'; -- Names starting with J
-- _ represents a single character
SELECT * FROM students WHERE last_name LIKE 'Sm_th'; -- Matches Smith, Smyth
PostgreSQL provides ILIKE
for case-insensitive matching:
-- Case-insensitive matching
SELECT * FROM students WHERE first_name ILIKE 'john'; -- Matches John, JOHN, john
To test for NULL values (missing or undefined data):
-- Find students with no email address
SELECT * FROM students WHERE email IS NULL;
-- Find students with an email address
SELECT * FROM students WHERE email IS NOT NULL;
Remember that NULL
represents an unknown or missing value and cannot be compared using =
or !=
.
The ORDER BY
clause sorts the result set:
-- Sort by a single column (ascending by default)
SELECT * FROM students ORDER BY last_name;
-- Explicitly specify ascending order
SELECT * FROM students ORDER BY last_name ASC;
-- Sort in descending order
SELECT * FROM students ORDER BY date_of_birth DESC;
-- Sort by multiple columns
SELECT * FROM students ORDER BY last_name ASC, first_name ASC;
-- Sort using column position in the result set
SELECT first_name, last_name FROM students ORDER BY 2, 1; -- Order by last_name, then first_name
To restrict the number of rows returned, use the LIMIT
clause:
-- Return only the first 10 rows
SELECT * FROM students LIMIT 10;
-- Combine with ORDER BY for "top N" queries
SELECT * FROM courses ORDER BY credits DESC LIMIT 5; -- Top 5 courses by credits
For pagination, combine LIMIT
with OFFSET
:
-- Skip the first 10 rows, return the next 10
SELECT * FROM students ORDER BY student_id LIMIT 10 OFFSET 10;
-- Page 3 with 10 records per page (returns records 21-30)
SELECT * FROM students ORDER BY student_id LIMIT 10 OFFSET 20;
Aggregate functions perform calculations on a set of values and return a single value:
Counts the number of rows:
-- Count all rows
SELECT COUNT(*) FROM students;
-- Count non-NULL values in a column
SELECT COUNT(email) FROM students;
-- Count distinct values
SELECT COUNT(DISTINCT department) FROM courses;
Calculates the sum of numeric values:
-- Total credits of all courses
SELECT SUM(credits) FROM courses;
-- Sum using a condition
SELECT SUM(credits) FROM courses WHERE department = 'Computer Science';
Calculates the average (mean) of numeric values:
-- Average credits per course
SELECT AVG(credits) FROM courses;
-- Note: AVG ignores NULL values
Find the minimum or maximum values:
-- Youngest student
SELECT MIN(date_of_birth) FROM students;
-- Oldest student
SELECT MAX(date_of_birth) FROM students;
-- Courses with minimum and maximum credits
SELECT MIN(credits), MAX(credits) FROM courses;
In PostgreSQL, you can concatenate values from multiple rows using string_agg
:
-- Get a comma-separated list of course codes
SELECT string_agg(course_code, ', ') FROM courses;
-- Group by department
SELECT department, string_agg(course_code, ', ')
FROM courses
GROUP BY department;
The GROUP BY
clause groups rows that have the same values:
-- Count students born each year
SELECT
EXTRACT(YEAR FROM date_of_birth) AS birth_year,
COUNT(*) AS student_count
FROM students
GROUP BY birth_year
ORDER BY birth_year;
-- Calculate average credits by department
SELECT
department,
COUNT(*) AS course_count,
AVG(credits) AS avg_credits
FROM courses
GROUP BY department
ORDER BY department;
The HAVING
clause filters groups based on aggregate values (similar to WHERE
but for groups):
-- Find departments with more than 5 courses
SELECT
department,
COUNT(*) AS course_count
FROM courses
GROUP BY department
HAVING COUNT(*) > 5
ORDER BY course_count DESC;
-- Find average credits by department for departments with average > 3.0
SELECT
department,
AVG(credits) AS avg_credits
FROM courses
GROUP BY department
HAVING AVG(credits) > 3.0
ORDER BY avg_credits DESC;
PostgreSQL supports the SQL standard FETCH
clause as an alternative to LIMIT
:
-- Equivalent to LIMIT 10
SELECT * FROM students
FETCH FIRST 10 ROWS ONLY;
-- Equivalent to LIMIT 10 OFFSET 20
SELECT * FROM students
OFFSET 20 ROWS
FETCH FIRST 10 ROWS ONLY;
-- WITH TIES returns extra rows that tie with the last row
SELECT * FROM courses
ORDER BY credits DESC
FETCH FIRST 5 ROWS WITH TIES;
COALESCE
returns the first non-NULL value in a list:
-- Replace NULL emails with 'No Email'
SELECT
first_name,
last_name,
COALESCE(email, 'No Email') AS contact
FROM students;
NULLIF
compares two expressions and returns NULL if they're equal, otherwise the first expression:
-- Avoid division by zero
SELECT
course_id,
registered_students,
max_students,
registered_students / NULLIF(max_students, 0) AS fill_rate
FROM course_stats;
-- Find students with gmail or yahoo email addresses
SELECT * FROM students
WHERE email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com';
-- Find students who enrolled in the past 30 days
SELECT * FROM students
WHERE enrollment_date >= CURRENT_DATE - INTERVAL '30 days';
-- Find the course with the highest credits in each department
SELECT department, course_name, credits
FROM (
SELECT
department,
course_name,
credits,
RANK() OVER (PARTITION BY department ORDER BY credits DESC) AS credit_rank
FROM courses
) ranked
WHERE credit_rank = 1;
-- Calculate the age of each student
SELECT
first_name,
last_name,
date_of_birth,
AGE(date_of_birth) AS age,
DATE_PART('year', AGE(date_of_birth)) AS age_years
FROM students;
PostgreSQL offers a rich set of built-in functions that make data manipulation and transformation more efficient. This guide covers the most commonly used PostgreSQL functions with examples and explanations.
- String Functions
- Date and Time Functions
- Numeric Functions
- Aggregate Functions
- Conditional Expressions
- JSON Functions
- Type Conversion Functions
The CONCAT
function combines multiple strings, while CONCAT_WS
(Concatenate With Separator) joins strings with a specified separator.
-- Basic concatenation
SELECT CONCAT('Hello', ' ', 'World'); -- Output: 'Hello World'
-- Concatenate with separator (first argument is the separator)
SELECT CONCAT_WS(', ', 'John', 'Doe', 'New York'); -- Output: 'John, Doe, New York'
Extracts a portion of a string.
-- Extract characters from position 1 for 5 characters
SELECT SUBSTRING('PostgreSQL', 1, 5); -- Output: 'Postg'
-- Extract from position 7 to the end
SELECT SUBSTRING('PostgreSQL', 7); -- Output: 'SQL'
Change the case of strings.
SELECT UPPER('postgresql'); -- Output: 'POSTGRESQL'
SELECT LOWER('PostgreSQL'); -- Output: 'postgresql'
SELECT INITCAP('hello world'); -- Output: 'Hello World'
Return the length of a string.
SELECT LENGTH('PostgreSQL'); -- Output: 10
SELECT CHAR_LENGTH('PostgreSQL'); -- Output: 10
Remove characters (spaces by default) from strings.
SELECT TRIM(' PostgreSQL '); -- Output: 'PostgreSQL'
SELECT LTRIM(' PostgreSQL '); -- Output: 'PostgreSQL '
SELECT RTRIM(' PostgreSQL '); -- Output: ' PostgreSQL'
-- Trim specific characters
SELECT TRIM(BOTH 'x' FROM 'xxxPostgreSQLxxx'); -- Output: 'PostgreSQL'
Replaces all occurrences of a substring with another substring.
SELECT REPLACE('PostgreSQL Database', 'Database', 'DBMS'); -- Output: 'PostgreSQL DBMS'
Find the position of a substring within a string.
SELECT POSITION('SQL' IN 'PostgreSQL'); -- Output: 7
SELECT STRPOS('PostgreSQL', 'SQL'); -- Output: 7
Splits a string on a delimiter and returns the specified field.
-- Split on delimiter and return the specified part (1-indexed)
SELECT SPLIT_PART('2023-03-15', '-', 1); -- Output: '2023'
SELECT SPLIT_PART('2023-03-15', '-', 2); -- Output: '03'
Return the current date, time, or timestamp.
SELECT CURRENT_DATE; -- Output: '2025-03-12'
SELECT CURRENT_TIME; -- Output: '14:30:45.123456+00'
SELECT CURRENT_TIMESTAMP; -- Output: '2025-03-12 14:30:45.123456+00'
Extract components from date/time values.
SELECT DATE_PART('year', TIMESTAMP '2025-03-12 14:30:45'); -- Output: 2025
SELECT DATE_PART('month', TIMESTAMP '2025-03-12 14:30:45'); -- Output: 3
-- EXTRACT is equivalent to DATE_PART
SELECT EXTRACT(YEAR FROM TIMESTAMP '2025-03-12 14:30:45'); -- Output: 2025
SELECT EXTRACT(HOUR FROM TIMESTAMP '2025-03-12 14:30:45'); -- Output: 14
Truncates a timestamp to a specified precision.
SELECT DATE_TRUNC('hour', TIMESTAMP '2025-03-12 14:30:45'); -- Output: '2025-03-12 14:00:00'
SELECT DATE_TRUNC('day', TIMESTAMP '2025-03-12 14:30:45'); -- Output: '2025-03-12 00:00:00'
SELECT DATE_TRUNC('month', TIMESTAMP '2025-03-12 14:30:45'); -- Output: '2025-03-01 00:00:00'
Calculates the difference between two timestamps.
-- Calculate age between two timestamps
SELECT AGE(TIMESTAMP '2025-03-12', TIMESTAMP '2020-01-01'); -- Output: '5 years 2 mons 11 days'
-- Calculate age from current date
SELECT AGE(TIMESTAMP '1990-05-15'); -- Output depends on current date
Add or subtract time intervals.
-- Add 1 day to a date
SELECT DATE '2025-03-12' + INTERVAL '1 day'; -- Output: '2025-03-13 00:00:00'
-- Add 3 hours to a timestamp
SELECT TIMESTAMP '2025-03-12 14:30:00' + INTERVAL '3 hours'; -- Output: '2025-03-12 17:30:00'
-- Complex interval addition
SELECT TIMESTAMP '2025-03-12 14:30:00' + INTERVAL '1 year 2 months 3 days 4 hours'; -- Output: '2026-05-15 18:30:00'
Formats date/time values as strings.
SELECT TO_CHAR(CURRENT_DATE, 'DD-Mon-YYYY'); -- Output: '12-Mar-2025'
SELECT TO_CHAR(TIMESTAMP '2025-03-12 14:30:45', 'HH24:MI:SS'); -- Output: '14:30:45'
SELECT TO_CHAR(TIMESTAMP '2025-03-12 14:30:45', 'Month DD, YYYY HH12:MI:SS AM'); -- Output: 'March 12, 2025 02:30:45 PM'
Basic arithmetic operations and mathematical functions.
SELECT ABS(-42); -- Output: 42
SELECT ROUND(42.4); -- Output: 42
SELECT ROUND(42.5); -- Output: 43
SELECT ROUND(42.4567, 2); -- Output: 42.46
SELECT CEIL(42.1); -- Output: 43
SELECT FLOOR(42.9); -- Output: 42
SELECT TRUNC(42.9); -- Output: 42
SELECT POWER(2, 3); -- Output: 8
SELECT SQRT(25); -- Output: 5
Generate random values.
-- Random value between 0 and 1
SELECT RANDOM(); -- Output: a value between 0 and 1
-- Random integer between 1 and 100
SELECT FLOOR(RANDOM() * 100) + 1; -- Output: a value between 1 and 100
Basic aggregate functions for data analysis.
-- Assuming a table named 'orders' with columns 'order_id', 'amount', etc.
SELECT
COUNT(*) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(amount) AS total_amount,
AVG(amount) AS average_amount,
MIN(amount) AS minimum_amount,
MAX(amount) AS maximum_amount
FROM orders;
Functions for statistical analysis.
SELECT
STDDEV(amount) AS standard_deviation,
VARIANCE(amount) AS variance,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median
FROM orders;
Concatenates string values with a separator.
-- Concatenate all customer names with a comma separator
SELECT STRING_AGG(customer_name, ', ') FROM customers;
-- Concatenate with ordering
SELECT STRING_AGG(customer_name, ', ' ORDER BY created_at) FROM customers;
Aggregates values into an array.
-- Create an array of product IDs ordered by a customer
SELECT
customer_id,
ARRAY_AGG(product_id ORDER BY order_date) AS ordered_products
FROM orders
GROUP BY customer_id;
Implements if-then-else logic.
-- Simple CASE expression
SELECT
product_name,
CASE category
WHEN 'Electronics' THEN 'Tech'
WHEN 'Clothing' THEN 'Apparel'
ELSE 'Other'
END AS simplified_category
FROM products;
-- Searched CASE expression
SELECT
order_id,
CASE
WHEN amount < 50 THEN 'Small'
WHEN amount BETWEEN 50 AND 200 THEN 'Medium'
ELSE 'Large'
END AS order_size
FROM orders;
Returns the first non-NULL value in a list.
-- Replace NULL with a default value
SELECT COALESCE(shipping_address, billing_address, 'No address provided') FROM customers;
-- Use in calculations to handle NULLs
SELECT product_name, COALESCE(discount, 0) * price AS discount_amount FROM products;
Returns NULL if the two specified expressions are equal.
-- Avoid division by zero
SELECT
amount / NULLIF(quantity, 0) AS unit_price
FROM order_items;
Return the largest or smallest value from a list of expressions.
SELECT GREATEST(10, 20, 30); -- Output: 30
SELECT LEAST(10, 20, 30); -- Output: 10
-- Use with columns
SELECT
order_id,
GREATEST(standard_delivery_date, express_delivery_date) AS latest_possible_delivery
FROM shipments;
Create and manipulate JSON data.
-- Create a JSON object
SELECT JSON_BUILD_OBJECT('name', 'John', 'age', 30); -- Output: {"name": "John", "age": 30}
-- Extract a field from a JSON object
SELECT customer_data->'name' FROM customers; -- Returns the name as JSON
SELECT customer_data->>'name' FROM customers; -- Returns the name as text
-- Extract a nested field
SELECT customer_data->'address'->>'city' FROM customers;
Work with JSON arrays.
-- Create a JSON array
SELECT JSON_BUILD_ARRAY(1, 2, 3, 'text'); -- Output: [1, 2, 3, "text"]
-- Extract an element from a JSON array
SELECT product_tags->0 FROM products; -- First element as JSON
SELECT product_tags->>0 FROM products; -- First element as text
Expand JSON objects and arrays.
-- Expand a JSON object into key-value pairs
SELECT * FROM JSON_EACH('{"name": "John", "age": 30}');
-- Expand a JSON array into rows
SELECT * FROM JSON_ARRAY_ELEMENTS('[1, 2, 3, 4]');
Convert values from one data type to another.
-- Using CAST
SELECT CAST('42' AS INTEGER); -- Output: 42
SELECT CAST(42 AS TEXT); -- Output: '42'
-- Using :: operator (PostgreSQL-specific syntax)
SELECT '42'::INTEGER; -- Output: 42
SELECT 42::TEXT; -- Output: '42'
-- Date conversions
SELECT CAST('2025-03-12' AS DATE); -- Output: 2025-03-12
SELECT '2025-03-12'::DATE; -- Output: 2025-03-12
Format-aware conversions.
-- String to number with format
SELECT TO_NUMBER('1,234.56', '9,999.99'); -- Output: 1234.56
-- Number to string with format
SELECT TO_CHAR(1234.56, '9,999.99'); -- Output: ' 1,234.56'
-- String to date with format
SELECT TO_DATE('12-Mar-2025', 'DD-Mon-YYYY'); -- Output: 2025-03-12
-- String to timestamp with format
SELECT TO_TIMESTAMP('12-Mar-2025 14:30:45', 'DD-Mon-YYYY HH24:MI:SS'); -- Output: 2025-03-12 14:30:45
-
Choose the Right Function: Select functions that match your specific needs to avoid unnecessary transformations.
-
Index Considerations: Be aware that using functions on indexed columns in WHERE clauses can prevent index usage. Consider creating functional indexes when necessary.
-
Performance: Complex functions or operations on large datasets may impact performance. Test and optimize queries with large data volumes.
-
NULL Handling: Always consider how your functions handle NULL values and use COALESCE or NULLIF when needed.
-
Type Safety: Ensure proper type casting to avoid unexpected results or errors.
-
Version Compatibility: Some functions may vary across PostgreSQL versions. Check the documentation for your specific version.
These functions form the backbone of PostgreSQL data manipulation capabilities. By mastering them, you'll be able to transform, analyze, and extract insights from your data more effectively.
Joins are powerful SQL operations that connect data between different tables in a relational database. They're a fundamental concept in relational database design and allow you to retrieve related data from multiple tables in a single query.
Before diving into joins, it's important to understand the types of relationships between tables:
-
One-to-One: Each row in Table A relates to exactly one row in Table B.
- Example: A student and their student ID card
-
One-to-Many: Each row in Table A relates to multiple rows in Table B.
- Example: A course and its enrollments
-
Many-to-Many: Each row in Table A relates to multiple rows in Table B, and vice versa.
- Example: Students and courses (a student takes many courses, and a course has many students)
In relational databases, these relationships are typically implemented using foreign keys.
PostgreSQL supports various types of joins, each serving different purposes:
An inner join returns rows only when there's a match in both tables. It's the most common join type.
SELECT students.first_name, students.last_name, courses.course_name
FROM students
INNER JOIN enrollments ON students.student_id = enrollments.student_id
INNER JOIN courses ON enrollments.course_id = courses.course_id;
You can simplify this by using table aliases:
SELECT s.first_name, s.last_name, c.course_name
FROM students s
INNER JOIN enrollments e ON s.student_id = e.student_id
INNER JOIN courses c ON e.course_id = c.course_id;
Visual representation of INNER JOIN:
Table A Table B
1 1 ← These rows are returned
2 2 ← These rows are returned
3 (No match, not returned)
4 (No match, not returned)
A left join returns all rows from the left table and matching rows from the right table. When there's no match, NULL values are used for columns from the right table.
SELECT s.first_name, s.last_name, c.course_name
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
LEFT JOIN courses c ON e.course_id = c.course_id;
This returns all students, even those not enrolled in any courses (their course_name will be NULL).
Visual representation of LEFT JOIN:
Table A Table B
1 1 ← Match, returned with values from both tables
2 2 ← Match, returned with values from both tables
3 ← No match, returned with NULLs for Table B columns
4 (Not included)
A right join is the opposite of a left join. It returns all rows from the right table and matching rows from the left table.
SELECT s.first_name, s.last_name, c.course_name
FROM students s
RIGHT JOIN enrollments e ON s.student_id = e.student_id
RIGHT JOIN courses c ON e.course_id = c.course_id;
This returns all courses, even those without student enrollments.
Visual representation of RIGHT JOIN:
Table A Table B
1 1 ← Match, returned with values from both tables
2 2 ← Match, returned with values from both tables
4 ← No match, returned with NULLs for Table A columns
3 (Not included)
A full join returns all rows when there's a match in either the left or right table. It combines the results of both LEFT and RIGHT joins.
SELECT s.first_name, s.last_name, c.course_name
FROM students s
FULL JOIN enrollments e ON s.student_id = e.student_id
FULL JOIN courses c ON e.course_id = c.course_id;
This includes all students and all courses, regardless of whether there are enrollments connecting them.
Visual representation of FULL JOIN:
Table A Table B
1 1 ← Match, returned with values from both tables
2 2 ← Match, returned with values from both tables
3 ← No match, returned with NULLs for Table B columns
4 ← No match, returned with NULLs for Table A columns
A cross join creates a Cartesian product, combining each row from the first table with every row from the second table. No join condition is needed.
SELECT s.first_name, c.course_name
FROM students s
CROSS JOIN courses c;
If students has 10 rows and courses has 5 rows, the result will have 50 rows.
Visual representation of CROSS JOIN:
Table A Table B Result
1 A 1,A
1 B 1,B
2 A 2,A
2 B 2,B
A self join is when a table is joined with itself. This is useful for hierarchical or related data within the same table.
-- Find all pairs of students from the same city
SELECT s1.first_name AS student1, s2.first_name AS student2, s1.city
FROM students s1
JOIN students s2 ON s1.city = s2.city AND s1.student_id < s2.student_id;
The s1.student_id < s2.student_id
condition prevents duplicate pairs and self-matches.
A natural join automatically joins tables based on columns with the same name. While convenient, it can be unpredictable if tables share column names that aren't intended for joining.
SELECT * FROM students NATURAL JOIN student_details;
This automatically joins on all columns with the same name in both tables.
The syntax shown above uses the modern ANSI SQL join syntax. An older syntax puts the join conditions in the WHERE clause:
-- Modern syntax
SELECT s.first_name, c.course_name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id;
-- Older syntax
SELECT s.first_name, c.course_name
FROM students s, enrollments e, courses c
WHERE s.student_id = e.student_id
AND e.course_id = c.course_id;
The modern syntax is preferred because it clearly separates join conditions from filtering conditions.
When join columns have the same name, you can use the USING clause to simplify:
-- With ON clause
SELECT students.first_name, courses.course_name
FROM students
JOIN enrollments ON students.student_id = enrollments.student_id
JOIN courses ON enrollments.course_id = courses.course_id;
-- With USING clause
SELECT students.first_name, courses.course_name
FROM students
JOIN enrollments USING (student_id)
JOIN courses USING (course_id);
The USING clause works only when the join columns have identical names.
You can add extra conditions to your join to filter relationships:
SELECT s.first_name, c.course_name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
AND e.enrollment_date > '2023-01-01'
JOIN courses c ON e.course_id = c.course_id;
You can join the same table multiple times to explore different relationships:
-- Find courses that share prerequisites
SELECT c1.course_name AS course, c2.course_name AS shares_prereq_with
FROM courses c1
JOIN course_prerequisites cp1 ON c1.course_id = cp1.course_id
JOIN course_prerequisites cp2 ON cp1.prerequisite_id = cp2.prerequisite_id
JOIN courses c2 ON cp2.course_id = c2.course_id
WHERE c1.course_id < c2.course_id;
PostgreSQL supports LATERAL joins, which allow the right-hand side of the join to reference columns from tables on the left:
-- For each student, find their top 3 highest grades
SELECT s.first_name, s.last_name, c.course_name, e.grade
FROM students s
CROSS JOIN LATERAL (
SELECT c.course_name, e.grade
FROM enrollments e
JOIN courses c ON e.course_id = c.course_id
WHERE e.student_id = s.student_id
ORDER BY e.grade DESC
LIMIT 3
) AS c;
This is powerful for correlated subqueries that need access to the outer table.
Not all joins are based on equality conditions. For example:
-- Match students with courses appropriate for their grade level
SELECT s.first_name, c.course_name
FROM students s
JOIN courses c ON s.grade_level BETWEEN c.min_grade_level AND c.max_grade_level;
PostgreSQL supports full-text search, which can be used in joins:
SELECT s.first_name, s.last_name, d.document_title
FROM students s
JOIN documents d ON to_tsvector(d.document_text) @@ to_tsquery(s.first_name);
This would find documents containing students' first names.
Joins can impact query performance, especially with large tables:
-
Indexing: Ensure joined columns are properly indexed.
-
Join Order: PostgreSQL's query planner generally optimizes join order, but sometimes explicit ordering through CTEs or subqueries can help.
-
Statistics: Keep your database statistics updated (
ANALYZE
command) to help the query planner make good decisions. -
Query Analysis: Use
EXPLAIN
orEXPLAIN ANALYZE
to understand how joins are executed.
EXPLAIN ANALYZE
SELECT s.first_name, c.course_name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id;
Joining tables with one-to-many relationships can produce duplicate rows:
-- May produce duplicates if a student has multiple enrollments
SELECT s.first_name, s.last_name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id;
Solution: Use DISTINCT or GROUP BY:
SELECT DISTINCT s.first_name, s.last_name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id;
If you're missing expected rows, check that you're using the right join type. An INNER JOIN might exclude rows that a LEFT JOIN would include.
For slow joins:
- Ensure joined columns are indexed
- Consider denormalizing if appropriate
- Use
EXPLAIN ANALYZE
to identify bottlenecks - Consider partitioning large tables
- Filter data early in the query to reduce join size
For organizational charts or categories:
WITH RECURSIVE employee_hierarchy AS (
-- Base case: top-level employees (no manager)
SELECT employee_id, name, manager_id, 0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees with managers
SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, name;
-- For each student, count enrollments and calculate GPA
SELECT
s.student_id,
s.first_name,
s.last_name,
COUNT(e.enrollment_id) AS num_courses,
AVG(e.grade) AS gpa
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
GROUP BY s.student_id, s.first_name, s.last_name;
-- Find students with no enrollments
SELECT s.*
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
WHERE e.enrollment_id IS NULL;
Many-to-many relationships typically use a junction table:
-- Find all students in a specific course
SELECT s.first_name, s.last_name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id
WHERE c.course_code = 'CS101';
Joins are essential for working with relational data across multiple tables. PostgreSQL offers a rich set of join types and capabilities to handle various relationship scenarios. Understanding when to use each type of join and how to optimize their performance will significantly enhance your ability to work effectively with relational databases.
PostgreSQL is a powerful open-source object-relational database system with over 35 years of active development. It has earned a strong reputation for reliability, feature robustness, and performance. This document explores three crucial PostgreSQL features that help manage complex queries and database operations: views, common table expressions (CTEs), and stored procedures.
A view in PostgreSQL is a virtual table representing the result of a stored query. Views don't store data themselves but provide a way to encapsulate complex queries and present them as if they were tables.
The basic syntax for creating a view is:
CREATE [OR REPLACE] VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Let's create a sample database to work with:
-- Create sample tables
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
amount DECIMAL(10, 2),
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert sample data
INSERT INTO customers (name, email) VALUES
('John Doe', 'john@example.com'),
('Jane Smith', 'jane@example.com'),
('Michael Johnson', 'michael@example.com');
INSERT INTO orders (customer_id, amount) VALUES
(1, 150.50),
(2, 245.75),
(1, 50.25),
(3, 300.00),
(2, 125.40);
Now, let's create a simple view that joins these tables:
CREATE VIEW customer_orders AS
SELECT
c.customer_id,
c.name,
c.email,
o.order_id,
o.amount,
o.order_date
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id;
After creating this view, you can query it like a regular table:
SELECT * FROM customer_orders;
SELECT name, SUM(amount) FROM customer_orders GROUP BY name;
Unlike regular views, materialized views physically store the result set. This can significantly improve performance for complex queries, especially those involving aggregations or joins, at the cost of data freshness.
CREATE MATERIALIZED VIEW customer_order_summary AS
SELECT
c.customer_id,
c.name,
COUNT(o.order_id) AS order_count,
SUM(o.amount) AS total_spend
FROM
customers c
LEFT JOIN
orders o ON c.customer_id = o.customer_id
GROUP BY
c.customer_id, c.name;
Materialized views need to be manually refreshed to update their data:
REFRESH MATERIALIZED VIEW customer_order_summary;
-- With CONCURRENTLY option (requires unique index)
CREATE UNIQUE INDEX customer_id_idx ON customer_order_summary (customer_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_summary;
In PostgreSQL, some views are automatically updatable if they meet certain conditions:
- The view must not contain WITH, DISTINCT, GROUP BY, HAVING, LIMIT, or OFFSET clauses
- The view must not use set operations (UNION, INTERSECT, EXCEPT)
- The FROM list must contain exactly one entry which must be a table
Example of an updatable view:
CREATE VIEW high_value_customers AS
SELECT customer_id, name, email
FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders WHERE amount > 200
);
-- This works because the view meets updatable criteria
UPDATE high_value_customers SET email = 'michael.new@example.com' WHERE customer_id = 3;
For more complex views, you can define INSTEAD OF triggers to handle updates:
CREATE OR REPLACE FUNCTION update_customer_orders()
RETURNS TRIGGER AS $$
BEGIN
UPDATE customers SET name = NEW.name, email = NEW.email
WHERE customer_id = NEW.customer_id;
UPDATE orders SET amount = NEW.amount
WHERE order_id = NEW.order_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER customer_orders_update
INSTEAD OF UPDATE ON customer_orders
FOR EACH ROW EXECUTE FUNCTION update_customer_orders();
- Use descriptive names that indicate the purpose of the view
- Document views with comments:
COMMENT ON VIEW customer_orders IS 'Joins customers and their orders for reporting';
- Consider performance implications for views joining multiple large tables
- Use materialized views for expensive computations that don't need real-time data
- Create indexes on materialized views to speed up queries
Common Table Expressions (CTEs) provide a way to write auxiliary statements for use in a larger query. Think of them as temporary result sets or views that exist only for the duration of the query.
The basic syntax for a CTE is:
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;
A simple example:
WITH customer_spending AS (
SELECT
customer_id,
SUM(amount) AS total_spent
FROM
orders
GROUP BY
customer_id
)
SELECT
c.name,
COALESCE(cs.total_spent, 0) AS total_spent
FROM
customers c
LEFT JOIN
customer_spending cs ON c.customer_id = cs.customer_id
ORDER BY
total_spent DESC;
You can also define multiple CTEs in a single query:
WITH order_stats AS (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent,
AVG(amount) AS avg_order_value
FROM
orders
GROUP BY
customer_id
),
customer_categories AS (
SELECT
customer_id,
CASE
WHEN total_spent > 300 THEN 'Premium'
WHEN total_spent > 100 THEN 'Regular'
ELSE 'Basic'
END AS category
FROM
order_stats
)
SELECT
c.name,
os.order_count,
os.total_spent,
os.avg_order_value,
cc.category
FROM
customers c
JOIN
order_stats os ON c.customer_id = os.customer_id
JOIN
customer_categories cc ON c.customer_id = cc.customer_id;
One of the most powerful features of CTEs is recursion, which allows you to solve hierarchical or graph-traversal problems:
-- Create a table for hierarchical data
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
name VARCHAR(100),
manager_id INT REFERENCES employees(employee_id)
);
-- Insert sample data
INSERT INTO employees (employee_id, name, manager_id) VALUES
(1, 'CEO', NULL),
(2, 'CTO', 1),
(3, 'CFO', 1),
(4, 'Lead Developer', 2),
(5, 'Developer 1', 4),
(6, 'Developer 2', 4),
(7, 'Accountant', 3);
-- Recursive CTE to traverse the hierarchy
WITH RECURSIVE org_hierarchy AS (
-- Base case: employees with no manager (root)
SELECT
employee_id,
name,
manager_id,
0 AS level,
name AS path
FROM
employees
WHERE
manager_id IS NULL
UNION ALL
-- Recursive case: find employees managed by previous iteration
SELECT
e.employee_id,
e.name,
e.manager_id,
oh.level + 1,
oh.path || ' > ' || e.name
FROM
employees e
JOIN
org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT
employee_id,
level,
REPEAT(' ', level) || name AS employee,
path
FROM
org_hierarchy
ORDER BY
path;
CTEs can be used for data modification (INSERT, UPDATE, DELETE) operations:
-- Update customer emails based on certain conditions
WITH customers_to_update AS (
SELECT
customer_id,
REPLACE(email, '@example.com', '@newdomain.com') AS new_email
FROM
customers
WHERE
email LIKE '%@example.com'
)
UPDATE customers c
SET email = ctu.new_email
FROM customers_to_update ctu
WHERE c.customer_id = ctu.customer_id;
-- Delete orders and return information about deleted records
WITH deleted_orders AS (
DELETE FROM orders
WHERE amount < 100
RETURNING *
)
SELECT
customer_id,
COUNT(*) AS small_orders_removed,
SUM(amount) AS total_removed
FROM
deleted_orders
GROUP BY
customer_id;
While subqueries can often achieve the same result as CTEs, CTEs offer several advantages:
- Improved readability: CTEs make complex queries more readable by breaking them into named blocks
- Reusability within the query: A CTE can be referenced multiple times in the same query
- Recursion: Only CTEs support recursive queries
- Optimization: The PostgreSQL optimizer may handle CTEs differently from subqueries
Example comparing CTE vs subquery approach:
-- Using a CTE
WITH high_value_orders AS (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE amount > 200
GROUP BY customer_id
)
SELECT
c.name,
COALESCE(hvo.order_count, 0) AS high_value_orders
FROM
customers c
LEFT JOIN
high_value_orders hvo ON c.customer_id = hvo.customer_id;
-- Using a subquery
SELECT
c.name,
COALESCE(
(SELECT COUNT(*)
FROM orders
WHERE customer_id = c.customer_id AND amount > 200
), 0
) AS high_value_orders
FROM
customers c;
Stored procedures in PostgreSQL are database objects containing a set of SQL statements that can be executed as a single unit. PostgreSQL added proper stored procedure support in version 11 with the introduction of CREATE PROCEDURE
.
Basic syntax for creating a stored procedure:
CREATE [OR REPLACE] PROCEDURE procedure_name(parameter_list)
LANGUAGE language_name
AS $$
-- procedure body
$$;
Let's create a simple stored procedure:
CREATE OR REPLACE PROCEDURE create_new_customer(
p_name VARCHAR,
p_email VARCHAR
)
LANGUAGE plpgsql
AS $$
BEGIN
INSERT INTO customers (name, email)
VALUES (p_name, p_email);
COMMIT;
END;
$$;
-- Call the procedure
CALL create_new_customer('Alice Brown', 'alice@example.com');
A more complex procedure with transaction management:
CREATE OR REPLACE PROCEDURE place_order(
p_customer_id INT,
p_amount DECIMAL
)
LANGUAGE plpgsql
AS $$
DECLARE
v_customer_exists BOOLEAN;
BEGIN
-- Check if customer exists
SELECT EXISTS(SELECT 1 FROM customers WHERE customer_id = p_customer_id) INTO v_customer_exists;
IF NOT v_customer_exists THEN
RAISE EXCEPTION 'Customer with ID % does not exist', p_customer_id;
END IF;
-- Insert new order
INSERT INTO orders (customer_id, amount)
VALUES (p_customer_id, p_amount);
-- If we get here, commit the transaction
COMMIT;
END;
$$;
-- Call the procedure
CALL place_order(1, 75.25);
PostgreSQL has both stored procedures and functions. The key differences are:
- Transaction control: Procedures can manage transactions (
COMMIT
,ROLLBACK
), functions cannot - Return values: Functions must return a value, procedures don't have to
- Calling syntax: Use
CALL
for procedures,SELECT
for functions - Usage context: Functions can be used in queries, procedures cannot
Example of a function vs procedure:
-- Function (returns total orders for a customer)
CREATE OR REPLACE FUNCTION get_customer_order_count(p_customer_id INT)
RETURNS INT
LANGUAGE plpgsql
AS $$
DECLARE
v_count INT;
BEGIN
SELECT COUNT(*) INTO v_count
FROM orders
WHERE customer_id = p_customer_id;
RETURN v_count;
END;
$$;
-- Call the function
SELECT get_customer_order_count(1);
-- Procedure (processes a return for an order)
CREATE OR REPLACE PROCEDURE process_order_return(
p_order_id INT,
p_refund_amount DECIMAL DEFAULT NULL
)
LANGUAGE plpgsql
AS $$
DECLARE
v_order_exists BOOLEAN;
v_order_amount DECIMAL;
BEGIN
-- Check if order exists
SELECT
EXISTS(SELECT 1 FROM orders WHERE order_id = p_order_id),
amount
INTO
v_order_exists, v_order_amount
FROM
orders
WHERE
order_id = p_order_id;
IF NOT v_order_exists THEN
RAISE EXCEPTION 'Order with ID % does not exist', p_order_id;
END IF;
-- If refund amount not specified, use full order amount
IF p_refund_amount IS NULL THEN
p_refund_amount := v_order_amount;
END IF;
-- Create a record of the return (in a real system)
RAISE NOTICE 'Processing return for order % with refund amount %', p_order_id, p_refund_amount;
-- We could update the order status here
-- UPDATE orders SET status = 'RETURNED' WHERE order_id = p_order_id;
COMMIT;
END;
$$;
-- Call the procedure
CALL process_order_return(1);
CALL process_order_return(2, 100.00);
One of the primary benefits of stored procedures is transaction management:
CREATE OR REPLACE PROCEDURE transfer_funds(
p_from_account INT,
p_to_account INT,
p_amount DECIMAL
)
LANGUAGE plpgsql
AS $$
BEGIN
-- Start a transaction block
BEGIN
-- Deduct from source account
UPDATE accounts SET balance = balance - p_amount
WHERE account_id = p_from_account;
-- Add to destination account
UPDATE accounts SET balance = balance + p_amount
WHERE account_id = p_to_account;
-- Check for sufficient funds
IF (SELECT balance FROM accounts WHERE account_id = p_from_account) < 0 THEN
RAISE EXCEPTION 'Insufficient funds';
END IF;
-- If we get here, commit the transaction
COMMIT;
EXCEPTION
WHEN OTHERS THEN
-- If any error occurs, roll back the transaction
ROLLBACK;
RAISE;
END;
END;
$$;
Effective error handling in stored procedures:
CREATE OR REPLACE PROCEDURE bulk_insert_orders(
p_customer_id INT,
p_amounts DECIMAL[]
)
LANGUAGE plpgsql
AS $$
DECLARE
v_amount DECIMAL;
v_order_id INT;
v_success_count INT := 0;
v_error_count INT := 0;
BEGIN
-- Start a transaction
BEGIN
-- Check if customer exists
IF NOT EXISTS (SELECT 1 FROM customers WHERE customer_id = p_customer_id) THEN
RAISE EXCEPTION 'Customer with ID % does not exist', p_customer_id;
END IF;
-- Process each amount
FOREACH v_amount IN ARRAY p_amounts LOOP
BEGIN
IF v_amount <= 0 THEN
RAISE EXCEPTION 'Order amount must be positive';
END IF;
INSERT INTO orders (customer_id, amount)
VALUES (p_customer_id, v_amount)
RETURNING order_id INTO v_order_id;
v_success_count := v_success_count + 1;
RAISE NOTICE 'Created order % with amount %', v_order_id, v_amount;
EXCEPTION
WHEN OTHERS THEN
v_error_count := v_error_count + 1;
RAISE WARNING 'Failed to create order with amount %: %', v_amount, SQLERRM;
-- Continue with next order
END;
END LOOP;
RAISE NOTICE 'Processed % orders: % successful, % failed',
array_length(p_amounts, 1), v_success_count, v_error_count;
COMMIT;
EXCEPTION
WHEN OTHERS THEN
ROLLBACK;
RAISE;
END;
END;
$$;
-- Call the procedure
CALL bulk_insert_orders(1, ARRAY[50.00, -10.00, 75.50, 100.00]);
A complete example combining views, CTEs, and a stored procedure:
-- Create additional tables for a more complete example
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10, 2),
category VARCHAR(50)
);
CREATE TABLE order_items (
item_id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(order_id),
product_id INT REFERENCES products(product_id),
quantity INT,
price DECIMAL(10, 2)
);
-- Insert sample data
INSERT INTO products (name, price, category) VALUES
('Laptop', 1200.00, 'Electronics'),
('Smartphone', 800.00, 'Electronics'),
('Desk Chair', 150.00, 'Furniture'),
('Coffee Table', 250.00, 'Furniture'),
('Headphones', 100.00, 'Electronics');
INSERT INTO order_items (order_id, product_id, quantity, price) VALUES
(1, 1, 1, 1200.00),
(2, 2, 1, 800.00),
(2, 5, 1, 100.00),
(3, 3, 2, 150.00),
(4, 4, 1, 250.00),
(4, 5, 2, 100.00),
(5, 2, 1, 800.00);
-- Create a view for order details
CREATE OR REPLACE VIEW order_details AS
SELECT
o.order_id,
o.order_date,
c.customer_id,
c.name AS customer_name,
oi.item_id,
p.product_id,
p.name AS product_name,
p.category,
oi.quantity,
oi.price,
(oi.quantity * oi.price) AS item_total
FROM
orders o
JOIN
customers c ON o.customer_id = c.customer_id
JOIN
order_items oi ON o.order_id = oi.order_id
JOIN
products p ON oi.product_id = p.product_id;
-- Create a materialized view for category performance
CREATE MATERIALIZED VIEW category_performance AS
WITH category_sales AS (
SELECT
category,
COUNT(DISTINCT order_id) AS order_count,
COUNT(*) AS item_count,
SUM(quantity) AS units_sold,
SUM(item_total) AS total_sales
FROM
order_details
GROUP BY
category
)
SELECT
category,
order_count,
item_count,
units_sold,
total_sales,
ROUND(total_sales / units_sold, 2) AS avg_unit_price,
ROUND((total_sales / (SELECT SUM(total_sales) FROM category_sales)) * 100, 2) AS sales_percentage
FROM
category_sales
ORDER BY
total_sales DESC;
-- Create a stored procedure to generate sales reports
CREATE OR REPLACE PROCEDURE generate_sales_report(
p_start_date TIMESTAMP,
p_end_date TIMESTAMP,
p_include_details BOOLEAN DEFAULT FALSE
)
LANGUAGE plpgsql
AS $$
DECLARE
v_report_id INT;
v_total_sales DECIMAL;
v_order_count INT;
v_customer_count INT;
BEGIN
-- Check date range
IF p_start_date >= p_end_date THEN
RAISE EXCEPTION 'Start date must be before end date';
END IF;
-- Calculate summary statistics
WITH date_range_sales AS (
SELECT
*
FROM
order_details
WHERE
order_date BETWEEN p_start_date AND p_end_date
)
SELECT
COUNT(DISTINCT order_id),
COUNT(DISTINCT customer_id),
SUM(item_total)
INTO
v_order_count, v_customer_count, v_total_sales
FROM
date_range_sales;
-- Print summary
RAISE NOTICE 'Sales Report: % to %', p_start_date, p_end_date;
RAISE NOTICE 'Total Sales: $%', v_total_sales;
RAISE NOTICE 'Order Count: %', v_order_count;
RAISE NOTICE 'Customer Count: %', v_customer_count;
-- Category breakdown
RAISE NOTICE 'Category Breakdown:';
FOR rec IN (
WITH date_range_sales AS (
SELECT
*
FROM
order_details
WHERE
order_date BETWEEN p_start_date AND p_end_date
)
SELECT
category,
SUM(item_total) AS category_sales,
ROUND((SUM(item_total) / v_total_sales) * 100, 2) AS percentage
FROM
date_range_sales
GROUP BY
category
ORDER BY
category_sales DESC
) LOOP
RAISE NOTICE ' % - $% (% %%)', rec.category, rec.category_sales, rec.percentage;
END LOOP;
-- Detailed breakdown if requested
IF p_include_details THEN
RAISE NOTICE 'Top Products:';
FOR rec IN (
WITH date_range_sales AS (
SELECT
*
FROM
order_details
WHERE
order_date BETWEEN p_start_date AND p_end_date
)
SELECT
product_name,
SUM(quantity) AS units_sold,
SUM(item_total) AS product_sales
FROM
date_range_sales
GROUP BY
product_name
ORDER BY
product_sales DESC
LIMIT 5
) LOOP
RAISE NOTICE ' % - % units - $%', rec.product_name, rec.units_sold, rec.product_sales;
END LOOP;
RAISE NOTICE 'Top Customers:';
FOR rec IN (
WITH date_range_sales AS (
SELECT
*
FROM
order_details
WHERE
order_date BETWEEN p_start_date AND p_end_date
)
SELECT
customer_name,
COUNT(DISTINCT order_id) AS order_count,
SUM(item_total) AS customer_sales
FROM
date_range_sales
GROUP BY
customer_name
ORDER BY
customer_sales DESC
LIMIT 5
) LOOP
RAISE NOTICE ' % - % orders - $%', rec.customer_name, rec.order_count, rec.customer_sales;
END LOOP;
END IF;
-- Commit (not strictly necessary in this case but good practice)
COMMIT;
END;
$$;
-- Call the procedure
CALL generate_sales_report('2023-01-01', '2023-12-31', TRUE);
A set of procedures for database maintenance:
-- Procedure to archive old data
CREATE OR REPLACE PROCEDURE archive_old_orders(
p_cutoff_date TIMESTAMP,
p_dry_run BOOLEAN DEFAULT TRUE
)
LANGUAGE plpgsql
AS $$
DECLARE
v_order_count INT;
v_item_count INT;
BEGIN
-- Count records to be archived
SELECT COUNT(*) INTO v_order_count
FROM orders
WHERE order_date < p_cutoff_date;
SELECT COUNT(*) INTO v_item_count
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date < p_cutoff_date;
RAISE NOTICE 'Found % orders and % order items before %',
v_order_count, v_item_count, p_cutoff_date;
IF p_dry_run THEN
RAISE NOTICE 'DRY RUN ONLY - No changes made';
RETURN;
END IF;
-- Create archive tables if they don't exist
BEGIN
EXECUTE '
CREATE TABLE IF NOT EXISTS orders_archive (
LIKE orders INCLUDING ALL
)';
EXECUTE '
CREATE TABLE IF NOT EXISTS order_items_archive (
LIKE order_items INCLUDING ALL
)';
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Failed to create archive tables: %', SQLERRM;
END;
-- Start a transaction for the actual archiving
BEGIN
-- Archive order items first (to maintain referential integrity)
INSERT INTO order_items_archive
SELECT oi.*
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date < p_cutoff_date;
-- Then archive orders
INSERT INTO orders_archive
SELECT * FROM orders
WHERE order_date < p_cutoff_date;
-- Delete archived records (in reverse order to maintain referential integrity)
DELETE FROM order_items
WHERE order_id IN (
SELECT order_id FROM orders WHERE order_date < p_cutoff_date
);
DELETE FROM orders
WHERE order_date < p_cutoff_date;
RAISE NOTICE 'Successfully archived % orders and % order items',
v_order_count, v_item_count;
COMMIT;
EXCEPTION
WHEN OTHERS THEN
ROLLBACK;
RAISE EXCEPTION 'Archiving failed: %', SQLERRM;
END;
END;
$$;
-- Procedure to analyze database and update statistics
CREATE OR REPLACE PROCEDURE analyze_database(
p_verbose BOOLEAN DEFAULT FALSE
)
LANGUAGE plpgsql
AS $$
DECLARE
rec RECORD;
BEGIN
FOR rec IN (
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
AND table_type = 'BASE TABLE'
ORDER BY table_schema, table_name
) LOOP
IF p_verbose THEN
RAISE NOTICE 'Analyzing table %.%', rec.table_schema, rec.table_name;
END IF;
EXECUTE 'ANALYZE VERBOSE ' || quote_ident(rec.table_schema) || '.' || quote_ident(rec.table_name);
END LOOP;
RAISE NOTICE 'Database analysis complete.';
END;
$$;
These comprehensive examples demonstrate how views, CTEs, and stored procedures work together to provide powerful database management capabilities in PostgreSQL.