SQL_notes

Like operator and Wildcards

  • %: This wildcard matches any number of characters (including zero characters).

  • _: This wildcard matches any single character.

  • [character_list]: This wildcard matches any single character in the list. For example, [abc] would match any of the characters ‘a’, ‘b’, or ‘c’.

  • [^character_list]: This wildcard matches any single character that is not in the list. For example,[^abc] would match any character that is not ‘a’, ‘b’, or ‘c’.

Group by operator

The GROUP BY operator is used in a SELECT statement to group rows that have the same values into summary rows. The GROUP BY clause is used with aggregate functions (such as MIN, MAX, AVG, SUM, COUNT) to group the result-set by one or more columns.

1
2
3
SELECT COUNT(*) as num_orders, customer_id, product_id
FROM orders
GROUP BY customer_id, product_id;
1
2
3
4
SELECT COUNT(*) as num_orders, customer_id
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 5;

The WHERE and HAVING clauses are used to filter rows from a result-set in different ways. The WHERE clause is used to filter rows before the group by clause is applied, while the HAVING clause is used to filter rows after the group by clause is applied.

Here is an example to illustrate the difference between the WHERE and HAVING clauses:

1
2
3
4
5
SELECT COUNT(*) as num_orders, customer_id
FROM orders
WHERE status = 'shipped'
GROUP BY customer_id
HAVING COUNT(*) > 5;

Order by operator

The ORDER BY operator is used in a SELECT statement to sort the result-set by one or more columns. The ORDER BY clause has the following syntax:

1
2
3
4
SELECT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column1, column2, ... ASC|DESC;

The ORDER BY clause sorts the result-set by the specified columns in ascending (ASC) or descending (DESC) order. By default, the ORDER BY clause sorts the result-set in ascending order.

Joins

In SQL, a join is used to combine rows from two or more tables based on a related column between them. There are several types of joins that you can use in SQL:

  1. INNER JOIN: An INNER JOIN returns rows from both tables that satisfy the join condition. If a row in the first table has no matching rows in the second table, it is not included in the result-set.
  2. OUTER JOIN: An OUTER JOIN returns all rows from both tables, regardless of whether there is a match in the other table. There are three types of outer joins:
  • LEFT JOIN: A LEFT JOIN returns all rows from the left table, and any matching rows from the right table. If there is no match, NULL values are returned for the right table.
  • RIGHT JOIN: A RIGHT JOIN returns all rows from the right table, and any matching rows from the left table. If there is no match, NULL values are returned for the left table.
  • FULL JOIN: A FULL JOIN returns all rows from both tables, and NULL values are returned for any columns where there is no match.
  1. CROSS JOIN: A CROSS JOIN returns the Cartesian product of the two tables. In other words, it returns every possible combination of rows from the two tables.

Nested query

A nested query is a SELECT statement that is used within another SELECT statement. It is also called a subquery.

A nested query can be used to return data that is used in the main query as a condition or as a parameter. The nested query is executed first, and its results are used by the main query.

1
2
3
4
5
SELECT column1, column2
FROM table1
WHERE column3 in (SELECT column4
FROM table2
WHERE condition);
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
select m1.title, m1.year, g.genre, m1.rating, m1.votes
from movies as m1
join has_genre as hg on hg.movie_id = m1.movie_id
join genres as g on g.genre_id = hg.genre_id
where m1.votes > 100000 and (not (g.genre = 'Romance' or g.genre = 'Comedy'))
and m1.movie_id in
(select m2.movie_id
from movies as m2
join has_genre as hg1 on hg1.movie_id = m2.movie_id
join has_genre as hg2 on hg2.movie_id = m2.movie_id
join genres as g1 on g1.genre_id = hg1.genre_id
join genres as g2 on g2.genre_id = hg2.genre_id
where g1.genre = 'Romance' and g2.genre = 'Comedy')
order by m1.votes desc
limit 10;

Views

In SQL, a view is a virtual table that is created based on a SELECT statement. A view does not store data itself, but rather displays data from one or more underlying tables.

Views are often used to provide a specific perspective on the data in a database, or to simplify complex queries by breaking them down into smaller, more manageable pieces.

To create a view in SQL, you can use the following syntax:

1
2
3
4
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Null

In SQL, NULL is a special value that represents the absence of a value or a null value. It is not the same as an empty string ‘’ or a zero value 0.

A column with a NULL value is a column that has been left blank during an INSERT or UPDATE operation.

You can test for NULL values using the IS NULL and IS NOT NULL operators. For example:

1
SELECT * FROM table_name WHERE column_name IS NULL;

Common Table Expressions

I did dislike CREATE VIEW. There are actually a fair few alternatives, but the simplest (a drop-in replacement) is to use what are called Common Table Expressions (CTEs), using the WITH keyword.

Say you had a students table that looked like students(crsid, firstname, lastname, college, course) and wanted to count the number of Robinson students. Ignore the fact that in this simple example you could just do it in one go!

One way is with a VIEW:

1
2
CREATE VIEW robinson_students AS (SELECT * from students WHERE college='Robinson');
SELECT count(*) FROM robinson_students;

But this leaves that view around, and imagine if you had one per college and one per course and maybe one per course-college pair, it’d get out of hand pretty quickly. Note that this is two statements, something many database interaction libraries won’t let you do in a single database query (because sneaking a second query into what should be one query is how almost all SQL injection works, and so allowing only one statement per query helps defend from this security risk!).

Instead, you can use CTEs which use the WITH statement:

1
WITH robinson_students AS (SELECT * from students WHERE college='Robinson') SELECT count(*) FROM robinson_students;

See how this is just one statement? You could put a newline just before the SELECT for clarity, but it is all one thing. It doesn’t leave anything lying around afterwards and it’s no less efficient than the VIEW.

It’s really only a few characters different in the end, so very simple to use.

You can have multiple tables in the WITH clause, for more complicated things:

1
2
3
4
WITH
robinson_students AS (SELECT * from students WHERE college='Robinson'),
robinson_compscis AS (SELECT * from robinson_students WHERE course='Computer Science')
SELECT count(*) FROM robinson_students WHERE crsid NOT IN (SELECT crsid FROM robinson_compscis);

You just separate them with a comma, and the last one has no comma. (This query counts the non-compsci students at Robinson, for example, though there are better ways than this).

Of course, you can also just inline WITH clause expressions if you only need them once:

1
SELECT count(*) FROM (SELECT * from students WHERE college='Robinson') AS robinson_students;

But it’s really handy to use WITH clauses if you want to use the result multiple times, as in the more complicated example before where I used robinson_students twice (once in the robinson_compscis bit and once in the main query).

And yes, for this very simple example you can just do it without any subqueries, of course:

1
SELECT count(*) from students WHERE college='Robinson';

Union, Intersect, and Except operator

In SQL, the UNION operator is used to combine the result sets of two or more SELECT statements into a single result set. The INTERSECT operator is used to return only the rows that are present in the results of both SELECT statements. The EXCEPT operator is used to return only the rows that are present in the first SELECT statement and not in the second SELECT statement.

1
2
3
4
5
6
7
8
9
10
11
12
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2

SELECT column1, column2 FROM table1
INTERSECT
SELECT column1, column2 FROM table2

SELECT column1, column2 FROM table1
EXCEPT
SELECT column1, column2 FROM table2

Index

In SQL, an index is a data structure that allows efficient retrieval of rows from a table. Indexes can be created on one or more columns of a table, and can be used to improve the performance of SELECT, INSERT, UPDATE, and DELETE statements.

1
2
CREATE INDEX index_name
ON table_name (column_name);
1
2
CREATE INDEX index_name
ON table_name (column_name1, column_name2);

Examples

Arithmetic in expression

1
2
3
select sum(minutes)/1440 as days, sum(votes)/count(*) as average, 17 * (max(votes) - min(votes)) as nonsense
from movies
where type = 'movie';