Algorithm_1And2

Big-O, Θ and Ω notations

image-20230415160745530 image-20230415160800307 image-20230415160914191

image-20230415160924676

Strategies

Translation strategy

Translate the problem we want to solve into a different setting, use a standard algorithm in the different setting, then translate the answer back to the original setting. In this case, the translated setting is ‘graphs with different edge weights’. Of course we’ll need to argue why these translated answers solve the original problem.

Amortized analysis

Amortized analysis is a method for calculating computational complexity in which, for any sequence of operations, we ascribe an amortized cost to every operation in such a way that the total amortized cost of the sequence provides an upper bound on the actual total cost. This upper bound is usually more representative of the true computational cost when averaged over a large number of operations, so it gives a more accurate picture of an algorithm’s performance over time than focusing solely on the worst-case or best-case cost of individual operations.

Here is the formal definition. Let there be a sequence of m operations, applied to an initially-empty data structure, whose true costs are $c_1, c_2, \dots, c_m$. Suppose someone invents $c_1', c_2', \dots, c_m'$ such that (we call these amortized costs)
$$
c_1+\dots+c_j \le c_1' + \dots + c_j' \quad \text{for all } j \le m
$$

$$
\text{aggregate true cost of a sequence of operations} \le \text{aggregate amortized cost of those operations}
$$

This is the fundamental inequality of amortized analysis

For example, we might ascribe some of the true cost of one operation A to another operation B, if A involves cleaning up some mess produced by B.


The potential method is a systematic way to choose amortized costs, as follows. We define a function $\Phi(S)$ where $S$ is a possible state of the data structure and $\Phi(S)\ge 0$, and let the amortized cost of each operation be the true cost plus the change in potential $\Delta \Phi$.
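As a concrete illustration (a standard worked example, not taken from the figures in these notes): consider a dynamic array that doubles its capacity $c$ whenever it is full, and take $\Phi(S) = 2n - c$ where $n$ is the number of stored elements (non-negative whenever the array is at least half full, which doubling keeps true after the first insertion). An append that triggers a resize has true cost $n+1$ but causes $\Delta\Phi = 2-n$, so every append has amortized cost at most 3:
$$
\begin{align*}
\text{append without resize:}\quad & c' = 1 + \Delta\Phi = 1 + 2 = 3\\
\text{append with resize } (n = c):\quad & c' = (n+1) + (2-n) = 3
\end{align*}
$$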

image-20230416141242333

Proof of the Potential Theorem

image-20230416141540040

Past papers

Define amortized cost:

If we have a sequence of k operations whose true costs are $c_1,c_2,…c_k$, and if we devise costs $c_1’,c_2’,…,c_k’$ such that
$$
c_1+c_2+…+c_i \le c_1’ + c_2’ + … + c_i’ \text{ for all } 0\le i \le k
$$
then the $c_i'$ are called amortized costs.

2014-p01-q09

Explain the terms amortized analysis, aggregate analysis and potential method.

In amortized analysis we consider a sequence of n data structure operations and show that the average cost of an operation is small, even though a single operation within the sequence might be very expensive.

Aggregate analysis and potential method are two techniques for performing amortized analysis. In aggregate analysis, we consider an arbitrary sequence of n operations and show an upper bound on the total cost T(n). Every operation is charged the same amortized cost, that is, the average cost T(n)/n, even though in reality some operations are cheaper or more expensive.

In the potential method, every operation may have a different amortized cost. The method overcharges cheap operations early in the sequence, storing some prepaid credit to compensate for future costly operations. The credit can be also seen as the “potential energy” stored within a data structure.


Designing Advanced Data Structures

image-20230416145827079

Algorithms

Complexity

| Algorithm | Complexity |
| --- | --- |
| Dijkstra | $O(E + V\log V)$ if all weights $\ge 0$; more precisely $O(V) + O(V) \times cost_{popmin} + O(E) \times cost_{push/dec.key}$ |
| Bellman-Ford | $O(VE)$ |
| Prim’s algorithm | $O(E + V\log V)$ |
| Kruskal’s algorithm | $O(E\log E + E + V) = O(E\log E) = O(E\log V)$ |
| Bubble sort | $O(N^2)$ |
| Heap sort | $O(N\log N)$ |
| Quicksort | $O(N\log N)$ on average; $O(N^2)$ worst case |
| Dfs_recurse_all / Dfs | $O(V+E)$ |
| Bfs | $O(V+E)$ |
| Ford-Fulkerson | $O(Ef^*)$ |
| Topological sort | $O(V+E)$ |

Sorting

Minimum cost of sorting

Assertion 1 (lower bound on exchanges): If there are n items in an array, then $Θ(n)$ exchanges always suffice to put the items in order. In the worst case, $Θ(n)$ exchanges are actually needed.

Proof: Identify the smallest item present: if it is not already in the right place, one exchange moves it to the start of the array. A second exchange moves the next smallest item to place, and so on. After at worst n − 1 exchanges, the items are all in order. The bound is n − 1 rather than n because at the very last stage the biggest item has to be in its right place without need for a swap—but that level of detail is unimportant to Θ notation.

Assertion 2 (lower bound on comparisons). Sorting by pairwise comparison, assuming that all possible arrangements of the data might actually occur as input, necessarily costs at least Ω(n lg n) comparisons.

Proof: as you saw in Foundations of Computer Science, there are n! permutations of n items and, in sorting, we in effect identify one of them. To discriminate between that many cases we need at least ⌈$log_2(n!)$⌉ binary tests. Stirling’s formula tells us that $n!$ is roughly $n^n$, and hence that $\lg(n!)$ is about $n \lg n$.
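For completeness, here is an elementary sandwich argument (my addition) showing that $\lg(n!)$ is indeed $\Theta(n\lg n)$ without quoting Stirling’s formula in full:
$$
\lg(n!) = \sum_{i=1}^{n}\lg i \le n\lg n,
\qquad
\lg(n!) \ge \sum_{i=\lceil n/2\rceil}^{n}\lg i \ge \frac{n}{2}\lg\frac{n}{2} = \Omega(n\lg n).
$$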

InsertSort

image-20230514221714743

Select Sort

image-20230415161954760

Bubble sort

image-20230415162238398

Mergesort

image-20230514214845332 image-20230514214909537
Complexity

$$
f(n) = 2f(n/2)+kn, \qquad \text{substituting } n=2^m:
$$

$$
\begin{align*}
f(n)&=f(2^m)\\
&=2f(2^{m-1})+k2^m \\
&=2(2f(2^{m-2})+k2^{m-1})+k2^m \\
&= 2^2f(2^{m-2})+k2^m+k2^m \\
&= 2^2f(2^{m-2})+2k2^m \\
&=2^3f(2^{m-3})+3k2^m \\
&= \dots \\
&= 2^mf(2^{m-m})+mk2^m\\
&=f(1)\times 2^m + km2^m \\
&=k_0\times 2^m + k\times m2^m \\
&=k_0 n + k n\lg n \\
&= O(n\lg n)
\end{align*}
$$
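A minimal Python sketch matching this recurrence (my own illustration, not the pseudocode from the figures above): the two recursive calls contribute the $2f(n/2)$ term and the linear merge contributes the $kn$ term.

def merge_sort(a):
    """Sort list a, returning a new sorted list. O(n lg n)."""
    if len(a) <= 1:                      # base case: f(1) is constant
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])           # f(n/2)
    right = merge_sort(a[mid:])          # f(n/2)
    # merge: linear in n, the 'kn' term of the recurrence
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # '<=' keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged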

Heapsort

image-20230415163513819 image-20230415163533096

The first phase of the main heapsort function (lines 9–11) starts from the bottom of the tree (rightmost end of the array) and walks up towards the root, considering each node as the root of a potential subheap and rearranging it to be a heap. In fact, nodes with no children can’t possibly violate the heap property and therefore are automatically heaps; so we don’t even need to process them—that’s why we start from the midpoint floor(END/2) rather than from the end. By proceeding right-to-left we guarantee that any children of the node we are currently examining are already roots of properly formed heaps, thereby matching the precondition of heapify, which we may therefore use. It is then trivial to put an O(nlgn) bound on this phase—although, as we shall see, it is not tight.

  • image-20230415164320394
  • image-20230415164340037
  • image-20230415164629046

In the second phase (lines 13–19), the array is split into two distinct parts: a[0:k] is a heap, while a[k:END] is the “tail” portion of the sorted array. The rightmost part starts empty and grows by one element at each pass until it occupies the whole array. During each pass of the loop in lines 13–19 we extract the maximum element from the root of the heap, a[0], reform the heap and then place the extracted maximum in the empty space left by the last element, a[k], which conveniently is just where it should go. To retransform a[0:k - 1] into a heap after placing a[k - 1] in position a[0] we may call heapify, since the two subtrees of the root a[0] must still be heaps given that all that changed was the root and we started from a proper heap. For this second phase, too, it is trivial to establish an $O(nlgn)$ bound.
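A compact sketch of both phases (illustrative Python, consistent with the description above rather than a copy of the course pseudocode; the line numbers mentioned in the text refer to that pseudocode):

def heapify(a, root, end):
    """Sift a[root] down so the subtree rooted at 'root' within a[0:end] is a
    max-heap, assuming both of root's subtrees are already heaps."""
    while True:
        left, right, largest = 2*root + 1, 2*root + 2, root
        if left < end and a[left] > a[largest]:
            largest = left
        if right < end and a[right] > a[largest]:
            largest = right
        if largest == root:
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest

def heapsort(a):
    n = len(a)
    # Phase 1: build a max-heap, right to left from the last internal node.
    for root in range(n // 2 - 1, -1, -1):
        heapify(a, root, n)
    # Phase 2: repeatedly move the maximum a[0] to the end and re-heapify the rest.
    for k in range(n - 1, 0, -1):
        a[0], a[k] = a[k], a[0]
        heapify(a, 0, k)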

Heapsort therefore offers at least two significant advantages over other sorting algorithms:

  1. it offers an asymptotically optimal worst-case complexity of O(nlgn);
  2. it sorts the array in place.

Despite this, on non-pathological data it is still usually beaten by the amazing quicksort.

Quicksort

image-20230415165110799 image-20230415165135638
Average case performance
image-20230415165359829

Some implementations of the Quicksort algorithm select the pivot at random, rather than taking the last entry in the input array.

  • Discuss the advantages and disadvantages of such a choice.
    • Quicksort has a quadratic worst case, which occurs when the result of the partitioning step is maximally unbalanced in most of the recursive calls. The worst case is quite rare if the input is random, but it occurs in plausible situations such as when the input is sorted or almost sorted. Selecting the pivot at random costs an invocation of the random number generator at every recursive call, but it makes it unlikely that the quadratic worst case will occur “naturally” with an input made of distinct values. (A sketch of such a randomised-pivot quicksort follows this list.)
  • How would you construct an input to trigger quadratic running time for this randomised Quicksort, without having access to the state of the random number generator?
    • Make an array where all cells have the same value. Then, whatever cell is chosen as the pivot, the partition phase will put them all in the “≤” part while the “>” part will be empty, every time, leading to pessimal (quadratic) running time.
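A sketch of quicksort with a randomly selected pivot (illustrative Python, not the course pseudocode; it swaps the random pivot into the last position and then partitions as usual):

import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with a random pivot. Expected O(n lg n);
    still quadratic on all-equal keys, as discussed above."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Pick a random pivot and move it to the end, then partition as usual.
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] <= pivot:          # '<=' sends equal keys to the left part
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)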

Stability of sorting methods

Data to be sorted often consists of records made of key and payload; the key is what the ordering is based upon, while the payload is some additional data that is just logically carried around in the rearranging process. In some applications one can have keys that should be considered equal, and then a simple specification of sorting might not indicate the order in which the corresponding records should end up in the output list. “Stable” sorting demands that, in such cases, the order of items in the input be preserved in the output. Some otherwise desirable sorting algorithms are not stable, and this can weigh against them.

| Sorting Algorithm | Stable/Unstable |
| --- | --- |
| Bubble Sort | Stable |
| Insertion Sort | Stable |
| Selection Sort | Unstable |
| Quick Sort | Unstable |
| Merge Sort | Stable |
| Heap Sort | Unstable |
| Counting Sort | Stable |
| Radix Sort | Stable |
| Bucket Sort | Stable (if implemented properly) |

Counting sort

Assume that the keys to be sorted are integers that live in a known range, and that the range is fixed regardless of the number of values to be processed. If the number of input items grows beyond the cardinality of the range, there will necessarily be duplicates in the input. If no data is involved at all beyond the integers, one can set up an array whose size is determined by the range of integers that can appear (not by the amount of data to be sorted) and initialize it to all 0s. Then, for each item in the input data, w say, the value at position w in the array is incremented. At the end, the array contains information about how many instances of each value were present in the input, and it is easy to create a sorted output list with the correct values in it. The costs are obviously linear.
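A sketch of counting sort under the stated assumption that keys are integers in a fixed range [0, R) (illustrative Python; the parameter R is mine):

def counting_sort(keys, R):
    """Sort integer keys in the range [0, R). O(n + R) time, O(R) extra space."""
    counts = [0] * R
    for k in keys:                 # count how many times each value occurs
        counts[k] += 1
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)    # emit each value as many times as it occurred
    return out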

Bucket sort

Assume the input data is guaranteed to be uniformly distributed over some known range (for instance it might be real numbers in the range 0.0 to 1.0). Then a numeric calculation on the key can predict with reasonable accuracy where a value must be placed in the output. If the output array is treated somewhat like a hash table (cfr. section 4.7), and this prediction is used to insert items in it, then, apart from some local clustering effects, that data has been sorted.

To sort n keys uniformly distributed between 0.0 and 1.0, create an array of n linked lists and insert each key k to the list at position ⌊k · n⌋. This phase has linear cost. (We expect each list to be one key long on average, though some may be slightly longer and some may be empty.) In the next phase, for each entry in the array, sort the corresponding list with insertsort if it is longer than one element, then output it.
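A sketch under the stated assumptions, n keys uniform in [0.0, 1.0), one list per array slot, insertion sort within each list (illustrative Python):

def bucket_sort(keys):
    """Sort floats assumed uniform in [0.0, 1.0). Expected O(n)."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[int(k * n)].append(k)      # keys in bucket b lie in [b/n, (b+1)/n)
    out = []
    for b in buckets:
        # insertion-sort each (on average O(1)-length) bucket, then append it
        for i in range(1, len(b)):
            j = i
            while j > 0 and b[j-1] > b[j]:
                b[j-1], b[j] = b[j], b[j-1]
                j -= 1
        out.extend(b)
    return out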

Radix Sort

Historically, radix sort was first described in the context of sorting integers encoded on punched cards, where each column of a card represented a digit by having a hole punched in the corresponding row. A mechanical device could be set to examine a particular column and distribute the cards of a deck into bins, one per digit, according to the digit in that column. Radix sort used this “primitive” to sort the whole deck.

The obvious way to proceed might seem to be to sort on the most significant digit, then recursively, for each bin, on the next most significant digit, then on the next, all the way down to the least significant digit. But this would require a great deal of temporary “desk space” to hold the partial mini-decks still to be processed without mixing them up.

Radix sort instead proceeds, counter-intuitively, from the least significant digit upwards. First, the deck is sorted into b = 10 bins based on the least significant digit. Then the contents of the bins are collected together, in order, to reform a full deck, and this is then sorted according to the next digit up. But the per-digit sorting method used is chosen to be stable, so that cards that have the same second digit still maintain their relative order, which was induced by the first (least significant) digit. The procedure is repeated going upwards towards the most significant digits. Before starting pass i, the digits in positions 0 to i − 1 have already been sorted. During pass i, the deck is sorted on the digit in position i, but all the cards with the same i digit remain in the relative order determined by the even less significant digits to their right. The result is that, once the deck has been sorted on the most significant digit, it is fully sorted. The number of passes is equal to the number of digits (d) in the numerals being sorted and the cost of each pass can be made linear by using counting sort.


Explain how radix sort works, to what inputs it can be applied and what its asymptotic complexity is.

Radix sort was invented in the context of sorting punched cards. It is a linear-time algorithm for sorting an arbitrary number of strings of fixed (or, more precisely, bounded) length over a small alphabet. It uses as a subroutine a stable linear-time sorting algorithm for items consisting of a single character over the given alphabet, such as for example counting sort.

Assume the strings have maximum length l and that shorter strings are right-padded with blanks or left-padded with zeros in case of strings of digits—whatever is appropriate for the desired sorting order.

Consider the character positions from the least significant (rightmost, index 0) to the most significant (leftmost, index l − 1). For each character position i, use a linear-time stable sort to sort all the strings based on the character at that position.

The main loop runs l times. Each run is one invocation of a linear-time stable sort. Thus the whole algorithm takes O(nl) which, if l is taken as a constant, is O(n).
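A sketch for fixed-length strings of decimal digits, using a stable distribution pass (a counting-sort-style pass into 10 bins) per character position from least to most significant (illustrative Python):

def radix_sort(strings, length):
    """Sort equal-length digit strings, least significant position first.
    O(length * (n + 10)) overall, i.e. linear for fixed length."""
    for pos in range(length - 1, -1, -1):          # rightmost position first
        buckets = [[] for _ in range(10)]
        for s in strings:
            buckets[int(s[pos])].append(s)         # stable: preserves input order
        strings = [s for b in buckets for s in b]  # collect bins back into one deck
    return strings

# e.g. radix_sort(["329", "457", "657", "839", "436", "720", "355"], 3)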

Explain why running radix sort does not proceed from most to least significant digit, as would at first seem more intuitive.

It might feel more natural to sort first by most significant digit (or character position), because then each of the intermediate passes would bring the individual elements closer and closer to their intended position. However the subsequent passes would have to act separately on each of the “bins” of equal-value characters found in the previous pass, thus requiring additional storage space and possibly a recursive arrangement—both inconvenient in the context of sorting physical punched cards.

By running in the opposite direction instead, i.e. from least to most significant character position, while it’s true that the intermediate passes may appear to “jumble up” the list of items as they go, each pass can run simply on the whole result of the previous one, without requiring recursion nor any additional space beyond that required by the stable sort used as a subroutine.

Dynamic programming

Explain the programming technique known as memoization, detailing the cases to which it applies

Memoization is a time-space trade-off (1) technique in which previously computed results are remembered instead of being recomputed (1). A table that survives between invocations of the function is indexed by the tuple of arguments (1) that the function expects. Before computing a value, the function first checks whether a result for that tuple of input arguments is already present in the table. If so, the previously computed value is returned in constant time (1). This technique is particularly effective when applied to recursive functions that recompute the same partial results over and over again (1), as happens for example in top-down dynamic programming (1). (Earn marks as indicated, up to a maximum of 4.)

Write a memoized recursive function to compute the ith Fibonacci number F(i)

function fib(i):
    global memo[]
    if memo[i] not empty:
        return memo[i]
    if i <= 2:
        result = 1
    else:
        result = fib(i-1) + fib(i-2)
    memo[i] = result
    return result

Matrix chain multiplication

image-20230415210004985
function matrix_chain_order(p)
    // Input: An array p[] which represents the chain of matrices such that the ith matrix Ai is of dimension p[i-1] x p[i].
    // Output: The minimum number of multiplications needed to multiply the chain.

    // Step 1: Length of the array
    n = p.length - 1

    // Step 2: Create a table to store the cost of multiplication
    m = new int[n+1][n+1]

    // Step 3: Zeroing the diagonal, cost is zero when multiplying one matrix
    for i = 1 to n
        m[i][i] = 0

    // Step 4: Apply dynamic programming in bottom-up manner
    for len = 2 to n                     // len is the chain length
        for i = 1 to n-len+1
            j = i + len - 1
            m[i][j] = Infinity
            for k = i to j-1
                cost = m[i][k] + m[k+1][j] + p[i-1]*p[k]*p[j]
                if cost < m[i][j]
                    m[i][j] = cost

    // Step 5: Return the minimum cost to multiply the matrices
    return m[1][n]

Longest common subsequence

image-20230415210105438
function LCS(X[1..m], Y[1..n])
    // Input: Two sequences X and Y of lengths m and n respectively
    // Output: Length of the longest common subsequence

    // Step 1: Initialize a (m+1) x (n+1) matrix C
    declare C[0..m, 0..n]

    for i = 0 to m
        C[i, 0] = 0
    for j = 0 to n
        C[0, j] = 0

    // Step 2: Fill the matrix in a bottom-up manner
    for i = 1 to m
        for j = 1 to n
            if X[i] == Y[j]
                C[i, j] = C[i-1, j-1] + 1
            else
                C[i, j] = max(C[i, j-1], C[i-1, j])

    // Step 3: Return the value in the bottom right corner
    return C[m, n]

Greedy Algorithms

Explain the greedy strategy in algorithm design. To what problems does it apply?

The greedy strategy applies to problems made of overlapping subproblems with optimal substructure, like dynamic programming. The greedy strategy, however, does not at every stage compute the optimal solution for all possible choices of a parameter; instead, it selects the best-looking choice a priori and proceeds to find an optimal solution for the remaining subproblem.

In order to ensure that the globally optimal solution will be found, it is necessary to prove that the “best-looking choice” is indeed part of an optimal solution. If this proof is missing, the answer given by the greedy algorithm may in fact be wrong

If a problem can be solved with both dynamic programming and a greedy algorithm, what are the advantages of using one or the other?

Almost all problems that can be solved by a greedy algorithm can also be solved by dynamic programming (though not vice versa). When both approaches work, the greedy one is usually more efficient and should be preferred. The dynamic programming solution however does not require developing a heuristic for finding the best choice, nor a proof (sometimes rather hard to develop) that the choice picked by the heuristic is actually part of a globally optimal solution.

Activity Scheduling

DP version
image-20230415210200759
Greedy version

In this example the greedy strategy is to pick the activity that finishes first, based on the intuition that it’s the one that leaves most of the rest of the timeline free for the allocation of other activities. Now, to be able to use the greedy strategy safely on this problem, we must prove that this choice is indeed optimal; in other words, that the ai ∈ S with the smallest fi is included in an optimal solution for S. That activity would be the lowest-numbered ai in S, by the way, or a1 at the top level, since we conveniently stipulated that activities were numbered in order of increasing finishing time.

Proof by contradiction. Assume there exists an optimal solution O ⊂ S that does not include a1. Let ax be the activity in O with the earliest finishing time. Since a1 had the smallest finishing time in all S, f1 ≤ fx. There are two cases: either f1 ≤ sx or f1 > sx. In the first case, s1 < f1 ≤ sx < fx, we have that a1 and ax are disjoint and therefore compatible, so we could build a better solution than O by adding a1 to it; so this case cannot happen. We are left with the second case, in which there is overlap because a1 finishes after ax starts (but before ax finishes, by hypothesis that a1 is first to finish in S). Since ax is first to finish in O and no two activities in O overlap, then no activity in O occurs before ax, thus no activity in O (aside from ax) overlaps with a1. Thus we could build another equally good optimal solution by substituting a1 for ax in O. Therefore there will always exist an optimal solution that includes a1, QED.
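A sketch of the resulting greedy algorithm (illustrative Python; activities are given as (start, finish) pairs, and the proof above justifies always taking the earliest-finishing compatible activity):

def select_activities(activities):
    """Greedy activity selection on a list of (start, finish) pairs.
    Returns a maximum-size set of mutually compatible activities. O(n lg n)."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):  # by finish time
        if start >= last_finish:       # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen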

Huffman codes

image-20230416095346930 image-20230416095426523 image-20230514225028773

Data structures

Complexities

Data Structure Insertion/Push Deletion Search Extract Minimum/Maximum Successor/Predecessor DecreaseKey Merge
Fibonacci Heap $O(1)$ $O(logN)$ O(1)
Linked list heap $O(1)$ $O(logN)$
Binary heap $O(logN)$ $O(logN)$
Binomial heap $O(logN)$ $O(logN)$ $O(logN)$ $O(logN)$ $O(logN)$
Binary Search Tree $O(logN)$ $O(logN)$ $O(logN)$ $O(logN)$
Red Black Tree $O(logN)$ $O(logN)$ $O(logN)$

Abstract Data Type

List

Lists provide one natural implementation of stacks, and are the data structure of choice in many places where a flexible representation of a variable amount of data is wanted.

Doubly-linked list

A feature of lists is that, from one item, you can progress along the list in one direction very easily; but once you have taken the next of a list, there is no way of returning. To make it possible to traverse a list in both directions one might define a new type called Doubly Linked List in which each wagon had both a next and a previous pointer.
$$
w.next.previous == w
$$
Equation (1) would hold for every wagon inside the DLL except for the last
$$
w.previous.next == w
$$
Equation (2) would hold for every wagon inside the DLL except for the first.

Graph representation (using List)

Represent each vertex by an integer, and keep a vector such that element $i$ of the vector holds the head of a list of all the vertices directly connected to vertex $i$ by edges radiating from it.
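A minimal sketch of that representation (illustrative Python; vertices are the integers 0 to V−1 and edges are given as (u, v) pairs):

def adjacency_list(num_vertices, edges):
    """Build the vector-of-lists representation: adj[i] lists the vertices
    reachable from vertex i along a single edge."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
    return adj

# e.g. adjacency_list(3, [(0, 1), (0, 2), (2, 1)]) == [[1, 2], [1], []]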

ADT List
ADT List {
boolean isEmpty();
// BEHAVIOUR: Return true iff the structure is empty
item head();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: return the first element of the list (without removing it)
void prepend(item x);
// BEHAVIOUR: add element <x> to the beginning of the list.
// POSTCONDITION: isEmpty()==false
// BEHAVIOUR: head() == x
List tail();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: return the list of all the elements except the first (without removing it).

void setTail(List newTail);
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: replace the tail of this list with <newTail>.
}

Stack

  • LIFO (last in first out)
ADT Stack {
boolean isEmpty();
// BEHAVIOUR: return true iff the structure is empty
void push(item x);
// BEHAVIOUR: add element <x> to the top of the stack
// POSTCONDITION: isEmpty()==false.
// POSTCONDITION: top()==x
item pop();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: return the element on the top of the stack
// As a side effect, remove it from the stack.
item top();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: Return the element on top of the stack (without removing it).
}
Implementations
  1. Combination of an array and an index (pointing to the “top of stack”)
    • The push operation writes a value into the array and increments the index
    • The pop operation deletes a value from the array and decrements the index
  2. Linked lists
    • Pushing an item adds an extra cell to the front of a list
    • Popping removes the cell at the front of the list
Use cases
  • Reverse Polish Notation (e.g. 3 12 add 4 mul 2 sub -> (3 + 12) x 4 - 2), which can be evaluated with a stack as sketched below
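A sketch of evaluating such an expression with a stack (illustrative Python; the operator names add/sub/mul follow the example above):

def eval_rpn(tokens):
    """Evaluate a Reverse Polish expression given as a list of tokens,
    e.g. ["3", "12", "add", "4", "mul", "2", "sub"] -> (3 + 12) * 4 - 2 == 58."""
    ops = {"add": lambda a, b: a + b,
           "sub": lambda a, b: a - b,
           "mul": lambda a, b: a * b}
    stack = []
    for t in tokens:
        if t in ops:
            b = stack.pop()            # operands come off in reverse order
            a = stack.pop()
            stack.append(ops[t](a, b))
        else:
            stack.append(int(t))       # push numeric literal
    return stack.pop()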

Queue and Deque

  • FIFO (first in first out)
ADT Queue{
boolean isEmpty();
// BEHAVIOUR: return true iff the structure is empty.
void put(item x);
// BEHAVIOUR: insert element <x> at the end of the queue.
// POSTCONDITION: isEmpty()==false
item get();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: return the first element of the queue, and remove it from the queue
item first();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: return the first element of the queue without removing it
}
ADT Deque {
boolean isEmpty();
// BEHAVIOUR: return true iff the structure is empty.
void putFront(item x);
// BEHAVIOUR: insert element <x> at the front of the deque.
// POSTCONDITION: isEmpty()==false
void putRear(item x);
// BEHAVIOUR: insert element <x> at the back of the deque.
// POSTCONDITION: isEmpty()==false
item getFront();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: return the first element of the deque
item getRear();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: return the last element of the deque
}

Dictionary

  • Also known as Maps, Tables, Associative arrays, or Symbol tables
ADT Dictionary {
void set(Key k, Value v);
// BEHAVIOUR: store the given (<k>, <v>) pair in the dictionary.
// If a pair with the same <k> had already been stored, the old
// value is overwritten and lost.
// POSTCONDITION: get(k) == v

Value get(Key k);
// PRECONDITION: a pair with the sought key <k> is in the dictionary.
// BEHAVIOUR: return the value associated with the supplied <k>,
// without removing it from the dictionary.

void delete(Key k);
// PRECONDITION: a pair with the given key <k> has already been inserted.
// BEHAVIOUR: remove from the dictionary the key-value pair indexed by
// the given <k>.
}

Set

boolean isEmpty();
// BEHAVIOUR: return true iff the structure is empty
boolean hasKey(Key x);
// BEHAVIOUR: return true iff the set contains a pair keyed by <x>
Key chooseAny();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: Return the key of an arbitrary item from the set

Key min();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: Return the smallest key in the set

Key max();
// PRECONDITION: isEmpty()==false
// BEHAVIOUR: Return the largest key in the set

Key predecessor(Key k);
// PRECONDITION: haskey(k) == true
// PRECONDITION: min() != k
// BEHAVIOUR: Return the largest key in the set that is smaller than <k>

Key successor(Key k);
// PRECONDITION: hasKey(k) == true
// PRECONDITION: max() !=k
// BEHAVIOUR: Return the smallest key in the set that is larger than <k>.

Set unionWith(Set s);
// BEHAVIOUR: Change this set to become the set obtained by
// forming the union of the set and <s>

Binary Search Tree

To find the successor s of a node x whose key is kx, look in x’s right subtree: if that subtree exists, the successor node s must be in it—otherwise any node in that subtree would have a key between kx and ks, which contradicts the hypothesis that s is the successor. If the right subtree does not exist, the successor may be higher up in the tree. Go up to the parent, then grand-parent, then great-grandparent and so on until the link goes up-and-right rather than up-and-left, and you will find the successor. If you reach the root before having gone up-and-right, then the node has no successor: its key is the highest in the tree.
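A sketch of that successor procedure (illustrative Python; it assumes each node carries key, left, right and parent fields):

def successor(x):
    """Return the node with the smallest key greater than x.key, or None
    if x holds the largest key in the tree."""
    if x.right is not None:
        # leftmost node of the right subtree
        s = x.right
        while s.left is not None:
            s = s.left
        return s
    # otherwise walk up until we take an up-and-right step
    p = x.parent
    while p is not None and x is p.right:
        x, p = p, p.parent
    return p            # None if we ran off the root: x held the maximum key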

2-3-4 trees

A 2-3-4 tree is a type of balanced search tree where every internal node can have either 2, 3, or 4 children, hence the name. More specifically:

  1. A 2-node has one data element, and if it is not a leaf, it has two children. The elements in the left subtree are less than the node’s element, and the elements in the right subtree are greater.

  2. A 3-node has two data elements, and if it is not a leaf, it has three children. The elements in the left subtree are less than the first (smaller) element in the node, the elements in the middle subtree are between the node’s elements, and the elements in the right subtree are greater than the second (larger) element.

  3. A 4-node has three data elements, and if it is not a leaf, it has four children. The elements in the first (leftmost) subtree are less than the first (smallest) element in the node, the elements in the second subtree are between the first and second elements in the node, the elements in the third subtree are between the second and third elements in the node, and the elements in the fourth (rightmost) subtree are greater than the third (largest) element.

The key property of a 2-3-4 tree is that all leaf nodes are at the same depth, which ensures the tree remains balanced and operations such as insertion, deletion, and search can be performed in logarithmic time.

2-3-4 trees are commonly used in the implementation of databases and file systems. They are a type of B-tree, and in fact, the widely used B-tree and B+ tree structures are generalizations of 2-3-4 trees.

image-20230416101717670

Inserting

image-20230416101856017

Deletion

image-20230514231022304

Red-black trees

image-20230416102127291

Rotations

image-20230416102709215

What is a BST rotation

image-20230409123605125
2014-p01-q08
image-20230508231551463

Insertion

image-20230416103022162 image-20230416103042821

B-Trees

Rules

  1. There are internal nodes (with keys and payloads and children) and leaf nodes (without keys or payloads or children).
  2. For each key in a node, the node also holds the associated payload.
  3. All leaf nodes are at the same distance from the root.
  4. All internal nodes have at most 2t children; all internal nodes except the root have at least t children.
  5. A node has c children iff it has c−1 keys.
image-20230514232749249

Inserting

To insert a new key (and payload) into a B-tree, look for the key in the B-tree in the usual way. If found, update the payload in place. If not found, you’ll be by then in the right place at the bottom level of the tree (the one where nodes have keyless leaves as children); on the way down, whenever you find a full node, split it in two on the median key and migrate the median key and resulting two children to the parent node (which by inductive hypothesis won’t be full). If the root is full when you start, split it into three nodes (yielding a new root with only one key and adding one level to the tree). Once you get to the appropriate bottom level node, which won’t be full or you would have split it on your way there, insert there.

image-20230514233218401

Deleting

Deleting is a more elaborate affair because it involves numerous subcases. You can’t delete a key from anywhere other than a bottom node (i.e. one whose children are keyless leaves), otherwise you upset its left and right children that lose their separator. In addition, you can’t delete a key from a node that already has the minimum number of keys. So the general algorithm consists of creating the right conditions and then deleting (or, alternatively, deleting and then readjusting).
To move a key to a bottom node for the purpose of deleting it, swap it with its successor (which must be in a bottom node). The tree will have a temporary inversion, but that will disappear as soon as the unwanted key is deleted.

Refill

image-20230416104351438
image-20230416104302865 image-20230416104318650

Hash table

Briefly explain hash functions, hash tables, and collisions

A hash function is a mapping from a key space (for example strings of characters) to an index space (for example the integers from 0 to m-1).

A hash table is a data structure that stores (key,value) pairs indexed by unique key. It offers efficient (typically O(1)) access to the pair, given the key. The core idea is to store the pairs in an array of m elements, placing each pair in the array position given by the hash of the key.

Since the key space is potentially much larger than the index space, the hash function will necessarily map several keys to the same slot. This is known as a collision. Some scheme must be adopted to resolve collisions, otherwise the hash table can’t work. The two main strategies involve either storing the colliding pairs in a linked list external to the table (chaining) or storing colliding pairs within the table but at some other position (open addressing).

Chaining

We can arrange that the locations in the array hold little linear lists that collect all the keys that hash to that particular value. A good hash function will distribute keys fairly evenly over the array, so with luck this will lead to lists with average length ⌈n/m⌉ if n keys are in use.
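A minimal chaining sketch (illustrative Python; the class and method names are mine, and Python’s built-in hash stands in for the hash function):

class ChainedHashTable:
    """Dictionary with collision resolution by chaining: each of the m slots
    holds a list of (key, value) pairs whose keys hash to that slot."""
    def __init__(self, m=64):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def set(self, key, value):
        chain = self.slots[hash(key) % self.m]
        for i, (k, _) in enumerate(chain):
            if k == key:                 # overwrite an existing pair
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self.slots[hash(key) % self.m]:
            if k == key:
                return v
        raise KeyError(key)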

Open addressing

The second way of using hashing is to use the hash value h(k) as just a first preference for where to store the given key in the array. On adding a new key, if that location is empty then well and good—it can be used; otherwise, a succession of other probes are made of the hash table according to some rule until either the key is found to be present or an empty slot for it is located. The simplest (but not the best) method of collision resolution is to try successive array locations on from the place of the first probe, wrapping round at the end of the array. Note that, with the open addressing strategy, where all keys are kept in the array, the array may become full, and that its performance decreases significantly when it is nearly full; implementations will typically double the size of the array once occupancy goes above a certain threshold.

image-20230416105202289
Linear probing

This easy probing function just returns $(h(k) + j) \bmod m$. In other words, at every new attempt, try the next cell in sequence. The probing sequence is always a permutation of the table slots, so every slot is eventually tried. Linear probing is simple to understand and implement but it leads to primary clustering: many failed attempts hit the same slot and spill over to the same follow-up slots. The result is longer and longer runs of occupied slots, increasing search time.

Quadratic probing

With quadratic probing you return $(h(k) + cj + dj^2) \bmod m$ for some constants c and d. This works much better than linear probing, provided that c and d are chosen appropriately: when two distinct probing sequences hit the same slot, in subsequent probes they then hit different slots. However it still leads to secondary clustering because any two keys that hash to the same value will yield the same probing sequence.

Double hashing

With double hashing the probing sequence is $(h_1(k) + j \cdot h_2(k)) \bmod m$, using two different hash functions $h_1$ and $h_2$. As a consequence, even keys that hash to the same value (under $h_1$) are in fact assigned different probing sequences. It is the best of the three methods in terms of spreading the probes across all slots, but of course each access costs an extra hash function computation.
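The three probing functions side by side, as functions of the probe number j (illustrative Python; the constants c and d and the second hash h2 are placeholders to be chosen by the implementation):

def linear_probe(h, j, m):
    """j-th probe for a key with hash h, table size m: suffers primary clustering."""
    return (h + j) % m

def quadratic_probe(h, j, m, c=1, d=1):
    """Quadratic probing: avoids primary but not secondary clustering."""
    return (h + c*j + d*j*j) % m

def double_hash_probe(h1, h2, j, m):
    """Double hashing: keys with equal h1 still get different sequences
    provided their h2 values differ (h2 should never be 0 mod m)."""
    return (h1 + j * h2) % m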

Explain the open addressing strategy of collision resolution and the term probing sequence used in that context.

The open addressing strategy resolves collisions by allowing several successive probes for the same key, with each probe generating a new position to be tried. The sequence of indices generated by a given key is called the probing sequence for that key.

Explain quadratic probing and its advantages and disadvantages. [Hint: refer to primary and secondary clustering.]

In quadratic probing, the probing function is a quadratic polynomial of the probe number. This has the advantage of avoiding the primary clustering problem of the simpler linear probing strategy, in which, whenever the probing sequence for a key $k_1$ contains the value of h($k_2$) for some other key $k_2$, the two probing sequences collide on every successive value from then onwards, causing a vicious circle of more and more collisions. With quadratic probing, the sequences may collide in that position, but will then diverge again for subsequent probes. However, quadratic probing still suffers from secondary clustering: if two distinct keys hash to the same value, their probing sequences will be identical and will collide at every probe number.

Graphs and path finding

Notation and representation

image-20230416110628775 image-20230416110701404 image-20230416110721826

Consider the two standard representations of directed graphs: the adjacency-list representation and the adjacency-matrix representation. Find a problem that can be solved more efficiently in the adjacency-list representation than in the adjacency-matrix representation, and another problem that can be solved more efficiently in the adjacency-matrix representation than in the adjacency-list representation.

Problem 1: Is there an edge from node u to v? In the adjacency-matrix representation, we can quickly test this by looking up the entry $a_{u,v}$ of the adjacency matrix, which takes $O(1)$ time. In the adjacency-list representation, we may need to go through the whole list $Adj[u]$, which might take $\Theta(V)$ time.

Problem 2: Is the node u a sink (i.e., does u have no outgoing edges)? In the adjacency-list representation we can test in $O(1)$ time whether a node u is a sink by checking whether the list $Adj[u]$ is empty or not. In the adjacency-matrix representation, however, this requires $\Theta(V)$ time.

Implementation 1: Recursion

image-20230416110857331

Implementation 2: With a stack

image-20230416111001215

Analysis

image-20230416190235584 image-20230514235548209

queue implementation

image-20230416111321754

bfs_path

image-20230416111411376

Dijkstra

Implementation

image-20230416111505261

Analysis

Running time

image-20230416111712125

Correctness
image-20230416112056270 image-20230416112117987

Bellman-Ford

image-20230416112419602

Implementation

image-20230416112452955

Analysis

image-20230416112704315

What are the advantages and disadvantages of the Bellman-Ford Algorithm in comparison to Dijkstra’s Algorithm?

Firstly, it can be applied to graphs with negative edge weights. Secondly, it can also be used to detect negative-weight cycles. The disadvantage of the Bellman-Ford Algorithm is the higher runtime, which is O(VE), while Dijkstra’s Algorithm can be implemented in O(VlogV + E) time, which is much more efficient, especially if the given graph is sparse.

Johnson’s algorithm

Implementation

image-20230416113215327

All-pair shortest path

The all-pairs shortest path problem can be solved using the Floyd-Warshall algorithm, or by repeated squaring of a suitably modified adjacency matrix, which is a kind of matrix multiplication.

Floyd-Warshall algorithm uses a dynamic programming approach to solve the all-pairs shortest path problem. Here’s a brief pseudocode for the algorithm:

function FloydWarshall(graph)
    // graph: adjacency matrix for the graph
    n = number of vertices (rows of the matrix)

    let dist = copy(graph)   // Create a copy of the adjacency matrix

    for k from 1 to n
        for i from 1 to n
            for j from 1 to n
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

    return dist

In the above pseudocode, graph is the adjacency matrix of the graph, and n is the number of vertices. The algorithm iteratively updates the shortest path between every pair of vertices (i, j) considering vertex k as an intermediate vertex. The final matrix dist contains the shortest path distances between every pair of vertices.

However, there is another method using repeated squaring of the adjacency matrix. If we multiply the adjacency matrix A by itself we get a new matrix A^2, where each element A^2[i][j] is the number of paths of length 2 from vertex i to vertex j. By taking higher powers of A, we can find the number of paths of any length. To find shortest paths we instead work with the matrix of edge weights and replace the inner multiplications by additions and the outer sum by a minimum (the (min, +) product); repeatedly squaring the weight matrix under this product gives the shortest distance between every pair. However, this method takes O(V^3 log V) time and is generally not as efficient as Floyd-Warshall, or as Dijkstra's algorithm run from every vertex on sparse graphs.
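A sketch of the modified product and the repeated-squaring loop (illustrative Python; W is the matrix of edge weights with W[i][i] = 0 and infinity where there is no edge):

INF = float("inf")

def min_plus_product(A, B):
    """(min, +) matrix product: result[i][j] = min over k of A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def all_pairs_shortest_paths(W):
    """Repeatedly square W under the (min, +) product until paths of up to
    n-1 edges are covered: O(V^3 lg V), versus O(V^3) for Floyd-Warshall."""
    n, D, length = len(W), W, 1
    while length < n - 1:
        D = min_plus_product(D, D)   # doubles the maximum path length covered
        length *= 2
    return D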

Past papers


2014-p01-q10
image-20230510162852926
image-20230510163401328 image-20230510163423895

Graphs and subgraphs

Ford-Fulkerson algorithm

Flow

A flow is a set of edge labels $f(u\to v)$ such that
$$
0 \le f(u\to v)\le c(u\to v) \text{ on every edge}
$$

$$
\text{value}(f) = \sum_{u:s\to u}f(s\to u)-\sum_{u:u\to s}f(u\to s)
$$

Flow Conservation

$$
\sum_{u:u\to v}f(u\to v)-\sum_{u:v\to u}f(v\to u)=0
$$

Implementation

The residual graph
image-20230416114148455
Augmenting paths
image-20230416114238470 image-20230416114354420 image-20230416114447173

Analysis

Running time
image-20230416114556985

Correctness

Max-flow min-cut theorem
image-20230416120010605

State the Max-Flow Min-Cut Theorem

For any flow network with source s and sink t, the value of maximum flow equals the minimum capacity of an (s,t) cut.

image-20230416120143609

Past papers

image-20230509232338837

Matchings

image-20230416121218271 image-20230416121622044 image-20230416121238893

Prim’s algorithm

image-20230416121722912

Implementation

image-20230416121805803

Analysis

image-20230416155149403

Kruskal’s algorithm

image-20230416122241715

Implementation

image-20230416122322196 image-20230416122351869

Analysis

image-20230416122447273

Topological sort

image-20230416122849911

Analysis

image-20230416123509390

Advanced data structures

Priority queue

image-20230416141703497

Binary heaps

image-20230416105907642

Binomial heaps

image-20230416105943386
first()

To find the element with the smallest key in the whole binomial heap, scan the roots of all the binomial trees in the heap, at cost O(lg n) since there are that many trees.

extractMin()

To extract the element with the smallest key, which is necessarily a root, first find it, as above, at cost $O(lg n)$; then cut it out from its tree. Its children now form a forest of binomial trees of smaller orders, already sorted by decreasing size. Reverse this list of trees and you have another binomial heap. Merge this heap with what remains of the original one. Since the merge operation itself (q.v.) costs $O(lg n)$, this is also the total cost of extracting the minimum.

merge()

To merge two binomial heaps, examine their trees by increasing tree order and combine them with a procedure similar to binary addition with carry, as performed by a chain of full adders: two trees of the same order combine into a single tree of the next order up, which is the “carry”.

insert()

To insert a new element, consider it as a binomial heap with only one tree with only one node and merge it as above, at cost $O(lg n).$

decreaseKey()

To decrease the key of an item, proceed as in the case of a normal binary heap within the binomial tree to which the item belongs, at cost no greater than $O(lg n)$ which bounds the height of that tree.

Linked List

image-20230416141941809

Fibonacci heap

image-20230416142148290
push()
image-20230416142417837
popmin()
image-20230416142443596
cleanup()
image-20230416142507945
decrease key()
image-20230416142806095 image-20230416142917843

Analysis

image-20230416143600275

Fibonacci Shape Theorem

image-20230416145153066

Past papers

image-20230416154721801

Disjoint set

image-20230416145358031

In a disjoint set, the weighted union heuristic says: keep track of the size of each set, and when you have to merge two sets then use the size information to decide which set to update, to reduce the amount of work. [Must include: keep track of the sizes.]

The path compression heuristic says: when you do a find, following parent pointers up the tree to the root, rewire the nodes visited along the way to point directly at the root, so that subsequent searches for those elements can be done more quickly.
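A sketch combining both heuristics, union by size plus path compression (illustrative Python, separate from the flat/deep/lazy forest implementations discussed below):

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))   # each element starts in its own set
        self.size = [1] * n            # weighted-union heuristic: track sizes

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression: rewire onto the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx            # attach the smaller tree under the larger
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]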

Implementation1: Flat Forest

image-20230416145521202

Implementation2: Deep Forest

image-20230416145618029

Implementation3: Lazy Forest

image-20230416145743445