V1
DSA
Algorithms & Data Structures Handbook
Beginner → Intermediate Lang-agnostic
Data Structures & Algorithms

Think Like a
Computer Scientist

A beginner-friendly reference that builds your intuition for data structures and algorithms — from arrays all the way to graphs, sorting, and dynamic programming.

Arrays & Lists Trees & Graphs Sorting & Search Recursion Hash Tables Dynamic Programming
01

What & Why

The real reason DSA matters for every programmer

A data structure is a way of organizing data in memory so it can be used efficiently. An algorithm is a step-by-step process for solving a problem. The two are inseparable — the data structure you choose determines which algorithms are fast and which are slow.

Efficiency

The right data structure can turn a program that takes 10 hours into one that runs in 1 second — literally. DSA teaches you to see the difference before you code.

Interviews

Every major tech company uses DSA problems to evaluate candidates. Understanding the "why" behind structures makes solutions feel obvious rather than memorized.

Better Code

When you know what's under the hood — a dictionary is a hash table, a list.sort() is Timsort — you make better decisions every day.

Mental model: Think of data structures like containers. A bag (array), a stack of plates (stack), a to-do queue (queue), a filing cabinet (hash map), a family tree (tree). The shape of the container determines how fast you can find, add, and remove things.
02

Big-O Notation

Measuring efficiency without clocks or benchmarks

Big-O describes how an algorithm's runtime (or memory usage) grows as the input size n grows. It ignores constants and lower-order terms — we care about the shape of growth, not the exact number.

Best
O(1)
Constant
Great
O(log n)
Logarithmic
Good
O(n)
Linear
Okay
O(n log n)
Linearithmic
Bad
O(n²)
Quadratic
Terrible
O(2ⁿ)
Exponential

Concrete examples

NotationNameReal-world analogyExample operation
O(1)ConstantFlipping to a page by number in a bookAccess array by index
O(log n)LogarithmicPhone book binary search — each step halves the search spaceBinary search, BST lookup
O(n)LinearReading every name in a phone bookLinear scan, finding max in unsorted list
O(n log n)LinearithmicSorting a deck of cards efficientlyMerge sort, heap sort
O(n²)QuadraticComparing every person in a room to every other personBubble sort, nested loops
O(2ⁿ)ExponentialAll possible ways to seat guestsNaive recursive Fibonacci, subset problems

How to calculate Big-O

pseudocode
// Rule 1: Drop constants O(2n) → O(n) O(500) → O(1) // Rule 2: Drop non-dominant terms O(n² + n) → O(n²) O(n + log n) → O(n) // Rule 3: Count loops for i in range(n): // O(n) print(i) for i in range(n): // O(n²) — nested loop for j in range(n): print(i, j) for i in range(n): // O(n) — sequential, not nested print(i) for j in range(n): print(j)
Common beginner mistake: Confusing "best case" and "worst case." We almost always talk about worst case Big-O — what's the longest it could possibly take? When someone says binary search is O(log n), they mean worst case.
03

Memory Model

Stack vs Heap — what lives where and why it matters

Stack memory

  • Fixed size, fast access
  • Managed automatically (LIFO)
  • Stores: local variables, function calls, primitives
  • Limited size — deep recursion causes stack overflow

Heap memory

  • Large, flexible, slower access
  • Managed manually or by GC
  • Stores: objects, arrays, dynamic data
  • Memory leaks happen here (in languages without GC)
💡
Why it matters for DSA: A linked list stores nodes on the heap — each node is a separate allocation. An array is a single contiguous block. Heap fragmentation and cache misses make linked lists slower in practice even when Big-O says they're equivalent.
04

Arrays

Contiguous blocks of memory — the foundation of everything

An array stores elements in a contiguous block of memory. Because each element is the same size and they're adjacent, you can jump directly to any index in O(1) time — the computer knows exactly where it is.

42
index 0
17
index 1
89
index 2
5
index 3
63
index 4
OperationTimeWhy
Read by indexO(1)Direct address calculation: base + index × size
Search (unsorted)O(n)Must check every element in the worst case
Search (sorted)O(log n)Binary search cuts the space in half each time
Insert at endO(1)*Amortized for dynamic arrays; may trigger resize
Insert at start/middleO(n)All subsequent elements must be shifted right
Delete at start/middleO(n)All subsequent elements must be shifted left
python
# Creating arrays (Python calls them lists) nums = [10, 20, 30, 40] # O(1) — direct index access first = nums[0] # → 10 last = nums[-1] # → 40 # O(1) amortized — append to end nums.append(50) # [10, 20, 30, 40, 50] # O(n) — insert at start (all elements shift right) nums.insert(0, 5) # [5, 10, 20, 30, 40, 50] # O(n) — linear search def linear_search(arr, target): for i, val in enumerate(arr): if val == target: return i return -1 # Two-pointer technique — a key array pattern def has_pair_with_sum(arr, target): left, right = 0, len(arr) - 1 while left < right: s = arr[left] + arr[right] if s == target: return True elif s < target: left += 1 else: right -= 1 return False
05

Linked Lists

Nodes pointing to nodes — dynamic insertion without shifting

A linked list is a chain of nodes. Each node holds a value and a pointer (reference) to the next node. There's no contiguous memory — each node lives wherever the heap puts it. This makes insertion fast but indexing slow.

python
class Node: def __init__(self, val): self.val = val self.next = None # pointer to next node class LinkedList: def __init__(self): self.head = None def prepend(self, val): # O(1) — new head node = Node(val) node.next = self.head self.head = node def append(self, val): # O(n) — must walk to tail node = Node(val) if not self.head: self.head = node; return cur = self.head while cur.next: cur = cur.next cur.next = node def delete(self, val): # O(n) — find then rewire if not self.head: return if self.head.val == val: self.head = self.head.next; return cur = self.head while cur.next: if cur.next.val == val: cur.next = cur.next.next; return cur = cur.next
Linked List wins when…
  • Frequent insert/delete at front or middle
  • Size is unknown and changes constantly
  • You don't need random access
Array wins when…
  • You need fast random access by index
  • Cache performance matters (spatial locality)
  • Size is roughly known upfront
💡
Fast & slow pointer trick (Floyd's algorithm): Use two pointers — one moving 1 step at a time, one moving 2. If they ever meet, there's a cycle. This runs in O(n) time and O(1) space and is a classic interview pattern.
06

Stacks

Last In, First Out — LIFO discipline

A stack restricts access to one end only. You can only push (add to top) or pop (remove from top). Think of a stack of plates — you always take from the top.

python
# Python list as a stack — all operations O(1) stack = [] stack.append('a') # push stack.append('b') stack.append('c') stack[-1] # peek → 'c' stack.pop() # pop → 'c' # Classic use: matching parentheses — O(n) def is_balanced(s): stack = [] pairs = {')':'(', ']':'[', '}':'{'} for ch in s: if ch in '([{': stack.append(ch) elif ch in ')]}': if not stack or stack[-1] != pairs[ch]: return False stack.pop() return not stack
📌
Where stacks appear in real life: Your browser's back button (history stack), undo/redo in editors, function call management (the "call stack"), expression evaluation in compilers, and depth-first graph traversal.
07

Queues

First In, First Out — FIFO discipline

A queue is like a line at a coffee shop — people join at the back (enqueue) and leave from the front (dequeue). The first one in is the first one out.

python
from collections import deque # deque is O(1) for both ends. Regular list is O(n) for pop(0). queue = deque() queue.append('Alice') # enqueue queue.append('Bob') queue.append('Carol') queue[0] # peek → 'Alice' queue.popleft() # dequeue → 'Alice' # BFS (breadth-first search) — requires a queue def bfs(graph, start): visited = set([start]) queue = deque([start]) order = [] while queue: node = queue.popleft() order.append(node) for neighbor in graph[node]: if neighbor not in visited: visited.add(neighbor) queue.append(neighbor) return order
08

Hash Tables

O(1) average lookup — the most useful data structure you'll use

A hash table maps keys to values. Internally, a hash function converts any key into an integer index into an underlying array. This is why lookup, insert, and delete are all O(1) average case — it's just an array access under the hood.

1

Hash the key

Pass the key through a hash function. E.g. "name"hash("name") = 7284...

2

Compute the bucket index

Take hash % table_size to get an index in the array. E.g. 7284 % 16 = 4

3

Store or retrieve

Put the key-value pair at that index. On retrieval, hash the key again and go to the same slot.

python
# Python dict is a hash table freq = {} words = ["apple", "banana", "apple", "cherry", "banana", "apple"] # Count word frequency — O(n) for word in words: freq[word] = freq.get(word, 0) + 1 # → {'apple': 3, 'banana': 2, 'cherry': 1} # Classic pattern: two-sum problem — O(n) using a hash set def two_sum(nums, target): seen = {} for i, n in enumerate(nums): complement = target - n if complement in seen: return [seen[complement], i] seen[n] = i return [] # Detect duplicate — O(n) def has_duplicate(arr): return len(arr) != len(set(arr))
Collisions: When two keys hash to the same bucket, it's a collision. Good hash tables handle this with chaining (a linked list at each slot) or open addressing. Worst case with many collisions degrades to O(n), but a good hash function keeps this rare.
09

Trees & Binary Search Trees

Hierarchical data — fast search, ordered structure

A tree is a hierarchical structure of nodes. Each node has a parent (except the root) and zero or more children. The most important variant is the Binary Search Tree (BST): each node has at most 2 children, and left subtree values are always smaller, right subtree values are always larger.

python
class TreeNode: def __init__(self, val): self.val = val self.left = None self.right = None # BST insertion — O(log n) balanced, O(n) worst def insert(root, val): if not root: return TreeNode(val) if val < root.val: root.left = insert(root.left, val) elif val > root.val: root.right = insert(root.right, val) return root # The 3 depth-first traversal orders — each visits every node once: O(n) def inorder(node): # Left → Root → Right → sorted output for BST! if not node: return inorder(node.left) print(node.val) inorder(node.right) def preorder(node): # Root → Left → Right → good for copying trees if not node: return print(node.val); preorder(node.left); preorder(node.right) def postorder(node): # Left → Right → Root → good for deleting trees if not node: return postorder(node.left); postorder(node.right); print(node.val) # Tree height — O(n) def height(node): if not node: return 0 return 1 + max(height(node.left), height(node.right))
OperationBalanced BSTUnbalanced (worst case)
SearchO(log n)O(n)
InsertO(log n)O(n)
DeleteO(log n)O(n)
Min / MaxO(log n)O(n)
Balanced vs Unbalanced: If you insert already-sorted values (1, 2, 3, 4…) into a BST, it degrades into a linked list — O(n) lookups. Self-balancing trees like AVL trees and Red-Black trees automatically rotate nodes to maintain O(log n). Python's sortedcontainers.SortedList or Java's TreeMap use these internally.
10

Heaps

Priority queues — always know the min (or max) instantly

A heap is a complete binary tree with one rule: every parent is ≤ its children (min-heap). This means the smallest element is always at the root — accessible in O(1). Inserting and removing cost O(log n) because you might need to "bubble" the new element up or down.

python
import heapq # Python's heapq is a min-heap h = [] heapq.heappush(h, 5) # O(log n) heapq.heappush(h, 1) heapq.heappush(h, 8) heapq.heappush(h, 3) h[0] # O(1) peek at min → 1 heapq.heappop(h) # O(log n) remove min → 1 # Build heap from a list in O(n) — faster than n × push heapq.heapify(h) # Max-heap trick: negate values max_heap = [] for n in [5, 1, 8, 3]: heapq.heappush(max_heap, -n) -heapq.heappop(max_heap) # → 8 (largest) # "K largest elements" — classic heap problem: O(n log k) import heapq def k_largest(nums, k): return heapq.nlargest(k, nums)
💡
When to use a heap: Whenever you repeatedly need the minimum or maximum from a dynamic set. Task schedulers (highest-priority task next), Dijkstra's shortest path algorithm, median finding (two heaps), and "top K" problems all use heaps.
11

Graphs

The most general structure — trees and linked lists are just special graphs

A graph is a set of vertices (nodes) connected by edges. Edges can be directed (one-way) or undirected (two-way), and weighted (with a cost) or unweighted. Social networks, maps, and the internet are all graphs.

Adjacency List

Each node stores a list of its neighbors. Space: O(V + E). Best for sparse graphs (few edges). Most common representation.

Adjacency Matrix

An n×n grid where matrix[i][j] = 1 means an edge exists. Space: O(V²). Fast edge-existence check but wasteful for sparse graphs.

python
# Graph as adjacency list — the standard approach graph = { 'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'D'], 'D': ['B', 'C'], } # DFS — uses a stack (or recursion) — O(V + E) def dfs(graph, start, visited=None): if visited is None: visited = set() visited.add(start) print(start) for neighbor in graph[start]: if neighbor not in visited: dfs(graph, neighbor, visited) # BFS — uses a queue — O(V + E) # (see Queue section above) # Detect cycle in undirected graph — O(V + E) def has_cycle(graph): visited = set() def dfs(node, parent): visited.add(node) for neighbor in graph[node]: if neighbor not in visited: if dfs(neighbor, node): return True elif neighbor != parent: return True return False for node in graph: if node not in visited: if dfs(node, None): return True return False
12

Sorting Algorithms

From O(n²) basics to O(n log n) workhorses
Beginner
Bubble Sort
BestO(n)
AverageO(n²)
WorstO(n²)
SpaceO(1)
StableYes

Repeatedly step through the list, compare adjacent elements, and swap if they're in the wrong order. The largest values "bubble up" to the end. Only useful for learning — never in production.

Important
Merge Sort
BestO(n log n)
AverageO(n log n)
WorstO(n log n)
SpaceO(n)
StableYes

Divide the array in half, recursively sort each half, then merge the two sorted halves. Guaranteed O(n log n) — great for linked lists and external sorting (data too big for RAM).

python
def merge_sort(arr): if len(arr) <= 1: return arr mid = len(arr) // 2 left = merge_sort(arr[:mid]) right = merge_sort(arr[mid:]) return merge(left, right) def merge(L, R): result, i, j = [], 0, 0 while i < len(L) and j < len(R): if L[i] <= R[j]: result.append(L[i]); i += 1 else: result.append(R[j]); j += 1 return result + L[i:] + R[j:]
Essential
Quick Sort
BestO(n log n)
AverageO(n log n)
WorstO(n²)
SpaceO(log n)
StableNo

Pick a pivot, partition: all smaller values left, all larger right. Recursively sort partitions. In practice the fastest general-purpose sort — C's qsort, Java's Arrays.sort (primitives), and many others use it.

📌
Just use the built-in. Python's list.sort() and sorted() use Timsort — a hybrid merge/insertion sort that's O(n log n) worst case and O(n) on nearly-sorted data. It's stable. Unless you're learning, always reach for the built-in first.
14

Recursion

A function that calls itself — the key to tree and graph algorithms

Every recursive function needs two things: a base case (when to stop) and a recursive case (how to call itself with a smaller problem). Without a base case, you get infinite recursion and a stack overflow.

python
# Classic: Fibonacci — naive O(2ⁿ) def fib_naive(n): if n <= 1: return n # ← base case return fib_naive(n-1) + fib_naive(n-2) # ← recursive case # Same result, O(n) with memoization — cache results you've seen from functools import lru_cache @lru_cache(maxsize=None) def fib(n): if n <= 1: return n return fib(n-1) + fib(n-2) # Power set — all subsets. O(2ⁿ) — exponential is unavoidable here def power_set(nums): if not nums: return [[]] first = nums[0] rest = power_set(nums[1:]) return rest + [[first] + s for s in rest] # Convert recursion to iteration when stack depth is a concern def fib_iterative(n): # O(n) time, O(1) space a, b = 0, 1 for _ in range(n): a, b = b, a+b return a
Stack overflow: Python's default recursion limit is 1000. For large inputs, prefer iteration or increase the limit with sys.setrecursionlimit(). Tail recursion is not optimized in Python — always prefer iterative solutions for linear recursion (Fibonacci, factorials).
15

Problem Patterns

Recognizing which technique to reach for
PatternWhen to useExample problems
Two PointersSorted array, finding pairs, reversalsTwo Sum (sorted), palindrome check, remove duplicates
Sliding WindowSubarray/substring of variable or fixed sizeMax sum subarray, longest substring without repeat
Fast & Slow PointersCycle detection, middle of linked listFloyd's cycle, linked list middle, happy number
BFSShortest path (unweighted), level-order, nearest XShortest path in grid, word ladder, connect components
DFSExplore all paths, backtracking, tree problemsNumber of islands, path sum, permutations
Binary SearchSorted data, search space can be eliminated by halfSearch in rotated array, find peak, koko bananas
Heap / Priority QueueK-th largest/smallest, repeated min/max extractionKth largest element, merge K sorted lists
Dynamic ProgrammingOptimal substructure, overlapping subproblemsFibonacci, coin change, knapsack, LCS

Sliding Window example

python
# Max sum of subarray of size k — O(n) def max_sum_subarray(nums, k): window_sum = sum(nums[:k]) best = window_sum for i in range(k, len(nums)): window_sum += nums[i] - nums[i - k] # slide: add right, remove left best = max(best, window_sum) return best # Longest substring without repeating characters — O(n) def length_of_longest_substring(s): seen = {} left = best = 0 for right, ch in enumerate(s): if ch in seen and seen[ch] >= left: left = seen[ch] + 1 # shrink window seen[ch] = right best = max(best, right - left + 1) return best
16

Dynamic Programming

Break a big problem into smaller ones — and remember the answers

DP applies when a problem has overlapping subproblems (the same smaller problem gets solved multiple times) and optimal substructure (the best solution to the big problem is built from best solutions to sub-problems). You can approach DP two ways:

Top-Down (Memoization)

Write the recursive solution, then add a cache. If you've seen this subproblem before, return the cached answer. Easier to write, but recursion stack overhead.

Bottom-Up (Tabulation)

Fill a table iteratively from smallest subproblem to largest. No recursion overhead. Often more memory-efficient. Harder to see initially.

python
# Coin Change: minimum coins to make amount — classic DP # coins = [1, 5, 10], amount = 11 → 2 coins (10 + 1) # Bottom-up — O(n × amount) def coin_change(coins, amount): dp = [float('inf')] * (amount + 1) dp[0] = 0 # 0 coins needed to make $0 for a in range(1, amount + 1): for coin in coins: if coin <= a: dp[a] = min(dp[a], 1 + dp[a - coin]) return dp[amount] if dp[amount] != float('inf') else -1 # Longest Common Subsequence — O(m × n) def lcs(s1, s2): m, n = len(s1), len(s2) dp = [[0] * (n + 1) for _ in range(m + 1)] for i in range(1, m + 1): for j in range(1, n + 1): if s1[i-1] == s2[j-1]: dp[i][j] = 1 + dp[i-1][j-1] else: dp[i][j] = max(dp[i-1][j], dp[i][j-1]) return dp[m][n]
The DP thought process: (1) Can I define the problem in terms of smaller versions of itself? (2) What's the simplest base case? (3) How do I combine solutions to get the bigger answer? If yes to all three, DP applies. Start with the recursive solution — memoize — then convert to tabulation.
17

Choosing the Right Data Structure

A practical decision guide
Your needBest choiceWhy
Fast access by index, cache friendlyArrayO(1) access, contiguous memory
Fast insert/delete at frontLinked List or dequeNo shifting needed
Reverse / undo / expression parsingStackLIFO matches the problem shape
Processing in order / BFSQueue / dequeFIFO — fair ordering
Fast key-value lookup, countingHash MapO(1) average access
Deduplication, membership testHash SetO(1) contains, no duplicates
Ordered data, range queriesBST / Sorted ArrayO(log n) search + in-order traversal
Always need min or maxHeapO(1) peek, O(log n) insert/remove
Relationships, shortest pathsGraphModels arbitrary connections
Prefix search, autocompleteTrieO(m) search where m = word length
💡
The golden rule: Always clarify the most frequent operation first. Is it read-heavy or write-heavy? Do you need ordering? Are lookups by key or by position? The answer tells you the data structure — everything else follows.