database

DBMS Indexing Masterclass

Core Concept

What is Indexing?

Indexing is a data structure technique used to quickly locate and access the data in a database.

menu_book The Real-World Analogy

Imagine a textbook with 1,000 pages. You want to find the chapter on "Normalization".

  • Without Index (Full Table Scan) You flip through every single page from 1 to 1000 until you find it. This is slow, O(N).
  • With Index (B-Tree Seek) You go to the "Index" section at the back of the book. It's sorted alphabetically. You find "N", then "Normalization", which points to "Page 450". You go directly there. This is fast, O(log N).
Library Books Analogy

A library without an index is just a pile of books.

Interactive Simulation: Scan vs. Seek

ID to Find: 42

Without Index (Full Scan)

Time: 2033ms
0123456789101112131415161718192021222324252627282930313233344236373839

With Index (B-Tree Seek)

Time: 25ms
50 25 75 12 42

* Note how the scan checks every item, while the index jumps directly through the tree logic.

Clustered vs. Non-Clustered

The Dictionary

A Clustered Index determines the physical order of data in the table. Because the data can only be sorted in one way physically, there can be only one clustered index per table.

  • The leaf nodes of the index contain the actual data pages.
  • Faster for retrieving ranges of data (e.g., "Find all users registered between Jan and Mar").
  • Usually the Primary Key by default.
Root Node Actual Data Table (Sorted) ID: 1 | Name: Alice ID: 2 | Name: Bob ID: 3 | Name: Charlie
school

The Interviewer asks...

Short Answer: A clustered index defines the physical order of data (like a dictionary), while a non-clustered index is a separate structure pointing to the data (like a book index).

Key difference: You can only have one clustered index per table (since data can only be physically sorted one way), but multiple non-clustered indexes.

No! Indexes improve Read performance (SELECT) but degrade Write performance (INSERT, UPDATE, DELETE).

Avoid when:

  • The table is very small (full scan is faster than index overhead).
  • The table has frequent batch updates/inserts (index maintenance cost is high).
  • The column has low cardinality (e.g., 'Gender' column with only 'M'/'F' - an index won't filter much).

The most common structure is the B-Tree (or B+ Tree). It keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time (O(log n)). Hash indexes are also used but are only good for exact equality lookups, not ranges.