Curated by Rajat Kumar

How does 75GB become 20GB?

And then magically turn back to 75GB?
Nothing is deleted. It's just packed smarter.

RAW_DATA.LOG 75 GB
COMPRESSED.ZIP 20 GB
Saved 55GB (Redundancy Removed)

Concept 1: The "Lazy" Writer

Imagine you have to write "A" one hundred times.
You could write "A A A A..." (100 times).
OR you could write: A x 100.

This is called Run-Length Encoding (RLE). It's one of the simplest forms of compression. It doesn't delete the letters; it just changes how we describe them.

Analogy: Vacuum Packing

The clothes (data) are the same. We just removed the air (repetition) between them.

Live Lab: RLE Compressor

A×7 B×7 C×7
Size: 21 B
New: 6 B
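
The idea above can be sketched in a few lines of Python. This is a minimal illustration, not how ZIP stores data; the function names `rle_encode` and `rle_decode` are made up for this example.

```python
from itertools import groupby

def rle_encode(text: str) -> list[tuple[str, int]]:
    # Collapse each run of identical characters into (character, run length).
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    # Expand every (character, run length) pair back into the original run.
    return "".join(ch * count for ch, count in pairs)

original = "AAAAAAABBBBBBBCCCCCCC"   # 21 characters
encoded = rle_encode(original)
print(encoded)                        # [('A', 7), ('B', 7), ('C', 7)]
print(rle_decode(encoded) == original)
```

Decoding gives back exactly the 21 original characters, which is the whole point: the description got shorter, the content did not change.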

How ZIP Actually Works

Real files aren't just "AAAAAA". They are complex sentences, code, and logs.
Modern ZIP uses an algorithm called DEFLATE.

Step 1: The "Dictionary" (LZ77)

Instead of writing the same word twice, the computer points back to where it saw it first. "Go back X characters, copy Y characters."

The quick brown fox jumps over ... (First Occurrence)
<Back:340, Len:20> (Pointer Reference)

The second "The quick brown fox" is replaced by a pointer.
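
Here is a toy version of the "go back X, copy Y" idea. It is a simplified sketch under assumptions: the names `lz77_encode` and `lz77_decode`, the 255-character window, and the minimum match length of 3 are choices made for this example, and real DEFLATE uses a 32 KB window, hashing to find matches quickly, and a binary token format.

```python
def lz77_encode(data: str, window: int = 255, min_len: int = 3) -> list:
    """Return tokens: a literal character, or a (back, length) pointer."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_back = 0, 0
        # Brute-force search of the recent window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_back = length, i - j
        if best_len >= min_len:
            tokens.append((best_back, best_len))   # pointer to earlier text
            i += best_len
        else:
            tokens.append(data[i])                 # no useful match: literal
            i += 1
    return tokens

def lz77_decode(tokens: list) -> str:
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            back, length = tok
            for _ in range(length):
                out.append(out[-back])             # copy from `back` chars ago
        else:
            out.append(tok)
    return "".join(out)

text = "the quick brown fox, the quick brown fox"
tokens = lz77_encode(text)
print(tokens[-1])                    # the second phrase becomes one pointer
print(lz77_decode(tokens) == text)
```

The decoder never needs a separate dictionary: everything a pointer refers to has already been written to the output by the time the pointer is read.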

Finding Patterns

Most data is repetitive. Code has repeated keywords (`function`, `return`). Logs have repeated timestamps. The compressor scans the file and creates a Map.

Step 2: Huffman Coding (The Shorthand)

After finding patterns, it assigns short codes to frequent items.
Common letter 'e' might become 01 (2 bits).
Rare letter 'z' might become 100110 (6 bits).
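
A small Huffman builder shows how frequent symbols end up with shorter codes. This is a sketch for illustration only: `huffman_codes` and the sample sentence are invented here, and the exact bit patterns it produces will differ from the 01 / 100110 examples above, which depend on the input.

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    freq = Counter(text)
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; one side gets a leading 0, the other a 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

sample = "seventeen bees see the zebra"
codes = huffman_codes(sample)
bits = sum(len(codes[ch]) for ch in sample)
print("e ->", codes["e"], "| z ->", codes["z"])
print(bits, "bits instead of", 8 * len(sample))
```

No code is a prefix of another, so the decoder can read the bit stream left to right without any separators.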

Library Index

The "Map" Analogy

Think of the zip file as a book index. It doesn't contain the chapters again; it just tells you page numbers. The 75GB is reconstructed by following the map perfectly.
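
The round trip can be checked with Python's standard `zipfile` module. This is a small in-memory sketch; the log line and the file name `raw_data.log` are made-up sample data, and the ratio you get depends entirely on how repetitive the input is.

```python
import io
import zipfile

# Highly repetitive sample data, standing in for a large log file.
original = b"GET /index.html 200\n" * 50_000

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("raw_data.log", original)

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("raw_data.log")
    restored = archive.read("raw_data.log")

print("original bytes:  ", info.file_size)
print("compressed bytes:", info.compress_size)
print("identical after unzip:", restored == original)
```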

Why some files don't shrink

JPG & MP4

They are already compressed! They have already removed the repetition. Zipping them is like trying to squeeze a sponge that's already dry.

Encrypted Files

Encryption makes data look like random noise (high entropy). No patterns = No compression.

Tiny Files

The "Map" (metadata) takes up space. If the file is too small, the map might be bigger than the savings!
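
These cases are easy to observe with Python's `zlib` module, which implements the same DEFLATE algorithm that ZIP uses (with a small zlib header instead of ZIP's per-file metadata). In this sketch, random bytes from `os.urandom` stand in for encrypted or already-compressed data, which is an assumption made for the demo; the sample inputs are invented.

```python
import os
import zlib

repetitive = b"2024-01-01 INFO request ok\n" * 4000   # log-like, full of patterns
random_like = os.urandom(len(repetitive))              # no patterns to exploit
tiny = b"hi"                                           # smaller than the overhead

def deflated_size(blob: bytes) -> int:
    # Compress at the highest level and report the output size in bytes.
    return len(zlib.compress(blob, 9))

for name, blob in [("repetitive", repetitive),
                   ("random-like", random_like),
                   ("tiny", tiny)]:
    print(f"{name:12} {len(blob):>7} B -> {deflated_size(blob):>7} B")
```

The repetitive input shrinks dramatically, the random-like input stays roughly the same size, and the two-byte input comes out larger than it went in.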

Interviewer Cheat Sheet

If asked by an interviewer...

Let’s break the illusion:

  • Zip files don’t delete data
  • They remove repetition

Example: If a file has:

AAAAAAABBBBBBBCCCCCCC

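Conceptually, a compressor can store it as (real ZIP uses DEFLATE's back-references and Huffman codes, but the idea is the same):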

A x7, B x7, C x7

Less space. Same info.

Most large files have patterns:

  • Repeated text
  • Similar pixels
  • Duplicate metadata
  • Redundant code blocks

Compression algorithms:

  • Find repetition
  • Store it once
  • Add a map to rebuild it later

That map is why the file can become FULL SIZE again.

Nothing is created. Nothing is lost.
Just reconstructed.

Why do some files barely shrink?

  • Videos (already compressed)
  • Images like JPG/PNG
  • Encrypted files

They have little repetition left, so there is little for the compressor to remove.

  • Zip isn’t magic
  • Zip is smart math + patterns

Data wasn’t reduced. It was packed efficiently.