And then magically turn back to 75GB?
Nothing is deleted. It's just packed smarter.
Imagine you have to write "A" one hundred times.
You could write "A A A A..." (100 times).
OR you could write: A x 100.
This is called Run-Length Encoding (RLE). It's the simplest form of compression. It doesn't delete the letters; it just changes how we describe them.
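Here's RLE as a toy sketch in Python. This is illustrative only, not what ZIP actually runs:

```python
# Toy run-length encoder: "AAAA" becomes [("A", 4)].
def rle_encode(text):
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs

def rle_decode(runs):
    # Reverse the description: expand every (char, count) pair back out.
    return "".join(ch * count for ch, count in runs)

encoded = rle_encode("A" * 100)
print(encoded)                            # [('A', 100)]
print(rle_decode(encoded) == "A" * 100)   # True
```

One hundred letters in, one (letter, count) pair out, and decoding gives back exactly the original.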
Analogy: Vacuum Packing
The clothes (data) are the same. We just removed the air (repetition) between them.
Real files aren't just "AAAAAA". They are complex sentences, code, and logs.
Modern ZIP uses an algorithm called DEFLATE.
Instead of writing the same word twice, the computer points back to where it first saw it: "Go back X characters, copy Y characters." This back-reference trick is called LZ77, and it's half of DEFLATE.
Most data is repetitive. Code has repeated keywords (`function`, `return`). Logs have repeated timestamps. The compressor scans the file and creates a Map.
After finding patterns, it assigns short codes to frequent items. This step is called Huffman coding, the other half of DEFLATE.
Common letter 'e' might become 01 (2 bits).
Rare letter 'z' might become 100110 (6 bits).
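A minimal Huffman code builder, sketched with Python's `heapq`. It shows the principle (frequent symbols get shorter codes), not DEFLATE's exact tables:

```python
import heapq
from collections import Counter

# Toy Huffman coder: repeatedly merge the two rarest symbols.
# Symbols merged late sit near the top of the tree -> shorter codes.
def huffman_codes(text):
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)  # unique tiebreaker so dicts are never compared
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("eeeeeeeeee zz")
# 'e' appears most often, so it gets the shortest code.
print(len(codes["e"]) < len(codes["z"]))  # True
```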
Think of the zip file as a book index. It doesn't contain the chapters again; it just tells you page numbers. The 75GB is reconstructed by following the map perfectly.
They are already compressed! They have already removed the repetition. Zipping them is like trying to squeeze a sponge that's already dry.
Encryption makes data look like random noise (high entropy). No patterns = No compression.
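You can watch this happen with Python's `zlib`, which implements the same DEFLATE family ZIP uses. Random bytes stand in for encrypted data here:

```python
import os
import zlib

repetitive = b"AB" * 50_000        # 100 KB of pure repetition
random_ish = os.urandom(100_000)   # 100 KB of noise, like encrypted data

print(len(zlib.compress(repetitive)))  # a few hundred bytes
print(len(zlib.compress(random_ish)))  # roughly 100 KB: no patterns, no savings
```

Same input size, wildly different output sizes, because compression is pattern-finding and noise has no patterns.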
The "Map" (metadata) takes up space. If the file is too small, the map might be bigger than the savings!
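Again easy to verify with `zlib`: compress something tiny and the output is *bigger* than the input, because the metadata overhead dominates.

```python
import zlib

tiny = b"Hi"
packed = zlib.compress(tiny)
# The map/header overhead costs more than the 2 bytes it "saves".
print(len(tiny), len(packed))  # 2 bytes in, more bytes out
```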
Example: If a file has:
AAAAAAABBBBBBBCCCCCCC
Zip stores it as:
A x7, B x7, C x7
Less space. Same info.
Most large files have patterns: repeated words, repeated bytes, repeated structure.
Compression algorithms find those patterns, describe each one once, and build a map of where they occur.
That map is why the file can become FULL SIZE again.
Nothing is created. Nothing is lost.
Just reconstructed.
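One round trip through `zlib` proves the lossless claim: every byte comes back.

```python
import zlib

# Repetitive log-style data, like the examples above.
original = b"2024-01-01 INFO request ok\n" * 10_000
packed = zlib.compress(original)
restored = zlib.decompress(packed)

print(len(original), len(packed))   # big vs. small
print(restored == original)          # True: byte-for-byte identical
```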
Why do some files barely shrink?
They have less repetition, and less repetition means less compression.
Data wasn’t reduced. It was packed efficiently.