Inside information

A summary of the fourth chapter of You Don't Know What You're M ss ng

May 18, 2026

Super excited to get my hands on a copy of the new book this week at Wood festival, where I was talking about my books.

We’re about halfway through the short series of posts I’m writing for this Substack to summarise the chapters of my new book, You Don’t Know What You’re M ss ng, ahead of publication on 4th June, and things are starting to feel real. I managed to get my hands on a copy of the new book at a festival I was speaking at on the weekend, before I’d even been sent my own copies. So nice to feel a physical copy of it in my hands. Some lucky people even walked away from the festival owning a copy of the book before I did. You can get your own hands on a copy by clicking the button below.

Buy the book

As a reminder, the aim of these summaries is to give you a feel for the shape of the argument, the kinds of stories I use in the book to get the message across, and the ideas I hope will stick with you long after you’ve finished reading it.

In this post I’m going to summarise Chapter 4, Inside information.

“Information is a measure of one’s freedom of choice” - Warren Weaver

If you can read a sentence with jumbled letters (see last week’s post about Chapter 3 - Reading Between the Lines), or skim past the odd word without losing the meaning, it suggests something slightly strange, perhaps counterintuitive: not every part of a sentence is doing the same amount of work. Some words are carrying the load. Some are almost redundant. The context surrounding a particular word in a sentence is doing a lot of work when we read.

So it’s natural to ask a question that sounds philosophical but turns out to have a huge amount of practical relevance: how much information is actually contained in a sentence, a paragraph or even a whole language? And if we can measure it, can we strip a message down to its essence - a distilled informational concentrate?

This fourth chapter (Inside Information) is about the art - and the risk - of doing exactly that.

When we compress data, we’re trying to remove what is predictable so we can store or transmit what remains more efficiently. That’s the heart of the information revolution that has transformed daily life in the last few decades: squeezing photos, calls, videos, documents and messages down so they travel quickly and reduce the amount of storage needed.

But there’s a catch. The moment you boil a message down to its bare informational bones, you strip away the helpful redundancy that makes communication robust. Perfect compression gives you no safety net. If something goes wrong, you may have no way to spot the error, let alone correct it. The challenge is finding the balance between economic efficiency and reliable resilience.

Information theory is not concerned with whether a message is true, wise, or worthwhile. In that sense, the information content of a sentence is quite different from its meaning. Instead, information theory is concerned with the mechanics: how messages can be encoded, stored, transmitted, and recovered.

The person who, more than any other, showed us how to think about information properly was Claude Shannon. Shannon’s breakthrough was to quantify information using a concept called entropy.

Thinking about entropy as a measure of surprise is a useful analogy. If something is very predictable, it contains little information. If it’s genuinely uncertain, you gain more information when you discover the outcome. In Scrabble, drawing a Z gives you more information (because Z is rare, so you don’t expect to see it) than drawing an E would (because E is common, so you do). The surprise you get is different because the probabilities of the letters are different.
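To put rough numbers on that surprise, here is a minimal Python sketch. It assumes a full standard English Scrabble bag (100 tiles, of which 12 are E and 1 is Z), and the function name `surprisal_bits` is purely illustrative.

```python
import math

# Assumed tile counts for a full English Scrabble bag (100 tiles in total).
TOTAL_TILES = 100
TILE_COUNTS = {"E": 12, "Z": 1}

def surprisal_bits(probability: float) -> float:
    """Information gained, in bits, on seeing an outcome of this probability."""
    return -math.log2(probability)

for letter, count in TILE_COUNTS.items():
    p = count / TOTAL_TILES
    print(f"{letter}: p = {p:.2f}, surprisal = {surprisal_bits(p):.2f} bits")
```

Under those assumptions an E is worth about 3.06 bits and a Z about 6.64 bits: the rarer tile tells you more than twice as much.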

The maximum information you can gain from a yes-no question is one bit (a portmanteau of ‘binary’ and ‘digit’) - and you only get close to that maximum if “yes” and “no” are equally likely. This is why the best strategy in Guess Who is not to ask something unlikely like “Are they bald?” (high payoff if yes, but usually no), but to ask questions that roughly split the remaining options in half (“Are they female?” is usually a good starter question). The same logic underpins the most efficient way to play Twenty Questions: keep dividing the space of possibilities as evenly as you can. It’s not the most fun way to play, but it’s a good illustration of how you can gain as much information as possible.
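The "split it in half" rule can be checked with a short sketch. The board of 24 faces and the particular splits below are hypothetical numbers chosen for illustration, not counts from a real Guess Who set, and `yes_no_entropy_bits` is an illustrative name.

```python
import math

def yes_no_entropy_bits(p_yes: float) -> float:
    """Average information, in bits, from a question answered 'yes' with probability p_yes."""
    if p_yes in (0.0, 1.0):
        return 0.0  # a certain answer tells you nothing new
    p_no = 1.0 - p_yes
    return -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))

# Hypothetical board of 24 faces; each number is how many faces would answer "yes".
for yes_count in (12, 5, 1):
    p = yes_count / 24
    print(f"{yes_count}/24 say yes -> {yes_no_entropy_bits(p):.2f} bits on average")
```

An even split yields exactly one bit, and the more lopsided the split, the less you learn on average. It is also why twenty perfectly even questions can, in principle, separate 2^20 = 1,048,576 possibilities.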

Another of Shannon’s brilliant insights takes us back to language. Letters don’t appear independently of each other. They come in patterns - words. They come with context - sentences. That context makes them predictable, which means each letter carries less information than you might naïvely think.

Shannon tried to estimate how much information there really is per character in English, once you include the predictability created by surrounding letters and words. The startling conclusion is that context can reduce the effective information content of English to around one bit per character (whereas 5 binary digits would be needed to encode the 26 letters of the alphabet - 00000 encoding A, 00001 encoding B, 00010 encoding C, 00011 encoding D, etc.). In other words, a lot of written language is redundancy - usually helpful redundancy.
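The figure of 5 binary digits comes from the fact that 2^4 = 16 is too few codes for 26 letters while 2^5 = 32 is enough. A small sketch of that fixed-width scheme follows; the helper name `fixed_width_code` is illustrative.

```python
import math
import string

LETTERS = string.ascii_uppercase  # the 26 letters A to Z

# Smallest whole number of binary digits that gives each letter its own code.
bits_needed = math.ceil(math.log2(len(LETTERS)))

def fixed_width_code(letter: str) -> str:
    """Binary code for a letter, counting up from A = 00000."""
    return format(LETTERS.index(letter), f"0{bits_needed}b")

print(bits_needed)                            # 5
print([fixed_width_code(c) for c in "ABCD"])  # ['00000', '00001', '00010', '00011']
```

Every letter costs the same 5 digits here, however predictable it is, which is exactly the slack that context lets us squeeze out.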

This is where compression comes in. Standard text encodings like ASCII use seven bits per character, which is robust but not very efficient. If English can be encoded, in principle, far more economically, then a lot of what ASCII stores is redundancy - space we could make go missing. Compression algorithms are built to exploit exactly this sort of structure: they remove predictable parts while keeping enough to reconstruct the message.
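As a rough illustration, Python's built-in `zlib` module can squeeze ASCII text and then restore it exactly. The sample text is an assumption chosen for convenience: it is deliberately repetitive, so it shrinks far more than ordinary prose would.

```python
import zlib

# Illustrative sample: a short English passage repeated so there are patterns to exploit.
text = ("Some words are carrying the load. Some are almost redundant. " * 20).encode("ascii")

compressed = zlib.compress(text, 9)    # 9 = highest compression level
restored = zlib.decompress(compressed)

print(f"original:   {len(text)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"bits per character after compression: {8 * len(compressed) / len(text):.2f}")
print("round trip intact:", restored == text)
```

The exact sizes depend on the text and the zlib version, but the compressed form should be much smaller, and decompressing it gives back every character of the original.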

But once you start stripping away redundancy, you start living dangerously. A beautifully compressed message can become brittle. If a single error creeps in, the receiver may not be able to tell.

Morse code is a lovely historical stepping stone here because the choice of encodings for the letters was an early form of compression in action. Morse assigns short codes to common letters and longer codes to rare ones. But Morse comes with a vulnerability: its symbols are not naturally self-delimiting. If spacing is sloppy, messages can become ambiguous. A pause too short can turn clarity into confusion.
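To see how lost gaps create ambiguity, the sketch below lists every way an unspaced run of dots and dashes could be read. It uses only a handful of letters from the International Morse alphabet, and the function name `decodings` is illustrative.

```python
# International Morse codes for a few letters (dot = ".", dash = "-").
MORSE = {"E": ".", "T": "-", "I": "..", "A": ".-", "N": "-.",
         "S": "...", "Q": "--.-", "Z": "--.."}

def decodings(signal: str) -> list[str]:
    """Every way of splitting an unspaced dot-dash signal into letters from MORSE."""
    if not signal:
        return [""]
    results = []
    for letter, code in MORSE.items():
        if signal.startswith(code):
            # Take this letter, then decode whatever is left of the signal.
            results.extend(letter + rest for rest in decodings(signal[len(code):]))
    return results

print(sorted(decodings("...")))  # ['EEE', 'EI', 'IE', 'S']
```

Three dots with no reliable gaps could be S, or EEE, or IE, or EI. Only the pauses tell the receiver which was meant.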

The most haunting illustration is the story of the Star Dust, a plane that vanished over the Andes in 1947. Its final transmission ended with the enigmatic word “STENDEC”, repeated three times, and then silence. No one knew what STENDEC meant. What makes the tragedy instructive rather than just intriguing is not the mystery, but the mechanism: in Morse code, missing or misplaced gaps can destroy meaning. When context is stripped away, a small formatting error can convert a message into a riddle with no solution.

It’s similar to the risk we run when we compress information too aggressively. If all the context is removed and a mistake occurs, the message may become ambiguous or unintelligible.

So what do we do about that fragility? We add redundancy back in - but deliberately and in a way that helps us catch and correct errors.

This is where error-detection and error-correction codes come in. The point is not to avoid redundancy entirely, but to choose the right kind. Add a single parity bit to a string of binary digits and you can detect that something has gone wrong. Add more structured redundancy and you can often pinpoint the error and correct it. These techniques sit behind the everyday miracle of modern computing: the fact that we can store and transmit huge volumes of data through noisy, unreliable physical systems and still have it arrive intact.
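Both ideas fit in a few lines. The sketch below shows a single even-parity bit (detection only) and, as one well-known example of more structured redundancy, a Hamming(7,4) code that can locate and repair one flipped bit. The choice of Hamming(7,4) and the function names are illustrative assumptions, not necessarily the codes discussed in the book.

```python
def add_parity(bits: list[int]) -> list[int]:
    """Append one even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_ok(word: list[int]) -> bool:
    """True when the word still holds an even number of 1s."""
    return sum(word) % 2 == 0

def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(word: list[int]) -> list[int]:
    """Fix at most one flipped bit and return the corrected 7-bit word."""
    w = word[:]
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]  # checks positions 1, 3, 5, 7
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]  # checks positions 2, 3, 6, 7
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]  # checks positions 4, 5, 6, 7
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_position:
        w[error_position - 1] ^= 1
    return w

sent = add_parity([1, 0, 1, 1])
garbled = sent[:]
garbled[2] ^= 1                               # one bit flips in transit
print(parity_ok(sent), parity_ok(garbled))    # True False

codeword = hamming74_encode([1, 0, 1, 1])
damaged = codeword[:]
damaged[4] ^= 1                               # one bit flips in transit
print(hamming74_correct(damaged) == codeword) # True
```

The parity bit costs one extra digit and only tells you that something went wrong. The Hamming code costs three extra digits per four data bits, and in return it tells you where the single error is.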

If you want an example of how small an error can be and how dramatic the consequence, you could do worse than look to the Belgian federal election in 2003, where an electronic voting anomaly turned out to be consistent with a single flipped bit - a tiny physical event in memory leading to a large jump in the number of votes and almost to an incorrect winner being announced.

The lesson is not “computers can’t be trusted”. The lesson is that without redundancy and checking, even a naturally occurring one-bit glitch can change the result.
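A stored count is just a pattern of binary digits, so flipping one of them changes the total by an exact power of two. In the sketch below the tally and the bit position are hypothetical; the position was chosen because the surplus in the Belgian case is widely reported as 4,096 votes, which is 2^12.

```python
def flip_bit(value: int, position: int) -> int:
    """Return value with the bit at `position` (0 = least significant) inverted."""
    return value ^ (1 << position)

true_votes = 514                     # hypothetical tally, for illustration only
glitched = flip_bit(true_votes, 12)  # assume bit 12 flips from 0 to 1

print(glitched - true_votes)  # 4096
```

One digit changes and the count jumps by thousands, with nothing in the number itself to flag the problem.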

This information theory chapter ends by returning to the broader theme: the key to efficient, robust communication is not maximal compression, nor is it no compression. It’s a finely struck balance between the two. We need enough context to give messages clarity and the possibility of correcting errors, but not so much that we bury the signal in superfluity. It sounds oxymoronic, but the main take-home from the chapter is that we really need this essential redundancy.

A favour: pre-order the book

If you’ve enjoyed this summary, you’ll find much more detail in the book itself, with the stories, the science and the slightly uncomfortable implications.

If the book sounds like your sort of thing, please consider pre-ordering it. Pre-orders matter far more than most readers realise. They’re one of the strongest early signals that a book has an audience, which influences everything from how many copies are stocked to how widely it’s recommended.

Amazon: https://www.amazon.co.uk/Dont-Know-What-Youre-Missing/dp/1529438039

Bookshop.org (supports independent bookshops): https://uk.bookshop.org/p/books/you-don-t-know-what-you-re-missing-the-science-of-what-s-lost-and-how-to-find-it-kit-yates/497ab9dcf971763a

Thanks,
Kit
