Demystifying the Block Protocol: A Q&A on Web Semantics

For decades, the web has served as a vast library of human-readable documents. While this has been revolutionary, the underlying structure remains surprisingly shallow. Machines struggle to understand the meaning of content—like whether a phrase refers to a book, a recipe, or a person. This Q&A explores the historical challenge of adding semantic depth to the web, the promise of the Semantic Web, and how the Block Protocol aims to make structured data as easy as writing a blog post. Let’s dive in.

Why is the web primarily designed for humans rather than machines?

Since the 1990s, the web has been built around HTML, a language that conveys basic document structure—paragraphs, headings, emphasis. CSS then adds visual flair, like font colors and sizes. But these formats lack the ability to tell a computer what a piece of content actually means. For instance, if you mention “Goodnight Moon” in bold text, a human recognizes it as a book title. A naive program, however, only sees bold formatting—it doesn’t know it’s a book, let alone its author, ISBN, or publisher. This fundamental gap between human readability and machine understandability has persisted for decades, making it difficult for computers to automatically extract, compare, or reuse web data.
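To make the gap concrete, here is a minimal sketch of the "naive program" described above, using only Python's standard html.parser module. The class name BoldCollector, the helper extract_bold, and the sample sentence are illustrative assumptions, not anything from a real tool. It shows that bold markup lets a program recover which words were emphasized, but nothing about what they are.

```python
from html.parser import HTMLParser


class BoldCollector(HTMLParser):
    """Illustrative parser: collects the text found inside <b> or <strong> tags."""

    def __init__(self):
        super().__init__()
        self._depth = 0       # how many bold tags we are currently inside
        self.bold_text = []   # text fragments seen while inside bold tags

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.bold_text.append(data)


def extract_bold(html: str) -> list[str]:
    """Return every piece of bold text in the given HTML string."""
    parser = BoldCollector()
    parser.feed(html)
    parser.close()
    return parser.bold_text


# Hypothetical sample sentence: a book title and an ordinary adjective, both bold.
sample = "<p>Tonight we read <b>Goodnight Moon</b> and it was <b>wonderful</b>.</p>"
print(extract_bold(sample))
```

Both strings come back looking exactly the same to the program. Nothing in the markup says that the first is a book title and the second is just an emphasized word.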

Source: www.joelonsoftware.com

What was Tim Berners-Lee’s vision for the Semantic Web?

In 1999, Tim Berners-Lee articulated a dream in which computers could “analyze all the data on the Web”: not just the text and links, but the meaning behind them. He envisioned “intelligent agents” handling tasks like commerce, bureaucracy, and daily errands automatically, as machines talked to machines. This Semantic Web would let you publish a book title together with detailed, computer-readable metadata: author, publication date, genre, and more. With that information embedded in the page, a computer could understand that “Goodnight Moon” is a children’s book by Margaret Wise Brown, published in 1947. Although the vision is now more than two decades old, adoption has been slow because adding such markup takes extra effort, and most creators find that effort burdensome once their content already looks good to human readers.

How do schema.org and formats like JSON-LD help add structure?

Schema.org provides a shared vocabulary for describing things: books, events, recipes, people, and more. To mark up a book, you start by looking up schema.org’s definition of a “Book.” Then you use a syntax such as JSON-LD, RDFa, or Microdata to embed that structured data into your HTML. For example, you can add a JSON-LD script that explicitly states: “This is a Book, title ‘Goodnight Moon’, author ‘Margaret Wise Brown’.” This tells computers exactly what the content is, so search engines and AI systems can understand it and display it in richer ways, such as star ratings, price snippets, or event listings. However, the process remains technical: it requires learning vocabularies, choosing the right syntax, and embedding the code correctly. For many publishers, this is “homework” they skip once the human-readable post is finished.
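The JSON-LD statement described above can be sketched in a few lines of Python. The helper names book_jsonld and as_script_tag are assumptions made for illustration. The "@context" and "@type" keys and the name, author, and datePublished properties come from the schema.org Book vocabulary. Treat this as a minimal sketch rather than a complete or validated markup recipe.

```python
import json


def book_jsonld(name: str, author: str, date_published: str) -> dict:
    """Build a schema.org Book description as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": name,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
    }


def as_script_tag(data: dict) -> str:
    """Wrap the data in the script element used to embed JSON-LD in HTML."""
    body = json.dumps(data, indent=2)
    # Escape "</" so a stray "</script>" inside a value cannot end the tag early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


book = book_jsonld("Goodnight Moon", "Margaret Wise Brown", "1947")
print(as_script_tag(book))
```

The printed script element is what you would paste into the page alongside the visible text. Even in this small case the writer has to know the vocabulary, the property names, and the embedding syntax, which is the "homework" this answer refers to.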

Why hasn’t semantic markup become widespread on the web?

Despite the Semantic Web’s promise, relatively few websites include rich structured data. The core reason is sheer effort. Creating a beautiful, human-readable blog post is already time-consuming. Adding machine-readable markup feels like an extra, unpaid chore, especially when there’s no immediate payoff. Until a critical mass of software actively reads that data, creators see little incentive to invest the time. This creates a chicken-and-egg problem: without marked-up content, computers can’t deliver smart features; without smart features, creators don’t bother marking up content. As a result, semantic markup remains rare in the “wild” web, more than two decades after Berners-Lee’s call to action. The real hurdle isn’t technology. It is the human motivation to add markup consistently.

How does the Block Protocol aim to make structured data effortless?

The Block Protocol (BP) tackles this motivation problem head-on. Its core belief is that people will only add semantic markup if doing so is as easy as writing a paragraph. Instead of requiring you to learn JSON-LD or schema.org, BP lets you insert pre-built “blocks” that are both human-readable and machine-readable from the start. For example, a “Book Block” automatically includes fields for title, author, ISBN, and more—without you needing to write a single line of markup. When you add that block to your page, it outputs clean, structured data behind the scenes. This block-based approach removes the extra step, making semantic data a natural byproduct of creating content. By lowering the barrier, BP hopes to finally fulfill the Semantic Web’s promise—but on a practical, writer-friendly scale.
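The idea of a block that is human-readable and machine-readable at once can be illustrated with a small Python sketch. Everything here is hypothetical: the BookBlock class, its field names, and its render methods are invented for illustration and do not reflect the Block Protocol's actual specification or API. The sketch only shows the principle that the writer fills in fields once and both outputs follow from them.

```python
from dataclasses import dataclass
from html import escape
import json


@dataclass
class BookBlock:
    """Hypothetical block: the writer fills in fields and writes no markup."""
    title: str
    author: str
    isbn: str = ""  # optional; omitted from the output when empty

    def to_structured_data(self) -> dict:
        """Machine-readable view, expressed with schema.org Book terms."""
        data = {
            "@context": "https://schema.org",
            "@type": "Book",
            "name": self.title,
            "author": {"@type": "Person", "name": self.author},
        }
        if self.isbn:
            data["isbn"] = self.isbn
        return data

    def render_html(self) -> str:
        """Human-readable view of the same fields."""
        return f"<cite>{escape(self.title)}</cite> by {escape(self.author)}"

    def render(self) -> str:
        """Emit both views together, so the structured data is a byproduct."""
        payload = json.dumps(self.to_structured_data()).replace("</", "<\\/")
        return (
            f"{self.render_html()}\n"
            f'<script type="application/ld+json">{payload}</script>'
        )


block = BookBlock(title="Goodnight Moon", author="Margaret Wise Brown")
print(block.render())
```

The design choice worth noticing is that the writer never touches the structured output. It is derived from the same fields as the visible text, so the two views cannot drift apart.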

Why does human progress depend on machine-readable data?

Human progress increasingly relies on computers processing vast amounts of information quickly and accurately. When web content is only human-readable, every machine (from search engines to AI assistants) must guess what it means—leading to errors, inefficiencies, and missed opportunities. Structured data allows computers to perform tasks like automatically comparing product prices, summarizing research papers, or helping a visually impaired user navigate a site. It also powers smarter recommendation systems, personalized education, and even scientific discovery by linking facts across domains. Without widespread semantic markup, these benefits remain limited to small pockets of the web. Making structured data as easy as writing a sentence could accelerate everything from online shopping to medical research. In short, the future of automation and AI depends on a web that both humans and machines can read fluently.

What does the future hold for the Block Protocol?

The Block Protocol is still emerging, but its design reflects hard lessons from the past. By eliminating the extra effort of manual markup, it hopes to drive adoption among everyday content creators—not just developers. Early applications include blogs, news articles, educational materials, and e-commerce product pages. As more people use blocks, a virtuous cycle could emerge: search engines start recognizing BP-structured data and rewarding it with richer snippets, which in turn encourages more creators to adopt the protocol. Integration with popular publishing platforms (like WordPress or Medium) could accelerate this trend. However, challenges remain: building a diverse library of blocks, ensuring consistent quality, and competing with existing practices. If successful, the Block Protocol may finally turn Berners-Lee’s 1999 dream into a practical, everyday reality—one block at a time.