AI, popularization, and rigor

Following our post on the popularization of AI, your responses were as rich as they were fascinating. Unsurprisingly, most of you expect greater rigor from those who aim to popularize, and many of you attested that rigor and popularization are not incompatible. Here’s a summary of the points and statements that sparked the most interest.

1. “AI is not solely based on mathematics”

The implications of AI are multidisciplinary: ethical, legal, economic, ecological, sociological, etc. But its foundations are indeed mathematical. For example, a large language model (LLM) predicts tokens one at a time, drawing each from a probability distribution computed from learned weights via the self-attention mechanism. Everything is computed, hence mathematical.
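The step from weights to a probability distribution can be sketched in a few lines. This is a toy illustration: the four-word vocabulary and the hand-picked logits are invented for the example (in a real LLM, the logits come out of the network), but the softmax and the greedy pick of the next token work exactly like this.

```python
import math

# Hypothetical 4-token vocabulary with hand-picked logits
# (in a real LLM, logits are produced by learned weights and self-attention).
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 1.0, 0.5, 0.1]

# Softmax turns raw logits into a probability distribution over the vocabulary.
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Greedy decoding: pick the most probable next token.
next_token = vocab[probs.index(max(probs))]
print(next_token)  # "the"
```

The probabilities sum to 1 by construction; every "choice" the model appears to make is just arithmetic over this distribution.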

2. “Current AI is stochastic”

In the comments of the previous post, we mentioned the stochastic illusion. The seemingly random effect of AI responses is actually parameterized by data scientists through controlled sampling. Concretely, a token is drawn from a probability distribution calculated at each step, usually modified by parameters like temperature. For more details and empirical proof of the deterministic nature of the system, we invite you to read the comments on the initial post.
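The point about controlled sampling can be made concrete with a small sketch. The vocabulary, logits, and seed value below are invented for illustration; the mechanism (temperature-scaled softmax, then a seeded draw) is the standard one.

```python
import math
import random

def sample_token(vocab, logits, temperature=1.0, seed=None):
    """Draw one token from the temperature-scaled softmax distribution."""
    scaled = [z / temperature for z in logits]
    exps = [math.exp(z) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    rng = random.Random(seed)  # fixed seed => reproducible draw
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["yes", "no", "maybe"]
logits = [1.5, 1.0, 0.2]

# Same seed, same parameters -> same token every time: the apparent
# randomness is controlled sampling, not true nondeterminism.
a = sample_token(vocab, logits, temperature=0.8, seed=42)
b = sample_token(vocab, logits, temperature=0.8, seed=42)
print(a == b)  # True
```

Lowering the temperature concentrates probability mass on the top tokens; with a fixed seed, the entire pipeline is reproducible, which is the "deterministic nature" referred to above.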

3. What do we think of RAG?

We’ve been asked several times for our opinion on RAG (Retrieval-Augmented Generation). We’ve written several articles and tutorials on the subject. In summary:

  • It is mathematically impossible, with today’s methods, to scale a vector database in a RAG pipeline for retrieval tasks that require domain-specific reasoning.
  • Reranking is often done with a lightweight model that has no domain-specific reasoning, so the ranking remains superficial.
  • As the database grows, chunks overlap in the latent space, increasing noise and decreasing precision during retrieval.
  • Retrieval techniques (TF-IDF, BM25, cosine similarity) do not understand domain-specific reasoning.

Example: “contract” and “document” are often close vectorially, but “document” is too generic to enrich a query about a specific “contract” (e.g., a termination clause).
For more explanations and solutions, resources are available in our articles and tutorials.
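The "contract" vs. "document" example can be reproduced with cosine similarity, the metric named above. The 3-dimensional embeddings below are invented for illustration (real embeddings have hundreds of dimensions), but they capture the failure mode: a generic term can sit closer to the query than the specific term that actually answers it.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, hand-picked for the example.
contract = [0.9, 0.8, 0.1]            # the query term
document = [0.85, 0.75, 0.05]         # generic, but vectorially very close
termination_clause = [0.6, 0.9, 0.8]  # specific, but further away

# The generic "document" outranks the specific "termination clause",
# even though only the latter can enrich the query.
print(cosine(contract, document))            # ~0.999
print(cosine(contract, termination_clause))  # ~0.824
```

This is the noise problem from the list above in miniature: similarity in the latent space does not encode which neighbor is actually informative for the query.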

Is the iceberg of AI understanding saved?

Not yet… But the enthusiasm for rigorous and fair popularization is very real! ❤️

A tip: If you have doubts about a post, send it to an AI and ask whether each sentence or statement is accurate and 100% precise. (Spoiler: It’s already been done for us.)

Here’s what we wrote about RAG. We’ve written 4 tutorials (starting from step 5).