Brilliant AI Chatbots

Just over a decade ago, we could not stop talking about surveillance. Indeed, the 2013 Snowden files revealed a series of spook technologies that most people had only seen in largely mediocre Hollywood films, always with a reference to the Stasi or “communism,” of course. So-called “democracies” would never do that: such was the implicit, subliminal message most people quickly digested.

I must admit that when I visited the Stasi museum in Berlin a few years back, I could not help but marvel at the sheer ingenuity a repressive regime could deploy in the name of self-preservation. Anyone convinced that innovation is always a force for good might want to glance over their shoulders. These days, Stasi “innovations” seem almost like children’s toys next to what digital technology—and AI—can offer.

I have previously argued that the 2001 Patriot Act was, to a large extent, the primary driver of global hegemonic surveillance. The act, however, did not make a dent in the so-called “Internet Freedom” agenda the U.S. had been pursuing since the late 1990s; on the contrary, in hindsight the two look like best friends. After Snowden, the agenda took a big hit, yet it somehow managed to survive until 2018, by which point it had become a synonym for unfettered global surveillance that spared no one, including close geopolitical allies. Meanwhile, research, academic and beyond, grew exponentially, perhaps reaching its climax with the concept of surveillance capitalism.

By 2016, surveillance had lost its top spot, rapidly surpassed by the mainstreaming of “fake news.” In the blink of an eye, disinformation conquered the social media and research spaces. A tsunami of publications ensued, and an apparently cohesive disinformation framework became globally dominant. Disinformation and misinformation at work.

Something similar has recently occurred in the AI realm. In 2023, after the unexpected success of OpenAI’s large language model (LLM), hallucinations became the talk of the town. Let us not forget that “numerical” AI, which preceded LLMs, was also prone to error. A key modeling goal for the now-aging machine learning (ML) and deep learning (DL) algorithms was precisely to minimize such errors. The best optimizations could achieve 95 percent or more error-free predictions or calculations, but 100 percent was close to impossible. We were aware of such errors, however, and models usually provided error ranges. Even so, many assumed AI was infallible and disregarded that invaluable information.
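To make that concrete, here is a minimal sketch of the kind of quantified error reporting classic ML offers. The dataset, the model, and the simple normal-approximation confidence interval are illustrative choices of mine, not a reference implementation.

```python
# Minimal sketch: classic "numerical" ML reports a measurable error rate.
# Assumes scikit-learn and NumPy; dataset and model are illustrative only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
error_rate = 1.0 - accuracy

# A simple 95% confidence interval for the error rate (normal approximation),
# exactly the kind of quantified uncertainty hallucinations do not come with.
n = len(y_test)
half_width = 1.96 * np.sqrt(error_rate * (1.0 - error_rate) / n)
print(f"accuracy: {accuracy:.3f}, error rate: {error_rate:.3f} +/- {half_width:.3f}")
```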

In this context, hallucinations are a euphemism in at least two ways. First, the term portrays the new LLMs as somehow humanoid, as if they hallucinated just like we do, though, unlike us, they do not need magic mushrooms or LSD to get there. Apparently, they’re always up for a trip, or so their creators would have us believe. That certainly puts them on our level, maybe even a notch above us. Second, unlike classic ML or DL errors, hallucinations are notoriously tricky to pin down. Part of the challenge is that the output is typically text or images, not numbers, so it cannot be easily measured. In practice, LLMs can seamlessly blend accurate and inaccurate information in the same paragraph, serving it up in language as polished as any encyclopedia entry or Wikipedia page. Thematic experts might spot the mistakes, but most people will swallow the whole thing, still buying into AI’s supposed infallibility. And let’s be honest: even experts would struggle to put an exact percentage on what is wrong.

In my experience, one of the most annoying hallucinations produced by these large, energy-intensive models is incorrect attribution. As an experiment, I uploaded draft papers I had written to several LLM platforms and asked them to summarize the papers while highlighting key contributions. Although the summarization was generally accurate, I noticed that some ideas, while described correctly, were attributed to my paper even though I was actually critiquing them. Thus, while the responses did not contain fabricated information, they misattributed others’ contributions to me without any acknowledgement. I believe that, for anyone apart from the original author, detecting such misattributions would be exceptionally difficult.

Back in 2023, I asked ChatGPT about hallucination rates and got a fuzzy, short response. Depending on the task at hand, hallucinations could range from 5 to 30 percent, I was told. That was before the emergence of retrieval-augmented generation (RAG), which allegedly reduces them by at least 50 percent, and before the development of more sophisticated hallucination rate indices and indicators. Today, error rates still vary by task, ranging from 1 percent for general questions to over 30 percent for legal and non-English content. That is still too high in my book, and it does not address the misattribution feature of these beasts. I mean, would you recruit help that makes so many sophisticated mistakes? I would not, as I would have to spend a lot of time identifying them while parsing large amounts of content. Totally unmanageable.
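For readers who have not met RAG, the sketch below is a bare-bones illustration of the idea: retrieve text relevant to the question and ground the model’s prompt in it. Everything here is an assumption of mine, including the toy documents, the TF-IDF retriever, and the hypothetical llm_generate() stub standing in for a real model call.

```python
# Bare-bones sketch of retrieval-augmented generation (RAG).
# Assumes scikit-learn; llm_generate() is a hypothetical stub, not a real API.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

DOCUMENTS = [
    "Retrieval-augmented generation grounds model answers in retrieved text.",
    "The 2013 Snowden files revealed large-scale surveillance programs.",
    "High-end espresso machines with built-in grinders can top $1,500.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF cosine)."""
    vectorizer = TfidfVectorizer().fit(docs + [query])
    scores = cosine_similarity(
        vectorizer.transform([query]), vectorizer.transform(docs)
    )[0]
    return [docs[i] for i in scores.argsort()[::-1][:k]]

def llm_generate(prompt: str) -> str:
    """Hypothetical stub: a real system would send the prompt to an LLM here."""
    return f"[model answer grounded in prompt]\n{prompt}"

query = "What does retrieval-augmented generation do?"
context = "\n".join(retrieve(query, DOCUMENTS))
# Anchoring generation to retrieved context is what is said to cut hallucinations.
print(llm_generate(f"Context:\n{context}\n\nQuestion: {query}"))
```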

In my view, hallucinations are stochastic errors that stem from the model’s internal architecture (stochastic parrots, as some have called these systems). As such, they are difficult to prevent and even trickier to identify.
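A toy example may help unpack what “stochastic” means here. In the sketch below (my own simplification; real models are vastly larger), each output token is sampled from a probability distribution, so the same input can produce different, and occasionally wrong, answers.

```python
# Toy illustration of stochastic token sampling: identical inputs can yield
# different, sometimes wrong, outputs. Vocabulary and scores are made up.
import numpy as np

rng = np.random.default_rng(seed=42)
# Hypothetical next-token candidates for "The capital of France is ..."
vocab = ["Paris", "Lyon", "London", "Berlin"]
logits = np.array([3.0, 1.0, 0.5, 0.2])  # made-up model scores

def sample_token(temperature: float) -> str:
    """Softmax over temperature-scaled scores, then sample one token."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

# Higher temperature flattens the distribution, making wrong tokens likelier.
for temperature in (0.2, 1.0, 2.0):
    print(temperature, [sample_token(temperature) for _ in range(5)])
```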

In any event, hallucinations have now been surpassed by Agentic AI (AgAI) as the talk of the town. Here, the same cycle repeats itself, this time with different characters. AI agents now dominate media coverage and academic research, while AgAI hallucinations are not attracting many headlines yet. That is bound to change soon. Indeed, AI agents’ errors can have catastrophic consequences, such as deleting critical files, making erroneous financial transactions, or sharing private content without our knowledge.

Occasionally, though, agentic hallucinations can unwittingly work in the consumer’s favor. I learned that firsthand.

For some time, we’d been eyeing a high-end espresso machine for our home. These dream machines feature built-in grinders and can whip up coffee, cappuccino, and lattes too—no filters, no stale coffee to fuss over. Of course, they don’t come cheap. Some models top $1,500, though you can snag the essentials for half of that if you shop smart. The one we had our sights on could even handle pre-ground coffee.

Last fall, New York State rolled out those much-discussed inflation relief rebate checks for 2023 taxpayers. We qualified, and our check arrived in mid-October. Naturally, we waited for one of those ubiquitous cyber sale days to see if our dream machine made the cut. It did! The (coffee) relief check covered almost 70 percent of the cost. We took the plunge on Amazon. Two days later, a giant box landed at our door, bigger and heavier than expected. Even before opening it, second thoughts crept in. Regretfully, we decided to send it back.

I kicked off the Amazon return process and answered the usual questions. To my surprise, Amazon told me the item could not be returned and suggested I contact the manufacturer instead. Say what? Our dream machine had morphed into a consumer’s nightmare. Worried, I reached out to Amazon’s online customer service, explained the issue, and was promptly asked by the chatbot if I wanted a refund. “Of course,” I thought, as I typed “yes” on the screen.

“This has to be a mistake,” I told my wife with a big grin. Then the wait started. My credit card had to be refunded for the process to be completed, which usually takes a few days. Four days later, the refund was in! I just could not believe it. “I love AI,” I said to my wife, who was just as surprised, if not more so.

Of course, once the euphoria faded, a fresh sense of moral hazard crept in. Keeping an expensive machine for free did not sit entirely right. I started weighing options: donate the machine, sell it and give the proceeds to a civil society group in the Global South, or hand it off to a local organization doing social justice work. I was still undecided on how best to proceed, while the box was losing its patience.

Three weeks later, an email from Amazon arrived unexpectedly: they were fine with me keeping the machine but did not say a word about the chatbot’s “hallucination.” To this day, I still don’t understand why they did not just ask me to return the product.

Eventually, we gave in, unpacked the dream machine, and put it to the test. It’s still there, churning out great coffee every day, with the old filter machine now enjoying early retirement.

Don’t you love all these brilliant chatbots? Can’t say I don’t!

Raul