Henri Tremblay at JavaOne 2026

Henri Tremblay at JavaOne 2026 | Duke’s Corner Java Podcast | May 18, 2026

Here’s the second interview I did at JavaOne in March with Henri Tremblay. Henri is a Java Champion, Montreal JUG leader, and EasyMock lead developer from Canada.

Henri’s session at JavaOne covered the Java Memory Model, which is a topic he believes every Java developer should understand well. He’s been to six JavaOnes and had warm words for the conference, which represents a rare opportunity to meet the people whose code runs on systems and devices all over the world.

He has clear advice for developers: read books, understand how and why your code works, and get out there and join the community.

We also talked about why Java still powers so much of the world’s critical infrastructure, from banks to the Mars rover. Henri pointed out that companies often start in C++ and then move to Java because Java runs nearly as fast once it’s going and is far easier to change later.

On AI, Henri had a balanced view. He uses it for tedious work, like sifting through a gigabyte of logs to find a single error. But he was also clear about the risks. “We should not get lazy at reviewing code because AI will generate tons and tons of code. It’s not bad at reviewing it, but still it makes mistakes.” He warned that AI reflects the average of what’s on GitHub, and most code on GitHub isn’t great. Your role, he said, is to find a better answer.

For students and junior developers, he says they should also leverage AI for learning, but he advises that they internalize the fundamentals of software engineering deeply. “Read books, please, please!” He pointed to Core Java, the book he originally learned from and is now helping revise. Blogs and YouTube videos only touch on surface-level issues. Books take you deep and that’s the knowledge you need to grow your career.

Henri Tremblay on LinkedIn: https://www.linkedin.com/in/henritremblay/
Jim Grisanzio on LinkedIn: https://www.linkedin.com/in/jimgris/

Bill Joy’s Future

Since there’s a lot of AI slop and doom chatter all over the news, podcasts, and social media these days, I figured I’d revisit Bill Joy in April 2000 for some context. I remember that Joy’s massive article “Why the Future Doesn’t Need Us” hit pretty hard twenty-six years ago not only because of the content but also because of who wrote it. Joy was a cofounder of Sun Microsystems, he helped build the internet, and here he was warning that three technologies might eventually lead to human extinction. The technologies he cited that could end us all include robotics, genetic engineering, and nanotechnology. And his core argument was actually pretty simple. These are not like past technologies. These new things can potentially replicate themselves, and that’s the bit that would change everything if they were used as weapons or simply were let loose by accident.

I first read Joy’s piece at a coffee shop in Cupertino, California across the street from Sun where I worked in software systems marketing. The article was widely read at Sun and also across Silicon Valley and resulted in discussions about Joy and his analysis for months. Many people just called him crazy. But that only demonstrates to me that those who made such flippant statements never read his article. Others knew better, though. They knew full well that Joy was documenting in detail the very real risks of rapidly developing technology without considering the consequences. That is, of course, the standard and pervasive culture of Silicon Valley. The valley may be big, but it is remarkably insular. I didn’t know Joy at the time I read his article, but I went on to meet him several times and worked closely with his teams promoting projects like SPARC, Solaris, Java, Jini, and later on JXTA. I didn’t really know him well, but he was always friendly and professional to me. He was quiet, too, and I always found him a serious thinker who obviously knew far more than he ever expressed. Sun was filled with such characters. They all fascinated me to no end.

The timing of the Joy article is interesting. He said he had been working on the essay since his 1998 conversation with Ray Kurzweil and continued to revise drafts through 1999. When Wired published the piece in April 2000, the tech world was at its peak. The NASDAQ hit a high of 5,048 in March of 2000 just a few weeks before the article dropped. And at that time Sun’s stock reached $250 a share, which gave the company a market cap around $200 billion. That was a significant achievement for 2000. Sun was one of the hottest companies in the valley back then, and it was quite a wild experience working there. The place was buzzing with activity. I loved it. Many of us did. So, it was into that environment of overt tech optimism running at manic levels that Joy published his thoughts about our potentially perilous future.

Andy Bechtolsheim, Vinod Khosla, Scott McNealy, Bill Joy at the Sun Reunion in Silicon Valley in October 2019. Photo by Jim Grisanzio.

Boom!

Then everything blew up. The bubble burst. By mid-April 2000 the NASDAQ suffered its worst week in history, falling more than 25 percent. Companies started dumping workers like I’ve never seen before. Joy’s dark warnings about unchecked technology landed precisely as that optimism crashed. He obviously couldn’t have seen the future, but his timing was remarkable.

Joy’s warnings took on an even darker tone the following year. After publishing the Wired article, he signed a book contract to expand on the article. He moved into a hotel room in New York City and surrounded himself with gloomy books on plagues and nuclear bombs and other such material he was studying on the future risks of technology. Then on September 11, 2001 came the terrorist attacks we have come to know as 9/11. I knew a few people from Sun who worked at the corporate building in New York City, but I did not know Joy was there at the time. He said he stood in the streets with everyone else and watched the impossible happen in real life. The next morning he walked out into the city streets past a long line of sanitation trucks parked on Houston Street ready to haul away the rubble. Everything below 14th Street was closed, he said. “It was quite a compelling experience,” he said in a TED Talk, “but not really, I suppose, a surprise to someone who had his room full of the books I was reading. I was not surprised that it happened at all.”

Joy eventually abandoned the book project. I point this out just as an aside since the event occurred shortly after he published the article I am writing about here in this post. Still, it does reflect the feeling of the times. How much had changed in Silicon Valley and the United States in just one year.

Anyway, back to the article.

Who Is Bill Joy?

Joy wasn’t a fringe thinker or outsider. He was born in 1954 in Farmington Hills, Michigan, and he was a child prodigy who started school early. One time his father took him to the local elementary school principal’s office when he was only three years old. Joy then promptly sat on the principal’s lap and read him a story. He later excelled in math and graduated high school at 16. He loved books and thinking and that became his escape. He also loved science fiction. He devoured Heinlein’s “Have Spacesuit Will Travel” and Asimov’s “I, Robot” with its Three Laws of Robotics. He wanted to be a ham radio operator, since ham operators were the Internet hackers of their day, but he couldn’t afford the equipment. On TV Star Trek inspired his imagination every Thursday night when his parents went bowling. Gene Roddenberry’s “The Prime Directive” clearly resonated with Joy. You can see that ethic woven into his writing thereafter.

At Berkeley in the 1970s, he created the vi text editor, which, to his surprise, was still widely used more than twenty years later. Some hardcore developers still use it even now. He also developed the Berkeley version of the Unix operating system. When the other founders of Sun Microsystems (Andy Bechtolsheim, Vinod Khosla, and Scott McNealy) invited him to join them, he participated in the creation of advanced microprocessor technologies and Internet technologies such as Java and Jini. As codesigner of three microprocessor architectures — SPARC, picoJava, and MAJC — he helped drive innovations that shaped modern computing.

By the time he wrote his famous Wired essay, Joy was only 45 years old and at the peak of his influence among developers in Silicon Valley. But Joy was far more than just a coder. He was well connected to the broader scientific community. That is what made the article so jarring. He was not an uninformed critic glancing in from outside with yet another opinion. He was a core architect of the digital age expressing deep doubts about where his own work was leading. His self-reflection was pervasive during this time in his writings and conference presentations.

The Kurzweil Meeting

Joy’s concern seemed to begin at George Gilder’s Telecosm conference in 1998 when he met Ray Kurzweil, who was an inventor and futurist. Kurzweil talked about how the rate of technological improvement was accelerating and also how humans may merge with robots or download their consciousnesses to achieve near immortality. I remember attending several talks on this topic of immortality when I moved to Silicon Valley. It always sounded so silly to me. I wondered how such smart people could take that stuff so seriously. But even now some people in these circles talk about downloading themselves. It still sounds silly. The other bits, though, about intelligent robots and genetic engineering were much more reasonable given my own experience working in the biotech industry. Where to draw the line, however, sometimes really is not clear. Joy had heard such talk before and always felt sentient robots were science fiction. But hearing it from someone he respected changed things. Kurzweil gave him a preprint of “The Age of Spiritual Machines,” which outlined a utopian future where humans gained near immortality by becoming one with robotic technology. I don’t know how far Joy goes with respect to robotic sentience, but it’s clearly more than I’m willing to accept.

Nevertheless, Joy’s unease intensified after reading the book. He felt sure Kurzweil was understating the dangers. Then he found a passage in the book describing a dystopian future where machines become so capable that humans depend on them completely. The passage argued that we would not consciously hand over control to the bots. Instead, “the human race might easily permit itself to drift into a position of such dependence on the machines that it would have no practical choice but to accept all of the machines’ decisions.” In other words, we would become dependent gradually. I can surely see that as a potential reality, no question about it.

But that passage came from Ted Kaczynski, the Unabomber! Joy admits this realization was uncomfortable to say the very least since he was taking a point from a terrorist seriously. Many people have said the same thing after reading Kaczynski’s words. Kaczynski’s bombs had killed three people and wounded many others. One bomb gravely injured David Gelernter, one of Joy’s colleagues and friends. But Joy felt compelled to confront the argument because, however uncomfortable, he saw merit in that single passage about unintended consequences.

The Self-Replication Problem

Joy’s central concern circles around one key difference between powerful 21st-century technologies and those of the 20th century. Nuclear weapons required huge facilities and rare materials. But genetic engineering, nanotechnology, and robotics, what Joy called GNR technologies, require less infrastructure and can potentially make copies of themselves. A bomb explodes once, but a self-replicating machine does not stop. And that could be a serious problem if something goes wrong.

This matters because knowledge spreads freely. You cannot control ideas the way you may be able to control uranium. Once people know how to genetically engineer bacteria or design tiny self-replicating machines, that knowledge exists in the world and will move rapidly. A small group, even one person, could potentially cause massive harm. Joy calls this “knowledge-enabled mass destruction.” As he wrote, “I think it is no exaggeration to say we are on the cusp of the further perfection of extreme evil, an evil whose possibility spreads well beyond that which weapons of mass destruction bequeathed to the nation-states, on to a surprising and terrible empowerment of extreme individuals.”

He made the point even sharper later. The real danger, he said, is no longer nation-states but individuals or small groups now empowered with “pandemic power.” These new digital, self-replicating technologies give extreme individuals the kind of destructive capability once reserved only for governments. That shift changes everything because the ramifications of mistakes or ill intent can’t be calculated or controlled.

Joy learned about complex systems and non-linear systems from physicists Stephen Wolfram and Brosl Hasslacher in the early 1980s. These are systems where small changes move in unpredictable ways and where feedback loops create unexpected outcomes. Later, conversations with Danny Hillis, biologist Stuart Kauffman, and Nobel laureate Murray Gell-Mann deepened his understanding. Hasslacher and Mark Reed also gave him insight into molecular electronics, which is the manipulation of matter at the atomic and molecular level where individual atoms replace transistors. When you get to this point in Joy’s article you can’t help but realize that he’s well beyond just a smart software developer who happened to strike it rich by helping found a successful tech company.

Joy knew that the merger of computers and physical sciences was creating enormous power, a thought that others had not yet expressed in the popular tech media. By 2030, Joy calculated, we would likely build machines a million times as powerful as personal computers of 2000. That is enough computing power to make the scenarios that worried him technically possible. As he wrote, “But now, with the prospect of human-level computing power in about 30 years, a new idea suggests itself: that I may be working to create tools which will enable the construction of the technology that may replace our species. How do I feel about this? Very uncomfortable.” He admitted he had once been too optimistic about nanotechnology. Having struggled his entire career to build reliable software systems, he thought it more than likely that this future would not work out as well as some people might imagine.
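The million-fold figure is consistent with simple compounding. Here is a rough sketch of that arithmetic, assuming computing power doubles every 18 months. That doubling period is a common reading of Moore’s law and is my assumption, not a number taken from this post, and the function name is only illustrative.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Return how many times capability multiplies over `years`,
    assuming it doubles once every `doubling_period_years`.
    The 1.5-year default is an assumption, not a figure from the essay."""
    doublings = years / doubling_period_years
    return 2.0 ** doublings


# Thirty years at one doubling every 18 months is twenty doublings.
factor_2000_to_2030 = growth_factor(30)
print(f"{factor_2000_to_2030:,.0f}x")  # prints "1,048,576x"
```

Twenty doublings is a factor of about one million. A slower doubling period shrinks the result quickly: at two years per doubling, the same thirty years gives fifteen doublings, or roughly 33,000 times.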

Three Scenarios

Joy walks through what could actually happen with these technologies. What’s interesting is that evolution itself would be one of the driving forces moving these technologies from a positive outcome to something very negative.

One — Robots might simply out-compete us for resources the way better-adapted species have always displaced others. We would not need a robot uprising. Hans Moravec argued that “in a completely free marketplace, superior robots would surely affect humans as North American placentals affected South American marsupials.” Economic forces alone could push us aside. Also, the dream of robotics includes downloading our consciousnesses into machines. But Joy questions whether a downloaded consciousness would be human in any meaningful sense. The robots would not be our children, and on that path our humanity might be lost entirely. This is one of the reasons why I take Joy seriously. He sees the obvious problem with the downloading issue in a way that others in this field simply do not.

Two — Genetic engineering gives us power to create devastating plagues, either by accident or intention. Joy calls this the “White Plague” scenario, which references a Frank Herbert novel where a molecular biologist weaponizes his knowledge. We now know these profound changes in biological sciences are imminent and will challenge all our notions of what life is, and Joy points out that the public remains skeptical even by the standards of 2000.

Three — Nanotechnology could produce “gray goo,” a mass of self-replicating nanobots that consumes the biosphere. Eric Drexler warned that “tough omnivorous ‘bacteria’ could out-compete real bacteria. They could spread like blowing pollen, replicate swiftly, and reduce the biosphere to dust in a matter of days.” Joy notes grimly, “Gray goo would surely be a depressing ending to our human adventure on Earth, far worse than mere fire or ice, and one that could stem from a simple laboratory accident. Oops.”

The Manhattan Project Parallel

Joy uses the atomic bomb as his template. After Hiroshima and Nagasaki, physicists were shocked by what they created. Oppenheimer later said the physicists “have known sin.” There was a real opportunity through the Acheson-Lilienthal report and Baruch Plan to prevent a nuclear arms race by internationalizing nuclear power. But it failed because political distrust and competitive pressure got in the way. Within years, the Soviets had the bomb, and the arms race was on.

Freeman Dyson captured the moment: “The glitter of nuclear weapons. It is irresistible if you come to them as a scientist. To feel it is there in your hands, to release this energy that fuels the stars, to let it do your bidding. To perform these miracles, to lift a million tons of rock into the sky. It is something that gives people an illusion of illimitable power, and it is, in some ways, responsible for all our troubles, this, what you might call technical arrogance, that overcomes people when they see what they can do with their minds.”

Up to that point, everyone feared nuclear bombs as the ultimate expression of madness. But Joy fears we are repeating this pattern with even more dangerous technologies, and the commercial incentives for their production are enormous. Nations compete. Corporations compete. Researchers want breakthrough innovations. The momentum builds, and pretty soon it is almost impossible to stop. We are being propelled into the new century with no plan, no control, no brakes. However, the driver is not military necessity this time but instead private sector economic gain and competitive pressure. This is human nature, though. Our history is literally filled with these processes. It’s who we are.

The Relinquishment Argument

This is the part that bothered many people. Joy argues for “relinquishment,” or the voluntary decision not to pursue certain lines of knowledge or technology because they are too dangerous. This goes against everything we believe about the value of knowledge and open inquiry, especially within Silicon Valley and across the scientific community generally.

That may be why Joy’s article struck such a nerve. The general population is used to various power centers attempting to curtail their freedoms for whatever reason of the day. But regular people are largely powerless to do much about it beyond voting, which is time consuming, or protesting, which brings its own personal risks. The scientific and technological elite, however, is something different. They are one of the power centers themselves, and here was one of their own, a high profile one at that, advocating for relinquishment.

But Joy asks, what if unlimited pursuit of knowledge puts us all in mortal danger? He points out that we have done this before. At a 1989 nanotechnology conference, Joy said, “We cannot simply do our science and not worry about these ethical issues.” The United States unilaterally abandoned biological weapons development because the logic was clear. These weapons were easy to replicate and could easily end up in the wrong hands. We would be more secure if nobody developed them, and we embodied this in the 1972 Biological Weapons Convention and 1993 Chemical Weapons Convention.

Joy quotes Thoreau: “We do not ride on the railroad. It rides upon us.” Then he asks directly, “The question is, indeed, Which is to be master? Will we survive our technologies?”

This does not mean stopping all research. It means being strategic about which lines of inquiry we pursue and which we intentionally avoid. It means international agreements, verification systems, and scientists adopting ethical codes like the Hippocratic oath. It requires transparency and cooperation.

Personal Responsibility

Joy writes honestly about his own sense of responsibility: “I feel, too, a deepened sense of personal responsibility, not for the work I have already done, but for the work that I might yet do, at the confluence of the sciences.” He was not speaking as an outside observer but as someone who helped create the technologies that might enable the dangers he feared.

He finds hope in the Dalai Lama’s “Ethics for the New Millennium,” which argues that the most important thing is to conduct our lives with love and compassion for others, and that our societies need a stronger notion of universal responsibility. Neither material progress nor the pursuit of knowledge is the key to happiness. We need to find alternative outlets for our creative forces beyond the culture of perpetual economic growth.

In the TED Talk he gave six years after publishing his article, Joy made the point even clearer. The solution, he said, cannot be technology alone. We need both better public policy and deeper moral progress. He spoke of the need for “the head and the heart,” echoing Russell and Einstein. He argued that scientists, technologists, and businessmen must be held personally accountable under the law for the consequences of their inventions. Today they face no such responsibility. That, he believed, has to change. What’s striking hearing him articulate these perfectly reasonable points is that they aren’t taken seriously at all. Many thoughtful people say the same things at conferences or in political speeches. We all clap and support the concepts. Yet, very little actually changes. Or at least the changes take so long and occur in such small steps that we’re left unsatisfied in the present moment.

Was Joy Right?

Joy’s article was unique for its time in the popular press. When it appeared on Wired’s cover in April 2000, it created quite the rumble in tech circles. Wired had been a cheerleader for the digital age for nearly a decade. Its shift from cheering to warning marked an important and surprising moment in the digital zeitgeist. Also, Bill Joy was not some alarmist outsider. He was one of the architects. His warning came from inside the cathedral and it certainly resonated. At the time, I was working in marketing at Sun, and one of my jobs was to promote Sun’s technologies to the media. In press interviews during this period reporters would always bring up Joy’s article even if the meeting was booked on another issue. We had to draft briefing and messaging documents to prepare the executives, managers, and engineers that we brought into all interviews because we knew Joy’s article would always come up in the discussion.

The article presaged much of what we are experiencing now but not necessarily in the ways Joy anticipated. His specific predictions about nanotechnology have not materialized. The gray goo scenario is now considered flawed and implausible. Most scientists believe built-in limitations make runaway nanotechnology highly improbable.

But Joy’s underlying concerns were prescient. He worried about knowledge-enabled destruction, powerful technologies becoming widely available, and complex systems we do not fully understand. All of these have become more relevant as artificial intelligence has advanced far faster than most people expected in 2000. Interestingly, Joy only explicitly mentioned artificial intelligence once in his article, possibly because he was writing at the tail end of the second “AI winter.” Yet his concerns about self-replicating technologies and systems beyond human control have found new relevance with modern AI.

The Sun Sets

After leaving Sun in 2003, Joy moved into venture capital and focused on green energy investments at Kleiner Perkins until 2014. Now 71, he works as principal investigator and chief scientist at Water Street Capital. But despite the renewed relevance of his warnings with the rise of AI, Joy has remained largely silent regarding the public debate he started 26 years ago. It is almost as if he said what he wanted to say in 2000 and we’re still digesting his thoughts all these years later.

He didn’t, however, stay stuck in doom or alarm. In the years after the Wired article he actively tried to move toward better outcomes. He joined Kleiner Perkins specifically to invest in solutions. He backed innovations in education, new materials for the environment, and a major $200 million biodefense fund aimed at closing the gaps that could lead to a pandemic. He came to believe we cannot solve the management of dangerous technology with more technology alone. Instead we need better policy, markets that price in the true cost of catastrophe, and a deeper moral shift.

Joy put it simply in that later talk at TED. “We can’t pick the future, but we can steer the future.” Over the years technologies have changed, but the fundamental challenge he identified in 2000 remains relevant. Figuring out how to pursue knowledge and innovation while maintaining enough wisdom and caution to survive the unintended consequences seems to be a question others should carefully consider today.

In his article, Joy compared coding to Michelangelo releasing statues from marble. He described his software engineering in a similar way with those ecstatic moments when the code emerged from his imagination as if it were already waiting in the machine to be freed. He ended his essay with that same image. After eighteen pages exploring multiple scientific disciplines and warning about existential dangers from the exploitation of technology, he wrote, “I am up late again, it is almost 6 am. I am trying to imagine some better answers, to break the spell and free them from the stone.”

Twenty-six years later, we are still up late too. We’re still searching for those answers, as well. We’re still trying to release better possibilities from the block of marble that is our technological future. And now we go forward in the wild world of AI.

Good luck to us all.

The AI Doom Vibe Change

For a couple of years now, the single story pounding our heads about AI all day every day has been exclusively about looming disaster. AI takes the jobs, then it takes everything else, then a few people get rich. A love story. It was an intentional positioning of the technology, obviously, but the question remains why. Well, Cal Newport has a good hypothesis that I wrote about the other day. But others are noticing as well. So, the phenomenon of the old pitch evolving into something new is probably real.

In a recent episode of The AI Daily Brief, host Nathaniel Whittemore says the previous extremist narrative may finally be cracking, and he cites a fair number of sources to substantiate his claim. It’s a different take from Cal Newport’s but there is some overlap. The signals are faint, he says, but they’re showing up in two key places at once, which may mean the shift has some legs. I think Whittemore may be pulling his punches a bit by saying “the signals are faint” just to cover himself since this shift has been so recent, really only in the last few weeks. I think the shift is clearly underway. Remember, the IPOs are coming soon, baby! The companies and their simps can’t continue with the doom rhetoric. The American public has rejected that strategy. And it’s interesting that some public opinion polls in China lack this doom positioning. Anyway, back to Whittemore’s daily brief.

The first place Whittemore notices the vibe shift is within the never-ending chattering class in the media. He points to Ezra Klein’s recent New York Times article, “Why the AI Job Apocalypse (Probably) Won’t Happen.” I like the “probably” bit. But coming from a big voice on the political left and also one that’s outside the AI bubble, Klein may carry some weight in Whittemore’s eyes that a similar post from others, say, Marc Andreessen, simply wouldn’t. Klein cites economist Alex Imas from the University of Chicago and also a wider body of economic research to make a case rooted in Jevons Paradox. When something gets cheaper, we tend to use more of it, not less. So although computers may have changed or even eliminated specific tasks, the cost savings created enough new demand that the occupations expanded overall. As Klein puts it, “Every enthusiastic AI adopter I know is working harder than ever because there is more they can do.”

Whittemore points to more data that’s emerging. Software engineering, which is the job category most exposed to AI, is the one where postings have actually increased recently. Citadel Securities cites the increase at 18 percent since May of last year. Federal Reserve numbers also show software engineering jobs at their highest level since November 2023, although the current number is still well under the previous mark three years ago. Also, Stripe Atlas just hit 100,000 incorporations, with Q1 up 130 percent year over year. As Derek Thompson says, “AI agents are better at creating firms than destroying jobs.” A new trend?

The second place the shift is showing up for Whittemore is in markets themselves. Anthropic’s revenue, according to SemiAnalysis, has gone from 9 billion to more than 44 billion this year, which is roughly doubling every six weeks. Atlassian’s stock jumped about 30 percent recently after strong earnings with customers using its new Rovo AI tool growing their own ARR at twice the rate of those who weren’t. The skeptics have been questioning how you justify trillions in infrastructure when seats only sell for 20 dollars a month. Well, that’s being answered by the move from seats to tokens taking place recently in the intelligent agent era. A single engineer with Claude Code might burn through hundreds or thousands of dollars in tokens each month, and the companies selling those tokens cannot keep up with demand.
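The six-week doubling figure above depends on how much time the growth from 9 billion to 44 billion covers, and that window is not stated here. As a quick sanity check, the sketch below counts the doublings in that growth and converts between an elapsed window and a doubling pace. The function names are illustrative and the windows are assumptions for the reader to fill in.

```python
import math


def doublings(start: float, end: float) -> float:
    """Number of times `start` must double to reach `end`."""
    return math.log2(end / start)


def implied_doubling_weeks(start: float, end: float, elapsed_weeks: float) -> float:
    """Doubling period in weeks implied by growing from `start` to `end`
    over `elapsed_weeks` (the elapsed window is an assumption you supply)."""
    return elapsed_weeks / doublings(start, end)


n = doublings(9, 44)  # a little over two doublings
print(f"doublings: {n:.2f}")
print(f"weeks needed at a six-week doubling pace: {n * 6:.1f}")
```

Growing from 9 to 44 is a little over two doublings, so a six-week pace corresponds to roughly fourteen weeks of growth. If the same growth took longer, the implied doubling time stretches in proportion, which `implied_doubling_weeks` makes easy to check for any window.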

There’s another piece of the vibe shift worth noting, one which I found most interesting since I’ve worked in both industries. The Associated Press recently reported on construction companies teaming up with big tech to push back on community opposition to data centers. Rob Bear of the Pennsylvania Building and Construction Trades Council told the AP that communities should figure out what they actually want from these projects rather than just saying no. “If you don’t ask, you’re never going to get,” he said, pointing to things like better project plans or money for local schools and infrastructure. Whittemore’s take is sharper. He calls it “an insane indictment of how poorly tech companies have run these projects that the issue has gotten this bad” given how many ways there are to make data centers genuinely valuable to nearby communities at a fraction of the total cost. He’s spot on. The AI companies deserve the public backlash. We’ll see how they adapt to the very real world they are now entering.

Even the AI labs are softening their messages. Sam Altman recently wrote that “jobs doomerism is likely long-term wrong” and that OpenAI wants “to build tools to augment and elevate people, not entities to replace them.” Whittemore says this is a meaningful pivot from a company whose stated goal used to look a lot more like replacement.

But Whittemore is careful not to declare victory too fast. The AI transition will still be painful for specific workers and communities, and history shows we generally don’t help them much at all when economies move through technological advancements. But he ends on a hopeful note.

“I find it extremely encouraging to feel the collective foot being taken off the gas of the AI doomerism for just a moment. If nothing else, it creates an opportunity to have a different type of conversation. One that’s neither doom nor utopia, but about how to adapt to and maximize the opportunity of the change that’s here and coming. I think the more time we spend on that conversation rather than in the extremes, the better off we’ll be.”

The AI Doom Fever Finally Fades

Is the AI Doom Fever Breaking? (It’s About Time!) — Cal Newport, AI Reality Check, Deep Questions Podcast

For many years now, executives leading the big AI companies have been telling the public that their own products will destroy the economy and gut the white-collar workforce. As Cal Newport observes, this is the rough equivalent of a Pfizer executive announcing a new pill that cures psoriasis but also turns half the population into zombies. But lately the extremist rhetoric on AI has started to soften significantly. Instead of totally replacing entire segments of the workforce, these new AI systems will now simply augment existing workers and also lead to massive new opportunities for employment. That’s quite a radical shift in attitude, especially coming from people whose breathless messaging has been so bold.

Nevertheless, tech companies are still laying off tens of thousands of employees and citing AI as the reason. So, we’ll see. Newport thinks the previous over-the-top positioning on AI resulted more from culture, whereas the recent shift in tone is likely more tactical. His analysis is comprehensive and seems pretty accurate given that he’s been pushing back on this rhetorical issue for years now.

A Strange Sales Pitch

In his podcast, Newport runs through many of the recent doom statements. Mustafa Suleyman of Microsoft AI has suggested that AI will be capable of automating most knowledge work within roughly a year. Dario Amodei of Anthropic has warned that the technology will soon replace up to half of all entry-level white-collar jobs in finance, consulting, and tech. Sam Altman of OpenAI speculated last summer about a future in which AI would “do everything,” leaving humans to find new ways to “participate” in the world.

“That’s basically what we’re getting from the AI CEOs,” Newport says. “And I think it’s just lunacy.” Newport has held this position for some time now. And it’s an opinion shared online by many advanced engineers who have been working with AI systems for a long time. However, there are many so-called social media influencers in the AI space who still push the end-of-the-world theme. It’s been an odd experience for sure. Granted, software executives have a long history of bragging in their corporate earnings calls that their systems will enable customers to cut expensive employees. It’s hard to think, though, of another industry whose leaders so cheerfully predict that their products will wreck civilization while making their founders and a few insiders rich beyond their wildest dreams. Seems like a difficult sell, eh?

The New Vibe

In late April 2026, Altman posted on X that OpenAI wants to “build tools to augment and elevate people, not entities to replace them,” and added that “jobs doomerism is likely long-term wrong.” A few days later, Nvidia CEO Jensen Huang pushed back even harder. In a Fortune interview posted May 2, he called the half-of-jobs prediction “ridiculous” and warned that becoming a CEO can leave a person with what he described as “a God complex,” speaking as if their position alone gave them the authority to predict civilization-scale outcomes. Huang also estimated that AI has already created more than half a million jobs, because companies that adopt it grow faster and hire more, and noted that demand for software engineers is actually rising.

Newport reads Huang’s comments as a slightly concealed dig at Amodei, but perhaps it was a subtle signal to the industry that things will be shifting. Either way, the tone has clearly migrated from outright apocalypse to smooth augmentation. Who knows. But at least it’s a welcome change for those directly affected by the recent massive layoffs attributed to AI systems that haven’t even been fully built and deployed yet.

Where the Doom Came From

To understand the old rhetoric, though, Newport argues that you have to look at the tech culture of San Francisco and Silicon Valley, especially among engineers, and especially content articulated in a few internet forums in recent decades. The most influential was LessWrong, founded by Eliezer Yudkowsky and devoted to refining the art of human rationality. Also, the blog Slate Star Codex, written by Scott Alexander, helped push the same themes into a wider readership. Out of this loose network grew the rationalist movement, built on the idea that if you train yourself to think like a logical engineer, you can overcome cognitive bias and act more effectively in the world. Newport, who was trained in computer science at MIT, recognizes this culture. “I’m around engineers. I am an engineer. I know this way of thinking,” he says, adding that his wife once told him, “Don’t take me to the MIT Christmas parties because you guys are all so weird.”

Two important offshoots followed. One was effective altruism, which applies something called expected-value reasoning to charitable giving and was made famous — and then infamous — by Sam Bankman-Fried when he led FTX. The other was the existential risk community (X-risk), which argued that very rare disasters with very large costs deserve serious consideration right now. Newport summarizes the X-risk crowd as focusing especially on three threats: asteroid strikes, deadly pandemics, and superintelligent AI. Nick Bostrom’s 2002 paper Existential Risks is the foundational text, but the wider X-risk literature also covers nanotechnology, nuclear war, and what Bostrom calls “totalitarian lock-in.” Newport doesn’t mention Bill Joy’s shock article “Why the Future Doesn’t Need Us” in WIRED in 2000 but it certainly fits the paradigm.

The X-risk crew organized a closed-door conference that produced the open letter signed by Stephen Hawking, Elon Musk, Bill Gates, and many of the leading AI researchers of the day. Newport places it in Puerto Rico in 2017, but the actual event was the Future of Life Institute’s “Future of AI: Opportunities and Challenges” conference in San Juan in January 2015. The 2017 follow-up was the Beneficial AI conference at Asilomar in California. Robert McMillan’s WIRED piece from January 2015, “AI Has Arrived, and That Really Worries the World’s Brightest Minds,” captured the elite anxiety that emerged from the Puerto Rico meeting and may be the article Newport has in mind when he describes the era. There were other similar pieces in the elite media during this time period as well. The elites aren’t shy with the media.

ChatGPT and the Hero Complex

Then ChatGPT arrived. For people who had spent ten years writing footnoted lists about the coming superintelligence, it felt like the moment they had been preparing for. Newport thinks this was both terrifying and intoxicating. “What if we were right about this risk,” Newport imagines them thinking, “and not only were we right, but it’s happening?” The rationalists sensed they were going to be Neo. They were going to be John Connor. They were the ones who saw it coming and would now lead everyone else through it. It may be hard for normal people to think this way, but we are talking about the tech elite, after all. They do actually live in a different world, one that’s in many ways disconnected from the normal reality of people who have to work for a living. Newport stresses that it’s important to realize that the current AI companies we see now all grew from that culture.

OpenAI, Newport says, originally presented itself as a nonprofit AI safety organization heavily shaped by X-risk concerns. It was started as almost a hobby project for the rationalist crowd before commercial ambitions reshaped what it is now. Anthropic was founded by former OpenAI staff who, according to Newport, felt their old employer was not being rigorous enough about safety. Grok came out of the same orbit. The CEOs, Newport says, were not playing 4D chess with investors. They were just talking the way everyone they knew talked in Silicon Valley. The trouble started when their companies got too big to keep speaking only to their own closed subculture. As Newport puts it, “we’re not, you know, in the Mission District anymore.”

Why It’s Breaking Now

Newport sees three potential forces that may be accelerating the recent change in AI positioning:

First, there is real IPO pressure building. As OpenAI and Anthropic move toward public markets, more sober East Coast investors (who wear suits, Newport says) are quietly asking the founders to stop terrifying the customers they hope will pay for AI products.

Second, public opinion is turning. A Quinnipiac poll from March 2026 found that 55 percent of Americans now believe AI may do more harm than good in daily life, which is up from 44 percent a year earlier, with about seven in ten people expecting fewer job opportunities in the future.

And third, journalists are running out of patience. Ezra Klein’s May 2026 New York Times column “Why the A.I. Job Apocalypse (Probably) Won’t Happen” reports that the economists he interviewed are skeptical of mass joblessness. Also, a recent Ronan Farrow piece in The New Yorker even raises the question of whether Altman is actually a strong chief executive for a trillion-dollar company.

Newport says there may be additional pressures in the market pushing AI executives to temper their rhetoric in recent months, but his analysis on the three issues above seems pretty comprehensive as a working hypothesis.

A Welcome Maturation

The Silicon Valley monoculture has finally collided with the rest of the country, Newport argues. Wall Street realism, journalistic scrutiny, and ordinary public sentiment are forcing the language to evolve to the realities of the market. Newport sounds almost relieved. Somebody, he suggests, finally had to tell these founders to “stop talking like you’re Sarah Connor from Terminator 2.” His understated parting advice still applies. Take AI seriously, but not everything you hear about it.

AI’s Perpetual Present

I’ve been reading “Why We Need Continual Learning” by Malika Aubakirova and Matt Bornstein recently. I also listened to a podcast interview with Malika on a16z. Now, I’m no AI researcher or developer. But I do like exploring the scientific foundations on which advanced software tools are built, especially since I use these applications every day and hope to leverage them more in the future. So although I don’t fully understand what’s actually happening underneath, poking around a bit is an interesting exercise. What follows below is what I’ve learned from the article. Consider it a work in progress. If you want the expert version from Malika and Matt, go read their original piece for a deep dive. This text here is just me working through things as best as I can at my level. At the end of this post, I include a list of terms and definitions. I’ll make that a standard feature in similar upcoming posts for my own short-term recall practice and also for long-term memory consolidation. Memory practice (the human kind) is a hobby of mine.

Anyway, here we go. The authors open their article on continual learning by referring back to Christopher Nolan’s “Memento,” which is a film about a man named Leonard Shelby who suffers from anterograde amnesia that prevents him from forming new memories. Every few minutes his world resets and he wakes up in the same perpetual present with no idea what just happened in the past. He tattoos notes on his body and carries Polaroids as memory aids just to function throughout the day. It turns out that he’s very resourceful because he uses whatever he can in his environment to get by. He even appears pretty capable within any given scene in the movie. But, as the authors put it, his tragedy is that “he can never compound. Every experience remains external.” So, I guess that means he can’t learn based on his present moment to prepare for the future like most of us who have normal memories.

That seems to be a good general description of where AI models are right now. Back before I knew about this issue, I actually inadvertently tripped over it when I first used ChatGPT and Grok a few years ago. It was clear from my chats at the time that the models were not “learning” from our conversations at all. I kept spinning around in circles explaining myself over and over again. And, in fact, some of those earlier models didn’t know even basic facts from current events, which was shocking since AI was sold to us as being so super smart. That’s when I realized that the “learning” for LLMs took place at some point in the past and then they were locked shut while life continued on. That experience of an AI not knowing simple bits in the news rarely happens now so the user experience has improved significantly. However, there’s a lot more to it that I didn’t realize from those first few frustrating conversations.

What’s Actually Happening When You Type Into That Text Box

Here’s what I didn’t fully understand before reading the article. When you type into a chat window and stuff happens before you get an answer, that process is not the model learning anything from your input. It’s reading what you gave it and generating a response. When the conversation ends, the model does not carry that conversation forward in its memory. The next conversation starts from exactly the same place as every other new conversation. Initially, that felt unnerving so I had to figure out ways to leverage the knowledge from the LLM without all that forgetting going on.

The text box we type into is just a door into the system. What matters is the context window behind the door, which is everything the model can see at once. So, your message, the whole conversation history, any documents you shared, and any background instructions — all of these things represent what the model is working with when it responds. And it has a size limit. When it fills up, older content gets dropped to make room for new content. So if you spend an hour explaining your company’s internal processes to an AI assistant and then start a fresh conversation the next day in a new text box, the AI has no memory of the previous conversation. You have to start over. Not because it forgot. Because it never learned in the first place.
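To make the size limit concrete, here’s a minimal sketch in Python of a context window that drops its oldest messages once a token budget is exceeded. It’s a toy under stated assumptions: the ContextWindow class and the count_tokens helper are hypothetical names invented for illustration, and counting one token per word is a crude stand-in for a real tokenizer. Real systems manage their context in more sophisticated ways.

```python
from collections import deque


def count_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class ContextWindow:
    """Hypothetical sliding window that keeps only the newest messages within a token budget."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.messages = deque()

    def total_tokens(self):
        return sum(count_tokens(message) for message in self.messages)

    def add(self, message):
        self.messages.append(message)
        # Drop the oldest messages until the total fits the budget again.
        # The newest message is always kept, even if it alone exceeds the budget.
        while self.total_tokens() > self.max_tokens and len(self.messages) > 1:
            self.messages.popleft()

    def visible(self):
        """Everything the model could 'see' when generating its next response."""
        return list(self.messages)


window = ContextWindow(max_tokens=12)
window.add("Our refund policy allows returns within thirty days")  # 8 tokens
window.add("Please summarize the policy")                          # 4 tokens, total 12
window.add("Also what about exchanges")                            # 4 more, over budget
print(window.visible())
```

After the third message, the refund policy line has fallen out of the window. Nothing was stored anywhere else, so from the model’s point of view that line was never there.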

There’s a name for this phenomenon. The article calls it in-context learning, which is really just the model making smart use of whatever sits in front of it right now. It’s temporary by design. The model reads, responds, and moves on. It’s similar to glancing at your notes before a meeting rather than actually deeply studying, internalizing, and using the material beforehand. When the meeting ends, those casual notes go back in the drawer and are forgotten.

The Frozen Model Problem

To understand why this matters, you need to know a little about what’s inside these models. During training, a model reads an insane amount of text and gradually adjusts billions of numerical values called parameters or weights. You can think of each weight as a dial on a pipe connecting two nodes in the network controlling how much signal flows through. The model trains by turning billions of those dials very slightly over and over again until it gets good at predicting language. That right there is really impressive to me given the scale of information these models are working with. But when the training process ends, all those dials get locked. That stage represents deployment. The model then goes out into the world with its knowledge frozen in place.
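The dial picture can be sketched with a single weight. The Python below is only an illustration, assuming a one-dial “model” that predicts y = weight * x and a plain gradient descent loop. The train function, the FrozenModel class, and the example data are all hypothetical, and real models adjust billions of weights with far more elaborate machinery.

```python
def train(weight, examples, learning_rate=0.05, steps=200):
    """Turn one dial slightly, over and over, to shrink the squared error on (x, y) pairs."""
    for _ in range(steps):
        for x, y in examples:
            error = weight * x - y
            gradient = 2 * error * x           # slope of the squared error with respect to the dial
            weight -= learning_rate * gradient
    return weight


class FrozenModel:
    """Deployment: the dial is locked, and answering questions never changes it."""

    def __init__(self, weight):
        self._weight = weight

    def predict(self, x):
        return self._weight * x


examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # made-up data following y = 3x
learned_weight = train(0.0, examples)

model = FrozenModel(learned_weight)
print(model.predict(10.0))
```

However many times predict is called after that, the stored weight stays exactly where training left it.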

Training works because it’s a compression process. The model can’t store everything it reads verbatim. It has to find the underlying patterns, generalize the data, and build something compact that transfers to new situations it’s never seen before. The authors describe this as lossy compression, and that lossiness is actually what produces what seems like intelligence to us when we talk to an AI. When I first read that I thought of a camera compressing a RAW file to a JPEG file. The RAW image contains all the available data but it’s a massive size and requires editing in post production to produce a beautiful image. The JPEG, however, is much smaller because it’s been compressed by the camera to just what’s needed to display a good quality image at a certain size. I’ve always understood that process in photography, but I didn’t realize that LLMs are going through a similar process.

Here’s another way to think about it. Remember when you first learned how to ride a bike? You didn’t read the entire manual every time. You just got some guidance from a friend or a parent and you practiced. You fell down a few times and adjusted your technique, and then eventually your brain distilled your experience into something automatic and compact. That’s compression. You still remember falling down, but that falling down process is no longer helpful for riding once learning has taken place. What remains is the final skill of balancing to ride. An AI model that memorizes every training sentence perfectly would be less useful, not more, because it could retrieve but never generalize. It would behave more like a simple retrieval system than a sophisticated learner.

The painful irony the authors identify is this. The very mechanism that makes these models powerful during training is exactly what we stop them from doing once they’ve been deployed. We freeze the compression at the moment of release and replace it with what’s called external memory. That clarified the argument for me. The system is layered, and each layer is essentially a workaround for the fact that the compression stopped. Understanding that made the next part of the article click.

The Filing Cabinet

To compensate for frozen models, developers have built elaborate scaffolding systems, such as chat histories, retrieval databases, system prompts, external document stores, and more. All of these things make up what the article calls external memory. They are flexible and they live outside the model’s internal, frozen weights. When you need information, the system retrieves it and feeds it into the context window. Then the model reads it and responds.

This architecture works as is and the authors are honest about that. However, they make a point I hadn’t considered before. “A bigger filing cabinet is still a filing cabinet.” Retrieval is not learning. The model is looking things up, not actually knowing them. It just does it very quickly and uses natural language so you get the impression you are talking to someone who is intelligent.
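The retrieve-then-read flow can be sketched in a few lines of Python. This is a toy under loud assumptions: the score, retrieve, and build_prompt functions are hypothetical, the relevance measure is simple word overlap, and the documents are made up. Production systems typically use learned embeddings and a vector database rather than anything this simple.

```python
def score(query, document):
    """Toy relevance measure: how many distinct query words also appear in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def retrieve(query, store, top_k=1):
    """Pull the best-matching notes out of the filing cabinet."""
    ranked = sorted(store, key=lambda document: score(query, document), reverse=True)
    return ranked[:top_k]


def build_prompt(query, store):
    """Paste the retrieved notes into the text the model will read. No weights change."""
    notes = retrieve(query, store)
    return "Notes:\n" + "\n".join(notes) + "\n\nQuestion: " + query


# Made-up external memory for illustration.
store = [
    "expense reports are due on the first friday of each month",
    "the vpn requires a hardware key issued by the it desk",
    "new hires complete security training in their first week",
]

prompt = build_prompt("when are expense reports due", store)
print(prompt)
```

The model only ever sees the finished prompt. The store can grow as large as you like, but the lookup happens outside the model every single time.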

Here’s another practical example. Say a hospital deploys an AI assistant to help with real world clinical decisions. That model was trained on medical literature through some cutoff date. A major new clinical trial or medical policy comes out afterward that changes how doctors treat a particular condition. The hospital can feed that paper into a retrieval database so the AI can surface it when it’s relevant. But the model doesn’t internalize that new research the way doctors would after reading it, applying it to patients, observing the outcomes, and revising their practice accordingly. The AI can retrieve the abstract. But it can’t reason from the new finding the way someone who has truly learned it can in practice. That’s the limitation these researchers are trying to fix.

The same problem exists in cybersecurity with threats evolving daily. A frozen model can be given descriptions of new attack patterns through retrieval, but it can’t compress and generalize from those patterns the way an analyst does who has spent months chasing a specific class of threat. The knowledge stays external. It never becomes part of what the model actually knows unless the model is updated with a new learning process, which is time consuming and very expensive.

What Real Learning Requires

So what’s the alternative? The article introduces a concept called continual learning, which is the field of research aimed at letting models actually update their weights based on new experience after deployment. Not just read notes. Actually learn live like humans do.

And here’s where the Memento metaphor really makes sense. The authors say that today’s AI is stuck in Leonard Shelby’s perpetual present. The scaffolding, the Polaroids and tattoos, and other memory aids work well enough within any given scene. But the model can never compound in real time. Every new thing it encounters stays external.

Think about the difference between a doctor who simply retrieves a recent study and a doctor who has spent years treating patients with that knowledge fully and personally internalized. Or consider the difference between someone who has your email history in front of them and someone who actually knows how you think over time. The article frames this cleanly. “The difference between ‘Here is what you responded to this email before’ versus ‘I understand how you think well enough to anticipate what you need’ is the difference between retrieval and learning.” Even in normal human memory, immediate retrieval is necessary to manage your present experience. But for learning to continue, that present experience also has to be embedded into long-term memory.

The authors bring up Fermat’s Last Theorem as one powerful example of the kind of hard discovery problem they have in mind. Mathematicians worked on the issue for 350 years. Eventually the problem was solved by Andrew Wiles. But he didn’t crack it by retrieving the right papers. He solved it by working in near total isolation for seven years, and inventing entirely new mathematical techniques to bridge two previously disconnected fields. That kind of discovery required genuine compression, generalization, and creative combination. Not simply fast retrieval. And the article asks directly whether a model that can’t compound from experience could ever do anything like that. The honest answer is they don’t know yet.

Why Updating Weights Is So Hard

At this point I had to ask myself: if real-time continual learning is so important, why can’t LLMs do it now? The short answer is that updating a model’s weights after deployment is genuinely dangerous and technically unsolved at scale.

The most obvious problem is called catastrophic forgetting. When you update a model’s weights to learn something new, it tends to overwrite what it already knew. New learning crowds out old learning. If you fine-tune a general model specifically on medical records, it might get better at clinical language while getting noticeably worse at everything else because the new training has nudged weights that were also doing other jobs. The model gets better at one thing and potentially worse at everything it was already good at. When you understand this you can really appreciate how humans have benefited from millions of years of evolution. The AI machines seem rather clunky by comparison. When humans learn, new neural connections are made in the brain that stick for a long time as new learning is layered on top. But even in humans, old learning and memory do actually fade gradually over time if a specific neural pathway isn’t continually or at least occasionally reinforced. It just takes a very long period of time. With AI systems, however, new learning can wipe out old learning immediately. The authors didn’t address this issue directly in humans, but the example seems similar if you study biology.
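Here’s a deliberately tiny Python sketch of that overwriting effect. It assumes a “model” with one shared weight and two made-up tasks, so it exaggerates the problem compared with a real network, where billions of weights are only partly shared between skills. The fine_tune and mean_squared_error functions and both task datasets are hypothetical.

```python
def mean_squared_error(weight, examples):
    """Average squared gap between the model's guess (weight * x) and the target y."""
    return sum((weight * x - y) ** 2 for x, y in examples) / len(examples)


def fine_tune(weight, examples, learning_rate=0.05, steps=200):
    """Plain gradient descent on whatever data is supplied, with no memory of earlier tasks."""
    for _ in range(steps):
        for x, y in examples:
            weight -= learning_rate * 2 * (weight * x - y) * x
    return weight


task_a = [(1.0, 3.0), (2.0, 6.0)]    # the old skill: y = 3x
task_b = [(1.0, -1.0), (2.0, -2.0)]  # the new skill: y = -x

weight = fine_tune(0.0, task_a)
loss_a_before = mean_squared_error(weight, task_a)

weight = fine_tune(weight, task_b)   # the update sees only task B data
loss_a_after = mean_squared_error(weight, task_a)
loss_b_after = mean_squared_error(weight, task_b)

print("task A error before the update:", loss_a_before)
print("task A error after the update: ", loss_a_after)
print("task B error after the update: ", loss_b_after)
```

The error on task A starts near zero and jumps after the update, even though nobody asked the model to forget anything. The same dial was doing both jobs, and the second round of training simply moved it.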

There’s also the problem of data poisoning. If a model’s weights can be updated through interactions after deployment, bad actors could gradually manipulate its behavior through carefully crafted inputs over time. Unlike a one-time attack, poisoned weights persist across every future conversation. The damage would live in the model itself, so safety alignment could degrade unpredictably, either immediately or at some point in the future. The article notes that “even narrow fine-tuning on benign data can produce broadly misaligned behavior,” which is a sobering thought to sit with. Yet we all know this would happen right away based on our own experience being online every day fighting bots and hackers.

These aren’t hypothetical concerns. They’re real problems without clean solutions yet.

Where Things Are Heading

The article maps out a spectrum of approaches to continual learning that are organized around a question I found clarifying: where does the compaction actually happen? It seems there is a stack of technologies managing the process.

On one end you have pure retrieval. No compaction. The model just reads notes. That’s most of what exists today. In the middle there are modules, which are attachable and specialized components that let a model develop some expertise in a specific domain without retraining the entire thing from scratch. A hospital might attach a medical module to a general model so it performs at a specialist level on clinical questions, while the same base model with a different module handles legal contracts. Each module is swappable independently. That’s a practical and reasonable middle ground for now.

On the far end you have full parametric learning, where the model’s weights actually update from new experience after deployment. This is the goal, but it remains largely unsolved at scale with the current technologies. But there are serious research efforts moving in this direction with things like test-time training where the model runs brief learning cycles before it generates a response. Also there are self-improvement approaches where models like AlphaEvolve have generated their own training data and genuinely improved from it, at least within constrained problem domains like mathematics.

The authors frame the path forward as layered. In-context learning stays as the first line of adaptation because it works now and keeps getting better. Modules offer some personalization and domain specialization. But for genuinely novel problems, adversarial scenarios, and knowledge too tacit to put into words, models may eventually need to compress new experience directly into their parameters after training. Otherwise, as the authors put it, we stay stuck in Memento’s perpetual present.

What I Took Away

I started reading this article as someone who uses AI tools every day without really thinking much about what’s happening underneath. What I came away with is a better sense of the gap between what these systems appear to do, what they’re actually doing, and what they’ll potentially do in the future. Right now they can respond to new information and adapt to what you give them. And most times they feel like they understand you. But the reality is that they don’t compound. They don’t learn. They don’t internalize new experience the way continual learning systems or humans would. Their dials are locked. And until engineers figure out how to update those dials safely and continuously after deployment, the models we’re using now are doing something more like reading notes than actually learning from the experience. That’s a distinction with a very big difference.

Check out the original article and Malika’s podcast for the technical details. Below is a list of related terms and definitions.

Continual Learning: Vocabulary List

This list of terms below is based on the a16z article “Why We Need Continual Learning” by Malika Aubakirova and Matt Bornstein and also the podcast with Malika discussing the article. Some definitions closely reflect the article itself, but others expand into broader concepts from the field for additional context. I error-checked the terms and definitions with Grok, ChatGPT, Gemini, Perplexity, and DeepSeek.

Agentic Loops

A mode of operation where the model works autonomously step by step toward a goal without you typing each instruction. Each step produces output that feeds into the next. This process can go on for many cycles. The article identifies two related problems as steps accumulate: (1) the immediate symptom is coherence degradation, where the agent loses the thread and starts making poor decisions, and (2) the underlying cause is that maintaining a growing context becomes increasingly expensive and inefficient. Both concerns together represent why the article frames agentic loops as one of the pressure points on the current in-context learning paradigm. For example, an agent tasked with researching a topic, drafting a report, checking sources, and revising the draft might handle the first twenty steps cleanly. But by step eighty the accumulating context has grown so large and costly that the agent starts losing track of earlier decisions and repeating work it already did.

Attention Heads

A key mechanism inside transformers that allows the model to weigh how relevant each part of the context is to every other part when generating a response. Multiple attention heads run in parallel, each learning to focus on different kinds of relationships in the text. One head might learn to track grammatical agreement between subject and verb across a long sentence, while another tracks thematic connections between paragraphs. Together they allow transformers to handle complex, long range dependencies in language that earlier architectures struggled with. For example, in the sentence “The lawyer who argued the case, despite the objections raised by her colleagues, ultimately won,” an attention head helps the model correctly connect “won” back to “lawyer” across all the intervening words.
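For the curious, the weighing step inside a single attention head can be sketched with the standard scaled dot-product formula. The Python below is a simplified illustration and not how any particular model is implemented. The softmax and attention functions are hypothetical helpers, and the two-number vectors for each word are invented by hand. In a real transformer those vectors are learned during training and are much larger.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention(query, keys, values):
    """One attention head: score the query against every key, then blend the values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    output = [sum(w * value[i] for w, value in zip(weights, values)) for i in range(width)]
    return weights, output


# Hand-made vectors, chosen so that "won" lines up with "lawyer".
words = ["lawyer", "colleagues", "won"]
keys = [[4.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_for_won = [1.0, 0.0]

weights, output = attention(query_for_won, keys, values)
for word, weight in zip(words, weights):
    print(word, round(weight, 3))
```

With these made-up numbers, most of the weight lands on “lawyer,” which is the connection the example sentence needs. Running several heads in parallel just means repeating this with different learned vectors.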

Catastrophic Forgetting

When a model updates its weights to learn something new, it tends to overwrite what it already knew. In other words, new learning crowds out old learning, sometimes dramatically. This is one of the central unsolved problems in continual learning, and one of the main reasons models are not updated continuously after deployment. Think of it somewhat like overwriting parts of a hard drive. The new files go in, but the old ones can be partially or fully lost. For example, if you fine-tune a general purpose model specifically on a medical records archive, the model will get better at clinical language but noticeably worse at writing poetry or explaining history because the new training has nudged weights that were doing other jobs.

Compression / Compaction

The process of taking a vast amount of raw information and distilling it into something compact and generalized. During training, a model compresses an enormous amount of human writing into its parameters and finds the underlying patterns rather than storing things verbatim. The article uses “compaction” as a broad organizing term for how deeply new information gets digested, which ranges from not at all (pure retrieval, where facts just sit in a database) to fully (weight-level learning, where the model actually internalizes new knowledge). For example, rather than memorizing every recipe ever written, a well-trained model compresses the underlying logic of cooking: how heat transforms food, how flavors balance, how techniques generalize across cuisines.

Continual Learning

The broader field of research aimed at letting models learn from new experience after deployment, ideally by updating their weights rather than relying on external scaffolding. It’s the opposite of the current norm, where training and deployment are completely separate and weights are frozen the moment a model is released. The goal is something closer to how humans learn continuously from experience without needing to be retrained from scratch every time the world changes. For example, a customer service model using continual learning could gradually internalize patterns from thousands of resolved support tickets over time and get genuinely better at its job rather than just retrieving past examples.

Context Window

The full body of text the model can see at once when generating a response. It includes your message, the full conversation history, any documents you shared, and any background instructions passed to the model. It has a size limit measured in tokens. When it fills up, older content must be dropped to make space for new content. For example, if you have a long conversation with an AI assistant and then ask it to recall something you mentioned earlier, it may not be able to answer because that part of the conversation has already been pushed out of the window.
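
One common way to handle a full window is to drop the oldest messages first. Here is a minimal sketch of that idea, assuming a hypothetical `fit_to_window` helper and using a whitespace word count as a rough stand-in for real token counts. Actual assistants may summarize or truncate differently.

```python
def fit_to_window(messages, max_tokens):
    """Keep the most recent messages that fit within the token budget."""
    kept = []
    used = 0
    # Walk backward from the newest message so recent content wins.
    for message in reversed(messages):
        cost = len(message.split())  # assumption: one word is about one token
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = [
    "my dog is named Rex",
    "what is the weather",
    "tell me a joke",
    "what is my dog's name",
]
window = fit_to_window(history, max_tokens=12)
print(window)
```

With a budget of 12, only the last two messages fit. The message that mentioned the dog's name is no longer in the window, which is exactly the failure described in the example above.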

Data Poisoning

One of several serious governance and security risks the article raises around continuous weight updates. If a model’s weights can be updated from interactions after deployment, bad actors could gradually manipulate its behavior through carefully crafted inputs over time, which is a slow and hard-to-detect form of corruption that lives in the weights rather than just in the context. Unlike a one-time prompt injection attack, poisoned weights persist across every future conversation. The article groups this alongside other unsolved challenges: alignment degradation, the impossibility of unlearning toxic knowledge, auditability failures, and privacy risks from user interactions being compressed into parameters. For example, an adversary could repeatedly feed a customer-facing AI subtly misleading information about a competitor’s product until the model begins reproducing those inaccuracies on its own with no obvious sign of tampering.

Distillation

A process involving two models: (1) a large, capable, frozen teacher and (2) a smaller student. The student is trained to match the teacher’s outputs as closely as possible and absorb its knowledge in a more compact form. The result is a smaller, more efficient model that performs nearly as well as the larger model on the tasks it was trained for. It’s like an apprentice learning by closely watching and mimicking a master until the skill becomes their own. For example, a large hospital system might use a massive general-purpose model as the teacher and distill its medical reasoning capabilities into a smaller model that can run efficiently on local hospital hardware without requiring a cloud connection.
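
To make "match the teacher's outputs" a bit more concrete, here is a minimal sketch under heavy simplifying assumptions: the "teacher" is just one fixed probability distribution over three answers, and the "student" is three adjustable numbers (`student_logits`). Real distillation trains a full student network over many inputs. The names and values here are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """How far distribution q is from distribution p (0 means identical)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher_probs = softmax([2.0, 1.0, 0.1])  # frozen teacher output
student_logits = [0.0, 0.0, 0.0]          # student starts with no preference

kl_before = kl_divergence(teacher_probs, softmax(student_logits))

for _ in range(500):
    student_probs = softmax(student_logits)
    # Gradient of the cross-entropy loss w.r.t. each logit is (student - teacher).
    student_logits = [
        z - 0.5 * (s - t)
        for z, s, t in zip(student_logits, student_probs, teacher_probs)
    ]

kl_after = kl_divergence(teacher_probs, softmax(student_logits))
print(kl_before, kl_after)
```

The gap between student and teacher shrinks toward zero as training proceeds. The student never sees the "right answer" directly. It only learns to imitate the teacher's full spread of probabilities.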

External Memory

Anything outside the model’s weights used to store and retrieve information. Chat history, databases, document stores, and agent notes are all examples of external memory. Information gets fed back into the context window when necessary. In current deployment architectures, the model typically does not update its weights from that information during inference. The key limitation is that external memory requires retrieval. The model has to be given the right information at the right moment, and if it isn’t, the knowledge might as well not exist. For example, a legal AI might have a database of ten thousand case summaries it can search, but if the retrieval system surfaces the wrong cases, the model has no way to compensate from its own knowledge.

Few-Shot Learning

The ability of a model to perform well on a new task after seeing only a handful of examples, rather than requiring thousands of training samples. Transformers are surprisingly good at this when examples are provided in the context window. Meta-learning approaches aim to make weight-level, few-shot learning just as effective, so the model can internalize new tasks from just a few examples even without them being available in the context. For example, if you show a model three examples of how you want your emails formatted and then ask it to format a fourth, it adapts immediately without any retraining. That’s few-shot learning in action.

Fine-Tuning

A more targeted form of additional training done after the initial training run. Instead of training from scratch on everything that’s known, you take an already-trained model and update it on a smaller or specific dataset. The new information shapes the model’s behavior for a particular use case without rebuilding it from the ground up, but the process still risks catastrophic forgetting if pushed too hard. For example, a company might take a general-purpose language model and fine-tune it on thousands of their internal support conversations, so the model learns the company’s terminology, tone, and common issue patterns without losing its broader language capabilities.

Gradient Descent

The mathematical process by which a model adjusts its weights during training. It measures how wrong the model’s predictions are on a given example and then calculates which direction to nudge each weight to reduce that error slightly. It’s called “descent” because the process is navigating downhill on a mathematical landscape, always moving toward lower error rates. Repeat this across billions of examples and the model gradually gets much better. For example, if the model predicts “cat” when the correct answer is “dog,” gradient descent works backward through the network to figure out which weights contributed to that wrong answer and adjusts them a tiny amount. Do that enough times and the model learns to tell cats from dogs reliably.
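
The "navigating downhill" picture can be sketched with a single weight. This is an illustrative toy, assuming a made-up loss `(w - 3)^2` whose lowest point sits at `w = 3`. A real model does the same thing across billions of weights at once.

```python
def loss(w):
    """A simple bowl-shaped error curve with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Slope of the loss at w: positive means 'downhill is to the left'."""
    return 2.0 * (w - 3.0)


w = 0.0              # arbitrary starting weight
learning_rate = 0.1  # how big each nudge is
history = [loss(w)]

for _ in range(50):
    w -= learning_rate * gradient(w)  # step against the slope
    history.append(loss(w))

print(w, history[0], history[-1])
```

Each step moves the weight a little in the direction that reduces the error, so the loss falls at every step and `w` ends up very close to 3.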

In-Context Learning (ICL)

Everything the model reads and uses during a single conversation without updating its underlying knowledge. You paste in a document, it reads it and responds. You describe a task, it follows your instructions. But when the conversation ends, none of that experience changes the model itself. The next conversation starts with the same frozen weights as always. This is a smart use of temporary information, but it’s not genuine learning. For example, if you spend an hour teaching an AI assistant about your company’s internal processes and then start a new conversation the next day, the model will have no memory of the previous conversation. You would need to paste in that information all over again.

Inference

The act of a model generating a response from input. It’s the opposite of training. Training occurs when the model learns by adjusting its weights. Inference occurs when the frozen model takes what it knows and produces an output. Any time you send a message and get a reply, that’s inference. The term “inference-time compute” (below) builds on this and refers specifically to spending extra computational effort during inference to get a better result. But plain inference just means the model is running, not learning. For example, asking a model what the capital of France is and getting back “Paris” in a fraction of a second is inference in its simplest form. No learning took place. The model generated an output from its existing weights without updating them.

Inference-Time Compute

The current dominant paradigm for improving model performance by spending more computational effort at the moment of response rather than updating weights. This includes chain-of-thought reasoning, tool use, search, and iterative problem-solving, all of which cost more compute at response time but produce better results. The article positions this process as a workaround, a scaling of what already works rather than a true solution to the learning problem. Test-time training is the most aggressive form of this approach because it actually runs gradient updates on new information during inference, which begins to compress it into weights in real time. This process sits at the boundary between the current paradigm and genuine parametric learning. For example, when you ask a model a complex math problem and it works through each step before giving a final answer rather than just guessing immediately, that is inference-time compute. The model is using more processing in the moment to arrive at a better result.

Instruction Tuning

A form of fine-tuning where the model is trained specifically on examples of instructions paired with ideal responses. It’s one of the main reasons modern models are so much better at following directions than earlier versions, which tended to just complete text rather than actually do what you asked. The model learns not just facts but the shape of helpful behavior, including how to interpret requests, how to structure answers, and when to ask for clarification. For example, an early language model asked to “summarize this article” might just continue writing in the same style as the article. An instruction-tuned model understands that the request calls for a concise, distinct summary and produces one.

KV Cache

Short for key-value cache. A technical mechanism that stores intermediate computations during inference so the model does not have to redo them from scratch for every token it generates. The article discusses it specifically in the context of KV cache compaction where the cache functions as a form of non-parametric memory but grows substantially as conversations and agent loops get longer. The authors argue that learning to compress this cache more efficiently is one of the meaningful challenges in moving from pure retrieval toward more durable knowledge storage. For example, in a long agentic task, the KV cache holds the computed representations of everything the model has processed so far. Without it, each new token would require reprocessing the entire history from scratch, which would be prohibitively slow.
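
The savings are easiest to see by counting work. The sketch below is a toy, assuming a hypothetical `ToyDecoder` that only counts how many per-token key/value computations it performs. It does not do real attention math.

```python
class ToyDecoder:
    """Counts how many per-token key/value projections get computed."""

    def __init__(self, use_cache):
        self.use_cache = use_cache
        self.cache = []            # stored (key, value) pairs for past tokens
        self.projection_count = 0  # total projections performed so far

    def _project(self, token):
        # Stand-in for the real key/value computation (an assumption).
        self.projection_count += 1
        return (f"key({token})", f"value({token})")

    def step(self, tokens_so_far):
        if self.use_cache:
            # Only the newest token is projected; earlier results are reused.
            self.cache.append(self._project(tokens_so_far[-1]))
            return list(self.cache)
        # Without a cache, every earlier token is projected again.
        return [self._project(token) for token in tokens_so_far]


def run(use_cache, tokens):
    decoder = ToyDecoder(use_cache)
    for position in range(1, len(tokens) + 1):
        decoder.step(tokens[:position])
    return decoder.projection_count


tokens = ["the", "cat", "sat", "on", "the", "mat"]
with_cache = run(True, tokens)
without_cache = run(False, tokens)
print(with_cache, without_cache)
```

For six tokens, the cached version does 6 projections and the uncached version does 21 (1 + 2 + 3 + 4 + 5 + 6). The gap widens quickly as the sequence grows, but so does the cache itself, which is the memory cost the article is concerned with.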

Lossy Compression

Compression where some information is permanently lost in the process, as opposed to lossless compression where everything can be recovered exactly. For LLMs, the inability to store everything verbatim during training forces the model to find patterns, generalize, and abstract. That forced abstraction is precisely what makes the model seem intelligent and useful in new situations it has never seen before. A JPEG image is the familiar everyday example. Save a photo as a JPEG and the file shrinks dramatically because fine detail is discarded. But if you zoom in close enough you can see the degradation. For most purposes, though, the image is perfectly usable. The tradeoff is the point. For a language model, the equivalent is that it cannot recite every sentence it ever trained on, but it can write a new sentence in any style on any topic because it extracted the underlying structure rather than memorizing the surface.

Meta-Learning

Teaching a model how to learn rather than what to learn. The model is pre-trained in a way that positions it to update quickly and effectively with just a few new examples, rather than requiring extensive retraining. It’s the difference between educating someone to be a quick study versus simply giving them a lot of facts to memorize. A quick study can walk into an unfamiliar subject and get up to speed fast, whereas someone who only memorized facts cannot. For example, a meta-learned model shown three examples of a new classification task, say sorting customer complaints into categories it has never seen before, should be able to generalize accurately to new complaints after just those three examples rather than needing hundreds.

Modules

The article uses this as a broad middle-ground category on the compaction spectrum that sits between pure retrieval and full weight-level learning. In practice, modules can take several forms: adapter layers, LoRA-style weight updates, memory components, or cached representations. What they share is the ability to specialize a general-purpose model for a specific domain without retraining the entire model from scratch. They offer more than retrieval in that some digestion of information happens, but less than full parametric learning in that the core model is typically left unchanged. For example, a hospital might attach a medical module to a general-purpose model so it performs at a specialist level on clinical questions, while the same base model with a legal module performs at a specialist level on contract review, with each module being swappable independently.
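
LoRA-style updates are one concrete form a module can take: the base weight matrix stays frozen, and a small low-rank product is added on top. The sketch below illustrates that idea with tiny hand-picked matrices. The values and helper names are assumptions for illustration, not anything the article specifies.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Frozen base weights (3 x 3). These are never modified.
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# The swappable module: two thin matrices whose product has rank 1.
matrix_b = [[1.0], [0.0], [2.0]]  # 3 x 1
matrix_a = [[0.5, 0.0, -1.0]]     # 1 x 3

adapted = add(base, matmul(matrix_b, matrix_a))  # base + B @ A


def full_params(d):
    return d * d      # parameters in a full d x d weight matrix


def lora_params(d, r):
    return 2 * d * r  # parameters in a d x r matrix plus an r x d matrix


print(adapted)
print(full_params(4096), lora_params(4096, 8))
```

For a 4096 x 4096 layer with rank 8, the module holds 65,536 numbers instead of 16,777,216, which is why such modules are cheap to train, store, and swap while the base model is left unchanged.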

Multi-Agent Architectures

Systems where multiple AI models work in parallel with each one handling a slice of a larger task and communicating results to each other or to an orchestrating layer. If a single model is limited by its context window, a coordinated group of agents can collectively handle far more. But this shifts the problem rather than eliminating it. Each agent still faces its own context limit, and coordinating many smaller contexts introduces its own complexity for the system to manage. It’s a non-parametric workaround for scale, not a solution to the underlying constraint. For example, a research task that would overflow one model’s context window might be split across ten agents, each reading a different section of source material with a coordinating agent assembling their summaries into a final report.

Neural Network

The underlying computational structure of an LLM. It’s a network of interconnected nodes organized in layers, loosely inspired by neurons in the brain. But the analogy should not be pushed too far. Each connection between nodes has a weight that determines how strongly one node influences another. During inference, information flows forward through the layers, gets transformed at each step, and eventually produces an output. The network learns by adjusting those weights during training until it gets good at its task. For example, in an image recognition network, early layers might learn to detect simple edges and colors, middle layers might learn to recognize shapes, and later layers might learn to identify objects. Language models work on the same principle but applied to sequences of text.
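
Here is what "information flows forward through the layers" looks like in a very small case. This is a sketch with made-up weights: two inputs, one hidden layer of two nodes, and one output. The ReLU activation is my choice for illustration. Real networks have many more layers and learned weights.

```python
def relu(values):
    """Common activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def layer(inputs, weights):
    """Each row of weights produces one node's weighted sum of the inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


inputs = [1.0, 2.0]

hidden_weights = [
    [0.5, -1.0],  # connections into hidden node 0
    [1.0, 1.0],   # connections into hidden node 1
]
output_weights = [[1.0, 2.0]]  # connections into the single output node

hidden = relu(layer(inputs, hidden_weights))  # [0.0, 3.0]
output = layer(hidden, output_weights)[0]     # 1.0 * 0.0 + 2.0 * 3.0

print(hidden, output)
```

Every number in `hidden_weights` and `output_weights` is a weight in the sense used above. Training would adjust those numbers. Inference just runs the inputs through them.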

Parameters / Weights

The billions of numerical values inside a model that encode everything it learned during training. Each value represents the strength of a connection between two nodes in the neural network. During training, these values get adjusted gradually until the model becomes good at predicting language. After training they are frozen, and the model’s knowledge and capabilities are entirely determined by those fixed numbers. “Parameters” and “weights” refer to the same thing and are used interchangeably throughout the article. For example, frontier models contain billions or even trillions of parameters. Each one is a small dial that was tuned during training and now stays locked in place, collectively encoding an enormous amount of compressed knowledge about language, facts, and reasoning patterns.

Parametric Learning

Learning that actually updates the model’s weights based on new experience, as opposed to in-context learning which uses information temporarily without changing anything permanent. It’s the deeper form of learning the article is ultimately arguing we need more of. When a model learns parametrically, new knowledge gets compressed into its weights the same way training data did and becomes a durable part of what it knows rather than a note it holds briefly and then discards. For example, a parametric update after a model encounters thousands of conversations about a new programming language would leave it genuinely better at that language going forward across all future conversations, not just within the session where it learned.

Regularization

A cautious approach to weight updates that penalizes changes to parameters deemed important to existing knowledge. Before updating a weight, the system estimates how critical that weight is to the model’s current capabilities. If it’s very important, the update is constrained or slowed down. This is one of the older approaches to continual learning and helps manage the stability-plasticity dilemma. But it tends to be brittle at scale. Think of it like a renovation rule that protects load-bearing walls. You can still remodel, but certain structures are off-limits because removing them would collapse the building. For example, EWC (Elastic Weight Consolidation), one of the most cited regularization methods, computes an importance score for each weight after training on a task and uses that score to resist changes when training on subsequent tasks.
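
The "load-bearing walls" idea can be sketched as an extra penalty term during training. This is a simplified illustration in the spirit of EWC, not the published algorithm: the importance scores are hand-picked here, whereas EWC estimates them from the Fisher information. All names and values are assumptions.

```python
def train_with_penalty(old_weights, new_targets, importance,
                       strength=1.0, lr=0.01, steps=1000):
    """Learn a new task while resisting changes to important weights."""
    weights = list(old_weights)
    for _ in range(steps):
        for i in range(len(weights)):
            # Pull toward what the new task wants.
            task_grad = 2.0 * (weights[i] - new_targets[i])
            # Pull back toward the old value, scaled by importance.
            penalty_grad = 2.0 * strength * importance[i] * (weights[i] - old_weights[i])
            weights[i] -= lr * (task_grad + penalty_grad)
    return weights


old_weights = [1.0, 1.0]  # values learned on the earlier task
new_targets = [0.0, 0.0]  # values the new task would prefer
importance = [10.0, 0.0]  # weight 0 is "load-bearing", weight 1 is not

new_weights = train_with_penalty(old_weights, new_targets, importance)
print(new_weights)
```

The unprotected weight moves all the way to the new target, while the protected one settles at about 0.91, barely moving from its old value of 1.0. That is the compromise regularization makes: the new task is learned less well in exchange for keeping old knowledge intact.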

Reinforcement Learning (RL)

A training approach where a model learns from feedback signals rather than from labeled examples. It tries things, receives a reward or penalty based on how well it did, and adjusts its behavior accordingly over many iterations. The article mentions RL-based feedback loops as one direction in continual learning research where models could improve from real-world deployment signals like user corrections or task outcomes. However, it’s not the central mechanism the authors emphasize. The core focus of the article is on compaction, weight updates, and memory structures. For example, the systems that learned to play chess and Go at superhuman levels used reinforcement learning by playing millions of games against themselves and adjusting strategies based on wins and losses rather than being taught explicit strategies.

Retrieval-Augmented Generation (RAG)

A common approach to giving models access to current or specialized information without retraining. Instead of baking knowledge into weights, you build a searchable database the model can query at response time. The retrieved content gets injected into the context window and the model uses it to generate its answer. It’s purely non-parametric. The model retrieves information but never internalizes it. The limitation is that retrieval only works if the right information gets surfaced at the right time, and no amount of retrieval can substitute for knowledge the model needs to reason with flexibly. For example, a financial AI might use RAG to pull in the latest earnings reports before answering questions about a company’s performance because that information changes constantly and cannot be baked into training data.
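
The retrieve-then-inject flow can be sketched in a few lines. This toy scores documents by shared words, which is an assumption made for simplicity. Production systems typically use embedding-based vector search, and the documents below are invented for the example.

```python
def score(query, document):
    """Count the words the query and document have in common."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


def retrieve(query, documents, k=1):
    """Return the k documents that best match the query."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)[:k]


def build_prompt(query, documents):
    """Inject the retrieved text into the context ahead of the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


documents = [
    "Acme reported quarterly revenue of 12 million dollars",
    "The office cafeteria serves lunch at noon",
    "Acme hired a new head of design",
]
query = "What was Acme quarterly revenue"

top = retrieve(query, documents)
prompt = build_prompt(query, documents)
print(prompt)
```

The model only ever sees what `retrieve` hands it. If the scoring surfaced the cafeteria document instead, the answer would suffer, which is the retrieval limitation described above.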

Safety Alignment

The work done during training to make a model helpful, honest, and safe to use. It involves carefully curated training data, human feedback on model outputs, and specific training objectives designed to shape the model’s values and behavior. One of the serious risks of continuous weight updates after deployment is that alignment can degrade unpredictably even from adding seemingly benign new data. It seems that fine-tuning on almost anything can shift the weights that govern behavior, not just the ones governing the specific knowledge update. For example, researchers have shown that even brief fine-tuning on ordinary instructional text can weaken safety guardrails in ways that are not obvious until the model is probed specifically for harmful outputs.

Self-Improvement

An approach where the model generates its own training data, filters out low-quality results, trains on the high-quality results, and repeats the cycle. It learns from its own work rather than from human-provided data and can improve capability over repeated iterations in constrained settings. The article cites AlphaEvolve and AlphaProof as examples of this kind of closed-loop improvement. But these systems operate in constrained domains like mathematics and algorithm optimization, not open-ended real-world learning. The article uses these examples to illustrate iterative self-training loops, and what qualifies as a genuinely new discovery in this context remains debated. For example, AlphaEvolve used self-generated solutions and automated evaluation to discover improvements to algorithms that human programmers could not find because it worked within a well-defined problem space where correctness could be verified automatically.
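
The generate, filter, train, repeat cycle can be sketched with a deliberately tiny stand-in. Here the "model" is a single number (`mean`), the "verifier" is a scoring function with a known best answer of 7.0, and "training" means moving the model toward its best candidates. This is my own illustrative toy and is not how AlphaEvolve or AlphaProof are implemented.

```python
def score(candidate):
    """Automatic verifier: higher is better, and 7.0 is the best possible."""
    return -(candidate - 7.0) ** 2


def improve(mean, rounds=20, keep=2):
    """Run the closed loop: generate, filter, update, repeat."""
    offsets = (-1.0, -0.5, 0.0, 0.5, 1.0)
    for _ in range(rounds):
        # Generate: the model proposes variations on its current guess.
        candidates = [mean + offset for offset in offsets]
        # Filter: keep only the highest-scoring candidates.
        best = sorted(candidates, key=score, reverse=True)[:keep]
        # "Train": move the model toward what survived the filter.
        mean = sum(best) / len(best)
    return mean


start = 0.0
final = improve(start)
print(start, final)
```

Starting from 0.0, the loop ends within 0.25 of the target without any human-provided examples. It works only because `score` can check candidates automatically, which mirrors the point above about these systems depending on well-defined, verifiable problem spaces.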

Stability-Plasticity Dilemma

The fundamental tension in any learning system between staying stable, meaning not forgetting what it already knows, and staying plastic, meaning remaining able to learn new things. Push too hard toward plasticity and you get catastrophic forgetting. Push too hard toward stability and the model cannot adapt to anything new. Solving this dilemma is one of the core engineering challenges in continual learning, and no approach has fully solved the problem at scale. The dilemma exists in biological brains too. From what I understand about biology, human memory consolidation is strongly associated with sleep and offline processing, which suggests the brain has its own version of this stability-plasticity problem built right in. For example, a model trained to be highly stable might refuse to update its belief that a particular drug is safe even after being shown new clinical evidence, while a model trained to be highly plastic might update so aggressively that it forgets basic grammar rules after a week of medical fine-tuning.

State Space Models (SSMs)

An alternative to traditional transformer architecture that the article highlights for offering a fundamentally better scaling profile for long contexts. The article describes them as using fixed memory layers interspersed with normal attention, so the memory, unlike a transformer’s, does not grow unboundedly with every token added to the context. Attention in traditional transformers scales quadratically with context length, while SSMs aim for near-linear scaling. However, this remains an active area of research rather than a fully settled property. The article treats SSMs as a promising architectural direction for enabling much longer agentic loops rather than a definitive solution to the broader continual learning problem. For example, a transformer handling a 100,000-token conversation requires vastly more attention compute than one handling a 10,000-token request, on the order of 100 times more for 10 times the tokens. But an SSM handling the same expansion would ideally require only proportionally more, which could make very long agentic tasks far more practical.
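
The difference between quadratic and linear scaling is just arithmetic, and it is worth seeing the numbers. The sketch below uses idealized cost formulas (tokens squared versus tokens) as an assumption. Real systems have constant factors and other costs that this ignores.

```python
def quadratic_cost(tokens):
    """Idealized attention cost: every token attends to every other token."""
    return tokens ** 2


def linear_cost(tokens):
    """Idealized cost for a model whose work grows in step with length."""
    return tokens


short_context = 10_000
long_context = 100_000

attention_growth = quadratic_cost(long_context) / quadratic_cost(short_context)
linear_growth = linear_cost(long_context) / linear_cost(short_context)

print(attention_growth, linear_growth)  # 100.0 10.0
```

Ten times the context means about a hundred times the attention work under quadratic scaling, but only ten times the work under linear scaling.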

Temporal Disentanglement

A core limitation of parametric memory since a model’s weights do not separate timeless facts from information that changes over time. Both get compressed into the same parameters and are tangled together with no internal label distinguishing what’s permanent from what’s mutable. This makes continual weight updates risky because changing a time-sensitive piece of knowledge can corrupt stable knowledge stored in nearby weights. The article frames this as one of the fundamental unsolved problems standing between today’s frozen models and genuinely adaptive ones. For example, the fact that two plus two equals four and the fact that a particular person holds a particular job title are both encoded somewhere in the weights. Updating the job title risks disturbing the arithmetic, because the model has no mechanism for knowing which facts are stable laws and which are contingent facts about the world.

Test-Time Training

An approach that blurs the line between training and responding by letting the model do a small amount of learning before it generates a final answer. Rather than relying entirely on what it learned during the original training run, the model runs brief gradient updates based on what it’s currently seeing and then responds. The article describes this as running gradient descent on test-time data, compressing new information into parameters at the moment it matters, and treats it as one of the more substantive moves toward genuine continual learning because it actually changes weights at inference time. For example, if a model is asked to analyze a long, unusual technical document, test-time training would let it briefly train on that document before responding, compressing its key patterns into weights rather than just reading it as context. This method potentially produces a much more accurate analysis as a result.

The Bitter Lesson

A well-known observation in AI research. It holds that given more compute and data, general methods that let models figure things out at scale consistently outperform clever human-engineered solutions over time. Every time researchers have tried to hardcode structure and shortcuts into AI systems, the simpler but more scalable approaches have eventually won. The article invokes this phenomenon to question why we still hand-engineer memory and compression pipelines rather than letting models learn to do it themselves. For example, early chess programs used elaborate human-crafted rules about piece values and board positions. They were eventually crushed by systems that simply learned from millions of games with minimal human guidance and relied on scale rather than cleverness. The same pattern has repeated across nearly every domain in AI.

Token

The basic unit of text that a large language model processes. A token is roughly a word, though it can also be a fragment of a word, a punctuation mark, or a short common sequence like “ing” or “un.” Models do not read text the way humans do, character by character or word by word. Instead, they break input into tokens first and then process the sequence. The size of a context window is measured in tokens, not words or characters. For example, the sentence “The cat sat on the mat” would be broken into something like six tokens, roughly one per word. But a word like “unbelievable” might be broken into two or three tokens: “un,” “believ,” “able,” because it’s less common and gets split into recognizable subunits the model has seen frequently.
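
Here is a minimal sketch of how a word can be split into subword pieces. It uses greedy longest-match against a tiny hand-written vocabulary, which is an assumption for illustration. Real tokenizers learn their vocabularies from data using methods such as byte-pair encoding, and their splits will differ.

```python
def tokenize(word, vocabulary):
    """Split a word by repeatedly taking the longest known piece."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary.
        while end > start and word[start:end] not in vocabulary:
            end -= 1
        if end == start:
            end = start + 1  # unknown character: emit it on its own
        tokens.append(word[start:end])
        start = end
    return tokens


vocabulary = {"un", "believ", "able", "the", "cat"}  # hypothetical vocabulary

pieces = tokenize("unbelievable", vocabulary)
whole = tokenize("cat", vocabulary)
print(pieces, whole)
```

With this vocabulary, “unbelievable” comes out as three pieces while “cat” stays whole, matching the pattern described above: common words tend to be single tokens and rarer words get split.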

Training Run

The large-scale and expensive process of building a model’s knowledge by exposing it to massive amounts of data and adjusting its weights. Training involves feeding these huge datasets through the network repeatedly and using gradient descent to nudge weights toward better predictions. The process runs on clusters of specialized hardware for weeks at a time and consumes substantial amounts of electricity. It’s all carefully controlled, occurs before deployment, and produces a fixed set of weights that define everything the model knows. Once training ends, the weights are frozen and the model goes out into the world as-is. For example, training a frontier model like GPT-4 or Claude is estimated to cost tens or hundreds of millions of dollars and requires specialized data centers. This is precisely why continuous post-deployment learning is so appealing: rerunning a full training run every time the world changes isn’t practical.

Transformer

The dominant architecture underlying most major AI models today, including Claude, GPT, Gemini, and more. At its core, a transformer predicts the next token in a sequence of text based on everything that came before it. It generates outputs token by token at very high speed. That sounds simple but at scale it’s not. Models built on this architecture are trained on so much human-generated text that they model statistical relationships in language and produce behavior consistent with understanding context, logic, and meaning. For example, when you ask a transformer-based model to explain a complex idea, it makes predictions about what a good explanation would look like given your question based on patterns it absorbed from vast amounts of human writing on similar topics. That’s why it seems smart. It’s familiar. Whether the final output constitutes genuine understanding is a separate philosophical debate that the article doesn’t address.