Agile, Artificial Intelligence, Architecture, ...: July 2026

Wednesday, July 22, 2026

Agents Are the New Microservices

A team I talked to recently took their working AI feature (one model, one prompt, one call) and split it into twelve cooperating agents. A planner, a researcher, a critic, a writer, a few tool-callers. The demo was beautiful.

Then they tried to run it in production and discovered they no longer had an AI problem. They had a distributed systems problem. The exact one the rest of us solved, painfully, a decade ago, and called microservices.

If you sat through the microservices years, the agent conversation in 2026 should feel like déjà vu. Same pitch, same gleam in everyone's eye, and, I'd bet, the same bill coming due.

The industry decomposes on a cycle

Step back and software architecture is a pendulum. Monoliths to SOA to microservices to serverless. Every wave breaks the system into smaller, more independent pieces, sells the same benefits (independent deployment, team autonomy, scale the hot parts) and quietly ships the same costs. Agents are the next swing.

But there's a real insight underneath the hype, and it's worth saying clearly. Microservices decomposed your code. Agents decompose your work. A microservice is a slice of functionality waiting to be called. An agent is a slice of a decision, pursuing a goal. That's a genuinely bigger idea, and it's why the pitch is seductive: you stop wiring logic for every edge case and instead hand a goal, some tools, and constraints to something that figures out the path. When it works, it's remarkable.

The tax comes back, with interest

What the slide never mentions, then or now: the moment you split one process into many that talk to each other, you inherit every problem of distributed systems whether you wanted them or not. Unreliable calls, timeouts and retries, debugging across boundaries, tracing, state that used to be a variable and is now a message someone might not receive. Conway's law shows up on schedule: your twelve agents start mirroring your org chart instead of your problem.

We paid for those lessons in outages. Agents inherit the whole invoice. But they also change the shape of failure, and this is the part that should keep an architect up at night.

A microservice fails loud. HTTP 500, a timeout, an alert. You know. An agent fails quiet. It returns something fluent, confident, and wrong. There's no error code for "plausible but false." One agent hallucinates, the next treats it as fact, and the mistake compounds down the chain like a game of telephone with no exception thrown. You don't get a red dashboard; you get a subtly bad outcome three steps later.

That's because the "contract" between two agents is natural language. No schema, no types, no compiler to catch a mismatch. The thing distributed systems spent twenty years learning to depend on, gone. And the work isn't deterministic, so the same input can take three different paths, which means you can't unit-test your way to confidence. Add that each hop is a model call, so latency and cost stack, and your elegant twelve-agent design can burn fifty inference calls to answer one question.
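To put rough numbers on the compounding and the stacking, here is a back-of-envelope sketch. The 95% per-hop accuracy, the two-second latency, and the one-cent price are made-up illustrative figures, not measurements, and treating hops as independent is a simplification.

```python
# Back-of-envelope model of a linear agent chain.
# Every number below is an illustrative assumption, not a measurement.

def chain_success(per_hop_accuracy: float, hops: int) -> float:
    """Chance that every hop is right, assuming hops fail independently."""
    return per_hop_accuracy ** hops


def chain_cost(calls: int, seconds_per_call: float, dollars_per_call: float):
    """Total latency and spend when the calls run one after another."""
    return calls * seconds_per_call, calls * dollars_per_call


for hops in (1, 3, 12):
    rate = chain_success(0.95, hops)
    print(f"{hops:>2} hops at 95% each: {rate:.0%} correct end to end")

latency, spend = chain_cost(calls=50, seconds_per_call=2.0, dollars_per_call=0.01)
print(f"50 sequential calls: {latency:.0f}s and ${spend:.2f} per question")
```

Under those assumptions a twelve-hop chain is right only about 54% of the time, even though each agent looks fine on its own.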

Honest version: agents can be microservices with brains, or microservices with worse tooling. Which one you get depends entirely on the discipline you bring.

What's genuinely better (To tax, you need income)

It would be lazy to only sound the alarm. Some of the upside is real and new.

Agents adapt without a redeploy: change a goal or a prompt, not a CI/CD pipeline. They can wrap a legacy monolith that has no API, driving it as a tool, which resurrects systems you'd otherwise pay millions to rewrite. They can self-heal and triage, routing around failures instead of just reporting them. And service discovery gets smarter: instead of looking up payment-service-v2 by exact name, an orchestrator can ask for "something that can process a payment" and match on capability. These aren't small.
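The discovery point is the easiest one to sketch. Below is a toy registry, assuming services advertise capability tags. Apart from payment-service-v2, the service names and tags are hypothetical, and plain tag matching stands in for whatever semantic matching a real orchestrator would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    capabilities: frozenset


# Hypothetical registry; only payment-service-v2 comes from the text above.
REGISTRY = [
    Service("payment-service-v2", frozenset({"process_payment", "refund"})),
    Service("ledger-service", frozenset({"record_transaction"})),
    Service("legacy-billing", frozenset({"process_payment"})),
]


def find_by_name(name):
    """Classic lookup: the caller must already know the exact name."""
    return next((s for s in REGISTRY if s.name == name), None)


def find_by_capability(capability):
    """Capability lookup: the caller states what it needs done."""
    return [s for s in REGISTRY if capability in s.capabilities]


print(find_by_name("payment-service"))  # None: the name has to match exactly
print([s.name for s in find_by_capability("process_payment")])
```

The second lookup also surfaces the legacy system, which is the same property that makes wrapping a monolith as a tool attractive.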

So agents won't replace microservices

The headline is provocative, but the literal version is wrong, and the takes that claim agents simply replace microservices are selling the same over-decomposition that gave us the distributed monolith. The accurate version: agents won't replace microservices. They'll sit on top and consume them. The deterministic, transactional, regulated heavy lifting stays in boring, reliable services. The cognitive, orchestrating, user-facing layer becomes agents. The future is a hybrid, and the architect's job is drawing that line well.

The lessons we already bought

We're not starting from zero. We paid for this education once. The trick is using it.

Start with the monolith. The hardest microservices lesson was "don't decompose first." One well-built agent beats twelve that need a committee meeting to answer a question. Split only when a real seam forces it.

Schema the handoffs. If agents must pass work, pin the boundary to typed, validated structure, not free prose. Give the natural-language soup a contract where it crosses a line.
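A minimal sketch of what a typed handoff could look like, using only the standard library. The ResearchHandoff shape and its three fields are hypothetical, chosen for illustration; the point is that the receiving agent refuses free prose at the boundary instead of guessing.

```python
import json
from dataclasses import dataclass


class HandoffError(ValueError):
    """Raised when a message does not satisfy the handoff contract."""


@dataclass(frozen=True)
class ResearchHandoff:
    # Hypothetical contract between a researcher agent and a writer agent.
    question: str
    findings: tuple
    confidence: float

    @classmethod
    def parse(cls, raw: str) -> "ResearchHandoff":
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise HandoffError(f"not JSON: {exc}") from exc
        if not isinstance(data, dict) or set(data) != {"question", "findings", "confidence"}:
            raise HandoffError("expected exactly: question, findings, confidence")
        question, findings, conf = data["question"], data["findings"], data["confidence"]
        if not isinstance(question, str) or not question.strip():
            raise HandoffError("question must be a non-empty string")
        if not isinstance(findings, list) or not all(isinstance(f, str) for f in findings):
            raise HandoffError("findings must be a list of strings")
        if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
            raise HandoffError("confidence must be a number between 0 and 1")
        return cls(question, tuple(findings), float(conf))


good = '{"question": "refund policy?", "findings": ["30 days"], "confidence": 0.8}'
print(ResearchHandoff.parse(good))

try:
    ResearchHandoff.parse("Sure! Here is what I found: refunds take 30 days.")
except HandoffError as exc:
    print("rejected:", exc)
```

This catches malformed handoffs loudly. It does not catch a well-formed handoff whose findings are false, which is why the observability point still matters.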

Buy observability before you scale, not after. With agents you need more than logs and traces; you need the reasoning and inputs at each hop, or you'll never reconstruct why the system did what it did. Tracing a request becomes tracing a thought.
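Here is one way that per-hop record could look, as a sketch. It assumes each agent is a function that returns its reasoning alongside its output, a hypothetical convention I've picked for illustration, and the planner and researcher are stand-ins rather than real model calls.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass
class HopRecord:
    trace_id: str
    agent: str
    inputs: dict
    reasoning: str
    output: str
    duration_s: float


class Trace:
    """Collects one record per agent hop for a single request."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.hops = []

    def run(self, agent_name, agent_fn, inputs):
        # Assumed convention: an agent returns (reasoning, output).
        start = time.perf_counter()
        reasoning, output = agent_fn(inputs)
        elapsed = time.perf_counter() - start
        self.hops.append(
            HopRecord(self.trace_id, agent_name, inputs, reasoning, output, elapsed)
        )
        return output

    def to_jsonl(self):
        return "\n".join(json.dumps(asdict(h)) for h in self.hops)


def planner(inputs):  # stand-in for a model call
    return "Short question, one lookup is enough.", f"research: {inputs['question']}"


def researcher(inputs):  # stand-in for a model call
    return "Plan names a single lookup.", f"notes for '{inputs['plan']}'"


trace = Trace()
plan = trace.run("planner", planner, {"question": "refund policy?"})
notes = trace.run("researcher", researcher, {"plan": plan})
print(trace.to_jsonl())
```

Each line of the output ties an agent's inputs, stated reasoning, and output to one trace id, so a bad answer three steps later can be walked back hop by hop.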

Don't decompose by hype. We split into microservices because it was fashionable, then spent years merging them back. Don't earn that scar twice. Decompose by genuine boundaries (different scaling, ownership, rate of change), not because "multi-agent" sounds advanced on a slide.

The actual point

The architects who came out of the microservices era well treated it as a tool, not a dogma. They knew "distributed" is a cost you pay for a benefit, and they paid it only when the benefit was real. The ones who suffered adopted the pattern because the conference talks were exciting.

Agents are at exactly that fork. The capability is real, the architecture is familiar, and the failure mode is predictable, because most of us have lived it once already.

So before you split your working system into a swarm of clever agents, ask the same question that should have been asked in 2015: do you have a problem that actually needs distributing, or do you just like the diagram?

Monday, July 20, 2026

AI Is Replacing Traditional Software - Here's What Comes Next

For thirty years, software meant the same thing: a team writes deterministic rules, ships a UI, and the user adapts their workflow to fit the tool.

That model is breaking.

AI-native applications don't ask users to conform to a rigid interface. They interpret intent, generate the interface on demand, and execute a workflow that used to require a dozen manually-configured screens.

The product isn't the UI anymore. The product is the outcome.

Three shifts are already visible:

1. The interface becomes disposable.
When software can generate its own UI per task, "features" stop being a durable asset. Screens and dashboards become ephemeral artifacts generated on the fly. This inverts decades of product strategy built around owning screen real estate.

2. Vertical SaaS unbundles at scale.
Products that survived on workflow lock-in—niche compliance tools, industry-specific dashboards, specialized add-ons—are exposed. If an AI agent can read a spec, query a database, and generate the same workflow in minutes, the switching cost was never the workflow. It was the lock-in. And that's collapsing.

3. Value moves to data, judgment, and orchestration.
The defensible layer isn't the interface anymore. It's proprietary data, domain-specific evaluation logic, and the systems that coordinate multiple AI agents reliably. Companies that only sold "a nicer way to click buttons" are the most exposed.

What comes next:

Pricing models shift from per-seat to outcome-based or consumption-based. The winners aren't the companies with the most polished UI—they're the ones that control the data pipelines AI depends on and the compliance expertise AI can't fake.

Traditional software vendors didn't survive the cloud transition by defending the old model. SaaS won't survive this one by defending the interface.

The question isn't whether AI will replace traditional software. It's: Does your company control the moat AI actually needs?

Saturday, July 18, 2026

Why Most AI Investments Will Look Like ERP Projects by 2028

Right now, enterprise AI feels like the iPhone moment. A demo lands, a room gasps, someone signs a budget, and everyone pictures a clean, magical product that just works.

Here's my prediction. By 2028, most of those investments won't be remembered as the iPhone. They'll be remembered as SAP.

If you were in a boardroom in the late 1990s or 2000s, you know exactly what that means, and your stomach probably just dropped. ERP was going to unify the company, kill the silos, and give leadership one version of the truth. Some of it delivered. A lot of it ran years long, cost multiples of the estimate, and ended in a system nobody quite loved and everyone learned to work around. The software was never the hard part. The company was.

AI is walking into the same story. Not because the technology is weak, but because of what always happens when powerful technology meets a large organization: the innovation becomes operations.

Why ERP is the right rhyme, and the iPhone is the wrong one

The iPhone was a product you bought and used. ERP was a program you implemented. That difference is everything.

A product pays off the moment you unbox it. A program only pays off after you've rebuilt your processes, retrained your people, cleaned your data, and rewired how decisions get made. The technology is maybe a fifth of the work. The other four-fifths is organizational surgery.

Enterprise AI is a program, not a product. The chatbot demo is the product. Getting it to actually change how your claims get processed, your contracts get reviewed, or your supply chain gets planned, safely and at scale and with someone accountable, is an implementation. And implementations rhyme. The technology works. The organization doesn't. That was the real story of ERP, and it's about to be the real story of AI.

This isn't even the first time. ERP, then CRM, then cloud, then the data-warehouse wave: each arrived as innovation and left as governance. Salesforce didn't fix bad sales processes; it exposed them. "Lift and shift" to the cloud didn't cut costs until companies rebuilt what they moved. Every wave starts as a breakthrough and matures into an operating-model change. AI is simply reaching that stage faster than anyone expected.

The specific ways it will rhyme

Budgets balloon and timelines slip. The demo cost a rounding error; the deployment will not. Wiring AI into real workflows means data pipelines, access controls, evaluation, monitoring, and endless edge cases the demo never showed. ERP taught us the pilot is the cheap 10%. AI is teaching the same lesson to anyone paying attention.

A consultant economy appears overnight. ERP built Accenture and Deloitte as we know them. The same thing is happening now: the "AI transformation practice" is being staffed as you read this. Your CIO will feel the irony personally. After a decade of selling "disruption," they'll end up hiring the exact roles that made ERP work: enterprise architects, business analysts, data stewards, integration specialists, program managers, change managers. The revolution will be delivered by the people who ran the last one.

Customization eats the promise. The pitch is a general model that does everything. The reality is that your data, your compliance rules, and your processes are unlike anyone else's, so every serious deployment becomes a custom build. The gap between "standard package" and "our special requirements" that made ERP balloon is waiting for AI, in the same place.

The value is in the redesign, not the tool. This is the deepest rhyme. ERP only paid off for companies that used it as a reason to simplify how they actually worked. The ones who bolted it onto their existing mess just automated the mess, expensively. AI is merciless about this. Drop it on a broken process and you get a faster broken process. Bad knowledge management becomes bad RAG. Bad documentation becomes confident hallucination. Bad workflows become automated chaos. The model doesn't fix the rot; it scales it. Most of the return will come from the redesign the AI forces, not the AI itself.

And then it becomes table stakes. The part executives least want to hear. ERP stopped being an advantage the moment everyone had it; it went from competitive edge to the cost of staying in business. Enterprise AI is on the same curve, only faster. The general capability you're paying a premium for today will be a commodity your competitors also have by 2028. The advantage was never the software. It was what only you could do with it.

Where the analogy breaks, to be fair

It isn't a perfect match, and I'd be doing bad analysis if I pretended it was.

AI is cheaper to start with, adopts bottom-up before any big program begins, and iterates in weeks where ERP iterated in years. A team can get real value from an off-the-shelf tool tomorrow, with no eighteen-month rollout. That's genuinely different, and it's good.

But that changes the entry, not the endgame. Easy pilots are exactly what will lull companies into underestimating the enterprise-scale version, which is where the ERP dynamics come roaring back. Cheap to start is not the same as cheap to deploy across a regulated, political, legacy-bound organization.

What to actually do with this

If your AI program is being budgeted like a software purchase, it will fail like an ERP project. Budget it as the organizational change it really is. Assume the model is the cheap part and the change management is the expensive part, and staff for that.

Expect the pilot magic to die on contact with the org, and plan for that death instead of being surprised by it. Pick the processes you're genuinely willing to redesign, not the ones you just want to sprinkle AI onto. And stop treating the general capability as your moat, because it won't be one for long. Your moat is your data, your workflow, and the specific redesign a competitor can't copy.

The companies that came out of the ERP era ahead weren't the ones with the biggest implementation. They were the ones who used a painful technology program as an excuse to become a simpler, sharper business.

So the question worth sitting with, before the next AI budget gets signed: are you buying a product, or signing up for an implementation? Because by 2028, almost everyone will discover it was the second one.

Friday, July 17, 2026

When AI knows everything, what should humans learn?

My nephew asked me last week why he should study anything if the machine already knows it. He's twelve. Fair question, and I didn't have a clean answer, so I've been chewing on it.

The model knows the answer. It doesn't know if the answer matters. Ask it a bad question and it hands back something confident, well-written, and useless, and it will never tell you the question was bad. Knowing what to ask, and what isn't worth asking, is now the harder skill. The machine doesn't do that part for you.

Then there's judging what it gives you. AI is wrong often enough that you can't outsource trust to it. If you don't know enough to smell when a number is off or a claim is too clean, you'll ship its mistakes as your own. You still need real knowledge in your head, not to race the model on recall, but to catch it.

That is the part school got backwards. For a century we tested the things AI is now best at: remembering, understanding, applying. Those were never the point. They were just the easy things to grade. The skills we waved at and rarely taught, analyzing, evaluating, creating, are exactly the ones left standing.

And someone still has to own the decision. When the model says lay off the team or change the treatment, it doesn't carry what happens next. A person does. You can't hand that to something that feels nothing when it's wrong.

So the answer to my nephew is not "study less." It's study differently. Learn to ask sharp questions. Go deep enough in something real to know when an answer is garbage. Build taste. And find the nerve to decide and stand behind it. The best AI tutor isn't the one that gets you the answer faster. It's the one that argues back and makes you think harder.

I told him: learn enough to know when the machine is lying to you. He got it faster than most executives I've met.

Thursday, July 16, 2026

The LEGO problem computers weren't supposed to solve

Hand a child a picture and a pile of LEGO, and they'll build something close to it. Ask a computer to do the same and you hit a wall that stood for decades.

It sounds trivial. It isn't. Turning a 3D shape into real bricks that snap together, hold their own weight, and don't collapse is a brutal combinatorial problem. Even a handful of bricks can be combined in so many ways that brute force chokes on it. So this sat for years in the pile labelled "computers can't really do this."

That label is coming off. Researchers at Carnegie Mellon built a system called BrickGPT that designs buildable LEGO models from a description. What makes it work isn't raw search. They trained it on over 47,000 brick structures spanning more than 28,000 unique 3D objects, and bolted on something like a physics inspector: it checks gravity, friction, and contact points, and when a section won't stand, it rolls back and redesigns that part. Then they had a robotic arm assemble one of its designs into a real object to prove the thing actually stands up.

Here is why I'd care if I ran a business, and it has nothing to do with LEGO.

Every company keeps a quiet list of things that are "just too hard to automate." Scheduling that one messy operation. Reading those non-standard documents. Planning a build nobody can write down as clean rules. Most of those lists were drawn up years ago and never looked at again. BrickGPT is a reminder that the line between "impossible for computers" and "done last year" moves faster than the list does, especially now that models can reason about real-world constraints instead of just pattern-matching text.

So the useful exercise isn't watching a machine build a LEGO guitar. It's pulling out your own "impossible" list and asking which items got quietly crossed off while you weren't looking.

Reference: https://avalovelace1.github.io/BrickGPT/

Wednesday, July 15, 2026

Why Shadow IT is increasing

Someone on your team is pasting company data into a chatbot right now. Not out of malice. Because it saves them an hour and IT hasn't given them anything better.

That is shadow IT in 2026, and it is growing fast. A few things are driving it.

The approved cloud tools are often too rigid. They are built to be standard, which means they rarely fit one company's actual workflow, so people build their own workaround in whatever is closest to hand. That closest thing is usually Excel. The most common ERP system on earth is still a spreadsheet someone made, because it bends to fit the job when the real system won't.

AI poured fuel on this. An employee opens a personal ChatGPT or Claude tab and is more productive in ten minutes, no ticket, no approval, no wait. When the unofficial option writes the email and cleans the data faster than anything IT handed them, policy loses. It loses every time.

And the gap keeps widening because technology moves faster than a large organization can approve anything. Shadow IT is the bridge people throw across that gap while they wait.

Here is the part leaders get wrong. They treat the tool as the problem and ban it. That just pushes the same behavior somewhere you can't see, where the data still leaks and you've lost the visibility too.

Shadow IT is a symptom. Someone had a job to do and the sanctioned path was slower than the shortcut. Fix that. Find out what the workaround is actually solving, get the safe version to be the fast version, and build a real route for those needs to reach the IT roadmap instead of hiding in a browser tab.

If your shadow IT is growing, your people have already told you your official tools are too slow. They just told you with their behavior instead of a survey.

Tuesday, July 14, 2026

Is the AI boom reaching its conclusion?

Is the AI boom reaching its conclusion, and will it soon transition into a more mundane, infrastructure-focused phase? The hype cycle is ending as public interest shifts from novelty to practical value.

Why the AI Era is Ending Soon

Rapid Adoption: Unlike smartphones, which took decades to saturate the market, AI integrated into existing products (like email & docs) reached 53% of the population in just 3 years, causing it to peak much faster.

Financial Sustainability: Businesses are realizing that AI is expensive to run. Many are failing to see a significant ROI despite massive spending on compute & infrastructure, leading to abandoned projects.

What Happens Next?

The 'AI' Label Will Become Meaningless: Just as we don't market 'electricity-powered' toasters, 'AI-powered' will cease to be a differentiator as the tech becomes standard across products.

Consolidation: Gimmicky AI tools will fail as businesses demand clear financial returns, leaving only a few dominant 'mega-companies' in control.

Invisible Infrastructure: AI will become a 'boring' part of life. The public will stop noticing it; it will function as a background utility.

Friday, July 10, 2026

Distillation: Genius or Theft?

In early 2026, OpenAI told a US congressional committee that China's DeepSeek had been "free-riding" on American AI. The technique it named was distillation: training a cheaper model on the outputs of a more powerful one to copy its abilities. Anthropic made a similar charge against several Chinese labs. Much of the Western press reached for the same word. Not innovation. Theft.

Hold that word, because there's an awkward fact sitting right next to it. Those same American labs built their models by scraping the open internet, and with it the copyrighted archives of newspapers, the work of authors and artists, and proprietary text none of them paid for. The New York Times is suing. Hundreds of other publishers are too. OpenAI's defense is "fair use," the legal phrase for "we built something new on top of what already existed."

So the principle gets slippery fast. When a Chinese lab learns from an American model, it's theft. When an American lab learns from everyone's writing without asking, it's fair use. Both can't be a clean rule. One of them is just a function of who's holding the lead.

I'll argue something uncomfortable: distillation is not a Chinese trick. It's how innovation has always worked. And whether we call it genius or theft usually depends on which side of the wall we're standing on.

What distillation actually is

Strip away the menace and distillation is closer to apprenticeship than to burglary. The student model never receives the teacher's weights or its training data. It watches the teacher's outputs and learns to generalize from them. If a young engineer studied a master's work, absorbed the patterns, and went on to do similar work more cheaply, we wouldn't call it theft. We'd call it education. When software does the same thing, we suddenly reach for a darker word.
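For readers who want the mechanics, here is a toy sketch of the classic soft-label form of distillation. All the numbers are invented for illustration. The student is scored only against the teacher's output distribution, never its weights or training data. When all a lab can see is generated text, the same idea reduces to ordinary supervised training on teacher-written answers.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_probs, student_logits, temperature=1.0):
    """Cross-entropy of the student's prediction against the teacher's output."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# The student only ever sees this output distribution (illustrative values).
teacher_output = softmax([4.0, 1.0, 0.5])

imitating_student = [3.5, 1.2, 0.4]   # close to the teacher's behavior
unrelated_student = [0.5, 1.0, 4.0]   # confident about a different answer

print(f"imitating student loss: {distillation_loss(teacher_output, imitating_student):.3f}")
print(f"unrelated student loss: {distillation_loss(teacher_output, unrelated_student):.3f}")
```

Training pushes the student's parameters in whatever direction lowers that loss, which is the software version of absorbing the master's patterns.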

It's also worth noting that many of the Chinese "distilled" models are built on openly released foundations like Llama and Qwen, which were put into the world precisely to be built on. The genuine dispute isn't whether learning-from-outputs happened. It's whose outputs, and under what terms. That's a narrower and more honest question than "theft."

Innovation is a relay race, not a lone genius

We tell a flattering story about invention: the solitary genius, the blank page, the bolt from the blue. It's mostly myth. Newton, no modest man, admitted he saw further only "by standing on the shoulders of giants." Almost everything new is an incremental step on top of someone else's work, often someone in another country, often uncredited.

The irony runs right through AI itself. Every model in this fight, American and Chinese alike, is built on the Transformer, the architecture from a single 2017 Google paper that everyone then copied and extended. The entire industry is one long act of building on a rival's published idea. Three older examples should finish off the myth.

The numbers you do math with were borrowed, then renamed. Place value, the decimal system, and zero as a number were worked out in India; Brahmagupta wrote the rules for zero in the 7th century. Arab scholars absorbed and extended this system. The words "algorithm" and "algebra" both come from al-Khwarizmi and his work. When it reached Europe through Fibonacci in the 13th century, the continent called the digits "Arabic numerals," and India, where they were born, largely fell out of the story. The most basic tool in global commerce is a chain of borrowing in which the original source was written out of its own invention. Nobody now argues Europe should have refused positional notation because it came from elsewhere.

Japan turned copying into a quality empire. For a generation after the war, "Made in Japan" meant cheap imitation. Japanese firms reverse-engineered American cars, cameras, and electronics. Then they took a statistical quality method that US industry had largely ignored, Deming's, and perfected it. The copier became the benchmark the world measured itself against: Toyota, Sony, Canon. Nobody calls Japan's rise theft anymore. We call it excellence. The only thing that changed was the result.

And America climbed the very same way. Britain invented the industrial revolution and guarded it, banning the export of textile machinery and even the emigration of skilled mechanics. So in 1789 a young Briton named Samuel Slater memorized the designs of Arkwright's mills and carried them to America in his head. In Britain he is "Slater the Traitor." In America he is the "Father of the American Industrial Revolution." Francis Cabot Lowell did the same with the power loom, touring British factories and rebuilding what he saw from memory. Alexander Hamilton openly urged the young republic to acquire foreign technology by whatever means. American industrial supremacy began as the deliberate copying of a rival who was trying to stop it.

The ladder, and the people who climb it

See the pattern. India seeded the mathematics. The Arab world carried and extended it. Europe took it and built modern science. America copied Europe to industrialize. Japan copied America and beat it on quality. China is now copying America in AI. Each stood on the one before, and each, on reaching the top, was tempted to call the next climber a thief.

Economists have a name for this: kicking away the ladder. You climb using every tool available (copying, borrowing, distilling), and the moment you're on top you develop a deep and sudden respect for intellectual property, then write the rules so the next country can't do what you just did. Britain tried it on America. America is now trying it on China, through export controls and accusations both. The argument always arrives dressed as principle. It is almost always about position.

Where the honest line actually is

I'm not pretending all copying is the same, and the serious version of this argument has to concede the difference. Learning from public work is one thing; deliberately breaking an agreement you signed and using deception to extract outputs at scale is another. If DeepSeek's engineers violated OpenAI's terms of service to do this, that's a legitimate grievance about method, and contracts matter.

But look closely and that is the exact grievance the newspapers have against OpenAI: that it took what it wasn't authorized to take, at scale, and built a competitor on top of it. You don't get to call your own scraping "fair use" and the other side's distillation "theft" from the same set of facts. Either learning-from-the-work-of-others is a legitimate engine of progress, with limits we apply evenly to ourselves and our rivals, or it isn't.

There's a strategic point hiding under the moral one, too. Distillation can shorten the journey, but it can't replace the ecosystem that makes frontier AI: the compute, the chip supply chains, the data pipelines, the talent, the capital. And history is blunt about hoarding: every attempt to lock knowledge in, from Britain's machinery bans to today's chip controls, slowed diffusion a little and spurred the rival's home-grown innovation a lot. The country that wins the next decade won't be the one that litigated hardest. It'll be the one that out-built.

So, genius or theft?

Both, and neither, which is to say the question is the wrong one. It pretends to be about ethics when it's really about power, and about who currently benefits from drawing the line where they've drawn it.

So I'll leave you with this. The next time you hear that a rival "stole" its way to the frontier, ask the older question first: how did the accuser get there? Because almost every great power on that ladder was once the thief in someone else's story.

Friday, July 3, 2026

I Sat In on a Webinar About Teaching AI to Spot Cancer. Here's What a Non-Medical Practitioner Actually Understood

Last week I joined a webinar on fine-tuning a vision foundation model to detect cancer in pathology slides. I'm not a pathologist. I can't read a slide, and a good chunk of the biology went over my head. But the machine-learning shape of the problem is something any ML researcher can follow, and by the end I could rebuild the pipeline on paper. This is that explanation, written for people like me who work in AI but not in medicine.

First, what a pathology slide even is

When a doctor removes a piece of tissue, a lab stains it (usually with H&E, which turns cell nuclei purple and other structures pink) and mounts it on glass. A pathologist looks at it under a microscope to judge whether the cells look cancerous.

To bring AI in, the slide gets scanned into a digital file called a whole-slide image, or WSI. The first surprise: these files are enormous. A single scanned slide can run to 100,000 by 100,000 pixels. That is ten billion pixels, a ten-gigapixel image, and roughly two hundred thousand times the pixel count of the 224 by 224 input a typical vision model takes. You cannot feed a whole slide into a network the way you feed it a photo of a cat.

Figure 1 — A gigapixel slide is cut into thousands of small tiles (patches) before any model sees it.

Figure 2 — An H&E-stained whole-slide image. Illustrative; source: Automated Tumour Detection in Whole Slide Images: An End-to-End Deep Learning Pipeline (https://balintstewart77.github.io/camelyon16-pathology/)

The gigapixel problem, and the weak-label twist

The workaround is tiling. You chop the giant slide into thousands of small patches, often 256 by 256 pixels, and treat each patch as an image the model can handle. One slide becomes ten thousand little pictures. Before that, teams run a quick tissue-detection step to throw away the blank glass, and often a stain-normalization step (methods with names like Macenko and Vahadane) so slides from different labs don't look wildly different in color.
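The tiling step is simple enough to sketch in a few lines. This is my own illustration, not the webinar's code: it only computes tile coordinates, drops partial tiles at the edges, and uses a made-up is_tissue rule where a real pipeline would run an actual tissue detector on the pixels.

```python
def tile_origins(width, height, tile=256):
    """Yield the top-left (x, y) of every full tile; partial edge tiles are dropped."""
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield x, y


def keep_tissue(origins, is_tissue):
    """Filter tile origins with a caller-supplied tissue test (hypothetical here)."""
    return [origin for origin in origins if is_tissue(origin)]


# Small example: a 1024 x 768 region gives a 4 x 3 grid of 256-pixel tiles.
small = list(tile_origins(1024, 768))
print(len(small), small[:4])


# Toy stand-in for tissue detection: pretend tissue sits in the left half.
def left_half_is_tissue(origin):
    x, _ = origin
    return x < 512


print(len(keep_tissue(small, left_half_is_tissue)))

# Raw grid size for a full 100,000 x 100,000 slide at full resolution.
print(sum(1 for _ in tile_origins(100_000, 100_000)))
```

The last line prints 152,100, the raw grid for a slide of that size before any filtering, which gives a sense of why discarding blank glass early matters.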

Tiling solves the size problem and creates a new one. You now have ten thousand patches per slide, and for most of them, nobody has told you which contain the cancer. The label you actually hold sits at the level of the whole slide: this patient has cancer, this one doesn't. The needle is somewhere in the haystack, and you were handed only the fact that a needle exists. In ML terms this is weak supervision, and it drives the whole design.

What the foundation model brings

Here is where the vision foundation model, or VFM, comes in, and where the webinar clicked for me.

A pathology VFM is a large vision transformer already trained on an immense pile of unlabeled patches. Virchow, one of the well-known ones, is a 632-million-parameter model trained on roughly 1.5 million whole-slide images with self-supervised learning (the DINOv2 approach from general computer vision). UNI, Prov-GigaPath, and CHIEF are other examples. Self-supervised means it learned the visual structure of tissue with nobody labeling cancer versus benign, the same way a language model learns from raw text.

The payoff: this model already knows what tissue looks like. Hand it a patch and it returns a compact numerical fingerprint, an embedding, that captures the meaningful content. You didn't teach it cells, staining, or texture from scratch. Someone spent enormous compute doing that once and released the weights.

Fine-tuning: you train very little

This reframes the task, and it's the part most relevant to non-medical ML people. You are not building a cancer detector from zero. You are adapting a model that already sees tissue clearly.

The webinar laid out three levels of effort:

The lightest and most common approach freezes the foundation model completely. You run every patch through it once, collect the embeddings, and discard the pixels. Then you train a small aggregator that takes the bag of patch embeddings from one slide and produces a single slide-level prediction. Because you only have slide-level labels, this aggregator is a multiple-instance-learning head with attention: it learns which patches deserve attention and downweights the rest. The attention scores come free, and they show you where on the slide the model is looking.

A middle option adds linear probing or small adapters on top, still keeping the backbone mostly frozen.

The heaviest option fine-tunes the foundation model's own weights, usually with a parameter-efficient method like LoRA so you aren't updating all 632 million of them. It costs the most compute and needs the most labeled data. The presenters' honest take: most teams don't need it. The frozen-encoder-plus-attention-head route gets you far, and full fine-tuning mainly pays off when you have a lot of clean, task-specific data.
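A quick back-of-the-envelope calculation shows why LoRA is so much cheaper than updating a weight matrix directly. LoRA leaves a d_out-by-d_in matrix frozen and learns a low-rank update made of two thin matrices, which adds rank * (d_out + d_in) trainable weights. The 1280-wide square matrix and the rank of 8 below are illustrative assumptions, not figures from the webinar.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (weights in the full matrix, trainable weights in a LoRA update)."""
    full = d_out * d_in
    # LoRA learns B (d_out x rank) and A (rank x d_in); the original matrix stays frozen
    lora = rank * (d_out + d_in)
    return full, lora


# Assumed sizes for illustration only
full, lora = lora_param_counts(d_out=1280, d_in=1280, rank=8)
print(f"full matrix: {full:,} weights")
print(f"LoRA update: {lora:,} weights ({lora / full:.2%} of the full matrix)")
```

The saving grows with the width of the layer, because the full matrix scales with the product of its dimensions while the LoRA update scales with their sum.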

Figure 3 — The common pipeline: the frozen VFM (blue) turns each patch into an embedding; a small attention-based aggregator (green, the only part you train) combines them into one slide-level call.
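The trainable green box can be sketched as a forward pass in plain Python. This is one common formulation of attention pooling for multiple-instance learning (score each patch with w . tanh(V h), softmax the scores, take the weighted average), and I am assuming it is close to what the presenters meant. The weights here are hand-set toy values; in practice `V`, `w`, `clf_w`, and `clf_b` are learned from slide-level labels in a deep learning framework.

```python
import math


def attention_mil_forward(bag, V, w, clf_w, clf_b):
    """Attention pooling over one slide's patch embeddings.

    bag:   list of patch embeddings, each a list of D floats
    V:     H x D projection, as a list of H rows
    w:     length-H vector that turns a hidden vector into one score
    clf_w: length-D weights of a linear slide-level classifier
    clf_b: classifier bias
    Returns (slide-level probability, per-patch attention weights).
    """
    # One unnormalised attention score per patch: w . tanh(V h)
    scores = []
    for h in bag:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * zj for wj, zj in zip(w, hidden)))

    # Softmax across patches (subtract the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Slide embedding is the attention-weighted average of patch embeddings
    dim = len(bag[0])
    slide = [sum(a * h[d] for a, h in zip(attn, bag)) for d in range(dim)]

    logit = sum(cw * s for cw, s in zip(clf_w, slide)) + clf_b
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, attn


# Toy bag: three 2-dimensional "embeddings"; the third one stands out
bag = [[0.0, 1.0], [0.1, 0.9], [3.0, 0.2]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [4.0, 0.0]
prob, attn = attention_mil_forward(bag, V, w, clf_w=[1.0, -1.0], clf_b=0.0)
print("slide probability:", round(prob, 3))
print("attention per patch:", [round(a, 3) for a in attn])
```

The `attn` list is what gets painted back onto the slide as a heatmap: one weight per patch, summing to one.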

Where to get data without a hospital

One relief for outsiders: you don't need a hospital to start. Several large pathology datasets are public and openly licensed. Camelyon16/17 covers breast-cancer lymph-node slides, PANDA covers prostate, and TCGA spans many cancer types. They come with slide-level labels, which is exactly what the weak-supervision pipeline expects. OpenSlide is the standard library for reading these gigantic files.

The part they spent the most time on: not the model, the validation

This surprised me, and it's the most transferable lesson. The presenters spent less time on architecture than on how you check the result, because this is where pathology models quietly fail.

The headline numbers look great. Virchow reported an AUC around 0.949 for detecting cancer across seventeen cancer types, and held up on rarer ones. AUC measures how well the model separates positive from negative cases, where 1.0 is perfect and 0.5 is a coin flip. Read it as a probability: pick one cancer slide and one healthy slide at random, and a model with an AUC of 0.949 scores the cancer slide higher about 95 times out of 100.
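AUC has a direct pairwise reading: compare every positive slide with every negative slide and count how often the positive one gets the higher score. The sketch below computes it exactly that way. It is quadratic in the number of slides, so it is for illustration; the scores and labels are made-up toy values.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. labels are 1 for positive, 0 for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one positive (0.4) is ranked below one negative (0.6)
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```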

Then came the warnings. Split your data by patient and by hospital, never by random tiles, or patches from the same slide leak between train and test and your score becomes fiction. Validate on slides from a hospital the model never trained on, because a different scanner and a different lab's staining can look foreign enough to fool it. Report more than AUC alone: sensitivity, specificity, and especially the false-negative rate at a high-sensitivity operating point, because in cancer, missing a positive is far worse than flagging an extra slide for review. And don't over-trust the pretty attention heatmap; a pathologist has to confirm the highlighted regions are biologically sensible, because a model can land on an artifact or a smudge and still score well.
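Two of these warnings are easy to turn into code. The sketch below splits slides by a grouping key, so no patient (or hospital) appears on both sides, and reports sensitivity and specificity at a chosen threshold. The record schema with `patient_id` and `hospital_id` keys is a hypothetical one I made up for illustration, as are the function names.

```python
import random


def split_by_group(slides, key="patient_id", test_fraction=0.2, seed=0):
    """Split slide records so every group lands entirely in train or in test.

    slides: list of dicts; each must contain `key` (hypothetical schema).
    """
    groups = sorted({s[key] for s in slides})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in slides if s[key] not in test_groups]
    test = [s for s in slides if s[key] in test_groups]
    return train, test


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative")
    return tp / (tp + fn), tn / (tn + fp)


# Ten toy slides from five patients, two slides each
slides = [
    {"slide_id": i, "patient_id": f"P{i // 2}", "hospital_id": f"H{i % 2}"}
    for i in range(10)
]
train, test = split_by_group(slides, key="patient_id")
print(len(train), "train slides,", len(test), "test slides")

sens, spec = sensitivity_specificity([0.9, 0.6, 0.4, 0.2], [1, 1, 0, 0], 0.7)
print("sensitivity:", sens, "specificity:", spec, "false-negative rate:", 1 - sens)
```

Passing `key="hospital_id"` gives the external-validation version of the same split. The false-negative rate is one minus sensitivity, so lowering the threshold trades extra slides for review against fewer missed cancers.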

What I took away as a non-medical practitioner

Strip out the biology and the pipeline is familiar. A giant image gets tiled into patches. A pretrained foundation model turns each patch into an embedding. A small model learns from weak, slide-level labels to combine those embeddings into a diagnosis, and its attention doubles as an explanation a doctor can inspect. The hard part isn't the model; it's proving the model works on slides it has never seen.

The recipe travels well past cancer. Any domain with enormous images and scarce, coarse labels (satellite imagery, industrial inspection, materials science) can borrow it directly. The foundation model does the seeing. You do the smaller, more careful work of teaching it what to decide, and the even more careful work of checking that it decided for the right reasons.

One line the webinar kept returning to, which I'll pass on: none of this replaces the pathologist. The realistic goal is a second reader that flags suspicious slides and points to where it's worried, so a human spends attention where it counts. If you build in this space, that framing matters as much as the AUC.

I walked in unable to read a slide, and I still can't. But I walked out able to build the pipeline and, more usefully, able to tell a good result from a fragile one.