Can synthetic brain MRI from latent diffusion models train diagnostic classifiers?

By Soheil Fallah · Data Scientist & AI Consultant · peer-reviewed researcher in generative AI

A classifier trained only on synthetic brain MRI reached an AUC of 0.754 on real test scans. The same architecture trained on real data reached 0.810. So the generated scans carried genuine diagnostic signal, though they did not fully close the gap to real data. Whether synthetic scans are useful for training is a separate question from whether patient privacy was preserved, and both need to be reported.

Why train on brains that never existed?

Medical imaging datasets are small and hard to move. Hospitals hold scans under strict privacy rules, and building a large labelled cohort is slow. One way around this is to train a generative model on the scans you already have, then use it to produce new labelled scans you can train on or share. The hard question is whether those scans are useful. A generated scan can look convincing and still teach a classifier almost nothing. My dissertation set out to measure usefulness directly, not realism.

What I did

I trained a latent diffusion model on [ADNI brain MRI: specify cohort and modality] and used it to generate a labelled training set for [your task, e.g. separating Alzheimer's from cognitively normal subjects]. I then trained a classifier on the synthetic scans alone and tested it on a held-out set of real scans. This is the TSTR protocol: train on synthetic, test on real. As a reference I trained the same classifier on real scans and tested it on the same real set, which gives the TRTR baseline (train real, test real).
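The protocol itself is easy to sketch. The block below is a minimal illustration, not the code used in this work. It assumes scans have already been reduced to fixed-length feature vectors, uses a toy nearest-centroid model (`CentroidClassifier`, a hypothetical stand-in for the real classifier), and runs on random toy data. What it shows is the shape of the comparison: the same model class is trained twice, once on synthetic and once on real data, and both are scored on one shared real test set.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need both classes to compute AUC")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


class CentroidClassifier:
    """Toy stand-in for the real classifier (assumption, for illustration only)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.mu0 = X[y == 0].mean(axis=0)
        self.mu1 = X[y == 1].mean(axis=0)
        return self

    def score(self, X):
        X = np.asarray(X, dtype=float)
        # Higher score means closer to the positive-class centroid.
        d0 = np.linalg.norm(X - self.mu0, axis=1)
        d1 = np.linalg.norm(X - self.mu1, axis=1)
        return d0 - d1


def tstr_vs_trtr(real_train, synth_train, real_test, make_model=CentroidClassifier):
    """Train the same model class on each source, test both on the same real set."""
    X_test, y_test = real_test
    results = {}
    for name, (X, y) in {"TSTR": synth_train, "TRTR": real_train}.items():
        model = make_model().fit(X, y)
        results[name] = roc_auc(y_test, model.score(X_test))
    results["gap"] = results["TRTR"] - results["TSTR"]
    return results


def toy_split(rng, n, shift, noise=1.0):
    """Random toy features; `shift` controls how separable the two classes are."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(0.0, noise, size=(n, 8)) + shift * y[:, None]
    return X, y


rng = np.random.default_rng(0)
real_train = toy_split(rng, 200, shift=1.0)
synth_train = toy_split(rng, 200, shift=0.6, noise=1.3)  # a deliberately weaker "generator"
real_test = toy_split(rng, 200, shift=1.0)
results = tstr_vs_trtr(real_train, synth_train, real_test)
print(results)
```

The one design point that matters is that the real test set is shared and never touches either training run or the generator. If the generator had seen the test scans, the TSTR number would be inflated.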

The result

The synthetic-trained classifier reached an AUC of 0.754 on real scans. The real-trained baseline reached 0.810. That 0.056 difference is the cost of routing training through the synthetic pipeline. A model that never saw a real scan during training still classified real scans well above chance (an AUC of 0.5), and not far below a model trained on real data.
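The arithmetic behind the gap can be written out in a few lines. The absolute and relative gaps follow directly from the two reported AUCs. The "retained above chance" figure is one additional way to read the same numbers, an assumption of this sketch and not a metric reported in the dissertation: it measures both AUCs from the chance level of 0.5.

```python
auc_tstr = 0.754  # trained on synthetic, tested on real
auc_trtr = 0.810  # trained on real, tested on real
chance = 0.5      # AUC of a classifier that ranks at random

abs_gap = auc_trtr - auc_tstr                       # absolute AUC difference
rel_gap = abs_gap / auc_trtr                        # gap as a share of the baseline AUC
retained = (auc_tstr - chance) / (auc_trtr - chance)  # share of above-chance signal kept

print(f"absolute gap: {abs_gap:.3f}")          # absolute gap: 0.056
print(f"relative gap: {rel_gap:.1%}")          # relative gap: 6.9%
print(f"retained above chance: {retained:.1%}")  # retained above chance: 81.9%
```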

What the result does not mean

It does not mean synthetic beat real. It did not, and the gap is worth stating plainly. It also says nothing about privacy. TSTR measures task usefulness only. A generator that quietly memorised and replayed real patients would post a high TSTR score while leaking the very data you were trying to protect, so a separate privacy check [name your method] belongs next to the utility numbers. The figures here are also tied to one dataset, one task, and one resolution. I would not assume they transfer to other modalities without testing.
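To make the memorisation concern concrete, here is a sketch of one common style of check, a nearest-neighbour distance test. It is shown purely as an illustration and is not the privacy method used in this work, which is left open above. It assumes scans are represented as feature vectors (flattened or embedded), and the function names and the 1% quantile are hypothetical choices. The idea: a synthetic sample that sits closer to a training scan than real held-out scans almost ever do is a candidate copy.

```python
import numpy as np


def nearest_distances(queries, reference):
    """Euclidean distance from each query row to its closest reference row."""
    q = np.asarray(queries, dtype=float)
    r = np.asarray(reference, dtype=float)
    d = np.linalg.norm(q[:, None, :] - r[None, :, :], axis=2)
    return d.min(axis=1)


def memorisation_flags(synthetic, real_train, real_holdout, quantile=0.01):
    """Flag synthetic rows that are suspiciously close to a training row.

    The threshold is a low quantile of how close real held-out rows get to
    the training set, so it reflects ordinary similarity between patients.
    """
    baseline = nearest_distances(real_holdout, real_train)
    threshold = float(np.quantile(baseline, quantile))
    synth_d = nearest_distances(synthetic, real_train)
    return synth_d < threshold, threshold


# Toy demonstration on random vectors (not MRI data).
rng = np.random.default_rng(0)
real_train = rng.normal(size=(100, 16))
real_holdout = rng.normal(size=(50, 16))
fresh = rng.normal(size=(20, 16))                               # genuinely new samples
copies = real_train[:5] + rng.normal(scale=1e-3, size=(5, 16))  # near-replays of training rows
synthetic = np.vstack([fresh, copies])

flags, threshold = memorisation_flags(synthetic, real_train, real_holdout)
print("flagged indices:", np.flatnonzero(flags))
```

A check like this catches only near-verbatim replay in the chosen feature space. Passing it is not a privacy guarantee, which is why the result should be reported as one measure alongside the utility numbers and not as proof of safety.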

Where this leaves things

On this evidence, synthetic brain MRI from a latent diffusion model carries enough diagnostic signal to be worth using to augment a small cohort today. It is not yet a stand-in for real data. If you report TSTR, put the TRTR baseline beside it so the gap stays visible, and report a privacy measure so usefulness is not mistaken for safety.