NeuroPictor

Abstract

Recent fMRI-to-image approaches have mainly focused on associating fMRI signals with specific conditions of pre-trained diffusion models. These approaches, while producing high-quality images, capture only a limited aspect of the complex information in fMRI signals and offer little detailed control over image creation. In contrast, this paper proposes to directly modulate the generation process of diffusion models using fMRI signals. Our approach, NeuroPictor, divides the fMRI-to-image process into three steps: i) fMRI calibrated-encoding, which pre-trains on multiple individuals to build a shared latent space that minimizes individual differences and enables the subsequent cross-subject training; ii) fMRI-to-image cross-subject pre-training, which perceptually learns to guide the diffusion model with high- and low-level conditions across different individuals; iii) fMRI-to-image single-subject refining, which is similar to step ii but focuses on adapting to a particular individual. NeuroPictor extracts high-level semantic features from fMRI signals that characterize the visual stimulus and incrementally fine-tunes the diffusion model with a low-level manipulation network to provide precise structural instructions. Trained on over 60,000 fMRI-image pairs from various individuals, our model achieves superior fMRI-to-image decoding capacity, particularly in the within-subject setting, as evidenced on benchmark datasets.

Method

Our NeuroPictor framework is trained in three steps for fMRI-to-image decoding: i) the fMRI calibrated-encoding stage, which establishes a universal latent fMRI space across multiple individuals; ii) the fMRI-to-image cross-subject pre-training stage, which achieves multi-level modulation through perceptual learning; iii) the fMRI-to-image single-subject refining stage, which uses the same strategy as step ii but focuses on refinement for a particular subject. The high-level guiding network and the low-level manipulation network learn high-level semantic conditions and low-level structural conditions, respectively.
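The three-step schedule can be summarized as a small data structure. The following is a minimal sketch only, not the authors' training code: the `Stage` class, the `build_schedule` helper, and the subject IDs passed to it are hypothetical names and placeholder values chosen for illustration. It records only what the description above states, namely that the first two stages draw on multiple individuals while the last stage uses a single subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical container, for illustration only)."""
    name: str
    subjects: tuple  # subject IDs whose fMRI data the stage trains on
    goal: str        # what the stage is meant to achieve, per the text


def build_schedule(all_subjects, target_subject):
    """Return the three stages in training order.

    Stages i and ii use every available subject; stage iii uses only the
    subject being refined. Everything else about the real training recipe
    is left out of this sketch.
    """
    everyone = tuple(all_subjects)
    return [
        Stage("fMRI calibrated-encoding", everyone,
              "universal latent fMRI space"),
        Stage("fMRI-to-image cross-subject pre-training", everyone,
              "multi-level modulation"),
        Stage("fMRI-to-image single-subject refining", (target_subject,),
              "refinement for one subject"),
    ]


# Placeholder subject IDs; the actual pre-training pool is not specified here.
schedule = build_schedule([1, 2, 5, 7], target_subject=1)
for stage in schedule:
    print(f"{stage.name}: subjects={stage.subjects} -> {stage.goal}")
```

Keeping the subject set as an explicit field makes the only structural difference between steps ii and iii visible: the strategy is the same, and only the data scope narrows.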

Main Results

Qualitative comparison of our NeuroPictor and previous state-of-the-art methods.

Mismatched high- and low-level fMRI sources

Visualization of mismatched high-level and low-level features from different fMRI sources.
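The mismatch experiment can be pictured as taking the high-level condition from one fMRI recording and the low-level condition from another. The sketch below is a toy illustration under stated assumptions: `Conditions`, `extract_conditions`, and `mix` are hypothetical names, and the feature extraction is a deliberately trivial stand-in (a mean and the deviations from it), not the actual guiding or manipulation networks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    """Pair of conditions derived from one fMRI sample (toy values)."""
    high: tuple  # stand-in for high-level semantic features
    low: tuple   # stand-in for low-level structural features


def extract_conditions(fmri):
    """Toy stand-in for the two networks; NOT the real feature extractors."""
    mean = sum(fmri) / len(fmri)
    high = (mean,)                        # one coarse summary value
    low = tuple(v - mean for v in fmri)   # per-element detail around the mean
    return Conditions(high=high, low=low)


def mix(semantic_source, structure_source):
    """Combine semantics from one sample with structure from another."""
    return Conditions(high=semantic_source.high, low=structure_source.low)


cond_a = extract_conditions([1.0, 2.0, 3.0])
cond_b = extract_conditions([4.0, 4.0, 10.0])

# Semantics come from sample A, structure from sample B.
mixed = mix(cond_a, cond_b)
print(mixed)
```

Because the two conditions are carried as separate fields, swapping one source leaves the other untouched, which is the property the visualization relies on.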

Subject-Specific Visualizations

Visualizations for Subject-1.

Visualizations for Subject-2.

Visualizations for Subject-5.

Visualizations for Subject-7.

More Results

Interpolating the control scale between 0 and 1 transitions the reconstructed image from semantic consistency to fine-grained control.
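One common way to realize such a control scale is to multiply the low-level branch's contribution by the scale before adding it to the generator's features. The sketch below assumes that form; it is not confirmed by the page and is not the authors' implementation. The function `apply_control` and all numeric values are hypothetical.

```python
def apply_control(base_features, low_level_residual, control_scale):
    """Blend a low-level residual into base features (assumed formulation).

    control_scale = 0 leaves the base features unchanged, so only the
    high-level guidance would act; control_scale = 1 adds the full residual.
    """
    if not 0.0 <= control_scale <= 1.0:
        raise ValueError("control_scale must lie in [0, 1]")
    if len(base_features) != len(low_level_residual):
        raise ValueError("feature and residual lengths must match")
    return [b + control_scale * r
            for b, r in zip(base_features, low_level_residual)]


base = [0.5, -1.0, 2.0]       # toy stand-in for diffusion features
residual = [1.0, 2.0, -4.0]   # toy stand-in for the low-level branch output

for scale in (0.0, 0.5, 1.0):
    print(scale, apply_control(base, residual, scale))
```

Under this assumption, intermediate scales give a linear blend between the two extremes, which matches the gradual transition described in the caption.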

Visualizations for the ablation study.

Visualizations of failure cases.

BibTeX

@misc{huo2024neuropictor,
  title={NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation},
  author={Jingyang Huo and Yikai Wang and Xuelin Qian and Yun Wang and Chong Li and Jianfeng Feng and Yanwei Fu},
  year={2024},
  eprint={2403.18211},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

NeuroPictor can swap high-level fMRI features to manipulate image semantics while maintaining structural consistency.