Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream

Umut Güçlü et al. J Neurosci. 2015 Jul 8;35(27):10005-14. doi: 10.1523/JNEUROSCI.5023-14.2015.

Abstract

Converging evidence suggests that the primate ventral visual pathway encodes increasingly complex stimulus features in downstream areas. We quantitatively show that there indeed exists an explicit gradient for feature complexity in the ventral pathway of the human brain. This was achieved by mapping thousands of stimulus features of increasing complexity across the cortical sheet using a deep neural network. Our approach also revealed a fine-grained functional specialization of downstream areas of the ventral stream. Furthermore, it allowed decoding of representations from human brain activity at an unsurpassed degree of accuracy, confirming the quality of the developed approach. Stimulus features that successfully explained neural responses indicate that population receptive fields were explicitly tuned for object categorization. This provides strong support for the hypothesis that object categorization is a guiding principle in the functional organization of the primate ventral stream.

Keywords: deep learning; functional magnetic resonance imaging; neural coding.


Figures

Figure 1.
DNN-based encoding framework. A, Schematic of the encoding model that transforms a visual stimulus to a voxel response in two stages. First, a deep (convolutional) neural network transforms the visual stimulus (x) to multiple layers of feature representations. Then, a linear mapping transforms a layer of feature representations to a voxel response (y). B, Schematic of the deep neural network where each layer of artificial neurons uses one or more of the following (non)linear transformations: convolution, rectification, local response normalization, max pooling, inner product, and softmax. C, Reconstruction of an example image from the activities in the first five layers.
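
A minimal Python sketch of the two-stage model in A, under stated assumptions: layer_features holds one DNN layer's activations per stimulus and voxel_response the measured fMRI response of one voxel; the ridge penalty and all names are illustrative, not the authors' implementation.

    # Stage 1 (assumed done elsewhere): extract DNN layer activations per stimulus.
    # Stage 2: fit the linear mapping from layer features to one voxel's response.
    import numpy as np
    from sklearn.linear_model import Ridge

    def fit_voxel_model(layer_features, voxel_response, alpha=1.0):
        # layer_features: (n_stimuli, n_features); voxel_response: (n_stimuli,)
        model = Ridge(alpha=alpha)  # regularized linear map y = w.x + b
        model.fit(layer_features, voxel_response)
        return model

    # Predicted responses to held-out stimuli:
    # y_hat = fit_voxel_model(X_train, y_train).predict(X_test)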
Figure 2.
The DNN model accurately predicts voxel responses across the occipital cortex. A, Prediction accuracies of the significant voxels across the occipital cortex (p < 2e-6 for both subjects, Bonferroni corrected for number of voxels, Student's t test across cross-validated training images within subjects). B, Prediction accuracies of the significant voxels across V1, V2, V4, and LO (p < 5e-8 for both subjects, Bonferroni corrected for number of layers and voxels, Student's t test across cross-validated training images within subjects). C, SNRs of the voxels across the occipital cortex.
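
A hedged sketch of the per-voxel test described above: fold-wise prediction accuracies (assumed here to be correlations between predicted and observed responses) are tested against zero with a one-sample Student's t test and Bonferroni-corrected over voxels. The fold structure is an assumption.

    import numpy as np
    from scipy import stats

    def voxel_significance(fold_accuracies, n_voxels):
        # fold_accuracies: (n_folds,) prediction accuracy per cross-validation fold
        t, p = stats.ttest_1samp(fold_accuracies, popmean=0.0)
        return t, min(p * n_voxels, 1.0)  # Bonferroni correction over voxels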
Figure 3.
Properties of the voxel groups systematically change as a function of layer assignment. A, Significant linear partial correlations between the predicted responses of each pair of voxel groups. Line widths are proportional to mean partial correlation coefficients across subjects. B, Distribution of the receptive field centers for both subjects. C, Example reconstructions of the internal representations of the convolutional layers. Reconstructions are enlarged, and automatic tone, contrast, and color enhancement are applied for visualization purposes. D, Proportions of the internal representations of the convolutional layers that are assigned to low-level (blob, contrast, and edge), mid-level (contour, shape, and texture), and high-level (irregular pattern, object part, and entire object) feature classes. E, Receptive field complexity (K), invariance, and size of the voxel groups.
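
The linear partial correlation in A can be sketched as follows: correlate the predicted responses of two voxel groups after regressing out the predictions of the remaining groups. Array shapes and names are illustrative assumptions.

    import numpy as np

    def partial_correlation(x, y, confounds):
        # x, y: (n_stimuli,) predicted responses of two voxel groups
        # confounds: (n_stimuli, n_other_groups) predictions of the remaining groups
        Z = np.column_stack([confounds, np.ones(len(x))])  # include an intercept
        rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residualize x
        ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residualize y
        return np.corrcoef(rx, ry)[0, 1]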
Figure 4.
Layer assignments of the voxels systematically increase as a function of position on the occipital cortex. A, Layer assignments of the significant voxels across occipital cortex (p < 2e-6 for both subjects, Bonferroni corrected for number of voxels, Student's t test across cross-validated training images within subjects). B, Layer assignments of the significant voxels across V1, V2, V4, and LO (p < 5e-8 for both subjects, Bonferroni corrected for number of layers and voxels, Student's t test across cross-validated training images within subjects). C, Proportions of voxels in areas V1, V2, V4, and LO that are assigned to low-level (blob, contrast, and edge), mid-level (contour, shape, and texture), and high-level (irregular pattern, object part, and entire object) feature classes.
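
The layer-assignment rule implied here admits a short sketch: each voxel is assigned to the DNN layer whose encoding model predicts it best. The accuracy matrix is an assumed input, not the authors' exact procedure.

    import numpy as np

    def assign_layers(accuracy):
        # accuracy: (n_layers, n_voxels) cross-validated prediction accuracy
        # of each layer's encoding model for each voxel
        return np.argmax(accuracy, axis=0) + 1  # 1-indexed best layer per voxel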
Figure 5.
Voxels in different visual areas are differentially selective to feature maps in different layers. A, Selectivity of the significant voxels in the occipital cortex to three distinct feature maps of varying complexity (p < 2e-6 for both subjects, Bonferroni corrected for number of voxels, Student's t test across cross-validated training images within subjects). B, Biclusters of hyperaligned voxels and feature maps. Horizontal and vertical red lines delineate the boundaries of clusters of feature maps and voxels, respectively. The rows and columns are thresholded such that each row and column contain at least one element that survives the threshold of r² = 0.15. The numbers in parentheses denote the number of remaining feature maps and voxels after thresholding.
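
The thresholding step in B can be sketched as follows; the bicluster ordering itself is assumed to come from a separate clustering step not shown here.

    import numpy as np

    def threshold_biclusters(r2, thresh=0.15):
        # r2: (n_feature_maps, n_voxels) squared-correlation matrix
        keep_rows = (r2 >= thresh).any(axis=1)  # feature maps that survive
        keep_cols = (r2 >= thresh).any(axis=0)  # voxels that survive
        return r2[np.ix_(keep_rows, keep_cols)], keep_rows, keep_cols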
Figure 6.
Our model performs similarly to the control models that are task optimized but outperforms those that are not task optimized across V1, V2, V4, and LO voxels of both subjects. A, Comparison of the prediction accuracies for our model (r_0) with those for the pretrained DNN (r_P), random DNN (r_R), and GWP (r_GWP) models. Red dots denote the individual voxels. Asterisks indicate the visual areas where the prediction accuracies are significantly different. B, Comparison of the layer assignments for our model (DNN_0) with those of the pretrained DNN (DNN_P) and random DNN (DNN_R) models. Red dots denote the individual voxels. Crosses indicate the mean layer assignments of the DNN_0 model.
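
A sketch of the per-area model comparison in A: a paired test on voxelwise prediction accuracies of two models within one visual area. The choice of a Wilcoxon signed-rank test is an assumption; the caption does not name the test used.

    from scipy import stats

    def compare_models(acc_a, acc_b):
        # acc_a, acc_b: per-voxel prediction accuracies of two models,
        # computed over the same voxels within one visual area
        return stats.wilcoxon(acc_a, acc_b)  # paired, nonparametric comparison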

Comment in

  • Wang P, Malave V, Cipollini B. Encoding Voxels with Deep Learning. J Neurosci. 2015 Dec 2;35(48):15769-71. doi: 10.1523/JNEUROSCI.3454-15.2015. PMID: 26631460. Free PMC article. No abstract available.
