Abstract
Humans uniquely appreciate aesthetics, experiencing pleasurable responses to complex stimuli that confer no clear intrinsic value for survival. However, substantial variability exists in the frequency and specificity of aesthetic responses. While pleasure from aesthetics is attributed to the neural circuitry for reward, what accounts for individual differences in aesthetic reward sensitivity remains unclear. Using a combination of survey data, behavioral and psychophysiological measures and diffusion tensor imaging, we found that white matter connectivity between sensory processing areas in the superior temporal gyrus and emotional and social processing areas in the insula and medial prefrontal cortex explains individual differences in reward sensitivity to music. Our findings provide the first evidence for a neural basis of individual differences in sensory access to the reward system, and suggest that social–emotional communication through the auditory channel may offer an evolutionary basis for music making as an aesthetically rewarding function in humans.
Humans routinely experience pleasure in response to higher order stimuli that confer no clear evolutionary advantage. Aesthetic responses through pursuit of and engagement with the arts activate the same reward network in the brain that responds to the basic, sensory pleasures associated with food, sex and drugs via dopaminergic pathways (Blood and Zatorre, 2001; Salimpoor et al., 2013). The reward system is also active in prosocial functions including cooperation (Declerck et al., 2013) and self-disclosure (Tamir and Mitchell, 2012). Aesthetic judgment further shares its neural correlates in the reward system with moral decision-making (Avram et al., 2013). This overlap between aesthetic and social rewards offers the hypothesis that higher order pleasures could have originated because of the prosocial benefits associated with them. Others argue that higher order pleasures are an evolutionary exaptation, arising from stimulus-driven recruitment of brain circuitry involved in emotional reaction, response and appraisal (Huron, 2006). Currently, the neural link between sensory experiences and pleasurable aesthetic responses remains unclear. Music provides an ideal stimulus with which to study pleasure and reward, as it has been a fixture of every human civilization throughout history and is often reported as one of the most enjoyable of human experiences (Tramo, 2013). Individuals tend to report a complex array of bodily and mental sensations while listening to music, such as the feeling of a lump in the throat, feeling moved and the experience of chills: the tingling sensation on the scalp, back of the neck and spine that is often accompanied by goose bumps (Panksepp, 1995).
Previous research has linked the occurrence of such sensations to behavioral ratings of subjective pleasure (Grewe et al., 2005), changes in psychophysiological measures of heart rate and skin conductance (Steinbeis et al., 2006) and neural activity in emotion and reward processing regions of the brain, specifically the nucleus accumbens (NAcc), anterior insula (aIns) and medial prefrontal cortex (mPFC) (Blood and Zatorre, 2001; Zatorre, 2005; Salimpoor et al., 2013).
Although these emotion and reward systems are found in all humans, not everyone experiences intense emotional responses to music, and previous studies vary in the reported rates of these reactions (Grewe et al., 2005). Individual differences in musical ability, familiarity and engagement, as well as in openness to experience (one of the Big Five personality traits), were all correlated with, but not perfectly predictive of, the likelihood of a person reporting such reactions to aesthetic stimuli (Nusbaum and Silvia, 2010). Individual variations exist to the point that some individuals report being unable to experience pleasure from music despite normal responses to other rewards (e.g. monetary rewards) (Mas-Herrero et al., 2014). Pleasurable valuation of music is associated with increased functional connectivity in the brain between auditory cortices and mesolimbic reward circuitry (Blood and Zatorre, 2001; Salimpoor et al., 2013), yet why these circuits elicit intensely pleasurable responses in some individuals and not others is still unknown. Understanding the neural basis of individual differences between emotional and non-emotional responders could help define the neural pathways by which sensory stimuli gain value and become rewarding.
Based on these previous findings, we predict that structural connectivity between auditory- and reward-processing regions gives rise to aesthetic responses to music. Diffusion tensor imaging (DTI) is effective at detecting differences in structural connectivity, such as those associated with psychiatric and neurological disorders (Barnea-Goraly et al., 2004; Park et al., 2004; Johnstone and Reekum, 2007), individual differences in personality (Cohen et al., 2009; Parkinson and Wheatley, 2014), sensorimotor skill acquisition (Scholz et al., 2009) and musical ability (Loui et al., 2009). By using DTI to compare aesthetically responsive and unresponsive individuals while controlling for other potential confounding factors, we hope to elucidate the neurobiological basis of individual differences in aesthetic responses to music.
To identify these individual differences in responsiveness to music, a large-scale screening was conducted that assessed individuals’ emotional responses (including chills) to music, measures of personality and background in and engagement with music. Using the results from the survey, we identified 10 participants who reported perceiving chills to music consistently (chill group) and 10 participants who reported perceiving chills to music rarely or never (no-chill group). Importantly, we matched the two groups for gender, age, personality factors and musical training. Behavioral ratings (continuous and discrete subjective ratings of pleasure and experience of chills) and psychophysiological measures (heart rate, skin conductance) were recorded while these 20 participants listened to their self-reported favorite pieces of music as well as neutral control pieces. We then used DTI and probabilistic tractography to test for differences in indicators of structural connectivity between the chill responders and the no-chill responders. We hypothesized that people who consistently respond emotionally to aesthetic musical stimuli possess stronger white matter connectivity between auditory association regions in the posterior superior temporal gyrus (pSTG) and emotion and reward processing regions in the aIns and the mPFC.
Materials and methods
A total of 237 people completed an online survey, sent to various community and university email lists throughout the Boston area, that assessed individuals’ background in and engagement with music. To obtain a simple measure of personality, the Ten-Item Personality Inventory (TIPI) was used (Gosling et al., 2003). Also included in the questionnaire was the Short Test of Musical Preferences (STOMP) (Rentfrow and Gosling, 2003). The prevalence of intense emotional responses to music was assessed based on the answers to the Aesthetic Experience Scale in Music (AES-M), which consists of 15 questions derived from the Aesthetic Experience Scale (Sloboda, 1991; Silvia and Nusbaum, 2011). The list of questions contained in the AES-M can be found in Appendix A in the Supplementary materials available online.
The 15 items of the AES-M were subjected to a principal components analysis (PCA) using SPSS Version 17 (PASW Statistics 17). Multidimensional scaling (MDS) was also conducted in SPSS to determine the respective relationship between items in the AES-M and STOMP. Pairwise correlations were then computed between the extracted component scores from the AES-M components, the extracted component scores from the musical genres components, as well as basic demographic information collected from the survey on years of musical training and personality measures (refer to Supplementary Materials for the results of the MDS with STOMP and pairwise correlations with personality measures).
From the survey, 20 participants (mean age = 21.6 years, SD = 3.28 years, range = 18–34; 8 males, 12 females; 18 right-handed and 2 left-handed, with 1 left-handed person in each group) were selected for the behavioral and scanning portion of the study. This sample size was chosen because previous DTI studies had shown between-group differences in structural connectivity with a sample size of 20, with 10 in each group (Loui et al., 2009). Ten of the 20 participants (the ‘chill group’) reported, on average, a six or higher on a seven-point scale on all items in the AES-M and had component scores that loaded positively on the first component of the PCA, which will subsequently be referred to as the ‘chill factor’. An additional five participants were originally selected to be part of the chill group based on their responses to the survey, but were excluded from the analysis when they failed to report experiencing chills during behavioral testing. The other 10 participants (the ‘no-chill group’) reported, on average, a score of two or lower on the AES-M and had factor scores that loaded negatively on the first component, the chill factor, of the PCA. The two groups were matched on years and age of onset of musical training, IQ and personality traits (see Table 1). All participants were healthy with no hearing impairments and no neurological or psychiatric disorders. Informed consent was obtained as approved by the Institutional Review Board of Beth Israel Deaconess Medical Center. IQ scores from all participants were collected using the Shipley verbal and abstract scaled composite scores (Shipley, 1940) and all participants were within the normal range.
|Measure||Survey data (n = 237), mean (SD)||Chill group (n = 10), mean (SD)||No-chill group (n = 10), mean (SD)||P|
|Age||24.95 (10.01)||22.8 (1.26)||20.4 (4.2)||0.103|
|Onset of musical training||7.57 (3.09)||8.37 (3.58)||7.5 (2.27)||>0.250|
|Years of formal musical training||7.76 (5.29)||6.55 (5.02)||6.55 (5.02)||>0.250|
|Shipley abstract score||—||17.6 (2.5)||17.1 (1.6)||>0.250|
|Shipley verbal score||—||33.3 (6.4)||33.78 (2.6)||>0.250|
|Shipley total score||—||120 (5.3)||118.1 (4.8)||>0.250|
|Openness to experience score (a)||5.65 (0.99)||5.3 (1.47)||5.15 (1.42)||>0.250|
|Aesthetic experience score (b)||4.16 (1.11)||5.37 (0.64)||2.64 (0.57)||<0.001|
|Frequency of chills (c)||4.19 (1.60)||6.1 (0.48)||1.7 (0.82)||<0.001|
|Chill factor (d)||—||1.1 (0.52)||−1.35 (0.52)||<0.001|
(a) TIPI: A qualitative index of the Big Five personality measures. Each item is scored out of 7 for a max of 7.
(b) Aesthetic Experience Questionnaire: Average of all 15 items on the Aesthetic Experience Questionnaire. Each question asks frequency of experiencing a certain emotion, 1 corresponding to rarely if ever, and 7 corresponding to always.
(c) Aesthetic Experience Questionnaire: One item asking ‘How frequently do you experience chills to music?’ with 1 corresponding to rarely if ever, and 7 corresponding to always.
(d) Component scores obtained from principal components analysis on survey results.
Prior to testing, each participant submitted 3–5 pieces of music. For the chill group, these were pieces that reliably induced chills. For the no-chill group, these were pieces they found most pleasurable. Using Audacity 1.2.5, pieces were edited to 2 min, based on the self-reported most pleasurable moments. See Supplementary Table S1 for a full list of stimuli used.
Behavioral and psychophysiological testing
During the behavioral paradigm, each participant listened to six excerpts: three favorite pieces and three controls. The control excerpts were familiar, yet did not elicit a pleasurable response, and were selected from the list of pieces provided by other participants. While listening to each excerpt, participants rated their emotional responses using a slider ranging from 0 to 10 (0 = neutral/no pleasure, 10 = high pleasure). If a chill occurred, participants were instructed to press and hold the space bar on the keyboard for the duration of the chill. Previous literature has shown that the button press alone does not elicit significant physiological responses (Guhn et al., 2007) and thus cannot account for the skin conductance response (SCR) increases.
Psychophysiological data were collected using the Biopac MP150 System for Mac MP150WS (Biopac Systems, Inc.) with a sampling rate of 500 Hz. The cardiac and electrodermal signals were recorded in AcqKnowledge Software for Mac ACK100M (Biopac). Before the trials began, baseline physiological data were collected over a 5-min period.
SCR and heart rate (HR) data were extracted and analyzed using in-house software in MATLAB. Raw HR data were converted to interbeat intervals (IBI), which are inversely related to heart rate (HR = 60000/IBI, with HR in beats per minute and IBI in milliseconds). IBI is reported here instead of heart rate because it has been shown to have a linear relationship with autonomic nervous system stimulation and therefore more accurately serves as a marker for arousal (Stern et al., 2001). A mixed design ANOVA was conducted in SPSS in order to compare differences between groups in mean IBI and SCR during peak rating and during chills. See Supplementary Material for additional details.
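The IBI conversion described above can be illustrated with a minimal Python sketch. It is not the authors' in-house MATLAB code; the function names and the beat times are hypothetical, and only the relation HR = 60000/IBI is taken from the text.

```python
def ibi_from_beat_times(beat_times_s):
    """Interbeat intervals (ms) from successive heartbeat times given in seconds."""
    return [(later - earlier) * 1000.0
            for earlier, later in zip(beat_times_s, beat_times_s[1:])]


def heart_rate_bpm(ibi_ms):
    """HR = 60000 / IBI, with IBI in milliseconds and HR in beats per minute."""
    if ibi_ms <= 0:
        raise ValueError("IBI must be positive")
    return 60000.0 / ibi_ms


# Hypothetical beat times (seconds), for illustration only.
beats = [0.0, 0.8, 1.6, 2.5]
ibis = ibi_from_beat_times(beats)            # three intervals for four beats
rates = [heart_rate_bpm(ibi) for ibi in ibis]
```

Because the two quantities are reciprocal, a decrease in IBI corresponds to an increase in heart rate.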
DTI data acquisition
High-resolution T1 and DTI images were acquired on a 3T GE MRI scanner. The anatomical images were acquired using a T1-weighted, 3D, magnetization-prepared, rapid-acquisition, gradient echo (MPRAGE) volume acquisition with a voxel resolution of 0.93 × 0.93 × 1.5 mm. The diffusion images were acquired using a diffusion-weighted, single-shot, spin-echo, echo-planar imaging sequence (TE = 86.9 ms, TR = 10 000 ms, FOV = 240 mm, slice thickness = 2.5 mm resulting in a voxel size of 2.5 mm³, no skip, NEX = 1, axial acquisition, 30 noncollinear directions with b-value of 1000 s/mm², 5 volumes with b-value of 0 s/mm²). This diffusion sequence lasted 10 min, and has been shown to be sufficient for detecting between-group differences in tracts (Loui et al., 2009).
The posterior portion of the STG (pSTG) was extracted from the Harvard-Oxford Cortical atlas (Desikan et al., 2006), and masked with a standardized FA image. The ROI was then warped to each individual’s brain in native space and binarized.
The anterior insula (aIns) was extracted using the LONI atlas (Shattuck et al., 2000). Then, using previous literature as a reference (Uddin and Menon, 2009), the anterior portion was defined anatomically within the lateral sulcus of the brain. The ROI was then inverse-normalized to each participant’s brain and binarized. Both the pSTG and the aIns were then thresholded at 10% of the robust range.
Since anatomical atlases vary on their delineation of frontal lobe regions, the medial prefrontal cortex (mPFC) was drawn by hand on each participant’s native space FA image. The mPFC was drawn on the coronal regions of interest in the anterior portion of the corona radiata (Marchina et al., 2011). The ROIs were drawn by a first coder and verified by a second coder. Both coders were blind to subject and group identity. Volume and center of gravity coordinates did not differ between groups for any of the ROIs. (Refer to Supplementary materials for images and further information on ROIs).
The corticospinal tract was identified as a control pathway unrelated to emotional/reward processing. To identify the corticospinal tract in each subject as a control, three ROIs were drawn bilaterally by hand on each participant’s native space FA image by a blinded coder and verified by a second blinded coder: the left and right pons, the inferior internal capsule and the precentral gyrus (as in Lindenberg et al., 2010). Mean volume and location of the ROIs for the corticospinal tract are given in Supplementary Table S2.
All images were processed using FMRIB’s Software Library (FSL) (Jenkinson et al., 2012). The images were first corrected for eddy current distortions using the eddy_correct function in FSL. Non-brain structures, such as the skull, were then removed from each participant’s images with the Brain Extraction Tool (BET). A diffusion tensor model was fit at each voxel in the extracted brain using the dtifit function in order to obtain a fractional anisotropy (FA) image for each participant. Bayesian Estimation of Diffusion Parameters Obtained using Sampling Techniques (bedpostX) was then used to estimate the probable fiber directions in each brain voxel, in preparation for probabilistic tractography.
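The order of the preprocessing steps above can be sketched as a list of FSL command lines assembled in Python. This is only an illustration under stated assumptions: the file names (dwi.nii.gz, data.nii.gz, nodif, bvecs, bvals), the BET threshold of 0.3, the use of volume 0 as the b = 0 reference, and the extra fslroi step that extracts that volume are all hypothetical and are not reported in the text. The commands are built but not executed.

```python
from pathlib import Path


def build_fsl_commands(subject_dir):
    """Return FSL command lines (as argv lists) in the order described in the text."""
    d = Path(subject_dir)
    data = str(d / "data.nii.gz")
    return [
        # Eddy-current correction, aligning every volume to volume 0 (assumed b = 0).
        ["eddy_correct", str(d / "dwi.nii.gz"), data, "0"],
        # Assumed helper step: extract the first volume as the non-diffusion image.
        ["fslroi", data, str(d / "nodif"), "0", "1"],
        # Brain extraction; -m also writes a binary brain mask, -f 0.3 is an assumed threshold.
        ["bet", str(d / "nodif"), str(d / "nodif_brain"), "-m", "-f", "0.3"],
        # Tensor fit at each voxel inside the mask; writes the FA image among other outputs.
        ["dtifit", "-k", data, "-o", str(d / "dti"),
         "-m", str(d / "nodif_brain_mask.nii.gz"),
         "-r", str(d / "bvecs"), "-b", str(d / "bvals")],
        # Bayesian estimation of fiber orientations for probabilistic tractography.
        ["bedpostx", str(d)],
    ]


commands = build_fsl_commands("sub01")  # "sub01" is a hypothetical directory name
tool_order = [cmd[0] for cmd in commands]
```

On a machine with FSL installed, each list could be passed to subprocess.run; the sketch stops short of that so that it stays runnable without FSL.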
Probabilistic tractography was conducted to determine structural connectivity between pSTG, aIns and mPFC. Tractography was initiated from the seed region of interest in the pSTG to waypoint masks in both the aIns and mPFC. Tractography was also conducted from seed regions of interest in the aIns to the mPFC and pSTG, and from the mPFC to the aIns and pSTG. Volume and voxel number were determined for each participant.
For the corticospinal tract, probabilistic tractography was conducted from the seed region of the pons to the target regions of the internal capsule and precentral gyrus (as described in Lindenberg et al., 2010). See Supplementary Materials for more details on DTI data acquisition, definitions of seed and target regions of interest and probabilistic tractography.
Aesthetic experience scale in music
Survey responses showed that people who are open to experience and have more musical training are more likely to report strong emotional responses. These positive correlations between strong emotional responses and openness to experience (r = 0.13, P = 0.047), and between strong emotional responses and number of years of musical training (r = 0.14, P = 0.029), are consistent with previous reports (Nusbaum and Silvia, 2010).
MDS analysis on survey data revealed a differentiation between two groups of intense emotional perceivers (Figure 1): one that experiences visceral emotional responses (e.g. heart skipping a beat, pit in the stomach) and another that experiences abstract, more cognitive emotional responses (e.g. feelings of awe, losing sense of time). The response to ‘chills’ appears at the center of the MDS solution (Figure 1), suggesting that the perception of chills is common to both visceral and abstract emotional responders. To understand the variance of strong emotional responses to music, we performed a PCA on the 15 items of the AES-M scale, which yielded two significant components. The question pertaining to frequency of chills loaded most highly on the first factor, suggesting that the chill response to music was the strongest predictor of individual differences in emotional responses to music.
Since the two groups used for the following comparisons consisted of only 10 participants each, we did not assume a normal distribution or equal variance between groups; instead, nonparametric statistical tests were used for the subsequent analyses. The Mann–Whitney U test was used to compare the chill and no-chill groups. The Wilcoxon signed-rank test was used for all within-subject comparisons, and Spearman rank-order correlations were used for all brain-behavior correlations.
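For readers unfamiliar with these rank-based statistics, the following Python sketch shows how the Mann–Whitney U and Wilcoxon signed-rank W statistics are computed. It is a plain illustration with hypothetical function names and toy data; the analyses reported here were not necessarily run this way, and the sketch computes only the statistics, not the Z or P values.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Smaller of the two U statistics for two independent samples."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2.0
    return min(u_a, n_a * n_b - u_a)


def wilcoxon_w(first, second):
    """Smaller of the positive and negative signed-rank sums for paired samples."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # zero differences dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)


# Toy data: complete separation between groups, and every pair shifting the same way.
u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
w_same_direction = wilcoxon_w([1, 2, 3], [2, 4, 6])
```

A statistic of 0 is the most extreme value either test can take: U = 0 means no value in one group exceeds any value in the other, and W = 0 means every paired difference has the same sign.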
Peak pleasure ratings were significantly higher in response to participants’ favorite pieces as compared to the neutral control pieces (mdn favorite piece = 9.49, mdn neutral piece = 4.27, Wilcoxon signed-rank test, W = 0, Z = 3.92, P < 0.001; Figure 2A). Every participant in the chill group reported experiencing at least one chill during their favorite excerpts. We compared changes relative to a pre-stimulus baseline in inter-beat intervals (IBIs), an inverse measure of heart rate, and in mean SCR during the reported experience of chills and peak pleasure ratings between the two conditions of favorite pieces versus neutral pieces, for the chill and no-chill groups separately. Chill responders showed a significant decrease in IBI and a significant increase in mean SCR during the most highly rated moments of favorite pieces. This was verified by extracting a 10-s window of IBI or SCR centered on the peak pleasure rating for each favorite or neutral piece, and comparing these IBI and SCR values between favorite and neutral pieces (IBI: mdn favorite = −46.67, mdn neutral = 11.93, W = 0, Z = −2.80, P = 0.002; SCR: mdn favorite = 0.19, mdn neutral = −0.12, W = 4, Z = 2.40, P = 0.014). A significant difference in the SCR between favorite and neutral pieces was also found for the no-chill group (mdn favorite = 0.19, mdn neutral = 0.03, W = 8, Z = 1.99, P = 0.05), but no such difference was found in the no-chill group with regard to IBI (mdn favorite = −2.67, mdn neutral = −16.13, W = 19, Z = 0.87, n.s.). However, IBI was significantly lower in the chill group than in the no-chill group when both groups listened to their favorite pieces (mdn chill group = −46.67, mdn no-chill group = −2.67, Mann–Whitney U = 18, Z = −2.42, P = 0.015).
Finally, when the chill group reported experiencing a chill, their IBI was significantly lower than the IBI for the no-chill group listening to their favorite pieces (mdn chill moment = −81.6, mdn favorite, no-chill group = −2.67, Mann–Whitney U = 10, Z = 3.17, P = 0.002). The same comparison was not significant for SCR (mdn chill moment = 0.20, mdn favorite, no-chill group = 0.19, Mann–Whitney U = 47, Z = 0.23, P = 0.85). These psychophysiological results (Figure 2B and C) confirm that the chill group was indeed experiencing measurable changes in arousal, over and above the no-chill group, during their self-selected favorite pieces of music.
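The windowed, baseline-relative measure used in these comparisons (a 10-s window centered on the peak pleasure rating, expressed as a change from the pre-stimulus baseline) might be computed along the following lines. This is a hedged Python sketch, not the authors' analysis code: the function names are hypothetical, the signal is assumed to be evenly sampled (the text reports 500 Hz acquisition), and the toy signal below uses 1 Hz only to keep the example short.

```python
def window_mean(samples, rate_hz, center_s, width_s=10.0):
    """Mean of an evenly sampled signal in a window of width_s seconds around center_s."""
    half = width_s / 2.0
    start = max(0, int(round((center_s - half) * rate_hz)))
    stop = min(len(samples), int(round((center_s + half) * rate_hz)))
    window = samples[start:stop]
    if not window:
        raise ValueError("window falls outside the recording")
    return sum(window) / len(window)


def change_from_baseline(samples, rate_hz, peak_time_s, baseline_mean, width_s=10.0):
    """Windowed mean around the peak rating minus the pre-stimulus baseline mean."""
    return window_mean(samples, rate_hz, peak_time_s, width_s) - baseline_mean


# Toy 30-s signal at 1 Hz with a raised plateau between 10 s and 20 s (hypothetical values).
signal = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
peak_change = change_from_baseline(signal, rate_hz=1.0, peak_time_s=15.0, baseline_mean=0.25)
```

Negative values of this measure for IBI, as in the medians reported above, indicate a shorter interbeat interval (a faster heart rate) than at baseline.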
Diffusion tensor imaging
Tracts were traced from atlas-defined seed regions in the pSTG, towards the targets of the aIns and mPFC (Desikan et al., 2006). Chill responders showed higher volume in these tracts in both hemispheres. This was confirmed using a Mann–Whitney U test showing a significant difference between tract volumes of the chill group and the no-chill group in both hemispheres (left hemisphere: mdn chill group = 2334 voxels, mdn no-chill group = 1286.5 voxels, Mann–Whitney U = 17, Z = 2.50, P = 0.011; right hemisphere: mdn chill group = 1141.5 voxels, mdn no-chill group = 541.5 voxels, Mann–Whitney U = 11, Z = 2.95, P = 0.002, with the right-hemisphere difference surviving Bonferroni correction for multiple comparisons across three tracts in two hemispheres; Figure 3A). These tracts of white matter were identified as parts of the uncinate fasciculus and the arcuate fasciculus or superior longitudinal fasciculus, the latter of which is known to be larger in the left hemisphere (Vernooij et al., 2007), as is replicated here (Figure 3B). Tracts from the aIns to the mPFC and pSTG, and from the mPFC to the aIns and pSTG, did not show any significant differences between the groups.
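The Bonferroni correction mentioned above divides the significance level by the number of comparisons, here three tracts in two hemispheres. The Python sketch below works through that arithmetic for the two reported P values. The family-wise level of 0.05 is an assumption, since the text does not state it, and the function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


def survives_correction(p_value, alpha, n_comparisons):
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_comparisons)


ALPHA = 0.05            # assumed family-wise level; not stated in the text
N_COMPARISONS = 3 * 2   # three tracts tested in each of two hemispheres

# P values reported in the text for the pSTG-seeded tracts.
reported_p = {"left hemisphere": 0.011, "right hemisphere": 0.002}
outcome = {side: survives_correction(p, ALPHA, N_COMPARISONS)
           for side, p in reported_p.items()}
```

Under this assumed level the corrected threshold is 0.05/6, roughly 0.0083, so the right-hemisphere result falls below it while the left-hemisphere result does not, which is consistent with the later remark that the effect is clearest on the right side.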
Individual differences in tract volume
To assess the relationship between white matter connectivity and degree of emotional response to music, we tested for nonparametric correlations between tract volume, factor scores from the first factor of the PCA (loading most highly on the reported frequency of chills), and psychophysiological measures during peak behavioral ratings of pleasure and chills. Because the sample size is small (n = 20), these correlations are considered preliminary and are interpreted with caution.
A significant positive correlation was found between tract volume in the right hemisphere and individual subjects’ chill factor score (Spearman rank-order correlation rs = 0.60, P = 0.006, 95% CI [0.21, 0.82], surviving Bonferroni correction for the two hemispheres), showing that the more frequently a person experiences chills in response to music, the higher the white matter connectivity they have between pSTG, aIns and mPFC. Furthermore, a significant negative correlation was found between the mean IBI during peak subjective pleasure rating and tract volume in the left hemisphere (rs = −0.59, P = 0.009, 95% CI [−0.81, −0.18]). Thus, the volume of the three-node network including the pSTG, aIns and mPFC is associated with the degree of physiological arousal that the participants experience while listening to music.
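A Spearman correlation and an approximate confidence interval can be sketched in Python as follows. The text does not say how the 95% intervals were obtained; the Fisher z approximation used here is an assumption, although for rs = 0.60 and n = 20 it gives [0.21, 0.82] to two decimals, matching the interval reported above. Function names and the toy data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def fisher_z_interval(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z transform (assumed method)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Toy data with a perfectly monotonic relationship (hypothetical values).
rho_toy = spearman_rho([1, 2, 3, 4], [10, 20, 30, 45])
lower, upper = fisher_z_interval(0.60, 20)
```

The width of this interval with only 20 participants illustrates why the correlations are treated as preliminary.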
No significant difference between the chill and no-chill groups was found in volume or FA of the corticospinal tract, suggesting that significant differences found in white matter connectivity between the pSTG, aIns and mPFC were not generalizable to other tracts.
Emotional reactions to aesthetic stimuli are intriguing experiences to humans as they are profoundly pleasurable and rewarding, yet highly individualized. Finding the behavioral and neural differences between individuals who do and do not experience such reactions may help gain a better understanding of the reward circuitry and the evolutionary significance of aesthetics for humans. Given that the insula and mPFC have been previously shown to be involved in emotional response and reward, we predicted that people who frequently experience intense emotions to music would have increased structural connectivity between these regions and auditory processing regions in the superior temporal lobe.
Survey results confirmed that substantial individual differences exist in the tendency to experience strong emotional responses to music, and that these individual differences are associated with behavioral and personality factors. Real-time ratings of experienced pleasure and psychophysiological measures recorded during music listening showed quantifiable differences between individuals who report experiencing chills and individuals who do not. Results from diffusion tensor imaging show that white matter connectivity between auditory perceptual regions (pSTG) and regions of the brain important for emotional and social processing (aIns, mPFC) reflects individual differences in the tendency to experience chills from music. The chill group showed higher volume in tracts between seed regions in the pSTG and target regions in the aIns and mPFC, especially on the right side, which survived correction for the multiple comparisons of three different tracts tested on both hemispheres. Effects are not attributable to gender, ethnicity, IQ and language differences, years of musical training or personality, as the two groups are matched for these variables (Table 1).
Furthermore, the volume of white matter connectivity was significantly correlated with a participant’s tendency to experience chills: the more frequently a person reports experiencing chills, the larger the volume of white matter connectivity among these three regions of the brain. The tract volume was also negatively correlated with mean IBI during peak pleasure rating. Although these correlations are based on a small sample and are only preliminary in this study, they suggest that participants with higher structural connectivity among these three regions of interest tend to be more physiologically aroused by their favorite pieces of music.
The observed differences in tract volume may arise from increased branching, differences in myelination, or higher structural integrity of white matter pathways that overlap with multiple fiber bundles in the brain, including the arcuate fasciculus and the uncinate fasciculus. These white matter bundles have implications for individual differences in behavior (Johansen-Berg, 2010); for instance, higher white matter connectivity was observed in people with high emotional empathy (Parkinson and Wheatley, 2014), whereas lower white matter connectivity was observed in people with social-emotional impairments such as autism (Barnea-Goraly et al., 2004), mood disorders (Johnstone and Reekum, 2007) and schizophrenia (Park et al., 2004). Thus, the present findings may suggest that people who have difficulty in experiencing strong emotional responses to aesthetic stimuli, such as people with musical anhedonia (Mas-Herrero et al., 2014), may also be susceptible to other insensitivities or even impairments in emotional and social functioning.
Although it remains to be seen whether the tendency to perceive strong emotional responses to music may be generalizable towards other aesthetic stimuli (such as visual art, dance, poetry or architecture), the present paradigm of comparing individual differences in aesthetic response through music may provide a window into the interface between the emotion and communication systems in the brain. The current findings also converge with prior reports that people who are emotionally empathic have higher white matter integrity in the temporal and frontal lobe regions also traversed by the arcuate and uncinate fasciculi (Parkinson and Wheatley, 2014).
Together, the present results may inform scientific as well as philosophical theories on the evolutionary origins of human aesthetics, specifically of music: perhaps one of the reasons why music is a cross-culturally indispensable artifact is that it appeals directly through an auditory channel to emotional and social processing centers of the human brain.
Source: Oxford Academic