Words are not enough: how preschoolers’ integration of perspective and emotion informs their referential understanding

Abstract When linguistic information alone does not clarify a speaker's intended meaning, skilled communicators can draw on a variety of cues to infer communicative intent. In this paper, we review research examining the developmental emergence of preschoolers’ sensitivity to a communicative partner's perspective. We focus particularly on preschoolers’ tendency to use cues both within the communicative context (i.e. a speaker's visual access to information) and within the speech signal itself (i.e. emotional prosody) to make on-line inferences about communicative intent. Our review demonstrates that preschoolers’ ability to use visual and emotional cues of perspective to guide language interpretation is not uniform across tasks, is sometimes related to theory of mind and executive function skills, and, at certain points of development, is only revealed by implicit measures of language processing.

INTRODUCTION
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean – neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
"The question is," said Humpty Dumpty, "which is to be master – that's all."
(Lewis Carroll, Through the Looking Glass)
As so cleverly illustrated by this exchange between Humpty Dumpty and Alice, inferring a speaker's intended meaning cannot always be accomplished through words alone. Consider, for example, the following situation: a child looks at her bookshelf and says to her parent "Can you get the book?" Given that there are multiple possible referents (i.e. books) available, how does the parent infer the child's intended meaning? In the face of this indeterminacy, listeners can use a variety of cues to infer the child's intended meaning. For example, the parent may consider whether the child has a favourite book she always wants to read; whether there is a particular book the parent, but not the child, can reach; whether there is a book that is not visible to the child and thus can be excluded from consideration; or whether the child sounds happy because a brand new book is on the shelf. As demonstrated by this example, skilled listeners can draw upon information about a speaker's perspectives to gauge that speaker's communicative intent. This ability to use information about a speaker's perspective to make inferences about that speaker's intended meaning is known as communicative perspective taking.
Communicative situations like the one described in the example above are likely encountered frequently in everyday interactions. Thus, core questions arise around children's abilities to attend to and integrate others' perspectives during communicative interactions and whether these perspectives can be integrated rapidly enough to guide language processing in the moment. In this paper, we review research examining the developmental emergence of preschoolers' sensitivity to a communicative partner's perspective. We focus particularly on preschoolers' tendency to use cues both within the communicative context (i.e. a speaker's visual access to information) and within the speech signal itself (i.e. emotional prosody) to make on-line inferences about communicative intent. First, we review research examining the emergence of communicative perspective taking during the first two years of development, with particular focus on children's attention towards others' visual perspectives. Next, we introduce the visual world paradigm as a means of examining HOW cues of perspective become integrated with on-line spoken language processing. We then review research examining children's sensitivity to a speaker's visual perspective and emotional prosody in referential communication, addressing current issues in these research areas. We conclude with empirical challenges and future directions.

THE EMERGENCE OF VISUAL PERSPECTIVE-TAKING AND COMMUNICATIVE ABILITIES
Visual perspective taking involves tracking what another person can see in order to form inferences about their knowledge and intentional actions (Moll & Meltzoff, a). For example, knowing that a person cannot see a toy that is hidden by a barrier may lead one to infer that she is unaware of the toy's presence. Around the same time that infants begin to engage in verbal communicative interactions, they also begin to track and reason about the perspectives of others. That is, studies using looking-time measures have found evidence of perspective taking emerging just after infants reach their first birthdays (Caron, Kiel, Dayton & Butler, ; Dunphy-Lelii & Wellman, ; Luo & Baillargeon, ). For example, -month-olds will selectively follow the gaze of another person whose visual access to items is not occluded by either a physical barrier (Caron et al., ; Dunphy-Lelii & Wellman, ) or a blindfold (Brooks & Meltzoff, ). Similarly, ·-month-old infants will track an agent's visual access to a desired item and use the information to interpret the agent's subsequent actions (Luo & Baillargeon, ). When assessed explicitly via verbal or behavioural selection responses, visual perspective-taking abilities become evident around two years of age (Moll & Meltzoff, b). For example, -month-olds, but not -month-olds, will correctly respond to an adult who is searching for a toy ("Where is it? I cannot find it") by selecting an item hidden from the adult (Moll & Tomasello, ).
Given the early development of visual perspective taking, when do children first begin to consider the visual perspectives of others in communicative interactions? The first studies to examine this question suggested that before children reach school-age, they are largely egocentric in their referential communication and fail to integrate feedback from their communicative partner (e.g. Glucksberg & Krauss, ; Krauss & Glucksberg, ). However, advancements in both methods and technology have led to more sensitive means of assessing children's visual perspective taking. We now know that the ability to integrate perspective-taking and communication abilities emerges during infancy and shows marked improvement throughout the preschool years.
Between  and  months of age, infants begin to differentially adapt their pointing gestures to communicate object location to both knowledgeable and unknowledgeable agents (Liszkowski, Carpenter & Tomasello, ). During this same period, infants will also vary their interpretation of communicative behaviours (e.g. eye-gaze and emotional reactions towards an object) depending on the visual perspective of their communicative partner (Moll & Tomasello, ; Moses, Baldwin, Rosicky & Tidball, ). By the end of their second year, infants begin to use the perspectives of others to disambiguate spoken language. Specifically, in word learning studies, researchers have shown that infants as young as  months will attend to where a speaker is looking to correctly infer the referent of a novel label (e.g. Baldwin, , ; Tomasello, Strosberg & Akhtar, ). By two years of age, children will monitor what a person has or has not seen and will adapt their verbal requests for items to match the knowledge state of their listener (Nayer & Graham, ; O'Neill, ). Overall, these findings suggest that as soon as infants begin to reason about the visual perspectives of others, they begin to also use this information to inform their interpretation and production of both nonverbal and verbal communicative behaviours.
In summary, the ability to integrate visual perspective taking in receptive and productive communication begins to emerge during the second year of life. In the next section, we shift our focus to research that has begun to examine HOW children develop the ability to integrate perspective-taking abilities with on-line language processing. We begin with a brief overview of the visual world paradigm as used in referential communication experiments.
The visual world paradigm is the basic method used to study spoken language comprehension in real time, drawing upon the systematic relation between eye-movements and language processing (Allopenna, Magnuson & Tanenhaus, ; Sedivy, Tanenhaus, Chambers & Carlson, ; Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, ). In this paradigm, researchers track participants' eye-movements as they respond to spoken instructions in the context of a visual display (see Huettig, Rommers & Meyer, ; Snedeker & Huang, , for recent reviews of the paradigm). Using this paradigm, research has demonstrated that spoken language is processed incrementally; that is, both child and adult listeners interpret words and sentences as they unfold over time, rather than waiting to hear an entire sentence before making inferences about a speaker's intended meaning (e.g. Allopenna et al., ; Swingley, Pinto & Fernald, ; Tanenhaus et al., ; Trueswell, Sekerina, Hill & Logrip, ). Furthermore, this incremental interpretation occurs in real time, with listeners launching eye-movements to intended referents within the first few hundred milliseconds of hearing a target word (e.g. Tanenhaus et al., ; Trueswell et al., ).
Research using the visual world paradigm led to fundamental insights into the interactive nature of the language processing system; that is, adult and child listeners integrate linguistic, paralinguistic, and non-linguistic information in real time to guide their interpretations of utterances (e.g. Chambers, Tanenhaus & Magnuson, ; Collins, Graham & Chambers, ; Graham, Sedivy & Khu, ; Sedivy, ; Snedeker & Trueswell, ; Trueswell et al., ). To illustrate, a seminal study by Chambers, Tanenhaus, Eberhard, Filip, and Carlson () examined how adult listeners coordinate linguistic and non-linguistic information when listening to referential statements. In this study, adult participants were instructed to manipulate physical objects (e.g. "Put the cube inside the can"), in the context of displays where there were two possible candidate referents (e.g. a large can and a small can). The size of the theme object (e.g. the cube) was varied across conditions such that it could either fit in both containers or only in one container. From the earliest moments of processing, listeners' visual attention was restricted to only those containers large enough to accommodate the object, indicating that they were rapidly integrating contextual information and knowledge of the possible actions with the unfolding utterance.
The visual world paradigm has also been used to examine the timing and integration of visual perspective taking during on-line referential communication (e.g. Brown-Schmidt & Heller, ; Hanna, Tanenhaus & Trueswell, ). In this variation, a discrepancy of perspective is established between a listener and a speaker by varying the physical copresence of objects available for reference on a visual display. For example, a listener may hear an instruction to manipulate a target referent (e.g. "Pick up the duck") on a display where only one of two candidate referents is mutually available to both themselves and the speaker (e.g. one of two ducks is occluded from the speaker's view). If listeners use the perspective of their speaker to constrain their interpretations of reference, then they should ignore items on the display that their speaker cannot see (i.e. PRIVILEGED GROUND INFORMATION) in favour of items that are mutually visible to both themselves and the speaker (i.e. COMMON GROUND INFORMATION). The type of information a listener considers (i.e. privileged ground vs. common ground) during referential interpretation can be measured via their eye-gaze towards display items as a critical sentence unfolds on-line. In this way, the visual world paradigm offers a valuable means of assessing both the types of perspective cues that listeners consider as they interpret reference on-line as well as the timing with which perspective information becomes integrated with linguistic input. In the next section, we review developmental research that has used the visual world paradigm to examine visual perspective taking during the preschool years, with particular focus on research examining how preschoolers interactively coordinate visual perspective information with the linguistic properties of unfolding referential statements.
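To make the eye-gaze measure concrete, the following minimal sketch shows one way that looks to common ground and privileged ground items could be summarized over time. It is an illustration only and is not the analysis used in any of the studies reviewed here: the function name, the region labels, the 100 ms bin width, and the sample data are all hypothetical, and we assume that each gaze sample has already been coded for the display region being fixated and time-locked to the onset of the critical noun.

```python
from collections import defaultdict


def fixation_proportions(trials, bin_ms=100):
    """Pool gaze samples across trials and return, for each time bin,
    the proportion of samples falling on each display region.

    trials: list of trials; each trial is a list of (time_ms, region)
            pairs, with time_ms measured from noun onset (assumed).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for trial in trials:
        for time_ms, region in trial:
            # Assign the sample to the bin that starts at or before it.
            bin_start = (time_ms // bin_ms) * bin_ms
            counts[bin_start][region] += 1

    proportions = {}
    for bin_start in sorted(counts):
        total = sum(counts[bin_start].values())
        proportions[bin_start] = {
            region: n / total for region, n in counts[bin_start].items()
        }
    return proportions


# Hypothetical data: two trials sampled every 50 ms from noun onset.
# "common" = referent visible to listener and speaker,
# "privileged" = referent visible to the listener only.
trials = [
    [(0, "other"), (50, "privileged"), (100, "common"), (150, "common")],
    [(0, "common"), (50, "common"), (100, "common"), (150, "other")],
]
props = fixation_proportions(trials)
print(props[0]["common"], props[100]["common"])
```

In a summary of this kind, an early rise in the proportion of looks to the common ground item, with few looks to the privileged ground item, would be consistent with perspective information constraining interpretation as the noun unfolds.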

VISUAL PERSPECTIVE TAKING DURING ON-LINE COMMUNICATION
To date, the majority of experimental studies that have examined children's perspective taking using a visual world paradigm have focused on one type of perspective reasoning, namely, reasoning about information that is visually shared or not shared between themselves and a speaker (e.g. Nadig & Sedivy, ; Epley, Morewedge & Keysar, ). This research has yielded valuable insights into two key issues: (i) when preschoolers begin to use visual perspective taking to guide their on-line comprehension and production of referential utterances; and (ii) the timecourse and efficiency with which preschoolers recruit perspective information during on-line language processing.

Preschoolers' use of visual perspective taking to guide referential communication
During the preschool years, children undergo significant improvements in their ability to integrate visual perspective with both the comprehension and production of referential statements (e.g. Matthews, Lieven, Theakston & Tomasello, ; Nadig & Sedivy, ). In a series of studies in our lab, we have assessed preschoolers' sensitivity to others' visual perspectives in both productive and receptive language, examining the emergence of these abilities during the preschool years.
In one of our first studies (Nilsen & Graham, ), we examined three- to five-year-olds' integration of visual perspective taking in a comprehension task, where children had to follow a speaker's instructions to retrieve objects on a display. We also examined four- to five-year-olds' ability to use a listener's visual perspective in a production task, where children had to instruct an experimenter to retrieve objects on a display. In both tasks, we examined whether children's explicit responses varied with the visual perspective of their communicative partner. On the comprehension task, we also examined whether children's implicit eye-gaze towards display items would be constrained by visual perspective cues. Results of the comprehension task indicated that three- to five-year-olds accurately tracked what a speaker could see in order to correctly interpret the referent of an ambiguous utterance. That is, when interpreting an ambiguous instruction (e.g. "Pick up the duck" in a display with two ducks), children were more likely to constrain their visual attention towards items that were mutually visible than to items that were exclusively visible to themselves. Across both experiments, children were also more likely to select items that were visible to themselves and the speaker.
The results of the production task showed that four- to five-year-olds considered their listener's visual perspective when forming their own instructions. That is, children used more adjectives to request a target referent (e.g. "Pick up the big duck") when their communicative partner had visual access to two competing referents rather than one unambiguous referent. Other research has shown that children as young as three years of age will similarly adapt their productions to fit a listener's perspective (Matthews et al., ), but not in contexts where the child's and listener's perspectives are simultaneously competing for the child's attention or where cues of visual perspective change on a trial-by-trial basis. Our findings, using a visual world paradigm, therefore demonstrate that preschoolers are able to selectively and flexibly track the visual perspective of their listener in order to adapt their production of referential utterances (see also Nadig & Sedivy, ).
Thus, around three to four years of age, children can use a speaker's perspective to guide their referential interpretations and begin to adapt the clarity of their own messages to match the visual perspective of their listener. In the next set of studies, we asked whether preschoolers can take this understanding one step further and use visual perspective information to evaluate the clarity of an utterance from the perspective of another person (Nilsen & Graham, ; Nilsen, Graham, Smith & Chambers, ). Message evaluation is a critical component of referential comprehension, as detection of sentence ambiguity could highlight to the listener the need to rely on non-linguistic cues of reference such as visual perspective. In this third-party paradigm, a sticker is hidden in a location and children either share the speaker's perspective (i.e. see where the sticker was placed), or share the message recipient's perspective (i.e. do not see the sticker's location). The message recipient is provided with a statement about the sticker location that is either ambiguous (e.g. "it's under the rubber duck" in the presence of two rubber ducks) or unambiguous (e.g. "it's under the big duck" in the presence of a big and a small rubber duck). After hearing the statement, children are asked to evaluate the message recipient's knowledge of the sticker location and the quality of the message (e.g. "Was that a good clue or a tricky clue?"; see also Robinson & Robinson, ; Sodian, ). Thus, in this paradigm, children must ignore their own perspective in order to interpret the quality of a message from the perspective of another person.
Using this third-party paradigm, we conducted a longitudinal study to examine children's implicit and explicit message evaluation between the ages of four and five years (Nilsen & Graham, ). Our results demonstrated that, at four years of age, children only demonstrated implicit sensitivity to message ambiguity. That is, even when children were aware of the sticker's location, they gazed equally towards both locations when hearing an instruction that was exclusively ambiguous to the other person. By · years of age, children began to show evidence of explicit message evaluation: first recognizing when a message was sufficiently clear for the message recipient to interpret reference, and then later, at five years of age, recognizing when a message was too ambiguous for an agent to infer reference. Implicit sensitivity to message ambiguity at four years of age, however, was not predictive of later developing explicit message evaluation.
In summary, by using variations of the visual world paradigm, we have found that preschool children integrate visual perspective taking to constrain both implicit and explicit comprehension of referential statements by as early as three years of age. The ability to flexibly use visual perspective taking to inform the explicit evaluation and production of referential statements also begins to emerge between four and five years of age. In the case of message evaluation, however, implicit awareness of message ambiguity may be evident before children are able to explicitly judge the quality of a spoken utterance.
Visual world paradigms, however, are not only useful for charting developmental trajectories. They also provide a unique means of assessing the timecourse of communicative perspective taking. In the following section we review studies that have begun to examine how rapidly and efficiently children integrate visual perspective taking with on-line language processing.

Timing of preschoolers' recruitment of visual perspective information
Because spoken language is processed incrementally as it unfolds in real time, perspective inferences must be rapidly generated so that this information is coordinated with other cues of reference. The question of when, during sentence processing, perspective cues become integrated with linguistic input has been the subject of a lively debate in the adult literature, with proponents advocating for both early and late integration accounts. Early integration accounts propose that individuals are inherently motivated to track their communicative partner's perspective, and thus perspective constraints are considered from the earliest moments of sentence processing (Brown-Schmidt & Heller, ; Heller, Parisien & Stevenson, ). According to these accounts, the ability to use perspective information to constrain the interpretation of a spoken utterance depends on the strength of these cues relative to other sources of information (e.g. ambiguity of linguistic input, number of competing referents on a display, etc.). If perspective cues are strongly represented, then evidence of perspective-taking integration should be seen as a sentence is unfolding. Conversely, late integration accounts propose that perspective constraints may not always be available to influence the earliest moments of sentence processing (Apperly, Carroll, Samson, Humphreys, Qureshi & Moffitt, ; Keysar, ). According to these accounts, individuals do not always track perspective cues automatically, and the cognitive demands associated with generating perspective inferences would make it inefficient for the language processing system to coordinate these cues with other sources of information during on-line sentence processing. As a result, late integration accounts predict that perspective cues are often not considered until after a spoken utterance has been heard and linguistic input has been processed.
To date, only a few studies have examined the timing of children's perspective taking during on-line sentence processing (Epley et al., ; Nadig & Sedivy, ). In one of the first studies to address this question, Nadig and Sedivy () examined five- and six-year-olds' ability to interpret referential instructions (e.g. "Pick up the duck") using displays that contained four items, two of which were referential matches for the critical noun (i.e. two similar ducks). Children were significantly faster at identifying the referent on trials where the speaker had visual access to only one of the two candidate referents (i.e. privileged ground trials) vs. trials where the speaker could see both candidate referents (i.e. common ground trials). Eye-gaze data further demonstrated that, on privileged ground trials, children began to constrain their attention towards the target while the instruction was still being heard (approximately - ms after the onset of the noun). These findings demonstrate that children integrated perspective cues early to constrain their interpretation of a referential statement, as it was still unfolding. A recent study in our lab yielded similar results with younger children (Khu, Chambers & Graham, unpublished observations). That is, we found that four-year-olds selectively used common ground information to guide their interpretation of referential statements within the earliest moments of processing (i.e. as soon as the critical noun began to unfold).
In contrast, Epley and colleagues () found sensitivity to another's visual perspective emerged much later in sentence processing. In this study, the referential comprehension of both children, ranging in age from four to twelve years, and adults was examined using a similar but more challenging procedure than Nadig and Sedivy (). Participants followed instructions that contained size or spatial ambiguity (e.g. "Move the small truck" in a display with multiple trucks) on displays that contained nine items, rather than four items. A set of three display items matched the critical noun (e.g. three trucks of ascending sizes), but the strongest referential candidate was always occluded from the view of the speaker (e.g. the smallest truck was hidden behind a screen). The target referent was thus the best referential candidate that could be seen by both the listener and the speaker (i.e. the medium-sized truck). Eye-movement patterns indicated that both children and adults considered privileged ground items first before shifting their focus to target items in common ground. While adults were able to recover their attention towards the target early enough (an average of  ms following the offset of the instruction) to produce an accurate reaching response, children's recovery was significantly slower (an average of  ms after the offset of the instruction) and often led to inaccurate reaching responses. These results suggest that both children and adults demonstrated late integration of perspective information, although adults were better able to use this information to correct, if not constrain, their interpretation of a referential statement.
Overall, these findings suggest that perspective information is available to listeners early in speech processing; however, the point at which this information is integrated with linguistic information may depend on the complexity of the communicative task (see San Juan, Khu & Graham,  for a discussion). That is, children are less likely to show early integration of perspective information when there is more contextual information to consider (i.e. more display items) and there is greater competition between different sources of information. Thus, in the Epley et al. () task, children's representation of the speaker's perspective may have been outweighed by the relative strength of other competing cues of reference (e.g. the fact that the item in privileged ground was a stronger referential match to the linguistic input). Alternatively, children in this task may have had more difficulty generating inferences about their speaker's perspective because there was more direct conflict between their own and their communicative partner's perspective (Moll, Meltzoff, Merzsch & Tomasello, ). If children were less efficient at generating perspective inferences in this type of context, then they would not have been able to consider this information until well after the linguistic input had been processed.

Summary of visual perspective taking and communication
Application of the visual world paradigm has provided developmental researchers with a more sensitive means of assessing the implicit and explicit integration of visual perspective taking during spoken language processing. This has led to a more detailed understanding of when communicative abilities emerge during the preschool years. These methods have also expanded the opportunity to examine the timing and efficiency with which children integrate visual perspective taking during communication. However, further research in this area is necessary for understanding the underlying mechanisms of communicative perspective taking. That is, more studies are needed to clarify the contextual and cognitive factors that influence children's integration of visual perspective taking during the earliest moments of sentence processing.
[...] need to monitor the emotional perspective of a communicative partner could be more socially relevant, with lack of attention to emotion potentially more socially consequential in a communicative interaction than the need to consider a partner's visual perspective. One means through which speakers may signal their emotional state or disposition is through the emotional prosody that accompanies an utterance. Emotional prosody refers to paralinguistic information that signals a speaker's emotional state or disposition, as indexed by variations in pitch contours, speech rate, intensity, and pitch level (Banse & Scherer, ; Frick, ). Emotional prosody is often consistent with an utterance's linguistic content (e.g. a speaker's sadness is communicated both through her words and her emotional prosody for the statement "I'm having a bad day" spoken in a sad tone of voice). When linguistic information alone does not fully disambiguate meaning, however, emotional prosody can provide clarification. For example, statements like "School starts tomorrow" or "I got the reviews on my manuscript" can convey markedly different meanings if spoken with a happy-sounding voice versus a sad-sounding voice.
In this next section, we consider preschoolers' sensitivity to a speaker's emotional perspective, as signalled by their emotional prosody, to guide inferences about communicative intent. Specifically, we review research documenting: (i) the emergence of preschoolers' sensitivity to emotional prosody to resolve communicative ambiguity; (ii) valence and timing differences in preschoolers' sensitivity to emotional prosody; and (iii) sensitivity to emotional prosody and communicative perspective taking.

Developmental emergence of sensitivity to emotional prosody in communication
In the first year of life, infants display sensitivity to emotional prosody. Infants as young as  month of age show preferences for infant-directed speech, which has distinct prosodic modifications that typically convey positive affect, over adult-directed speech (Cooper & Aslin, ; Fernald, ; Singh, Morgan & Best, ). During this first year, infants begin to discriminate the different intonational patterns used by mothers to convey distinct communicative intent types (i.e. comforting or soothing, affection or approval, and directive affect; e.g. Fernald, , , ; Kitamura & Burnham, ; Kitamura & Lam, ). Furthermore, infants will respond in an appropriate manner to different types of emotional prosody. For example, -month-old infants smile more when hearing approval vocalizations produced in infant-directed speech than when hearing prohibition vocalizations, even if these vocalizations are produced in an unfamiliar language (Fernald, ). Thus, even before the onset of productive language, infants detect and respond to emotional prosody.
As children acquire language, they must learn to integrate emotional prosody with linguistic information. From a research standpoint, this issue has been investigated from two directions: first, examining children's relative attention to information conveyed by linguistic content versus that conveyed by emotional prosody when the two information sources are in conflict [...]
Children's sensitivity to conflicting linguistic and emotional prosody cues. Research examining children's resolution of conflicting linguistic and emotional prosody cues indicates that children's sensitivity to emotional prosody shifts during infancy and the preschool and school-age years (Friend, ; Friend & Bryant, ). At the early stages of language development, -month-olds rely on emotional prosody to guide their behaviour when emotional prosody and lexical content provide incongruent messages (Friend, ). As children reach preschool age, they are more likely to rely on the linguistic content of an utterance over emotional prosody when the two sources of information conflict. For example, Morton and Trehub () presented four- to ten-year-olds with sentences that described either happy or sad events (e.g. "I got an ice cream for being good", for a happy event), spoken with both positive (happy-sounding) and negative (sad-sounding) emotional prosody. When presented with conflicting information (i.e. a sentence describing a sad event paired with happy emotional prosody), four- to eight-year-olds relied almost exclusively on the content of the sentences to judge the emotional state of the speaker. By nine years of age, children began to decrease their reliance on the linguistic content of the utterances and, like adults, used the speaker's emotional prosody to gauge the speaker's emotional state. In a subsequent study, Morton and Munakata () demonstrated that preschoolers' adherence to linguistic content over emotional prosody persists even when they are explicitly instructed to attend to emotional prosody.
The findings described above suggest that four- to eight-year-olds prioritize lexical content over emotional prosody in these conflict paradigms. More implicit measures, however, suggest that children are not fully disregarding the information signalled by the emotional prosody. Specifically, in the Morton and Trehub () experiments, children showed longer response latencies on conflict trials, indicating that they recognized the incongruity between the two sources of information. Thus, preschoolers' tendency to privilege linguistic information over emotional prosody in these tasks likely reflects difficulty resolving conflicting sources of information, rather than a failure to recognize the meaning of emotional prosody (Morton et al., ; Waxer & Morton, ).
Children's use of emotional prosody to resolve linguistic indeterminacy. In a series of studies in our lab, we have approached the question of preschoolers' integration of linguistic content and emotional prosody from a different direction. Rather than using conflict tasks, we asked whether preschoolers might show greater sensitivity to emotional prosody in tasks that are less cognitively demanding, namely, when the linguistic information is indeterminate, rather than in conflict with emotional prosody. We also reasoned that employing a visual world paradigm and measuring both explicit behavioural responses (i.e. pointing) and eye movements would allow us to gain insight into both preschoolers' real-time processing of, and more conscious and controlled responses to, emotional prosody.
In the first study to address this question, we presented three- and four-year-olds with formally ambiguous referential descriptions (i.e. "Look at the ball", in the presence of more than one ball) and examined whether they would use emotional prosody to identify the speaker's intended referent (Berman et al., ). On each trial, preschoolers saw arrays that contained three photographed objects: two objects of the same category that varied in their physical state (e.g. an intact ball and a deflated ball) and an unrelated object (e.g. a star). Children were instructed to find one of the two objects belonging to the same category using an ambiguous phrase (e.g. "Look at the ball"), spoken using one of three different types of emotional prosody (happy-sounding, sad-sounding, or neutral). Four-year-olds' eye-gaze patterns, but not their pointing responses, demonstrated appropriate sensitivities to emotional prosody. As the ambiguous noun unfolded, children fixated the broken object most often when hearing sad-sounding emotional prosody, less when hearing neutral prosody, and much less when hearing happy-sounding prosody. This effect emerged only during the noun region: during the early part of the utterance (i.e. "Look at the") there was no influence of emotional prosody on eye-gaze behaviour. Neither three-year-olds' eye-gaze patterns nor their pointing behaviour reflected any sensitivity to emotional prosody.
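For readers less familiar with visual world measures, the kind of window-based gaze summary described above might be computed along the following lines. This is a minimal sketch for illustration only, not the analysis used in the studies reviewed here: the sample format, the object labels, the window bounds, and every value in the toy data are hypothetical.

```python
# Toy gaze samples (all values invented for illustration).
# Each sample: (prosody condition, time in ms relative to noun onset, fixated object)
SAMPLES = [
    ("sad", -200, "unrelated"), ("sad", 0, "broken"), ("sad", 100, "broken"),
    ("sad", 200, "broken"), ("sad", 300, "intact"),
    ("neutral", -200, "intact"), ("neutral", 0, "broken"), ("neutral", 100, "intact"),
    ("neutral", 200, "intact"), ("neutral", 300, "broken"),
    ("happy", -200, "unrelated"), ("happy", 0, "intact"), ("happy", 100, "intact"),
    ("happy", 200, "broken"), ("happy", 300, "intact"),
]


def proportion_looks(samples, target, window):
    """Return, per prosody condition, the share of samples inside the
    half-open time window [start, end) that fall on the target object."""
    start, end = window
    hits = {}
    totals = {}
    for prosody, time_ms, fixated in samples:
        if start <= time_ms < end:
            totals[prosody] = totals.get(prosody, 0) + 1
            if fixated == target:
                hits[prosody] = hits.get(prosody, 0) + 1
    return {p: hits.get(p, 0) / totals[p] for p in totals}


if __name__ == "__main__":
    # Hypothetical noun region: 0 to 400 ms after noun onset.
    print(proportion_looks(SAMPLES, "broken", (0, 400)))
    # Hypothetical pre-noun region ("Look at the"): 400 ms before noun onset.
    print(proportion_looks(SAMPLES, "broken", (-400, 0)))
```

Comparing the two windows mirrors the logic of the reported contrast: an effect of prosody is expected in the noun region but not in the earlier part of the utterance.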
Results from this study suggest that there is a developmental progression in the use of emotional prosody for language comprehension between three and four years of age. Four-year-olds, however, appear to be in a transitional period in their ability to integrate emotional prosody with linguistic information, as their sensitivity to emotional prosody was not reflected in their explicit behavioural decisions. In a subsequent study, using the same paradigm, we found that five-year-olds evidenced use of emotional prosody to guide referential understanding, when assessed with both eye-gaze measures and pointing measures (Berman, Graham, & Chambers, , Experiment ). We further documented this developmental transition between four and five years in another set of studies examining preschoolers' use of emotional prosody to learn new words (Berman, Graham, Callaway, & Chambers, ). In these experiments, we presented four- and five-year-olds with two novel objects, first in their original state and second in an altered state (broken or enhanced). Children heard an instruction to find the referent of a novel word, produced with sad-sounding, neutral, or happy-sounding emotional prosody. Both four- and five-year-olds' gaze patterns indicated that they linked the novel word with the object that best matched the speaker's emotional prosody (e.g. the broken object when the instruction was produced with sad-sounding affect, the enhanced object when the instruction was produced with happy-sounding prosody). Only five-year-olds, however, demonstrated their use of emotional prosody in their explicit referential decisions.
Taken together, these findings indicate that, between four and five years of age, preschoolers move from an implicit understanding of emotional prosody in referential communication tasks to a more explicit use of this cue. Three-year-olds, however, did not appear to show any evidence of integrating emotional prosody with linguistic information. What might account for the three-year-olds' apparent lack of success in such tasks? We addressed this question in a recent study, examining specifically whether three-year-olds' difficulties in our earlier study stemmed from an inability to identify the acoustic cues corresponding to different types of emotional prosody (Berman, Chambers & Graham, ). Here, we presented three- and five-year-olds with utterances produced with happy-sounding, neutral, or sad-sounding emotional prosody in the presence of faces depicting happy, neutral, or sad facial expressions. Children were instructed to point to the face that reflected how the speaker was feeling when she made a specific utterance. Only five-year-olds pointed to the face that matched the utterance's emotional prosody, providing further evidence of the developmental changes in sensitivity to emotional prosody between three and five years. In contrast, both three-year-olds' and five-year-olds' gaze patterns demonstrated that they could link happy-sounding and sad-sounding emotional prosody to the appropriate emotional face. Matching neutral emotional prosody to neutral faces proved difficult for children of both ages. These results suggest that three-year-olds can recognize happy-sounding and sad-sounding emotional prosody and link it to the appropriate facial expression. Thus, the difficulties demonstrated by three-year-olds in the Berman et al. () study are likely isolated to the process of linking vocal affect with the intent to refer to objects.
In summary, children's success at integrating emotional prosody with linguistic information varies during the preschool years as a function of communicative task. That is, young children are better able to use emotional prosody to infer communicative intent when linguistic information is indeterminate (Berman et al., ; Berman, Graham, & Chambers, ) versus when linguistic information is in conflict with emotional prosody (e.g. Morton & Trehub, ). Furthermore, there is significant progression in children's abilities to coordinate emotional and linguistic information between three and five years of age, with three-year-olds showing implicit sensitivity to emotional prosody only under some conditions and five-year-olds demonstrating more robust integration of these two sources of information.
Valence effects. In addition to developmental differences, our studies on preschoolers' integration of emotional prosody with lexical content have documented valence differences in children's sensitivity to emotional prosody, both in terms of the types of representation created and in the time course of processing prosody in the unfolding speech stream. First, although five-year-olds will use both positive and negative emotional prosody to map a novel word to a novel object, children were only successful at extending and generalizing these newly mapped words when the words were learned using negative vocal affect (Berman, Graham, Callaway, & Chambers, ). This finding suggests that negative emotional prosody (versus positive emotional prosody) enabled children to establish a more robust representation of the word in this task.
Second, our studies have demonstrated comparatively greater sensitivity to negative-sounding emotional prosody versus positive-sounding emotional prosody in the earliest moments of speech processing. Specifically, when five-year-olds were presented with unambiguous referential contexts, they used negative emotional prosody early in the utterance to anticipate a particular referential outcome (Berman, Graham, & Chambers, ). That is, when presented with an unambiguous referential description produced with negative emotional prosody (e.g. "Look at the ball", in the presence of a ball, a duck, and a cellphone; Experiments  & ), children began to anticipate reference to the one broken object in the scene well before the disambiguating noun. The effect of positive emotional prosody, in contrast, was not observed until after the onset of the noun. Similarly, the gaze patterns of both three- and five-year-olds showed that sad-sounding emotional prosody led children to identify a sad face in the first  ms of an unfolding utterance (Berman et al., ). In contrast, children did not identify a happy face on the basis of happy-sounding speech until approximately  ms into the utterance.
Our findings are consistent with other research demonstrating an advantage for negative emotional prosody versus positive emotional prosody, in terms of both accuracy and timing of emotion recognition. For example, three- to five-year-olds more accurately identify sadness, on the basis of paralinguistic cues, versus happiness, anger, or fear (Nelson & Russell, ). Similarly, adults are more successful at using emotional prosody to identify sadness versus happiness in a speaker's voice, even if utterances are produced in a foreign language (Paulmann & Pell, ; Pell, Monetta, Paulmann & Kotz, ; Pell, Paulmann, Dara, Alasseri & Kotz, ; Scherer, Banse & Wallbott, ). Furthermore, adults, like the preschoolers in our studies, are significantly quicker to identify sad vocal emotion versus positive vocal emotion. In one study, for example, it took adults approximately  ms longer to recognize happiness compared to sadness, on the basis of emotional paralanguage (Pell & Kotz, ). Finally, the timing advantage for negative-sounding emotional prosody observed in our studies and those of others is generally consistent with the proposal that both adults and children are biased towards negative information when processing information (Rozin & Royzman, ; Vaish, Grossmann & Woodward, ).
Does sensitivity to emotional prosody reflect perspective taking? The research reviewed above documents the critical role of emotional prosody in spoken language comprehension. These studies, however, do not unequivocally demonstrate that children are using emotional prosody to reason about a speaker's emotional perspective. That is, preschoolers' use of emotional prosody to resolve communicative ambiguity could arise from established associative links between vocal patterns and their own emotional reactions (e.g. I would be sad if my beachball was deflated) or object states, rather than inferences about a speaker's perspective.
In a recent study in our lab, we developed an on-line communicative perspective-taking task that more clearly tested whether preschoolers could use emotional prosody to reason about a speaker's emotional perspective (Khu, Chambers & Graham, unpublished observations). In this task, four-year-olds played a competitive game with a speaker, in which a 'loss' for the child meant a 'win' for the speaker, and vice versa. Accordingly, children could not rely on their own emotional reactions or previous associations to infer the speaker's emotional state and communicative intent (e.g. when the speaker sounded sad, it corresponded to a win for the child). Children's eye-gaze was tracked and their responses recorded as they heard ambiguous statements spoken with either happy- or sad-sounding emotional prosody. The implicit gaze measures indicated that preschoolers used the speaker's emotional perspective to influence their on-line language comprehension. For example, their eye-gaze patterns indicated that they anticipated that they would lose and that the speaker would win when the speaker sounded happy. The influence of emotional prosody on children's interpretations did not occur until after the utterance had ended, suggesting that this information exerted relatively late effects on children's language processing. In addition, evidence of emotional perspective taking was only weakly reflected in children's explicit responses.
Summary. The research reviewed in this section documents preschoolers' sensitivity to emotional prosody in referential communication, highlighting the developmental changes that occur between three and five years of age and the powerful role of negative-sounding emotional prosody. This research also demonstrates that preschoolers can use emotional prosody to generate inferences about a speaker's emotional perspective and integrate these perspectives in on-line language processing. In the next section, we consider the cognitive abilities that might support preschoolers' integration of perspective information in referential communication.

COGNITIVE ABILITIES AND THE INTEGRATION OF PERSPECTIVE AND LINGUISTIC INFORMATION
Theoretical accounts of communicative perspective taking have posited a role for two key sets of cognitive abilities that may support listeners' integration of perspective and linguistic information, namely theory of mind skills and executive function (Nilsen & Fecica, ; San Juan et al., ). In what follows, we review research that has examined relations between children's abilities in these domains and communicative perspective taking.

Theory of mind
Theory of mind is the ability to represent and form inferences about other people's mental states. It encapsulates both the ability to track what another person can or cannot see and the ability to represent states of knowledge and intention. During the preschool years, there are marked developmental changes in children's theory of mind skills (Gopnik & Slaughter, ; Wellman, Cross & Watson, ). For example, most three-year-olds make incorrect predictions about the actions of an agent who holds a false belief, whereas most five-year-olds correctly predict that the agent's false belief will guide her behaviour (Wellman & Liu, ). During this same developmental period, there are significant developmental changes in children's visual perspective-taking abilities (Flavell, Speer, Green, August & Whitehurst, ; Masangkay, McCluskey, McIntyre, Sims-Knight, Vaughn & Flavell, ; Moll & Tomasello, ; Moll & Meltzoff, a; Moll et al., ) and emotional perspective-taking abilities (Denham & Couchoud, ; Hughes & Dunn, ). Theory of mind abilities may support communicative perspective taking by providing children with the representational ability to track and form inferences about their communicative partner's perspective and referential intent.
Although many researchers have proposed a theoretical link between children's mentalizing abilities and communicative perspective taking (Achim, Fossard, Couture & Achim, ; Nilsen & Fecica, ; Sperber & Wilson, ), only a handful of studies have directly examined the relation between these two capacities. This research has shown that three- to six-year-old children's accurate production and repair of referential statements is positively related to their performance on both visual perspective-taking tasks (Roberts & Patterson, ) and standard measures of false-belief understanding (Resches & Pereira, ). False-belief understanding has also been shown to predict children's comprehension of spoken instructions and detection of referential ambiguity (Maridaki-Kassotaki & Antonopoulou, ; Resches & Pereira, ).
Two recent studies in our lab have specifically examined whether children's theory of mind abilities are related to their on-line integration of perspective information in referential communication. Our results have demonstrated that theory of mind skills predicted four-year-olds' communicative perspective taking in both visual perspective-taking and emotional perspective-taking referential communication tasks (Khu et al., unpublished observations). Importantly, the relations between theory of mind and communicative perspective taking were specific to the relevant domain. That is, visual perspective taking measured using an off-line task was related to four-year-olds' successful integration of a speaker's visual perspective in an on-line referential communication task (Khu et al., unpublished observations). Likewise, off-line emotional perspective taking was associated with the real-time integration of the speaker's emotional perspective in a referential communication task (Khu et al., unpublished observations).
Although research has demonstrated links between theory of mind and communicative perspective taking, the nature of this relation is underspecified. That is, it is unclear if the relation is necessarily causal and unidirectional in nature. For example, one longitudinal study has shown that children's ability to track perspectives in conversation (e.g. being able to infer the correct recipient of a spoken utterance) predicts later development of false-belief understanding (Bernard & Deleau, ). Thus, the relation between theory of mind and communicative development may be bidirectional, as children who engage in more communicative exchanges may experience greater opportunities to represent and reason about differing perspectives (Harris, de Rosnay & Pons, ).

Executive function
Beyond theory of mind abilities, executive function has also been proposed as a critical component of communicative perspective taking (Brown-Schmidt, ; Lin, Keysar & Epley, ). Executive function may facilitate spoken language processing by providing individuals with the cognitive control needed to (i) inhibit their own perspective in favour of their communicative partner's perspective, (ii) simultaneously consider and integrate multiple cues of reference, including perspective information, and (iii) select a response that appropriately matches their communicative partner's state of knowledge or emotional state. To date, studies have demonstrated that individual differences in executive function significantly predict preschool children's comprehension of referential statements (Gillis & Nilsen, ; Nilsen & Graham, , ). In one study, we examined the relation between three- to five-year-olds' communicative perspective taking and their performance on various measures of executive function (e.g. inhibitory control, working memory, and cognitive flexibility; Nilsen & Graham, ). Although individual differences in executive function did not predict children's performance on production measures, a positive correlation was found between children's inhibitory control and their ability to consider a speaker's visual perspective while interpreting a referential statement. Similar relations have been found in studies examining children's message evaluation. Specifically, both cognitive flexibility (Gillis & Nilsen, ) and inhibitory control (Nilsen & Graham, ) have been shown to predict preschool children's emerging detection of message ambiguity. Thus, executive function appears to assist children with integration of perspective information during spoken language comprehension.
Not all studies that have examined children's comprehension of spoken utterances, however, have found significant correlations with executive function measures (Khu et al., unpublished observations; Nilsen, Mangal & MacDonald, ). For example, Nilsen and colleagues () found that inhibitory control measures did not correlate with the performance of typically developing children and children with Attention Deficit Hyperactivity Disorder on a complex comprehension task (similar in procedure to Epley et al., ). Similarly, Khu et al. (unpublished observations) failed to find relations between children's working memory, conflict inhibitory control, or delay inhibitory control, and performance on communication tasks that involved taking a speaker's visual or emotional perspective. Further research is thus needed to clarify how the contributions of executive function vary across different communicative tasks. It also remains to be seen whether a similar relation exists between executive function and children's ability to produce statements that are tailored to their listener's perspective.

CONCLUSIONS AND FUTURE DIRECTIONS
As we have reviewed, preschoolers are remarkably skilled at integrating perspective information with on-line language comprehension, with significant development occurring between three and five years of age. Our review has highlighted research demonstrating that preschoolers' ability to use visual and emotional perspective information to guide language interpretation is not uniform in character, is sometimes related to theory of mind and executive function skills, and, at certain ages, is revealed only by implicit measures of language processing. Together, the research reviewed here helps to broaden theoretical models of communicative perspective taking, underscoring the importance of examining how different types of perspective inferences shape children's referential understanding.
Although research has significantly advanced our understanding of preschoolers' communicative perspective taking, there remain a number of key issues for further empirical consideration. We discuss three such considerations below.
What types of perspective representations are needed to guide communication? Communicative perspective taking encompasses a broad range of abilities, including, but not limited to, the ability to track the visual perspective of a communicative partner and/or the emotional prosody of a spoken utterance to form inferences about referential intent. What remains unclear is the type of perspective representations that are necessary to influence both implicit constraints on visual attention and the explicit interpretation of spoken utterances. As a number of studies reviewed in this paper have shown, discrepancies sometimes exist between children's eye-gaze patterns and elicited responses (e.g. Berman et al., ; Nilsen et al., ). That is, implicit awareness of a communicative partner's perspective does not always influence the explicit comprehension of spoken utterances.
At present, it is unclear whether these discrepancies are due to underdeveloped cognitive abilities, such as executive function, that would assist children in selecting an explicit social response. Alternatively, these findings could also be indicative of the types of perspective representations that are necessary to influence explicit referential interpretation. That is, implicit awareness of a speaker's perspective may be sufficient to constrain visual attention but not sufficiently robust to outweigh competing cues of reference. The few studies that have examined a relation between children's mentalizing abilities and communicative perspective taking have suggested that explicit awareness of a partner's perspective may be important, if not necessary, for communicative perspective taking to develop (e.g. Khu et al., unpublished observations). As it currently stands, however, it is unclear if discrepancies between implicit and explicit responses are indicative of children's inability to (i) rapidly generate robust, if not explicit, representations of perspective, and/or (ii) integrate perspective cues with other sources of information.
Related to the issue of perspective representations, recent accounts have also suggested that there may be limits on the types of perspective inferences that individuals can generate efficiently enough to influence real-time social responses (Butterfill & Apperly, ; Low, Apperly, Butterfill & Rakoczy, ). In support of these accounts, researchers have found that both children and adults are able to rapidly form inferences about Level I perspective taking (e.g. understanding WHAT another person sees from a different perspective) but show significant delays in reasoning about Level II perspective taking (e.g. understanding HOW another person sees the same item from a conflicting perspective) (Low & Watts, ; Surtees, Butterfill & Apperly, ). Examining whether similar limits exist in communicative perspective taking may provide insight into whether children's ability to integrate perspective taking with spoken language processing is dependent on the complexity of perspective inferences being formed.
How might social experience influence the development of communicative perspective taking? Nilsen and Fecica () have proposed that the development of cognitive abilities associated with communicative perspective taking may, in turn, be dependent on the quality and degree of children's social experience. To date, most research examining the relation between social experience and communicative perspective taking has focused on children's production of spoken utterances. For example, several studies have now shown that corrective feedback from a listener (e.g. requests for clarification) can lead to improvements in preschoolers' production of referential statements (Matthews, Butcher, Lieven & Tomasello, ; Matthews, Lieven & Tomasello, ; Nilsen & Mangal, ) and better detection of referential ambiguity (Robinson & Robinson, , ; Sonnenschein, ). Incentives (i.e. stickers) have similarly been shown to lead to improvements in the accuracy of preschoolers' communicative production, suggesting that experience can influence children's motivation to track and form inferences about a communicative partner's perspective (Varghese & Nilsen, ). More experience engaging in communicative interactions, perhaps through pretend play, may also contribute to children's use of perspective in communication. For example, Roby and Kidd () found that, relative to children without imaginary companions, children with imaginary companions were more likely to produce descriptions that would help their listener identify a target image and to request clarification when interpreting ambiguous descriptions. Thus, social experience appears to influence children's production and, possibly, detection of miscommunicated messages. It remains to be seen, however, whether similar experience and feedback would impact their comprehension of referential statements.
Does communicative perspective taking vary across different social contexts? Related to the question of how social experience may influence the development of communicative perspective taking, it also remains an open question whether children's communicative perspective taking may vary across different social contexts. For example, Moll, Carpenter, and Tomasello () found that toddlers were more likely to conflate their own perspective with that of a co-present adult when engaged in a collaborative social interaction. It is possible that efficiency and accuracy of communicative perspective taking may vary with contextual factors (e.g. cooperative vs. competitive contexts) that could affect both children's motivation and ability to track differences in perspective.
In closing, addressing the considerations described above will further clarify the cognitive and social factors contributing to the development and efficiency of communicative perspective taking, leading to a more comprehensive account of communicative development.