
Animacy, Motion, Emotion and Empathy in Visual Music: Enhancing appreciation of abstracted animation through wordless song


This paper will discuss the exploration of key musical and visual parameters, with the aim of enhancing the appreciation of Abstracted Animation [1] with varying degrees of animacy. A series of animations was created in response to multiple close variations of a song, sung wordlessly. Carefully delineated visual parameters, together with a mapping of the animations' visual-to-audio relationships, afforded insights into key audio-visual intersections and suggested future directions.


Song Series Animacy is a practice-based research project within the area of Visual Music and Abstracted Animation. Nine studies-in-animation were produced as a pilot study to map parameters of animacy, music, vision and the vision-to-audio relationship. The development of three of these studies-in-animation is reflected on here in detail. Conclusions are drawn from the whole study, outlining the most effective parameters for future research.

Figure 1. Song Series Animacy Variations 1 to 9. Images retrieved from Copyright 2015 by Julie Watkins


I position my research and artistic practice within Visual Music. Many attempts have been made to synthesise light and music: through colour organs, through painting and painting onto film, and through oscilloscope representations of sound generation in visual art. Some of these approaches algorithmically transform sonic or visual data into a data set that can generate or manipulate images and sound in real time. Visual Music has been very widely defined and is often undifferentiated from Lumia, Color Music, Synchromy, Audio-Visual Music and even Motion Graphics.

Andrew Hill divides Visual Music into four main subcategories:

  1. A purely visual approach to Visual Music, for example Thomas Wilfred's Lumia, or the works of Kandinsky or Klee.
  2. Visual composition to pre-existing musics such as in some of the early works of Oskar Fischinger.
  3. The composition of sound and image informed by traditions of music in which materials are structured within time. The sound and image are regarded as equal components joined in the context of a work and are both structured musically.
  4. The synthesis of visual materials from sound and the representation of sound visually (2013: 6).

My own research investigates category 2: visual composition to pre-existing musics. I am interested in how prescriptive the visual-to-audio relationship, and the correlation between image and music, need to be in order to connect the two (see Figure 3). Laszlo Moholy-Nagy stated: 'to develop creative possibilities of the sound film the acoustic alphabet of sound writing will have to be mastered; in other words, we must learn to write acoustic sequences on the sound track without having to record real sound' (1947: 277). Norman McLaren created Synchromy using such an acoustic alphabet, placing patterns in the 'sound track area of the film and repeating them in the visual area in colour' (McLaren, 1970).

Animacy: Motion, emotion and empathy

The primate visual system processes colour separately from motion. Margaret Livingstone names the two visual systems 'Where' and 'What' (2002: 51). The 'Where System' processes motion perception, depth perception, spatial organisation and figure/ground segregation, separately from the 'What System', which processes object recognition, face recognition and colour perception. The 'Where System' developed to enable us to negotiate three-dimensional space and to support the fight-or-flight response. At first glance, Abstracted Animation would appear to have little relation to this basic level of response. However, research into the mirror neuron system has shown that we can respond at a basic level to depicted motion.

There is debate over whether embodied empathetic reactions are the same in response to art as they are in real life. David Freedberg and Vittorio Gallese have investigated the mirror neuron system in relation to still images and sculptures and they concluded that this system significantly impacts responses to art. They state:

Automatic empathetic responses constitute a basic level of response to images and to works of art. Underlying such responses is the process of embodied simulation that enables the direct experiential understanding of the intentional and emotional contents of images (2007: 202).

Images containing motion do not have to be representational to be engaging. At the cognitive level, Phillip McAleer and Frank Pollick found that 'perception of animacy can be powerfully determined by motion alone - even when the motions of real people are represented by single points' (2008: 837). Furthermore, it has been demonstrated that abstract animation can appear to be motivated by human emotions: the shapes appear to be alive, animate.

From Disney onwards, character animation has had a tradition of using human actors to model animation closely on dramatic performance and so engage the audience in the story. Matthew Stone, Doug DeCarlo, Insuk Oh, Christian Rodriguez, Adrian Stere, Alyssa Lees and Chris Bregler found: 'Engaging dramatic performances convincingly depict people with genuine emotions and personality' (2004: 1). This approach has been further developed through motion capture, which is widely used today. Abstract animation can draw on the audience's understanding of character animation, and it has its own history of conveying emotion through animated motion. Freedberg and Gallese also state: 'Recent studies in macaques and humans demonstrated that mirror neurons not only underpin action understanding, but they are also involved in understanding the intentions that underlie action' (2007: 200). As John Whitney put it: 'Structured motion begets emotion' (1980: 41).

Motion graphics relies heavily on engaging viewers via anthropomorphism. Animated geometric objects are readily anthropomorphised, as Fritz Heider and Marianne Simmel demonstrated (1944). As Sandra Marshall and Annabel Cohen state: '[I]t is reported typically that a pair of friends (the small triangle and the circle) are antagonised by a "bully" (large triangle) who in failing to achieve his goal takes out his anger by destroying his home' (1988: 99).

Animacy is affected by the relationship of the visuals to a soundtrack, as demonstrated by Julian Thayer and Robert Levenson's study in 1983 and later by Marshall and Cohen's study in 1988. Thayer and Levenson used Heider and Simmel's visuals and compared and contrasted an allegro and an adagio movement from Prokofiev's Fifth Symphony as the soundtrack. They demonstrated a relationship between the connotations of the soundtrack and audience interpretations of the animation. The expressive quality of movement depends on speed, and music and sound affect the perceived speed within a moving image: they underscore dynamism and speed up motion, as well as creating a more dance-like stylisation (Arnheim, 2006: 187).

Marshall and Cohen found that music seems to do more than provide mood for the visual scene. They state:

changing the meaning of the film and its components on the activity dimension does not depend upon changing the overall salience of that dimension in the background music. Changed meaning however, may depend upon perceived temporal congruence between music and the film. [...] [C]ongruence between internal structure of film and music alters the attentional strategy to and subsequent encoding of information in the film [...] the pattern of attention to music alone or film alone is altered under conjoint presentation (1988: 110).

However, physical actions do not map neatly onto meaning: 'there is no one-to-one correspondence between body movements and emotional states. A single emotion category can be connected to several body movements and vice versa' (Morita et al., 2013: 6).

Music Parameters

As Walter Murch states: 'the possibility of reassociation of image and sound is the fundamental stone upon which the rest of the edifice of film sound is built, and without which it would collapse' (in Chion, 1990: xix). For Song Series Animacy the pre-existing musics are traditional songs. The complexity of the human voice and our recognition of vocal quality are brought into play (Bozeman, 2007). All the versions are sung a cappella and wordlessly to emphasise the emotion of the song. Additionally, lyrics would draw too much attention away from the relationship between sound and vision, as viewers would focus on the meaning of the lyric. As Michel Chion states, sound is 'verbocentric, above all' (1990: 6).

Figure 2. Music Parameters for Song Series Animacy Variations 1 to 9 Copyright 2015 by Julie Watkins

Musical parameters have been chosen that elicit emotional, physiological and psychological responses; these include mode, percussive quality and tempo (Zwaag et al., 2011). To encapsulate the main emotive triggers, these are delineated as: (1) mode: pentatonic major or minor; (2) percussive quality: smooth, aspirated or articulated; and (3) tempo: fast, medium or slow, relative to a resting heart rate (Iwanaga, 1995). The meter of the song is not stressed but is subsumed under the melody. This affords a more neutral backdrop for investigating how the image qualities delineated in Figure 5 can be synchronised either to the musically phrased dynamics or to the changing amplitude of six frequencies within the major pentatonic mode of the song.

Mapping Vision to Audio Parameters

As Helmut Leder, Benno Belke, Andries Oeberst and Dorothee Augustin state: 'Since the emergence of abstract art, art has provided objects that are differentiated only on style of depiction rather than content' (2004: 498). Song Series Animacy is a non-representational, non-figurative series of compositions of light that respond to music through rhythm and motion, occupying the liminal space between near-abstraction and abstraction, and between movement and near-stasis. This pilot study aims to refine these parameters. For Variations 1, 4, 7 and 9 the visuals were mapped to the musical dynamic phrasing of the audio variations of the song (these Variations are underlined in Figure 3). In contrast, the other Variations map visual qualities such as opacity, scaling and brightness to the changing amplitude of particular frequencies. The method was to run an audio filter in six passes, each extracting the amplitude data for the frequency of one of the song's pitches. This made it possible to synchronise multiple visual elements.
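The paper does not name the filter used for the six passes. As a minimal sketch, assuming mono audio samples normalised to -1..1 and a 25 fps output (the frame rate implied by the twenty-fifth-of-a-second updates in Variation 6), the Goertzel algorithm, a single-frequency DFT, can stand in for that filter: each pass measures, for every video frame, the amplitude of one pitch frequency. The function names and example frequencies are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def goertzel_amplitude(block: Sequence[float], freq_hz: float, sample_rate: int) -> float:
    """Estimate the amplitude of a sinusoid at freq_hz within one block of samples."""
    n = len(block)
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    # A sinusoid of amplitude A completing whole cycles in the block gives
    # sqrt(power) = A * n / 2, so rescale to recover A.
    return 2.0 * math.sqrt(max(power, 0.0)) / n


def amplitude_envelopes(samples: Sequence[float], sample_rate: int,
                        pitch_freqs: Sequence[float], fps: int = 25) -> Dict[float, List[float]]:
    """One pass per pitch frequency: one amplitude value per video frame."""
    block_len = sample_rate // fps          # samples per 1/fps-second frame
    n_frames = len(samples) // block_len    # a trailing partial frame is dropped
    envelopes: Dict[float, List[float]] = {}
    for freq in pitch_freqs:                # six passes when six pitches are given
        envelopes[freq] = [
            goertzel_amplitude(samples[i * block_len:(i + 1) * block_len], freq, sample_rate)
            for i in range(n_frames)
        ]
    return envelopes
```

For instance, a steady 400 Hz tone of amplitude 0.8 sampled at 8 kHz reads as approximately 0.8 in each frame of the 400 Hz pass and close to zero in a 500 Hz pass. A sung voice drifts in pitch and carries vibrato, so a band-pass filter with some bandwidth around each pitch may track it more robustly than this single-frequency measurement.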

The audio was subtly differentiated by tempo and percussive quality, allowing several wider parameters of the imagery to be tested. Three strands of enquiry into animation have been pursued: edited live action set to the least percussive, smooth songs (Variations 1, 4 and 7); soft, layered light effects set to the aspirated songs (Variations 2, 5 and 8); and abstract geometric animation set to the most percussive, articulated songs (Variations 3 and 6). Variation 9 sets edited live action to a percussive, articulated song, with a looser synch than Variation 1. The relationship between the visual and the music ranges from synchronicity to a loose, metaphorical synch. The results from Variations 1 to 9 are mapped in Figure 3 and contextualised by some seminal examples of wordless animations and visual music.
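As a bookkeeping sketch, not part of the original method, the musical delineation and the three visual strands described above can be encoded as data. The mode and tempo of each Variation appear in Figure 2 and are not reproduced here, so only the full parameter grid is enumerated; with 2 modes, 3 percussive qualities and 3 tempi there are 18 possible combinations, of which the nine Variations can cover at most half. All identifiers are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# The three musical parameters as delineated in the text.
MODES = ("pentatonic major", "pentatonic minor")
PERCUSSIVE_QUALITIES = ("smooth", "aspirated", "articulated")
TEMPI = ("fast", "medium", "slow")

# Every combination the delineation allows: 2 x 3 x 3 = 18.
PARAMETER_GRID = list(product(MODES, PERCUSSIVE_QUALITIES, TEMPI))

# Default visual strand for each percussive quality.
STRAND_FOR_QUALITY = {
    "smooth": "edited live action",
    "aspirated": "soft layered light effects",
    "articulated": "abstract geometric",
}

QUALITY_FOR_VARIATION = {
    1: "smooth", 4: "smooth", 7: "smooth",
    2: "aspirated", 5: "aspirated", 8: "aspirated",
    3: "articulated", 6: "articulated", 9: "articulated",
}

PHRASING_MAPPED = {1, 4, 7, 9}  # the others map to filtered-frequency amplitude


@dataclass(frozen=True)
class Variation:
    number: int
    percussive_quality: str
    visual_strand: str
    synch_source: str


def build_series() -> List[Variation]:
    series = []
    for n in range(1, 10):
        quality = QUALITY_FOR_VARIATION[n]
        # Variation 9 is the exception: live action set to an articulated song.
        strand = "edited live action" if n == 9 else STRAND_FOR_QUALITY[quality]
        synch = "musical dynamic phrasing" if n in PHRASING_MAPPED else "frequency amplitude"
        series.append(Variation(n, quality, strand, synch))
    return series
```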

Figure 3. Mapping Vision to Audio Parameters Copyright 2015 by Julie Watkins

Vision Parameters: Creating abstracted animation

Artworks by James Turrell, Mark Rothko, Agnes Martin, William Turner, Albert Irvin, Piet Mondrian, Wassily Kandinsky and John Whitney have been mapped relative to each other in relation to their degree of tonal contrast, form, texture, gestural marks [2] and degree of mathematical spacing or grid-composition (see Figure 4). These parameters have been reflected on and explored across the nine Variations. The tonal ranges explored were 100%, 75%, 50% and 25%. The animations' visual compositions ranged from formless to textural to gestural to grids (see Figure 5).
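The paper does not define how a tonal range of 100%, 75%, 50% or 25% was produced. One plausible reading, assumed here, is that normalised luminance is linearly compressed into a band of that width centred on mid-grey. The function name and the mid-grey centre are hypothetical.

```python
def compress_tonal_range(luma: float, range_pct: float, centre: float = 0.5) -> float:
    """Map luminance in 0..1 (0 = black, 1 = white) into a band covering
    range_pct percent of the full scale, centred on `centre` where possible."""
    if not 0.0 <= luma <= 1.0:
        raise ValueError("luma must be in 0..1")
    if not 0.0 < range_pct <= 100.0:
        raise ValueError("range_pct must be in (0, 100]")
    span = range_pct / 100.0
    # Shift the band if needed so it stays inside the displayable 0..1 range.
    low = min(max(centre - span / 2.0, 0.0), 1.0 - span)
    return low + luma * span
```

Under this reading, a 25% range confines white to 0.625 and black to 0.375, the middle quarter of the grey scale, while 100% leaves values unchanged.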

Figure 4. Tonal Range Mapped Against Compositional Structure of Seminal Artworks Copyright 2015 by Julie Watkins

Figure 5. Tonal Range Mapped Against Compositional Structure of Variations 1-9 Based on images retrieved from Copyright 2015 by Julie Watkins

Reflections on the development of Variations 4, 5 and 6

Variation 4

Figure 6. Variation 4 Image retrieved from Copyright 2015 by Julie Watkins

Audio-visual phrasing of the sequence involves determining and using 'the primary synch points that are crucial for meaning and dynamics' (Chion, 1990: 190). Reassociation is needed to extend the sound-vision relationship and to add some ambiguity; as Walter Murch puts it, the aim is 'to create a purposeful and fruitful tension between what is on the screen and what is kindled in the mind of the audience - what Chion calls sound en creux (sound "in the gap")' (in Chion, 1990: xix). In Variation 4, the sound-vision relationship is extended in this way. Falling glitter was filmed at high speed to clarify the motion of individual particles within the overall dynamics of each throw, and this footage was then edited to the musical dynamic phrases. Thus the throws of glitter are timed to the dynamic phrases of the music, while the synchronicity of individual particles is random. As Chion states: 'total disorder with no apparent goal is intolerable for human beings. We cannot resist giving it structure and form, a teleology, a shape and direction, even when it itself has none' (Chion, 1990: 211). In Variation 4, the overarching visual phrasing and the abundance of detailed motion give a greater impression of synchronicity.

Variation 5

Figure 7. Variation 5 Image retrieved from Copyright 2015 by Julie Watkins

As James Turrell states: 'You can extend feeling out through the eyes to touch with seeing' (1993: 26). Variation 5 references Turrell's work: like it, it has no object and no particular place to look. The gestural paint-strokes and turbulent motion reference Turner's seascapes. Six animated layers represent the six most dominant audio frequencies after the song was filtered, and the amplitude of each pitch governs the opacity of one layer. The animation has been greatly simplified and smoothed, so that instead of changing on every frame, the shortest change lasts half a second. Turbulence was added to the visuals as a metaphor for breath. Nevertheless, the images and audio were perceived to be overly synchronised: the changes in amplitude of the frequencies and the visual changes matched closely, and the overall result lacked tension. Further layers were therefore added to make the piece more aleatory, to emphasise the musical dynamic phrasing and to add visual depth. This was inspired by the work of Turrell, Cruz-Diez and Turner.
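The text states that each layer takes its opacity from one pitch's amplitude and that, after smoothing, the shortest change lasts half a second, but not how the smoothing was done. A minimal sketch, assuming a per-frame amplitude envelope at 25 fps and simple block averaging as a stand-in for the actual smoothing: the envelope is averaged over fixed blocks and each block is held as one opacity value. Because half a second is 12.5 frames at 25 fps, the sketch rounds up to 13-frame blocks. The function name is hypothetical, and the added turbulence is not modelled.

```python
import math
from typing import List, Sequence


def opacity_track(envelope: Sequence[float], fps: int = 25, min_hold_s: float = 0.5) -> List[float]:
    """Turn a per-frame amplitude envelope into a per-frame layer opacity (0..1)
    that changes at most once every min_hold_s seconds."""
    hold = math.ceil(fps * min_hold_s)  # 13 frames at 25 fps, i.e. 0.52 s
    peak = max(envelope, default=0.0)
    opacity: List[float] = []
    for start in range(0, len(envelope), hold):
        segment = envelope[start:start + hold]
        mean = sum(segment) / len(segment)
        # Normalise against the loudest frame so values stay within 0..1.
        value = mean / peak if peak > 0 else 0.0
        opacity.extend([value] * len(segment))
    return opacity
```

Run once per filtered pitch, this yields six opacity tracks, one per layer. A direct coupling like this is the kind of close amplitude-to-image match the text reports as overly synchronised.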

Variation 6

Figure 8. Variation 6. Image retrieved from Copyright 2015 by Julie Watkins

The background is a warm red-orange, and the rectangles, which represent sung notes, are rendered in the complementary blue. They are mirrored about the x-axis to create a visual metaphor for an open mouth. The mathematical spacing and sizing of the rectangles are influenced by Norman McLaren's 1971 film Synchromy. The lower the frequency, the closer its rectangle is to the central x-axis. The rectangle for the lowest frequency is half the width of the screen, that for the next lowest frequency a quarter of the width, and so on. Every twenty-fifth of a second, the changing amplitude of each of the six filtered pitches determines the height of its visualised note. Particles have been used to give a sense of moving breath. The myriad synch-events flatten the audio-visual phrasing of the sequence.
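The text fixes the main geometric rules for Variation 6: rectangles are mirrored about the central x-axis, lower pitches sit closer to that axis, widths halve from one pitch to the next starting at half the screen width, and heights follow each pitch's amplitude every 1/25 s. The vertical spacing and horizontal alignment are not stated, so this sketch assumes that each pitch owns an equal band above the axis, that its rectangle grows outward from the inner edge of that band, and that rectangles are horizontally centred. Coordinates use a hypothetical origin at the screen centre with y pointing up; `Rect` and `note_rects` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Rect:
    x: float  # left edge (origin at screen centre)
    y: float  # bottom edge (y points up)
    w: float
    h: float


def note_rects(amplitudes: Sequence[float], screen_w: float, screen_h: float) -> List[Rect]:
    """amplitudes: one normalised (0..1) value per pitch, lowest pitch first."""
    if not amplitudes:
        return []
    slot = (screen_h / 2.0) / len(amplitudes)  # equal band per pitch above the axis
    rects: List[Rect] = []
    for k, amp in enumerate(amplitudes):
        w = screen_w / 2 ** (k + 1)            # 1/2, 1/4, 1/8 ... of screen width
        h = max(0.0, min(amp, 1.0)) * slot     # height follows the pitch's amplitude
        x = -w / 2.0                           # horizontally centred
        y = k * slot                           # lower pitches sit nearer the x-axis
        rects.append(Rect(x, y, w, h))         # upper half
        rects.append(Rect(x, -y - h, w, h))    # mirror image below the axis
    return rects
```

Called once per frame at 25 fps with six normalised amplitudes, this returns twelve rectangles, six above the axis and their six mirror images below it.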

Future Direction

This pilot mapping of parameters across the Variations of Song Series Animacy has revealed contrasting affordances, opportunities and challenges for synchronicity, animacy, stasis and motion. Together these form a rich but delineated terrain for exploring visual-audio interest, including animation, and the editing and re-animating of live action into a 'time-collage'.

On reflection, I found that Variations 1, 4 and 7, whose animation was based on musical phrasing, were the most effective for animacy and empathy, and so for enhancing appreciation of abstracted animation through wordless song. The heavily saturated colours and the use of complementary colours had less affordance in this context than the full tonal range. The use of 100% of the tonal range with gestural animation and the time-collage introduced here will be further explored, refined and discussed in a future Series. It is hoped that this reflection and delineation of music and vision parameters will be of use to others researching in a similar area.

Figure 9. Song Series Animacy, Variations 1-9 Images retrieved from Copyright 2015 by Julie Watkins. The animations were originally shown at DRHA Dublin, 1 to 3 September 2015, where a shorter version of this paper was presented.


[1] Abstracted Animation is my own term; it refers to animation that has texture, depth and expressive movement, without overtly representing concrete reality.

[2] Gestural animation refers to Kimon Nicolaides' notion of 'gesture': 'Gesture has no precise edges, no exact shape, no jelled form. The forms are in the act of changing. Gesture is movement in space. To be able to see gesture, you must be able to feel it in your own body' (1988: 15).

Special Thanks to:
Martin Nelson for his singing and Ian Thompson for recording him.


Arnheim, R. (2006) Film as art. Berkeley, CA: University of California Press.

Bozeman, K. (2007) 'A case for voice science in the voice studio', in The Journal of Singing 63/3: 265-270

Chion, M. (1990) Audio-vision: Sound on Screen. New York & Chichester: Columbia University Press

Coates, M. (2007) Dawn Chorus. Online, (accessed on 16/10/15)

Freedberg, D. and Gallese, V. (2007) 'Motion, emotion and empathy in esthetic experience', in TRENDS in Cognitive Sciences 11/5: 197-203

Heider, F. and Simmel, M. (1944) 'An experimental study of apparent behaviour' in American Journal of Psychology, 57: 234-259

Hill, A. (2013) Interpreting Electroacoustic Audio-visual Music, (Doctoral Thesis, De Montfort University, UK) retrieved from on 01/05/2015

Irvin, A. (2012) Rosetta. Acrylic on canvas (152 x 121cm). Gimpel Fils London (last accessed on 16/10/2015)

Iwanaga, M. (1995). 'Relationship between heart rate and preference for tempo of music', in Perceptual and Motor Skills, 81(2): 435-440.

Jacobsen, T. (2002) 'Kandinsky's questionnaire revisited: Fundamental correspondence of basic color and form', in Perceptual and Motor Skills, 95: 903-913.

Jacobsen, T., & Wolsdorff, C. (2007). 'Does history affect aesthetic preference? Kandinsky's teaching of colour-form correspondence, empirical aesthetics, and the Bauhaus', in The Design Journal, 10: 16-27

Kandinsky, W. (1911) Picture with the Circle. Oil on canvas (139 x 111 cm). Georgian National Museum, Tbilisi. (accessed on 16/10/2015)

Kandinsky, W. (1977) Concerning the spiritual in art. New York: Dover Publications

Kenny, P. (2009) Catch a Wave easkey-no-1, exhibited: Paul Kenny, Seaworks 2001-2009, at Beetles + Huxley London (accessed on 16/10/2015)

Leder, H., Belke, B., Oeberst, A. and Augustin, D. (2004) 'A model of aesthetic appreciation and aesthetic judgments', in British Journal of Psychology, 95: 489-508

Le Grice, M. (2009) Experimental Cinema in the Digital Age. London: British Film Institute

Livingstone, M. (2002) Vision and Art: The Biology of Seeing. New York: Harry N. Abrams

Martin, A. (1965) Morning. Acrylic paint and graphite on canvas. (18 x 18 cm). Tate, London. (accessed on 16/10/2015)

McAleer, P., & Pollick, F. E. (2008). 'Understanding intention from minimal displays of human activity', in Behavior Research Methods, 40(3): 830-839

McLaren, N. (1971) Synchromy, The National Film Board of Canada. DVD

Miller, G. (1970) The Eye Hears The Ear Sees: Norman McLaren Film maker BBC & The National Film Board of Canada DVD

Moholy-Nagy, L. (1947) 'Problems of the Modern Film', revised and reprinted in Vision in Motion. Chicago: Paul Theobald, p.277

Morita, J., Nagai, Y. and Moritsu, T. (2013) 'Relations between Body Motion and Emotion: Analysis based on Laban Movement Analysis', in CogSci Proceedings: 1026-1031 (accessed 05/09/2013)

Murch, W. in Chion, M. (1994) Audio-vision: Sound on Screen. New York & Chichester: Columbia University Press

Mondrian, P. (1935) Composition B No.11 with Red. Oil on canvas (80 x 63 cm). Tate, London. (accessed on 16/10/2015)

Nicolaides, K. (1988) The Natural Way to Draw. London: Andre Deutsch

Rothko, M. (1959) Untitled - Mural for End Wall. Oil and mixed media on canvas (265 x 268 cm). National Gallery of Art, Washington (accessed on 16/10/2015)

Stone, M. et al. (2004) 'Speaking with Hands: Creating animated conversational characters from recordings of human performance', in SIGGRAPH (accessed 05/09/2013)

Thayer, J. F. and Levenson, R. W. (1983) 'Effects of music on psychophysiological responses to a stressful film', in Psychomusicology 3: 44-54.

Turner, W. (c.1840) Waves Breaking On A Lee Shore At Margate (Study for 'Rockets and Blue Lights'). Oil paint on canvas (59.7 x 95.2 cm). Tate, London. (accessed on 16/10/2015)

Turrell, J. (2013) Breathing Light, LED light into space, Los Angeles County Museum of Art, (accessed on 16/10/2015)

Turrell, J. (1993) Air Mass. London: The South Bank Centre

Whitney, J. (1980) Digital Harmony: On the Complementarity of Music and Visual Art. Kingsport, TN: Kingsport Press

Zwaag, M. D., Westerink, J. H., & Broek, E. L. (2011). 'Emotional and psychophysiological responses to tempo, mode, and percussiveness', in Musicae Scientiae, 15(2): 250-269.


Julie Watkins is a senior lecturer in Film and Television at the University of Greenwich. She worked as lead creative in prestigious post-production facilities in Soho and Manhattan. She designed concepts and led Technical Direction, Animation, Motion Graphics and Visual Effects teams for commercials, broadcast graphics and films. She taught at New York University. She joined the University of Greenwich in 2006, where she initiated a Film and Television degree and a partnership with the BBC. She has an MA (Distinction) in Graphic Design from University of the Arts London. She has shown work at DRHA 2014 and 2015 and is completing a Ph.D.