Abstract
In video game design, audio (both environmental background music and object
sound effects) plays a critical role. Sounds are typically pre-created assets
designed for specific locations or objects in a game. However, user-generated
content is becoming increasingly popular in modern games (e.g. building custom
environments or crafting unique objects). Because the space of possible
creations is virtually limitless, game creators cannot pre-create audio for all
user-generated content. We explore the use of generative artificial
intelligence to create music and sound effects on the fly based on
user-generated content. We investigate two avenues for audio generation: 1)
text-to-audio: using a text description of user-generated content as input to
the audio generator, and 2) image-to-audio: using a rendering of the created
environment or object as input to an image-to-text generator, then piping the
resulting text description into the audio generator. In this paper, we discuss
the ethical implications of using generative artificial intelligence for
user-generated content and highlight two prototype games where audio is
generated for user-created environments and objects.
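
The two avenues described above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the implementation used in the prototype games: the abstract names no concrete models, so `AudioPipeline`, `TextToAudio`, `ImageToText`, and the two stand-in functions are hypothetical names, and the stand-ins only echo their input so the sketch runs without any real generator.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases: the abstract does not name concrete models.
TextToAudio = Callable[[str], bytes]   # text description -> encoded audio
ImageToText = Callable[[bytes], str]   # rendered image -> text description


@dataclass
class AudioPipeline:
    """Wires the two avenues together around pluggable (assumed) models."""

    text_to_audio: TextToAudio
    image_to_text: ImageToText

    def from_text(self, description: str) -> bytes:
        # Avenue 1: the description goes straight into the audio generator.
        return self.text_to_audio(description)

    def from_image(self, rendering: bytes) -> bytes:
        # Avenue 2: describe the rendering, then reuse avenue 1.
        description = self.image_to_text(rendering)
        return self.from_text(description)


# Stand-in models (assumptions for illustration only).
def fake_image_to_text(rendering: bytes) -> str:
    return f"scene rendered from {len(rendering)} bytes"


def fake_text_to_audio(description: str) -> bytes:
    return description.encode("utf-8")


pipeline = AudioPipeline(fake_text_to_audio, fake_image_to_text)
```

Keeping the image path as a thin wrapper around the text path mirrors the description above, where the image-to-text output is piped into the same audio generator; either model could be swapped without touching the other.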