Creative AI Across Modalities

Abstract

Over the past few years, we have witnessed eye-opening generation results from AI foundation models such as GPT-3 and DALL-E 2. These models have laid the groundwork for new types of creative generation across modalities such as language (e.g., story generation), images (e.g., text-to-image generation, fashion design), and audio (e.g., lyrics-to-music generation). Researchers in these fields encounter many similar challenges: how to use AI to help professional creators, how to evaluate the creativity of an AI system, how to boost the creativity of AI, and how to avoid negative social impact. Various workshops have focused on individual aspects of AI generation. This workshop aims to bring together researchers and practitioners from NLP, computer vision, music, ML, and other computational fields for the 1st workshop on “Creative AI across Modalities”.

Invited Speakers

Snigdha Chaturvedi
(UNC)

Chris Donahue
(Google Magenta)

Andrew Owens
(UMich)

Niki Kittur
(CMU)

Mark Riedl
(Georgia Tech)

Diyi Yang
(Stanford)

Aaron Hertzmann
(Adobe Research)

Accepted Papers

Photong: Generating 16-Bar Melodies from Images By: Yanjia Zhang, Haohan Wang
Large Language Models Learn to Drum By: Li Zhang, Chris Callison-Burch
Blind Judgement: Agent-Based Supreme Court Modelling with GPT By: Sil Hamilton
A Friendly Face: Do Text-to-Image Systems Rely on Stereotypes when the Input is Under-Specified? By: Kathleen C. Fraser, Isar Nejadgholi, Svetlana Kiritchenko
Spiking ConvLSTM for Semantic Music Generation By: Anna Shvets
Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing By: Tuhin Chakrabarty, Vishakh Padmakumar, He He
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Tune Generation Task By: Shangda Wu, Maosong Sun
Towards Grounded Dialogue Generation in Video Game Environments By: Nader Akoury, Ronan Salz, Mohit Iyyer
Improving the Creativity of Generative Language Models By: Douglas Summers-Stay, Clare R. Voss, Stephanie M. Lukin
Is AI Art Another Industrial Revolution in the Making? By: Alexis Newton, Kaustubh Dhole
Music Playlist Title Generation Using Artist Information By: Haven Kim, Seungheon Doh, Junwon Lee, Juhan Nam
3DStyleMerge: Part-Compatible 3D Style Transfer By: Abhinav Upadhyay, Alpana Dubey, Suma Mani Kuriakose
Color Me Intrigued: Quantifying Usage of Colors in Fiction By: Siyan Li
Trash to Treasure: Using text-to-image models to inform the design of physical artefacts By: Amy Smith, Hope Schroeder, Ziv Epstein, Mike Cook, Simon Colton, Andrew Lippman
Unsupervised Melody-Guided Lyrics Generation
Exploiting Multiple Guidance From 3DMM For Face Reenactment By: Huayu Zhang, Yurui Ren, Yuanqi Chen, Ge Li, Thomas H. Li
Neural Story Planning By: Anbang Ye, Christopher Zhang Cui, Taiwei Shi, Mark Riedl
Leveraging Human Preferences to Master Poetry By: Rafael Pardinas, Gabriel Huang, David Vazquez, Alexandre Piché
SEE&TELL: Controllable Narrative Generation from Images By: Stephanie M. Lukin, Sungmin Eum
Learning the Visualness of Text Using Large Vision-Language Models By: Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova
SketchBetween: Video-to-Video Synthesis for Sprite Animation via Sketches By: Dagmar Lukka Loftsdóttir, Matthew Guzdial
Threshold Designer Adaptation: Improved Adaptation for Designers in Co-creative Systems By: Emily Halina, Matthew Guzdial
Simple Unsupervised Image Captioning via CLIP’s Multimodal Embeddings By: Derek Tam, Colin Raffel, Mohit Bansal
Culturally-Aware Stable Diffusion: Supporting Cultural Representation in Text-to-Image Synthesis By: Zhixuan Liu, Peter Schaldenbrand, Youeun Shin, Beverley-Claire Okogwu, Youngsik Yun, Jihie Kim, Jean Oh
Robot Synesthesia: A Sound and Semantics Guided AI Painter By: Vihaan Misra, Peter Schaldenbrand, Jean Oh
Deep Generative Multimedia Children's Literature By: Matthew Lyle Olson
Diffusion Models as Visual Reasoners By: Jason Lin, Maya Srikanth
In BLOOM: Evaluating Creativity and Affinity in Artificial Lyrics and Art By: Evan Crothers, Herna L. Viktor, Nathalie Japkowicz
Conveying the Predicted Future to Users: A Case Study of Story Plot Prediction By: Chieh-Yang Huang, Saniya Naphade, Kavya Laalasa Karanam, Ting-Hao Huang
A Tool for Composing Music via Graphic Scores in the style of Gyorgy Ligeti's Artikulation using Self-supervised Representation Learning By: Berker Banar, Simon Colton
One Artist’s Personal Reflections on Methods and Ethics of Creating Mixed Media Artificial Intelligence Art By: Jane Adams

Schedule

In-person participants: please go to Room 146B for all talks and the poster session.

08:50 am - 09:00 am (EST)	Introduction and Opening Remarks
09:00 am - 09:45 am (EST)	Invited Talk	Andrew Owens (UMich)	Title: Cross-modal Synthesis with Sight, Sound, and Touch
09:45 am - 10:30 am (EST)	Invited Talk	Mark Riedl (Georgia Tech)	Title: Computers, Creativity, and Lovelace Abstract: In this talk we examine what attributes we should expect in human-level creative systems, and the mechanisms by which we might achieve them. I provide examples from the domain of automated story generation. I conclude the talk with some informal analysis of recent progress toward AI systems that express creativity.
10:30 am - 10:40 am (EST)	Break
10:40 am - 11:25 am (EST)	Invited Talk	Chris Donahue (Google)	Title: Frontiers in Controllable Music Generation Abstract: For music generation and creative generation more broadly, control is key to unlocking human expression. In this talk, I will discuss the recent improvements in and remaining obstacles to building controllable music generation systems that unlock exciting new expressive capabilities for musicians and non-musicians alike. Additionally, I will discuss control considerations that are more specific to music and argue that text is useful but not sufficient for expressive musical control. As a case study, I will discuss SingSong, a recent system from the MusicLM project at Google which learns to translate vocal performances into instrumental accompaniments, thereby allowing anyone to create rich music featuring their own voice.
11:25 am - 12:10 pm (EST)	Invited Talk	Niki Kittur (CMU)	Title: Scaling Analogical Innovation
12:10 pm - 01:30 pm (EST)	Lunch
01:30 pm - 02:50 pm (EST)	Poster Session (virtual + in person)
02:50 pm - 03:35 pm (EST)	Invited Talk	Aaron Hertzmann (Adobe)	Title: Can Computers Create Art? Abstract: Can AI algorithms make art, and be considered artists? Within the past decade, the growth of new neural network algorithms has enabled exciting new artforms with considerable public interest, including DeepDream, GANs, VAEs, and diffusion models like DALL-E and Imagen. These tools raise recurring questions about their status as creators and their effect on the arts. In this talk, I will discuss how these developments parallel the development of previous artistic technologies, like oil paint, photography, and traditional computer graphics. I argue that art is a social phenomenon, and discuss possible, but very unlikely, scenarios for when these algorithms could someday be considered artists.
03:35 pm - 04:20 pm (EST)	Invited Talk	Snigdha Chaturvedi (UNC)	Title: Modeling People in Automatic Story Generation Abstract: Automatic story generation is the task of designing NLP systems that, given a prompt, can produce the text of a story. Most methods for this problem focus on modeling events and their coherence. However, an alternate perspective to story generation can be from the viewpoint of people described in the story. In this talk, I focus on one aspect of modeling people in story generation -- modeling their social relationships. I describe our story generation approach to incorporate a desired social network demonstrating relationships between various people to be mentioned in the story. We propose a model that uses latent variables to incorporate social relationships. Apart from generating coherent stories that reflect the desired social network, the latent variable-based design results in an explainable generation process.
04:20 pm - 04:30 pm (EST)	Break
04:30 pm - 05:15 pm (EST)	Invited Talk	Diyi Yang (Stanford)	Title: Improving Everyday Interaction through Human-Centered Text Generation Abstract: As natural language generation has gained popularity and produced extensive industrial applications, there has been an increasing focus on enabling the use of natural language in human-like interactions. How can we improve such everyday interactions and build language generation systems that are more aware of human factors? In this talk, we take a closer look at human-centric language generation and present two recent works that promote positive language use and summarize daily conversations. Specifically, the first part examines positive reframing by neutralizing a negative point of view and generating a more positive perspective without contradicting the original meaning. The second part demonstrates how more structures of conversations can be utilized to generate better summaries for everyday conversation.
05:15 pm - 05:25 pm (EST)	Closing Remarks