JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

Futureverse AI Innovation

Abstract

Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task’s significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1’s superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency.

Comparison between music generative models

In the following, we compare the output of JEN-1 to the outputs of other state-of-the-art music generative models (MusicGen, MusicLM, Riffusion and Mousai).

Prompt JEN-1 MusicGen MusicLM Riffusion Mousai
A punchy double-bass and a distorted guitar riff
Lofi slow bpm electro chill with organic samples
Smooth jazz, with a saxophone solo, piano chords, and snare full drums
A grand orchestral arrangement with thunderous percussion, epic brass fanfares, and soaring strings, creating a cinematic atmosphere fit for a heroic battle
A dynamic blend of hip-hop and orchestral elements, with sweeping strings and brass
Pop dance track with catchy melodies, tropical percussion, and upbeat rhythms, perfect for the beach
Classic reggae track with an electronic guitar solo
Earthy tones, environmentally conscious, ukulele-infused, harmonic, breezy, easygoing, organic instrumentation, gentle grooves
Violins and synths that inspire awe at the finiteness of life and the universe
80s electronic track with melodic synthesizers, catchy beat and groovy bass
A piano and cello duet playing a sad chambers music
A light and cheerly EDM track, with syncopated drums, aery pads, and strong emotions
Acoustic folk song to play during roadtrips: guitar flute choirs
Rock with saturated guitars, a heavy bass line and crazy drum break and fills

Representation of key musical attributes

Genre
Prompt JEN-1
Moody, melancholy medium-tempo standard jazz song that is good for late-night listening.
Moody, melancholy medium-tempo fusion jazz song that is good for late-night listening
Moody, melancholy medium-tempo blues that is good for late-night listening.
Instrument
Prompt JEN-1
A romantic love song played by a band with a lead piano
A romantic love song played by a band with a lead saxophone
Tempo
Prompt JEN-1
Slow-paced progressive rock instrumental video game music with electric guitars, keyboards, bass and drums
Rapidly-paced progressive rock instrumental video game music with electric guitars, keyboards, bass and drums
Mood
Prompt JEN-1
Funky, bright and happy song with smooth, soulful male vocals.
Romantic song with smooth, soulful male vocals
Era
Prompt JEN-1
Rock music from the 70s, with bass, drum, guitar
Rock music from the 80s, with bass, drum, guitar
Rock music from the 90s, with bass, drum, guitar
Rock music from the 2000s, with bass, drum, guitar

Zero-shot ability

JEN-1 can also generate music from creative, unconventional prompts.

Prompt JEN-1
A funky hip hop song played with Scottish bagpipe
Acoustic guitars are playing heavy metal riffs
A flute plays a salsa song
Standard jazz song sung in acapella
Cartoon theme song sung by cats

Diversity

Prompt Variation 1 Variation 2 Variation 3 Variation 4 Variation 5
A dynamic blend of hip-hop and orchestral elements, with sweeping strings and brass, evoking the vibrant energy of the city
Smooth jazz, with a saxophone solo, piano chords, and snare full drums

Music inpainting (cut and edit)

By masking part of a track, we can generate music that respects the text prompt while remaining consistent with the rest of the track. In the following samples, the mask covers seconds 2-5 of each track.

Prompt Masked Inpainted
Intense, drama, climax, avengers, dark knight, man of steel
Motivational, driving, anthemic, uplifting, positive, underscore
Sad, emotional, dreamy, reflective, ambient, pensive
Upbeat, optimistic, joyous, cheerful, light, happy

Music continuation

We can generate coherent continuations of a track that respect the text prompt. In the following samples, the second half of the track (5-10 seconds) is masked.

Prompt Masked Continued
Intense, drama, climax, avengers, dark knight, man of steel
Motivational, driving, anthemic, uplifting, positive, underscore
Sad, emotional, dreamy, reflective, ambient, pensive
Upbeat, optimistic, joyous, cheerful, light, happy
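
The two masking setups described on this page (seconds 2-5 for inpainting, seconds 5-10 for continuation) can be illustrated with a small sketch that builds a per-sample mask over a 10-second clip. This is only an illustration of which region is hidden, not JEN-1's implementation: the function name `span_mask`, the 48 kHz sample rate, and the convention that 1 marks samples to regenerate are all assumptions made for the example.

```python
def span_mask(duration_s, start_s, end_s, sample_rate=48_000):
    """Return one 0/1 flag per audio sample; 1 marks samples to regenerate.

    Hypothetical helper for illustration. The sample rate is an assumed
    value, not one stated on this page.
    """
    if not 0 <= start_s < end_s <= duration_s:
        raise ValueError("need 0 <= start_s < end_s <= duration_s")
    num_samples = int(round(duration_s * sample_rate))
    start = int(round(start_s * sample_rate))
    end = int(round(end_s * sample_rate))
    # Samples in [start, end) are masked; everything else is kept as context.
    return [1 if start <= i < end else 0 for i in range(num_samples)]


# Inpainting: hide seconds 2-5 of a 10-second clip, keep both sides as context.
inpaint_mask = span_mask(10.0, 2.0, 5.0)

# Continuation: hide the second half (seconds 5-10), keep the first half.
continuation_mask = span_mask(10.0, 5.0, 10.0)
```

Under these assumptions the only difference between the two tasks is where the masked span sits: inpainting leaves known audio on both sides of the gap, while continuation leaves known audio only before it.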