May 28, 2026

HeartMuLa: A Family of Open-Source Music Foundation Models

TL;DR

HeartMuLa is a family of open-source music foundation models. Its 3B generator produces music from lyrics and comma-separated style tags, with lyric support for English, Chinese, Japanese, Korean, and Spanish. The family also includes HeartCodec for music coding, HeartTranscriptor for lyrics transcription, and HeartCLAP for audio-text alignment.

What HeartMuLa Is

HeartMuLa, now listed on AIOZ AI, is a music foundation model family designed for lyrics-conditioned music generation and related audio-language workflows.

The family includes a music language model for generating audio from lyrics and tags, a music codec, a lyrics transcription model, and an audio-text alignment model.

How the Generation Workflow Works

HeartMuLa’s workflow is based on text-driven control. Builders provide lyrics and comma-separated style tags, then adjust generation parameters to shape the output.

The workflow supports inputs including:

Lyrics or descriptive text for audio generation
Style, mood, or instrumentation tags
Maximum duration setting
Sampling parameters such as temperature and top-k
CFG scale to control how strongly the model follows the lyrics and tags

The output is an .mp3 audio file with a listed duration range of 30 to 240 seconds (half a minute to four minutes), so the same workflow covers both short clips and longer generation tests.
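The inputs listed above can be gathered into one validated structure before a generation run. The following is a minimal sketch, not the AIOZ AI or HeartMuLa API: the class name, field names, and default values are hypothetical placeholders, and only the 30 to 240 second range and the comma-separated tag format come from the model listing.

```python
from dataclasses import dataclass, field

# Duration bounds taken from the listed range; everything else is illustrative.
MIN_DURATION_S = 30
MAX_DURATION_S = 240


@dataclass
class GenerationRequest:
    """Hypothetical container for the inputs the workflow accepts."""
    lyrics: str
    tags: list[str] = field(default_factory=list)
    max_duration_s: int = 30
    temperature: float = 1.0   # placeholder default, not a documented value
    top_k: int = 50            # placeholder default, not a documented value
    cfg_scale: float = 1.5     # placeholder default, not a documented value

    def __post_init__(self) -> None:
        if not self.lyrics.strip():
            raise ValueError("lyrics must not be empty")
        if not MIN_DURATION_S <= self.max_duration_s <= MAX_DURATION_S:
            raise ValueError(
                f"max_duration_s must be between {MIN_DURATION_S} and {MAX_DURATION_S}"
            )
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")

    def tags_string(self) -> str:
        """Join the style tags into the comma-separated form the model expects."""
        return ",".join(tag.strip() for tag in self.tags if tag.strip())


request = GenerationRequest(
    lyrics="City lights are fading slow",
    tags=["pop", "piano", "warm"],
    max_duration_s=60,
)
print(request.tags_string())
```

Validating up front keeps out-of-range durations or empty lyrics from reaching the model, which makes failed runs easier to tell apart from poor generations.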

Core Capabilities

Text-to-audio music generation from lyrics and tags
Multilingual lyric support across English, Chinese, Japanese, Korean, and Spanish
Style guidance through comma-separated tags
Adjustable output duration from 30 to 240 seconds
Generation control through temperature, top-k, and CFG scale
.mp3 audio output for generated results
Related model-family components for music coding, lyrics transcription, and audio-text alignment
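To make the three generation controls concrete, here is a generic sketch of how CFG scale, top-k, and temperature are commonly combined when sampling one token from an autoregressive model. It illustrates the standard techniques only; the source does not describe HeartMuLa's internal sampling code, and every function name here is hypothetical.

```python
import math
import random


def apply_cfg(cond_logits, uncond_logits, cfg_scale):
    """Classifier-free guidance: push logits away from the unconditioned
    prediction and toward the lyrics/tag-conditioned one."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def top_k_probs(logits, top_k, temperature):
    """Keep the top_k highest logits, then apply a temperature softmax."""
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return {i: w / total for i, w in zip(kept, weights)}


def sample_token(cond_logits, uncond_logits, cfg_scale, top_k, temperature, rng):
    """Draw one token index from the guided, filtered distribution."""
    guided = apply_cfg(cond_logits, uncond_logits, cfg_scale)
    probs = top_k_probs(guided, top_k, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random(0)
token = sample_token(
    cond_logits=[0.0, 5.0, 1.0],
    uncond_logits=[0.0, 0.0, 0.0],
    cfg_scale=1.5,
    top_k=1,
    temperature=1.0,
    rng=rng,
)
print(token)
```

In this formulation a CFG scale of 1.0 reproduces the conditioned logits unchanged, larger values follow the lyrics and tags more strongly, a lower temperature sharpens the distribution, and a smaller top-k restricts sampling to fewer candidates.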

Key Technical Details

HeartMuLa is part of an open-source music foundation model family that includes HeartMuLa, HeartCodec, HeartTranscriptor, and HeartCLAP.

Key technical details include:

Model: HeartMuLa
Model type: Text-to-Audio
Generator version: 3B
Frameworks/formats: PyTorch, Safetensors, Diffusers
Input controls: lyrics, tags, maximum duration, temperature, top-k, and CFG scale
Output format: .mp3 audio file
Duration range: 30 to 240 seconds
Music codec: HeartCodec
Lyrics transcription: HeartTranscriptor
Audio-text alignment: HeartCLAP
License: Apache-2.0

HeartCodec is a music codec that operates at a low 12.5 Hz frame rate while keeping high reconstruction fidelity. HeartTranscriptor is a lyrics transcription model, and HeartCLAP establishes a unified embedding space for music descriptions and cross-modal retrieval.
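As a rough worked example of what a 12.5 Hz codec means for sequence length, the sketch below counts codec frames across the listed duration range. It assumes that 12.5 Hz refers to the number of codec frames per second of audio; the number of tokens per frame is not stated in the listing and is left out. The function name is hypothetical.

```python
import math

CODEC_FRAME_RATE_HZ = 12.5  # assumed to mean codec frames per second of audio


def codec_frames(duration_s: float, frame_rate_hz: float = CODEC_FRAME_RATE_HZ) -> int:
    """Number of codec frames needed to cover duration_s seconds of audio."""
    return math.ceil(duration_s * frame_rate_hz)


for seconds in (30, 120, 240):
    print(f"{seconds:>3} s -> {codec_frames(seconds)} frames")
```

Under that assumption, a 30-second clip spans 375 frames and a 240-second track spans 3,000 frames, which gives a sense of why a low frame rate helps with longer generations.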

Where It Fits Best

Practical use cases include:

Lyrics-to-music generation
Multilingual music generation workflows
Style-guided music prototyping
Background music generation for creator workflows
Audio-text alignment research
Cross-modal music retrieval
Lyrics transcription workflows
Music generation experiments using adjustable sampling parameters

Download HeartMuLa on AIOZ AI

Start with a focused music generation task. Prepare a lyric set, define style or mood tags, choose the maximum duration, and adjust parameters such as temperature, top-k, and CFG scale.

Then evaluate how well the output follows the lyrics, reflects the selected style tags, and fits your target duration and workflow needs.
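One way to organize this evaluation is a small parameter sweep with the lyrics and tags held fixed. The sketch below only builds the list of run configurations; the helper name and the specific values are hypothetical placeholders, and the call that actually generates audio is left out because it depends on your setup.

```python
from itertools import product


def build_sweep(temperatures, top_ks, cfg_scales, max_duration_s=30):
    """Return one config dict per combination of sampling parameters."""
    return [
        {
            "temperature": temperature,
            "top_k": top_k,
            "cfg_scale": cfg_scale,
            "max_duration_s": max_duration_s,
        }
        for temperature, top_k, cfg_scale in product(temperatures, top_ks, cfg_scales)
    ]


# Illustrative values only; pick ranges that suit your own tests.
runs = build_sweep(temperatures=[0.8, 1.0], top_ks=[20, 50], cfg_scales=[1.5, 3.0])
for index, run in enumerate(runs):
    print(index, run)
```

Changing one parameter at a time across otherwise identical runs makes it easier to attribute differences in lyric adherence or style to a specific setting.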

Download HeartMuLa on AIOZ AI and evaluate how it fits your own AI music generation setup.

FAQ

Q1: What is HeartMuLa used for?

It is used for music generation workflows, including lyrics-conditioned music generation, multilingual lyric workflows, and style-guided music prototyping.

Q2: Which languages does HeartMuLa support for lyrics?

The 3B generator supports multilingual lyrics across English, Chinese, Japanese, Korean, and Spanish.

Q3: How long can HeartMuLa generate audio?

The listed duration range is 30 to 240 seconds, based on the maximum duration setting on the AIOZ AI model page.

Q4: What components are included in the HeartMuLa family?

The family includes HeartMuLa for music generation, HeartCodec for music coding, HeartTranscriptor for lyrics transcription, and HeartCLAP for audio-text alignment.