HeartMuLa: A Family of Open-Source Music Foundation Models


TL;DR

HeartMuLa is a family of open-source music foundation models for music generation workflows. The 3B generator supports music generation from lyrics and comma-separated style tags, with multilingual lyric support across English, Chinese, Japanese, Korean, and Spanish. The model family also includes HeartCodec for music coding, HeartTranscriptor for lyrics transcription, and HeartCLAP for audio-text alignment.

What HeartMuLa Is

HeartMuLa, now listed on AIOZ AI, is a music foundation model family designed for lyrics-conditioned music generation and related audio-language workflows.

The family includes a music language model for generating audio from lyrics and tags, a music codec, a lyrics transcription model, and an audio-text alignment model.

How the Generation Workflow Works

HeartMuLa’s workflow is based on text-driven control. Builders provide lyrics and comma-separated style tags, then adjust generation parameters to shape the output.

The workflow supports inputs including:

  1. Lyrics or descriptive text for audio generation
  2. Style, mood, or instrumentation tags
  3. Maximum duration setting
  4. Sampling parameters such as temperature and top-k
  5. CFG scale to control how strongly the model follows the lyrics and tags
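These inputs can be collected into a single request object before they are passed to the generator. The sketch below is a minimal Python illustration that assumes such a wrapper exists in your own code. The class name `GenerationRequest`, its field names, and its default values are hypothetical placeholders, not HeartMuLa's actual API. Only the 30 to 240 second duration bound comes from the model page.

```python
from dataclasses import dataclass


# Hypothetical request container: field names and defaults are illustrative
# placeholders, not HeartMuLa's real interface.
@dataclass
class GenerationRequest:
    lyrics: str
    tags: str  # comma-separated style tags, e.g. "pop, upbeat, piano"
    max_duration_s: int = 30
    temperature: float = 1.0
    top_k: int = 50
    cfg_scale: float = 1.5

    def tag_list(self) -> list[str]:
        """Split the comma-separated tag string, trimming spaces and dropping empty entries."""
        return [t.strip() for t in self.tags.split(",") if t.strip()]

    def validate(self) -> None:
        """Raise ValueError if a field falls outside an accepted range."""
        # 30-240 s is the duration range listed on the model page.
        if not 30 <= self.max_duration_s <= 240:
            raise ValueError("max_duration_s must be within 30-240 seconds")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")


# Usage: messy tag strings are normalized before they reach the model.
req = GenerationRequest(lyrics="[verse] city lights", tags=" pop, upbeat ,, piano ")
req.validate()
print(req.tag_list())
```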

The output is an .mp3 audio file. The maximum duration setting accepts values from 30 to 240 seconds, so the same workflow covers both short clips and longer generation tests.
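To estimate what a longer duration costs, you can convert seconds into codec frames. This sketch uses HeartCodec's stated 12.5 Hz rate and assumes the generator produces one autoregressive step per codec frame. That one-step-per-frame mapping is an assumption: if each frame carries several codebook tokens, the real token count is a multiple of this figure.

```python
import math

CODEC_FRAME_RATE_HZ = 12.5  # HeartCodec rate as described for the model family
MIN_DURATION_S, MAX_DURATION_S = 30, 240  # listed duration range


def frame_budget(max_duration_s: float) -> int:
    """Codec frames covering max_duration_s, clamped to the listed range.

    Assumes one generated step per 12.5 Hz codec frame (an assumption,
    not a documented HeartMuLa detail).
    """
    clamped = min(max(max_duration_s, MIN_DURATION_S), MAX_DURATION_S)
    return math.ceil(clamped * CODEC_FRAME_RATE_HZ)


print(frame_budget(30), frame_budget(240))
```

At 12.5 Hz, the 30 second minimum corresponds to 375 frames and the 240 second maximum to 3,000 frames.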


Core Capabilities

  • Text-to-audio music generation from lyrics and tags
  • Multilingual lyric support across English, Chinese, Japanese, Korean, and Spanish
  • Style guidance through comma-separated tags
  • Adjustable output duration from 30 to 240 seconds
  • Generation control through temperature, top-k, and CFG scale
  • .mp3 audio output for generated results
  • Related model-family components for music coding, lyrics transcription, and audio-text alignment
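Temperature and top-k usually interact the same way in autoregressive samplers. Top-k discards all but the k most likely candidates, and temperature flattens (above 1) or sharpens (below 1) the distribution over the candidates that remain. The following is a generic pure-Python sketch of that technique, not HeartMuLa's own sampling code. The function name `sample_top_k` is hypothetical.

```python
import math
import random


def sample_top_k(logits: list[float], temperature: float, top_k: int, rng: random.Random) -> int:
    """Return one token index using top-k filtering plus temperature-scaled softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(top_k, len(logits)))
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax over survivors; subtract the max for numerical stability.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights itself.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
print(sample_top_k([0.1, 3.0, 2.0], temperature=0.8, top_k=2, rng=rng))
```

With `top_k=1` the sampler always returns the highest-scoring index, whatever the temperature. Raising `top_k` or the temperature trades consistency for variety.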

Key Technical Details

The HeartMuLa generator belongs to an open-source music foundation model family with four components: HeartMuLa, HeartCodec, HeartTranscriptor, and HeartCLAP.

Key technical details include:

  • Model: HeartMuLa
  • Model type: Text-to-Audio
  • Generator size: 3B parameters
  • Frameworks/formats: PyTorch, Safetensors, Diffusers
  • Input controls: lyrics, tags, maximum duration, temperature, top-k, and CFG scale
  • Output format: .mp3 audio file
  • Duration range: 30 to 240 seconds
  • Music codec: HeartCodec
  • Lyrics transcription: HeartTranscriptor
  • Audio-text alignment: HeartCLAP
  • License: Apache-2.0
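The CFG scale among the input controls typically refers to classifier-free guidance. A common formulation for token models runs the model once with conditioning and once without, then extrapolates between the two sets of logits. The sketch below shows that generic formula. Whether HeartMuLa applies guidance exactly this way is an assumption, and `apply_cfg` is a hypothetical helper name.

```python
def apply_cfg(cond_logits: list[float], uncond_logits: list[float], cfg_scale: float) -> list[float]:
    """Generic classifier-free guidance: uncond + scale * (cond - uncond)."""
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


print(apply_cfg([2.0, 0.0], [1.0, 1.0], cfg_scale=3.0))
```

A scale of 1.0 returns the conditional logits unchanged, 0.0 returns the unconditional ones, and larger values push the output further toward the lyrics and tags.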

HeartCodec is a music codec that operates at a 12.5 Hz frame rate while maintaining high reconstruction fidelity. HeartTranscriptor is a lyrics transcription model, and HeartCLAP establishes a unified embedding space for music descriptions, enabling cross-modal retrieval.
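Cross-modal retrieval in a shared embedding space usually reduces to ranking candidates by cosine similarity. This toy sketch assumes you already have a text embedding and a set of audio embeddings produced by HeartCLAP. The vectors in the usage example are made-up placeholders, and no HeartCLAP API is called.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-length vector")
    return dot / (na * nb)


def rank_by_text(text_emb: list[float], audio_embs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank audio clips by similarity to a text query, best match first."""
    scores = [(name, cosine(text_emb, emb)) for name, emb in audio_embs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder 2-D embeddings; real embeddings would have many more dimensions.
clips = {"clip_a": [0.9, 0.1], "clip_b": [0.0, 1.0], "clip_c": [-1.0, 0.0]}
print(rank_by_text([1.0, 0.0], clips))
```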

Where It Fits Best

Practical use cases include:

  • Lyrics-to-music generation
  • Multilingual music generation workflows
  • Style-guided music prototyping
  • Background music generation for creator workflows
  • Audio-text alignment research
  • Cross-modal music retrieval
  • Lyrics transcription workflows
  • Music generation experiments using adjustable sampling parameters

Download HeartMuLa on AIOZ AI

Start with a focused music generation task. Prepare a lyric set, define style or mood tags, choose the maximum duration, and adjust parameters such as temperature, top-k, and CFG scale.

Then evaluate how well the output follows the lyrics, reflects the selected style tags, and fits your target duration and workflow needs.
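One way to structure that evaluation is a small grid over the sampling parameters: generate one clip per combination with the same lyrics and tags, then compare the results. The helper below is a hypothetical sketch. The parameter values are arbitrary examples, not recommended settings, and the label format is only a naming suggestion for the output files.

```python
from itertools import product


def sweep_grid(temperatures: list[float], top_ks: list[int], cfg_scales: list[float]) -> list[dict]:
    """Every combination of the given sampling settings, each with a file-friendly label."""
    return [
        {"temperature": t, "top_k": k, "cfg_scale": c, "label": f"t{t}_k{k}_cfg{c}"}
        for t, k, c in product(temperatures, top_ks, cfg_scales)
    ]


for run in sweep_grid([0.8, 1.0], [50, 100], [1.5, 3.0]):
    print(run["label"])
```

Keeping the lyrics and tags fixed across the grid makes it easier to attribute differences in lyric adherence or style to a single parameter change.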

Download HeartMuLa on AIOZ AI and evaluate how it fits your own AI music generation setup.

FAQ

Q1: What is HeartMuLa used for?

It is used for music generation workflows, including lyrics-conditioned music generation, multilingual lyric workflows, and style-guided music prototyping.

Q2: Which languages does HeartMuLa support for lyrics?

The 3B generator supports multilingual lyrics across English, Chinese, Japanese, Korean, and Spanish.

Q3: How long can HeartMuLa generate audio?

The maximum duration setting on the AIOZ AI model page accepts values from 30 to 240 seconds.

Q4: What components are included in the HeartMuLa family?

The family includes HeartMuLa for music generation, HeartCodec for music coding, HeartTranscriptor for lyrics transcription, and HeartCLAP for audio-text alignment.