Experience breakthrough voice AI with Fun Audio Chat, the state-of-the-art 8B-parameter large audio language model released by FunAudioLLM in December 2025.
Download Fun Audio Chat today and deploy industry-leading speech recognition (ASR), audio understanding, and low-latency voice interaction capabilities that rank #1 among similar-scale models on OpenAudioBench and VoiceBench.
Fun Audio Chat is a major advancement in voice AI. Developed by FunAudioLLM and released on December 23, 2025, this Large Audio Language Model packs 8 billion parameters optimized specifically for natural, low-latency voice interactions. At its core, Fun Audio Chat combines a Dual-Resolution Speech Representations (DRSR) architecture with Core-Cocktail training to deliver strong performance across spoken question answering, audio understanding, speech function calling, and voice empathy recognition. The model achieves top rankings among ~8B-parameter models on industry-standard benchmarks including OpenAudioBench, VoiceBench, UltraEval-Audio, MMAU, MMAU-Pro, and MMSU.

What sets Fun Audio Chat apart is its efficient 5Hz frame rate processing, which cuts GPU compute requirements by nearly 50% compared to traditional 12.5Hz or 25Hz audio models while maintaining high speech quality, with a UTMOS score of 4.37 and an ASR Word Error Rate (WER) of just 4.32%. This makes Fun Audio Chat an ideal choice for developers who need production-ready voice AI without excessive infrastructure costs.
Fun Audio Chat employs a sophisticated grouping mechanism that maps 25Hz audio tokens to 5Hz speech representations, enabling the shared LLM backbone to process audio at an ultra-efficient 5Hz frame rate while maintaining 25Hz output quality through a refined head architecture.
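To make the grouping idea concrete, here is a minimal, illustrative PyTorch sketch of mapping 25Hz audio tokens to 5Hz inputs for the shared LLM backbone. The dimensions, the concatenate-then-project grouping, and the class name are assumptions chosen for clarity, not the model's actual implementation; refer to the official repository for the real DRSR code.

```python
# Illustrative sketch only: shapes and the projection below are assumptions.
import torch
import torch.nn as nn

GROUP = 5                      # 25 Hz tokens -> 5 Hz representations (5 tokens per group)
D_AUDIO, D_LLM = 1024, 4096    # assumed embedding widths

class GroupTo5Hz(nn.Module):
    """Concatenate every 5 consecutive 25 Hz audio tokens and project them to one 5 Hz vector."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(GROUP * D_AUDIO, D_LLM)

    def forward(self, audio_tokens: torch.Tensor) -> torch.Tensor:
        # audio_tokens: (batch, T_25hz, D_AUDIO), with T_25hz divisible by GROUP
        b, t, d = audio_tokens.shape
        grouped = audio_tokens.reshape(b, t // GROUP, GROUP * d)   # (batch, T_5hz, 5 * D_AUDIO)
        return self.proj(grouped)                                  # fed to the shared LLM backbone

x = torch.randn(1, 25 * 4, D_AUDIO)    # 4 seconds of 25 Hz audio tokens
print(GroupTo5Hz()(x).shape)           # torch.Size([1, 20, 4096]) -> 5 Hz frames for the LLM
```

The key point is that the backbone attends over one fifth as many positions per second of audio, while the 25Hz detail is recovered later by the refined output head.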
Achieving 76.6% on MMAU and 58.0% on MMAU-Pro, Fun Audio Chat outperforms strong open-source baselines including Kimi-Audio, Audio-Flamingo-3, MiMo-Audio, and Step-Audio2-Mini across comprehensive audio understanding tasks.
Built for real-time voice conversations, Fun Audio Chat delivers natural speech-to-speech interactions with minimal delay, making it perfect for AI assistants, customer service bots, and interactive voice applications requiring human-like responsiveness.
Powered by Core-Cocktail training that preserves strong text LLM capabilities alongside advanced audio processing, Fun Audio Chat supports both English and Mandarin, scoring 3.35 and 3.46 respectively on speech instruction-following benchmarks.
Experience unmatched voice AI performance with Fun Audio Chat - ranking #1 among 8B models on spoken question answering, audio understanding, and speech instruction-following benchmarks while delivering 50% GPU cost savings through efficient architecture.
Get started with Fun Audio Chat in four straightforward steps - from download to deployment:
Download the Fun Audio Chat 8B model from Hugging Face (FunAudioLLM/Fun-Audio-Chat-8B) or ModelScope. You'll also need Fun-CosyVoice3-0.5B-2512 for speech synthesis. Install system dependencies including Python 3.12, PyTorch 2.8.0 with CUDA 12.8 support, torchaudio 2.8.0, and ffmpeg for audio processing.
Clone the FunAudioLLM/Fun-Audio-Chat repository with recursive submodules using 'git clone --recurse-submodules'. Create a dedicated conda environment named FunAudioChat with Python 3.12, activate it, then install PyTorch packages and all requirements. Set your PYTHONPATH to the project root directory to ensure proper module imports.
Configure Fun Audio Chat for your specific use case - whether speech-to-text ASR applications or speech-to-speech voice interaction systems. The efficient 5Hz DRSR architecture cuts computational overhead by roughly 50% while maintaining accuracy. Customize parameters for your target languages (English or Mandarin) and performance requirements.
Launch Fun Audio Chat using the provided inference examples: run 'python examples/infer_s2t.py' for speech-to-text transcription or 'python examples/infer_s2s.py' for full speech-to-speech conversations (a consolidated sketch follows these steps). Enable natural voice interactions with spoken question answering, audio understanding, empathetic responses, and speech function calling in your production applications.
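The sketch below ties the four steps together in Python. It assumes you have already cloned FunAudioLLM/Fun-Audio-Chat and run the script from the repository root; the Fun-CosyVoice3 repo id and the local directory names are assumptions, so defer to the official README where they differ.

```python
# Consolidated setup-and-run sketch (paths and the CosyVoice repo id are illustrative).
import os
import subprocess
from huggingface_hub import snapshot_download

# Step 1: download model weights from Hugging Face
chat_dir = snapshot_download(
    repo_id="FunAudioLLM/Fun-Audio-Chat-8B",
    local_dir="models/Fun-Audio-Chat-8B",
)
tts_dir = snapshot_download(
    repo_id="FunAudioLLM/Fun-CosyVoice3-0.5B-2512",  # assumed repo id for the synthesis model
    local_dir="models/Fun-CosyVoice3-0.5B-2512",
)

# Step 2: make the checkout importable, equivalent to `export PYTHONPATH=$(pwd)`
env = {**os.environ, "PYTHONPATH": os.getcwd()}

# Step 4: run the bundled inference example (swap in examples/infer_s2s.py for speech-to-speech)
subprocess.run(["python", "examples/infer_s2t.py"], env=env, check=True)
```

Running the example scripts directly from your shell works just as well; the subprocess call here only keeps the whole flow in a single file.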
Comprehensive voice AI capabilities powered by 8B parameters and innovative DRSR architecture.
Fun Audio Chat delivers exceptional automatic speech recognition performance with just 4.32% Word Error Rate on benchmark datasets. The ASR system supports both English and Mandarin, using dual-resolution processing to convert speech to text with accuracy that rivals commercial systems while requiring roughly half the GPU compute of higher-frame-rate models.
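For context on the metric, WER counts word-level substitutions, insertions, and deletions against a reference transcript. The snippet below shows the calculation with the third-party jiwer package; the sentences are made up and have nothing to do with Fun Audio Chat's benchmark data.

```python
# Toy WER example with jiwer (pip install jiwer); reference and hypothesis are invented.
import jiwer

reference  = "turn on the living room lights at seven pm"
hypothesis = "turn on the living room light at seven pm"   # one substitution out of nine words
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")       # ~11.11% for this toy pair
```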
Ranks #1 among all ~8B parameter models on OpenAudioBench and VoiceBench for spoken QA tasks. Fun Audio Chat directly processes and answers voice queries without intermediate text conversion, achieving superior performance over Kimi-Audio, Audio-Flamingo-3, and other open-source competitors.
Achieves industry-leading results on MMAU (76.6%), MMAU-Pro (58.0%), and MMSU benchmarks for audio comprehension. Fun Audio Chat analyzes complex audio signals for emotion detection, speaker recognition, acoustic event classification, and contextual understanding across diverse audio modalities beyond just speech.
Integrates voice commands with backend systems, with function-calling capabilities validated on Speech-ACEBench and Speech-BFCL. Fun Audio Chat lets users trigger application functions, database queries, API calls, and smart home controls through natural voice commands with high accuracy and low error rates.
Outperforms GPT-Audio on both Semantics-based Empathy and Paralinguistic-Cue-based Empathy metrics. Fun Audio Chat detects emotional states from tone, pitch, pace, and semantic content, enabling AI assistants to provide contextually appropriate empathetic responses that feel genuinely human and supportive.
Fun Audio Chat's innovative 5Hz frame rate architecture reduces GPU training hours by nearly 50% compared to traditional 12.5Hz, 16.67Hz, or 25Hz models. Requiring only ~24GB GPU memory for inference, the model delivers enterprise-grade performance on consumer hardware while maintaining UTMOS speech quality scores of 4.37.
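A quick back-of-the-envelope comparison shows why the frame rate matters: at 5Hz the LLM backbone sees far fewer audio frames per minute than at 12.5Hz, 16.67Hz, or 25Hz, which shrinks attention compute and the KV cache. The duration below is illustrative.

```python
# Frames the LLM backbone must process for one minute of audio at different frame rates.
DURATION_S = 60
for rate_hz in (25.0, 16.67, 12.5, 5.0):
    frames = round(DURATION_S * rate_hz)
    print(f"{rate_hz:>5} Hz -> {frames:>4} audio frames per minute of speech")
# Shorter sequences mean less attention compute and a smaller KV cache per request.
```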
Real experiences from AI engineers deploying Fun Audio Chat in production applications.
We evaluated multiple audio language models and Fun Audio Chat consistently outperformed on every benchmark. The download from Hugging Face was straightforward, and we had our voice assistant running within hours. The 4.32% WER is phenomenal for our customer service application.
Sarah Martinez
Senior AI Engineer, VoiceTech Solutions
The FunAudioLLM team's DRSR architecture is genuinely innovative. Achieving top rankings on OpenAudioBench and VoiceBench while using 50% fewer GPU hours is remarkable. Fun Audio Chat has become our go-to model for speech research projects.
David Chen
ML Research Scientist, Stanford AI Lab
Fun Audio Chat's audio understanding capabilities exceeded our expectations. The 76.6% MMAU score isn't just a number - we see real-world improvements in emotion detection and context awareness. Our users consistently praise the natural, empathetic voice interactions.
Emma Williams
Product Lead, SpeechAI Startup
Integrating Fun Audio Chat was simpler than expected thanks to excellent documentation and inference examples. The speech-to-speech capability works flawlessly for our telemedicine platform. Patients appreciate the low-latency, natural conversations with our AI assistant.
Michael Brown
Full-Stack Developer, HealthTech Corp
We needed commercial-friendly licensing and Fun Audio Chat's Apache 2.0 license was perfect. The speech function calling works reliably for triggering backend workflows through voice commands. Running on just 24GB GPU memory keeps our infrastructure costs reasonable.
Lisa Anderson
AI Solutions Architect, Enterprise Software
After comparing Fun Audio Chat with Kimi-Audio and Baichuan-Audio, the choice was clear. Superior performance on spoken QA benchmarks plus multilingual support for English and Mandarin made it ideal for our international customer base. Deployment was production-ready from day one.
James Taylor
CTO, Global Contact Center Platform
Subscribe to receive tutorials, benchmark results, model updates, and implementation tips for Fun Audio Chat and FunAudioLLM technology.
Everything you need to know about deploying and using Fun Audio Chat for voice AI applications.
Need more help? Contact our support team
Download the #1 ranked voice AI model and transform your applications with natural speech interactions.