
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System




👉🏻 IndexTTS 👈🏻

[HuggingFace Demo] [ModelScope Demo]
[Paper] [Demos]

IndexTTS is a GPT-style text-to-speech (TTS) model based mainly on XTTS and Tortoise. It can correct the pronunciation of Chinese characters using pinyin and can control pauses at any position through punctuation marks. We enhanced multiple modules of the system, including an improved speaker-condition feature representation and the integration of BigVGAN2 for better audio quality. Trained on tens of thousands of hours of data, our system achieves state-of-the-art performance, outperforming popular TTS systems such as XTTS, CosyVoice2, Fish-Speech, and F5-TTS.

To try IndexTTS, please contact [email protected] for more detailed information.

Contact

QQ group (group 2): 1048202584
Discord: https://discord.gg/uT32E7KDmy
Résumés: [email protected]
Everyone is welcome to join the discussion!

📣 Updates

🖥️ Method

An overview of IndexTTS is shown in the following figure.

The main improvements and contributions are summarized as follows:

Model Download

| 🤗HuggingFace | ModelScope |
|---|---|
| IndexTTS | IndexTTS |
| 😁IndexTTS-1.5 | IndexTTS-1.5 |

📑 Evaluation

Word Error Rate (WER, %) results for IndexTTS and baseline models on the seed-test

| Model | test_zh | test_en | test_hard |
|---|---|---|---|
| Human | 1.26 | 2.14 | - |
| SeedTTS | 1.002 | 1.945 | 6.243 |
| CosyVoice 2 | 1.45 | 2.57 | 6.83 |
| F5TTS | 1.56 | 1.83 | 8.67 |
| FireRedTTS | 1.51 | 3.82 | 17.45 |
| MaskGCT | 2.27 | 2.62 | 10.27 |
| Spark-TTS | 1.2 | 1.98 | - |
| MegaTTS 3 | 1.36 | 1.82 | - |
| IndexTTS | 0.937 | 1.936 | 6.831 |
| IndexTTS-1.5 | 0.821 | 1.606 | 6.565 |
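WER measures how many reference tokens the ASR transcript of the synthesized speech gets wrong: WER = (S + D + I) / N. Here S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment, and N is the number of reference tokens. The sketch below computes that formula with a standard Levenshtein dynamic program. It is a generic illustration, not the seed-test scoring code, which also involves an ASR model and text normalization that this page does not describe. The function name `error_rate` is hypothetical. It accepts any token list, so it can score words (`text.split()`) or characters (`list(text)`), which is a common choice for Chinese.

```python
from typing import Sequence


def error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (S + D + I) / N using token-level Levenshtein distance."""
    if len(reference) == 0:
        raise ValueError("reference must contain at least one token")
    # prev[j]: edit distance between the reference tokens processed so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_tok
                curr[j - 1] + 1,     # insertion of hyp_tok
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(reference)


ref = "the cat sat on the mat"
hyp = "the cat sit on mat"
# 1 substitution + 1 deletion over 6 reference words -> WER = 33.3%
print(f"WER = {100 * error_rate(ref.split(), hyp.split()):.1f}%")
```

The tables report this ratio multiplied by 100.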

Word Error Rate (WER, %) results for IndexTTS and baseline models on other open-source test sets

| Model | aishell1_test | commonvoice_20_test_zh | commonvoice_20_test_en | librispeech_test_clean | avg |
|---|---|---|---|---|---|
| Human | 2.0 | 9.5 | 10.0 | 2.4 | 5.1 |
| CosyVoice 2 | 1.8 | 9.1 | 7.3 | 4.9 | 5.9 |
| F5TTS | 3.9 | 11.7 | 5.4 | 7.8 | 8.2 |
| Fishspeech | 2.4 | 11.4 | 8.8 | 8.0 | 8.3 |
| FireRedTTS | 2.2 | 11.0 | 16.3 | 5.7 | 7.7 |
| XTTS | 3.0 | 11.4 | 7.1 | 3.5 | 6.0 |
| IndexTTS | 1.3 | 7.0 | 5.3 | 2.1 | 3.7 |
| IndexTTS-1.5 | 1.2 | 6.8 | 3.9 | 1.7 | 3.1 |

Speaker similarity (SS) results for IndexTTS and baseline models

| Model | aishell1_test | commonvoice_20_test_zh | commonvoice_20_test_en | librispeech_test_clean | avg |
|---|---|---|---|---|---|
| Human | 0.846 | 0.809 | 0.820 | 0.858 | 0.836 |
| CosyVoice 2 | 0.796 | 0.743 | 0.742 | 0.837 | 0.788 |
| F5TTS | 0.743 | 0.747 | 0.746 | 0.828 | 0.779 |
| Fishspeech | 0.488 | 0.552 | 0.622 | 0.701 | 0.612 |
| FireRedTTS | 0.579 | 0.593 | 0.587 | 0.698 | 0.631 |
| XTTS | 0.573 | 0.586 | 0.648 | 0.761 | 0.663 |
| IndexTTS | 0.744 | 0.742 | 0.758 | 0.823 | 0.776 |
| IndexTTS-1.5 | 0.741 | 0.722 | 0.753 | 0.819 | 0.771 |
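A speaker-similarity score of this kind is usually the cosine similarity between a speaker embedding of the prompt audio and one of the synthesized audio, averaged over the test set. The page does not say which speaker-verification model produced the embeddings. The sketch below assumes the embeddings have already been extracted as plain float vectors and shows only the scoring step. `cosine_similarity` and `mean_speaker_similarity` are illustrative names, not part of IndexTTS.

```python
import math
from typing import Iterable, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def mean_speaker_similarity(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average cosine similarity over (prompt_embedding, generated_embedding) pairs."""
    scores = [cosine_similarity(prompt, generated) for prompt, generated in pairs]
    if not scores:
        raise ValueError("no pairs given")
    return sum(scores) / len(scores)
```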

MOS scores for zero-shot cloned voices

| Model | Prosody | Timbre | Quality | AVG |
|---|---|---|---|---|
| CosyVoice 2 | 3.67 | 4.05 | 3.73 | 3.81 |
| F5TTS | 3.56 | 3.88 | 3.56 | 3.66 |
| Fishspeech | 3.40 | 3.63 | 3.69 | 3.57 |
| FireRedTTS | 3.79 | 3.72 | 3.60 | 3.70 |
| XTTS | 3.23 | 2.99 | 3.10 | 3.11 |
| IndexTTS | 3.79 | 4.20 | 4.05 | 4.01 |

Usage Instructions

Environment Setup

  1. Download this repository:
git clone https://github.com/index-tts/index-tts.git
  1. Install dependencies:

Create a new conda environment and install the dependencies:

conda create -n index-tts python=3.10
conda activate index-tts
apt-get install ffmpeg
# or use conda to install ffmpeg
conda install -c conda-forge ffmpeg

Install PyTorch, for example the CUDA 11.8 build (cu118); choose the index URL that matches your CUDA version:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

Note

If you are using Windows, you may encounter the error "ERROR: Failed building wheel for pynini" when installing pynini. In this case, install pynini via conda:

# after conda activate index-tts
conda install -c conda-forge pynini==2.1.6
pip install WeTextProcessing --no-deps

Install IndexTTS as a package:

cd index-tts
pip install -e .
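Before you download the models, you can confirm that the pieces installed above are visible from the active environment. The sketch below uses only the standard library. The module names it checks (`torch`, `torchaudio`, `indextts`) follow the install steps above. The CUDA check runs only if PyTorch is found. `check_environment` is a hypothetical helper.

```python
import importlib.util
import shutil


def check_environment(modules=("torch", "torchaudio", "indextts")):
    """Map each requirement to True/False; packages are located, not imported."""
    report = {"ffmpeg": shutil.which("ffmpeg") is not None}  # ffmpeg binary on PATH?
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    if report.get("torch"):
        import torch  # imported only when it is known to be installed

        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:<12}{'OK' if ok else 'MISSING'}")
```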
  1. Download models:

Download with huggingface-cli:

huggingface-cli download IndexTeam/IndexTTS-1.5 \
  config.yaml bigvgan_discriminator.pth bigvgan_generator.pth bpe.model dvae.pth gpt.pth unigram_12000.vocab \
  --local-dir checkpoints

Recommended for users in China: if downloads are slow, set a mirror endpoint before running the download command:

export HF_ENDPOINT="https://hf-mirror.com"

Or download with wget:

wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bigvgan_discriminator.pth -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bigvgan_generator.pth -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bpe.model -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/dvae.pth -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/gpt.pth -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/unigram_12000.vocab -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/config.yaml -P checkpoints
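After downloading, you can catch an interrupted or partial download by checking that every file fetched above exists and is non-empty in `checkpoints`. The sketch below does that. The file list is copied from the commands above, and `missing_checkpoint_files` is a hypothetical helper, not part of the IndexTTS package. It does not verify checksums.

```python
from pathlib import Path

# The seven files fetched by the download commands above.
REQUIRED_FILES = (
    "config.yaml",
    "bigvgan_discriminator.pth",
    "bigvgan_generator.pth",
    "bpe.model",
    "dvae.pth",
    "gpt.pth",
    "unigram_12000.vocab",
)


def missing_checkpoint_files(model_dir="checkpoints", required=REQUIRED_FILES):
    """Return the required files that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in required:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_checkpoint_files()
    print("all checkpoint files present" if not missing else f"missing: {missing}")
```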

Note

If you prefer to use the IndexTTS-1.0 model, replace IndexTeam/IndexTTS-1.5 with IndexTeam/IndexTTS in the commands above.
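The wget URLs above all follow the pattern <endpoint>/<repo_id>/resolve/main/<file>. Switching between IndexTeam/IndexTTS-1.5 and IndexTeam/IndexTTS, or to the HF_ENDPOINT mirror, therefore changes only two parts of the string. The sketch below generates the list. It assumes the mirror serves the same URL layout as huggingface.co, and `checkpoint_urls` is a hypothetical helper.

```python
import os

FILES = (
    "bigvgan_discriminator.pth",
    "bigvgan_generator.pth",
    "bpe.model",
    "dvae.pth",
    "gpt.pth",
    "unigram_12000.vocab",
    "config.yaml",
)


def checkpoint_urls(repo_id="IndexTeam/IndexTTS-1.5", endpoint=None):
    """Build download URLs; endpoint defaults to $HF_ENDPOINT, then huggingface.co."""
    base = endpoint or os.environ.get("HF_ENDPOINT") or "https://huggingface.co"
    base = base.rstrip("/")
    return [f"{base}/{repo_id}/resolve/main/{name}" for name in FILES]


if __name__ == "__main__":
    # Print wget commands equivalent to the ones above, but for the 1.0 model
    for url in checkpoint_urls("IndexTeam/IndexTTS"):
        print(f"wget {url} -P checkpoints")
```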

  1. Run the test script:
# Please put your prompt audio in 'test_data' and rename it to 'input.wav'
python indextts/infer.py
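The test script reads its prompt from test_data/input.wav. Python's standard `wave` module can confirm the file is a readable WAV and show its format and length. The sketch below does this with `describe_wav`, a hypothetical helper. The page does not state a required sample rate or duration. `wave` only handles integer PCM WAV files, so float or compressed audio has to be converted first (for example with ffmpeg).

```python
import wave


def describe_wav(path):
    """Return basic format information for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    print(describe_wav("test_data/input.wav"))
```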
  1. Use as a command-line tool:
# Make sure pytorch has been installed before running this command
indextts "大家好,我现在正在bilibili 体验 ai 科技,说实话,来之前我绝对想不到!AI技术已经发展到这样匪夷所思的地步了!" \
  --voice reference_voice.wav \
  --model_dir checkpoints \
  --config checkpoints/config.yaml \
  --output output.wav

Use --help to see more options:

indextts --help
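To call the CLI from a Python script, pass the arguments to `subprocess.run` as a list. This avoids shell-quoting problems with spaces and Chinese punctuation in the text. The sketch below uses only the flags shown in the example above. `build_indextts_command` and `run_indextts` are hypothetical names, and `indextts --help` remains the authority on the available options.

```python
import subprocess
from typing import List


def build_indextts_command(text: str, voice: str, output: str,
                           model_dir: str = "checkpoints",
                           config: str = "checkpoints/config.yaml") -> List[str]:
    """Assemble the argv list for the indextts CLI (no shell involved)."""
    return ["indextts", text,
            "--voice", voice,
            "--model_dir", model_dir,
            "--config", config,
            "--output", output]


def run_indextts(text: str, voice: str, output: str, **kwargs) -> None:
    cmd = build_indextts_command(text, voice, output, **kwargs)
    # check=True raises CalledProcessError if the CLI exits with a non-zero status
    subprocess.run(cmd, check=True)
```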

Web Demo

pip install -e ".[webui]" --no-build-isolation
python webui.py

# use another model version:
python webui.py --model_dir IndexTTS-1.5

Open your browser and visit http://127.0.0.1:7860 to see the demo.
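If the page does not load, first check whether anything is listening on the address the demo uses. The standard-library sketch below only tests whether a TCP connection to 127.0.0.1:7860 succeeds. It does not confirm that the server is the IndexTTS web UI. `webui_is_up` is a hypothetical helper.

```python
import socket


def webui_is_up(host: str = "127.0.0.1", port: int = 7860, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    print("web UI port is open" if webui_is_up() else "nothing listening on 127.0.0.1:7860")
```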

Sample Code

from indextts.infer import IndexTTS

# Load the model from the downloaded checkpoints
tts = IndexTTS(model_dir="checkpoints", cfg_path="checkpoints/config.yaml")
voice = "reference_voice.wav"  # reference audio whose voice is cloned
text="大家好,我现在正在bilibili 体验 ai 科技,说实话,来之前我绝对想不到!AI技术已经发展到这样匪夷所思的地步了!比如说,现在正在说话的其实是B站为我现场复刻的数字分身,简直就是平行宇宙的另一个我了。如果大家也想体验更多深入的AIGC功能,可以访问 bilibili studio,相信我,你们也会吃惊的。"
output_path = "output.wav"  # file the synthesized speech is written to
tts.infer(voice, text, output_path)
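To synthesize several texts in the same voice, you can load the model once and reuse it. The sketch below assumes only the `infer(voice, text, output_path)` call shown in the sample code. It receives that call as a function argument, so it can be tried without the model. `synthesize_batch` and the numbered file-naming scheme are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def synthesize_batch(infer: Callable[[str, str, str], object], voice: str,
                     texts: Iterable[str], out_dir: str = "outputs") -> List[str]:
    """Call infer(voice, text, path) for each text; return the output paths in order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(texts):
        path = str(out / f"{i:03d}.wav")  # 000.wav, 001.wav, ...
        infer(voice, text, path)
        paths.append(path)
    return paths


# With the model loaded as in the sample code:
#   paths = synthesize_batch(tts.infer, "reference_voice.wav", ["第一句。", "第二句。"])
```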

Acknowledgements

  1. tortoise-tts
  2. XTTSv2
  3. BigVGAN
  4. wenet
  5. icefall

📚 Citation

🌟 If you find our work helpful, please leave us a star and cite our paper.

@article{deng2025indextts,
  title={IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System},
  author={Wei Deng and Siyi Zhou and Jingchen Shu and Jinchao Wang and Lu Wang},
  journal={arXiv preprint arXiv:2502.05512},
  year={2025}
}
