
DiffSensei: An Implementation of Customized Manga Generation Using Multimodal LLMs and Diffusion Models


DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation


[Figure: example generated manga page results]

More demos are on our project page.

A story about LeCun, Hinton, and Bengio winning the Nobel Prize…

🚀 TL;DR

DiffSensei can generate controllable black-and-white manga panels with flexible character adaptation.

Key Features:

🎉 News

🛠️ Quick Start

Installation

# Create a new environment with Conda
conda create -n diffsensei python=3.11
conda activate diffsensei
# Install PyTorch, Diffusers, and related packages
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge diffusers transformers accelerate
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121
# Install other dependencies
pip install -r requirements.txt
# Third-party repo for running the gradio demo
pip install gradio-image-prompter

Model Download

Download our DiffSensei model from Hugging Face and place it in the checkpoints folder like this:

checkpoints
  |- diffsensei
    |- image_generator
      |- ...
    |- mllm
      |- ...

If you do not plan to use the MLLM component, you can download the model without it and use gradio_wo_mllm.py to produce your results.
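After downloading, a quick sanity check can confirm that both components are in place before you launch the demo. This is a minimal sketch; the two directory names follow the tree above, so adjust them if your layout differs (for example, if you skipped the MLLM component).

```shell
# Sanity-check the expected checkpoint layout before launching the demo.
# Directory names follow the tree above; adjust if your layout differs.
missing=0
for d in checkpoints/diffsensei/image_generator checkpoints/diffsensei/mllm; do
  if [ ! -d "$d" ]; then
    echo "missing: $d"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "checkpoint layout looks good"
fi
```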

Inference with Gradio

We provide a Gradio demo for running inference with DiffSensei.

CUDA_VISIBLE_DEVICES=0 \
python -m scripts.demo.gradio \
  --config_path configs/model/diffsensei.yaml \
  --inference_config_path configs/inference/diffsensei.yaml \
  --ckpt_path checkpoints/diffsensei

We also offer a version without the MLLM, designed for lower memory usage. If you choose this version, you can skip downloading the MLLM component of the checkpoint, significantly reducing memory consumption (it can run on a single 24GB RTX 4090 GPU with batch size 1 for small or medium panel sizes). While this version may have slightly reduced text compatibility, the overall quality remains largely unaffected.

CUDA_VISIBLE_DEVICES=0 \
python -m scripts.demo.gradio_wo_mllm \
  --config_path configs/model/diffsensei.yaml \
  --inference_config_path configs/inference/diffsensei.yaml \
  --ckpt_path checkpoints/diffsensei

Please be patient. Try more prompts, characters, and random seeds, and download your favored manga panels! 🤗

The MangaZero Dataset

Due to license issues, we cannot directly share the images. Instead, we provide the manga image URLs (from MangaDex) and the annotations of our MangaZero dataset. Note that the released version of MangaZero is about 3/4 of the full dataset used for training; the missing images correspond to URLs that are no longer available. For similar manga-data use cases, we strongly encourage anyone interested to collect their own dataset from MangaDex, following the MangaDex API instructions.

Please download MangaZero from Hugging Face.

After downloading, place the annotation file at data/mangazero/annotations.json and run scripts/dataset/download_mangazero.py to download and organize the images.

python -m scripts.dataset.download_mangazero \
  --ann_path data/mangazero/annotations.json \
  --output_image_root data/mangazero/images

Citation

@article{wu2024diffsensei,
  title={DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation},
  author={Wu, Jianzong and Tang, Chao and Wang, Jingbo and Zeng, Yanhong and Li, Xiangtai and Tong, Yunhai},
  journal={arXiv preprint arXiv:2412.07589},
  year={2024},
}


