Sign Language Production (SLP) converts input text into realistic sign language video. Most prior work decomposes the task into Text2Gloss, Gloss2Pose, and Pose2Vid stages, with some efforts on Prompt2Gloss and Text2Avatar. Progress has been slow because errors in text conversion, pose generation, and pose-to-video rendering accumulate across these stages. In this paper, we streamline this redundant pipeline, simplify the task objective, and design a new sign language generative model called Stable Signer. It redefines SLP as a hierarchical, end-to-end generation task comprising only text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid: text understanding is performed by our proposed Sign Language Understanding Linker (SLUL), and hand gestures are generated by the SLP-MoE hand-gesture rendering expert block, producing high-quality, multi-style sign language videos end to end. SLUL is trained with our newly developed Semantic-Aware Gloss Masking Loss (SAGM Loss). Performance improves by 48.6% over current SOTA generation methods, a significant gain for the SLP field.
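The two-stage design described above can be illustrated with a minimal sketch. This is not the authors' released code: the function names, the toy gloss lexicon, and the placeholder renderer are all hypothetical stand-ins for the learned SLUL and SLP-MoE components.

```python
def text_to_gloss(text: str) -> list[str]:
    """Stage 1 (text understanding): map spoken-language text to a
    gloss sequence. A toy word-level lookup stands in here for the
    learned SLUL module."""
    toy_lexicon = {"i": "IX-1", "want": "WANT", "coffee": "COFFEE"}
    tokens = text.lower().rstrip(".").split()
    return [toy_lexicon.get(t, t.upper()) for t in tokens]

def gloss_to_video(glosses: list[str], style: str = "default") -> dict:
    """Stage 2 (Pose2Vid): render the gloss sequence as video frames.
    A real system would condition a generative renderer (the SLP-MoE
    block in the paper) on the glosses; here we return placeholders."""
    return {"style": style, "frames": [f"<frame:{g}>" for g in glosses]}

def produce_sign_video(text: str, style: str = "default") -> dict:
    """End-to-end SLP: text understanding followed by rendering, with
    no intermediate Gloss2Pose stage for errors to accumulate in."""
    return gloss_to_video(text_to_gloss(text), style=style)

video = produce_sign_video("I want coffee.")
print(video["frames"])  # ['<frame:IX-1>', '<frame:WANT>', '<frame:COFFEE>']
```

The point of the sketch is structural: collapsing the pipeline to two learned stages means only one hand-off (the gloss sequence) where errors can propagate.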
We present the latest results of our newly developed SLP model. Our new work combines a stable, fast Text2Pose sign language model with a state-of-the-art Pose2Video model, aiming to serve more underserved communities.
We are among the fastest-moving teams, with strong results in multilingual sign language generation. Although sign languages enjoy growing official recognition worldwide, the deaf community still faces significant communication barriers. Mainstream research currently generates only German Sign Language (DGS) and American Sign Language (ASL), overlooking a much broader population. We plan to support more than eight sign languages in our generation system, covering over fifty countries.
For regular or academic collaboration on SLP, please contact me (Sen Fang) directly. For funding support or laboratory sponsorship, please email my advisor and CC me. The code will be commercialized and will not be released publicly.
| Language | Input | Translation |
|---|---|---|
| ASL | Here's a little example here I'm going to give you some insight on though. | Here's a little example here I'm going to give you some insight on though. |
| DGS | Im Südostraum ist es vorteilhaft, ein Hochdruckgebiet kommt von Belgien. | In the southeastern region it is advantageous; a high-pressure system is coming from Belgium. |
| KSL | 염려하다. | To be worried. |
| DSGS | Bei dieser Firma bestelle ich immer die Plastikbecher. | I always order plastic cups from this company. |
| LSF-CH | Le leader du parti socialiste. | The leader of the Socialist Party. |
| LIS-CH | Io voglio comprare una nuova borsa. | I want to buy a new bag. |
| LSA | desayuno. | breakfast. |
| TSL | Animasyon atölyesinde 9-12 yaş arası işitme engelli çocuklar animasyon sanatını öğrenecekler. | Children who are deaf or hard of hearing (aged 9-12) will learn the art of animation at the animation workshop. |
[Figure: reference images alongside QP, the synthetic sign language videos obtained after processing with a style-transfer model, shown for reference.]
@misc{fang2025stablesignerhierarchicalsign,
      title={Stable Signer: Hierarchical Sign Language Generative Model},
      author={Sen Fang and Yalin Feng and Hongbin Zhong and Yanxin Zhang and Dimitris N. Metaxas},
      year={2025},
      eprint={2512.04048},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04048},
}
@InProceedings{Fang_2025_ICCV,
author = {Fang, Sen and Chen, Chen and Wang, Lei and Zheng, Ce and Sui, Chunyu and Tian, Yapeng},
title = {SignLLM: Sign Language Production Large Language Models},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {October},
year = {2025},
pages = {6622-6634}
}