About Me

I’m a final-year PhD student at DLVCLab, South China University of Technology, co-supervised by Prof. Lianwen Jin and Prof. Yuliang Liu. I have also been fortunate to work with Prof. Xiang Bai. Previously, I obtained my bachelor’s degree in Computer Science and Technology from South China University of Technology.

I am actively seeking research positions in both industry and academia. My primary research focuses on the training and evaluation of multimodal large language models and on document intelligence.

Research Highlights

📌 My research focuses on Optical Character Recognition (OCR), document intelligence, and multimodal large language models (MLLMs). My previous work falls into three main areas:

Improving scene text spotting algorithms through better synergy between text detection and recognition, as in SwinTextSpotter, SwinTextSpotter v2, ESTextSpotter, and Bridge Text Spotter.
Building evaluation benchmarks for OCR in large multimodal models, including OCRBench, OCRBench v2, and OCR-Reasoning.
Developing lightweight multimodal models for OCR tasks, such as Mini-Monkey, which achieves superior performance with only 2B parameters.

✉️ Feel free to contact me for discussion or collaboration!

🔥 News

📝 Selected Works

See my Google Scholar profile for the full list.

OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning.
Mingxin Huang, Yongxin Shi, Dezhi Peng, Songxuan Lai, Zecheng Xie, Lianwen Jin.
ICLR 2026.
[Paper][Code]
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting.
Mingxin Huang, Dezhi Peng, Hongliang Li, Zhenghao Peng, Chongyu Liu, Dahua Lin, Yuliang Liu, Xiang Bai, Lianwen Jin.
IJCV 2025.
[Paper][Code]
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid.
Mingxin Huang, Yuliang Liu, Dingkang Liang, Lianwen Jin, Xiang Bai.
ICLR 2025.
[Paper][Code]
Bridging the Gap Between End-to-End and Two-Step Text Spotting.
Mingxin Huang, Hongliang Li, Yuliang Liu, Xiang Bai, Lianwen Jin.
CVPR 2024.
[Paper][Code]
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer.
Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin.
ICCV 2023.
[Paper][Code]
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition.
Mingxin Huang, Yuliang Liu, Zhenghao Peng, Chongyu Liu, Dahua Lin, Shenggao Zhu, Nicholas Yuan, Kai Ding, Lianwen Jin.
CVPR 2022.
[Paper][Code]
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization.
Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai.
TPAMI 2025.
[Paper][Code]
OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models.
Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, XuCheng Yin, ChengLin Liu, Lianwen Jin, Xiang Bai.
SCIS 2024.
[Paper][Code]

📚 Academic Services

Reviewer for top-tier conferences and journals: NeurIPS, ICLR, CVPR, ICCV, AAAI, ACM MM, IJCV, TIP.

Mingxin Huang
