Publications

ReFAct: Empowering Multimodal Web Agents with Visual and Context Focusing

R. Wu*, S. Zhang*, X. Tang, R. Zhang, Y. Liu, T. Jiang, W. Xu, and Y. Li

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

A focusing framework for multimodal web agents that improves visual grounding and context selection during dynamic web tasks.