News
-
September, 2025:
I will soon be on the industry research job market with a planned graduation date of May 2026. Feel free to reach out regarding research positions related to vision-language models and visual representation learning!
-
September, 2025:
🧩 Partonomy was accepted to NeurIPS 2025 as a spotlight. See you in San Diego!
-
February, 2025:
🔍 SearchDet, which uses the methods we developed for MIRACLE, was accepted to CVPR 2025. See you in Nashville!
|
Research
My work is broadly in vision and language and decomposable object recognition.
Recently, this has included developing a multimodal large language model that segments parts, introducing a training-free method for few-shot object detection, and building a system that uses symbolic object representations with knowledge graphs and part decomposition for image recognition.
I am especially interested in improving LLMs' ability to reason about problems that are not easily
represented in language (e.g., latent reasoning).
|
|
PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding
Ansel Blume*,
Jeonghwan Kim*,
Hyeonjeong Ha,
Elen Chatikyan,
Xiaomeng Jin,
Khanh Duy Nguyen,
Nanyun Peng,
Kai-Wei Chang,
Derek Hoiem,
Heng Ji
Spotlight @ NeurIPS, 2025
arXiv / Code and dataset coming soon!
Large multimodal models (LMMs) with the ability to segment, which we call segmenting LMMs, have poor part understanding. We introduce the explanatory part segmentation task and construct the Partonomy part understanding dataset to evaluate it. We use Partonomy to train PLUM, a segmenting LMM with strong part segmentation abilities that outperforms other segmenting LMMs on general reasoning and segmentation tasks.
|
|
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
Mankeerat Sidhu,
Hetarth Chopra,
Ansel Blume,
Jeonghwan Kim,
Revanth Gangi Reddy,
Heng Ji
CVPR, 2025
arXiv / Github
Exemplar images from the web can be used for highly effective training-free long-tail object detection by combining embedding heatmaps with SAM regions.
This method far surpasses state-of-the-art few-shot methods on several benchmarks.
|
|
MIRACLE: An Online, Explainable Multimodal Interactive Concept Learning System
Ansel Blume*,
Khanh Duy Nguyen*,
Zhenhailong Wang,
Yangyi Chen,
Michal Shlapentokh-Rothman,
Xiaomeng Jin,
Jeonghwan Kim,
Zhen Zhu,
Jiateng Liu,
Kuan-Hao Huang,
Mankeerat Sidhu,
Xuanming Zhang,
Vivian Liu,
Raunak Sinha,
Te-Lin Wu,
Abhay Zala,
Elias Stengel-Eskin,
Da Yin,
Yao Xiao,
Utkarsh Mall,
Zhou Yu,
Kai-Wei Chang,
Camille Cobb,
Karrie Karahalios,
Lydia Chilton,
Mohit Bansal,
Nanyun Peng,
Carl Vondrick,
Derek Hoiem,
Heng Ji
ACM MM Technical Demos, 2024
ACM Page / Github
We developed MIRACLE, an interactive system for object recognition that learns concepts in real time,
highlighting key regions that distinguish objects from one another.
|
|
Region-based Representations Revisited
Michal Shlapentokh-Rothman*,
Ansel Blume*,
Yao Xiao,
Yuqun Wu,
Sethuraman TV,
Heyi Tao,
Jae Yong Lee,
Wilfredo Torres,
Yu-Xiong Wang,
Derek Hoiem
CVPR, 2024
arXiv / Project Page
Region features constructed by average pooling image features over SAM regions are effective on a wide range of downstream tasks.
|
|
Generative Models for Product Attribute Extraction
Ansel Blume,
Nasser Zalmout,
Heng Ji,
Xian Li
EMNLP Industry Track, 2023
ACL Page
Generative language models can outperform extractive product attribute extraction models while being more
data-efficient and uniquely able to detect implied attributes.
|
|
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Zhenhailong Wang,
Ansel Blume,
Sha Li,
Genglin Liu,
Jaemin Cho,
Zineng Tang,
Mohit Bansal,
Heng Ji
Spotlight @ NeurIPS, 2023
arXiv / Github
Video-language foundation models are highly biased towards using objects for action recognition, rather than
analyzing the action itself. Paxion proposes a training scheme that improves action recognition without harming performance on downstream tasks.
|
|
Measuring Security Practices and How They Impact Security
Louis F. DeKoven,
Audrey Randall,
Ariana Mirian,
Gautam Akiwate,
Ansel Blume,
Lawrence K. Saul,
Aaron Schulman,
Geoffrey M. Voelker,
Stefan Savage
IMC, 2019
ACM Page
A large-scale study of the factors and security practices that help prevent system compromise in practice.
|
|