Research
My work is broadly in vision-language and structured neural reasoning.
Recently, this has included developing symbolic object representations with knowledge graphs for image recognition, emphasizing
interpretability via part decomposition and attribute recognition.
On the language side, I am especially interested in improving LLMs' ability to reason about problems that are not easily
represented in language (e.g., reasoning about structured state spaces, with applications in robotics).
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
Mankeerat Sidhu,
Hetarth Chopra,
Ansel Blume,
Jeonghwan Kim,
Revanth Gangi Reddy,
Heng Ji
CVPR, 2025
arXiv
Exemplar images from the web can be used for highly effective training-free long-tail object detection by combining embedding heatmaps with SAM regions.
This method far surpasses SOTA few-shot methods on several benchmarks.
MIRACLE: An Online, Explainable Multimodal Interactive Concept Learning System
Ansel Blume*,
Khanh Duy Nguyen*,
Zhenhailong Wang,
Yangyi Chen,
Michal Shlapentokh-Rothman,
Xiaomeng Jin,
Jeonghwan Kim,
Zhen Zhu,
Jiateng Liu,
Kuan-Hao Huang,
Mankeerat Sidhu,
Xuanming Zhang,
Vivian Liu,
Raunak Sinha,
Te-Lin Wu,
Abhay Zala,
Elias Stengel-Eskin,
Da Yin,
Yao Xiao,
Utkarsh Mall,
Zhou Yu,
Kai-Wei Chang,
Camille Cobb,
Karrie Karahalios,
Lydia Chilton,
Mohit Bansal,
Nanyun Peng,
Carl Vondrick,
Derek Hoiem,
Heng Ji
ACM MM Technical Demos, 2024
ACM Page / GitHub
We developed MIRACLE, an interactive system for object recognition that learns concepts in real time,
highlighting the key regions that distinguish objects from one another.
Region-based Representations Revisited
Michal Shlapentokh-Rothman*,
Ansel Blume*,
Yao Xiao,
Yuqun Wu,
Sethuraman TV,
Heyi Tao,
Jae Yong Lee,
Wilfredo Torres,
Yu-Xiong Wang,
Derek Hoiem
CVPR, 2024
arXiv / Project Page
Region features constructed by average pooling image features over SAM regions are effective on a wide range of downstream tasks.
Generative Models for Product Attribute Extraction
Ansel Blume,
Nasser Zalmout,
Heng Ji,
Xian Li
EMNLP Industry Track, 2023
ACL Page
Generative language models can outperform extractive models for product attribute extraction while being more
data-efficient and uniquely able to detect implied attributes.
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Zhenhailong Wang,
Ansel Blume,
Sha Li,
Genglin Liu,
Jaemin Cho,
Zineng Tang,
Mohit Bansal,
Heng Ji
NeurIPS, 2023
arXiv / GitHub
Video-language foundation models are highly biased towards recognizing actions from the objects present, rather than
analyzing the action itself. Paxion is a training scheme that improves action recognition without harming performance on downstream tasks.
Measuring Security Practices and How They Impact Security
Louis F. DeKoven,
Audrey Randall,
Ariana Mirian,
Gautam Akiwate,
Ansel Blume,
Lawrence K. Saul,
Aaron Schulman,
Geoffrey M. Voelker,
Stefan Savage
IMC, 2019
ACM Page
A large-scale study of the factors and security practices that help prevent system compromise in practice.