Seongchan Kim (김성찬)
Computer Vision & World Understanding · Integrated M.S./Ph.D. @ KAIST AI / GSAI
Hello, 안녕하세요 👋

Understanding the World through Computer Vision and Multimodal Intelligence

I explore how machines can understand and interpret the visual world through video analysis, object interactions, and language-grounded perception. My research ranges from fine-grained video segmentation to interaction-aware video generation, with the goal of building AI systems that genuinely comprehend visual scenes.

Computer Vision
Video Understanding
Multimodal Learning
Visual Scene Analysis

Publications

Self-Evolving Neural Radiance Fields
Wild3D Workshop @ ICCV 2025
MUG-VOS: Multi-Granularity Video Object Segmentation
AAAI 2025
Referring Video Object Segmentation via Language Aligned Track Selection
arXiv 2025
InterRVOS: Interaction-aware Referring Video Object Segmentation
Under review at AAAI 2026
MATRIX: Mask Track Alignment for Interaction-Aware Video Generation
Under review at ICLR 2026

Selected Projects

MATRIX

Mask Track Alignment for interaction-aware video generation. Includes MATRIX-11K, an analysis of interaction-dominant layers, and custom evaluation metrics.

(Repo link — coming soon)

InterRVOS

Interaction-aware referring VOS with explicit instance-level grounding and language-conditioned track selection.

Project Page ↗

SOLA

Language Aligned Track Selection for referring VOS. Improves grounding via alignment between language and instance tracks.

Project Page ↗

MUG-VOS

Multi-Granularity VOS framework and benchmark for robust segmentation across mask granularities.

Project Page ↗

Contact

For collaboration or internship opportunities, reach out via LinkedIn or DM on X. Scholar profile: Google Scholar.
