SVBench Leaderboard

Welcome to the SVBench leaderboard! 🏆

SVBench is a benchmark designed to evaluate Large Vision-Language Models (LVLMs) on long-context streaming video understanding. It assesses how well models handle streaming video through temporal multi-turn question-answering chains. The leaderboard reports results for over a dozen models on this benchmark, so users can quickly identify models that excel at specific tasks and use the rankings to guide subsequent research and applications.

Detailed information about SVBench and the leaderboard is available at: SVBench Benchmark. The paper is available at: SVBench Paper. The dataset is hosted on Hugging Face at: SVBench Dataset, where researchers can access it for further experiments and model development.

The leaderboard is intended to provide a fair basis for comparing current models and a reference point for future model improvements.

Evaluation Dimension
| Model | Type | Size | F/FPS | Dialogue_OS | Streaming_OS | Average |
|---|---|---|---|---|---|---|
| LLaVA-NeXT-Video | ImageLLM | 7B | 1fps | 62.57 | 59.97 | 61.27 |
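
In the row shown, the Average value matches the arithmetic mean of Dialogue_OS and Streaming_OS (62.57 and 59.97 average to 61.27). The sketch below assumes that this is how the column is derived in general; the leaderboard text does not state the aggregation rule, and the function name `average_score` is hypothetical.

```python
def average_score(dialogue_os: float, streaming_os: float) -> float:
    """Mean of the two overall scores, rounded to two decimals.

    Assumption: the Average column is the unweighted mean of Dialogue_OS
    and Streaming_OS. This is inferred from the single row shown, not
    confirmed by the leaderboard text.
    """
    return round((dialogue_os + streaming_os) / 2, 2)


# The one row listed on the leaderboard excerpt.
row = {"model": "LLaVA-NeXT-Video", "dialogue_os": 62.57, "streaming_os": 59.97}

avg = average_score(row["dialogue_os"], row["streaming_os"])
print(f"{row['model']}: {avg:.2f}")
```

If the actual aggregation weights the two dimensions differently, only the body of `average_score` would need to change.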