§00 · APOD
Toolondo Totality Trails
In this composited night skyscape, stacked exposures trace graceful star trails above Lake Toolondo, Victoria, Australia, planet Earth. Captured while the lunar eclipse of March 3 was in progress, the exposures used were made during the hour-long total eclipse phase. So faint star trails are easily visible along with the trail of the reddened Moon in the eclipse-darkened skies above the lake and trees. Of course, the apparent motion of Moon and stars revealed in the timelapse composite reflects the Earth's daily rotation around its axis. Dramatically punctuating the Moon's trail as totality end...
2026-03-13 · © Jason Perry · NASA APOD
§06 · arXiv Dispatch
Research Filed Today
Preprints submitted to arXiv on March 13, 2026. Science before peer review.
01
Autoregressive (AR) video generative models rely on video tokenizers that compress pixels into discrete token sequences. The length of these token sequences is crucial for balancing reconstruction quality against downstream generation computational cost. Traditional video tokeniz...
Tianwei Xiong, Jun Hao Liew, Zilong Huang et al. (+3)
02
Multimodal Large Language Models (MLLMs) are increasingly used to carry out visual workflows such as navigating GUIs, where the next step depends on verified visual compositional conditions (e.g., "if a permission dialog appears and the color of the interface is green, click Allo...
Haozhan Shen, Shilin Yan, Hongwei Xue et al. (+5)
03
Modern visual agents require representations that are general, causal, and physically structured to operate in real-time streaming environments. However, current vision foundation models remain fragmented, specializing narrowly in image semantic perception, offline temporal model...
Yibin Yan, Jilan Xu, Shangzhe Di et al. (+2)
04
Unified multimodal models target joint understanding, reasoning, and generation, but current image editing benchmarks are largely confined to natural images and shallow commonsense reasoning, offering limited assessment of this capability under structured, domain-specific constra...
Mingxin Liu, Ziqian Fan, Zhaokai Wang et al. (+13)
05
Online Video Large Language Models (VideoLLMs) play a critical role in supporting responsive, real-time interaction. Existing methods focus on streaming perception, lacking a synchronized logical reasoning stream. However, directly applying test-time scaling methods incurs unacce...
Yiran Guan, Liang Yin, Dingkang Liang et al. (+5)
06
Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variatio...
Mateusz Pach, Jessica Bader, Quentin Bouniot et al. (+2)
07
While large-scale diffusion models have revolutionized video synthesis, achieving precise control over both multi-subject identity and multi-granularity motion remains a significant challenge. Recent attempts to bridge this gap often suffer from limited motion granularity, contro...
Yujie Wei, Xinyu Liu, Shiwei Zhang et al. (+12)
08
Humans perceive and understand real-world spaces through a stream of visual observations. Therefore, the ability to streamingly maintain and update spatial evidence from potentially unbounded video streams is essential for spatial intelligence. The core challenge is not simply lo...
Fangfu Liu, Diankun Wu, Jiawei Chi et al. (+7)
Source: arXiv.org · Cornell University
§03 · The Wire
Wikipedia in Motion
500 edits recorded in the most recent sample. Most-edited topics:
List · Cave · International · Victoria · Sphenomorphus · Criminal · National · Park