NeurIPS 2025
| 1 | How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation | Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi, Triantafyllos Afouras, Tushar Nagarajan, Muhammad Maaz, Yale Song, Tengyu Ma, Shuming Hu, Suyog Jain, Miguel Martin, Huiyu Wang, Hanoona Bangalath, Peize Sun, Po-Yao Huang, Daniel Bolya, Nikhila Ravi, Shashank Jain, Tammy Stark, Seungwhan Moon, Babak Damavandi, Vivian Lee, Andrew Westbury, Salman Khan, Philipp Kraehenbuehl, Piotr Dollar, Lorenzo Torresani, Kristen Grauman, Christoph Feichtenhofer | [2506.04088](https://huggingface.co/papers/2506.04088) | 110 | 11 | [link](https://foremost-beechnut-8ed.notion.site/WebThinker-Empowering-Large-Reasoning-Models-with-Deep-Research-Capability-d13158a27d924a4b9df7f9ab94066b64) | [link](https://github.com/Intelli-Chip-Lab/enhanced-self-distillation-framework-for-snn) | [ChenDY/NAG_wan2-1-fast](https://huggingface.co/spaces/ChenDY/NAG_wan2-1-fast) <br> [ChenDY/NAG_FLUX.1-Kontext-Dev](https://huggingface.co/spaces/ChenDY/NAG_FLUX.1-Kontext-Dev) <br> [ChenDY/NAG_FLUX.1-dev](https://huggingface.co/spaces/ChenDY/NAG_FLUX.1-dev) | [nathanrchn/zip2zip-test](https://huggingface.co/nathanrchn/zip2zip-test) <br> [Saibo-creator/zip2zip-evqn-7000](https://huggingface.co/Saibo-creator/zip2zip-evqn-7000) <br> [Saibo-creator/zip2zip-evqn-7000-new](https://huggingface.co/Saibo-creator/zip2zip-evqn-7000-new) <br> [Saibo-creator/zip2zip-Phi-3.5-mini-instruct-v0.1](https://huggingface.co/Saibo-creator/zip2zip-Phi-3.5-mini-instruct-v0.1) <br> [Saibo-creator/zip2zip-Llama-3.2-3B-Instruct-v0.1](https://huggingface.co/Saibo-creator/zip2zip-Llama-3.2-3B-Instruct-v0.1) <br> [Saibo-creator/zip2zip-Llama-3.2-1B-Instruct-v0.1](https://huggingface.co/Saibo-creator/zip2zip-Llama-3.2-1B-Instruct-v0.1) <br> [Saibo-creator/zip2zip-Llama-3.1-8B-Instruct-v0.1](https://huggingface.co/Saibo-creator/zip2zip-Llama-3.1-8B-Instruct-v0.1) <br> [epfl-dlab/zip2zip-Llama-3.1-8B-Instruct-v0.1](https://huggingface.co/epfl-dlab/zip2zip-Llama-3.1-8B-Instruct-v0.1) <br> [epfl-dlab/zip2zip-Llama-3.2-1B-Instruct-v0.1](https://huggingface.co/epfl-dlab/zip2zip-Llama-3.2-1B-Instruct-v0.1) <br> [epfl-dlab/zip2zip-Llama-3.2-3B-Instruct-v0.1](https://huggingface.co/epfl-dlab/zip2zip-Llama-3.2-3B-Instruct-v0.1) <br> [epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1](https://huggingface.co/epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1) <br> [epfl-dlab/zip2zip-Phi-3-medium-instruct-v0.1](https://huggingface.co/epfl-dlab/zip2zip-Phi-3-medium-instruct-v0.1) | [WaltonFuture/MMR1-direct-synthesizing](https://huggingface.co/datasets/WaltonFuture/MMR1-direct-synthesizing) <br> [WaltonFuture/geometry3k-in-context-synthesizing](https://huggingface.co/datasets/WaltonFuture/geometry3k-in-context-synthesizing) <br> [WaltonFuture/geometry3k-direct-synthesizing](https://huggingface.co/datasets/WaltonFuture/geometry3k-direct-synthesizing) <br> [WaltonFuture/GeoQA-8K-in-context-synthesizing](https://huggingface.co/datasets/WaltonFuture/GeoQA-8K-in-context-synthesizing) <br> [WaltonFuture/GeoQA-8K-direct-synthesizing](https://huggingface.co/datasets/WaltonFuture/GeoQA-8K-direct-synthesizing) <br> [WaltonFuture/MMR1-in-context-synthesizing](https://huggingface.co/datasets/WaltonFuture/MMR1-in-context-synthesizing) | 11/17 ✅ |

A central question in sensory neuroscience is not only how much, but also what information neurons transmit about the world. While Shannon's information theory provides a principled framework to quantify the amount of information neurons encode about all stimuli, it does not reveal which stimuli contribute most, or which stimulus features are encoded. As a concrete example, neurons in the early visual cortex are known to be sensitive to stimuli in a small region of space (their receptive field). However, it is not clear how such simple intuitions carry over to more complex scenarios, e.g. with large, noisy, and non-linear populations of neurons and high-dimensional stimuli.

Several measures of neural sensitivity have been proposed. For example, the Fisher information quantifies the sensitivity of neural responses to infinitesimal stimulus perturbations. However, because the Fisher information is not a valid decomposition of the mutual information, it cannot say how different stimuli contribute to the total encoded information. On the other hand, previous works have proposed stimulus-dependent decompositions of mutual information, which define a function $I(x)$ such that $I(R; X) = \mathbb{E}[I(x)]$. However, this decomposition is inherently ill-posed: infinitely many functions $I(x)$ satisfy the constraint, with no principled way to select among them. Further, different decompositions behave in qualitatively different ways, making it hard to interpret what they are telling us.
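This ill-posedness is easy to see in a toy example. The sketch below is a textbook illustration, not the decomposition proposed here: it compares two classic pointwise decompositions for a small discrete channel, the "specific surprise" $I_1(x) = D_{KL}(p(r \mid x) \,\|\, p(r))$ and the "specific information" $I_2(x) = H(R) - H(R \mid X = x)$. Both average to $I(R; X)$, yet they disagree stimulus by stimulus, and $I_2(x)$ can even be negative.

```python
import numpy as np

def pointwise_decompositions(p_x, p_r_given_x):
    """Return two pointwise decompositions I1(x), I2(x) of I(R;X), in nats.

    p_x: shape (n_x,) stimulus prior; p_r_given_x: shape (n_x, n_r) likelihoods.
    """
    p_r = p_x @ p_r_given_x  # marginal response distribution
    # "Specific surprise": KL(p(r|x) || p(r)) -- non-negative by construction.
    i1 = np.sum(p_r_given_x * np.log(p_r_given_x / p_r), axis=1)
    # "Specific information": H(R) - H(R|X=x) -- can be negative.
    h_r = -np.sum(p_r * np.log(p_r))
    h_r_given_x = -np.sum(p_r_given_x * np.log(p_r_given_x), axis=1)
    i2 = h_r - h_r_given_x
    return i1, i2

# Toy channel: x=0 drives a reliable response, x=1 an uninformative one.
p_x = np.array([0.5, 0.5])
p_r_given_x = np.array([[0.9, 0.1],
                        [0.5, 0.5]])
i1, i2 = pointwise_decompositions(p_x, p_r_given_x)
# Both expectations p_x @ i1 and p_x @ i2 equal I(R;X) (about 0.10 nats here),
# but the two attributions differ pointwise, and i2[1] is negative.
print(i1, i2, p_x @ i1, p_x @ i2)
```

Both candidates satisfy the averaging constraint exactly, so the constraint alone cannot choose between them; that is precisely the gap the axioms below are meant to close.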
Finally, most proposed decompositions are computationally intractable for the high-dimensional stimuli and non-linear encoding models relevant to neuroscience.

To resolve these limitations, we propose a set of axioms that any stimulus-specific and feature-specific information decomposition should satisfy in order to serve as a meaningful and interpretable measure of neural sensitivity. These axioms formalize intuitive desiderata: the information assigned to each stimulus, and to each stimulus feature, should be non-negative and additive with respect to repeated measurements. We also require the decomposition to respect a form of locality: changes in how a neuron responds to a stimulus $x$ should not affect the information attributed to a distant stimulus $x'$. Finally, the attribution must be insensitive to irrelevant features, which do not contribute to the total information. Together, these constraints ensure that the decomposition is both interpretable and theoretically grounded.

We show that existing decompositions violate one or more of these axioms, limiting their interpretability and their use as information-theoretic measures of neural sensitivity. We then introduce a novel decomposition that satisfies all of our axioms. It generalizes the Fisher information by capturing neural sensitivity to both infinitesimal and finite stimulus perturbations. Moreover, it supports further decomposition across individual stimulus features (e.g., image pixels), enabling fine-grained analysis of neural representations.

Beyond satisfying our theoretical axioms, our decomposition is computationally tractable for large neural populations and high-dimensional naturalistic stimuli, through the use of diffusion models. We demonstrate the power of our method by quantifying the information encoded by a model of visual neurons about individual images and pixels.
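For intuition about the Fisher baseline being generalized, consider a toy Poisson-spiking neuron with a Gaussian tuning curve (a standard textbook model, not the encoding model used in this work). Its Fisher information $J(x) = f'(x)^2 / f(x)$ vanishes at the tuning-curve peak and is maximal on the flanks, at $x = \mu \pm \sqrt{2}\,w$: it measures sensitivity to infinitesimal perturbations only, and assigns no special role to the preferred stimulus.

```python
import numpy as np

def gaussian_tuning(x, r_max=10.0, mu=0.0, w=1.0):
    """Mean firing rate of a model neuron with a Gaussian tuning curve."""
    return r_max * np.exp(-((x - mu) ** 2) / (2 * w ** 2))

def fisher_poisson(x, r_max=10.0, mu=0.0, w=1.0):
    """Fisher information J(x) = f'(x)^2 / f(x) for Poisson spiking."""
    f = gaussian_tuning(x, r_max, mu, w)
    f_prime = f * (-(x - mu) / w ** 2)
    return f_prime ** 2 / f

x = np.linspace(-4, 4, 2001)
J = fisher_poisson(x)
# J is (numerically) zero at the preferred stimulus mu = 0 and
# peaks on the flanks, at x = mu +/- sqrt(2) * w.
print(x[np.argmax(J)])
```

This is exactly the kind of purely local sensitivity statement that a finite-perturbation, stimulus-specific decomposition goes beyond.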
Our approach uncovers aspects of the neural code that are not picked up by standard methods, such as the Fisher information, and opens the door to similar analyses in higher-order sensory areas and artificial neural networks.
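As a rough illustration of what a pixel-level analysis can look like, the sketch below computes a generic finite-difference sensitivity map for a hypothetical rectified-linear model neuron with a localized receptive field. This is only a stand-in for feature-level attribution in general, not the information decomposition proposed here; all names and the model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def neuron_response(image, rf):
    """Toy rectified-linear visual neuron with receptive field `rf` (assumed model)."""
    return max(0.0, float(np.sum(rf * image)))

def pixel_sensitivity(image, rf, eps=1e-3):
    """Finite-difference sensitivity of the response to each pixel."""
    base = neuron_response(image, rf)
    sens = np.zeros_like(image)
    for idx in np.ndindex(image.shape):
        perturbed = image.copy()
        perturbed[idx] += eps
        sens[idx] = (neuron_response(perturbed, rf) - base) / eps
    return sens

# A localized (Gaussian-blob) receptive field centered at pixel (8, 8):
# pixels outside it contribute essentially nothing to the response.
size = 16
yy, xx = np.mgrid[0:size, 0:size]
rf = np.exp(-((yy - 8) ** 2 + (xx - 8) ** 2) / 8.0)
image = rng.random((size, size))
sens = pixel_sensitivity(image, rf)
```

For this linear-above-threshold model the map simply recovers the receptive field; the point of an information-theoretic, feature-specific decomposition is to deliver an analogous map with units of information, valid for noisy, non-linear populations.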