Publications

End-to-End Implicit Neural Representations for Classification

Published in CVPR, 2025

Abstract Implicit neural representations (INRs) such as NeRF and SIREN encode a signal in neural network parameters and show excellent results for signal reconstruction. Using INRs for downstream tasks, such as classification, is, however, not straightforward. Inherent symmetries in the parameters pose challenges and current works primarily focus on designing architectures that are equivariant to these symmetries. However, INR-based classification still significantly underperforms compared to pixel-based methods like CNNs. This work presents an end-to-end strategy for initializing SIRENs together with a learned learning-rate scheme, to yield representations that improve classification accuracy. We show that a simple Transformer model applied to a meta-learned SIREN, without incorporating explicit symmetry equivariances, outperforms the current state-of-the-art. On the CIFAR-10 SIREN classification task, we improve the state-of-the-art without augmentations from 38.8% to 59.6%, and from 63.4% to 64.7% with augmentations. We demonstrate scalability on the high-resolution Imagenette dataset, achieving reasonable reconstruction quality with a classification accuracy of 60.8%, and are the first to do INR classification on the full ImageNet-1K dataset, where we achieve a SIREN classification performance of 23.6%. To the best of our knowledge, no other SIREN classification approach has managed to set a classification baseline for any high-resolution image dataset.
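For readers unfamiliar with the representation being classified: a SIREN is an MLP with sine activations and a frequency scale ω₀, fitted per-signal so that the network weights *are* the image. A minimal NumPy sketch of a SIREN forward pass with the standard initialization (Sitzmann et al.); layer sizes and ω₀ here are illustrative, not the paper's configuration:

```python
import numpy as np

def siren_layer(x, W, b, omega0=30.0):
    # One SIREN layer: sine activation with frequency scaling omega0.
    return np.sin(omega0 * (x @ W + b))

def siren_init(fan_in, fan_out, rng, first=False, omega0=30.0):
    # SIREN init: U(-1/fan_in, 1/fan_in) for the first layer,
    # U(-sqrt(6/fan_in)/omega0, ...) for later layers.
    bound = 1.0 / fan_in if first else np.sqrt(6.0 / fan_in) / omega0
    return rng.uniform(-bound, bound, (fan_in, fan_out)), np.zeros(fan_out)

rng = np.random.default_rng(0)
# Map 2D pixel coordinates in [-1, 1]^2 to RGB through a tiny SIREN.
W1, b1 = siren_init(2, 16, rng, first=True)
W2, b2 = siren_init(16, 3, rng)
grid = np.linspace(-1, 1, 8)
coords = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)
rgb = siren_layer(coords, W1, b1) @ W2 + b2  # linear output layer
```

Fitting such a network to an image, then feeding its weights to a classifier, is the task the paper's meta-learned initialization and learning-rate scheme are designed for.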

Recommended citation: Gielisse, Alexander, and Jan van Gemert. "End-to-End Implicit Neural Representations for Classification." arXiv preprint arXiv:2503.18123 (2025). https://arxiv.org/pdf/2503.18123

ARC: Anchored Representation Clouds for High-Resolution INR Classification

Published in ICLR Workshop on Weight Space Learning, 2025

Abstract Implicit neural representations (INRs) encode signals in neural network weights as a memory-efficient representation, decoupling sampling resolution from the associated resource costs. Current INR image classification methods are demonstrated on low-resolution data and are sensitive to image-space transformations. We attribute these issues to the global, fully-connected MLP architecture of current INRs, which lacks mechanisms for local representation: MLPs are sensitive to absolute image location and struggle with high-frequency details. We propose ARC: Anchored Representation Clouds, a novel INR architecture that explicitly anchors latent vectors locally in image-space. By introducing spatial structure to the latent vectors, ARC captures local image data, which in our testing leads to state-of-the-art implicit image classification of both low- and high-resolution images and increased robustness against image-space translation.
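The core idea of spatially anchored latents can be illustrated with a simple sketch: a grid of latent vectors tied to image locations, bilinearly interpolated at each query coordinate before decoding. This is an illustrative stand-in, not the exact ARC anchoring scheme:

```python
import numpy as np

def sample_local_latents(latent_grid, coords):
    # Bilinearly interpolate a spatial grid of latent vectors
    # (H, W, D) at continuous coordinates in [0, 1]^2, so each
    # query only sees latents anchored near its image location.
    H, W, D = latent_grid.shape
    x = coords[:, 0] * (W - 1)
    y = coords[:, 1] * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    top = (1 - wx) * latent_grid[y0, x0] + wx * latent_grid[y0, x1]
    bot = (1 - wx) * latent_grid[y1, x0] + wx * latent_grid[y1, x1]
    return (1 - wy) * top + wy * bot  # (N, D) local latent per query
```

Because each interpolated latent depends only on nearby anchors, translating the image largely corresponds to re-indexing the latent grid, which is what gives locality and translation robustness a chance.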

Recommended citation: Luijmes, Joost, et al. "ARC: Anchored Representation Clouds for High-Resolution INR Classification." arXiv preprint arXiv:2503.15156 (2025). https://arxiv.org/abs/2503.15156

Local Attention Transformers for High-Detail Optical Flow Upsampling

Published in arXiv preprint, 2024

Abstract Most recent works on optical flow use convex upsampling as the last step to obtain high-resolution flow. In this work, we show and discuss several issues and limitations of this currently widely adopted convex upsampling approach, and propose a series of changes to address them. First, we propose to decouple the weights for the final convex upsampler, making it easier to find the correct convex combination. For the same reason, we also provide extra contextual features to the convex upsampler. Then, we increase the convex mask size by using an attention-based alternative convex upsampler: Transformers for Convex Upsampling. This upsampler is based on the observation that convex upsampling can be reformulated as attention, and we propose to use local attention masks as a drop-in replacement for convex masks to increase the mask size. We provide empirical evidence that a larger mask size increases the likelihood that a valid convex combination exists. Lastly, we propose an alternative training scheme to remove bilinear interpolation artifacts from the model output. Our proposed ideas could theoretically be applied to almost every current state-of-the-art optical flow architecture. On the FlyingChairs + FlyingThings3D training setting we reduce the Sintel Clean training end-point-error of RAFT from 1.42 to 1.26, GMA from 1.31 to 1.18, and that of FlowFormer from 0.94 to 0.90, by solely adapting the convex upsampler.
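For context, the baseline operation the paper modifies is RAFT-style convex upsampling: each high-resolution pixel is a softmax-weighted (hence convex) combination of its 3x3 coarse-flow neighborhood. The softmax over neighbors is exactly what makes the attention reformulation natural. A NumPy sketch of the baseline, with array shapes chosen for illustration:

```python
import numpy as np

def convex_upsample(flow, mask_logits, factor=8):
    # flow: coarse flow field (H, W, 2).
    # mask_logits: per-pixel logits (H, W, factor, factor, 9), one set
    # of 9 weights for each fine pixel inside a coarse cell.
    H, W, _ = flow.shape
    # Softmax over the 9 neighbors -> non-negative weights summing to 1,
    # i.e. a convex combination (the attention connection).
    w = np.exp(mask_logits - mask_logits.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)
    # Gather the 3x3 neighborhood around every coarse pixel.
    pad = np.pad(flow, ((1, 1), (1, 1), (0, 0)), mode='edge')
    nb = np.stack([pad[dy:dy + H, dx:dx + W]
                   for dy in range(3) for dx in range(3)], axis=2)  # (H, W, 9, 2)
    up = np.einsum('hwabn,hwnc->hwabc', w, nb) * factor  # scale flow magnitude
    return up.transpose(0, 2, 1, 3, 4).reshape(H * factor, W * factor, 2)
```

Replacing the fixed 3x3 gather with a local attention window over a larger neighborhood is, in essence, the drop-in change the paper proposes.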

Recommended citation: Gielisse, Alexander, Nergis Tömen, and Jan van Gemert. "Local Attention Transformers for High-Detail Optical Flow Upsampling." arXiv preprint arXiv:2412.06439 (2024). https://arxiv.org/abs/2412.06439

Color Equivariant Convolutional Networks

Published in NeurIPS, 2023

Abstract Color is a crucial visual cue readily exploited by Convolutional Neural Networks (CNNs) for object recognition. However, CNNs struggle if there is data imbalance between color variations introduced by accidental recording conditions. Color invariance addresses this issue but does so at the cost of removing all color information, which sacrifices discriminative power. In this paper, we propose Color Equivariant Convolutions (CEConvs), a novel deep learning building block that enables shape feature sharing across the color spectrum while retaining important color information. We extend the notion of equivariance from geometric to photometric transformations by incorporating parameter sharing over hue-shifts in a neural network. We demonstrate the benefits of CEConvs in terms of downstream performance on various tasks and improved robustness to color changes, including train-test distribution shifts. Our approach can be seamlessly integrated into existing architectures, such as ResNets, and offers a promising solution for addressing color-based domain shifts in CNNs.
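Parameter sharing over hue-shifts can be illustrated with the smallest possible hue group: the three 120° rotations of the RGB channels, realized as cyclic channel permutations. A 1x1 "lifting" convolution then applies one filter under every hue rotation, so a hue-shifted input produces a cyclically shifted stack of responses rather than a different one. This is a toy sketch of the equivariance property, not the paper's CEConv layer:

```python
import numpy as np

def lifting_response(img, filt):
    # img: (H, W, 3) RGB image; filt: (3,) filter over RGB.
    # Per-pixel dot product of the image with the filter under each
    # of the 3 cyclic channel permutations (discrete hue rotations).
    return np.stack([img @ np.roll(filt, k) for k in range(3)], axis=0)
```

The equivariance guarantee: permuting the input's RGB channels only rotates which slice of the response stack fires, so shape features learned under one hue are reused under the others, which is the point of CEConvs.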

Recommended citation: Lengyel, Attila, Ombretta Strafforello, Robert-Jan Bruintjes, Alexander Gielisse, and Jan van Gemert. "Color Equivariant Convolutional Networks." arXiv preprint arXiv:2310.19368 (2023). https://arxiv.org/abs/2310.19368

Optical Flow Upsamplers Ignore Details: Neighborhood Attention Transformers for Convex Upsampling

Published in TU Delft Repository, 2023

Abstract Most recent works on optical flow use convex upsampling as the last step to obtain high-resolution flow. In this work, we show and discuss several issues and limitations of this currently widely adopted convex upsampling approach. We propose a series of changes, inspired by the observation that convex upsampling as currently implemented performs badly in high-detail areas. We identify three possible causes: incorrect training data, the non-existence of a convex combination, and the inability of the convex upsampler to find the correct convex combination.

Recommended citation: A.S. Gielisse. (2023). Master's thesis. TU Delft Repository. http://sandergielisse.github.io/files/msc_thesis_alexander_gielisse.pdf