Research Topic: Super-Resolution
2024 New
Neural degradation representation learning for all-in-one image restoration
Mingde Yao, Ruikang Xu, Yuanshen Guan, Jie Huang, Zhiwei Xiong
Paper | Code | Abstract
Existing methods have demonstrated effective performance on a single degradation type. In practical applications, however, the degradation is often unknown, and the mismatch between the model and the degradation will result in a severe performance drop. In this paper, we propose an all-in-one image restoration network that tackles multiple degradations. Due to the heterogeneous nature of different types of degradations, it is difficult to process multiple degradations in a single network. To this end, we propose to learn a neural degradation representation (NDR) that captures the underlying characteristics of various degradations. The learned NDR decomposes different types of degradations adaptively, similar to a neural dictionary that represents basic degradation components. Subsequently, we develop a degradation query module and a degradation injection module to effectively recognize and utilize the specific degradation based on NDR, enabling the all-in-one restoration ability for multiple degradations. Moreover, we propose a bidirectional optimization strategy to effectively drive NDR to learn the degradation representation by optimizing the degradation and restoration processes alternately. Comprehensive experiments on representative types of degradations (including noise, haze, rain, and downsampling) demonstrate the effectiveness and generalization capability of our method.
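The degradation query step described above can be pictured as cross-attention between image features and a learnable dictionary of degradation components. The sketch below is only an illustration of that idea under assumed shapes and names (DegradationQuery, feat_dim, num_components are not from the paper); the resulting per-image degradation code would then be passed to a degradation injection module that modulates the restoration features.
```python
# Minimal sketch (not the authors' code): a degradation query module that
# reads a learned neural degradation representation (NDR) via cross-attention.
# Names, dimensions, and the attention formulation are assumptions.
import torch
import torch.nn as nn

class DegradationQuery(nn.Module):
    def __init__(self, feat_dim=64, num_components=8, ndr_dim=64):
        super().__init__()
        # NDR: a learnable dictionary of basic degradation components
        self.ndr = nn.Parameter(torch.randn(num_components, ndr_dim))
        self.to_q = nn.Linear(feat_dim, ndr_dim)
        self.to_k = nn.Linear(ndr_dim, ndr_dim)
        self.to_v = nn.Linear(ndr_dim, ndr_dim)

    def forward(self, feat):                     # feat: (B, C, H, W)
        q = self.to_q(feat.flatten(2).mean(-1))  # (B, D) global query per image
        k = self.to_k(self.ndr)                  # (N, D)
        v = self.to_v(self.ndr)                  # (N, D)
        attn = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)  # (B, N)
        return attn @ v                          # (B, D) image-specific degradation code
```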
Learning Piecewise Planar Representation for RGB Guided Depth Super-Resolution
Ruikang Xu, Mingde Yao, Yuanshen Guan, Zhiwei Xiong
Paper | Code | Abstract
RGB guided depth super-resolution (GDSR) aims to reconstruct high-resolution (HR) depth images from low-resolution ones using HR RGB images as guidance, overcoming the resolution limitation of depth cameras. The main challenge in this task is how to effectively exploit the HR information from RGB images while avoiding the over-transfer of textures. To address this challenge, we propose a novel method for GDSR based on the piecewise planar representation in 3D space, which naturally focuses on the geometry information of scenes without being affected by their internal textures. Specifically, we design a plane-aware interaction module to effectively bridge the RGB and depth modalities and perform information interaction by taking piecewise planes as the intermediary. We also devise a plane-guided fusion module to further remove modality-inconsistent information. To mitigate the distribution gap between synthetic and real-world data, we propose a self-training adaptation strategy for the real-world deployment of our method. Comprehensive experimental results on multiple representative datasets demonstrate the superiority of our method over existing state-of-the-art GDSR methods.
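As a rough illustration of what a piecewise planar depth representation looks like, the sketch below renders an HR depth map from per-segment plane parameters (a, b, c), with depth inside each segment given by a*x + b*y + c. The parameterization and function names are assumptions for exposition, not the paper's exact formulation.
```python
# Minimal sketch (an assumption, not the paper's formulation): each segment s
# carries plane parameters (a_s, b_s, c_s), and the depth at pixel (x, y)
# inside that segment is a_s * x + b_s * y + c_s.
import torch

def render_piecewise_planar_depth(plane_params, seg_map):
    """plane_params: (S, 3) per-segment (a, b, c); seg_map: (H, W) integer segment ids."""
    h, w = seg_map.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    a, b, c = plane_params[seg_map].unbind(-1)   # each (H, W)
    return a * xs + b * ys + c                   # (H, W) reconstructed depth
```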
Asymmetric Event-Guided Video Super-Resolution
Zeyu Xiao, Dachun Kai, Yueyi Zhang, Xiaoyan Sun, Zhiwei Xiong
Paper | Abstract
Event cameras are novel bio-inspired cameras that record asynchronous events with high temporal resolution and dynamic range. Leveraging the auxiliary temporal information recorded by event cameras holds great promise for the task of video super-resolution (VSR). However, existing event-guided VSR methods assume that the event and RGB cameras are strictly calibrated (e.g., pixel-level sensor designs in DAVIS 240/346). This assumption proves limiting in emerging high-resolution devices, such as dual-lens smartphones and unmanned aerial vehicles, where such precise calibration is typically unavailable. To unlock more event-guided application scenarios, we perform the task of asymmetric event-guided VSR for the first time, and we propose an Asymmetric Event-guided VSR Network (AsEVSRN) for this new task. AsEVSRN incorporates two specialized designs for leveraging the asymmetric event stream in VSR. Firstly, the content hallucination module dynamically enhances event and RGB information by exploiting their complementary nature, thereby adaptively boosting representational capacity. Secondly, the event-enhanced bidirectional recurrent cells align and propagate temporal features fused with features from content-hallucinated frames. Within the bidirectional recurrent cells, event-enhanced flow is employed to simultaneously utilize and fuse temporal information at both the feature and pixel levels. Comprehensive experimental results affirm that our method consistently generates superior quantitative and qualitative results.
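The bidirectional recurrent propagation can be summarized schematically as below. This is a generic sketch under assumptions: cell_fwd, cell_bwd, and align are hypothetical callables standing in for the event-enhanced recurrent cells and flow-based alignment, not the AsEVSRN implementation.
```python
# Schematic bidirectional recurrent propagation with event-guided alignment
# abstracted into `align`; all names are placeholders, not the released code.
import torch

def bidirectional_propagate(frame_feats, event_feats, cell_fwd, cell_bwd, align):
    """frame_feats, event_feats: lists of (B, C, H, W) tensors, one per frame."""
    n = len(frame_feats)
    fwd, bwd = [None] * n, [None] * n
    h = torch.zeros_like(frame_feats[0])
    for t in range(n):                           # forward pass
        h = cell_fwd(frame_feats[t], align(h, event_feats[t]))
        fwd[t] = h
    h = torch.zeros_like(frame_feats[0])
    for t in reversed(range(n)):                 # backward pass
        h = cell_bwd(frame_feats[t], align(h, event_feats[t]))
        bwd[t] = h
    # Fuse forward and backward features per frame for reconstruction
    return [torch.cat([f, b], dim=1) for f, b in zip(fwd, bwd)]
```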
Mamba-based light field super-resolution with efficient subspace scanning
Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong
Paper | Code | Abstract
Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high-resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to leverage the full information from the entire input LFs. The recently proposed selective state-space model, Mamba, has gained popularity for its efficient long-range sequence modeling. In this paper, we propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy. Specifically, we tokenize 4D LFs into subspace sequences and conduct bi-directional scanning on each subspace. Based on our scanning strategy, we then design the Mamba-based Global Interaction (MGI) module to capture global information and the local Spatial-Angular Modulator (SAM) to complement local details. Additionally, we introduce a Transformer-to-Mamba (T2M) loss to further enhance overall performance. Extensive experiments on public benchmarks demonstrate that MLFSR surpasses CNN-based models and rivals Transformer-based methods in performance while maintaining higher efficiency. With faster inference and reduced memory demand, MLFSR facilitates full-image processing of high-resolution 4D LFs with enhanced performance.
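The subspace scanning idea (tokenizing a 4D LF into spatial and angular subspace sequences and scanning each bidirectionally) can be sketched as follows. The tensor layout (U, V, H, W, C) and the function name are assumptions for illustration, not the released MLFSR code.
```python
# Minimal sketch: flatten a 4D light field into spatial and angular subspace
# sequences and form forward/backward token orders for bidirectional scanning.
import torch

def subspace_sequences(lf):                      # lf: (U, V, H, W, C)
    u, v, h, w, c = lf.shape
    # Spatial subspace: one sequence of H*W tokens per angular view (u, v)
    spatial = lf.reshape(u * v, h * w, c)
    # Angular subspace: one sequence of U*V tokens per spatial location (h, w)
    angular = lf.permute(2, 3, 0, 1, 4).reshape(h * w, u * v, c)
    # Bidirectional scanning: pair each sequence with its reversed copy
    return (spatial, spatial.flip(1)), (angular, angular.flip(1))
```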
Continuous Spatial-Spectral Reconstruction via Implicit Neural Representation
Ruikang Xu, Mingde Yao, Chang Chen, Lizhi Wang, Zhiwei Xiong
Paper | Code | Abstract
Existing methods for spectral reconstruction usually learn a discrete mapping from RGB images to a fixed number of spectral bands. However, this modeling strategy ignores the continuous nature of spectral signatures. In this paper, we propose Neural Spectral Reconstruction (NeSR) to lift this limitation by introducing a novel continuous spectral representation. To this end, we embrace the concept of implicit functions and implement a parameterized embodiment with a neural network. Specifically, we first adopt a backbone network to extract spatial features of RGB inputs. Based on these features, we devise a Spectral Profile Interpolation (SPI) module and a Neural Attention Mapping (NAM) module to enrich deep features, where the spatial-spectral correlation is involved for a better representation. Then, we view the number of sampled spectral bands as the coordinate of the continuous implicit function, so as to learn the projection from deep features to spectral intensities. Extensive experiments demonstrate the distinct advantage of NeSR in reconstruction accuracy over baseline methods. Moreover, NeSR extends the flexibility of spectral reconstruction by enabling an arbitrary number of spectral bands as the target output.
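A minimal sketch of the continuous spectral representation, assuming a standard implicit-function parameterization (module and dimension names are hypothetical, not the NeSR release): a shared MLP takes a per-pixel deep feature concatenated with a normalized band coordinate and outputs a spectral intensity, so an arbitrary number of bands can be queried at inference time.
```python
# Illustrative implicit spectral function under assumed shapes.
import torch
import torch.nn as nn

class ImplicitSpectralFunction(nn.Module):
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, feat, num_bands):          # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        coords = torch.linspace(0, 1, num_bands, device=feat.device)          # (L,)
        f = feat.permute(0, 2, 3, 1).reshape(b, h * w, 1, c).expand(-1, -1, num_bands, -1)
        t = coords.view(1, 1, num_bands, 1).expand(b, h * w, -1, -1)
        out = self.mlp(torch.cat([f, t], dim=-1)).squeeze(-1)                 # (B, H*W, L)
        return out.reshape(b, h, w, num_bands).permute(0, 3, 1, 2)            # (B, L, H, W)
```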
Toward DNN of LUTs: Learning Efficient Image Restoration with Multiple Look-Up Tables
Jiacheng Li, Chang Chen, Zhen Cheng, Zhiwei Xiong
Paper | Code | Abstract
The widespread usage of high-definition screens on edge devices stimulates a strong demand for efficient image restoration algorithms. The way of caching deep learning models in a look-up table (LUT) has recently been introduced to respond to this demand. However, the size of a single LUT grows exponentially with the increase of its indexing capacity, which restricts its receptive field and thus its performance. To overcome this intrinsic limitation of the single-LUT solution, we propose a universal method to construct multiple LUTs like a neural network, termed MuLUT. Firstly, we devise novel complementary indexing patterns, as well as a general implementation for arbitrary patterns, to construct multiple LUTs in parallel. Secondly, we propose a re-indexing mechanism to enable hierarchical indexing between cascaded LUTs. Finally, we introduce channel indexing to allow cross-channel interaction, enabling LUTs to process color channels jointly. In these principled ways, the total size of MuLUT grows linearly with its indexing capacity, yielding a practical solution to obtain superior performance with an enlarged receptive field. We examine the advantage of MuLUT on various image restoration tasks, including super-resolution, demosaicing, denoising, and deblocking. MuLUT achieves a significant improvement over the single-LUT solution, e.g., up to 1.1 dB PSNR for super-resolution and up to 2.8 dB PSNR for grayscale denoising, while preserving its efficiency, with 100× lower energy cost compared with lightweight deep neural networks. Our code and trained models are publicly available at https://github.com/ddlee-cn/MuLUT.
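The exponential growth of a single LUT can be seen with quick arithmetic. Following the common SR-LUT setup (8-bit pixels uniformly sampled at interval 16, i.e., 17 levels per index pixel; the exact MuLUT configuration may differ), the entry count is 17 raised to the number of index pixels:
```python
# Quick arithmetic (illustrative, standard SR-LUT-style sampling assumed):
# a single LUT's size grows exponentially with its indexing capacity,
# i.e., the number of input pixels used as indices.
def lut_entries(num_index_pixels, bits=8, sampling_interval=16):
    levels = (1 << bits) // sampling_interval + 1   # uniformly sampled levels per pixel
    return levels ** num_index_pixels               # exponential in indexing capacity

print(lut_entries(4))   # 17**4 =     83,521 entries for a 4-pixel LUT
print(lut_entries(6))   # 17**6 = 24,137,569 entries if the receptive field grows to 6 pixels
# MuLUT instead composes several small LUTs (parallel patterns + cascaded
# re-indexing), so the total size stays linear in the number of LUTs.
```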
Confidence-Based Iterative Generation for Real-World Image Super-Resolution
Jialun Peng, Xin Luo, Jingjing Fu, Dong Liu
Paper | Code | Abstract
Real-world image super-resolution deals with complex and unknown degradations, making it challenging to produce plausible results in a single step. In this work, we propose a transformer model with an iterative generation process that iteratively refines the results based on predicted confidences. It allows the model to focus on regions with low confidences and generate more confident and accurate results. Specifically, our model learns to predict the visual tokens of the high-resolution image and their corresponding confidence scores, conditioned on the low-resolution image. By keeping only the most confident tokens at each iteration and re-predicting the other tokens in the next iteration, our model generates all high-resolution tokens within a few steps. To ensure consistency with the low-resolution input image, we further propose a conditional controlling module that utilizes the low-resolution image to control the decoding process from high-resolution tokens to image pixels. Experiments demonstrate that our model achieves state-of-the-art performance on real-world datasets while requiring fewer iteration steps compared to recent diffusion models.
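The iterative generation loop can be sketched as a confidence-guided masked-token decoding procedure. This is a schematic under assumptions (the model interface, the keep schedule, and the mask token are hypothetical), not the released implementation:
```python
# Schematic confidence-based iterative token generation. `model` is a
# hypothetical transformer that, given the LR conditioning and a partially
# filled token map, predicts token ids and per-token confidence scores.
import torch

def iterative_generation(model, lr_cond, num_tokens, num_steps=8, mask_id=-1):
    tokens = torch.full((num_tokens,), mask_id, dtype=torch.long)
    fixed = torch.zeros(num_tokens, dtype=torch.bool)
    for step in range(num_steps):
        pred_tokens, confidence = model(lr_cond, tokens)           # (N,), (N,)
        confidence = confidence.masked_fill(fixed, float("inf"))   # keep already-fixed tokens
        # Keep a growing fraction of the most confident predictions each step
        num_keep = int(num_tokens * (step + 1) / num_steps)
        keep = torch.topk(confidence, num_keep).indices
        tokens[keep] = torch.where(fixed[keep], tokens[keep], pred_tokens[keep])
        fixed[keep] = True
        tokens[~fixed] = mask_id                                    # re-predict the rest next step
    return tokens
```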
Look-Up Table Compression for Efficient Image Restoration
Yinglong Li, Jiacheng Li, Zhiwei Xiong
Paper | Code | Abstract
Look-Up Table (LUT) has recently gained increasing attention for restoring High-Quality (HQ) images from Low-Quality (LQ) observations, thanks to its high computational efficiency achieved through a "space for time" strategy of caching learned LQ-HQ pairs. However, incorporating multiple LUTs for improved performance comes at the cost of a rapidly growing storage size, which is ultimately restricted by the allocatable on-device cache size. In this work, we propose a novel LUT compression framework to achieve a better trade-off between storage size and performance for LUT-based image restoration models. Based on the observation that most cached LQ image patches are distributed along the diagonal of a LUT, we devise a Diagonal-First Compression (DFC) framework, where diagonal LQ-HQ pairs are preserved and carefully re-indexed to maintain the representation capacity, while non-diagonal pairs are aggressively subsampled to save storage. Extensive experiments on representative image restoration tasks demonstrate that our DFC framework significantly reduces the storage size of LUT-based models (including our new design) while maintaining their performance. For instance, DFC saves up to 90% of storage with a negligible performance drop for ×4 super-resolution. The source code is available on GitHub: https://github.com/leenas233/DFC.
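As an illustration of the diagonal-first idea, the sketch below keeps LUT entries whose index pixels are nearly equal (a band around the diagonal) at full resolution and subsamples the rest. The band width, the subsampling factor, and the handling of re-indexing metadata are assumptions, not the released DFC code:
```python
# Illustrative diagonal-first compression of a 4-pixel-indexed LUT.
import numpy as np

def compress_lut(lut, band=2, subsample=4):
    """lut: (L, L, L, L) values indexed by 4 quantized pixels with L levels each."""
    idx = np.indices(lut.shape[:4])                     # (4, L, L, L, L) index grids
    on_diag = (idx.max(0) - idx.min(0)) <= band         # indices near the diagonal
    diag_entries = lut[on_diag]                          # preserved at full resolution
    off_diag = lut[::subsample, ::subsample, ::subsample, ::subsample]  # coarse copy
    return diag_entries, off_diag                        # plus re-indexing metadata in practice
```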