Research Topic: Super-Resolution
Light Field
Light Field Super-Resolution Using Decoupled Selective Matching
Yutong Liu, Zhen Cheng, Zeyu Xiao, Zhiwei Xiong
Paper | Code | Abstract
Non-local self-similarity has been well exploited as an effective prior in the single image super-resolution task. However, due to the difficulty of modeling the 4D correspondence globally, the potential of the non-local prior remains largely untapped for light field (LF) super-resolution. Meanwhile, existing non-local models only utilize the global spatial correspondence but largely neglect the global geometric correspondence. To address these problems, we propose a Decoupled Selective Matching Network (DSMNet) for LF super-resolution, designing a novel selective matching mechanism to flexibly extract non-local information from specific 4D positions in an LF. This mechanism matches the reference patch with several auxiliary patches dynamically searched from predefined windows, which improves both efficiency and performance compared to existing non-local models. Specifically, our DSMNet decouples the whole LF into Sub-Aperture Images (SAIs) and Epipolar Plane Images (EPIs). For each SAI patch, we separately perform selective matching inside the current SAI and across different SAIs to exploit the global spatial correspondence efficiently. For each EPI patch, we separately perform selective matching in EPIs of different orientations to embed robust LF geometric information into the features by enhancing EPI textures, which exploits the global geometric correspondence in an efficient manner. Comprehensive experiments validate that DSMNet outperforms state-of-the-art LF super-resolution methods both quantitatively and qualitatively.
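As a rough illustration of the selective-matching idea, the sketch below scores a reference patch feature against candidates from a predefined search window, keeps the top-k matches, and aggregates them with softmax weights. All names, shapes, and the dot-product similarity are illustrative assumptions, not the released DSMNet code.

```python
import torch
import torch.nn.functional as F

def selective_matching(ref, candidates, k=4):
    """ref: (B, C) reference patch features.
    candidates: (B, N, C) patch features from a predefined search window."""
    sim = torch.einsum('bc,bnc->bn', ref, candidates)      # similarity of ref to each candidate
    top_sim, top_idx = sim.topk(k, dim=1)                  # dynamically select the k best matches
    idx = top_idx.unsqueeze(-1).expand(-1, -1, candidates.size(-1))
    selected = candidates.gather(1, idx)                   # (B, k, C) selected auxiliary patches
    weights = F.softmax(top_sim, dim=1).unsqueeze(-1)      # soft matching weights
    return (weights * selected).sum(dim=1)                 # aggregated non-local feature, (B, C)

# Toy usage: 8 reference patches, a 7x7 window of 49 candidates each, 64-dim features.
out = selective_matching(torch.randn(8, 64), torch.randn(8, 49, 64))
print(out.shape)  # torch.Size([8, 64])
```

In the paper, this matching is applied separately within an SAI, across SAIs, and on EPIs of different orientations; the sketch only shows the core select-and-aggregate step.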
Space-Time Super-Resolution for Light Field Videos
Zeyu Xiao, Zhen Cheng, Zhiwei Xiong
Paper | Code | Abstract
Light field (LF) cameras suffer from a fundamental trade-off between spatial and angular resolutions. Additionally, due to the significant amount of data that needs to be recorded, the Lytro ILLUM, a modern LF camera, can only capture three frames per second. In this paper, we consider space-time super-resolution (SR) for LF videos, aiming to generate high-resolution, high-frame-rate LF videos from low-resolution, low-frame-rate observations. Directly extending existing space-time video SR methods to this task raises two key challenges: 1) how to re-organize sub-aperture images (SAIs) efficiently and effectively given highly redundant LF videos, and 2) how to aggregate complementary information across multiple SAIs and frames while respecting the coherence of LF videos. To address these challenges, we propose the first framework for space-time super-resolving LF videos. First, we propose a novel Multi-Scale Dilated SAI Re-organization strategy that re-organizes SAIs into auxiliary view stacks whose resolution decreases as the Chebyshev distance in the angular dimension increases. In particular, the auxiliary view stack at the original resolution preserves essential visual details, while the down-scaled view stacks capture long-range contextual information. Second, we propose the Multi-Scale Aggregated Feature extractor and the Angular-Assisted Feature Interpolation module to utilize and aggregate information from the spatial, angular, and temporal dimensions of LF videos. The former aggregates similar contents from different SAIs and frames for subsequent reconstruction in a disparity-free manner at the feature level, whereas the latter interpolates intermediate frames temporally by implicitly aggregating geometric information. Experimental results demonstrate that, compared to other potential approaches, the LF videos reconstructed by our framework achieve higher reconstruction quality and better preserve the LF parallax structure and temporal consistency. The implementation code is available at https://github.com/zeyuxiao1997/LFSTVSR.
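A minimal sketch of the re-organization described above, assuming SAIs are grouped into rings by Chebyshev distance from a reference view and each ring is downscaled (here, halved per ring, an assumed schedule). This is an illustration, not the released LFSTVSR code.

```python
import torch
import torch.nn.functional as F

def reorganize_sais(lf, ref=(2, 2)):
    """lf: (U, V, C, H, W) light field; ref: angular index of the reference SAI."""
    U, V, C, H, W = lf.shape
    rings = {}
    for u in range(U):
        for v in range(V):
            d = max(abs(u - ref[0]), abs(v - ref[1]))  # Chebyshev distance to reference
            rings.setdefault(d, []).append(lf[u, v])
    stacks = []
    for d in sorted(rings):
        views = torch.stack(rings[d])                  # (N_d, C, H, W) auxiliary view stack
        if d > 0:                                      # assumed schedule: halve per ring
            views = F.interpolate(views, scale_factor=1 / (2 ** d),
                                  mode='bilinear', align_corners=False)
        stacks.append(views)                           # farther rings carry coarser context
    return stacks

stacks = reorganize_sais(torch.randn(5, 5, 3, 64, 64))
print([tuple(s.shape) for s in stacks])  # [(1, 3, 64, 64), (8, 3, 32, 32), (16, 3, 16, 16)]
```

The full-resolution stack (distance 0) keeps fine detail, while the coarser rings supply long-range context, matching the motivation stated in the abstract.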
Toward Real-World Light Field Super-Resolution
Zeyu Xiao, Ruisheng Gao, Yutong Liu, Yueyi Zhang, Zhiwei Xiong
Paper | Code & Dataset | Abstract
Deep learning has opened up new possibilities for light field super-resolution (SR), but existing methods trained on synthetic datasets with simple degradations (e.g., bicubic downsampling) suffer from poor performance when applied to complex real-world scenarios. To address this problem, we introduce LytroZoom, the first real-world light field SR dataset capturing paired low- and high-resolution light fields of diverse indoor and outdoor scenes using a Lytro ILLUM camera. Additionally, we propose the Omni-Frequency Projection Network (OFPNet), which decomposes the omni-frequency components and iteratively enhances them through frequency projection operations to address spatially variant degradation processes present in all frequency components. Experiments demonstrate that models trained on LytroZoom outperform those trained on synthetic datasets and are generalizable to diverse content and devices. Quantitative and qualitative evaluations verify the superiority of OFPNet. We believe this work will inspire future research in real-world light field SR.
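One plausible way to split an image into low- and high-frequency components, which is the kind of decomposition OFPNet operates on; the Gaussian low-pass split below is an assumption for illustration, not the network's actual omni-frequency operators.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(sigma=1.0, size=5):
    """Normalized 2D Gaussian kernel of shape (1, 1, size, size)."""
    x = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-x ** 2 / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

def split_frequencies(img, sigma=1.0):
    """img: (B, C, H, W). Returns (low, high): blurred base and residual detail."""
    k = gaussian_kernel(sigma).to(img).repeat(img.size(1), 1, 1, 1)
    low = F.conv2d(img, k, padding=k.size(-1) // 2, groups=img.size(1))
    return low, img - low  # high frequency = input minus low-pass component

low, high = split_frequencies(torch.randn(2, 3, 64, 64))
print(low.shape, high.shape)  # both torch.Size([2, 3, 64, 64])
```

OFPNet then enhances such components iteratively via frequency projection operations, since real-world degradations affect all frequency bands in a spatially variant way.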
CutMIB: Boosting Light Field Super-Resolution via Multi-View Image Blending
Zeyu Xiao, Yutong Liu, Ruisheng Gao, Zhiwei Xiong
Paper | Code | Intro Video | Abstract
Data augmentation (DA) is an efficient strategy for improving the performance of deep neural networks, and recent DA strategies have demonstrated utility in single image super-resolution (SR). However, little research has focused on DA strategies for light field SR, where multi-view information must be exploited. We propose CutMIB, the first potent DA strategy designed for light field SR, which improves the performance of existing light field SR networks while keeping their structures unchanged. Specifically, CutMIB first cuts low-resolution (LR) patches from each view at the same location. It then blends all LR patches into a single blended patch and pastes it onto the corresponding regions of the high-resolution light field views, and vice versa. By doing so, CutMIB enables light field SR networks to learn from implicit geometric information during the training stage. Experimental results demonstrate that CutMIB can improve the reconstruction performance and the angular consistency of existing light field SR networks. We further verify the effectiveness of CutMIB on real-world light field SR and light field denoising. The implementation code is available at https://github.com/zeyuxiao1997/CutMIB.
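A compact sketch of the cut-blend-paste step, under two stated assumptions: the LR views are bicubically pre-upsampled to HR size so patches can be swapped, and blending is a plain average over views. The official repository has the authoritative version.

```python
import torch
import torch.nn.functional as F

def cutmib(lr, hr, patch=32, scale=2):
    """lr: (N, C, h, w) LR views; hr: (N, C, h*scale, w*scale) HR views."""
    lr_up = F.interpolate(lr, scale_factor=scale, mode='bicubic', align_corners=False)
    N, C, H, W = hr.shape
    y = torch.randint(0, H - patch + 1, (1,)).item()   # random patch location,
    x = torch.randint(0, W - patch + 1, (1,)).item()   # shared by all views
    # Blend the same-location patch across all views (multi-view image blending).
    blend_lr = lr_up[:, :, y:y+patch, x:x+patch].mean(dim=0, keepdim=True)
    blend_hr = hr[:, :, y:y+patch, x:x+patch].mean(dim=0, keepdim=True)
    hr_aug, lr_aug = hr.clone(), lr_up.clone()
    hr_aug[:, :, y:y+patch, x:x+patch] = blend_lr      # paste blended LR into HR views
    lr_aug[:, :, y:y+patch, x:x+patch] = blend_hr      # and vice versa
    return lr_aug, hr_aug

lr_aug, hr_aug = cutmib(torch.randn(25, 3, 32, 32), torch.randn(25, 3, 64, 64))
print(lr_aug.shape, hr_aug.shape)  # torch.Size([25, 3, 64, 64]) x2
```

Because the blended patch mixes content from all views, the network sees disparity-averaged textures during training, which is how the implicit geometric supervision arises.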
Spatial-Angular Versatile Convolution for Light Field Reconstruction
Zhen Cheng, Yutong Liu, Zhiwei Xiong
Paper | Code | Abstract
Spatial-angular separable convolution (SAS-conv) has been widely used for efficient and effective 4D light field (LF) feature embedding in different tasks; it mimics a 4D convolution by alternately operating on 2D spatial slices and 2D angular slices. In this paper, we argue that, despite its global intensity modeling capability, SAS-conv can only embed local geometry information into the features, resulting in inferior performance in textured and occluded regions. Because the epipolar lines are highly related to the scene depth, we introduce the concept of spatial-angular correlated convolution (SAC-conv). By alternating 2D convolutions on the vertical and horizontal epipolar slices, SAC-conv embeds global and robust geometry information into the features. Through a detailed feature and error analysis, we verify that SAS-conv and SAC-conv excel at different aspects of 4D LF feature embedding. Based on their complementarity, we further combine SAS-conv and SAC-conv via a parallel residual connection, forming a new spatial-angular versatile convolution (SAV-conv) module. We conduct comprehensive experiments on two representative LF reconstruction tasks, i.e., LF angular super-resolution and LF spatial super-resolution. Both quantitative and qualitative results demonstrate that, without any extra parameters, networks upgraded with our proposed SAV-conv notably outperform those upgraded with SAS-conv and achieve new state-of-the-art performance.
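The reshaping logic behind these three operators can be sketched as follows: SAS-conv alternates 2D convolutions over spatial (H, W) and angular (U, V) slices, SAC-conv over vertical (U, H) and horizontal (V, W) epipolar slices, and SAV-conv sums the two branches through a parallel residual connection. The single-conv branches below are simplified stand-ins for the paper's modules.

```python
import torch
import torch.nn as nn

class SAVConv(nn.Module):
    """SAS and SAC branches combined by a parallel residual connection."""
    def __init__(self, ch):
        super().__init__()
        self.spa = nn.Conv2d(ch, ch, 3, padding=1)    # spatial (H, W) slices
        self.ang = nn.Conv2d(ch, ch, 3, padding=1)    # angular (U, V) slices
        self.epi_v = nn.Conv2d(ch, ch, 3, padding=1)  # vertical EPI (U, H) slices
        self.epi_h = nn.Conv2d(ch, ch, 3, padding=1)  # horizontal EPI (V, W) slices

    def forward(self, x):  # x: (B, U, V, C, H, W)
        B, U, V, C, H, W = x.shape
        # SAS-conv: spatial conv, then angular conv.
        s = self.spa(x.reshape(B * U * V, C, H, W)).reshape(B, U, V, C, H, W)
        s = s.permute(0, 4, 5, 3, 1, 2).reshape(B * H * W, C, U, V)
        s = self.ang(s).reshape(B, H, W, C, U, V).permute(0, 4, 5, 3, 1, 2)
        # SAC-conv: vertical EPI conv, then horizontal EPI conv.
        c = x.permute(0, 2, 5, 3, 1, 4).reshape(B * V * W, C, U, H)
        c = self.epi_v(c).reshape(B, V, W, C, U, H).permute(0, 4, 1, 3, 5, 2)
        c = c.permute(0, 1, 4, 3, 2, 5).reshape(B * U * H, C, V, W)
        c = self.epi_h(c).reshape(B, U, H, C, V, W).permute(0, 1, 4, 3, 2, 5)
        return x + s + c  # parallel residual combination (SAV-conv)

y = SAVConv(8)(torch.randn(1, 5, 5, 8, 16, 16))
print(y.shape)  # torch.Size([1, 5, 5, 8, 16, 16])
```

Since both branches reuse the same 2D convolution primitives, the combined module adds capability without adding parameters relative to running the branches alone, consistent with the "without any extra parameters" claim.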
Light Field Super-Resolution With Zero-Shot Learning
Zhen Cheng, Zhiwei Xiong, Chang Chen, Dong Liu, Zheng-Jun Zha
Paper | Code | Abstract
Deep learning provides a new avenue for light field super-resolution (SR). However, the domain gap caused by drastically different light field acquisition conditions poses a major obstacle in practice. To bridge this gap, we propose a zero-shot learning framework for light field SR, which learns a mapping to super-resolve the reference view with examples extracted solely from the input low-resolution light field itself. Given the highly limited training data under the zero-shot setting, however, we observe that it is difficult to train an end-to-end network successfully. Instead, we divide this challenging task into three sub-tasks, i.e., pre-upsampling, view alignment, and multi-view aggregation, and then conquer them separately with simple yet efficient CNNs. Moreover, the proposed framework can be readily extended to finetune the pre-trained model on a source dataset to better adapt to the target input, which further boosts the performance of light field SR in the wild. Experimental results validate that our method not only outperforms classic non-learning-based methods, but also generalizes better to unseen light fields than state-of-the-art deep-learning-based methods when the domain gap is large.
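A schematic of the divide-and-conquer pipeline, with each sub-task stood in for by a single convolution; the real framework trains a small CNN per sub-task on examples drawn from the input light field itself. Module internals, names, and the concatenation-based alignment are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroShotLFSR(nn.Module):
    """Sketch of the three sub-tasks: pre-upsampling, view alignment,
    multi-view aggregation. Single convs replace the paper's CNNs."""
    def __init__(self, n_views, ch=3, scale=2):
        super().__init__()
        self.scale = scale
        self.pre_up = nn.Conv2d(ch, ch, 3, padding=1)       # refine bicubic pre-upsampling
        self.align = nn.Conv2d(2 * ch, ch, 3, padding=1)    # align side views to the reference
        self.aggregate = nn.Conv2d(n_views * ch, ch, 3, padding=1)  # fuse aligned views

    def forward(self, views, ref_idx=0):
        """views: (N, C, h, w) low-resolution sub-aperture images."""
        up = F.interpolate(views, scale_factor=self.scale, mode='bicubic',
                           align_corners=False)
        up = self.pre_up(up)                                # sub-task 1: pre-upsampling
        ref = up[ref_idx:ref_idx + 1].expand_as(up)
        aligned = self.align(torch.cat([up, ref], dim=1))   # sub-task 2: view alignment
        return self.aggregate(aligned.flatten(0, 1).unsqueeze(0))  # sub-task 3: aggregation

sr_ref = ZeroShotLFSR(n_views=25)(torch.randn(25, 3, 32, 32))
print(sr_ref.shape)  # torch.Size([1, 3, 64, 64]) super-resolved reference view
```

Under the zero-shot setting, training pairs for each stage would come from further downscaling the input light field, so no external dataset is required.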
Light Field Super-Resolution By Jointly Exploiting Internal and External Similarities
Zhen Cheng, Zhiwei Xiong, Dong Liu
Paper | Code | Abstract
Light field images taken by plenoptic cameras often suffer from a trade-off between spatial and angular resolutions. In this paper, we propose a novel spatial super-resolution approach for light field images that jointly exploits internal and external similarities. The internal similarity refers to the correlations across the angular dimensions of the 4D light field itself, while the external similarity refers to the cross-scale correlations learned from an external light field dataset. Specifically, we advance the classic projection-based method that exploits the internal similarity by introducing an intensity consistency checking criterion and a back-projection refinement, while the external correlation is learned by a CNN-based method that aggregates all warped high-resolution sub-aperture images upsampled from the low-resolution input using a single image super-resolution method. By analyzing the error distributions of the above two methods and investigating the upper bound of combining them, we find that the internal and external similarities are complementary to each other. Accordingly, we further propose a pixel-wise adaptive fusion network that takes advantage of both their merits by learning a weighting matrix. Experimental results on both synthetic and real-world light field datasets validate the superior performance of the proposed approach over state-of-the-art methods.
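The fusion step reduces to predicting a per-pixel weight that blends the internally and externally super-resolved results, roughly as sketched below; the two-layer weight network is a placeholder assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Pixel-wise adaptive fusion of two complementary SR results."""
    def __init__(self, ch=3):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())  # per-pixel weight in [0, 1]

    def forward(self, internal, external):
        """internal/external: (B, C, H, W) results of the two branches."""
        w = self.weight_net(torch.cat([internal, external], dim=1))  # learned weighting matrix
        return w * internal + (1 - w) * external                     # pixel-wise blend

fused = AdaptiveFusion()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(fused.shape)  # torch.Size([1, 3, 64, 64])
```

The sigmoid output plays the role of the learned weighting matrix: where the internal (projection-based) result is more reliable the weight approaches 1, and elsewhere the external (CNN-based) result dominates.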
Light Field Super-Resolution: A Benchmark
Zhen Cheng, Zhiwei Xiong, Chang Chen, Dong Liu
Paper | Code | Abstract
Lenslet-based light field imaging generally suffers from a fundamental trade-off between spatial and angular resolutions, which limits its adoption in practical applications. To this end, substantial effort has been dedicated to light field super-resolution (SR) in recent years. Despite the demonstrated success, existing light field SR methods are often evaluated under different degradation assumptions using different datasets, and even contradictory results are reported in the literature. In this paper, we conduct the first systematic benchmark evaluation of representative light field SR methods on both synthetic and real-world datasets with various downsampling kernels and scaling factors. We then analyze and discuss the advantages and limitations of each kind of method from different perspectives. In particular, we find that CNN-based single image SR without using any angular information outperforms most light field SR methods, even learning-based ones. This benchmark evaluation, along with the comprehensive analysis and discussion, sheds light on future research in light field SR.
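The benchmark protocol amounts to sweeping every method over every dataset, kernel, and scale with a shared metric, roughly as below. The `pairs`/`name` dataset interface and the toy data are hypothetical stand-ins, not the benchmark's actual code.

```python
import itertools
import torch
import torch.nn.functional as F

def psnr(sr, hr, peak=1.0):
    """Peak signal-to-noise ratio between super-resolved and ground-truth images."""
    mse = torch.mean((sr - hr) ** 2)
    return 10 * torch.log10(peak ** 2 / mse)

def run_benchmark(methods, datasets, kernels=('bicubic', 'gaussian'), scales=(2, 4)):
    """methods: {name: callable(lr) -> sr}; datasets: objects with .name and
    .pairs(kernel, scale) yielding (lr, hr) tensors (assumed interface)."""
    results = {}
    for (name, method), data, kernel, scale in itertools.product(
            methods.items(), datasets, kernels, scales):
        scores = [psnr(method(lr), hr) for lr, hr in data.pairs(kernel, scale)]
        results[(name, data.name, kernel, scale)] = (sum(scores) / len(scores)).item()
    return results

class ToyDataset:
    """Stand-in dataset: random HR images with pre-upsampled LR counterparts
    (the kernel argument is ignored in this toy)."""
    name = 'toy'
    def pairs(self, kernel, scale):
        for _ in range(4):
            hr = torch.rand(1, 3, 64, 64)
            lr = F.interpolate(hr, scale_factor=1 / scale, mode='bicubic',
                               align_corners=False)
            yield F.interpolate(lr, scale_factor=scale, mode='bicubic',
                                align_corners=False), hr

print(run_benchmark({'bicubic': lambda x: x}, [ToyDataset()]))
```

Fixing the degradation kernel and scale per run is exactly what removes the contradictory cross-paper comparisons the abstract describes.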