Crisp image quality in one second! Improving image resolution with deep learning


An Introduction to Super Resolution using Deep Learning

Photo by Jeremy Thomas on Unsplash

Introduction

Super Resolution is the process of recovering a High Resolution (HR) image from a given Low Resolution (LR) image. An image may have a “lower resolution” due to a smaller spatial resolution (i.e. size) or as a result of degradation (such as blurring). We can relate the HR and LR images through the following equation: LR = degradation(HR)

A low resolution image kept beside its high resolution version. (Photo by Jarrad Horne on Unsplash)

Clearly, on applying a degradation function, we obtain the LR image from the HR image. But, can we do the inverse? In the ideal case, yes! If we know the exact degradation function, by applying its inverse to the LR image, we can recover the HR image.

But therein lies the problem. We usually do not know the degradation function beforehand. Directly estimating the inverse degradation function is an ill-posed problem. In spite of this, Deep Learning techniques have proven to be effective for Super Resolution.
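The relation LR = degradation(HR) can be made concrete with a toy degradation function. This is purely an illustrative assumption, not a real camera degradation model: here the "degradation" is simple 2x2 average pooling applied to a grayscale image stored as nested lists.

```python
# A toy degradation: 2x2 average pooling halves the spatial resolution.
# `hr` is a hypothetical grayscale image as a list of lists of floats.

def degrade(hr, factor=2):
    """Downsample an image by averaging factor x factor blocks (LR = degradation(HR))."""
    h, w = len(hr), len(hr[0])
    lr = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [hr[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        lr.append(row)
    return lr

hr = [[0.0, 1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0, 7.0],
      [8.0, 9.0, 10.0, 11.0],
      [12.0, 13.0, 14.0, 15.0]]
lr = degrade(hr)  # 4x4 -> 2x2: [[2.5, 4.5], [10.5, 12.5]]
```

Recovering `hr` from `lr` here is already ambiguous (many 4x4 images average down to the same 2x2 image), which is exactly why the inverse problem is ill-posed.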

This blog primarily focuses on providing an introduction to performing Super Resolution using Deep Learning with supervised training methods. Some important loss functions and metrics are also discussed. Much of the content is derived from this literature review, which the reader can refer to.



Supervised Methods

As mentioned before, deep learning can be used to estimate the High Resolution (HR) image given a Low Resolution (LR) image. By using the HR image as a target (or ground-truth) and the LR image as an input, we can treat this like a supervised learning problem.

In this section, we group various deep learning approaches according to how their convolution layers are organized. Before we move on to the groups, a primer on data preparation and types of convolutions is presented. Loss functions used to optimize the model are presented separately towards the end of this blog.

Preparing the Data

One easy method of obtaining LR data is to degrade HR data. This is often done by blurring or adding noise. Images of lower spatial resolution can also be scaled by a classic upsampling method such as Bilinear or Bicubic interpolation. JPEG and quantization artifacts can also be introduced to degrade the image.

Degrading a high resolution image to obtain a low resolution version of it. (Photo by Jarrad Horne on Unsplash)

One important thing to note is that it is recommended to store the HR image in an uncompressed (or lossless compressed) format. This is to prevent degradation of the quality of the HR image due to lossy compression, which may give sub-optimal performance.
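The blurring and noise degradations mentioned above can be sketched in a few lines. These are minimal stand-ins for real degradation pipelines (a 3x3 box blur rather than a proper Gaussian kernel, and additive Gaussian noise with a fixed seed for reproducibility); the function names are illustrative, not from any library.

```python
import random

def add_gaussian_noise(img, sigma=0.1, seed=0):
    """Degrade an image by adding zero-mean Gaussian noise to every pixel."""
    rng = random.Random(seed)  # fixed seed so the degradation is reproducible
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]

def box_blur3(img):
    """3x3 box blur with edge clamping, a crude stand-in for real blur kernels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = sum(vals) / 9.0
    return out
```

In practice these degradations are applied to the lossless HR image on the fly during training, so the network sees a fresh LR/HR pair each time.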



Types of Convolutions

Besides classic 2D Convolutions, several interesting variants can be used in networks for improved results. Dilated (Atrous) convolutions can provide a greater effective field of view, and hence make use of information separated by large distances. Skip connections, Spatial Pyramid Pooling, and Dense Blocks motivate combining both low level and high level features to enhance performance.

Network design strategies. (Source)

The above image mentions a number of network design strategies. You can refer to this paper for more information. For a primer on the different types of convolutions commonly used in deep learning, you may refer to this blog.

Group 1 — Pre-Upsampling

In this method, the low resolution images are first interpolated to obtain a “coarse” high resolution image. Now, CNNs are used to learn an end-to-end mapping from the interpolated low resolution images to the high resolution images. The intuition is that it may be easier to first upsample the low resolution images using traditional methods (such as Bilinear interpolation) and then refine the result, rather than learn a direct mapping from a low-dimensional space to a high-dimensional space.

A typical pre-upsampling network. (Source)

You can refer to page 5 of this paper for some models using this technique. The advantage is that since the upsampling is handled by traditional methods, the CNN only needs to learn how to refine the coarse image, which is simpler. Moreover, since we are not using transposed convolutions here, checkerboard artifacts may be circumvented. However, the downside is that the predefined upsampling methods may amplify noise and cause blurring.
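The classical-upsampling step of a pre-upsampling pipeline can be sketched as below. Nearest-neighbour replication is used here as a simple stand-in for Bilinear or Bicubic interpolation, and the subsequent CNN refinement is deliberately omitted; only the shape handling is shown.

```python
def upsample_nearest(lr, factor=2):
    """Classic (non-learned) upsampling: replicate each pixel factor x factor times.
    In a pre-upsampling SR pipeline, this coarse image is then refined by a CNN."""
    out = []
    for row in lr:
        up_row = []
        for p in row:
            up_row.extend([p] * factor)
        out.extend([list(up_row) for _ in range(factor)])
    return out

coarse = upsample_nearest([[1.0, 2.0]], factor=2)
# coarse is 2x4: [[1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0]]
```

Because the coarse image already has the target resolution, the refinement network can keep the same spatial size end to end, which is what makes this group's architectures simple.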



Group 2 — Post-Upsampling

In this case, the low resolution images are passed to the CNN as-is. Upsampling is performed in the last layer using a learnable layer.


A typical post-upsampling network. (Source)

The advantage of this method is that feature extraction is performed in the lower dimensional space (before upsampling) and hence the computational complexity is reduced. Furthermore, by using a learnable upsampling layer, the model can be trained end-to-end.
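One widely used learnable upsampling layer is sub-pixel convolution, where a convolution produces r² feature channels and a "pixel shuffle" rearranges them into an r-times-larger image. The sketch below shows only the deterministic rearrangement step (the learned convolution that produces the channels is assumed and omitted).

```python
def pixel_shuffle(feats, r=2):
    """Rearrange an r*r-channel feature map of size HxW into one rH x rW image.
    feats: list of r*r channels, each a list of lists (H x W)."""
    h, w = len(feats[0]), len(feats[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for i in range(h * r):
        for j in range(w * r):
            c = (i % r) * r + (j % r)      # which channel this output pixel comes from
            out[i][j] = feats[c][i // r][j // r]
    return out

# Four 1x1 channels become one 2x2 image.
feats = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
sr = pixel_shuffle(feats, r=2)  # [[1.0, 2.0], [3.0, 4.0]]
```

Since only the preceding convolution carries weights, the shuffle itself adds no parameters, which keeps the upsampling cheap while remaining trainable end-to-end.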

Group 3 — Progressive Upsampling

In the above group, even though the computational complexity was reduced, only a single upsampling convolution was used. This makes the learning process harder for large scaling factors. To address this drawback, a progressive upsampling framework was adopted by works such as Laplacian Pyramid SR Network (LapSRN) and Progressive SR (ProSR). The models in this case use a cascade of CNNs to progressively reconstruct high resolution images at smaller scaling factors at each step.

A typical progressive-upsampling network. (Source)

By decomposing a difficult task into simpler tasks, the learning difficulty is greatly reduced and better performance can be obtained. Moreover, learning strategies like curriculum learning can be integrated to further reduce learning difficulty and improve final performance.
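The cascade idea can be sketched as repeated 2x stages: a 2^n scaling factor is reached by n easier 2x steps. Nearest-neighbour replication stands in for each learned sub-network here (the per-stage refinement that LapSRN-style models perform is omitted).

```python
def upsample2x(img):
    """One nearest-neighbour 2x stage standing in for a learned sub-network."""
    out = []
    for row in img:
        up = []
        for p in row:
            up.extend([p, p])
        out.append(up)
        out.append(list(up))
    return out

def progressive_upsample(lr, num_stages):
    """Reach a 2**num_stages scaling factor via a cascade of easier 2x steps,
    as in progressive frameworks (per-stage refinement omitted)."""
    img = lr
    for _ in range(num_stages):
        img = upsample2x(img)  # each stage would also refine its output
    return img

sr = progressive_upsample([[1.0]], num_stages=2)  # 1x1 -> 4x4
```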



Group 4 — Iterative Up and Down Sampling

Another popular model architecture is the hourglass (or U-Net) structure. Some variants such as the Stacked Hourglass network use several hourglass structures in series, effectively alternating between the process of upsampling and downsampling.

A typical iterative up-and-down sampling network. (Source)

The models under this framework can better mine the deep relations between the LR-HR image pairs and thus provide higher quality reconstruction results.
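As a shape-level sketch only, one "hourglass" can be viewed as a downsampling pass followed by an upsampling pass, and a stacked network chains several such passes. The learned convolutions and skip connections that do the actual work are omitted; this is an assumption-laden illustration of the alternation, not a faithful model.

```python
def downsample2x(img):
    """2x2 average pooling: the 'down' half of an hourglass."""
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, len(img[0]), 2)]
            for i in range(0, len(img), 2)]

def upsample2x(img):
    """Nearest-neighbour 2x: the 'up' half of an hourglass."""
    out = []
    for row in img:
        up = []
        for p in row:
            up.extend([p, p])
        out.append(up)
        out.append(list(up))
    return out

def stacked_hourglass(img, num_hourglasses=2):
    """Alternate down- and up-sampling passes; real models insert learned
    layers and skip connections at every stage."""
    for _ in range(num_hourglasses):
        img = upsample2x(downsample2x(img))
    return img
```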



Loss Functions

Loss functions are used to measure the difference between the generated High Resolution image and the ground truth High Resolution image. This difference (error) is then used to optimize the supervised learning model. Several classes of loss functions exist, each of which penalizes a different aspect of the generated image.

Often, more than one loss function is used, weighting and summing the errors obtained from each loss function individually. This enables the model to focus on aspects contributed by multiple loss functions simultaneously:

total_loss = weight_1 * loss_1 + weight_2 * loss_2 + weight_3 * loss_3
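The weighted sum above is trivial to implement; the loss values and weights below are hypothetical placeholders.

```python
def total_loss(losses, weights):
    """Weighted sum of individual loss terms:
    total = w1 * l1 + w2 * l2 + ... """
    assert len(losses) == len(weights)
    return sum(w * l for w, l in zip(weights, losses))

# e.g. pixel, content and texture losses with hand-picked weights (hypothetical values)
t = total_loss([0.5, 1.2, 0.3], [1.0, 0.1, 0.01])  # 0.5 + 0.12 + 0.003 = 0.623
```

In practice the weights are hyperparameters tuned per task, since each term is on a different scale.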

In this section we will explore some popular classes of loss functions used for training the models.

Pixel Loss

Pixel-wise loss is the simplest class of loss functions, where each pixel in the generated image is directly compared with each pixel in the ground-truth image. Popular loss functions such as the L1 or L2 loss, or advanced variants such as the Smooth L1 loss, are used.

Plot of Smooth L1 Loss. (Source)

The PSNR metric (discussed below) is highly correlated with the pixel-wise difference, and hence minimizing the pixel loss directly maximizes the PSNR metric value (indicating good performance). However, pixel loss does not take into account the image quality and the model often outputs perceptually unsatisfying results (often lacking high frequency details).
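The pixel losses and their link to PSNR can be written out directly. The sketch below assumes images as nested lists of floats in [0, 255]; note how PSNR is a decreasing function of the MSE, so minimizing the L2 pixel loss maximizes PSNR.

```python
import math

def mse(a, b):
    """Mean squared error (L2 pixel loss) between two equally-sized images."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)

def l1(a, b):
    """Mean absolute error (L1 pixel loss)."""
    fa = [p for row in a for p in row]
    fb = [p for row in b for p in row]
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)

def psnr(a, b, max_val=255.0):
    """PSNR in dB: 10 * log10(MAX^2 / MSE). Higher is better."""
    err = mse(a, b)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)
```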




Content Loss

This loss evaluates the generated image based on its perceptual quality. An interesting way to do this is by comparing the high level features of the generated image and the ground truth image. We can obtain these high level features by passing both of these images through a pre-trained image classification network (such as a VGG-Net or a ResNet).


Content loss between a ground truth image and a generated image. (Source)

The equation above calculates the content loss between a ground-truth image and a generated image, given a pre-trained network (Φ) and a layer (l) of this pre-trained network at which the loss is computed. This loss encourages the generated image to be perceptually similar to the ground-truth image. For this reason, it is also known as the Perceptual loss.
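The content loss can be sketched as a mean squared distance in feature space. Here `phi` stands in for Φ at layer l of a pre-trained network; a toy callable replaces the real VGG/ResNet feature extractor, which is an explicit assumption of this sketch.

```python
def content_loss(phi, hr, sr):
    """Mean squared distance between layer-l activations of a pretrained network.
    `phi` is a stand-in for the feature extractor Phi_l (a real one would be a
    VGG/ResNet layer); here it is any callable returning a flat feature vector."""
    f_hr, f_sr = phi(hr), phi(sr)
    return sum((x - y) ** 2 for x, y in zip(f_hr, f_sr)) / len(f_hr)

# Toy feature extractor in place of a real pretrained layer: one feature per row.
phi = lambda img: [sum(row) for row in img]
loss = content_loss(phi, [[1.0, 2.0]], [[1.0, 3.0]])  # features [3.0] vs [4.0] -> 1.0
```

Because the comparison happens in feature space rather than pixel space, two images can have low content loss while differing pixel-by-pixel, which is exactly the perceptual behaviour this loss is after.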

Texture Loss

To enable the generated image to have the same style (texture, color, contrast etc.) as the ground truth image, texture loss (or style reconstruction loss) is used. The texture of an image, as described by Gatys et al., is defined as the correlation between different feature channels. The feature channels are usually obtained from a feature map extracted using a pre-trained image classification network (Φ).

Computing the Gram Matrix. (Source)

The correlation between the feature maps is represented by the Gram matrix (G), which is the inner product between the vectorized feature maps i and j on layer l (shown above). Once the Gram matrix is calculated for both images, calculating the texture loss is straightforward, as shown below:

Computing the Texture Loss. (Source)

By using this loss, the model is motivated to create realistic textures and visually more satisfying results.
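The Gram matrix and texture loss can be sketched as follows. Feature maps are nested lists (the activations are hypothetical), and the usual normalization by channel and spatial dimensions is omitted for brevity.

```python
def gram_matrix(feature_maps):
    """G[i][j] = inner product of vectorized feature maps i and j at one layer."""
    vecs = [[p for row in fm for p in row] for fm in feature_maps]
    n = len(vecs)
    return [[sum(a * b for a, b in zip(vecs[i], vecs[j])) for j in range(n)]
            for i in range(n)]

def texture_loss(feats_hr, feats_sr):
    """Squared Frobenius distance between the two Gram matrices
    (normalization by C*H*W omitted)."""
    g1, g2 = gram_matrix(feats_hr), gram_matrix(feats_sr)
    return sum((g1[i][j] - g2[i][j]) ** 2
               for i in range(len(g1)) for j in range(len(g1)))

# Two 1x2 feature channels per image (hypothetical activations).
feats_hr = [[[1.0, 0.0]], [[0.0, 1.0]]]  # decorrelated channels
feats_sr = [[[1.0, 0.0]], [[1.0, 0.0]]]  # perfectly correlated channels
```

The Gram matrix discards spatial layout entirely (only channel correlations survive), which is why this loss captures style rather than structure.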



Total Variation Loss

The Total Variation (TV) loss is used to suppress noise in the generated images. It takes the sum of the absolute differences between neighboring pixels and measures how much noise is in the image. For a generated image, the TV loss is calculated as shown below:

Total Variation Loss used on a generated High Resolution image. (Source)

Here, i, j and k iterate over the height, width and channels respectively.
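For a single-channel image, the TV loss described above reduces to summing the absolute differences between each pixel and its right and bottom neighbours:

```python
def tv_loss(img):
    """Total Variation of a single-channel image: sum of absolute differences
    between each pixel and its right and bottom neighbours."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
    return tv

flat = [[1.0, 1.0], [1.0, 1.0]]    # no variation -> loss 0
noisy = [[0.0, 1.0], [1.0, 0.0]]   # checkerboard -> every neighbour differs
```

A smooth image scores near zero while a noisy one scores high, so adding this term to the total loss pushes the generator toward less noisy outputs.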

Adversarial Loss

Generative Adversarial Networks (GANs) have been increasingly used for several image based applications including Super Resolution. GANs typically consist of a system of two neural networks — the Generator and the Discriminator — dueling each other.

Given a set of target samples, the Generator tries to produce samples that can fool the Discriminator into believing they are real. The Discriminator tries to resolve real (target) samples from fake (generated) samples. Using this iterative training approach, we eventually end up with a Generator that is really good at generating samples similar to the target samples. The following image shows the structure of a typical GAN.

GANs in action. (Source)
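The dueling objectives can be written as binary cross-entropy terms on the discriminator's output probabilities. This is one common formulation (the non-saturating GAN loss), offered here as an assumption since the text does not pin down a specific variant; `d_real` and `d_fake` are hypothetical discriminator outputs in (0, 1).

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0.
    d_real / d_fake are lists of discriminator output probabilities."""
    n = len(d_real)
    return -sum(math.log(r) + math.log(1.0 - f)
                for r, f in zip(d_real, d_fake)) / n

def generator_loss(d_fake):
    """Non-saturating generator loss: reward fooling the discriminator."""
    return -sum(math.log(f) for f in d_fake) / len(d_fake)

g = generator_loss([0.5, 0.5])  # -log(0.5) ≈ 0.6931
```

Training alternates between the two: a discriminator step minimizes `discriminator_loss`, then a generator step minimizes `generator_loss` on freshly generated samples.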

Advances to the basic GAN architecture were introduced for improved performance. For instance, Park et al. used a feature-level discriminator to capture more meaningful potential attributes of real High Resolution images. You can check out this blog for a more elaborate survey of the advances in GANs.

Typically, models trained with adversarial loss have better perceptual quality, even though they might lose out on PSNR compared to those trained on pixel loss. One minor downside is that the training process of GANs is somewhat difficult and unstable. However, methods to stabilize GAN training are being actively worked on.



Posted by 人工智障 on 2019-7-20, 03:11 PM.
