Research on Reconstruction of Multi-view Video Based on GAN.docx


Research on Reconstruction of Multi-view Video Based on GAN

Master of Engineering Thesis
Research area: Electronic Information Engineering

Abstract

Multi-view video is captured by an array of identical cameras that shoot a 3D scene from different angles. It therefore carries rich 3D scene information and gives users an immersive, multi-angle viewing experience. The price is a huge data volume, which poses major challenges for the compression, storage, and transmission of video data. To support interactivity while representing a 3D scene with less data, methods that combine multi-view video with depth information emerged. However, the accuracy of the depth map limits how accurately the multi-view video can be represented: inaccurate depth information cannot render correct multi-view images at the decoder. Depth maps obtained by existing methods therefore still need a preprocessing step, which adds system overhead.

The main work of this thesis is to transmit only part of the viewpoint information at the encoder and to reconstruct the original multi-view information at the decoder with a generative adversarial network (GAN). The overall GAN-based framework for multi-view video coding and reconstruction is described in detail. Among coding techniques, the idea of transmitting part of the data and reconstructing the rest is shared by texture extraction, compressed sensing, mixed-resolution methods in 3D video coding, and depth-image-based rendering in multi-view video coding. All of these obtain the correlation between adjacent viewpoint images by pixel matching. Pixels, however, cannot express the invariance of image features and are easily disturbed by noise. In recent years, deep-learning-based single-image super-resolution has been able to learn the mapping between high- and low-resolution images in a high-dimensional space and so reconstruct a high-resolution image from a low-resolution one, which offers a new approach to the problems above. The contributions of this thesis are as follows:

(1) To feed the multiple streams of a multi-view video into a deep-learning-based multi-view reconstruction network that learns the mapping between views in a high-dimensional space, the problem of carrying multi-stream information in a single input image must be solved. Based on the epipolar constraint of motion, a method for constructing local-region epipolar plane images is proposed, and the local-region epipolar plane image is used as a key element of multi-view video reconstruction.

(2) To reduce the amount of multi-view video data transmitted, the even-numbered views of the local-region epipolar plane image are down-sampled to form a mixed-resolution local-region epipolar plane image. Furthermore, so that the network handles the multi-view reconstruction task well, a GAN is used to generate the local images of the dense viewpoints more accurately.

Experiments were run on several multi-view video sequences, and the proposed GAN-based multi-view video reconstruction method was compared with the latest depth-image-based rendering technique. The results show that, relative to the traditional method, the proposed method improves the average PSNR by 0.5 dB and the mean MOS by 0.3.

Keywords: GAN, Reconstruction, Coding, Multi-view video

Contents

Abstract (Chinese)
Abstract
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Research Status at Home and Abroad
1.2.1 Research Status of Multi-view Video Reconstruction
1.2.2 Research Status of Generative Adversarial Networks
1.3 Main Research Content
1.4 Thesis Organization
Chapter 2 GAN-Based Super-Resolution Restoration
2.1 Introduction
2.2 Convolutional Neural Networks
2.2.1 Principle of Convolution
2.2.2 Convolutional Layer
2.2.3 Pooling Layer
2.2.4 Fully Connected Layer
2.2.5 Activation Function
2.3 Classic Convolutional Neural Networks
2.3.1 VGGNet
2.3.2 ResNet
2.4 SR-GAN Network
2.5 Chapter Summary
Chapter 3 Algorithm for Constructing the Multi-view Video Dataset
3.1 Introduction
3.2 Epipolar Plane Image
3.2.1 Epipolar Geometry
3.2.2 Epipolar Constraint of Motion
3.2.3 Definition of the Epipolar Plane Image
3.2.4 Construction of the Epipolar Plane Image
3.3 Local-Region Epipolar Plane Image
3.4 Experimental Procedure and Result Analysis
3.4.1 Dataset
3.4.2 Experimental Setup
3.5 Chapter Summary
Chapter 4 GAN-Based Multi-view Video Reconstruction Method
4.1 Introduction
4.2 GAN-Based Reconstruction Framework for Multi-view Video
4.3 Algorithm Improvements
4.3.1 Improvement of the Mapping Pairs
4.3.2 Reconstruction by Convolution
4.3.3 Fine-Tuning of Network Hyperparameters
4.4 RMV-GAN Network Architecture
4.5 Reconstruction of Multi-view Video
4.6 Experimental Procedure and Result Analysis
4.6.1 Experimental Conditions and Hyperparameter Settings
4.6.2 Experimental Setup
4.6.3 Analysis of Experimental Results
4.7 Chapter Summary
Summary and Outlook
Acknowledgements
References
Curriculum Vitae, Research Achievements, and Papers Published During Study

Chapter 1 Introduction

1.1 Research Background and Significance

In recent years, as people's expectations for material life have risen, their demand for visual experience has grown as well. In film, ordinary 2D movies gave way to 3D movies. In particular, Avatar (2009), the 3D film with the largest production scale and the most advanced technology of its time, laid a solid foundation for the rapid development of 3D cinema that followed. With the success and spread of 3D digital cinema, 3D television also began to enter households: in 2011, about 4.8 million 3D TV sets were shipped in China, a penetration rate of 12%. Today, thanks to its promising prospects, VR technology has opened new directions in fields such as education, real estate and architecture, and film and entertainment. VR has also gained government recognition. At the G20 summit on September 3, 2016, Xi Jinping specifically mentioned virtual reality (VR), and many policies promoting its development have been formulated: the Ministry of Industry and Information Technology and the National Development and Reform Commission included VR and AR in the special action for innovative development of the smart hardware industry; the National Development and Reform Commission called for key VR technical standards to be issued as soon as possible; and the State Council's 13th Five-Year Plan for scientific and technological innovation names virtual reality and augmented reality as R&D priorities. Facebook, Sony, and HTC have already launched consumer products, and VR product sales may reach US$21 billion by 2020. Clearly, the exploration of visual immersion has never stopped.

From a technical perspective: 3D films are shot with dual lenses, so the footage needs twice the storage. In 3D television the video signal is both the carrier and the form of presentation; stereoscopic TV and free-viewpoint television (FTV) usually need two or more high-definition video streams, so the signal data volume is huge and exists as a time stream. Likewise, VR roaming involves many cameras and highly complex scenes, so the storage requirement is enormous, and continuous scene roaming needs a great deal of bandwidth. Moreover, although the rapid progress of computers in recent years has steadily increased CPU and GPU computing power, it still falls short of what VR requires. A harsher fact is that no lightweight hardware yet meets the requirements for computing speed, storage space, transmission rate, and battery life. Compared with single-view video, the data volume of multi-view video is proportional to the number of cameras: the more cameras used in shooting, the more data is stored. Dealing with this huge data volume is the key to removing the shackle that limits the widespread use of multi-view video.

From an economic perspective: Xunlei CEO Chen Lei pointed out that, "calculated at 1080p and a little over 20 M, for today's VR experience with 360-degree transmission to keep the same profit margin, members would need to pay 3,000 yuan per month to watch VR". So if users want to immerse themselves in high-quality, continuous VR video, the bandwidth and storage needed are beyond what current technology can carry and far beyond what the public can afford.

To compress this huge data volume, the industry proposed the widely accepted Multiview Video Coding (MVC) standard [1], which relies on an inter-view prediction mechanism and was developed by the Joint Video Team (JVT) of ITU-T and MPEG. Compared with coding each view separately, MVC can improve coding efficiency by one quarter, but its efficiency still depends on the number of views: with many views, coding efficiency still drops, and the coding needs of FTV and other multi-view video remain unmet. For this reason, in April 2007 MPEG proposed the Multiview Video plus Depth (MVD) format [2] for FTV compression. The distinguishing feature of this representation is the depth map, so transmission must carry not only each coded stream of the original multi-view video but also the corresponding coded depth video. To reduce the transmitted data, a method that codes multi-view video and depth video efficiently together is needed. A depth image, however, represents object depth and differs from a conventional 2D image, so its coding method has to be redesigned. Both coding techniques still face major challenges in newer multi-view video applications such as VR panoramic roaming.

1.2 Research Status at Home and Abroad

1.2.1 Research Status of Multi-view Video Reconstruction

Conventional compression coding compresses the video signal through the DCT, quantization, entropy coding, prediction, rate control, and similar tools. The approach rests on the temporal and spatial redundancy in the video signal itself and was derived with statistical models. From the redundancy point of view, if redundant information were not captured in the first place, the time or space cost of acquisition would drop sharply; pre-compression processing is therefore indispensable. Promising pre-processing techniques currently include mixed resolution, filtering, compressed sensing, and texture coding. Mixed resolution is also a post-processing technique used for reconstruction, and among post-processing reconstruction techniques, depth-image-based rendering (DIBR) performs very well on multi-view video problems. Accordingly, this thesis surveys compression coding from two angles, mixed resolution and DIBR.

Because of limits on data transmission and storage capacity, delivering high-quality super-resolution views to end users is a major challenge in the field [3]. Super-resolution techniques within a multi-view mixed-resolution framework [4-8] were proposed in response; they mainly use the high-frequency content of high-resolution views to improve the image quality of adjacent low-resolution views. In 2010, Aflaki et al. [9] proposed extracting and using low-resolution …
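As a rough illustration of the mixed-resolution idea discussed in this subsection, the following Python sketch keeps odd-numbered views at full size, shrinks even-numbered views, and measures how much data is left to transmit. It is a simplification under stated assumptions, not the thesis's implementation: it works on whole grayscale views rather than on local-region epipolar plane images, views are numbered from 1, and the block-average filter, the nearest-neighbour upsampling baseline, and all function names are hypothetical choices made here for illustration. NumPy is assumed to be available.

```python
import numpy as np


def downsample(view, factor):
    """Shrink a (H, W) view by block-averaging (assumed filter, chosen for simplicity)."""
    h, w = view.shape
    h2, w2 = h // factor, w // factor
    cropped = view[:h2 * factor, :w2 * factor].astype(np.float64)
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


def upsample_nearest(view, factor):
    """Naive nearest-neighbour upsampling, a baseline a learned model would aim to beat."""
    return np.repeat(np.repeat(view, factor, axis=0), factor, axis=1)


def to_mixed_resolution(views, factor=2):
    """Keep views 1, 3, 5, ... at full size and shrink views 2, 4, 6, ... (1-based, assumed)."""
    mixed = []
    for number, view in enumerate(views, start=1):
        if number % 2 == 0:
            mixed.append(downsample(view, factor))
        else:
            mixed.append(view.astype(np.float64))
    return mixed


def transmitted_fraction(views, mixed):
    """Ratio of samples in the mixed-resolution set to samples in the original set."""
    return sum(m.size for m in mixed) / sum(v.size for v in views)


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


# Four synthetic 8x8 "views" stand in for real camera frames.
rng = np.random.default_rng(0)
views = [rng.integers(0, 256, size=(8, 8)).astype(np.uint8) for _ in range(4)]

mixed = to_mixed_resolution(views, factor=2)
fraction = transmitted_fraction(views, mixed)

# Quality of the naive reconstruction of view 2, measured against the original.
baseline = upsample_nearest(mixed[1], 2)
baseline_psnr = psnr(views[1], baseline)
```

With four equally sized views and a factor of 2, the even-numbered views shrink to a quarter of their samples, so `fraction` is 0.625. The PSNR of the nearest-neighbour baseline only gives a reference point for comparison with any reconstruction method.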
