Image restoration techniques covered: denoising, deblurring and super-resolution for 3D images and models,
from classical computer vision techniques to contemporary deep learning-based processing for both ordered and unordered point clouds, depth maps and meshes.
2. Deep Learning for Structure-from-Motion (SfM)
https://www.slideshare.net/PetteriTeikariPhD/deconstructing-sfmnet
Dataset creation for Deep Learning-based Geometric Computer Vision problems
https://www.slideshare.net/PetteriTeikariPhD/dataset-creation-for-deep-learningbased-geometric-computer-vision-problems
Emerging 3D Scanning Technologies for PropTech
https://www.slideshare.net/PetteriTeikariPhD/emerging-3d-scannng-technologies-for-proptech
Geometric Deep Learning
https://www.slideshare.net/PetteriTeikariPhD/geometric-deep-learning
4. Data structures for real estate scans
RGB+D: pixel grid presenting color and depth, or a "2.5D" image (example from Prof. Li)
Mesh (polygon): surface built from voxel data ("3D pixels"), e.g. voxel grid meshing using marching cubes (StackExchange)
Point Cloud: typically unordered data (i.e. not on a grid but sparse points on non-integer coordinates)
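To make the relation between these representations concrete, here is a minimal sketch that back-projects a 2.5D RGB+D pixel grid into an unordered point cloud with the pinhole camera model. The intrinsics (fx, fy, cx, cy) below are made-up illustrative values, not parameters of any scanner mentioned here.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a 2.5D depth map (H x W, e.g. metres) into an N x 3
    point cloud via the pinhole model: X = (u - cx) * Z / fx, etc."""
    h, w = depth.shape
    v, u = np.indices((h, w))            # pixel coordinates (row, column)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]            # drop invalid (zero-depth) pixels

# toy 2x2 depth map with every pixel 1 unit away
depth = np.ones((2, 2))
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

The result is exactly the "sparse points on non-integer coordinates" structure above: once back-projected, the grid ordering can be discarded.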
5. Denoising: remove noise from the signal
A spatially cohesive superpixel model
for image noise level estimation
Peng Fua, Changyang Li, Weidong Cai, Quansen Sun
Neurocomputing
Volume 266, 29 November 2017, Pages 420-432
https://doi.org/10.1016/j.neucom.2017.05.057
Superpixels generated by SCSM and SLIC from the noisy "Fish" image at various noise levels. (a)–(d) Original and noisy images; noise SD values of 10, 20, and 30 from (b) to (d), respectively. (e)–(h) Superpixels generated by SCSM from the corresponding images. (i)–(l) Superpixels generated by SLIC from the corresponding images.
This paper proposes an automatic noise level estimation method. In contrast with the conventional rectangular-block division algorithms, the images are decomposed into superpixels that exhibit better adherence to the local image structures, thus generating a division into small regions that are more likely to be homogeneous. Moreover, the effective use of the spatial neighborhood information makes the SCSM more insensitive to image noise.
https://sites.google.com/site/pierrickcoupe/softwares/denoising-for-medical-imaging/mri-denoising
http://dx.doi.org/10.1002/jmri.22003
https://youtu.be/5Y7yeRo5vGE
Help selecting noise reduction plugin for Photoshop CC 2014
https://www.dpreview.com/forums/post/54065189
Imagenomic Noiseware, Neat Image, Nik Software DFine 2, Topaz DeNoise 5, Noise Ninja
Especially with classical image processing algorithms, it is beneficial to reduce noise before applying the actual processing / analysis.
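Knowing the noise level helps choose how aggressively to denoise. As a much simpler alternative to the superpixel-based SCSM estimator above, here is the classic robust (Donoho-style) estimate from the median absolute deviation of the finest-scale Haar detail coefficients; it works on roughly piecewise-smooth images with additive Gaussian noise.

```python
import numpy as np

def estimate_noise_sigma(img):
    """Robust noise SD estimate: median absolute deviation of the
    finest-scale Haar detail coefficients, divided by 0.6745."""
    img = np.asarray(img, dtype=float)
    d = (img[:, 1::2] - img[:, ::2]) / np.sqrt(2.0)   # horizontal Haar details
    return np.median(np.abs(d)) / 0.6745

# demo: an image of pure Gaussian noise with known sigma = 10
rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 10.0, (256, 256))
sigma_hat = estimate_noise_sigma(noisy)   # close to the true sigma of 10
```

The median-based statistic ignores the sparse large coefficients produced by real edges, which is why it stays reliable on structured images where a plain standard deviation would not.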
6. DeConvolution / Deblurring
Recent Progress in Image Deblurring
Ruxin Wang, Dacheng Tao
(Submitted on 24 Sep 2014)
https://arxiv.org/abs/1409.6838
http://blogs.adobe.com/photoshop/2011/10/behind-all-the-buzz-deblur-sneak-peek.html
Deconvolve the image with the Point Spread Function (PSF) that convolved the scene during image formation to sharpen the image
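The textbook frequency-domain version of this idea is the Wiener filter, sketched below with plain FFTs. The regularizer k stands in for the (usually unknown) noise-to-signal power ratio and is hand-tuned here; this is a minimal sketch, not the method of the cited survey.

```python
import numpy as np

def _psf_to_otf(psf, shape):
    """Zero-pad the PSF to the image size and circularly shift its centre
    to the origin before taking the FFT (the optical transfer function)."""
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    pad = np.roll(pad, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def wiener_deconvolve(blurred, psf, k=1e-3):
    """Wiener filter: F_hat = conj(H) / (|H|^2 + k) * G, where H is the
    PSF's transfer function and G the blurred image's spectrum."""
    H = _psf_to_otf(psf, blurred.shape)
    G = np.fft.fft2(blurred)
    return np.real(np.fft.ifft2(np.conj(H) / (np.abs(H) ** 2 + k) * G))

# demo: blur a square with a 3x3 box PSF (circular convolution), then restore
img = np.zeros((32, 32)); img[12:20, 12:20] = 1.0
psf = np.full((3, 3), 1.0 / 9.0)
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * _psf_to_otf(psf, img.shape)))
restored = wiener_deconvolve(blurred, psf, k=1e-4)
```

With k = 0 this reduces to naive inverse filtering, which explodes wherever the PSF's spectrum is near zero; the regularizer is what makes deconvolution usable on real, noisy data.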
7. Edge-Aware Image Smoothing
Smooth constant patches while retaining sharp edges, instead of a "dumb" Fourier low-pass filter that destroys the edges
Deep Edge-Aware Filters
http://lxu.me/projects/deepeaf/ | http://proceedings.mlr.press/v37/xub15.html
Shown filters: L0 smoothing; BLF (bilateral filter)
Our method is based on a deep convolutional neural network with a gradient domain training procedure, which gives rise to a powerful tool to approximate various filters without knowing the original models and implementation details.
Efficient High-Dimensional, Edge-Aware Filtering | http://doi.org/10.1109/MCG.2016.119
Hui Huang, Shihao Wu, Minglun Gong, Daniel Cohen-Or, Uri Ascher, and Hao Zhang, "Edge-Aware Point Set Resampling," ACM Trans. on Graphics (presented at SIGGRAPH 2013), Volume 32, Number 1, Article 9, 2013. [PDF | Project page with source code]
https://doi.org/10.1145/2421636.2421645
The denoising capability of the blurring-sharpening strategy based
on the tooth volume (mesh). (a-d) are obtained by adding one
particular type of noise, as indicated by the corresponding
captions. SNR (in dB) of the noisy and the smoothed volumes are
shown in each figure.
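The bilateral filter (BLF) named above is the classical edge-aware smoother, and it is simple enough to write out directly. A brute-force sketch (real implementations use the fast approximations cited in this section):

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter: each output pixel is a weighted mean of
    its neighbours, with weight = spatial Gaussian * range (intensity)
    Gaussian, so averaging stops at large intensity jumps (edges)."""
    img = np.asarray(img, dtype=float)
    pad = np.pad(img, radius, mode='reflect')
    out = np.zeros_like(img)
    norm = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy:radius + dy + img.shape[0],
                          radius + dx:radius + dx + img.shape[1]]
            w = (np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                 * np.exp(-((shifted - img) ** 2) / (2 * sigma_r ** 2)))
            out += w * shifted
            norm += w
    return out / norm

# demo: noisy step edge -- smoothing flattens each side but keeps the jump
rng = np.random.default_rng(1)
step = np.zeros((16, 16)); step[:, 8:] = 1.0
noisy = step + rng.normal(0.0, 0.05, step.shape)
filtered = bilateral_filter(noisy, radius=2, sigma_s=2.0, sigma_r=0.2)
```

The range kernel is what distinguishes it from a plain Gaussian blur: neighbours across the step contribute essentially zero weight, so the edge survives while the flat regions are denoised.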
8. Super-resolution
Depending on your background, super-resolution means slightly different things
https://www.ucl.ac.uk/super-resolution:
Super-resolution imaging allows the imaging of fluorescently labelled probes at a resolution of just tens of nanometers, surpassing classic light microscopy by at least one order of magnitude. Recent advances such as the development of photo-switchable fluorophores, high-sensitivity microscopes and single-molecule localisation algorithms make super-resolution imaging rapidly accessible to the wider life sciences research community.
At UCL we are currently taking a multidisciplinary effort to provide researchers access to super-resolution imaging systems. The Super-Resolution Facility (SuRF) currently features commercial systems supporting the PALM/STORM, SIM and STED super-resolution approaches.
Beyond diffraction-limited: multiframe techniques; "statistical upsampling", e.g. deep learning
http://www.infrared.avio.co.jp/en/products/ir-thermo/lineup/r500/index.html
http://www.robots.ox.ac.uk/~vgg/research/SR/
https://techcrunch.com/2016/06/20/twitter-is-buying-magic-pony-technology-which-uses-neural-networks-to-improve-images/
Deep Learning for Isotropic Super-Resolution
from Non-Isotropic 3D Electron Microscopy
Larissa Heinrich, John A. Bogovic, Stephan Saalfeld
HHMI Janelia Research Campus, Ashburn, USA
https://arxiv.org/abs/1706.03142
9. Geometrical super-resolution
Both features extend over 3 pixels but in
different amounts, enabling them to be
localized with precision superior to pixel
dimension
Multi-exposure image noise reduction
When an image is degraded by noise, there can be more detail in the average of
many exposures, even within the diffraction limit. See example on the right.
Single-frame deblurring
Known defects in a given imaging situation, such as defocus or aberrations,
can sometimes be mitigated in whole or in part by suitable spatial-frequency
filtering of even a single image. Such procedures all stay within the diffraction-
mandated passband, and do not extend it.
Sub-pixel image localization
The location of a single source can be determined by computing the "center of
gravity" (centroid) of the light distribution extending over several adjacent
pixels (see figure on the left). Provided that there is enough light, this can be
achieved with arbitrary precision, very much better than pixel width of the
detecting apparatus and the resolution limit for the decision of whether the
source is single or double. This technique, which requires the presupposition
that all the light comes from a single source, is at the basis of what has
becomes known as superresolution microscopy, e.g. STORM, where
fluorescent probes attached to molecules give nanoscale distance
information. It is also the mechanism underlying visual hyperacuity.
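The centroid computation described above fits in a few lines; a sketch with a synthetic Gaussian spot whose true centre sits between pixel centres:

```python
import numpy as np

def centroid_localization(patch):
    """Sub-pixel localization of a single point source: intensity-weighted
    "center of gravity" of the pixel values in a small patch."""
    patch = np.asarray(patch, dtype=float)
    ys, xs = np.indices(patch.shape)
    total = patch.sum()
    return (ys * patch).sum() / total, (xs * patch).sum() / total

# demo: Gaussian spot truly centred at (4.3, 4.7) on a 9x9 integer grid
y, x = np.indices((9, 9))
spot = np.exp(-((y - 4.3) ** 2 + (x - 4.7) ** 2) / (2.0 * 1.0 ** 2))
cy, cx = centroid_localization(spot)   # recovers ~(4.3, 4.7)
```

Even though no pixel centre coincides with the source, the weighted mean recovers the sub-pixel position, which is exactly the trick STORM-style localization microscopy relies on (real pipelines add background subtraction and Gaussian fitting).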
Bayesian induction beyond traditional diffraction limit
Some object features, though beyond the diffraction limit, may be known to be
associated with other object features that are within the limits and hence
contained in the image. Then conclusions can be drawn, using statistical
methods, from the available image data about the presence of the full object.
The classical example is Toraldo di Francia's proposition of judging whether an
image is that of a single or double star by determining whether its width
exceeds the spread from a single star. This can be achieved at separations well
below the classical resolution bounds, and requires the prior limitation to the
choice "single or double?"
The approach can take the form of extrapolating the image in the frequency domain, by assuming that the object is an analytic function, and that we can exactly know the function values in some interval. This method is severely limited by the ever-present noise in digital imaging systems, but it can work for radar, astronomy, microscopy or magnetic resonance imaging. More recently, a fast single image super-resolution algorithm based on a closed-form solution to l2–l2 problems has been proposed (Zhao et al. 2016) and demonstrated to accelerate most of the existing Bayesian super-resolution methods significantly.
(Excerpt source: Wikipedia)
Detail-revealing Deep Video Super-Resolution
Xin Tao, Hongyun Gao, Renjie Liao, Jue Wang, Jiaya Jia (Submitted
on 10 Apr 2017)
https://arxiv.org/abs/1704.02738
Recent deep-learning-based video SR methods [Caballero et al. 2016;
Kappeler et al. 2016] compensate inter-frame motion by aligning all
other frames to the reference one, using backward warping. We show
that such a seemingly reasonable technical choice is actually not
optimal for video SR, and improving motion compensation can directly
lead to higher quality SR results. In this paper, we achieve this by
proposing a sub-pixel motion compensation (SPMC) strategy, which is
validated by both theoretical analysis and extensive experiments.
10. Optical or diffractive super-resolution
WIKIPEDIA:
Substituting spatial-frequency bands. Though the bandwidth
allowable by diffraction is fixed, it can be positioned anywhere
in the spatial-frequency spectrum. Dark-field illumination in
microscopy is an example. See also aperture synthesis.
Multiplexing spatial-frequency bands, such as structured illumination: an image is formed using the normal passband of the optical device. Then some known light structure, for example a set of light fringes that is also within the passband, is superimposed on the target. The image now contains components resulting from the combination of the target and the superimposed light structure, e.g. moiré fringes, and carries information about target detail which simple, unstructured illumination does not. The "superresolved" components, however, need disentangling to be revealed.
Multiple parameter use within traditional diffraction limit: if a target has no special polarization or wavelength properties, two polarization states or non-overlapping wavelength regions can be used to encode target details, one in a spatial-frequency band inside the cut-off limit, the other beyond it. Both would utilize normal passband transmission but are then separately decoded to reconstitute the target structure with extended resolution.
Probing near-field electromagnetic disturbance: the usual discussion of superresolution involves conventional imagery of an object by an optical system. But modern technology allows probing the electromagnetic disturbance within molecular distances of the source, which has superior resolution properties; see also evanescent waves and the development of the superlens.
Optical negative-index metamaterials
Nature Photonics 1, 41 - 48 (2007)
doi: 10.1038/nphoton.2006.49 | Cited by 2372
Sub–Diffraction-Limited Optical Imaging
with a Silver Superlens
Science 22 Apr 2005: Vol. 308, Issue 5721, pp. 534-537
doi: 10.1126/science.1108759 | Cited by 3219 articles
Optical and acoustic metamaterials: superlens,
negative refractive index and invisibility cloak
Journal of Optics, Volume 19, Number 8
http://dx.doi.org/10.1088/2040-8986/aa7a1f
→ Special issue on the history of metamaterials
http://zeiss-campus.magnet.fsu.edu/articles/superresolution/supersim.html
11. Inpainting
Paint over artifacts / missing values using surrounding pixels ("Clone Tool" in Photoshop), or more statistically using the same image ("Content-Aware Fill"), or bigger databases, for example in deep learning pipelines
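The simplest instance of "paint over from surrounding pixels" is diffusion inpainting: hold the known pixels fixed and let the hole relax to a smooth (harmonic) interpolation of its boundary. This sketch is far more basic than Content-Aware Fill or the deep context encoders cited below, but shows the core mechanism.

```python
import numpy as np

def diffuse_inpaint(img, mask, n_iter=500):
    """Diffusion inpainting: repeatedly replace masked pixels by the mean of
    their 4-neighbours (Jacobi iterations of the heat equation on the hole),
    so values flow inward from the hole's boundary."""
    out = img.astype(float).copy()
    out[mask] = out[~mask].mean()          # crude initialization
    for _ in range(n_iter):
        avg = (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
               np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 4.0
        out[mask] = avg[mask]              # update only the hole
    return out

# demo: punch a hole in a smooth ramp and fill it back in
ramp = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
mask = np.zeros((16, 16), dtype=bool); mask[6:10, 6:10] = True
damaged = ramp.copy(); damaged[mask] = 0.0
filled = diffuse_inpaint(damaged, mask)
```

Diffusion can only produce smooth fills; reproducing texture inside the hole is what the patch-based and learning-based methods in this section add.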
The TUM-Image Inpainting Database
Technische Universität München
https://www.mmk.ei.tum.de/tumiid/
Context Encoders: Feature Learning by Inpainting (2016)
Deepak Pathak, Phillip Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros
http://people.eecs.berkeley.edu/~pathak/context_encoder/
Improve your skin with Inpaint
https://www.theinpaint.com/
Guillemot and Le Meur (2014)
http://dx.doi.org/10.1109/MSP.2013.2273004
Yang et al. (2017) https://arxiv.org/abs/1611.09969
13. Multiframe 2D super-resolution #1
A Unified Bayesian Approach to Multi-Frame
Super-Resolution and Single-Image
Upsampling in Multi-Sensor Imaging
Thomas Köhler, Johannes Jordan, Andreas Maier and Joachim Hornegger
Proceedings of the British Machine Vision Conference (BMVC), pages 143.1-143.12. BMVA Press,
September 2015.
https://dx.doi.org/10.5244/C.29.143
Robust Multiframe Super-Resolution
Employing Iteratively Re-Weighted
Minimization
Thomas Köhler ; Xiaolin Huang ; Frank Schebesch ; André Aichert ; Andreas Maier ;
Joachim Hornegger
IEEE Transactions on Computational Imaging ( Volume: 2, Issue: 1, March 2016 )
https://doi.org/10.1109/TCI.2016.2516909
Future work should consider an adaption of our prior to blind super-resolution, where the camera PSF is unknown, or other image restoration problems, e.g. image deconvolution.
In this work, we limited ourselves to non-blind super-resolution, where the PSF is assumed to be known. However, iteratively re-weighted minimization could be augmented by blur estimation. Another promising extension is joint motion estimation and super-resolution, e.g. by using the nonlinear least squares algorithm. Conversely, blur and motion estimation can also benefit when used in combination with our spatially adaptive model. One further direction of our future work is to make our approach adaptive to the scene content, e.g. by a local selection of the sparsity parameter p.
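Underlying all of these Bayesian formulations is the classical shift-and-add idea: several sub-pixel-shifted low-resolution frames jointly sample a finer grid. A minimal sketch, assuming the registration (the shifts) is already known, which the cited methods instead estimate or marginalize over:

```python
import numpy as np

def shift_and_add(frames, shifts, scale):
    """Shift-and-add multiframe super-resolution: each low-res pixel is
    placed on a scale-times-finer grid at its known sub-pixel offset
    (in low-res pixel units); cells hit by several frames are averaged."""
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    cnt = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, shifts):
        ys = np.arange(h)[:, None] * scale + round(dy * scale)
        xs = np.arange(w)[None, :] * scale + round(dx * scale)
        acc[ys, xs] += frame
        cnt[ys, xs] += 1
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), 0.0)

# demo: four half-pixel-shifted low-res samplings of a high-res image
rng = np.random.default_rng(4)
hr = rng.random((16, 16))
shifts = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5)]
frames = [hr[int(dy * 2)::2, int(dx * 2)::2] for dy, dx in shifts]
sr = shift_and_add(frames, shifts, scale=2)
```

With ideal sampling and exactly complementary shifts, the four frames tile the fine grid and the reconstruction is exact; real data adds blur, noise and registration error, which is precisely what the iteratively re-weighted and Bayesian priors above are designed to handle.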
15. Point cloud acquisition
Guide to quickly build high-quality three-dimensional
models with a structured light range scanner
Bao-Quan Shi and Jin Liang
OSA Applied Optics Vol. 55, Issue 36, pp. 10158-10169 (2016)
https://doi.org/10.1364/AO.55.010158 [PDF] researchgate.net
16. Multiframe techniques or multisweep techniques #1
High Fidelity Scan Merging
Computer Graphics Forum July 2010
http://doi.org/10.1111/j.1467-8659.2010.01773.x
For each scanned object, 3D triangulation laser scanners deliver multiple sweeps corresponding to multiple laser motions and orientations.
Scan integration as a labelling problem
Pattern Recognition Volume 47, Issue 8, August 2014, Pages 2768-2782
https://doi.org/10.1016/j.patcog.2014.02.008
Example of overlapping scans. This head is such a complex structure that no less than 35 scans were acquired to fill in most holes.
Example of two overlapping scans: points of each scan are first meshed separately ((c)–(d)). The result can be compared to the meshing of points of both scans together (d).
Comparison of registration
of two scans (colored in
different colors on the top
figure) using Global Non
Rigid Alignment (middle)
and scale space merging
(bottom).
Comparisons of the merging (a) with a
level set (Poisson Reconstruction)
reconstruction method of the
unmerged scans point set (b) and a
filtering of the unmerged scans point
set (c). The level set method
obviously introduces a serious
smoothing, yet does not eliminate the
scanning boundary lines. The bilateral
filter, applied until all aliasing artifacts
have been eliminated, over-smoothes
some parts of the shape.
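Before any of these merging strategies can run, the overlapping sweeps must be rigidly registered. A minimal point-to-point ICP sketch (the cited pipelines add outlier rejection, scale-space merging and global multi-scan alignment on top of this core loop):

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, n_iter=20):
    """Minimal point-to-point ICP: alternate nearest-neighbour matching with
    the closed-form (Kabsch/SVD) rigid alignment; returns aligned src."""
    src = src.copy()
    tree = cKDTree(dst)
    for _ in range(n_iter):
        _, idx = tree.query(src)                 # current correspondences
        matched = dst[idx]
        mu_s, mu_d = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                 # guard against reflections
            Vt[-1] *= -1.0
            R = Vt.T @ U.T
        src = (src - mu_s) @ R.T + mu_d          # apply the rigid update
    return src

# demo: mis-register one copy of a scan by a small rotation + translation
rng = np.random.default_rng(2)
dst = rng.random((200, 3))
t = np.deg2rad(2.0)
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
src = dst @ Rz.T + np.array([0.02, -0.01, 0.015])
aligned = icp(src, dst)
```

Because the initial misalignment is small, nearest neighbours are mostly correct from the first iteration and the loop converges; ICP's well-known weakness is exactly that it needs such a rough pre-alignment to avoid local minima.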
17. Multiframe techniques or multisweep techniques #2
Density adaptive trilateral scan integration
method
Bao-Quan Shi and Jin Liang
Applied Optics Vol. 54, Issue 19, pp. 5998-6009 (2015)
https://doi.org/10.1364/AO.54.005998
Multi-Focus Image Fusion Via Coupled
Sparse Representation and Dictionary
Learning
Rui Gao, Sergiy A. Vorobyov (Submitted on 30 May 2017)
Aalto University, Dept. Signal Processing and Acoustics
https://arxiv.org/abs/1705.10574
Standard 3D modeling pipeline of the commercial scanner XJTUOM
Integration of 26 partially overlapping scans
of a dice model. (a) SDF method. (b)
Screened Poisson method. (c) Advancing
front triangulation method. (d) K-means
clustering method. (e) The new method.
The new method is more robust to large gaps/registration errors than previous methods.
Owing to the noise-removal property of the trilateral shifting procedure and mean-shift
clustering algorithm, the new method produces much smoother surfaces.
18. Multiframe techniques or multisweep techniques #3
Crossmodal point cloud registration in the
Hough space for mobile laser scanning data
Bence Gálai ; Balázs Nagy ; Csaba Benedek
Pattern Recognition (ICPR), 2016
https://doi.org/10.1109/ICPR.2016.7900155
Top row: Point clouds of three different vehicle mounted Lidar systems (Velodyne HDL64 and VLP16 I3D
scanners, and a Riegl VMX450 MMS), captured from the same scene at Fővám Tér, Budapest. Bottom row:
segmentation results for each cloud by our proposed method
19. Multiframe techniques or multisweep techniques #4
Frame Rate Fusion and Upsampling of
EO/LIDAR Data for Multiple Platforms
T. Nathan Mundhenk ; Kyungnam Kim ; Yuri Owechko
Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE
https://doi.org/10.1109/CVPRW.2014.117
The left pane shows the PanDAR demonstrator sensors with the
red Ladybug sensor mounted over the silver Velodyne 64E
LIDAR. A custom aluminum scaffold connects the two sensors.
The right pane shows the graphical interface with displays of the
3D model in the top, help menus and the depth map at the
bottom.
Multithreaded programming and GP-GPU methods allow us to obtain 10 fps with a Velodyne 64E LIDAR completely fused in 360° using a Ladybug panoramic camera.
PanDAR: a wide-area, frame-rate, and full
color lidar with foveated region using
backfilling interpolation upsampling
T. Nathan Mundhenk; Kyungnam Kim; Yuri Owechko
Proceedings Volume 9406, Intelligent Robots and Computer Vision XXXII: Algorithms and
Techniques; 94060K (2015)
Event: SPIE/IS&T Electronic Imaging, 2015, San Francisco, California, United States
http://dx.doi.org/10.1117/12.2078348
20. Multiframe techniques or multisweep techniques #5
Upsampling method for sparse light
detection and ranging using coregistered
panoramic images
Ruisheng Wang; Frank P. Ferrie
J. of Applied Remote Sensing, 9(1), 095075 (2015)
http://dx.doi.org/10.1117/1.JRS.9.095075
See-through problem and invalid light detection and ranging (LiDAR,
Velodyne HDL-64E) points returned from building interior. (a) Camera
image rendered from a certain viewpoint, (b) corresponding LiDAR
image rendered from the same viewpoint of (a), (c) corresponding
LiDAR image rendered from a top-down viewpoint
“There are a number of improvements that are possible and are topics for future work. The initial depth ordering
that used to determine visibility assumes a piecewise planar partition of the scene. While this can suffice for the
urban environment considered here, a more general approach would consider a richer form of representation, e.g.,
using statistical modeling methods. Cues that are available in the coregistered intensity data, such as the loci of
occluding contours, could also be exploited. At present, our interpolation strategy samples image space to determine
connectivity and backprojects to 3-D, resulting in a nonuniform interpolation. A better solution would be to perform
the sampling in 3-D by backprojecting the 2-D boundary and forming a 3-D bounding box that could then be
interpolated at the desired resolution. In the limit, true multimodal analysis would consider the joint distribution of
both intensity and depth information with the aim of inferring more detailed interpolation functions. With the
availability of sophisticated platforms such as Navteq True, there is clearly an incentive to move in these directions.”
22. Point cloud processing Reviews
Point Cloud Processing
Raphaële Héno and Laure Chandelier
in 3D Modeling of Buildings. Chapter 5. (2014)
http://doi.org/10.1002/9781118648889.ch5
A review of algorithms for filtering the 3D point cloud
Signal Processing: Image Communication
Volume 57, September 2017, Pages 103-112
https://doi.org/10.1016/j.image.2017.05.009
Octree structuring: point cloud and different
levels of the hierarchical grid
Example of significant noise on
the profile view of a target: the
standard deviation for a point
cloud at target level is 8 mm
A brief discussion of future research directions is presented as follows.
1) Combination of color and geometric information: For point clouds, especially those containing color information, a purely color- or purely geometry-based method cannot work well. Hence, it is expected that combining the color and geometric information in the filtering process will further increase the performance of a filtering scheme.
2) Time complexity reduction: Because point clouds contain a large number of points, sometimes up to hundreds of thousands or even millions, computation on these point clouds is time consuming. It is necessary to develop filtering technologies that filter point clouds effectively to reduce time complexity.
3) Filtering on point cloud sequences: Since object recognition from point cloud sequences will become a future research direction, filtering the point cloud sequence will help to improve the performance and accuracy of object recognition.
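One of the most common filters covered by such reviews is statistical outlier removal (the same idea as PCL's StatisticalOutlierRemoval class): drop points whose mean distance to their k nearest neighbours is anomalously large. A compact sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_outliers(points, k=8, std_ratio=2.0):
    """Statistical outlier removal: compute each point's mean distance to
    its k nearest neighbours and drop points whose mean distance exceeds
    the global mean by std_ratio standard deviations."""
    tree = cKDTree(points)
    d, _ = tree.query(points, k=k + 1)     # column 0 is the point itself
    mean_d = d[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

# demo: a tight cluster of 200 points plus 5 far-away stragglers
rng = np.random.default_rng(3)
cluster = rng.normal(0.0, 0.1, (200, 3))
stragglers = np.array([[5.0, 5.0, 5.0], [-5.0, 4.0, 3.0], [4.0, -5.0, -2.0],
                       [-3.0, -4.0, 5.0], [2.0, 5.0, -5.0]])
cloud = np.vstack([cluster, stragglers])
filtered = remove_outliers(cloud)          # stragglers are dropped
```

The global threshold is the method's main limitation: on scans with strongly varying point density it either keeps outliers in sparse regions or deletes valid points in dense ones, which motivates the density-adaptive schemes discussed in this deck.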
23. Point cloud processing SOFTWARE
PCL Point Cloud Library (C++)
http://pointclouds.org/
https://github.com/PointCloudLibrary
MeshLab with beginner-friendly graphical front-end
http://www.meshlab.net/
https://github.com/cnr-isti-vclab/meshlab
CGAL Computational Geometry
Algorithms Library (C++)
https://www.cgal.org/
https://github.com/CGAL/cgal
24. Point cloud denoising #1
Similarity based filtering of point clouds
Julie Digne
Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE
https://doi.org/10.1109/CVPRW.2012.6238917
Photogrammetric DSM denoising
Nex, F; Gerke, M. The International Archives of Photogrammetry, Remote Sensing and Spatial
Information Sciences; Gottingen XL.3: 231-238. Gottingen: Copernicus GmbH. (2014)
http://dx.doi.org/10.5194/isprsarchives-XL-3-231-2014
Differences between ground
truth and noisy DSM
Photogrammetric Digital Surface Models (DSM) are usually affected by both random
noise and gross errors. These errors are generally concentrated in correspondence
of occluded or shadowed areas and are strongly influenced by the texture of the
object that is considered, or the number of images employed for the matching.
In the future, further tests will be performed on other real DSM in order to assess the
reliability of the developed method in very different operative conditions. Then, the
extension from the 2.5D case to the fully 3D will be performed and further
comparisons with other available denoising algorithms will be performed, as well.
In addition, a key feature of our method is that it is independent of a surface mesh: it can work
directly on point clouds, which is useful, since building a mesh of a noisy point cloud is never
easy, whereas building a mesh of a properly denoised shape is well understood. A possible
extension for this work would be to use the filter as a projector onto the surface, in a spirit
similar to [Lipman et al. 2007] for example.
25. Point cloud denoising #2
Point Cloud Denoising via Moving RPCA
E. Mattei, A. Castrodad
Computer Graphics Forum (2016). doi: 10.1111/cgf.13068
Guided point cloud denoising via sharp feature
skeletons
The Visual Computer June 2017, Volume 33, Issue 6–8, pp 857–867
Yinglong Zheng, Guiqing Li, Shihao Wu, Yuxin Liu, Yuefang Gao
https://doi.org/10.1007/s00371-017-1391-8
Denoising synthetic datasets of two planes
meeting at increasingly shallow angles (20.4K
points) with added Gaussian noise of standard
deviation equal to 1% of the length of the
bounding box diagonal. The two planes meet at
an angle of 140º, 150º and 160º. The first and
second rows show the noisy 3D data and 2D
transects, respectively. Rows 3–5 show the
results of the bilateral filter, AWLOP and
MRPCA.
Denoising of the Vienna cathedral SfM model.
The noisy input was processed with MRPCA
followed by a simple outlier removal method
using Meshlab.
Although MRPCA is robust against outliers, this robustness is achieved only locally. One simple modification to achieve global outlier robustness is to use an l1 data-fitting term in problem (P6). Although the l1-norm will be able to handle the global outliers better than the Frobenius norm used in this work, the computational cost will increase significantly. We point out that the size of the neighbourhoods is set globally. One improvement over the current method could be to make the neighbourhood size a function of the local point density. This could have a positive effect when handling datasets with spatially varying noise.
26. Point cloud Edge Detection #1
Fast and Robust Edge Extraction in Unorganized
Point Clouds
Dena Bazazian ; Josep R. Casas ; Javier Ruiz-Hidalgo
Digital Image Computing: Techniques and Applications (DICTA), 2015
https://doi.org/10.1109/DICTA.2015.7371262
27. Point cloud Edge Detection #2
Segmentation-based Multi-Scale Edge Extraction
to Measure the Persistence of Features in
Unorganized Point Clouds
Dena Bazazian ; Josep R. Casas ; Javier Ruiz-Hidalgo
"12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications". Porto:
2017, p. 317-325.
http://dx.doi.org/10.5220/0006092503170325
Estimating the neighbors of a sample
point on the ear of bunny at large scales:
(a) far away neighbors may belong to
foreign surfaces when Euclidean
distance is used; (b) geodesic distance
is a better choice to explore large local
neighborhoods; (c) the point cloud can
be segmented to distinguish different
surfaces.
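The eigen-analysis behind such edge extractors can be sketched compactly: the "surface variation" of each point's local covariance is near zero on flat regions and clearly positive near creases. This is a simplified single-scale version of the multi-scale, segmentation-aided scheme in the papers above.

```python
import numpy as np
from scipy.spatial import cKDTree

def edge_scores(points, k=10):
    """Surface variation per point: lambda_min / (sum of eigenvalues) of
    the covariance of the k nearest neighbours."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)          # includes the point itself
    scores = np.empty(len(points))
    for i, nb in enumerate(idx):
        lam = np.linalg.eigvalsh(np.cov(points[nb].T))   # ascending order
        scores[i] = lam[0] / lam.sum()
    return scores

# demo: two planes meeting at a right-angle crease along the y axis
g = np.linspace(0.0, 1.0, 15)
plane_a = np.array([(xi, yi, 0.0) for xi in g for yi in g])      # z = 0
plane_b = np.array([(0.0, yi, zi) for zi in g[1:] for yi in g])  # x = 0
pts = np.vstack([plane_a, plane_b])
scores = edge_scores(pts, k=10)
```

Thresholding the scores then yields candidate edge points; the Euclidean-vs-geodesic neighbourhood issue illustrated in the caption above arises exactly when k is pushed larger to gain robustness.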
28. Point cloud super-resolution #1
LidarBoost: Depth superresolution for ToF 3D
shape scanning
Sebastian Schuon ; Christian Theobalt ; James Davis ; Sebastian Thrun
Computer Vision and Pattern Recognition, 2009. CVPR 2009
https://doi.org/10.1109/CVPR.2009.5206804
A new upsampling method for mobile LiDAR
data
Ruisheng Wang ; Jeff Bach ; Jane Macfarlane ; Frank P. Ferrie
Applications of Computer Vision (WACV), 2012 IEEE
https://doi.org/10.1109/WACV.2012.6162998
Real scene - wedges and panels (a): This scene with many depth edges (b) demonstrates the true resolution gain.
Image-based super-resolution (IBSR) (c) demonstrates increased resolution at the edges, but some aliasing
remains and the strong pattern in the interior persists. LidarBoost (d) reconstructs the edges much more clearly
and there is hardly a trace of aliasing, also the depth layers visible in the red encircled area are better captured.
Markov-Random-Field (MRF) upsampling (e) oversmooths the depth edges and in some places allows the low
resolution aliasing to persist.
The main contributions of this paper are:
● A 3D depth sensor superresolution method that incorporates ToF-specific knowledge and data. Additionally, a new 3D shape prior is proposed that enforces 3D-specific properties.
● A comprehensive evaluation of the working range and accuracy of our algorithm using synthetic and real data captured with a ToF camera.
● Only few depth superresolution approaches have been developed previously. We show that our algorithm clearly outperforms the most related approaches.
29. Point cloud super-resolution #2
Geometry Super-Resolution by Example
Thales Vieira ; Alex Bordignon ; Thomas Lewiner ; Luiz Velho
Computer Graphics and Image Processing (SIBGRAPI), 2009
https://doi.org/10.1109/SIBGRAPI.2009.10
Laser stripe model for sub-pixel peak
detection in real-time 3D scanning
Ingmar Besic ; Zikrija Avdagic
Systems, Man, and Cybernetics (SMC), 2016 IEEE
https://doi.org/10.1109/SMC.2016.7844912
Our tests show that noise does not vary significantly when observed on different color channels. Thus estimator algorithms can utilize any of the color channels without sacrificing precision due to significantly increased noise. However, if a clever choice is to be made, the estimator should opt for the green channel, as it provides the most reliable stripe intensity data for both black and white surfaces across the whole modulation ROI. We have found that the noise does not have a continuous uniform distribution PDF but a normal distribution PDF, and proposed a model that fits the empirical data. Our measurements support the assumption that the laser stripe image has an approximately Gaussian intensity profile. However, RMSE values show that a single Gaussian curve fit is not the best choice, as the stripe intensity profile is superposed with surface reflections. After testing Gaussian fits with 1 to 8 curves, we have concluded that models with n > 2 are not suitable, as they produce false peaks or subtract light intensity to achieve better RMSE. Thus we proposed a Gaussian fit with two curves as the optimal model based on the empirical data.
Our future work is to target relations between coefficients of the proposed laser stripe intensity profile and reduce their number if possible. Preliminary tests show that the mean values b1 and b2 of the Gaussian curves tend to be equal or differ by a subpixel amount. It is not yet clear if this difference has any significance or can be neglected. We also intend to test our model with different angles between the laser source and the target surface.
The proposed method is limited to models which have repeated occurrences of a shape,
and restrict the resolution increase to the regions of those occurrences. Increasing the
resolution of other parts would require inpainting-like tools to extrapolate the geometry
[Sharf et al. 2004], together with a superresolution scheme as an extension of the one
proposed here.
30. Point cloud super-resolution #3
Super-Resolution of Point Set Surfaces Using
Local Similarities
Azzouz Hamdi-Cherif, Julie Digne, Raphaëlle Chaine
Computer Graphics Forum 2017
http://dx.doi.org/10.1111/cgf.13216
Super-resolution of a single scan of the Maya point set. Left: initial scan, right: super-resolution. For
visualization purposes, both are reconstructed with Poisson reconstruction
Super-resolution of an input shape with highly repetitive geometric texture. (a) Underlying shape to be
sampled by an acquisition device. (b) Low-resolved input sampling of the shape and local approximation with a
quadric at each point; geometric texture is the residue over the quadric. (c) Super-resolved re-sampling using our
method (fusion of super-resolved local patches). Right column: Generation of the super-resolved patches. (d)
Construction of a local descriptor of the residue over a low-resolution grid corresponding to the unfolded quadric;
blue points represent the height values estimated at bin centres, red points are the input points. (e) Similar
descriptor points are added (orange points) to the input points (in red) of the local descriptor. (f) A super-resolved
descriptor is computed from the set of red and orange points
Super-resolution of a single scan of the Persepolis point set. Left: initial scan; right: super-resolution. The shape details appear much sharper after the super-resolution process. Parameters: r = 4 (shape diagonal: 114), nbins_lr = 64, nbins_sr = 400 and r_sim = 0.2.
32. Point cloud Classification #2
Multi-class US traffic signs 3D recognition and localization via image-based point cloud model using color candidate extraction and texture-based recognition
Vahid Balali, Arash Jahangiri and Sahar Ghanipoor Machiani
Advanced Engineering Informatics Volume 32, April 2017, Pages 263-274
https://doi.org/10.1016/j.aei.2017.03.006
An improved Structure-from-Motion (SfM) procedure is developed to create a
clean 3D point cloud from the street level imagery and assist with accurate 3D
localization by color and texture features extraction. The detected traffic signs are
triangulated using camera pose information and their corresponding locations are
visualized in 3D environment. The proposed method as shown in Fig. 1, mainly consists
of three key components:
1) Detecting and classifying traffic signs using 2D images;
2) Reconstructing and automatically cleaning a 3D point cloud model; and
3) Recognizing and localizing traffic signs in 3D environment.
33. Point cloud Clustering for simplification #1
Adaptive simplification of point cloud using k-
means clustering
Bao-Quan Shi, Jin Liang, Qing Liu
Computer-Aided Design Volume 43, Issue 8, August 2011, Pages 910-922
https://doi.org/10.1016/j.cad.2011.04.001
A parallel point cloud clustering algorithm for
subset segmentation and outlier detection
Christian Teutsch, Erik Trostmann, Dirk Berndt
Proceedings Volume 8085, Videometrics, Range Imaging, and Applications XI; 808509 (2011)
http://dx.doi.org/10.1117/12.888654
Cluster initialization of the Stanford bunny. Left: input
data. Middle: initialization of the cluster centroids. Right:
initial clusters are formed, and one cluster is shown in one
color.
If the noise in the 3D point cloud is
severe, effective noise filtering should
be conducted before the simplification.
The proposed method can also simplify
multiple 3D point clouds. Our future
research will concentrate on simplifying
multiple 3D point sets simultaneously.
For example, a point set with two million coordinates is analyzed within three seconds and 15 million points within
35 seconds on an Intel Core2 processor. It handles arbitrary n-dimensional data formats, e.g. with additional color
and/or normal vector information, since it is implemented as a template class. The algorithm is easy to parallelize,
which further increases computation performance on multi-core machines for most applications. The
feasibility of our clustering technique has been evaluated on a variety of point clouds from
different measuring applications and 3D scanning devices.
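The cluster-and-replace idea above can be sketched in a few lines: run k-means on the coordinates and keep one centroid per cluster as the simplified cloud. This is a minimal Lloyd-iteration sketch, not the paper's adaptive variant (which refines clusters by local surface properties); the function name and parameters are illustrative.

```python
import numpy as np

def kmeans_simplify(points, k, iters=20, seed=0):
    """Simplify a point cloud by k-means clustering, replacing each
    cluster with its centroid (plain Lloyd iterations)."""
    rng = np.random.default_rng(seed)
    # initialise centroids on randomly chosen input points
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign every point to its nearest centroid
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its cluster
        for c in range(k):
            mask = labels == c
            if mask.any():
                centroids[c] = points[mask].mean(axis=0)
    return centroids, labels

# toy example: 200 points reduced to 8 representatives
pts = np.random.default_rng(1).normal(size=(200, 3))
simplified, labels = kmeans_simplify(pts, 8)
print(simplified.shape)  # prints (8, 3)
```

The brute-force distance matrix is fine for toy sizes; the cited parallel algorithm partitions the data to reach the reported multi-million-point throughput.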
34. Point cloud Object detection #1
Object Detection in Point Clouds Using Conformal
Geometric Algebra
Aksel Sveier, Adam Leon Kleppe, Lars Tingelstad and Olav Egeland
Advances in Applied Clifford Algebras 2017
http://dx.doi.org/10.1007/s00006-017-0759-1
In this paper we focus on the detection of primitive geometric
models in point clouds using RANSAC. A central step in the
RANSAC algorithm is to classify inliers and outliers. We show that
conformal geometric algebra (CGA) enables filters with a geometrical
interpretation for inlier/outlier classification. The last step of the
RANSAC algorithm is fitting the primitive to its inliers. This can be
performed analytically with CGA, and the method is identical for
both planes and spheres.
Setup of the robotic pick-
and-place demonstration.
Point clouds from the 3D
camera are used for detecting
the plane, spheres and
cylinder. The information is
sent to the robot arm, which
is used to place the spheres
in the cylinder
Spheres were successfully detected in point clouds with up to 90%
outliers and cylinders could successfully be detected in point
clouds with up to 80% outliers. We suggested two methods for
constructing a cylinder from point data using CGA and found that
fitting two spheres to a cylinder gave performance advantages
compared to constructing a circle and line from 3 points on the
cylinder surface.
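The RANSAC loop described above (sample a minimal set, fit a primitive, count inliers by distance to the surface) can be sketched for spheres as follows. Note this is a hedged stand-in: the fit here is a plain algebraic least-squares solve, whereas the paper performs the fitting step analytically in CGA.

```python
import numpy as np

def fit_sphere(pts):
    """Algebraic least-squares sphere fit: solve |p|^2 = 2 p.c + d
    with d = r^2 - |c|^2 (linear in c and d)."""
    A = np.hstack([2.0 * pts, np.ones((len(pts), 1))])
    b = (pts ** 2).sum(axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:3]
    r2 = w[3] + center @ center
    return center, (np.sqrt(r2) if r2 > 0 else None)

def ransac_sphere(pts, thresh=0.02, iters=300, seed=0):
    """RANSAC: fit spheres to random minimal samples, classify inliers
    by distance to the sphere surface, keep the best model."""
    rng = np.random.default_rng(seed)
    best_c, best_r, best_n = None, None, -1
    for _ in range(iters):
        c, r = fit_sphere(pts[rng.choice(len(pts), 4, replace=False)])
        if r is None:
            continue
        resid = np.abs(np.linalg.norm(pts - c, axis=1) - r)
        n_in = int((resid < thresh).sum())
        if n_in > best_n:
            best_c, best_r, best_n = c, r, n_in
    return best_c, best_r, best_n

# unit-sphere samples contaminated with 50% uniform outliers
rng = np.random.default_rng(2)
on = rng.normal(size=(300, 3))
on /= np.linalg.norm(on, axis=1, keepdims=True)
cloud = np.vstack([on, rng.uniform(-2, 2, size=(300, 3))])
c, r, n = ransac_sphere(cloud)
```

With half the cloud being outliers, a handful of the 300 minimal samples are all-inlier, which is enough for the best model to lock onto the sphere.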
35. Point cloud Compression static #1
Research on the Self-Similarity of Point Cloud
Outline for Accurate Compression
Xuandong An; Xiaoqing Yu; Yifan Zhang
2015 International Conference on Smart and Sustainable City and Big Data (ICSSC)
http://dx.doi.org/10.1049/cp.2015.0272
The Lovers of Bordeaux (15.8 million
points). Exploiting self-similarity in the
model, we compress this representation
down to 1.15 MB. The resulting model
(right) is very close to the original one
(left), as the reconstruction error is less
than the laser scanner precision
(0.02mm) for 99.14% of the input points.
Point cloud compression approaches have mostly dealt with coordinate quantization via recursive
space partitioning [Gandoin and Devillers 2002; Schnabel and Klein 2006; Huang et al. 2006;
Smith et al. 2012]. In a nutshell, these approaches consist of inserting the points into a space-partitioning
data structure (e.g. octree, kd-tree) of a given depth, and replacing them by the center of the cell they
belong to.
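The cell-center quantization idea is easy to sketch with a single-level uniform grid (a real octree codes the cell indices hierarchically; the grid construction and names here are illustrative):

```python
import numpy as np

def voxel_quantize(points, depth):
    """Quantize coordinates on a uniform grid of 2**depth cells per
    axis over the bounding box, replacing each point by the centre of
    the cell it falls in -- the single-level analogue of octree coding."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    n_cells = 2 ** depth
    cell = (hi - lo) / n_cells
    idx = np.clip(((points - lo) / cell).astype(int), 0, n_cells - 1)
    decoded = lo + (idx + 0.5) * cell  # cell index -> cell centre
    return decoded, idx

pts = np.random.default_rng(0).uniform(size=(1000, 3))
decoded, idx = voxel_quantize(pts, depth=5)
# quantization error is bounded by half the cell diagonal
max_err = np.linalg.norm(decoded - pts, axis=1).max()
```

Only the integer indices need to be stored, so the rate is controlled directly by the chosen depth.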
Self-similarity of measured signals has gained interest over the past decade: research in signal
processing as well as image processing has made outstanding progress by taking advantage of
the self-similarity of the measurements. In the image processing field, the idea originated in the non-local
means algorithm [Buades et al. 2005]: instead of denoising a pixel using its neighboring pixels, it is
denoised by exploiting pixels of the whole image that look similar. The similarity between pixels is
computed by comparing patches around them. Behind this powerful tool lies the idea that pixels far away
from the considered area may carry information that helps process it, because of the natural self-
similarity of the image.
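The patch-comparison scheme just described can be sketched as a brute-force non-local means filter on a tiny image. This toy compares every patch against every other patch; real implementations restrict the search window, and the parameters here are illustrative.

```python
import numpy as np

def nlm_denoise(img, patch=3, h=0.3):
    """Brute-force non-local means: each pixel becomes a weighted
    average of all pixels, with weights decaying in the squared
    distance between the patches around them (after Buades et al.)."""
    r = patch // 2
    pad = np.pad(img, r, mode="reflect")
    H, W = img.shape
    # every patch flattened into a row vector
    P = np.array([pad[i:i + patch, j:j + patch].ravel()
                  for i in range(H) for j in range(W)])
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).mean(axis=2)
    w = np.exp(-d2 / h ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return (w @ img.ravel()).reshape(H, W)

rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
clean[:, 8:] = 1.0                    # vertical step edge
noisy = clean + 0.2 * rng.normal(size=clean.shape)
denoised = nlm_denoise(noisy)
```

Patches on opposite sides of the step are dissimilar, so they get near-zero weight and the edge survives while the flat regions are averaged over many distant but similar pixels.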
Self-similarity of surfaces has mainly been exploited for surface denoising applications: the non-local
means filter has been adapted for surfaces, be they meshes [Yoshizawa et al. 2006] or point clouds [
Adams et al. 2009; Digne 2012]. It was also used to define a Point Set Surface variant [
Guillemot et al. 2012] exhibiting better robustness to noise. Self-similarity of surfaces is obviously not
limited to denoising purposes. For example, analyzing the similarity of a surface can lead to the detection of
symmetries or repetitive structures [Mitra et al. 2006; Pauly et al. 2008]. An excellent
survey of methods exploiting symmetry in shapes can be found in [Mitra et al. 2013].
There are several ways in which our compression scheme could be improved:
● Exploiting a patch-based representation, artifacts may appear at boundaries, which could be
dilated through decompression. One could mitigate this issue by adjusting the patch size
(clipping some outer grid cells) along boundaries. This would require storing one small integer
per patch, at a small cost.
● Other seed-picking strategies could be implemented, for example by placing the seeds so that they
minimize the local error, in the spirit of [Ohtake et al. 2006].
● Encoding per-point attributes such as normals and colors is possible with the same similarity-based
coder.
Perspectives: Although our algorithm is based on the exploitation of self-similarity on the whole surface,
most of the involved treatments remain local. This is a good prospect for handling data of ever increasing
size, using streaming processes. This is particularly important at a time when the geometric digitization
campaigns sometimes cover entire cities.
36. Point cloud Compression Dynamic #1 voxelized
Motion-Compensated Compression of Dynamic
Voxelized Point Clouds
Ricardo L. de Queiroz ; Philip A. Chou
IEEE Transactions on Image Processing ( Volume: 26, Issue: 8, Aug. 2017 )
https://doi.org/10.1109/TIP.2017.2707807
As a new concept for a new application, much still has to be fine-tuned and perfected.
For example, the post-processing (in-loop or otherwise) is far from reaching its peak
performance. Both the morphological and the filtering operations are not well
understood in this context. Similarly, the distortion metrics or the voxel matching
methods are not developed to a satisfactory point. There is still plenty of work to be
done to extend the present framework to use B-frames (bidirectional prediction) and
to extend the GOF to a more typical IBBPBBP... format. Furthermore, we want to use
adaptive block sizes, which are optimally selected in an RD sense and we also want to
encode both the geometry and the color residues for the predicted (P and B) blocks.
Finally, rather than re-using the correspondences from the surface reconstruction
among consecutive frames, we want to develop efficient motion estimation
methods for use with our coder. Each of these enhancements should improve the
coder performance, such that there is a continuous sequence of improvements in this
new frontier to be explored.
37. Point cloud Compression Dynamic #2
Graph-Based Compression of Dynamic 3D Point
Cloud Sequences
Dorina Thanou ; Philip A. Chou ; Pascal Frossard
IEEE Transactions on Image Processing ( Volume: 25, Issue: 4, April 2016 )
https://doi.org/10.1109/TIP.2016.2529506
Example of a point cloud of the ‘yellow dress’
sequence (a). The geometry is captured by a
graph (b) and the r component of the color is
considered as a signal on the graph (c). The size
and the color of each disc indicate the value of the
signal at the corresponding vertex.
Octree decomposition of a 3D model for two
different depth levels. The points belonging to each
voxel are represented by the same color.
There are a few directions that can be explored in the future. First, it has
been shown in our experimental section that a significant part of the bit
budget is spent for the compression of the 3D geometry, which given a
particular depth of the octree, is lossless. A lossy compression scheme that
permits some errors in the reconstruction of the geometry could bring
non-negligible benefits in terms of the overall rate-distortion performance.
Second, the optimal bit allocation between geometry, color and motion vector
data remains an interesting and open research problem, mainly due to the
lack of a suitable metric that balances geometry and color visual quality.
Third, the estimation of the motion is done by computing features based
on the spectral graph wavelet transform. Features based on data-driven
dictionaries, such as the ones proposed in [Thanou et al. 2014], are
expected to significantly improve the matching and, consequently, the
compression performance.
38. Dynamic meshes laplace operator
A 3D+t Laplace operator for temporal mesh
sequences
Victoria Fernández Abrevaya , Sandeep Manandhar, Franck Hétroy-Wheeler, Stefanie Wuhrer
Computers & Graphics Volume 58, August 2016, Pages 12-22
https://doi.org/10.1016/j.cag.2016.05.018
In this paper we have introduced a discrete Laplace operator for temporally coherent
mesh sequences. This operator is defined by modelling the sequences as CW complexes
in a 4-dimensional Riemannian space and using Discrete Exterior Calculus. A
user-defined parameter α is associated with the 4D space to control the influence of motion
with respect to the geometry. We have shown that this operator can be expressed by a
sparse blockwise tridiagonal matrix, with a linear number of non-zero coefficients with
respect to the number of vertices in the sequence. The storage overhead with respect to
frame-by-frame mesh processing is limited. We have also shown an application example,
as-rigid-as-possible editing, for which it is relatively easy to extend the classical static
Laplacian framework to mesh sequences with this matrix. Results similar to state-of-the-art
methods can be reached with a simple, global formulation.
This opens the possibility that many other problems in animation processing can be
tackled the same way by taking advantage of the existing literature on the Laplacian
operator for 3D meshes [Zhang et al. 2010]. In the future, we are in particular interested in
studying the spectral properties of the defined discrete Laplace operator.
39. Point cloud inpainting #1
Region of interest (ROI) based 3D inpainting
Shankar Setty, Himanshu Shekhar, Uma Mudenagudi
Proceeding SA '16 SIGGRAPH ASIA 2016 Posters Article No. 33
https://doi.org/10.1145/3005274.3005312
Point Cloud Data Cleaning and Refining for 3D As-
Built Modeling of Built Infrastructure
Abbas Rashidi and Ioannis Brilakis
Construction Research Congress 2016
http://sci-hub.cc/10.1061/9780784479827.093
Future experiments will also be required to quantitatively measure the
accuracy of the presented algorithms, especially for the case of outlier
removal. Developing robust algorithms for automatically recognizing 3D
objects throughout the built infrastructure PCD, and thereby enhancing the
object-oriented modeling stage, is another possible direction for future
research.
40. Point cloud inpainting #2A
Dynamic occlusion detection and inpainting of in
situ captured terrestrial laser scanning point
clouds sequence
Chi Chen, Bisheng Yang
ISPRS Journal of Photogrammetry and Remote Sensing (2016)
https://doi.org/10.1016/j.isprsjprs.2016.05.007
In future work, the proposed method will be extended to incorporate multiple geometric
features (e.g. shape index, normal vector; https://github.com/aboulch/normals_Hough)
of local point distributions to measure the geometric consistency in the background
modeling stage, aiming for higher recall of the background points during inpainting.
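The linked repository estimates normals with a robust Hough-based voting scheme; the classical baseline it improves on is the PCA normal, the eigenvector of the local covariance with the smallest eigenvalue. A minimal sketch of that baseline (names and the synthetic patch are illustrative):

```python
import numpy as np

def pca_normal(neighbors):
    """Classical PCA normal estimate: eigenvector of the local
    covariance with the smallest eigenvalue."""
    centred = neighbors - neighbors.mean(axis=0)
    cov = centred.T @ centred / len(neighbors)
    eigval, eigvec = np.linalg.eigh(cov)  # eigenvalues ascending
    return eigvec[:, 0]

# noisy samples of the plane z = 0; the normal should be ~(0, 0, 1)
rng = np.random.default_rng(0)
patch = np.column_stack([rng.uniform(-1, 1, 200),
                         rng.uniform(-1, 1, 200),
                         0.01 * rng.normal(size=200)])
normal = pca_normal(patch)
```

PCA normals degrade near sharp edges, where a neighborhood mixes two surfaces; that failure mode is what the robust Hough variant addresses.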
42. Point cloud Quality assessment #1
Towards Subjective Quality Assessment of Point
Cloud Imaging in Augmented Reality
Alexiou, Evangelos; Upenik, Evgeniy; Ebrahimi, Touradj
IEEE 19th International Workshop on Multimedia Signal Processing, Luton Bedfordshire, United Kingdom, October 16-18, 2017
https://infoscience.epfl.ch/record/230115
On the performance of metrics to predict quality
in point cloud representations
Alexiou, Evangelos; Ebrahimi, Touradj
SPIE Optics + Photonics for Sustainable Energy, San Diego, California, USA, August 6-10, 2017
https://infoscience.epfl.ch/record/230116
As can be observed, our results show a strong correlation between objective metrics and subjective scores in the
presence of Gaussian noise. The statistical analysis shows that the current metrics perform well when Gaussian noise is
introduced. However, in the presence of compression-like artifacts the performance is lower for every type of content,
leading to the conclusion that the performance is content dependent. Our results show that there is a need for better
objective metrics that can more accurately predict all practical types of distortions for a wide variety of contents.
absolute category rating (ACR)
double-stimulus impairment scale (DSIS)
43. Point cloud Quality assessment #2
A statistical method for geometry
inspection from point clouds
Francisco de Asís López, Celestino Ordóñez, Javier Roca-Pardiñas , Silverio García-Cortés
Applied Mathematics and Computation Volume 242, 1 September 2014, Pages 562-568
https://doi.org/10.1016/j.amc.2014.05.130
Assessing planar asymmetries in shipbuilding
from point clouds
Javier Roca-Pardiñas, Celestino Ordóñez, Carlos Cabo, Agustín Menéndez-Díaz
Measurement Volume 100, March 2017, Pages 252-261
https://doi.org/10.1016/j.measurement.2016.12.048
In this paper, a statistical test to perform geometry inspection is described. The
methodology allows a p-value for the established statistical hypothesis to be obtained
by means of bootstrapping techniques.
An important aspect of the developed methodology, proved by means of a simulated
experiment, is its capacity to control type I errors while being able to reject the null
hypothesis when it is false. This experiment showed that the performance of the
method improves as the point density increases.
The proposed method was applied to the inspection of a parabolic dish antenna, and
the results show that it does not fit its theoretical shape, unless a 1 mm tolerance is
admitted.
It is noteworthy that although the method has been exposed as a global test for
geometry inspection, it would also be possible to apply it to inspect different parts of
the object under study.
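As a toy illustration of the bootstrap idea (not the paper's actual test statistic, which inspects the fitted surface), one can bootstrap the mean point-to-model deviation under the null hypothesis of zero systematic offset; all names and numbers here are illustrative:

```python
import numpy as np

def bootstrap_pvalue(devs, n_boot=2000, seed=0):
    """p-value for H0: mean deviation = 0, obtained by resampling
    the deviations after centring them (imposing the null)."""
    rng = np.random.default_rng(seed)
    observed = abs(devs.mean())
    centred = devs - devs.mean()
    stats = np.array([abs(rng.choice(centred, size=len(devs)).mean())
                      for _ in range(n_boot)])
    return float((stats >= observed).mean())

rng = np.random.default_rng(1)
on_spec = rng.normal(0.0, 0.5, 400)   # part matching its nominal geometry
off_spec = rng.normal(0.2, 0.5, 400)  # part with a 0.2 mm systematic offset
p_ok, p_bad = bootstrap_pvalue(on_spec), bootstrap_pvalue(off_spec)
```

The conforming part yields a large p-value while the offset part is rejected, mirroring the paper's type I / type II behaviour; as in the paper, the test sharpens as the number of points grows.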
Yacht hull surface estimated from the point cloud.
44. Point cloud Quality assessment #3 Defect detection
Automated Change Diagnosis of Single-
Column-Pier Bridges Based on 3D
Imagery Data
Ying Shi; Wen Xiong, Ph.D., P.E., M.ASCE; Vamsisai Kalasapudi; Chao Geng
ASCE International Workshop on Computing in Civil Engineering 2017
http://doi.org/10.1061/9780784480830.012
Future work will include understanding the correlation between the deformation of
the girder and column and the change in the thickness of the connected bearing.
Such correlated change analysis will aid in understanding the cause of the observed
thickness variation and in performing reliable condition diagnosis of all
single-column-pier bridges.
45. Point cloud Quality assessment #4 with uncertainty
Point cloud comparison under
uncertainty. Application to beam bridge
measurement with terrestrial laser
scanning
Francisco de Asís López, Celestino Ordóñez, Javier Roca-Pardiñas, Silverio García-Cortés
Measurement Volume 51, May 2014, Pages 259-264
https://doi.org/10.1016/j.measurement.2014.02.013
Assessment of along-normal uncertainties for
application to terrestrial laser scanning surveys of
engineering structures
Tarvo Mill, Artu Ellmann
Survey Review (2017), published online
http://dx.doi.org/10.1080/00396265.2017.1361565
Future studies should investigate more closely how the results
depend on the different TLS signal
processing methods, and also the applicability of
combined standard uncertainty (CSU; Bjerhammar 1973;
Niemeier and Tengen 2017) equations that also account for
systematic error in TLS surveys.
The application of the proposed methodology to compare two point clouds of a
beam bridge measured with two different scanner systems showed significant
differences in parts of the beam. This is important in inspection work, since
different conclusions could be reached depending on the measuring instrument.
46. PDE-based Point cloud processing
Partial Difference Operators on Weighted Graphs
for Image Processing on Surfaces and Point
Clouds
François Lozes ; Abderrahim Elmoataz ; Olivier Lézoray
IEEE Transactions on Image Processing ( Volume: 23, Issue: 9, Sept. 2014 )
https://doi.org/10.1109/TIP.2014.2336548
PDE-Based Graph Signal Processing for 3-D
Color Point Clouds : Opportunities for
cultural heritage
François Lozes ; Abderrahim Elmoataz ; Olivier Lézoray
IEEE Signal Processing Magazine ( Volume: 32, Issue: 4, July 2015 )
https://doi.org/10.1109/MSP.2015.2408631
The approach allows processing of signal data on point clouds (e.g.,
spectral data, colors, coordinates, and curvatures). We have applied
this approach for cultural heritage purposes on examples aimed at
restoration, denoising, hole-filling, inpainting, object extraction, and
object colorization.
47. Sparse coding and point clouds #1
Cloud Dictionary: Sparse Coding and Modeling for
Point Clouds
Or Litany, Tal Remez, Alex Bronstein
(Submitted on 15 Dec 2016 (v1), last revised 20 Mar 2017 (this version, v2))
https://arxiv.org/abs/1612.04956
Sparse Geometric Representation Through
Local Shape Probing
Julie Digne, Sébastien Valette, Raphaëlle Chaine
(Submitted on 7 Dec 2016)
https://arxiv.org/abs/1612.02261
With the development of range sensors such as LIDAR and time-of-flight cameras,
3D point cloud scans have become ubiquitous in computer vision applications,
the most prominent ones being gesture recognition and autonomous driving.
Parsimony-based algorithms have shown great success on images and videos,
where data points are sampled on a regular Cartesian grid. We propose an
adaptation of these techniques to irregularly sampled signals by using
continuous dictionaries. We present an example application in the form of
point cloud denoising.
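On a regular grid, the parsimony idea can be sketched with Orthogonal Matching Pursuit over a fixed overcomplete cosine dictionary; the paper's contribution is precisely the continuous dictionary that lifts this to irregular samples, so this grid-based toy and its parameters are only illustrative.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily select k atoms of the
    dictionary D, least-squares fitting y on the selected support."""
    resid, support = y.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ resid))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        resid = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# overcomplete cosine dictionary on length-16 signals
n, n_atoms = 16, 48
t = np.arange(n)
D = np.cos(np.pi * t[:, None] * np.arange(n_atoms)[None, :] / n_atoms)
D /= np.linalg.norm(D, axis=0)

rng = np.random.default_rng(0)
clean = np.cos(2 * np.pi * t / 8)   # exactly representable by one atom
noisy = clean + 0.1 * rng.normal(size=n)
denoised = D @ omp(D, noisy, k=3)   # sparse-coding denoising
```

Because the noise has no sparse representation in the dictionary while the signal does, reconstructing from a few atoms suppresses most of the noise.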
48. Building Information models (BIM) and point clouds
An IFC schema extension and binary serialization
format to efficiently integrate point cloud data into
building models
Thomas Krijnen, Jakob Beetz
Advanced Engineering Informatics Available online 3 April 2017
https://doi.org/10.1016/j.aei.2017.03.008
Building elements can be represented by various forms of geometry, including 2D and 3D line drawings, Constructive
Solid Geometry (CSG), Boundary Representations (BRep) and tessellated meshes. However, these three-dimensional
representations are just one of the many aspects conveyed in an IFC model. In addition, attributes related to thermal or
acoustic performance, costing or intended use of spaces, etc. can be added.
In many common data formats for the storage of point cloud data, such as E57 and PCD, metadata is attached to individual
data sets. This metadata includes, for example, scanner positions or weather conditions observed during the scan.
From the acquisition process, the point data itself contains no grouping, decomposition or other information that relates the
points to the semantic meaning of the real-world object that was scanned. In subsequent processing steps such labels are
often added to the points. Several exchange formats, such as LAS, have options to store labels along with the points.
The magnitude of the data typically found in point cloud data sets and IFC model populations can be
dramatically different for the two file types. A meaningful IFC file can have a file size in the order of a few megabytes, if
geometrical representations and property values are properly reused and especially when the file contains implicit,
parametric, rather than tessellated geometry. Depending on the amount of detail and precision, point cloud scans can
easily amount to gigabytes of data. Despite the larger size, due to their uniform structure and explicit nature, point clouds can
typically be explored more immediately than IFC building models, for which the boolean operations and implicit geometries
need to be evaluated prior to visualization.
The need for a unified and harmonized storage model for the two data types is observed in the literature [e.g. Li et al. 2008;
Golparvar-Fard et al. 2011]. Yet, the authors acknowledge that other use cases will exist in which a deep coupling between
building models and point clouds is unnecessary or even undesirable. This paper presents an extension to the IFC schema
from which an open and semantically rich standard arises.
Future: One of the core advantages of the HDF5 format is the use of transparent
block-level compression. HDF5 allows several compression schemes, including
user-defined compression methods. These would allow much higher compression
ratios by exploiting structural knowledge of the point cloud or by introducing
additional lossiness in the compression methods. In the prototypical implementation
only gzip compression is used. Especially the point cloud segments stored as
height maps projected onto parametric surfaces might be suitable for special-
purpose compression methods, such as JPEG or PNG, which can exploit and filter
imperceptible differences.
Lastly, future research will indicate how the associated point cloud structure
presented in this paper can be paired with other spatial indexing structures to
further advance the localized extraction of point cloud segments and spatial
querying techniques. Further experiments will be conducted to harness and reuse
the general purpose decomposition and aggregation relationships of the IFC to
implement octrees and kd-trees to further enhance the structure and accessibility
of the data.
49. Dynamic Surface Mesh Detail enhancement #1
Multi-scale geometric detail
enhancement for time-varying surfaces
Graphical Models Volume 76, Issue 5, September 2014, Pages 413-425
https://doi.org/10.1016/j.gmod.2014.03.010
We first develop an adaptive spatio-temporal bilateral filter, which produces a temporally
coherent and feature-preserving multi-scale representation of the time-varying surfaces.
We then extract the geometric details from the time-varying surfaces and enhance
them by exaggerating detail information at each scale across the time-varying
surfaces.
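The spatial half of such a filter is easy to sketch. Here is a plain 1-D bilateral filter; the paper's operator adds an adaptive temporal term across frames, and the names and parameters below are illustrative.

```python
import numpy as np

def bilateral_1d(signal, radius=5, sigma_s=2.0, sigma_r=0.1):
    """Bilateral filter on a 1-D signal: each weight combines spatial
    closeness and value similarity, so noise is smoothed while sharp
    steps (features) are preserved."""
    n = len(signal)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        idx = np.arange(lo, hi)
        w = (np.exp(-((idx - i) ** 2) / (2 * sigma_s ** 2))
             * np.exp(-((signal[idx] - signal[i]) ** 2) / (2 * sigma_r ** 2)))
        out[i] = (w * signal[idx]).sum() / w.sum()
    return out

rng = np.random.default_rng(0)
step = np.concatenate([np.zeros(50), np.ones(50)])  # sharp feature
noisy = step + 0.03 * rng.normal(size=100)
smoothed = bilateral_1d(noisy)
```

Values across the step differ by ~1, far beyond sigma_r, so they receive near-zero weight: the noise is averaged away while the step stays sharp.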
Velocity vectors estimation. The top row gives 4 frames in the time-varying
surfaces, and the bottom row gives the corresponding velocity vectors for each
frame
Multi-scale representation and detail enhancement for time-varying surfaces.
First row: input time-varying surfaces; second row: multi-scale filtering
results obtained by filtering each frame individually; third row: multi-scale
filtering results using the adaptive spatio-temporal filter; fourth and fifth
rows: multi-scale detail enhancement results using 6 and 9 detail levels,
respectively.
Limitations: In our current detail transfer results, we only transfer the
detail of a static model to time-varying surfaces. Our current algorithm
cannot transfer the geometric detail of time-varying surfaces to target
time-varying surfaces, which is challenging since it is difficult to build
the corresponding mapping between source and target time-varying
surfaces with different surface frames.
Another problem is that although our filtering and enhancement
methods can alleviate jittering artifacts, for input time-varying
surfaces with heavy jittering the artifacts still cannot be removed
completely. Processing surface sequences with heavy jittering is a
very hard problem, which requires further investigation.
50. Surface reconstruction Data Priors #1
Surface reconstruction with data-driven exemplar
priors
Oussama Remil, Qian Xie, Xingyu Xie, Kai Xu, Jun Wang
Computer-Aided Design Volume 88, July 2017, Pages 31-41
https://doi.org/10.1016/j.cad.2017.04.004
Given a noisy and sparse point cloud of a structurally complex mechanical part as input, our system produces
consolidated points by aligning exemplar priors learned from a mechanical shape database. With the additional
information, such as normals, carried by our exemplar priors, our method achieves better feature preservation than
direct reconstruction on the input point cloud (e.g., Poisson).
An overview of our algorithm. We extract priors from a 3D shape database within the same
category (e.g., mechanical parts) to construct a prior library. The affinity propagation clustering
method is then performed on the prior library to obtain the set of representative priors, called the
exemplar priors. Given an input point cloud, we construct its local neighborhoods and perform
priors matching to find the similar exemplar prior to each local neighborhood. Subsequently, we
utilize the matched exemplar priors to consolidate the input point scan through an augmentation
procedure, with which we can generate the faithful surface where sharp features and fine details
are well recovered.
Limitations: Our method is expected to
behave well with different shape
categories; meanwhile, there are a few
limitations to be discussed. Our
algorithm fails when dealing with more
challenging repositories with a small
number of redundant elements, such as
complex organic shapes. In addition, if
there are large holes or big missing
parts within the input scans, our
method may fail to complete them
based on the “matching-to-alignment”
strategy.
51. Surface reconstruction Data Priors #2A
3D Reconstruction Supported by Gaussian
Process Latent Variable Model Shape Priors
Jens Krenzin, Olaf Hellwich
PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science
May 2017, Volume 85, Issue 2, pp 97–112
https://doi.org/10.1007/s41064-017-0009-0
(a) A 2D shape representing a filled circle, where black represents the outside of the object and white the inside.
(b) Corresponding signed distance function (SDF) for the shape shown in (a); the 0-level is highlighted in red.
(c) Discrete Cosine Transform (DCT) coefficients for the SDF shown in (b). The first 15 DCT coefficients in each
dimension store the important information about the shape; the remaining coefficients are nearly zero.
52. Surface reconstruction Data Priors #2B
3D Reconstruction Supported by Gaussian
Process Latent Variable Model Shape Priors
Jens Krenzin, Olaf Hellwich
PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science
May 2017, Volume 85, Issue 2, pp 97–112
https://doi.org/10.1007/s41064-017-0009-0
Results for object A (cup). (a) Sample image. (b) Erroneous point cloud.
(c) Ground truth. (d) Shape prior. (e) Corrected point cloud.
This article presents a method that removes outliers, reduces noise and fills holes in a
point cloud using a learned shape prior. The shape prior is learned from a set of training
objects using the GP-LVM.
It has been shown that an interpolated shape between several training shapes often has ringing
artefacts due to the DCT compression step. Several investigations were made on how these
artefacts could be reduced. In the first investigation, the difference between the training shapes
was reduced and the latent space became denser. As expected this reduced the Euclidean distance
from one training example to the nearest training example. The closer two points are in the latent
space, the more similar the corresponding shapes are. As a result of this the artefacts are reduced,
but only slightly.
In the second investigation, the DCT compression step was removed. The GP-LVM then learns a
lower dimensional subspace directly on the SDF. It has been shown that this leads also to a slight
reduction of the artefacts of the reconstructed shape, but the artefacts are still visible. In this work
the GP-LVM was investigated as a candidate fulfilling the requirements. It has been shown that the
number of shape parameters can be reduced, and that the model can be trained for specific object
classes. Some of the experiments, related to model sparsity and well-behavedness, have
discovered weaknesses of the presented method. These issues will be further investigated in future
work.
54. Depth MAP Inpainting #1
Kinect depth inpainting in real time
Lucian Petrescu ; Anca Morar ; Florica Moldoveanu ; Alin Moldoveanu
Telecommunications and Signal Processing (TSP), 2016
https://doi.org/10.1109/TSP.2016.7760974
Example of output from the median filter: A) input depth map, where black pixels are
not sampled; B) output image after applying the median filter; C) difference
between input and output: grayscale – sampled pixel, blue – inpainted; D)
confidence: blue – filtered, white – sampled, red – unfiltered.
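A minimal version of the median-filter hole filling can be sketched as below: each unsampled (zero) pixel takes the median of the valid pixels in its window, iterated so that larger holes shrink inward. This is a single-threaded toy, not the paper's real-time implementation; window size and pass count are illustrative.

```python
import numpy as np

def inpaint_depth_median(depth, win=5, passes=3):
    """Fill zero-valued (unsampled) depth pixels with the median of
    the valid pixels in a local window; iterate so holes shrink."""
    d = depth.astype(float).copy()
    r = win // 2
    for _ in range(passes):
        holes = np.argwhere(d == 0)
        if len(holes) == 0:
            break
        filled = d.copy()
        for i, j in holes:
            window = d[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            valid = window[window > 0]
            if valid.size:
                filled[i, j] = np.median(valid)
        d = filled  # commit the pass so the hole border moves inward
    return d

# toy depth map with a square hole of unsampled pixels
depth = np.full((20, 20), 2.0)
depth[8:12, 8:12] = 0.0
out = inpaint_depth_median(depth)
```

The median (rather than the mean) keeps the fill from blending foreground and background depths across object edges, which is why it suits depth maps.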
55. Depth MAP Inpainting #2
A new method for inpainting of depth maps
from time-of-flight sensors based on a modified
closing by reconstruction algorithm
Journal of Visual Communication and Image Representation
Volume 47, August 2017, Pages 36-47
https://doi.org/10.1016/j.jvcir.2017.05.003
This procedure uses a modified morphological closing by reconstruction
algorithm.
Finally, the proposed method works properly on depth maps with
sufficiently well-defined regions, or at least enough definition to infer
the missing information, e.g., depth maps obtained in indoor scenarios or
acquired with sensors or methods that achieve these characteristics.
Low-quality depth maps and those acquired in outdoor conditions may
require additional pre-processing stages or even more robust methods,
because the holes in such images tend to be larger.
Filling Kinect depth holes via position-guided
matrix completion
Zhongyuan Wang, Xiaowei Song , ShiZheng Wang, Jing Xiao, Rui Zhong, Ruimin Hu
Neurocomputing Volume 215, 26 November 2016, Pages 48-52
https://doi.org/10.1016/j.neucom.2015.05.146
56. Depth MAP Inpainting #3
Learning-based super-resolution
with applications to intensity and
depth images
Haoheng Zheng, University of Wollongong, Doctor of Philosophy thesis, School
of Electrical, Computer and Telecommunications Engineering, University of
Wollongong, 2014. http://ro.uow.edu.au/theses/4284
Geometric Inpainting of 3D Structures
Pratyush Sahay, A. N. Rajagopalan
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015, pp. 1-7
https://doi.org/10.1109/CVPRW.2015.7301388
the proposed framework,
albeit with occasional minor
local artifacts.
“Low-rank Theory”
[184] Candes et al. (2011) “Robust principal
component analysis?” Journal of the ACM
(JACM) Volume 58 Issue 3, May 2011
https://doi.org/10.1145/1970392.1970395
57. Depth MAP super-resolution #1
Depth map super resolution
Murat Gevrekci ; Kubilay Pakin
Image Processing (ICIP), 2011 18th IEEE
https://doi.org/10.1109/ICIP.2011.6116454
Depth map acquisition with a ToF camera at different integration times.
The image on the left is captured with a 10 ms integration time; note how
the background depth information is noisy. The image on the right is
captured with 50 ms integration: background depth is captured reliably,
at the expense of saturating the near field with the high integration time.
We propose applying constraint sets not only to a single range image but to differently exposed range images, to increase depth resolution across the whole working space. This concept resembles High Dynamic Range (HDR) [Gevrekci and Gunturk 2007] image formation from differently exposed images. The proposed algorithm merges useful depth information from the different exposure levels and eliminates contaminated data (i.e., saturation and noise).
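The exposure-merging idea can be sketched as a per-pixel weighted fusion; `fuse_exposures` and its amplitude-based validity thresholds are hypothetical stand-ins for the paper's constraint sets, not the authors' algorithm.

```python
import numpy as np

def fuse_exposures(d_short, d_long, amp_short, amp_long,
                   sat_thresh=0.95, noise_thresh=0.05):
    """Merge two ToF depth maps taken at different integration times:
    pixels saturated in the long exposure fall back on the short one,
    pixels too dim (noisy) in the short exposure fall back on the long
    one, and where both are valid the depths are amplitude-weighted."""
    ok_long = amp_long < sat_thresh           # long exposure not saturated
    ok_short = amp_short > noise_thresh       # short exposure has signal
    w_l = np.where(ok_long, amp_long, 0.0)
    w_s = np.where(ok_short, amp_short, 0.0)
    total = w_l + w_s
    total[total == 0] = 1.0                   # avoid division by zero
    return (w_l * d_long + w_s * d_short) / total
```

A near pixel that saturates at 50 ms keeps its 10 ms depth, while a dim background pixel keeps the reliable 50 ms depth, mirroring the figure above.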
Modeling the imaging pipeline is a critical
step in image enhancement as
demonstrated by the author
[Gevrekci and Gunturk 2005]. We propose
modeling the depth map as a function of
internal camera parameters, object and
camera motion, and photometric changes
due to camera response function and
alternating integration time.
Spatially Adaptive Tensor Total Variation-Tikhonov
Model for Depth Image Super Resolution
Gang Zhong ; Sen Xiang ; Peng Zhou ; Li Yu
IEEE Access ( Volume: 5, 2017 )
https://doi.org/10.1109/ACCESS.2017.2715981
Visual comparison of 4× super-resolution results on our synthetic scene Chess: (a) the ground-truth depth image. Super-resolution results (b) using Tikhonov regularization, (c) using total variation regularization, (d) using color-guided tensor total variation regularization, (e) using fused-edge-map-guided tensor total variation regularization, (f) using the spatially adaptive tensor total variation-Tikhonov regularization.
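The difference between the two regularizers compared in the figure can be illustrated on a 1D depth profile; `sr_1d` is a toy gradient-descent solver under assumed block-averaging downsampling, not the paper's spatially adaptive tensor model.

```python
import numpy as np

def sr_1d(y, factor, lam, reg, iters=3000, lr=0.05):
    """Toy gradient-descent super-resolution of a 1D depth profile.
    The downsampling operator D is block averaging; reg is 'tikhonov'
    (squared gradients, smooths edges) or 'tv' (smoothed absolute
    gradients, preserves edges)."""
    x = np.repeat(y, factor).astype(float)       # init by replication
    for _ in range(iters):
        # data term gradient: D^T (D x - y)
        r = x.reshape(-1, factor).mean(axis=1) - y
        g = np.repeat(r, factor) / factor
        d = np.diff(x)
        if reg == 'tikhonov':
            s = d                                # gradient of 0.5 * sum(d^2)
        else:                                    # smoothed TV gradient
            s = d / np.sqrt(d * d + 1e-2)
        rg = np.concatenate(([0.], s)) - np.concatenate((s, [0.]))
        x -= lr * (g + lam * rg)
    return x

y = np.array([0., 0., 0., 1., 1., 1.])           # LR profile with a depth step
x_tv = sr_1d(y, 2, lam=0.3, reg='tv')            # keeps the discontinuity sharp
x_tk = sr_1d(y, 2, lam=0.3, reg='tikhonov')      # blurs the edge into a ramp
```

Comparing the largest jump in the two outputs shows the TV result retains a sharper depth discontinuity, which is why the paper blends the two terms spatially.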
58. Depth MAP super-resolution #2
Image-guided ToF depth upsampling: a survey
Iván Eichhardt, Dmitry Chetverikov, Zsolt Jankó
Machine Vision and Applications May 2017, Volume 28, Issue 3–4, pp 267–282
https://doi.org/10.1007/s00138-017-0831-9
Effect of imprecise calibration on depth upsampling. The discrepancy between the
input depth and colour images is 2, 5 and 10 pixels, respectively
Effect of optical radial distortion on depth upsampling
59. Depth Map super-resolution #3
Super-resolution Reconstruction for
Binocular 3D Data
Wei-Tsung Hsiao ; Jing-Jang Leou ; Han-Hui Hsiao
Pattern Recognition (ICPR), 2014
https://doi.org/10.1109/ICPR.2014.721
Depth Superresolution using Motion
Adaptive Regularization
Ulugbek S. Kamilov, Petros T. Boufounos (Submitted on 4 Mar 2016)
https://arxiv.org/abs/1603.01633
Our motion adaptive method recovers a high-
resolution depth sequence from high-resolution
intensity and low-resolution depth sequences by
imposing rank constraints on the depth
patches: (a) and (b) t-y slices of the color and
depth sequences, respectively, at a fixed x; (c)–
(e) x-y slices at t1 = 10; (f)–(h) x-y slices at t2 =
40; (c) and (f) input color images; (d) and (g)
input low-resolution and noisy depth images; (e)
and (h) estimated depth images.
Illustration of the block matching within a space-time search
area. The area in the current frame t is centered at the reference
patch. Search is also conducted in the same window position in
multiple temporally adjacent frames. Similar patches are grouped together to construct a block.
Visual evaluation on Road video sequence.
Estimation of depth from its 3× downsized
version at 30 dB input SNR. Row 1 shows the
data at time instance t = 9. Row 2 shows the
data at the time instance t = 47. Row 3 shows
the t-y profile of the data at x = 64. Highlights
indicate some of the areas where depth
estimated by GDS-3D recovers details missing
in the depth estimate of DS-3D that does not
use intensity information.
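The space-time block matching described in the caption above can be sketched as a brute-force SSD search; `block_match` and its window parameters are illustrative assumptions, not the authors' motion-adaptive implementation.

```python
import numpy as np

def block_match(frames, t, y, x, psize=4, search=6, k=4):
    """Return the k patches most similar (smallest SSD) to the reference
    patch at (t, y, x), searching a spatial window in frame t and the
    same window in temporally adjacent frames."""
    T, H, W = frames.shape
    ref = frames[t, y:y + psize, x:x + psize]
    cands = []
    for tt in range(max(0, t - 1), min(T, t + 2)):          # adjacent frames
        for yy in range(max(0, y - search), min(H - psize, y + search) + 1):
            for xx in range(max(0, x - search), min(W - psize, x + search) + 1):
                p = frames[tt, yy:yy + psize, xx:xx + psize]
                cands.append((np.sum((p - ref) ** 2), tt, yy, xx))
    cands.sort(key=lambda c: c[0])                          # best matches first
    return cands[:k]
```

The returned group of similar patches is what the rank constraint is then imposed on.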
60. Depth Map super-resolution #4
Depth Map Restoration From
Undersampled Data
Srimanta Mandal ; Arnav Bhavsar ; Anil Kumar Sao
IEEE Transactions on Image Processing ( Volume: 26, Issue: 1, Jan. 2017 )
https://doi.org/10.1109/TIP.2016.2621410
The objective of the paper: (a) Uniform up-sampling of an LR depth map, i.e., filling in missing information in an HR grid generated from a uniformly sampled LR depth map, can be addressed by SR. (b) Non-uniform up-sampling of a sparse point cloud, i.e., filling in the missing information in a randomly filled HR grid, can be addressed by PCC. (c) An extreme case of non-uniform up-sampling, where very little data is available; we suggest interpreting this as non-uniform up-sampling followed by uniform up-sampling, which can be addressed by PCC-SR.
We have addressed the
problem of depth
restoration by up-
sampling either the
uniformly sampled LR
depth map or sparse
non-uniformly
sampled point cloud in
a unified sparse
representation
framework.
61. Depth MAP Joint Superresolution-Inpainting #1
Range map superresolution-inpainting, and
reconstruction from sparse data
Computer Vision and Image Understanding
Volume 116, Issue 4, April 2012, Pages 572-591
https://doi.org/10.1016/j.cviu.2011.12.005
Depth map inpainting and super-resolution based
on internal statistics of geometry and appearance
Satoshi Ikehata ; Ji-Ho Cho ; Kiyoharu Aizawa
Image Processing (ICIP), 2013 20th IEEE
https://doi.org/10.1109/ICIP.2013.6738194
In this paper, we have proposed depth-
map inpainting and super-resolution
algorithms which explicitly capture the
internal statistics of a depth-map and its
registered texture image and have
demonstrated their state-of-the-art
performance. The current limitation is that
we have assumed the accurate registration
of the texture image and have not
assumed the presence of sensor noise. In
future work, we will evaluate our method’s
robustness to these problems to assess its
handling of more practical situations.
Range image expansion and inpainting. (a and d) LR images
with missing data for the apple and birdhouse datasets. (b
and e) Interpolated images with missing data. (c and f) Range
expansion with inpainting using the proposed method. 3D
reconstructions with light-rendering and gray-scale
representation, respectively, for (g and h) apple and (i and j)
birdhouse.
Range expansion with inpainting across
different objects. (a) Interpolated range
observation. (b) Corresponding HR and
inpainted range output using the
proposed method. (c–e) Unlinked, Linked
and residual edge maps, respectively,
which are used to restrict the smoothness
across edges.
Effect of noise on edge-linking. (a)
Noisy observation. (e) Corresponding
HR and inpainted output (b–d)
Unlinked, linked and residual edges
when no noise is added in the
observation. (f–h) Unlinked, linked and
residual edges for the observation in
(a).
62. Depth MAP Joint Superresolution-Inpainting #2
Superpixel-based depth map enhancement
and hole filling for view interpolation
Proceedings Volume 10420, Ninth International Conference on Digital
Image Processing (ICDIP 2017); 104202O (2017)
http://dx.doi.org/10.1117/12.2281544
Depth enhancement with improved exemplar-
based inpainting and joint trilateral guided filtering
Liang Zhang ; Peiyi Shen ; Shu'e Zhang ; Juan Song ; Guangming Zhu
Image Processing (ICIP), 2016 IEEE
https://doi.org/10.1109/ICIP.2016.7533131
Superpixel-based initial depth
map refinement: (a) superpixel
segmentation of the color image,
(b) initial depth map
segmentation using the same
superpixel label as (a), (c) initial
depth map before refinement, (d)
enhanced depth map of (c).
Superpixel-based warped depth map
hole filling: (a) and (b) are superpixels with
hole regions, (c) and (d) are hole filling
results of (a) and (b), respectively.
In this paper, we propose an efficient superpixel-based depth
information processing method for view interpolation. First of
all, the color image is segmented into superpixels using SLIC
algorithm, and the associated initial depth map is segmented
with the same label. After that, the depth-missing pixels are
recovered by considering the color and depth superpixels
jointly. Furthermore, the holes caused by disocclusion in the
warped depth map can also be filled in superpixel domain.
Experimental results demonstrate that, with the incorporation of the proposed initial depth map enhancement and warped depth map hole-filling method, better view interpolation performance is achieved.
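The per-superpixel filling step can be sketched as follows; `fill_holes_by_segment` assumes segment labels are already available (e.g. from SLIC on the registered color image) and uses a simple per-segment median rather than the paper's joint color-depth recovery.

```python
import numpy as np

def fill_holes_by_segment(depth, labels):
    """Fill depth-missing pixels (NaN) with the median of the valid
    depths inside the same (super)pixel segment."""
    out = depth.copy()
    for lab in np.unique(labels):
        seg = labels == lab
        vals = depth[seg]
        valid = vals[~np.isnan(vals)]
        if valid.size:                        # leave fully-empty segments alone
            hole = seg & np.isnan(depth)
            out[hole] = np.median(valid)
    return out
```

Because superpixels adhere to object boundaries, the filled depths stay consistent with the surface the hole belongs to, unlike a plain spatial interpolation.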
64. Image restoration Loss functions & Quality metrics #1A
Loss Functions for Image Restoration With
Neural Networks
Hang Zhao ; Orazio Gallo ; Iuri Frosio ; Jan Kautz | NVIDIA, MIT Media Lab
IEEE Transactions on Computational Imaging ( Volume: 3, Issue: 1, March 2017 )
https://doi.org/10.1109/TCI.2016.2644865
The loss layer, despite being the effective driver of the network’s learning, has attracted little attention within the image processing research community: the choice of the cost function generally defaults to the squared l2 norm of the error [Jain et al. 2009; Burger et al. 2012; Dong et al. 2014; Wang 2014]. This is understandable, given the many desirable properties this norm possesses. There is also a less well-founded, but just as relevant, reason for the continued popularity of l2: standard neural network packages, such as Caffe, only offer an implementation of this metric.

However, l2 suffers from well-known limitations. For instance, when the task at hand involves image quality, l2 correlates poorly with image quality as perceived by a human observer [Zhang et al. 2012]. This is because of a number of assumptions implicitly made when using l2. First and foremost, the use of l2 assumes that the impact of noise is independent of the local characteristics of the image. On the contrary, the sensitivity of the Human Visual System (HVS) to noise depends on local luminance, contrast, and structure [Wang et al. 2004]. The l2 loss also works under the assumption of white Gaussian noise, which is not valid in general [e.g. Wang and Bovik 2009].
We focus on the use of neural networks for image restoration tasks, and we study the effect of different metrics for the network’s loss layer. We compare l2 against four error metrics on representative tasks: image super-resolution, JPEG artifact removal, and joint denoising plus demosaicking. First, we test whether a different local metric such as l1 can produce better results. We then evaluate the impact of perceptually motivated metrics. We use two state-of-the-art metrics for image quality: the structural similarity index (SSIM [Wang et al. 2004]) and the multiscale structural similarity index (MS-SSIM [Wang et al. 2003]). We choose these among the plethora of existing indexes because they are established measures, and because they are differentiable, a requirement for the backpropagation stage. As expected, on the use cases we consider, the perceptual metrics outperform l2. However, and perhaps surprisingly, this is also true for l1 (see Figure 1). Inspired by this observation, we propose a novel loss function and show its superior performance in terms of all the metrics we consider.
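The metrics compared above can be sketched in a few lines; `ssim_loss` here uses global image statistics, a single-window simplification of SSIM (real losses use local Gaussian windows), so it is an illustration rather than the paper's implementation.

```python
import numpy as np

def l2_loss(x, y):
    """Mean squared error, the conventional default."""
    return np.mean((x - y) ** 2)

def l1_loss(x, y):
    """Mean absolute error, a less outlier-sensitive local metric."""
    return np.mean(np.abs(x - y))

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM computed from global statistics (single-window
    simplification of Wang et al. 2004). All three losses are
    differentiable, so any of them could drive backpropagation."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))
    return 1.0 - ssim

rng = np.random.default_rng(0)
img = rng.random((16, 16))
noisy = img + 0.1 * rng.standard_normal((16, 16))
```

Identical images give zero loss under all three metrics, while SSIM weighs structural change rather than raw pixel error.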
65. Image restoration Loss functions & Quality metrics #1B
However, it is widely accepted that l2, and consequently the Peak Signal-to-Noise Ratio (PSNR), do not correlate well with human perception of image quality: l2 simply does not capture the intricate characteristics of the human visual system (HVS).

There exists a rich literature of error measures, both reference-based and non-reference-based, that attempt to address the limitations of the simple l2 error function. For our purposes, we focus on reference-based measures. A popular reference-based index is the structural similarity index (SSIM). SSIM evaluates images accounting for the fact that the HVS is sensitive to changes in local structure. Wang et al. 2003 extend SSIM, observing that the scale at which local structure should be analyzed is a function of factors such as image-to-observer distance. To account for these factors, they propose MS-SSIM, a multi-scale version of SSIM that weighs SSIM computed at different scales according to the sensitivity of the HVS. Experimental results have shown the superiority of SSIM-based indexes over l2. As a consequence, SSIM has been widely employed as a metric to evaluate image processing algorithms. Moreover, given that it can be used as a differentiable cost function, SSIM has also been used in iterative algorithms designed for image compression [Wang and Bovik 2009], image reconstruction [Brunet et al. 2010], denoising and super-resolution [Rehman et al. 2012], and even downscaling [Öztireli and Gross 2015]. To the best of our knowledge, however, SSIM-based indexes have never been adopted to train neural networks.
Recently, novel image quality indexes based on the properties of the HVS have shown improved performance when compared to SSIM and MS-SSIM. One of these is the Information Weighted SSIM (IW-SSIM), a modification of MS-SSIM that also includes a weighting scheme proportional to the local image information [Wang and Li 2011]. Another is the Visual Information Fidelity (VIF), which is based on the amount of shared information between the reference and distorted image [Sheikh and Bovik 2006]. The Gradient Magnitude Similarity Deviation (GMSD) is characterized by simplified math and performance similar to that of SSIM, but it requires computing the standard deviation over the whole image [Xue et al. 2014]. Finally, the Feature Similarity Index (FSIM) leverages the perceptual importance of phase congruency, and measures the dissimilarity between two images based on local phase congruency and gradient magnitude [Zhang et al. 2011]. FSIM has also been extended to FSIMc, which can be used with color images. Despite the fact that they offer improved accuracy in terms of image quality, the mathematical formulation of these indexes is generally more complex than SSIM and MS-SSIM, and possibly not differentiable, making their adoption in optimization procedures non-trivial.
66. Point cloud transformations
Numerical geometry of non-rigid shapes
Michael Bronstein
http://slideplayer.com/slide/4925779/
Left Intrinsic vs. Extrinsic properties of shapes. Top left: Original shape. Top Right: Reconstructed shape from geometry image with cut edges displayed in red. The middle and bottom
rows show the geometry image encoding the y coordinates and HKS, respectively of two spherical parameterizations (left and right). The two spherical parameterizations are
symmetrically rotated by 180 degrees along the Y-axis. The geometry images for Y-coordinate display an axial as well as intensity flip. Whereas, the geometry images for HKS only
display an axial flip. This is because HKS is an intrinsic shape signature (geodesics are preserved) whereas point coordinates on a shape surface are not. Center Intrinsic descriptors
(here the HKS) are invariant to shape articulations. Right Padding structure of geometry images: The geometry images for the 3 coordinates are replicated to produce a 3× 3 grid. The
center image in each grid corresponds to the original geometry image. Observe no discontinuities exist along the grid edges. Sinha et al. (2016)
Left Geometry images created by fixing the polar axis of a hand (top) and aeroplane (bottom), and rotating the
spherical parametrization by equal intervals along the axis. The cut is highlighted in red. Center Four rotated
geometry images for a different cut location highlighted in red. The plots to the right show padded geometry
images wherein the similarity across rotated geometry images are more evident and the five finger features
coherently visible Right Changing the viewing direction for a cut inverts the geometry image. The similarity in
geometry images for the two diametrically opposite cuts emerges when we pad the image in a 3×3 grid
Sinha et al.(2016)
Authalic vs Conformal parametrization: (Left to right) 2500
vertices of the hand mesh are color coded in the first two plots.
A 64× 64 geometry image is created by uniformly sampling a
parametrization, and then interpolating the nearby feature
values. Authalic geometry image encodes all tip features.
Conformal parametrization compresses high-curvature points into dense regions [Gu et al. 2003]. Hence, fingertips are all mapped to very small regions. The fourth plot shows that the
resolution of geometry image is insufficient to capture the tip
feature colors in conformal parametrization. This is validated by
reconstructing shape from geometry images encoding x, y, z
locations for both parameterizations in final two plots.
67. 2D super-resolution techniques for Geometry images
MemNet: A Persistent Memory
Network for Image Restoration
Ying Tai, Jian Yang, Xiaoming Liu, Chunyan Xu
(Submitted on 7 Aug 2017)
https://arxiv.org/abs/1708.02209
https://github.com/tyshiwo/MemNet
The same MemNet structure achieves the state-of-the-art performance in image denoising, super-resolution and
JPEG deblocking. Due to the strong learning ability, our MemNet can be trained to handle different levels of
corruption even using a single model.
CVAE-GAN: Fine-Grained Image
Generation through Asymmetric
Training
Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua
(Submitted on 29 Mar 2017)
https://arxiv.org/abs/1703.10155
https://github.com/tatsy/keras-generative
The proposed method can support a
wide variety of applications, including
image generation, attribute morphing,
image inpainting, and data
augmentation for training better face
recognition models
68. Surfaces segmentation and correspondence
Convolutional Neural Networks on
Surfaces via Seamless Toric Covers
Haggai Maron, Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym Ersin Yumer, Vladimir G.
Kim, Yaron Lipman | Weizmann Institute of Science, Adobe Research
ACM Transactions on Graphics (TOG)
Volume 36 Issue 4, July 2017 Article No. 71
http://dx.doi.org/10.1145/3072959.3073616
Parameterization produced by the geometry image method
of [Sinha et al. 2016]; the parameterization is not seamless
as the isolines break at the dashed image boundary (right);
although the parameterization preserves area it produces
large variability in shape.
Computing the flat-torus structure (middle) on a 4-cover of a sphere-type surface (left) defined by prescribing three points (colored disks). The right inset shows the flat torus resulting from a different triplet choice.
Visualization of “easy”
functions on the surface (top-
row) and their pushed version
on the flat-torus (bottom-row).
We show three examples of
functions we use as input to the
network: (a) average geodesic
distance (left), (b) the x
component of the surface
normal (middle), and (c) Wave
Kernel Signature [Aubry et al. 2011]. The blowup
shows the face area, illustrating
that the input functions
capture relevant information in
the shape.
Experiments show that our method is able to learn
and generalize semantic functions better than state
of the art geometric learning approaches in
segmentation tasks. Furthermore, it can use only
basic local data (Euclidean coordinates, curvature,
normals) to achieve high success rate, demonstrating
ability to learn high-level features from a low-level
signal. This is the key advantage of defining a local
translation invariant convolution operator. Finally, it is
easy to implement and is fully compatible with current
standard CNN implementations for images.
A limitation of our technique is that it assumes the input
shape is a mesh with a sphere-like topology. An interesting
direction for future work is extending our method to
meshes with arbitrary topologies. This problem is
especially interesting since in certain cases shapes from
the same semantic class may have different genus.
Another limitation is that currently aggregation is done as
a separate post-process step and not as a part of the
CNN optimization. An interesting future work in this regard
is to incorporate the aggregation in the learning stage and
produce end-to-end learning framework.
70. Point clouds Classification and segmentation
PointNet++: Deep Hierarchical Feature
Learning on Point Sets in a Metric Space
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas | Stanford University
(Submitted on 7 Jun 2017)
https://arxiv.org/abs/1706.02413
https://github.com/charlesq34/pointnet2 (TensorFlow)
Shapes in SHREC15 are 2D surfaces embedded in 3D space. Geodesic
distances along the surfaces naturally induce a metric space. We show through
experiments that adopting PointNet++ in this metric space is an effective way to
capture intrinsic structure of the underlying point set
We follow Rustamov et al. (2009) to obtain an
embedding metric that mimics geodesic
distance. Next we extract intrinsic point
features in this metric space including Wave
Kernel Signature (WKS) [Aubry et al. 2011],
Heat Kernel Signature (HKS) [Sun et al. 2009]
and multi-scale Gaussian curvature [Meyer et al. 2003].
We use these features as input and then
sample and group points according to the
underlying metric space. In this way, our
network learns to capture multi-scale intrinsic
structure that is not influenced by the specific
pose of a shape. Alternative design choices include using XYZ coordinates as point features, or using Euclidean space R³ as the underlying metric space. We show below that these are not optimal choices.
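The sample-and-group stages can be sketched in NumPy; `farthest_point_sample` and `ball_query` mirror PointNet++'s sampling and grouping layers, here with plain Euclidean distance standing in for the intrinsic embedding metric discussed above.

```python
import numpy as np

def farthest_point_sample(pts, m):
    """Pick m well-spread centroid indices (PointNet++'s sampling
    layer), starting from point 0 for determinism."""
    n = len(pts)
    idx = [0]
    dist = np.full(n, np.inf)
    for _ in range(m - 1):
        dist = np.minimum(dist, np.sum((pts - pts[idx[-1]]) ** 2, axis=1))
        idx.append(int(np.argmax(dist)))     # farthest from chosen set
    return np.array(idx)

def ball_query(pts, centroids, radius):
    """Group the indices of points within `radius` of each centroid
    (the grouping layer)."""
    groups = []
    for c in centroids:
        d = np.linalg.norm(pts - pts[c], axis=1)
        groups.append(np.where(d <= radius)[0])
    return groups
```

Swapping the Euclidean distance for a geodesic-mimicking embedding metric gives the pose-invariant grouping the paper advocates.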
Aubry et al. 2011
71. Point clouds Novel descriptors
Learning Compact Geometric Features
Marc Khoury, Qian-Yi Zhou, Vladlen Koltun
(Submitted on 15 Sep 2017)
https://arxiv.org/abs/1709.05056
We present an approach to learning features that represent the
local geometry around a point in an unstructured point cloud.
Such features play a central role in geometric registration, which
supports diverse applications in robotics and 3D vision.
The presented approach yields a family of features,
parameterized by dimension, that are both more compact and
more accurate than existing descriptors.
Background: The development of geometric descriptors for rigid alignment of unstructured point clouds dates back to the 1990s. Classic descriptors include Spin Images [Johnson and Hebert 1999] and 3D Shape Context [Frome et al. 2004]. More recent work introduced Point Feature Histograms (PFH) [Rusu et al. 2008], Fast Point Feature Histograms (FPFH) [Rusu et al. 2009], Signature of Histograms of Orientations (SHOT) [Salti et al. 2014], and Unique Shape Contexts (USC) [Tombari et al. 2010]. A comprehensive evaluation of existing local geometric descriptors is reported by Guo et al. 2016.
The learned descriptor is both more precise and
more compact than handcrafted features. Due to
its Euclidean structure, the learned descriptor
can be used as a drop-in replacement for
existing features in robotics, 3D vision, and
computer graphics applications. We expect
future work to further improve precision,
compactness, and robustness, possibly using
new approaches to optimizing feature
embeddings [Ustinova and Lempitsky 2016,
https://github.com/madkn/HistogramLoss,
https://youtu.be/_N1qYrv321E].
72. Dense Grid Point clouds generative model
Learning Efficient Point Cloud Generation
for Dense 3D Object Reconstruction
Chen-Hsuan Lin, Chen Kong, Simon Lucey
(Submitted on 21 Jun 2017)
https://arxiv.org/abs/1706.07036
We use 2D convolutional operations to predict the 3D structure from multiple
viewpoints and jointly apply geometric reasoning with 2D projection optimization. We
introduce the pseudo-renderer, a differentiable module to approximate the true
rendering operation, to synthesize novel depth maps for optimization. Experimental
results for single-image 3D object reconstruction tasks show that our method outperforms state-of-the-art methods in terms of shape similarity and prediction density.
Network architecture. From an encoded latent representation, we propose to use a
structure generator, which is based on 2D convolutional operations, to predict the 3D
structure at N viewpoints. The point clouds are fused by transforming the 3D structure at
each viewpoint to the canonical coordinates. The pseudo-renderer synthesizes depth
images from novel viewpoints, which are further used for joint 2D projection optimization.
This contains no learnable parameters and reasons based purely on 3D geometry
Concept of pseudo-rendering. Multiple transformed 3D points may project onto the same pixel in the image space. (a) Collisions easily occur if the projections are directly discretized. (b) Upsampling the target image increases the precision of the projection locations and thus alleviates the collision effect. A max-pooling operation on the inverse depth values follows, to recover the original resolution while maintaining the effective depth value at each pixel. (c) Examples of pseudo-rendered depth images with various upsampling factors U (only valid depth values without collision are shown). Pseudo-rendering approaches true rendering more closely with a higher value of U.
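The upsample-then-max-pool collision handling can be sketched as follows; `pseudo_render` is a non-differentiable NumPy approximation of the module described above, with normalized image coordinates assumed for the projected points.

```python
import numpy as np

def pseudo_render(uv, depth, hw, U=4):
    """Splat projected points onto a U-times upsampled grid, resolving
    collisions by keeping the largest inverse depth (nearest point),
    then max-pool inverse depth back to the target resolution."""
    H, W = hw
    inv = np.zeros((H * U, W * U))
    u = np.clip((uv[:, 0] * W * U).astype(int), 0, W * U - 1)
    v = np.clip((uv[:, 1] * H * U).astype(int), 0, H * U - 1)
    np.maximum.at(inv, (v, u), 1.0 / depth)      # nearest point wins a cell
    pooled = inv.reshape(H, U, W, U).max(axis=(1, 3))
    out = np.full((H, W), np.inf)                # inf marks empty pixels
    np.divide(1.0, pooled, out=out, where=pooled > 0)
    return out

# two points landing in the same low-res pixel keep the nearer depth
uv = np.array([[0.2, 0.2], [0.7, 0.7]])
d = np.array([2.0, 5.0])
dm = pseudo_render(uv, d, (1, 1), U=4)
```

Taking the maximum of inverse depth is what lets a plain pooling op emulate z-buffering without any learnable parameters.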
73. Point clouds GAN #1A
Representation Learning and Adversarial Generation of 3D Point Clouds
Panos Achlioptas, Olga Diamanti, Ioannis Mitliagkas, Leonidas Guibas (same last author as PointNet++)
(Submitted on 8 Jul 2017)
https://arxiv.org/abs/1707.02392
Editing parts in point clouds using vector arithmetic on the autoencoder (AE) latent space. Left to right: tuning the appearance of cars towards the shape of convertibles, adding armrests to chairs, removing the handle from a mug.
We build an end-to-end pipeline for 3D point
clouds that uses an AE to create a latent
representation, and a GAN to generate new
samples in that latent space. Our AE is
designed with a structural loss tailored to
unordered point clouds. Our learned latent
space, while compact, has excellent class-
discriminative ability: per our classification
results, it outperforms recent GAN-based
representations by 4.3%. In addition, the latent
space allows for vector arithmetic, which we
apply in a number of shape editing scenarios,
such as interpolation and structural
manipulation
We argue that jointly learning the
representation and training the
GAN is unnecessary for our
modality. We propose a
workflow that first learns a
representation by training an AE
with a compact bottleneck layer,
then trains a plain GAN in that
fixed latent representation. One
benefit of this approach is that
AEs are a mature technology:
training them is much easier and
they are compatible with more
architectures than GANs.
We point to theory [Arjovsky and Bottou 2017] that supports this idea, and verify it empirically:
we show that GANs trained in our learned
AE-based latent space generate visibly
improved results, even with a generator and
discriminator as shallow as a single hidden
layer. Within a handful of epochs, we
generate geometries that are recognized in
their right object class at a rate close to that
of ground truth data. Importantly, we report
significantly better diversity measures (10x
divergence reduction) over the state of the
art, establishing that we cover more of the
original data distribution. In summary, we
contribute
● An effective cross-category AE-
based latent representation on
point clouds.
● The first (monolithic) GAN
architecture operating on 3D point
clouds.
● A surprisingly simple, state-of-the-art GAN working in the AE’s latent space.
74. Point clouds GAN #1B
Raw point cloud GAN (r-
GAN). The first version of
our generative model
operates directly on the
raw 2048 × 3 point set
input
512-dimensional
noise vector
Finally, training a GAN in the latent space is much
faster and much more stable. The inset provides
some intuition with a toy example, where the data
live in a 1D circular manifold. The density in red
is the result of training a GAN’s generator in the
original, 2D, data space. The most commonly
used GAN objectives are equivalent to
minimizing the Jensen-Shannon divergence
(JSD) between the generator and data
distributions. Unfortunately, the JSD is part of a
family of divergences that become unbounded
when there is support mismatch, which is the
case in the example: the GAN places a lot of
mass outside the data manifold. On the other
hand, when training a small GAN in the fixed
latent space of a trained AE (blue), the overlap
of the two distributions increases significantly.
According to recent theoretical advances [Arjovsky and Bottou 2017], this should improve stability.
Latent-space GAN (l-GAN). In our latent-space GAN, instead of operating on the raw point cloud input, we pass the data through our pre-trained autoencoder,
trained separately for each object class with the earth mover's distance (EMD) loss function. Both the generator and the discriminator of the GAN then operate on the 512-
dimensional bottleneck variable of the AE. Finally, once the GAN training is over, the output of the generator is decoded to a point cloud via the AE decoder. The architecture for
the l-GAN is significantly simpler than the one of the r-GAN. We found that very shallow designs for both the generator and discriminator (in our case, 1 hidden layer for the
generator and 2 for the discriminator) are sufficient to produce realistic results.
An interesting avenue for future work involves further
exploring the idea of ingesting point clouds by sorting them
lexicographically before applying a 1D convolution. A
possibly interesting extension would be to study different
1D orderings that capture locality differently, e.g. Hilbert
curves (also known as a Hilbert space-filling curve). We can
also aim for convolution operators of higher order (2D and
3D).
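The lexicographic-ordering idea can be sketched directly; `lex_sort_points` and the per-channel 1D convolution below are illustrative of this suggested future direction, not an implemented method from the paper.

```python
import numpy as np

def lex_sort_points(pts):
    """Order an unordered point cloud lexicographically by (x, y, z),
    turning it into a 1D sequence a Conv1D could ingest. A Hilbert
    space-filling-curve order would preserve locality better."""
    order = np.lexsort((pts[:, 2], pts[:, 1], pts[:, 0]))  # last key is primary
    return pts[order]

def conv1d_per_channel(seq, kernel):
    """'valid' 1D convolution applied independently to x, y, z."""
    return np.stack([np.convolve(seq[:, c], kernel, mode='valid')
                     for c in range(seq.shape[1])], axis=1)

pts = np.array([[2., 0., 0.], [0., 1., 0.], [1., 0., 1.], [0., 0., 5.]])
seq = lex_sort_points(pts)                    # deterministic 1D ordering
feat = conv1d_per_channel(seq, np.array([0.25, 0.5, 0.25]))
```

Sorting makes the network's input invariant to the original point ordering, which is the property the 1D-convolution idea relies on.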