21. References
[Belhumeur1997] Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997).
Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711–720.
[Cao2012] Cao, X., Wei, Y., Wen, F., & Sun, J. (2012). Face Alignment by Explicit
Shape Regression. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
[Taigman2014] Taigman, Y., Ranzato, M. A., & Wolf, L. (2014). DeepFace: Closing
the Gap to Human-Level Performance in Face Verification. In IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
[Toshev2014] Toshev, A., & Szegedy, C. (2014). DeepPose: Human pose
estimation via deep neural networks. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR).
[Turk1991] Turk, M., & Pentland, A. (1991). Eigenfaces for Recognition. Journal of
Cognitive Neuroscience, 3(1), 71–86.
[Wiskott1997] Wiskott, L., Fellous, J.-M., Krüger, N., & Malsburg, C. von der.
(1997). Face recognition by elastic bunch graph matching. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 19(7), 775–779.
35. References
[Berg2014] Berg, T., Liu, J., Lee, S. W., Alexander, M. L., Jacobs, D.
W., & Belhumeur, P. N. (2014). Birdsnap: Large-scale Fine-grained
Visual Categorization of Birds. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
[Cheng2014] Cheng, M.-M., Zhang, Z., Lin, W.-Y., & Torr, P. (2014).
BING: Binarized Normed Gradients for Objectness Estimation at
300fps. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
[Kumar2012] Kumar, N., Belhumeur, P. N., Biswas, A., Jacobs, D.
W., Kress, W. J., Lopez, I., & Soares, J. V. B. (2012). Leafsnap: A
Computer Vision System for Automatic Plant Species
Identification. In European Conference on Computer Vision.
[LeCun1998] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998).
Gradient-based learning applied to document recognition.
Proceedings of the IEEE, 86(11), 2278–2324.
36. References
[Wang2012] Wang, P., Wang, J., Zeng, G., Feng, J., Zha, H., &
Li, S. (2012). Salient object detection for searched web
images via global saliency. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
[木村2012] Kimura, A., Yonetani, R., & Hirayama, T. (2012). [Survey
paper] Computational models of human visual attention. IEICE
Technical Report.
40. Reconstructing 3D Models from Image Collections
Representative projects (the links include demo videos and more)
Photo Tourism [Snavely2006]
http://phototour.cs.washington.edu/
Building Rome in a Day [Agarwal2009]
http://grail.cs.washington.edu/rome/
Building Rome on a Cloudless Day [Frahm2010]
https://www.youtube.com/watch?v=4cEQZreQ2zQ
48. References
[Agarwal2009] Agarwal, S., Snavely, N., Simon, I., Seitz, S. M., &
Szeliski, R. (2009). Building Rome in a day. In International
Conference on Computer Vision (pp. 72–79).
[Blanz1999] Blanz, V., & Vetter, T. (1999). A morphable model for
the synthesis of 3D faces. In Conference on Computer Graphics
and Interactive Techniques (SIGGRAPH) (pp. 187–194).
[Frahm2010] Frahm, J., Fite-Georgel, P., Gallup, D., Johnson, T.,
Raguram, R., Wu, C., … Pollefeys, M. (2010). Building Rome on a
Cloudless Day. In European Conference on Computer Vision (pp.
368–381).
[Hoiem2005] Hoiem, D., & Efros, A. A. (2005). Automatic photo
pop-up. In Conference on Computer Graphics and Interactive
Techniques (SIGGRAPH).
[Narasimhan2008] Narasimhan, S. G., Koppal, S. J., & Yamazaki, S.
(2008). Temporal Dithering of Illumination. In European Conference
on Computer Vision (pp. 830–844).
49. References
[Pan2009] Pan, Q., Reitmayr, G., & Drummond, T. (2009).
ProFORMA: Probabilistic Feature-based On-line Rapid Model
Acquisition. Proceedings of the British Machine Vision Conference
2009, 112.1–112.11.
[Saxena2008] Saxena, A., Sun, M., & Ng, A. Y. (2008). Make3D:
Depth Perception from a Single Still Image. In AAAI National
Conference on Artificial Intelligence (pp. 1571–1576).
[Seitz1996] Seitz, S. M., & Dyer, C. R. (1996). View morphing.
Conference on Computer Graphics and Interactive Techniques
(SIGGRAPH).
[Snavely2006] Snavely, N., Seitz, S. M., & Szeliski, R. (2006). Photo
tourism: exploring photo collections in 3D. In Conference on
Computer Graphics and Interactive Techniques (SIGGRAPH).
[松下2011] Matsushita, Y. (2011). Photometric stereo. IPSJ SIG
Technical Report, Vol. 2011-CVIM-177, No. 29.
61. References
[Choi2015] Choi, W. (2015). Near-Online Multi-Target Tracking
With Aggregated Local Flow Descriptor. Proceedings of the IEEE
International Conference on Computer Vision, 3029–3037.
[Grundmann2011] Grundmann, M., Kwatra, V., & Essa, I. (2011).
Auto-directed video stabilization with robust L1 optimal camera
paths. Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, 225–232.
[Hamid2010] Hamid, R., Kumar, R., Hodgins, J., & Essa, I. (2010). A
Computational Framework for Sports Visualization using Multiple
Static Cameras. In IEEE Conference on Computer Vision and
Pattern Recognition (pp. 1–14).
[Hasegawa2015] Hasegawa, K. (2015). Stroboscopic Image
Synthesis of Sports Player from Hand-Held Camera Sequence. In
International Conference on Computer Vision Workshop.
[Kalal2010] Kalal, Z., Matas, J., & Mikolajczyk, K. (2010). P-N
Learning: Bootstrapping Binary Classifiers by Structural Constraints.
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
62. References
[Lu2011] Lu, W., Ting, J., Little, J. J., & Murphy, K. P. (2011).
Learning to Track and Identify Players from Broadcast
Sports Videos.
[Soomro2012] Soomro, K., Zamir, A. R., & Shah, M. (2012).
UCF101: A Dataset of 101 Human Actions Classes From
Videos in The Wild. arXiv Preprint arXiv:1212.0402.
[Wang2013] Wang, H., Kläser, A., Schmid, C., & Liu, C. L.
(2013). Dense trajectories and motion boundary descriptors
for action recognition. International Journal of Computer
Vision, 103(1), 60–79.
[Zhao2014] Zhao, B., & Xing, E. P. (2014). Quasi Real-Time
Summarization for Consumer Videos. In IEEE Conference on
Computer Vision and Pattern Recognition.
77. References
[Tomasi1998] Tomasi, C., & Manduchi, R. (1998). Bilateral filtering for gray and
color images. In International Conference on Computer Vision (ICCV).
[Buades2005] Buades, A., Coll, B., & Morel, J.-M. (2005). A non-local algorithm for
image denoising. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
[Dabov2007] Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image
denoising by sparse 3D transform-domain collaborative filtering. IEEE
Transactions on Image Processing, 16(8), 2080–2095.
[Freeman2002] Freeman, W. T., Jones, T. R., & Pasztor, E. C. (2002). Example-based
super-resolution. IEEE Computer Graphics and Applications, 22(2), 56–65.
[Farsiu2003] Farsiu, S., Robinson, D., Elad, M., & Milanfar, P. (2003). Fast and
robust super-resolution. In IEEE International Conference on Image Processing.
[Mitzel2009] Mitzel, D., Pock, T., Schoenemann, T., & Cremers, D. (2009). Video
Super Resolution using Duality Based TV-L1 Optical Flow. In DAGM Symposium
on Pattern Recognition (pp. 432–441).
[Yang2008] Yang, J., Wright, J., Ma, Y., & Huang, T. (2008). Image super-resolution
as sparse representation of raw image patches. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
78. References
[Avidan2007] Avidan, S., & Shamir, A. (2007). Seam carving for
content-aware image resizing. In Conference on Computer
Graphics and Interactive Techniques (SIGGRAPH).
[Agarwala2004] Agarwala, A., Dontcheva, M., Agrawala, M., Drucker,
S., Colburn, A., Curless, B., … Cohen, M. (2004). Interactive digital
photomontage. In Conference on Computer Graphics and
Interactive Techniques (SIGGRAPH) (Vol. 23).
[Barnes2009] Barnes, C., Shechtman, E., Finkelstein, A., & Goldman,
D. B. (2009). PatchMatch: A randomized correspondence algorithm
for structural image editing. In Conference on Computer Graphics
and Interactive Techniques (SIGGRAPH).
[Bertalmio2000] Bertalmio, M., Sapiro, G., Caselles, V., &
Ballester, C. (2000). Image inpainting. In Conference on Computer
Graphics and Interactive Techniques (SIGGRAPH) (pp. 417–424).
79. References
[Brown2003] Brown, M., & Lowe, D. G. (2003). Recognising
Panoramas. In International Conference on Computer Vision
(ICCV).
[Chen2009] Chen, T., Cheng, M.-M., Tan, P., Shamir, A., & Hu,
S.-M. (2009). Sketch2Photo: internet image montage. In
Conference on Computer Graphics and Interactive
Techniques (SIGGRAPH).
[Criminisi2004] Criminisi, A., Pérez, P., & Toyama, K. (2004).
Region filling and object removal by exemplar-based image
inpainting. IEEE Transactions on Image Processing, 13(9),
1200–1212.
[Hays2007] Hays, J., & Efros, A. A. (2007). Scene completion
using millions of photographs. Conference on Computer
Graphics and Interactive Techniques (SIGGRAPH).
80. References
[Pérez2003] Pérez, P., Gangnet, M., & Blake, A. (2003).
Poisson image editing. In Conference on Computer Graphics
and Interactive Techniques (SIGGRAPH).
[Rother2004] Rother, C., Kolmogorov, V., & Blake, A. (2004).
GrabCut: Interactive foreground extraction using iterated
graph cuts. In Conference on Computer Graphics and
Interactive Techniques (SIGGRAPH).
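89. How Specific Object Recognition Works
Representative approach
Local features such as SIFT + approximate nearest-neighbor search [Lowe1999]
For large-scale databases, Bag-of-Features is used [Sivic2003]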
[Figure: Histogram of Gradient Orientations; DB; matching + voting]
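The matching and voting step on this slide can be sketched roughly as follows. This is a minimal illustration, not the method of [Lowe1999] as published: the function name `vote_for_object`, the toy 2-D descriptors, and the `ratio` threshold are all assumptions made for this example, and a brute-force search is swapped in for the approximate nearest-neighbor search mentioned above.

```python
from collections import Counter
from math import dist


def vote_for_object(query_descriptors, database, ratio=0.8):
    """Match each query descriptor to the database and vote per object.

    database is a list of (object_id, descriptor) pairs with at least two entries.
    """
    votes = Counter()
    for q in query_descriptors:
        # Brute-force ranking stands in for approximate nearest-neighbor search.
        ranked = sorted(database, key=lambda entry: dist(q, entry[1]))
        best, second = ranked[0], ranked[1]
        # Ratio test: keep the match only if it is clearly closer than the runner-up.
        if dist(q, best[1]) < ratio * dist(q, second[1]):
            votes[best[0]] += 1
    return votes


# Toy 2-D "descriptors" (real SIFT descriptors are 128-dimensional).
database = [
    ("book", (0.0, 0.0)), ("book", (10.0, 0.0)),
    ("mug", (0.0, 10.0)), ("mug", (10.0, 10.0)),
]
query = [(0.5, 0.2), (9.6, 0.3), (0.1, 9.0)]
votes = vote_for_object(query, database)
recognized = votes.most_common(1)[0][0]
```

The object with the most votes is taken as the recognition result. For a large database, the sorted scan would be replaced by an approximate index or a Bag-of-Features representation, as the slide notes.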
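94. Features for Object Detection (Pedestrian Detection, etc.)
Deformable Part Model [Felzenszwalb2009]
Combining multiple HOG features improves detection accuracy
The model, including the displacement of the positions where HOG features are
extracted, is trained with a machine learning algorithm called Latent SVM
[Figure: Root filter, Parts filter, Deformation. Credit: [Felzenszwalb2009]]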
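How the root filter, part filters, and deformation combine into one detection score can be sketched as below. This is a simplified illustration under stated assumptions: the function name `dpm_score`, the precomputed response maps, and the purely quadratic deformation penalty are choices made for this example, and Latent SVM training is not shown.

```python
def dpm_score(root_score, part_responses, anchors, deform_weights):
    """Score one root placement: root score plus the best placement of each part.

    part_responses: one 2-D list per part, holding filter responses at each (y, x).
    anchors: ideal (x, y) position of each part relative to the root.
    deform_weights: (wx, wy) quadratic penalty weights per part (assumed form).
    """
    total = root_score
    for response, (ax, ay), (wx, wy) in zip(part_responses, anchors, deform_weights):
        # Each part moves to where its response minus its deformation cost is highest.
        best = max(
            response[y][x] - wx * (x - ax) ** 2 - wy * (y - ay) ** 2
            for y in range(len(response))
            for x in range(len(response[0]))
        )
        total += best
    return total


# One part whose strongest response sits one pixel to the right of its anchor.
part_response = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0],
]
score = dpm_score(1.0, [part_response], [(1, 1)], [(1.0, 1.0)])
```

In this toy case the part accepts a deformation cost of 1 to reach the response of 5, so it contributes 4 to the root score of 1.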
95. References
[Csurka2004] Csurka, G., Dance, C. R., Fan, L., Willamowski,
J., & Bray, C. (2004). Visual categorization with bags of
keypoints. In Workshop on statistical learning in computer
vision, ECCV (Vol. 1, p. 22).
[Dalal2005] Dalal, N., & Triggs, B. (2005). Histograms of
Oriented Gradients for Human Detection. IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
[Felzenszwalb2009] Felzenszwalb, P. F., Girshick, R. B.,
McAllester, D., & Ramanan, D. (2009). Object detection with
discriminatively trained part-based models. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
32(9), 1627–1645.
[Lowe1999] Lowe, D. G. (1999). Object recognition from local
scale-invariant features. In IEEE International Conference on
Computer Vision (pp. 1150–1157 vol.2).
96. References
[Sivic2003] Sivic, J., & Zisserman, A. (2003). Video Google: a
text retrieval approach to object matching in videos. In IEEE
International Conference on Computer Vision (ICCV).
[Viola2001] Viola, P., & Jones, M. (2001). Rapid object
detection using a boosted cascade of simple features. IEEE
Conference on Computer Vision and Pattern
Recognition (CVPR).
128. References
[Deng2009] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-
Fei, L. (2009). ImageNet: A large-scale hierarchical image database.
2009 IEEE Conference on Computer Vision and Pattern
Recognition, 2–9.
[Dong2014] Dong, C., Loy, C. C., & He, K. (2014). Image Super-Resolution
Using Deep Convolutional Networks. European
Conference on Computer Vision.
[Girshick2014] Girshick, R., Donahue, J., Darrell, T., & Malik, J.
(2014). Rich feature hierarchies for accurate object detection and
semantic segmentation. In IEEE Conference on Computer Vision
and Pattern Recognition.
[Girshick2015] Girshick, R. (2015). Fast R-CNN. International
Conference on Computer Vision, 1440–1448.
[He2015] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep
Residual Learning for Image Recognition. arXiv Preprint
arXiv:1512.03385.
129. References
[Iizuka2016] Iizuka, S., Simo-Serra, E., & Ishikawa, H. (2016). Let there be
Color!: Joint End-to-end Learning of Global and Local Image Priors for
Automatic Image Colorization with Simultaneous Classification. In ACM
Transactions on Graphics (SIGGRAPH).
[Krizhevsky2012] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012).
ImageNet Classification with Deep Convolutional Neural Networks. In
Advances in Neural Information Processing Systems (NIPS) (pp. 1106–
1114).
[Long2014] Long, J., Shelhamer, E., & Darrell, T. (2014). Fully
Convolutional Networks for Semantic Segmentation. 2015 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 3431–
3440.
[Radford2015] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised
Representation Learning with Deep Convolutional Generative Adversarial
Networks. arXiv, 1–15.
[Ren2015] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks.
Advances in Neural Information Processing Systems (NIPS).
130. References
[Simonyan2014] Simonyan, K., & Zisserman, A. (2014). Very Deep
Convolutional Networks for Large-Scale Image Recognition.
arXiv Preprint arXiv:1409.1556.
[Simo-Serre2016] Simo-Serra, E., Iizuka, S., Sasaki, K., & Ishikawa, H.
(2016). Learning to Simplify: Fully Convolutional Networks for Rough
Sketch Cleanup. In ACM Transactions on Graphics (SIGGRAPH).
[Szegedy2014] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., … Rabinovich, A. (2014). Going Deeper with Convolutions.
arXiv Preprint arXiv:1409.4842, 1–12.
[Taigman2014] Taigman, Y., Ranzato, M. A., & Wolf, L. (2014). DeepFace:
Closing the Gap to Human-Level Performance in Face Verification. In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[Uijlings2013] Uijlings, J. R. R., Van De Sande, K. E. A., Gevers, T., &
Smeulders, A. W. M. (2013). Selective search for object recognition.
International Journal of Computer Vision, 104(2), 154–171.
[Vinyals2015] Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015).
Show and Tell: A Neural Image Caption Generator. In IEEE Conference
on Computer Vision and Pattern Recognition.
167. References
[Engel2014] Engel, J., Schöps, T., & Cremers, D. (2014).
LSD-SLAM: Large-Scale Direct Monocular SLAM. In European
Conference on Computer Vision (pp. 834–849).
[Klein2007] Klein, G., & Murray, D. (2007). Parallel tracking and
mapping for small AR workspaces. 2007 6th IEEE and ACM
International Symposium on Mixed and Augmented Reality, ISMAR.
[Newcombe2011a] Newcombe, R. A., Lovegrove, S. J., & Davison,
A. J. (2011). DTAM: Dense Tracking and Mapping in Real-Time. In
International Conference on Computer Vision (pp. 2320–2327).
[Newcombe2011b] Newcombe, R. A., Davison, A. J., Izadi, S., Kohli,
P., Hilliges, O., Shotton, J., … Fitzgibbon, A. (2011). KinectFusion:
Real-time dense surface mapping and tracking. 2011 10th IEEE
International Symposium on Mixed and Augmented Reality, 127–
136.
168. References
[Newcombe2015] Newcombe, R. A., Fox, D., & Seitz, S. M. (2015).
DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes
in Real-Time. Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, 343–352.
[Shotton2011] Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T.,
Finocchio, M., Moore, R., … Blake, A. (2011). Real-time human
pose recognition in parts from single depth images. In IEEE
Conference on Computer Vision and Pattern Recognition.
178. References
[Banz2010] Banz, C., Hesselbarth, S., Flatt, H., Blume, H., & Pirsch,
P. (2010). Real-time stereo vision system using semi-global
matching disparity estimation: Architecture and FPGA-
implementation. Proceedings - 2010 International Conference on
Embedded Computer Systems: Architectures, Modeling and
Simulation, IC-SAMOS 2010, 93–101.
[Huval2015] Huval, B., Wang, T., Tandon, S., Kiske, J., Song, W.,
Pazhayampallil, J., … Ng, A. Y. (2015). An Empirical Evaluation of
Deep Learning on Highway Driving. arXiv Preprint arXiv:1504.01716.
[Kammel2008] Kammel, S., & Pitzer, B. (2008). Lidar-based lane
marker detection and mapping. IEEE Intelligent Vehicles
Symposium, 1137–1142.
[Scharwaechter2014] Scharwaechter, T., Enzweiler, M., Franke, U.,
& Roth, S. (2014). Stixmantics: A Medium-Level Model for Real-
Time Semantic Scene Understanding. European Conference on
Computer Vision, 8693, 533–548.
179. References
[Sermanet2011] Sermanet, P., & LeCun, Y. (2011). Traffic Sign
Recognition with Multi-Scale Convolutional Networks. International Joint
Conference on Neural Networks (IJCNN), 2809–2813.
[Teichman2011] Teichman, A., Levinson, J., & Thrun, S. (2011). Towards
3D object recognition via classification of arbitrary object tracks.
Proceedings - IEEE International Conference on Robotics and
Automation, 4034–4041.
[Time2008] Aly, M. (2008). Real Time Lane
Detection in Urban Streets. In IEEE Intelligent Vehicles Symposium (pp.
7–12).
[Wang2011] Wang, C., Jin, T., Yang, M., & Wang, B. (2011). Robust and
Real-Time Traffic Lights Recognition in Complex Urban Environments.
International Journal of Computational Intelligence Systems, 4(6), 1383.
[Ziegler2014] Ziegler, J., Lategahn, H., Schreiber, M., Keller, C. G.,
Knöppel, C., Hipp, J., … Stiller, C. (2014). Video Based Localization for
BERTHA. IEEE Intelligent Vehicles Symposium (IV), 1231–1238.
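194. Web API
Google Cloud Vision API
Generic object recognition, face detection, facial expression recognition, logos,
landmarks, harmful content, text recognition
https://cloud.google.com/vision/
Microsoft Cognitive Services
Face detection, facial expression recognition, age/gender recognition, face
verification, generic object recognition, adult image classification, motion
detection, face tracking, video thumbnail generation
https://www.microsoft.com/cognitive-services/
IBM Watson Visual Recognition
Face detection, age/gender recognition, celebrity identification, generic object
recognition
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-recognition.html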
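195. Web API
PUX Developers Site
Face detection, face recognition (verification), object recognition (specific
object recognition), online handwriting recognition
http://pux.co.jp/api_sdk/
Zeta Bridge, Photonavi
Face detection, facial landmark detection, facial attribute estimation (age,
gender, smile), exact-match search (specific object recognition)
http://biz.photonavi.jp/
Face++
Face detection, face verification, facial landmark detection, facial attribute
estimation (age, gender, race, smile)
http://www.faceplusplus.com/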