Sogang University Machine Learning and Data Mining Lab seminar: Neural Networks for Newbies, and Convolutional Neural Networks. This is prerequisite material for understanding deep convolutional architectures.
The Scientist and Engineer's Guide to Digital Signal Processing, Ch. 26, Steven W. Smith, ISBN 0-7506-7444-X
• Combining perceptrons
• Feed-forward: information flow
• Passive node: without a weighted-sum input
• Active node: with a weighted-sum structure (a minimal forward-pass sketch follows)
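The passive/active distinction maps directly onto code: input nodes just pass values along, while each active node computes a weighted sum of its inputs plus a bias and applies a nonlinearity. A minimal NumPy sketch (layer sizes, function names, and the sigmoid choice are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # Logistic activation applied at each active node.
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights, biases):
    """One forward pass through a fully connected network.

    Passive (input) nodes pass x through unchanged; each active node
    computes a weighted sum of its inputs plus a bias, then applies
    the activation function.
    """
    a = x  # passive input layer: no weighted sum
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # active nodes: weighted sum + nonlinearity
    return a

# Tiny example: 3 inputs -> 4 hidden -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(feed_forward(np.array([0.5, -1.0, 2.0]), weights, biases))
```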
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng
Computer Science Department, Stanford University, Stanford, CA 94305, USA
ICML 2009
There has been much interest in unsupervised learning of hierarchical generative models such
as deep belief networks. Scaling such models to full-sized, high-dimensional images remains
a difficult problem. To address this problem, we present the convolutional deep belief
network, a hierarchical generative model which scales to realistic image sizes.
This model is translation-invariant and supports efficient bottom-up and top-down
probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique
which shrinks the representations of higher layers in a probabilistically sound way. Our
experiments show that the algorithm learns useful high-level visual features, such as object
parts, from unlabeled images of objects and natural scenes. We demonstrate excellent
performance on several visual recognition tasks and show that our model can perform
hierarchical (bottom-up and top-down) inference over full-sized images.
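The abstract's key device, probabilistic max-pooling, constrains each pooling block so that at most one detection unit turns on and the pooling unit is on exactly when one of them is. A rough sketch of the sampling step, reconstructed from the paper's description and assuming square blocks that tile the layer evenly (the function name and block handling are mine):

```python
import numpy as np

def prob_max_pool(z, region=2, rng=None):
    """Probabilistic max-pooling over non-overlapping blocks.

    Within each region x region block, at most one detection unit
    turns on, with P(h_k = 1) = exp(z_k) / (1 + sum_j exp(z_j));
    the leftover mass is the 'all off' state, and the pooling unit
    is on iff some detection unit in its block fires.
    """
    rng = rng or np.random.default_rng(0)
    H, W = z.shape                      # assumes H, W divisible by region
    hidden = np.zeros_like(z, dtype=int)
    pool = np.zeros((H // region, W // region), dtype=int)
    for i in range(0, H, region):
        for j in range(0, W, region):
            block = z[i:i+region, j:j+region].ravel()
            m = block.max()
            e = np.exp(block - m)       # shifted for numerical stability
            off = np.exp(-m)            # the 'all off' option, shifted too
            p = np.append(e, off) / (e.sum() + off)
            k = rng.choice(p.size, p=p)
            if k < block.size:          # one detection unit fires
                hidden[i + k // region, j + k % region] = 1
                pool[i // region, j // region] = 1
    return hidden, pool

h, p = prob_max_pool(np.random.default_rng(1).standard_normal((4, 4)))
print(h, p, sep="\n")
```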
– "First, we introduce the convolutional RBM (CRBM). Intuitively, the CRBM is similar to the RBM, but the weights between the hidden and visible layers are shared among all locations in an image." (Weight sharing is sketched below.)
– Probabilistic max-pooling
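Weight sharing "among all locations" is just convolution: one small filter is slid across the whole image instead of learning a separate weight matrix per location. A minimal illustration with a naive loop (not the authors' implementation; conv2d_valid is a hypothetical helper):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D filtering with one shared kernel: the same weights
    are applied at every image location, instead of a separate weight
    matrix per location as in a standard RBM. (This is cross-correlation,
    the usual CNN convention; a true convolution would flip the kernel.)"""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.default_rng(1).standard_normal((8, 8))
kernel = np.full((3, 3), 1.0 / 9.0)           # one shared 3x3 filter
print(conv2d_valid(image, kernel).shape)      # (6, 6): one hidden group
```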
“In general, higher-level feature detectors need information from progressively larger input
regions. Existing translation-invariant representations, such as convolutional networks, often
involve two kinds of layers in alternation: “detection” layers, whose responses are computed
by convolving a feature detector with the previous layer, and “pooling” layers, which shrink
the representation of the detection layers by a constant factor. More specifically, each unit in
a pooling layer computes the maximum activation of the units in a small region of the
detection layer. Shrinking the representation with max-pooling allows higher-layer
representations to be invariant to small translations of the input and reduces the
computational burden.”
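Plain (non-probabilistic) max-pooling as the quote describes it: each pooling unit takes the maximum over a small block of the detection layer, shrinking it by a constant factor. A short sketch assuming non-overlapping square regions (the helper name is mine):

```python
import numpy as np

def max_pool(detection, region=2):
    """Each pooling unit takes the maximum activation over one
    region x region block of the detection layer, shrinking the
    representation by a constant factor."""
    H, W = detection.shape
    H, W = H - H % region, W - W % region     # drop any ragged edge
    blocks = detection[:H, :W].reshape(H // region, region, W // region, region)
    return blocks.max(axis=(1, 3))

det = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(det))   # [[ 5.  7.] [13. 15.]]
```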
– "To avoid the situation that there exist billions of parameters if all layers are fully connected, the idea of using a convolution operation on small regions" [is used], avoiding overfitting and computational overhead (see the parameter count below).
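A back-of-envelope count shows why full connectivity is untenable at image scale. The sizes below are illustrative only, not taken from any paper:

```python
# Back-of-envelope weight counts; sizes are illustrative only.
H = W = 200                     # a 200 x 200 input image
n_hidden = H * W                # one hidden unit per location

fully_connected = (H * W) * n_hidden    # every pixel to every hidden unit
convolutional = 10 * (11 * 11)          # 10 shared 11 x 11 filters

print(f"fully connected: {fully_connected:,} weights")  # 1,600,000,000
print(f"convolutional:   {convolutional:,} weights")    # 1,210
```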
LeNet-5, Convolutional Neural Networks, Yann LeCun, IEEE, Nov 1998, http://yann.lecun.com/exdb/lenet/
Scikit-learn, Clustering, http://scikit-learn.org/0.11/modules/clustering.html
• Features: more rationally differentiable on the 'Swiss-roll problem' (see the clustering sketch below)
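The Swiss-roll point is easy to reproduce with the cited scikit-learn clustering tools: plain k-means cuts straight through the roll, while a method constrained to a neighborhood graph follows the manifold. A sketch using the current scikit-learn API (the slide cites the 0.11 docs; all parameter choices here are illustrative):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

# Swiss-roll data: a 2-D manifold rolled up in 3-D space.
X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# k-means uses raw Euclidean distance, so its clusters cut across the roll.
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

# Ward linkage restricted to a k-nearest-neighbor graph follows the manifold.
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)
ward = AgglomerativeClustering(
    n_clusters=6, connectivity=connectivity, linkage="ward"
).fit_predict(X)

print(np.bincount(km), np.bincount(ward))  # cluster sizes for each method
```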
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng, ICML 2009