1. ICESS 2016, Takamatsu, Japan
14 ~ 16 Nov. 2016
Young-Min Kang
Tongmyong University
A Parallel Approach to Object Identification
In Large-scale Images
Sung-Soo Kim, ETRI
Gyung-Tae Nam, GCSC Inc.
2. Bigger images
• Era of Big data
– Increased size of image data
• Image processing
– Heavy Computation
• One of the most fundamental operations
– Object identification/recognition
• Image segmentation
• Connected components labeling
4. Parallel image processing
• Most image processing algorithms
– Pixel-wise operations
• can be implemented with pixel-wise threads
• can be efficiently performed in a data-parallel fashion
• GPU
– Data parallel device
– can be easily applied to various image processing methods
GPU:
Many-core architecture
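A minimal sketch of what "pixel-wise" means here, written in plain Python rather than GPU code. Thresholding is used only as an illustration; the function names and the cutoff value are assumptions and do not come from the slides.

```python
def threshold_pixel(value, cutoff):
    """Work that one hypothetical per-pixel thread would perform."""
    return 1 if value >= cutoff else 0


def threshold_image(gray, cutoff=128):
    # Each output pixel depends only on the matching input pixel, so every
    # iteration of this loop nest could run as an independent thread.
    return [[threshold_pixel(v, cutoff) for v in row] for row in gray]


gray = [[10, 200, 130],
        [255, 0, 128]]
binary = threshold_image(gray)
```

Because no pixel reads another pixel's result, the loop order is irrelevant, which is the property a data-parallel device relies on.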
6. CCL and parallelism
• CCL with graph traversal
– cannot be easily parallelized
• Traversal = sequential
• GPU based approaches
– have not been very successful
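For reference, a traversal-based labeling could look like the sketch below. It is a generic breadth-first search, assuming 4-connectivity (the slides do not state which connectivity is used), and is not taken from the authors' code. Each component is grown one pixel at a time from a queue, which is why the approach is hard to spread over many threads.

```python
from collections import deque


def label_by_traversal(img):
    """Label 4-connected foreground components with breadth-first search."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or labels[sy][sx]:
                continue  # background, or already reached by an earlier search
            next_label += 1
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label
```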
15. Label merge
• After computing “column-wise label runs”
– We have separate trees to be merged according to
their connectivity
• What is needed
– Checking vertical adjacency
23. Previous methods
1. Check the connectivity
2. Update the hierarchy
3. Iterate this process until no update is made
A kind of graph traversal
Heavy computation when the pixels make a
long connected chain
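The cost of the "iterate until no update" pattern can be illustrated with a simplified stand-in: minimum-label propagation over 4-connected neighbors. This is an assumption chosen for illustration, not the specific prior methods the slide refers to. Each pass reads the previous pass's labels, mimicking one synchronized step of per-pixel threads.

```python
def label_by_iteration(img):
    """Propagate the minimum label between neighbors until nothing changes."""
    h, w = len(img), len(img[0])
    # Every foreground pixel starts with a unique label; background is 0.
    labels = [[y * w + x + 1 if img[y][x] else 0 for x in range(w)]
              for y in range(h)]
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        new = [row[:] for row in labels]  # double buffer, like a parallel step
        for y in range(h):
            for x in range(w):
                if not img[y][x]:
                    continue
                best = labels[y][x]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        best = min(best, labels[ny][nx])
                if best != new[y][x]:
                    new[y][x] = best
                    changed = True
        labels = new
    return labels, passes
```

In this sketch a label travels only one pixel per pass, so a straight chain of n pixels needs n passes (n - 1 that change something, plus one that detects convergence). The pass count grows with the length of the chain rather than staying fixed.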
24. Our method
• Label merge is performed with a fixed
number of iterations
– The number of iterations
• log2(w)
– Computation cost at every iteration
• reduced to half that of the previous one
• Efficient label merge
• Moreover
– Can be easily parallelized
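The fixed-iteration merge can be sketched sequentially as follows. Several details are assumptions rather than facts from the slides: the image width is a power of two, connectivity is 4-connected, the "column-wise label runs" step links each pixel to the top of its vertical run, and trees are joined with a small union-find. The loops over boundaries and rows stand in for what would be one thread per comparison on a GPU, where the union step would additionally need atomic updates.

```python
def find(parent, i):
    """Return the root of i, shortening the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i


def label_by_hierarchical_merge(img):
    h, w = len(img), len(img[0])  # w is assumed to be a power of two
    parent = list(range(h * w))

    # Step 1 (assumed form): link each pixel to the top of its vertical run.
    for x in range(w):
        for y in range(1, h):
            if img[y][x] and img[y - 1][x]:
                parent[y * w + x] = parent[(y - 1) * w + x]

    # Step 2: merge across column boundaries, doubling the strip width each
    # time, so exactly log2(w) iterations are performed.
    iterations = 0
    stride = 2
    while stride <= w:
        iterations += 1
        for left in range(stride // 2 - 1, w - 1, stride):  # one per boundary
            for y in range(h):                              # h comparisons
                if img[y][left] and img[y][left + 1]:
                    a = find(parent, y * w + left)
                    b = find(parent, y * w + left + 1)
                    if a != b:
                        parent[max(a, b)] = min(a, b)
        stride *= 2

    labels = [[find(parent, y * w + x) + 1 if img[y][x] else 0
               for x in range(w)] for y in range(h)]
    return labels, iterations
```

For an 8-pixel-wide chain this performs 3 merge iterations regardless of the chain length, in contrast to iterate-until-stable schemes.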
25. Label merge boundary
• 1st merge
w/2 boundaries
h comparisons in each boundary
wh/2 threads
26. Label merge boundary
• 2nd merge
w/2^2 boundaries
h comparisons in each boundary
wh/2^2 threads
27. Label merge boundary
• 3rd merge
w/2^3 boundaries
h comparisons in each boundary
wh/2^3 threads
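The pattern in the merges listed so far can be tabulated with a few lines of arithmetic. The helper name `merge_schedule` is hypothetical, and the sketch assumes w is a power of two and one thread per comparison.

```python
def merge_schedule(w, h):
    """Return (merge index k, boundaries, threads) for every merge pass."""
    schedule = []
    k = 1
    while 2 ** k <= w:
        boundaries = w // 2 ** k      # w / 2^k boundaries in the k-th merge
        schedule.append((k, boundaries, boundaries * h))  # h threads each
        k += 1
    return schedule


for k, boundaries, threads in merge_schedule(8, 4):
    print(k, boundaries, threads)
```

For an 8x4 image this gives 4, 2, and 1 boundaries with 16, 8, and 4 threads. For a 4096x4096 image there are 12 merges, the first using 2048 boundaries.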
28. Label merge boundary
• Final merge
log2(w)-th merge
Computation cost at the 1st merge: C(1)
Total Cost
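The formula itself is not present in this text. A reconstruction consistent with the halving of cost at each iteration is given below. It assumes that cost is proportional to the number of comparisons and that w is a power of two; C(k), the cost of the k-th merge, is notation extended from the slide's C(1).

```latex
\begin{align*}
C(k) &= \frac{C(1)}{2^{k-1}}, \qquad k = 1, \dots, \log_2 w \\
C_{\text{total}} &= \sum_{k=1}^{\log_2 w} \frac{C(1)}{2^{k-1}}
                  = 2\,C(1)\left(1 - \frac{1}{w}\right) < 2\,C(1)
\end{align*}
```

With C(1) = wh/2 comparisons, the total is h(w - 1), which is less than twice the cost of the first merge.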
30. Performance
• Computational cost for each task
– Costs normalized so that initialization = 1
– 4096x4096 images with different numbers of connected components
Task               50 labels   1869 labels
initialization     1.0         1.0
column-wise run    1.6         1.6
label merge        3.4         3.6
31. Performance
[Bar chart: relative computational cost (initialization = 1) of initialization, column-wise run, and label merge on 4096x4096 images, for 50 labels and 1869 labels]
32. Experimental results
• Reference
– Grana’s method implemented with OpenCV
• Two Tests
– Random noise with varying densities
– Object identification with shapes
42. Conclusion
• An efficient GPGPU implementation for
CCL
• Data-parallelism of GPU exploited
• Experimental results show its efficiency
• Can be successfully applied to various
applications with large-scale images
– e.g., Object identification from radar signals