This document summarizes different computer vision techniques for text detection:
- Stroke Width Transform (SWT) uses an edge map and gradient maps to assign each pixel a stroke width, then groups pixels with similar stroke width into letter candidates. It can detect text of various sizes and styles but is relatively slow.
- Maximally Stable Extremal Regions (MSER) detects stable regions over a range of threshold values, suitable for character features. It can also detect text variations but is sensitive to blur.
- An extremal region (ER) variation uses a sequential classifier trained on characters to find character regions directly instead of analyzing and rejecting components. This requires training for different font or character types and has slower performance.
5. SWT
(Stroke Width Transform)
Computes, for each pixel, the most likely width of the stroke that contains it.
Steps:
- Compute the edge map of the image.
- Compute the x and y gradient maps.
- From every edge pixel, cast a ray in the direction given by the gradient maps until it reaches another edge pixel.
- Set the value of each pixel on the ray to the minimum of its current value and the ray length.
- Group neighboring pixels with similar stroke width to find letter candidates.
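The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes dark text on a light background, uses forward differences with a magnitude threshold in place of a real edge detector, and accepts a ray only if the edge it hits has a gradient pointing the other way. The names `stroke_width_transform`, `group_by_stroke_width`, `edge_threshold`, `max_steps` and `max_ratio` are hypothetical, and the default values are arbitrary.

```python
import math
from collections import deque

def stroke_width_transform(gray, edge_threshold=100.0, max_steps=50):
    """Simplified SWT for dark text on a light background (assumption).

    gray is a 2D list of intensities. Returns a 2D list of stroke widths,
    with math.inf wherever no accepted ray passed.
    """
    h, w = len(gray), len(gray[0])
    # Steps 1-2: forward-difference gradients; strong gradients serve as the edge map.
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                gx[y][x] = float(gray[y][x + 1] - gray[y][x])
            if y + 1 < h:
                gy[y][x] = float(gray[y + 1][x] - gray[y][x])
    mag = [[math.hypot(gx[y][x], gy[y][x]) for x in range(w)] for y in range(h)]
    swt = [[math.inf] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mag[y][x] < edge_threshold or mag[y][x] == 0:
                continue  # not an edge pixel
            # Step 3: the gradient points from dark to light, so walk against it
            # to cross the dark stroke.
            dx, dy = -gx[y][x] / mag[y][x], -gy[y][x] / mag[y][x]
            ray = [(x, y)]
            for step in range(1, max_steps + 1):
                cx, cy = int(round(x + dx * step)), int(round(y + dy * step))
                if not (0 <= cx < w and 0 <= cy < h):
                    break  # left the image without finding an opposite edge
                if (cx, cy) == ray[-1]:
                    continue  # rounding landed on the same pixel again
                ray.append((cx, cy))
                if mag[cy][cx] >= edge_threshold:
                    # Accept only if the gradient at the hit edge opposes ours.
                    if gx[cy][cx] * gx[y][x] + gy[cy][cx] * gy[y][x] < 0:
                        width = math.hypot(cx - x, cy - y)
                        # Step 4: keep the smallest width seen at each pixel.
                        for px, py in ray:
                            swt[py][px] = min(swt[py][px], width)
                    break
    return swt

def group_by_stroke_width(swt, max_ratio=3.0):
    """Step 5: 4-connected components whose neighboring widths differ by at most max_ratio."""
    h, w = len(swt), len(swt[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or math.isinf(swt[y][x]):
                continue
            seen[y][x] = True
            queue, comp = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                comp.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if not (0 <= nx < w and 0 <= ny < h):
                        continue
                    if seen[ny][nx] or math.isinf(swt[ny][nx]):
                        continue
                    a, b = swt[cy][cx], swt[ny][nx]
                    if max(a, b) / min(a, b) <= max_ratio:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(comp)
    return components
```

Calling `group_by_stroke_width(stroke_width_transform(gray))` on a grayscale image given as a list of rows returns one pixel list per letter candidate. Because of the forward differences, the edge pixel on the light side of a stroke is labeled together with the stroke, which is acceptable for a sketch but not for production use.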
7. SWT
Strategies for rejecting connected components:
- High variance of the stroke width.
- Implausible aspect ratio.
- Components that are too large or too small.
- Components that are clearly not part of a word or text line.
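The first three of these rejection tests can be sketched as a single predicate. This is only an illustration: the function name `is_letter_candidate`, its parameters, and every threshold value are assumptions chosen for the example, not values given in the slides. The fourth test (membership in a word or text line) needs information about neighboring components and is left out.

```python
from statistics import mean, pvariance

def is_letter_candidate(pixels, widths,
                        min_height=8, max_height=300,
                        max_aspect=10.0, max_var_ratio=0.5):
    """Return True if a connected component passes the simple rejection tests.

    pixels: list of (x, y) coordinates of the component.
    widths: stroke width of each pixel, in the same order.
    All thresholds are illustrative assumptions.
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    box_w = max(xs) - min(xs) + 1
    box_h = max(ys) - min(ys) + 1

    # Too large or too small components.
    if not (min_height <= box_h <= max_height):
        return False

    # Aspect ratio: reject very elongated components in either direction.
    aspect = box_w / box_h
    if aspect > max_aspect or aspect < 1.0 / max_aspect:
        return False

    # Stroke width variance: letters tend to have a nearly constant stroke width.
    if pvariance(widths) > max_var_ratio * mean(widths):
        return False

    return True
```

A solid 4 x 12 pixel bar with a constant stroke width of 3 passes, while the same bar with widths alternating between 1 and 9 is rejected by the variance test.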
9. SWT
(Stroke Width Transform)
Advantages:
- Accurately detects text in different sizes, styles, and colors.
- Detects text independently of perspective and rotation.
- The first step of SWT is a good all-round thresholding method for images with text.
Disadvantages:
- Relatively slow performance (edge and gradient maps must be computed).
- Needs to know whether the text or the background is darker in the grayscale image.
10. MSER
(Maximally Stable Extremal Regions)
A blob detection method suitable for detecting character features.
It detects regions that remain stable over a wide range of threshold values.
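The stability idea can be illustrated with a small pure-Python sketch. It is a deliberate simplification: instead of building the full component tree that real MSER implementations use, it follows the single dark region around one seed pixel and measures how much its area changes when the threshold moves by `delta`. The names `region_area`, `most_stable_threshold`, `seed`, and `delta` are hypothetical.

```python
from collections import deque

def region_area(gray, seed, threshold):
    """Area of the 4-connected region around seed with intensity <= threshold."""
    h, w = len(gray), len(gray[0])
    sx, sy = seed
    if gray[sy][sx] > threshold:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < w and 0 <= ny < h
                    and (nx, ny) not in seen
                    and gray[ny][nx] <= threshold):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return len(seen)

def most_stable_threshold(gray, seed, delta=10, thresholds=range(0, 256, 5)):
    """Return (threshold, growth, area) with the smallest relative area change.

    growth = (area(t + delta) - area(t - delta)) / area(t); a small value
    means the region barely changes over that threshold range.
    """
    best = None
    for t in thresholds:
        area = region_area(gray, seed, t)
        if area == 0:
            continue  # the seed is not part of any region at this threshold
        grown = region_area(gray, seed, t + delta)
        shrunk = region_area(gray, seed, t - delta)
        growth = (grown - shrunk) / area
        if best is None or growth < best[1]:
            best = (t, growth, area)
    return best
```

For a dark 3 x 3 blob on a bright background, the region around the blob's center keeps an area of 9 over a long run of thresholds, so its growth is 0 there and it is reported as stable.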
13. MSER
(Maximally Stable Extremal Regions)
Advantages:
- Accurately detects text in different sizes, styles, and colors.
- Detects text independently of perspective and rotation.
- Good performance.
Disadvantages:
- Sensitive to blur.
- The output is not a binary image (thresholding is still needed for OCR).
14. ER Variation for text detection
Uses a sequential classifier trained for character detection instead of selecting maximally stable regions.
Advantages:
- Only character regions are found, so there is no need to analyze and reject components.
Disadvantages:
- Needs training for different font or character types.
- Slower performance.