
Handwriting recognition slides Boeing

Handwriting Recognition using Deep Neural Networks and Computer Vision Techniques for Boeing


  1. Language Technologies. Handwriting Recognition: A Project of the Boeing/Carnegie Mellon Aerospace Data Analytics Lab. Project members: Daniel Clothiaux, Vivian Robison, Tejashree Gharat, Vipul Mascarenhas. Project mentors: Dr. Ravi Starzl, Dr. Barnabas Poczos.
  2. Contents At-a-Glance • Task and Goals • Approach • Challenges • Solutions • Project Roadmap and Context
  3. The Task • Handwriting recognition (HWR) and transcription of airplane maintenance-related work/job/task cards and similar paper forms. Footer: Task / Goals, Approach, Challenges, Solutions, Roadmap / Context [Forms, OCR]
  4. The Goals • Automatic form-type identification • High-quality OCR / transcription of printed and handwritten characters • Association of content with the proper data field
  5. The Approach
  6. The Approach
  7. The Approach
  8. The Approach • "IFE Turn Check Carried out saw DMC-R787-A 44-25-00 -48A-300B-A ROV A1/ 01-Nov 2013. All Ops OK. Outfit Toolkit DACOSL 01 checked complete." • "IFE Turn Check Required. Outfit tool kit PACOSL 01 in use"
  9. The Approach • "IFE Turn Check Carried out saw DMC-R787-A 44-25-00 -48A-300B-A ROV A1/ 01-Nov 2013. All Ops OK. Outfit Toolkit DACOSL 01 checked complete." • "IFE Turn Check Required. Outfit tool kit PACOSL 01 in use" • Task Card Datastore: Subject, Action, …
  10. The Challenges • Automatic form recognition and processing: Form Identification, Deskewing / Denoising, Segmentation • Robust handwriting OCR: Network Design, DNN Overfitting
  11. Form Processing • Recognition by Convolutional Template Matching • Minimizing L2 distance to template image with rotation and shearing (shear x-axis, shear y-axis, clockwise rotation) • Sum of absolute differences in pixel intensities, evaluated at each pixel in the search image
  12. OCR • Character recognition by Deep Neural Networks (figure: example of the LeNet Convolutional Neural Network) • Enough power for the task, but watch for overfitting
  13. OCR • Control overfitting with organic data sets enhanced by generative writing engines: 1. Boeing data 2. NIST and public data 3. Font-based generation 4. RNN-driven generation • Estimated ~2 billion examples
  14. Error Analysis (figure labels: kas 7 t) • Current system errors stem from writing style differences. • Additional data from Boeing will help address the problem.
  15. NIST Special Database 19: Handwriting Sample Forms • MIS (Multiple Image Set) allows multiple images to be stored together, where one or more images are stored as a continuous raster.
  16. NIST Special Database 19: Handwriting Sample Forms • Tasks: 1. Fetch the header and data from the MIS file. 2. Decode the data (encoded with CCITT Group 4 compression). 3. Encode it again for the target file format. 4. Identify the header and file chunks of the target file format. 5. Convert the MIS header and body information to the new format and write it to file. (Figures: MIS content, sample header)
  17. Vertical Projection for Character Segmentation (figures: Example 1, Example 2)
  18. Project Roadmap • Beta System, ~1 year (task cards): advanced form processing and segmentation for task cards; high-quality OCR for task cards • Deployment System, ~2 years (multiple forms): improved generalizable OCR for multiple form types; advanced form processing and segmentation for multiple form types • Style quantification (publication) • Balanced semi-supervised training set (publication)
  19. Project Context • Raw Data (PDF / Image) • Text Analysis • Parts Inventory Optimization • Sensor Analysis • Handwriting Recognition
  20. Thank you
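
Slide 11 describes matching a form against a template by scoring every position in the search image, under shear and rotation. The slide names both an L2 distance and a sum of absolute differences (SAD). The sketch below is a minimal illustration using SAD only, on images stored as plain nested lists, and it tries x-axis shear only (no y-axis shear or rotation). The function names `shear_x`, `sad_at`, and `best_match` are hypothetical and not taken from the project.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel intensities (assumed layout)


def shear_x(image: Image, factor: float) -> Image:
    """Shift row y right by round(factor * y) pixels, padding with 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        shift = round(factor * y)
        for x in range(width):
            src = x - shift
            if 0 <= src < width:
                out[y][x] = image[y][src]
    return out


def sad_at(search: Image, template: Image, top: int, left: int) -> int:
    """Sum of absolute pixel differences with the template placed at (top, left)."""
    return sum(
        abs(search[top + dy][left + dx] - template[dy][dx])
        for dy in range(len(template))
        for dx in range(len(template[0]))
    )


def best_match(
    search: Image,
    template: Image,
    shear_factors: Sequence[float] = (0.0,),
) -> Tuple[int, int, int, float]:
    """Return (score, top, left, shear_factor) with the lowest SAD score."""
    th, tw = len(template), len(template[0])
    best = None
    for factor in shear_factors:
        candidate = shear_x(template, factor)
        # Evaluate the score at each valid position in the search image.
        for top in range(len(search) - th + 1):
            for left in range(len(search[0]) - tw + 1):
                score = sad_at(search, candidate, top, left)
                if best is None or score < best[0]:
                    best = (score, top, left, factor)
    if best is None:
        raise ValueError("template is larger than the search image")
    return best
```

Passing several values in `shear_factors` makes the search try each sheared variant of the template and keep the lowest score. A real system would likely use an optimized library routine instead of these nested loops.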

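Slide 17 names vertical projection as the character segmentation technique. The sketch below shows the general idea, not the project's code: count ink pixels per column of a binarized text line, then cut wherever the count drops below a threshold. It assumes a binary image stored as nested lists with 1 for ink. The names `vertical_projection` and `segment_columns` and the `min_ink` parameter are hypothetical.

```python
from typing import List, Tuple


def vertical_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def segment_columns(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, where profile >= min_ink."""
    spans = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = x  # entering a run of inked columns
        elif count < min_ink and start is not None:
            spans.append((start, x))  # leaving the run at a gap
            start = None
    if start is not None:
        spans.append((start, len(profile)))  # run reaches the right edge
    return spans
```

Each returned span is a candidate character. Touching or overlapping handwritten characters leave no empty column between them, so this simple cut would merge them and further processing would be needed.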