Speech Processing
• Fundamentals of Digital Speech Processing
1. Anatomy and physiology of speech organs
2. The process of speech production
3. The acoustic theory of speech production
4. Digital models for speech signals
Applications of Speech Processing
• 1. Speech recognition: speech to text
• 2. Speech understanding: the meaning matters rather than the
exact words, as in speech translation
• 3. Speech synthesis: text to speech, so the computer can
speak to you
• 4. Word processing: check and correct spelling,
grammar and style
• 5. Text prediction: speed up word processing
• 6. Automatic summarization: topic identification,
summary generation
• 7. Text mining: finding the necessary data in text
• Anatomy: the study of the structure of the bodies of people or animals
• Physiology: the study of how the bodies of people and animals function;
here it includes understanding the higher-order mechanisms within the human
central nervous system that account for speech production
• Acoustics: the scientific study of sound
• Phonetics: the study of the sounds of words and of the sounds that are
used in languages
• Phoneme: the smallest unit of sound that is significant in a
language
• Articulation: the action of producing a sound or word clearly, in speech
or music
• Linguistics: the study of the way in which language works
• Semantics: the branch of linguistics that deals with the meanings of
words and sentences
Speech Processing
[Figure: fields that speech processing draws on]
• Signal processing: Fourier transforms, discrete-time filters, AR(MA) models
• Information theory: entropy, communication theory, rate-distortion theory
• Statistical SP: stochastic models
• Acoustics: psychoacoustics, room acoustics, speech production
• Phonetics
• Algorithms (programming)
ASR: Application
Automatic Speech Recognition
[Figure: block diagram of a recognition system. Blocks shown: voice input, analog-to-digital conversion, recognition (speech engine) with an acoustic model and a language model, display, feedback. © James Glass, MIT]
Speech Generation
• First, the talker formulates a message (in his mind) that
he wants to transmit to the listener via speech.
• Message formulation can be thought of as the creation of
printed text expressing the words of the message.
• The next step is conversion of the message into a
language code.
• This roughly corresponds to converting the
printed text of the message into a sequence of
phonemes corresponding to the sounds that make up
the words, along with the pitch accents associated
with those sounds.
• Once the language code is chosen, the talker
must execute a series of neuromuscular
commands that cause the vocal cords to vibrate
when appropriate and shape the vocal tract
so that the proper sequence of speech
sounds is created and spoken,
producing an acoustic signal as the final
output.
Speech Recognition
• First, the listener processes the acoustic signal along
the basilar membrane in the inner ear, which
provides a running spectrum analysis of the
incoming signal.
• The neural activity along the auditory nerve is
converted into a language code at higher
centers of processing within the brain, and
message comprehension is achieved.
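The "running spectrum analysis" mentioned above can be imitated digitally by computing a spectrum for each short frame of the signal. The following is only a rough sketch of that idea, not a model of the ear: the function names, the frame length, the Hann window and the two-tone test signal are all assumptions made for illustration.

```python
import cmath
import math

def dft_magnitude(frame):
    """Magnitude of the discrete Fourier transform for bins 0 .. N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def running_spectrum(signal, frame_len, hop):
    """Cut the signal into frames and return one magnitude spectrum per frame."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)))
                    for i, x in enumerate(frame)]
        spectra.append(dft_magnitude(windowed))
    return spectra

# Illustrative test signal (assumed): a 1000 Hz tone followed by a 2000 Hz tone.
sample_rate = 8000
frame_len = 256
signal = ([math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
          + [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(512)])

spectra = running_spectrum(signal, frame_len=frame_len, hop=frame_len)
# The strongest bin in each frame, converted to Hz, tracks the change of tone.
peak_hz = [max(range(len(s)), key=s.__getitem__) * sample_rate / frame_len
           for s in spectra]
print(peak_hz)
```

Each entry of `peak_hz` is the dominant frequency in one frame, so the list shows how the spectrum changes over time.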
• The lungs and the associated muscles act as the source
of air for exciting the vocal mechanism.
• The muscle force pushes air out of the lungs (shown as a
piston pushing up within a cylinder) and through the
bronchi and trachea.
• When the vocal cords are tensed, the air flow causes
them to vibrate, producing so-called voiced speech
sounds.
• When the vocal cords are relaxed, the air flow must
pass through a constriction in the vocal tract
in order to produce a sound; it thereby becomes
turbulent, producing so-called unvoiced speech sounds.
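A common digital simplification of the two excitation modes above is a periodic pulse train for voiced sounds and random noise for unvoiced sounds, each passed through a filter that stands in for the vocal tract. The sketch below assumes this simplification. The single two-pole resonator, its 500 Hz resonance, the 100 Hz pitch and all function names are hypothetical choices for illustration, not values from the slides.

```python
import math
import random

def voiced_excitation(num_samples, pitch_period):
    """Periodic impulse train: one pulse per pitch period (vibrating vocal cords)."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def unvoiced_excitation(num_samples, seed=0):
    """Random noise standing in for turbulent air flow at a constriction."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]

def resonator(excitation, a1, a2):
    """Two-pole filter y[n] = x[n] + a1*y[n-1] + a2*y[n-2] (a crude vocal tract)."""
    y = []
    for n, x in enumerate(excitation):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(x + a1 * y1 + a2 * y2)
    return y

sample_rate = 8000
pitch_period = 80                     # 80 samples at 8 kHz is a 100 Hz pitch (assumed)
radius = 0.9                          # pole radius below 1 keeps the filter stable
theta = 2 * math.pi * 500 / sample_rate   # resonance near 500 Hz (assumed)
a1, a2 = 2 * radius * math.cos(theta), -radius * radius

voiced_speech = resonator(voiced_excitation(400, pitch_period), a1, a2)
unvoiced_speech = resonator(unvoiced_excitation(400), a1, a2)
```

Once the start-up transient has died away, `voiced_speech` repeats every `pitch_period` samples, while `unvoiced_speech` has no such repetition.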
Classifications
• 1. Silence (S): no speech is produced
• 2. Unvoiced (U): the vocal cords are not vibrating, so the
speech signal is aperiodic or random in nature
• 3. Voiced (V): the vocal cords vibrate
periodically when air flows from the lungs, so the
speech signal is periodic
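One simple way to approximate this three-way labelling on a digitized signal is to look at the short-time energy and zero-crossing rate of each frame: silence has very low energy, and noise-like unvoiced frames cross zero far more often than voiced frames. The sketch below assumes that approach. The thresholds and the synthetic test frames are hypothetical and would need tuning on real recordings.

```python
import math
import random

def short_time_energy(frame):
    """Average squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Return 'S' (silence), 'U' (unvoiced) or 'V' (voiced); thresholds are assumed."""
    if short_time_energy(frame) < energy_floor:
        return "S"
    if zero_crossing_rate(frame) > zcr_threshold:
        return "U"
    return "V"

# Synthetic 30 ms frames at 8 kHz (illustrative stand-ins, not real speech).
sample_rate = 8000
rng = random.Random(1)
silence_frame = [0.0] * 240
voiced_frame = [math.sin(2 * math.pi * 120 * t / sample_rate) for t in range(240)]
unvoiced_frame = [rng.uniform(-0.3, 0.3) for _ in range(240)]

labels = [classify_frame(f) for f in (silence_frame, voiced_frame, unvoiced_frame)]
print(labels)
```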
Speech Waveform Characteristics
• Loudness
• Voiced/Unvoiced.
• Pitch.
– Fundamental frequency.
• Spectral envelope.
– Formants.
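Pitch is usually measured as the fundamental frequency of a voiced frame. One classical estimate picks the lag at which the frame best matches a shifted copy of itself, that is, the peak of its autocorrelation. The sketch below assumes this method. The search range of 60 to 400 Hz and the synthetic test frame are illustrative choices, not values from the slides.

```python
import math

def autocorrelation(frame, lag):
    """Sum of products of the frame with itself shifted by `lag` samples."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    return sample_rate / best_lag

# Synthetic voiced-like frame (assumed): 200 Hz fundamental plus one harmonic.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * t / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
         for t in range(400)]

f0 = estimate_f0(frame, sample_rate)
print(f0)
```

The frame has to be longer than the largest lag searched, which is why 400 samples are used here for a maximum lag of 133.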
Speech Waveform Characteristics (cont.)
[Figure: example waveforms of voiced speech (/ih/) and unvoiced speech (/s/)]
Phoneme Hierarchy
Speech sounds
• Vowels: iy, ih, ae, aa, ah, ao, ax, eh, er, ow, uh, uw
• Diphthongs: ay, ey, oy, aw
• Consonants
– Plosive: p, b, t, d, k, g
– Nasal: m, n, ng
– Fricative: f, v, th, dh, s, z, sh, zh, h
– Retroflex liquid: r
– Lateral liquid: l
– Glide: w, y
The phoneme set is language dependent; there are about 50 in English.
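The hierarchy above maps naturally onto a lookup table. The sketch below stores only the symbols listed on the slide, which cover fewer than the roughly 50 English phonemes mentioned. The dictionary layout and the helper name `phoneme_class` are assumptions for illustration.

```python
# Phoneme classes exactly as listed in the hierarchy above.
PHONEME_CLASSES = {
    "vowel": ["iy", "ih", "ae", "aa", "ah", "ao", "ax", "eh", "er", "ow", "uh", "uw"],
    "diphthong": ["ay", "ey", "oy", "aw"],
    "plosive": ["p", "b", "t", "d", "k", "g"],
    "nasal": ["m", "n", "ng"],
    "fricative": ["f", "v", "th", "dh", "s", "z", "sh", "zh", "h"],
    "retroflex liquid": ["r"],
    "lateral liquid": ["l"],
    "glide": ["w", "y"],
}

# Classes that the slide groups under "Consonants".
CONSONANT_CLASSES = {"plosive", "nasal", "fricative",
                     "retroflex liquid", "lateral liquid", "glide"}

def phoneme_class(symbol):
    """Return the class name for a phoneme symbol, or None if it is not listed."""
    for name, symbols in PHONEME_CLASSES.items():
        if symbol in symbols:
            return name
    return None

def is_consonant(symbol):
    """True if the symbol belongs to one of the consonant classes."""
    return phoneme_class(symbol) in CONSONANT_CLASSES

print(phoneme_class("ng"), is_consonant("ng"), phoneme_class("iy"))
```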
Signal processing
Digital speech processing
• Speech signals are composed of a sequence of
sounds.
• The arrangement of these sounds is governed by the rules of
language. The study of these rules and their implications
in human communication is the domain of
linguistics.
• The study and classification of the sounds of
speech is called phonetics.