EEL 6585 – DIGITAL SPEECH PROCESSING

Catalog Description: Prerequisite: EEL 5526 Digital Signal Processing, or equivalent with permission of the instructor.

Scope: Speech production models, characteristic features of speech and their detection, synthesis, enhancement, compression, recognition.

Textbook: Discrete-Time Speech Signal Processing by Thomas F. Quatieri. Prentice Hall. ISBN-10: 013242942
Reference Books: L. Rabiner and R. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
P. Loizou, Speech Enhancement: Theory and Practice, CRC Press, 2007.

Instructor: Dr. Nurgun Erdol, Professor

Department of Computer and Electrical Engineering and Computer Science

E-mail: erdol@fau.edu

Telephone: 561-297-3409

Goals: To introduce the fundamentals of speech signal processing and related applications. This course will present the basic principles of speech analysis and speech synthesis, and it will cover several applications, including speech enhancement, speech coding, and speech recognition.

 

Studio Time: T, Tr 4:00-5:20 pm

Office Hours: T, Tr 1:30-3:00 pm, and by email and Blackboard.

Please put EEL 6585 in the subject line of your email.

Topics:
I. Speech production model (source-system model)
II. Speech perception
  A. Classes of speech sounds (consonants, vowels, etc.)
  B. Spectral characteristics of consonants and vowels, formants
III. Speech analysis techniques
  A. Pitch detection
  B. Endpoint detection
  C. Voiced/unvoiced detection
IV. Speech synthesis techniques
  A. Formant-based speech synthesizers (e.g., Klatt synthesizer)
  B. Articulatory speech synthesizers
V. Speech recognition
  A. Feature extraction algorithms (e.g., mel-frequency cepstral coefficients)
  B. Dynamic time warping
  C. Hidden Markov models
VI. Speech enhancement
  A. Spectral subtraction methods
  B. Wiener filtering
VII. Speech compression
  A. ADPCM
  B. Linear-predictive coders, analysis-by-synthesis techniques (e.g., CELP)
  C. Speech and audio coding standards (e.g., VSELP, MPEG)
Grading:
Grading is based on 4 projects, each involving the MATLAB implementation of a speech analysis, enhancement, coding, or recognition method. The projects are equally weighted, and each consists of:

  • A report written in MS Word or editable PDF. The report must be written like a technical paper, with:
      a) An introduction stating the problem and why it is important.
      b) Possible solution methods, with their pros and cons.
      c) The proposed solution method and its theoretical foundations.
      d) Implementation of the proposed method. This section should refer to the MATLAB code used and contain relevant, informative graphics and an assessment of how well the solution performed. You should also discuss the computational complexity of the program.
      e) Conclusion. Did the solution work well? In what ways was it good? Where did it fail? What suggestions do you have to improve it?
  • MATLAB code package: main program, functions, and data used. Your name and the title of the project should appear in the first section of each program. You must use meaningful variable names, for example “pitch” instead of “p” and “noisy_speech” instead of “x”. Please comment liberally.

Web access: Course materials, homework assignments, and announcements will be posted on http://blackboard.fau.edu. Students can log in by following the menu on the website. You will need your Social Security number or student number.