Bilal Soylu - B090100037 - (Design Project: Speaker Recognition)

Upload: bilal-soylu

Post on 03-Apr-2018


  • 7/29/2019 BLAL SOYLU-b090100037-(Tasarm-konumac tanmlayc)

    1/22

    SAKARYA UNIVERSITY

    FACULTY OF ENGINEERING, ELECTRICAL AND ELECTRONICS ENGINEERING

    DESIGN PROJECT

    TOPIC: SPEAKER RECOGNITION (KONUŞMACI TANIMLAYICI)

    STUDENT: BİLAL SOYLU - B090100037

    ADVISOR: ASST. PROF. DR. GÖKÇEN ÇETİNEL


    AUTOMATIC SPEECH RECOGNITION

    1-) WHAT ARE THE BENEFITS OF SPEECH RECOGNITION SYSTEMS, AND WHERE ARE THEY USED?

    a) Benefits

    One of the greatest benefits of speech recognition systems is ease of use. These systems
    take a microphone or a telephone as the data input device, and the data source is speech,
    a habit that costs people no effort.

    A related benefit is the speed of data collection. Data sent by speaking arrives faster
    than data entered with a keyboard or by any other manual action.

    Speech recognition systems also give the user freedom of movement. An operator whose
    hands are occupied by the job, and who therefore cannot enter data with a keyboard or
    mouse, can enter it through a lapel or headset microphone without interrupting the work.

    Because these systems accept input from a distance, they also make it possible to
    control devices remotely.

    b) Where Are They Used?

    Stephan Cook, a professor at the University of Toronto, lists the main application areas
    of speech recognition systems as dictation, command and control, telephone services,
    medical disabilities, and embedded applications.

    DICTATION

    This very important application is one of the most common uses of speech recognition
    systems. In sessions, meetings, interviews, court cases, and similar settings, producing
    a complete document of what was said is difficult and slow when it is typed or written
    by hand. A dictation system transcribes the speech directly, so the document is produced
    as text. For now the application is considered successful only in English, but it is
    expected to be deployed for other languages in the future. The most successful and
    usable examples are products such as Microsoft Diction, DragonDictate, and IBM ViaVoice.

    COMMAND AND CONTROL

    Many devices can be controlled through speech recognition. Words or letters matched in
    the speech are mapped to specific commands. Robots that take commands and are controlled
    through speech recognition are one example. Another is the smart home, where doors,
    lamps, air conditioners, windows, and other open/close devices act on voice commands.


    TELEPHONE SERVICES

    Telephone services are another very important present-day application of speech
    recognition systems. They are used more and more in areas such as telephone banking and
    voice signatures. Using speech recognition in telephone services gives the user both
    convenience and a saving of time. For example, in a telephone banking service the user
    normally keys in many numbers for security, whereas with a voice signature the user can
    carry out the desired transaction comfortably and much faster simply by speaking.

    MEDICAL DISABILITIES

    Speech recognition systems are also of great benefit to disabled people who cannot use
    their hands. Such users can switch some devices on and off, or control them, by speech.

    EMBEDDED APPLICATIONS

    The most practical example of this application is the voice dialing feature of mobile
    phones. Voice dialing is based on associating the phone number of a contact in the phone
    book with a recorded voice tag; the user then calls that contact by speaking to the
    speech recognition system.

    2-) OVERVIEW OF SPEAKER RECOGNITION SYSTEMS

    The general operating principle of a speaker recognition system is as follows: various
    voice samples are recorded with a recording device (a microphone, for example); the
    speaker speaks again; the system processes the voice, compares it with the earlier
    samples, and checks whether they match; finally, the intended function is carried out
    according to the outcome of the comparison.

    3-) HOW SPEAKER RECOGNITION SYSTEMS WORK (Processing the Speech Signal)

    1. Extraction of Speech Feature Vectors

    The sound signals we produce while speaking take different shapes as they pass through
    structures such as the jaw, tongue, and lips. These shapes are classified as voiced
    (a, e, ...) and unvoiced (k, l, m, ...) sounds. This property of the sounds is used to
    extract the feature vectors of the speech.

    The following steps are carried out to extract the speech feature vectors:


    [Block diagram: SPEECH SIGNAL -> PRE-PROCESSING BLOCK -> SPECTRAL SHAPING -> SPECTRAL ANALYSIS -> FEATURE VECTORS]

    2. Pre-processing Block

    The pre-processing block removes from the speech signal the noise and DC offset
    introduced by the A/D converter that brings the signal into the system, along with
    other unwanted noise.
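    One simple way to remove a DC offset is to subtract the mean of the recorded buffer from every sample. The sketch below assumes the samples are already available as floats; the function name remove_dc_offset is hypothetical and is not taken from the report or from the block_dc.h header used later.

```c
#include <stddef.h>

/* Subtract the buffer mean from every sample (hypothetical helper).
 * x: samples, modified in place; n: number of samples. */
void remove_dc_offset(float *x, size_t n)
{
    double sum = 0.0;
    float mean;
    size_t i;

    if (n == 0) {
        return;                       /* nothing to do on an empty buffer */
    }
    for (i = 0; i < n; i++) {
        sum += x[i];                  /* accumulate in double to limit rounding */
    }
    mean = (float)(sum / (double)n);
    for (i = 0; i < n; i++) {
        x[i] -= mean;                 /* the buffer now has zero mean */
    }
}
```

    A running high-pass filter would serve the same purpose on a live stream; the mean subtraction is shown only because it is the shortest form of the idea.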

    3. Spectral Shaping

    The human vocal system resembles a low-pass filter, so the voiced parts of speech
    (vowels) have a negative spectral slope. To remove this effect, the speech signal is
    filtered with a first-order FIR filter, called the pre-emphasis filter. Its transfer
    function is:

    $$H(z) = 1 - a z^{-1}, \quad 0.95 \le a \le 0.97$$

    [Figure: Frequency-domain curve of a sample voiced sound "a"]
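    In the time domain the pre-emphasis filter H(z) = 1 - a z^-1 is the difference equation y[n] = x[n] - a x[n-1]. A minimal sketch follows; the function name pre_emphasis and the choice to treat the sample before the buffer as zero are assumptions.

```c
#include <stddef.h>

/* First-order FIR pre-emphasis: y[n] = x[n] - a * x[n-1].
 * x: input samples, y: output buffer of the same length n,
 * a: filter coefficient, typically 0.95 to 0.97.
 * The sample before x[0] is assumed to be zero. */
void pre_emphasis(const float *x, float *y, size_t n, float a)
{
    size_t i;

    if (n == 0) {
        return;
    }
    y[0] = x[0];
    for (i = 1; i < n; i++) {
        y[i] = x[i] - a * x[i - 1];
    }
}
```

    A constant input is almost cancelled (a constant 1.0 with a = 0.95 gives 0.05 after the first sample), which is the low-frequency attenuation the filter is meant to provide.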


    4. Spectral Analysis

    A speech signal consists of voiced sounds, unvoiced sounds, and combinations of the two.
    These sounds have specific properties, and from those properties a speech production
    model resembling the human speech system can be built.

    In this model the speech signal is produced by filtering an excitation signal with a
    vocal tract filter whose coefficients change over time. As the model implies, the
    spectrum of the speech signal also changes over time.

    [Figure: Frequency response for various values of a]


    5. Windowing

    Windowing multiplies the speech signal in the time domain by a window, usually the
    Hamming window. It keeps the part of the signal that is to be processed and sets the
    rest to zero. The Hamming window is defined as:

    $$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

    Here N is the window length; for the best results in the system it corresponds to a
    duration of 15-30 ms.

    The output of the Hamming windowing operation is y(n) = w(n) x(n).

    In a sense the Hamming window acts as a filter. In the example figures, the windowed
    speech signal is smoother and more distinct than the unwindowed one.

    6. LPC (Linear Predictive Coding)

    This method takes into account the characteristics of the human larynx and mouth as
    well as the characteristics of the sound itself. Linear prediction rests on the
    principle that speech can be modeled as the output of a linear, time-varying system
    excited by periodic impulses or by random noise. The model is expressed as a linear
    filter by a transfer function in which p denotes the order of the LPC coder.
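    As a reference sketch, the all-pole model commonly used in LPC can be written as below. The symbols $s[n]$ (speech sample) and $u[n]$ (excitation), and the sign convention chosen for the coefficients $a_k$, are assumptions and may differ from the author's notation.

    $$H(z) = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

    $$s[n] = \sum_{k=1}^{p} a_k \, s[n-k] + u[n]$$

    The second line is the time-domain form of the first: each sample is a weighted sum of the p preceding samples plus the excitation.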

    [Figure: Signal of the word "sıfır" (zero) after the Hamming window]

    [Figure: Signal of the word "sıfır" (zero) without the Hamming window]


    Applying the inverse z-transform to this transfer function gives the corresponding
    time-domain expression.

    LPC works on the principle that the sample being processed can be obtained from the
    samples that precede it.

    A set of parameters is computed to minimize the sum of the squares of the error, that
    is, of the difference between the predicted sample and the actual sample.

    Solving this minimization yields p LPC parameters, where p is the order of the LPC
    coder and a_1, a_2, ..., a_p are the LPC parameters. The filter coefficients are
    computed so that the squared error is minimal:

    $$E = \sum_{n=0}^{N-1+p} e^2[n], \qquad e[n] = s[n] - \sum_{k=1}^{p} a_k \, s[n-k]$$

    Because the signal is windowed over 0 <= n <= N-1, the error e[n] is nonzero over
    0 <= n <= N-1+p, which is why the limits of n in the sum are taken as 0 <= n <= N-1+p.
    The plots below show a voiced speech signal sampled at 11025 Hz and the error signal
    obtained from the LPC analysis.
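    The minimization over a windowed frame leads to the autocorrelation normal equations, which are usually solved with the Levinson-Durbin recursion. The report does not say which solver was used, so the recursion below is a swapped-in standard technique, shown as a sketch. The function name lpc_autocorr, the LPC_MAX_ORDER limit, and the predictor convention s_hat[n] = a[1] s[n-1] + ... + a[p] s[n-p] are assumptions.

```c
#include <stddef.h>

#define LPC_MAX_ORDER 32   /* assumed upper limit on the order p */

/* Autocorrelation-method LPC solved with the Levinson-Durbin recursion.
 * s: windowed frame of N samples, p: order,
 * a: output array of p + 1 doubles (a[0] is set to 1, a[1..p] are the
 *    predictor coefficients), err: residual prediction error energy.
 * Returns 0 on success, -1 on invalid arguments or a silent frame. */
int lpc_autocorr(const float *s, size_t N, int p, double *a, double *err)
{
    double r[LPC_MAX_ORDER + 1];
    double tmp[LPC_MAX_ORDER + 1];
    double E;
    int i, j;
    size_t n;

    if (p < 1 || p > LPC_MAX_ORDER || N <= (size_t)p) {
        return -1;
    }
    /* Autocorrelation r[i] = sum over n of s[n] * s[n - i] */
    for (i = 0; i <= p; i++) {
        r[i] = 0.0;
        for (n = (size_t)i; n < N; n++) {
            r[i] += (double)s[n] * (double)s[n - (size_t)i];
        }
    }
    if (r[0] <= 0.0) {
        return -1;                    /* no energy in the frame */
    }
    a[0] = 1.0;
    for (i = 1; i <= p; i++) {
        a[i] = 0.0;
    }
    E = r[0];
    for (i = 1; i <= p; i++) {
        double acc = r[i];
        double k;

        for (j = 1; j < i; j++) {
            acc -= a[j] * r[i - j];
        }
        k = acc / E;                  /* reflection coefficient of stage i */
        for (j = 1; j < i; j++) {
            tmp[j] = a[j] - k * a[i - j];
        }
        for (j = 1; j < i; j++) {
            a[j] = tmp[j];
        }
        a[i] = k;
        E *= (1.0 - k * k);           /* remaining prediction error energy */
        if (E <= 0.0) {
            break;                    /* perfectly predictable frame */
        }
    }
    *err = E;
    return 0;
}
```

    For a frame that decays as 0.9^n, the recursion with p = 2 returns a[1] close to 0.9 and a[2] close to 0, as expected for a first-order process.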


    The plots show the short-time spectrum of the voiced speech signal together with the
    frequency response of the H(z) filter. They make clear the relationship between the
    spectrum of the original speech signal and the frequency response obtained from the
    LPC analysis: the frequency response of H(z) is a smoothed version of the envelope of
    the original signal's spectrum. For this reason LPC analysis can also be regarded as a
    short-time spectrum estimate.

    [Figure: Voiced speech signal and the error obtained from LPC]

    [Figure: Short-time spectrum of the voiced speech signal and the frequency response of the H(z) filter]

    The speech signal analyzed with LPC is then passed to the DTW (dynamic time warping)
    method for comparison and matching.

    7. DTW (Dynamic Time Warping)

    The pronunciation of a word varies even when the same person says it, and the same word
    may be spoken over a longer or shorter time. DTW stretches or compresses these differing
    utterances in time so that they line up with one another.


    DTW aligns the spoken word in time with the words stored in the system and compares
    them. The algorithm is used as follows: the several parameters stored as a template are
    each prepared for comparison with the DTW algorithm and then evaluated together. The
    figure shows the DTW algorithm applied to LPC parameters.

    [Figure: DTW algorithm applied to LPC parameters]

    The algorithm is applied as follows.

    The LPC parameter values of the words stored as templates and the LPC parameter values
    computed from the word recorded at run time are aligned in time with the help of the
    LPC analyzer. Through this alignment the input is compared with every stored template,
    and a similarity is computed for each template. From these results the closeness to the
    nearest template is computed as a percentage, and a match is declared if this
    percentage is above the defined threshold.

    The LPC coder returns p LPC parameters for each frame. The utterance queue analyzer
    pulls utterances from the utterance queue, splits them into frames, and sends them to
    the LPC coder. If the utterance pulled from the queue consists of m frames, the coded
    utterance is a two-dimensional array of size p by m. Suppose the LPC-coded form of an
    utterance of n frames is stored in the system as a template; the stored template is
    then a two-dimensional array of size p by n. The two arrays are placed perpendicular to
    each other in three-dimensional space, as in the figure, and the differences are
    computed in each of the LPC parameter planes 1 to p. The planes are then reduced to a
    single plane by averaging the differences cell by cell. After that, the DTW algorithm
    is applied exactly as it would be applied directly to two speech signals: the
    transition costs from one state to the next are derived and the comparison is made.
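    The procedure described above can be sketched as follows: the cost of a cell is the average absolute difference over the p LPC parameters of two frames, and the standard DTW recurrence accumulates the cheapest path through the resulting plane. The function name dtw_distance, the row-major frame layout, and the choice of the three-neighbour step pattern are assumptions. The report's conversion of the result into a percentage is not specified, so it is left out.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* DTW cost between two LPC-coded utterances (hypothetical helper).
 * a: m frames of p parameters, b: n frames of p parameters, both stored
 * row-major (frame after frame).
 * Returns the accumulated cost of the best alignment, or -1.0 on error. */
double dtw_distance(const float *a, size_t m, const float *b, size_t n, size_t p)
{
    double *prev, *curr, *swap_rows;
    double result;
    size_t i, j, k;

    if (m == 0 || n == 0 || p == 0) {
        return -1.0;
    }
    prev = malloc((n + 1) * sizeof *prev);
    curr = malloc((n + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1.0;
    }
    /* Row 0: only the origin is reachable before any frame is consumed. */
    prev[0] = 0.0;
    for (j = 1; j <= n; j++) {
        prev[j] = DBL_MAX;
    }
    for (i = 1; i <= m; i++) {
        curr[0] = DBL_MAX;
        for (j = 1; j <= n; j++) {
            double d = 0.0;
            double best = prev[j];              /* step from frame i-1 of a */

            /* Cell cost: mean absolute difference over the p parameters. */
            for (k = 0; k < p; k++) {
                d += fabs((double)a[(i - 1) * p + k] - (double)b[(j - 1) * p + k]);
            }
            d /= (double)p;
            if (curr[j - 1] < best) {
                best = curr[j - 1];             /* step from frame j-1 of b */
            }
            if (prev[j - 1] < best) {
                best = prev[j - 1];             /* diagonal step */
            }
            curr[j] = d + best;
        }
        swap_rows = prev;
        prev = curr;
        curr = swap_rows;
    }
    result = prev[n];
    free(prev);
    free(curr);
    return result;
}
```

    Two utterances that differ only in duration, such as the one-parameter sequences {0, 1, 2} and {0, 1, 1, 2}, align with zero cost, which is the time-stretching behaviour described in the text. Only two rows of the cost plane are kept, so memory grows with n rather than with m times n.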


    Speaker Recognition C Program

    /*****************************************************************************
     *
     * speaker_recognition.c
     *
     * Main program to identify a speaker.
     *
     * The aim of this project is to determine the identity of the speaker
     * from a speech sample of the speaker and the trained vectors.
     *
     * Trained vectors are derived from a speech sample of the speaker taken
     * at a different time.
     *
     * First, the analog input speech signal is digitized at an 8 kHz sampling
     * frequency using the on-board ADC (analog-to-digital converter).
     * The speech samples are stored in a one-dimensional array.
     * Speech signals are quasi-stationary: over a very short time frame the
     * speech signal can be considered stationary. The speech signal is split
     * into frames. Each frame consists of 256 samples, and the subsequent
     * frame starts from the 100th sample of the previous frame, so each frame
     * overlaps with the two frames that follow it. This technique is called
     * framing. The speech samples within one frame are considered stationary.
     *
     * After framing, windowing is applied to reduce spectral leakage.
     * Here a Hamming window with 256 coefficients is used.
     *
     * The third step is to convert the time-domain speech signal into the
     * frequency domain using the Discrete Fourier Transform. Here the Fast
     * Fourier Transform is used.
     *
     * The result of the transformation is complex. Speech is a real signal,
     * but its Fourier transform is complex (it has both real and imaginary
     * parts).
     *
     * The power of the signal in the frequency domain is calculated by summing
     * the squares of the real and imaginary parts. The power signal is real.
     * Since the second half of the samples in a frame is symmetric to the
     * first half (because the speech signal is real), the second half (the
     * second 128 samples in each frame) is ignored.
     *
     * Triangular filters are designed using the mel frequency scale. This bank
     * of filters approximates the human ear. The power signal is applied to
     * the filter bank to determine the frequency content across each filter.
     * In this implementation the total number of filters is 20. These 20
     * filters are uniformly spaced on the mel frequency scale between
     * 0 and 4 kHz.
     *
     * After the mel-frequency spectrum is computed, its logarithm is taken.
     *
     * The Discrete Cosine Transform of the resulting signal gives the
     * Mel-Frequency Cepstral Coefficients (MFCC).
     *
     * The Euclidean distance between the Mel-Frequency Cepstral Coefficients
     * and each trained vector is computed. The trained vector that produces
     * the smallest Euclidean distance identifies the speaker.
     *
     * Written by Vasanthan Rangan and Sowmya Narayanan
     *
     ******************************************************************************/

    /*****************************************************************************
     * Include Header Files
     ******************************************************************************/
    #include "DSK6713_aic23.h"
    Uint32 fs=DSK6713_AIC23_FREQ_8KHZ;
    #include <stdio.h>           // printf
    #include <math.h>            // cos, sin, log, sqrt
    #include "block_dc.h"        // Header file for identifying the start of speech signal
    #include "detect_envelope.h" // Header file for identifying the start of speech signal
    #include "training1.h"       // Header file containing the trained vectors.

    /*****************************************************************************
     * Definition of Variables
     *****************************************************************************/
    #define Number_Of_Filters 20 // Number of mel-frequency filters
    #define column_length 256    // Frame length (samples per frame)
    #define row_length 100       // Total number of frames in the given speech signal
    #define PI 3.14159

    /*****************************************************************************
     * Custom Structure Definition
     *****************************************************************************/
    struct complex {
        float real;
        float imag;
    }; // Generic structure to represent the real and imaginary parts of a signal


    struct buffer {
        struct complex data[row_length][column_length];
    }; // Structure to store the input speech sample

    struct mfcc {
        float data[row_length][Number_Of_Filters];
    }; // Structure to store the mel-frequency coefficients

    /*****************************************************************************
     * Assigning the data structures to external memory
     *****************************************************************************/
    #pragma DATA_SECTION(real_buffer,".EXTRAM")
    struct buffer real_buffer; // real_buffer is used to store the input speech.

    #pragma DATA_SECTION(coeff,".EXTRAM")
    struct mfcc coeff;         // coeff is used to store the mel-frequency spectrum.

    #pragma DATA_SECTION(mfcc_ct,".EXTRAM")
    struct mfcc mfcc_ct;       // mfcc_ct is used to store the Mel-Frequency Cepstral Coefficients.

    /*****************************************************************************
     * Variable Declaration
     *****************************************************************************/
    int gain;            /* Output gain (used during playback) */
    int signal_status;   /* Variable to detect the speech signal */
    int count;           /* Variable to count */
    int column;          /* Variable used for incrementing column (samples inside a frame) */
    int row;             /* Variable used for incrementing row (number of frames) */

    /* Variable to identify where the program is. Example:
     * program_control=0 means the program is capturing the input speech signal;
     * program_control=1 means the program has finished capturing input and is
     * ready for processing (at this time the input speech signal is replayed);
     * program_control=2 means the program is ready for identification. */
    int program_control;

    float mfcc_vector[20]; /* Variable to store the vector of the speech signal */

    /*****************************************************************************
     * Function Declaration
     *****************************************************************************/
    void fft (struct buffer *, int , int );         /* Function to compute the Fast Fourier Transform */
    short playback();                               /* Function for playback */
    void log_energy(struct mfcc *);                 /* Function to compute the log of the power signal */
    void mfcc_coeff(struct mfcc * , struct mfcc *); /* Function to compute MFCC */
    void mfcc_vect(struct mfcc * , float *);        /* Function to compute the MFCC vector */

    /******************************************************************************
     * Function Definition Starts
     ******************************************************************************/
    interrupt void c_int11() {           /* Interrupt service routine */
        short sample_data;
        short out_sample;

        if ( program_control == 0 ) {    /* Beginning of capturing input speech */
            sample_data = input_sample();                                 /* Input data */
            signal_status = framing_windowing(sample_data, &real_buffer); /* Signal identification,
                                                                           * framing and windowing */
            out_sample = 0;              /* Output data */
            if (signal_status > 0) {
                program_control = 1;     /* Capturing the input signal is done */
            }
            output_sample(out_sample);   /* Play nothing */
        }
        if ( program_control == 1 ) {    /* Beginning of the playback */
            out_sample = playback();     /* Call the playback function to get the
                                          * stored speech sample */
            output_sample(out_sample);   /* Play the output speech sample */
        }
        return;
    }

    void main() {                        /* Main function of the program */
        /****************************************************************************
         * Declaring Local Variables
         *****************************************************************************/
        int i;                           /* Variable used for counters */
        int j;                           /* Variable used for counters */
        int stages,speaker;              /* Total number of FFT stages
                                          * and the identified speaker */
        float distance,ref_distance;     /* Euclidean distance and the reference
                                          * distance used for comparison */

        /*****************************************************************************
         * Execution of functions start
         ******************************************************************************/


        comm_intr();                     /* Init DSK, codec, McBSP */

        /******************************************************************************
         * Initializing Variables
         *****************************************************************************/
        gain = 1;
        column = 0;
        row = 0;
        program_control = 0;
        signal_status = 0;
        count = 0;
        stages = 8;                      /* Total number of stages in FFT = 8 (256 = 2^8) */
        ref_distance = 9999999999999999.9999999; /* Variable for storing the reference distance */

        for ( i=0; i < row_length ; i++ ) {           /* Total number of frames */
            for ( j = 0; j < column_length ; j++) {   /* Total number of samples in a frame */
                real_buffer.data[i][j].real = 0.0;    /* Initialize the real part to zero */
                real_buffer.data[i][j].imag = 0.0;    /* Initialize the imaginary part to zero */
            }
        }

        /* The source listing is cut off from here up to the FFT call. The loop and
         * the wait below are an assumed reconstruction, not the authors' text. */
        for ( i=0; i < row_length ; i++ ) {
            for ( j = 0; j < Number_Of_Filters ; j++) {
                coeff.data[i][j] = 0.0;
                mfcc_ct.data[i][j] = 0.0;
            }
        }
        while ( *(volatile int *)&program_control < 2 ) {
            /* Wait until the interrupt routine has finished capture and playback */
        }

        /* Compute the FFT of the input speech signal after framing and windowing */
        fft(&real_buffer,column_length,stages);

        /* Compute the power spectrum of the speech signal in the frequency domain */
        power_spectrum(&real_buffer);

        /* Compute the mel-frequency spectrum from the power spectrum */
        mel_freq_spectrum(&real_buffer,&coeff);

        /* Compute the log of the mel-frequency spectrum */
        log_energy(&coeff);

        /* Compute the Discrete Cosine Transform */
        mfcc_coeff(&mfcc_ct,&coeff);

        /* Compute the vector */
        mfcc_vect(&mfcc_ct,mfcc_vector);

        /* Identifying the speaker.
         * The start of this loop is cut off in the source listing. Number_Of_Speakers
         * and trained_vector are assumed names for what training1.h provides. */
        for ( i=0; i < Number_Of_Speakers; i++ ) {
            distance = 0.0;
            for ( j=0; j < Number_Of_Filters; j++ ) {
                distance = distance + (mfcc_vector[j] - trained_vector[i][j])
                                    * (mfcc_vector[j] - trained_vector[i][j]);
            }
            distance = (float) sqrt(distance);
            if ( distance < ref_distance ) {
                speaker = i;             /* Identify the speaker as the corresponding sample */
                ref_distance = distance; /* Store the new distance */
            }
        }

        /* Print the identified speaker */
        printf("Input Speaker is identified to be %d Speaker from the Training Set\n",++speaker);
    }

    /* Function to compute the Fast Fourier Transform (radix-2, decimation in frequency).
     * The loop headers of this function are cut off in the source listing; they are
     * restored here on the assumption that it follows the standard in-place algorithm. */
    void fft (struct buffer *input_data, int n, int m) { /* Input speech data, n = 2^m, m = total number of stages */
        int n1,n2,i,j,k,l,row_index; /* n1 is the difference between upper and lower
                                      * i,j,k,l are counters
                                      * row_index is used to index every frame */
        float xt,yt,c,s,e,a;         /* xt,yt: temporary real and imaginary values
                                      * c: cosine, s: sine
                                      * e and a: used to compute the input to cosine and sine */

        for ( row_index = 0; row_index < row_length; row_index++) { /* For every frame */
            /* Loop through all the stages */
            n2 = n;
            for ( k=0; k < m; k++ ) {
                n1 = n2;
                n2 = n2/2;
                e = (float) (2.0*PI/n1);
                for ( j=0; j < n2; j++ ) {
                    /* Compute the twiddle factor for this butterfly group */
                    a = j*e;
                    c = (float) cos(a);
                    s = (float) sin(a);
                    /* Do the butterflies for all 256 samples */
                    for ( i=j; i < n; i+=n1 ) {
                        l = i+n2;
                        xt = input_data->data[row_index][i].real - input_data->data[row_index][l].real;
                        input_data->data[row_index][i].real = input_data->data[row_index][i].real + input_data->data[row_index][l].real;
                        yt = input_data->data[row_index][i].imag - input_data->data[row_index][l].imag;
                        input_data->data[row_index][i].imag = input_data->data[row_index][i].imag + input_data->data[row_index][l].imag;
                        input_data->data[row_index][l].real = c*xt + s*yt;
                        input_data->data[row_index][l].imag = c*yt - s*xt;
                    }
                }
            }
            /* Bit reversal */
            j = 0;
            for ( i=0; i < n-1; i++ ) {
                if ( i < j ) {           /* Swap samples i and j */
                    xt = input_data->data[row_index][j].real;
                    input_data->data[row_index][j].real = input_data->data[row_index][i].real;
                    input_data->data[row_index][i].real = xt;
                    yt = input_data->data[row_index][j].imag;
                    input_data->data[row_index][j].imag = input_data->data[row_index][i].imag;
                    input_data->data[row_index][i].imag = yt;
                }
                k = n/2;                 /* Advance j to the bit-reversed value of i+1 */
                while ( k <= j ) {
                    j = j-k;
                    k = k/2;
                }
                j = j+k;
            }
        }
        return;
    }

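    /* Function to compute the log of the mel-frequency spectrum */
    void log_energy(struct mfcc *co_eff) {
        int i,j;                         /* Variables declared to act as counters */

        for ( i=0; i < row_length; i++ ) {              /* For all the frames */
            for ( j=0; j < Number_Of_Filters; j++ ) {   /* For all the filters */
                co_eff->data[i][j] = (float) log(co_eff->data[i][j]); /* Compute log of coefficients */
            }
        }
    }

    /* Function to compute the Discrete Cosine Transform */
    void mfcc_coeff(struct mfcc *mfccct, struct mfcc *co_eff) {
        int i,j,k;                       /* Variables declared to act as counters */

        for ( i=0; i < row_length; i++ ) {              /* For all the frames */
            for ( j=0; j < Number_Of_Filters; j++ ) {   /* For every cepstral coefficient */
                mfccct->data[i][j] = 0.0;
                for ( k=0; k < Number_Of_Filters; k++ ) {
                    /* k + 0.5 is the half-sample offset of the DCT for a zero-based k;
                     * the integer expression 1/2 would evaluate to 0 */
                    mfccct->data[i][j] = mfccct->data[i][j] + co_eff->data[i][k]*cos((double)((PI*j*(k+0.5))/Number_Of_Filters));
                }
            }
        }
    }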

    /* Function to reduce the per-frame MFCCs to a single vector: for each
     * coefficient, the squares of its values over all frames are summed */
    void mfcc_vect(struct mfcc *mfccct, float *mfccvector) {
        int i,j;                                      /* Variables declared to act as counters */

        for ( i=0; i< Number_Of_Filters; i++ ) {      /* Total number of filters */
            mfccvector[i] = 0;                        /* Initialize the vector to zero */
            for (j=0; j< row_length; j++) {           /* Accumulate over all the frames */
                mfccvector[i] = mfccvector[i] + ((mfccct->data[j][i])*(mfccct->data[j][i]));
            }
        }
    }

    /* Function to play back the speech signal */
    short playback() {
        column++;                        /* Index of the speech sample within a frame */
        if ( column >= column_length ) { /* If column >= 256, reset it to zero
                                          * and increment the frame number */
            column = 0;                  /* Reset the sample number to zero */
            row++;                       /* Increment the frame number */
        }
        if ( row >= row_length ) {       /* If the frame number reaches 100, reset
                                          * row to zero and change the program
                                          * control to indicate the end of playback */
            program_control = 2;         /* End of playback */
            row = 0;                     /* Reset the frame number to zero */
        }
        return ((short)real_buffer.data[row][column].real); /* Return the stored speech sample */
    }
