

A Performance Evaluation of Frequency Domain Blind Source Separation Using Close Arrangement of Directional Microphones

Hiroki Shimizu,1 Masanori Ito,1 Yoshinori Takeuchi,2,3 Tetsuya Matsumoto,1 Hiroaki Kudo,1 and Noboru Ohnishi1

1Graduate School of Information Science, Nagoya University, Nagoya, 464-8603 Japan

2Information Security Promotion Agency, Nagoya University, BMC Research Center, Nagoya, 464-8603 Japan

3RIKEN BMC, Nagoya, 463-0003 Japan

SUMMARY

A performance evaluation of frequency-domain blind source separation (BSS) in a close arrangement of directional microphones is carried out. When a close arrangement is used in time-domain BSS, it is possible to reduce the number of taps necessary for separation while maintaining the same performance as in separation without close arrangement. In this research, a comparison with nonclose arrangement is made from the point of view of the relationship between the number of taps and the separation performance, the calculation time, and the permutation (output order) problem unique to frequency-domain BSS. With regard to the calculation time and the permutation error, close arrangement is superior to nonclose arrangement in environments both with and without reverberation. With regard to the separation performance, close arrangement is better than nonclose arrangement in an anechoic chamber. In a room with reverberation, nonclose arrangement is better if there are many taps, and close arrangement is superior if the number of taps is small. © 2007 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 90(6): 1–9, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20300

Key words: independent component analysis; frequency domain; blind source separation; directional microphone; close arrangement.

1. Introduction

The signal processing technique in which signals sent from several sources are observed with several sensors and the source signals are inferred by using only the information on the obtained mixed signals is called blind signal separation. In this technique, information on the mixing of the signals and on the signal sources is not used. Instead, the assumption is made that the source signals are statistically independent, as a minimum hint for deriving the separated sources.

Various methods have been proposed for blind signal separation [1–3]. The mixed state of the signals that are considered can be divided into the following two types depending on the transfer process.

• Instantaneous mixture, in which the source signals are spatially mixed by a constant matrix [4]

• Convolutive mixture, in which the source signals are convolved with the impulse response and are mixed both spatially and temporally.

Electronics and Communications in Japan, Part 3, Vol. 90, No. 6, 2007. Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-A, No. 6, June 2006, pp. 485–493.

Contract grant sponsor: Ministry of Education and Science 21st Century COE Program on “Intelligent Media Integration for Social Information Infrastructure.”

In this paper, we consider a mixture of several voices in a real environment as the target for separation processing. In general, the transmission paths from a source to the several observation microphones differ in real environments. Therefore, the signals observed by the microphones have time differences, and consequently the mixed state is considered a convolutive mixture.

Separation methods for a convolutive mixture involve either estimating a separation filter in the time domain [5, 6] or transforming the mixed signal to the frequency domain by the Fourier transform and treating the signal as an instantaneous mixture in each frequency band [7, 8].

In separation in the time domain, the time interval of the signal considered becomes longer if the range of the signal used is large. Therefore, although the separation performance is rather good, much time is required for estimation of the separation filter and for separation processing, because the number of parameters of the estimated separation filter is large.

In contrast, in separation processing in the frequency domain, the number of parameters to be estimated increases in proportion to the number of frequency bands to be separated. Further, the estimated signal in each frequency band has problems of permutation and scaling.

In Ref. 9, separation processing of a convolutive mixture of voice signals in a real environment in the time domain is considered. By placing directional microphones close together, the number of taps of the separation filter can be reduced and the processing time can be shortened while guaranteeing sufficient separation performance. In this paper, the case of close spacing of directional microphones in the frequency domain and the case without close spacing are compared from the point of view of the relationship between the number of taps and the separation performance, the computation time, and the permutation problem. In addition to reduction of processing time, the method of using closely spaced directional microphones has further advantages, including the fact that the number of necessary taps can be determined regardless of the positional relationship of the source and microphones, and that there is no restriction on the space for placing the microphones such as occurs in microphone arrays.

There have been various studies using closely spaced microphones as in the present paper [10–13]. In the papers of Taniguchi’s group [10, 11], a perfect instantaneous mixture is considered. Hence, sound separation is not successful in a room with reverberation. No details concerning the reason are provided. In the paper by Sanchis and colleagues [12], separation as an instantaneous mixture was performed by using stereo microphones in a recording studio and an anechoic chamber. There are many unknown details, including the experimental conditions, such as the reverberation time, the algorithm for separation, the reduction of separation performance with an increased number of taps for convolutive mixture, and the SNR prior to separation. In the research reported by Ito and colleagues [13], there is no discussion of the relationship between the number of taps and the separation performance, and a reduction of the variation of the mixture matrix at the time of movement due to proximity of microphone spacing is noted. On the other hand, in the present study, the effect of the proximity of microphone spacing is studied in an ordinary room. This is the difference from the four studies mentioned above.

In this paper, the algorithm for the frequency domain BSS used in the experiment is described in Section 2. In Section 3, the mixture model with the proximity setting of the directional microphones and the principle of the tap reduction effect are presented. In Section 4, the relationship between the number of taps and the separation performance is given and a comparative experiment is described from the point of view of calculation time and permutation. In Section 5, an experimental comparison of the separation performance and calculation time when the shift is varied in the short-time Fourier transform is described. Conclusions are given in Section 6.

2. Frequency Domain BSS

Let the source signals be s(t) = [s1(t), . . . , sN(t)]T, the mixed signals observed by the microphones be x(t) = [x1(t), . . . , xM(t)]T, and the separated signals be y(t) = [y1(t), . . . , yN(t)]T. Here N and M denote the number of sources and the number of mixed signals, with M ≥ N. In the time domain, the mixed and separated signals

$$x(t) = \sum_{k=0}^{K-1} H(k)\, s(t-k) \tag{1}$$

$$y(t) = \sum_{k=0}^{K-1} W(k)\, x(t-k) \tag{2}$$

are found to be convolutive mixtures. The element hji of the mixing filter matrix H with K taps is the impulse response from source i to microphone j, and element Wij of the separation filter matrix W with K taps is the coefficient of the separation filter calculated by the BSS. When Eqs. (1) and (2) are transformed to the frequency domain, the following instantaneous mixture in each frequency bin is obtained:

$$X(\omega, m) = H(\omega)\, S(\omega, m) \tag{3}$$

$$Y(\omega, m) = W(\omega)\, X(\omega, m) \tag{4}$$

Here ω is the frequency bin and m is the time frame; S(ω, m), X(ω, m), and Y(ω, m) are obtained by taking the Fourier transform of s(t), x(t), and y(t) after each is divided into short time frames, and H(ω) and W(ω) are obtained by taking the Fourier transform of H(k) and W(k). When Y(ω, m) is subjected to the inverse Fourier transform, the separated signal y(t) can be derived.
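The step from a convolutive mixture in time to an instantaneous mixture per frequency bin can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes circular (wrap-around) convolution over a single frame so that the DFT identity holds exactly, whereas with real short-time frames the relation is only approximate. The function names are illustrative.

```python
import numpy as np

def mix_time_domain(H, s):
    """Circular convolutive mixture x(t) = sum_k H(k) s(t - k).

    H: (K, M, N) mixing filter taps, s: (N, T) source signals.
    Circular indexing is an assumption made so the DFT identity is exact.
    """
    K, M, N = H.shape
    T = s.shape[1]
    x = np.zeros((M, T))
    for k in range(K):
        # np.roll shifts right by k samples, i.e. s(t - k) with wrap-around
        x += H[k] @ np.roll(s, k, axis=1)
    return x

def mix_frequency_domain(H, s):
    """Same mixture computed as X(w) = H(w) S(w) in every frequency bin."""
    T = s.shape[1]
    H_w = np.fft.fft(H, n=T, axis=0)          # (T, M, N): H(w), zero-padded taps
    S_w = np.fft.fft(s, axis=1)               # (N, T):   S(w)
    X_w = np.einsum("wmn,nw->mw", H_w, S_w)   # one matrix product per bin
    return np.fft.ifft(X_w, axis=1).real

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 2, 2))    # K = 8 taps, 2 microphones, 2 sources
s = rng.standard_normal((2, 64))
print(np.allclose(mix_time_domain(H, s), mix_frequency_domain(H, s)))
```

Both routes give the same mixed signals, which is why a separation matrix W(ω) can be estimated independently in each bin.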

Several separation algorithms in the frequency domain have been proposed. However, the objective of the present research is not to compare these algorithms. Therefore, minimization of the mutual information among the elements of Y(ω, m) by the natural gradient method is used [14]. W(ω) is updated by means of ∆W(ω) in Eq. (5). Updating is performed in each frequency bin.

$$\Delta W(\omega) = \mu \left[ I - \left\langle \Phi(Y(\omega, m))\, Y^{H}(\omega, m) \right\rangle \right] W(\omega) \tag{5}$$

$$\Phi(Y) = \tanh(\eta\, |Y|)\, e^{j \arg(Y)} \tag{6}$$

Here, µ is the training coefficient, η is a coefficient expressing the nonlinearity, I is the unit matrix, YH is the conjugate transpose of Y, and <⋅> denotes averaging over the time frames.

Let the l-th iteration of element Wij(ω) of W(ω) be Wij(ω, l). Updating is performed until the convergence condition of Eq. (7) is satisfied. Here |⋅| denotes the absolute value of a complex number. W(ω) as determined here contains the problems of scaling and permutation. The scaling problem is resolved by

$$W(\omega) \leftarrow \mathrm{diag}\left(W^{-1}(\omega)\right) W(\omega) \tag{8}$$

according to the minimal distortion principle [15]. The permutation problem is discussed in Section 4.

3. Principle of Reduction of Number of Taps

Let us consider the case of two sound sources. In the present study, directional microphones are used for observation of the mixed signals. As shown in Fig. 1, the receiving parts of the two directional microphones are placed at the same location (strictly speaking they are separated by their diameter). It is not necessary that the directions of the maximum gain be oriented toward each source.

Fig. 1. The arrangement of directional microphones.

Let the z transforms of the two source signals and the mixed signals be S(z) = [S1(z), S2(z)]T and X(z) = [X1(z), X2(z)]T. The transfer function from source i to microphone j (i, j = 1, 2) is denoted as Hij(z). The angle between the peak directivity direction of microphone j and the direction of source i seen from the receiving point is θij. Then the transfer function due to the directivity characteristic of microphone j in observation of the signal incident from source i is denoted as gj(z; θij). Although Hij(z) varies depending on the direction θ, no reverberation other than that from the sound source is taken into account, as a first-order approximation. Thus, the mixing process of the sound sources is in general given by

$$\begin{bmatrix} X_1(z) \\ X_2(z) \end{bmatrix} = \begin{bmatrix} g_1(z;\theta_{11})\, H_{11}(z) & g_1(z;\theta_{21})\, H_{21}(z) \\ g_2(z;\theta_{12})\, H_{12}(z) & g_2(z;\theta_{22})\, H_{22}(z) \end{bmatrix} \begin{bmatrix} S_1(z) \\ S_2(z) \end{bmatrix} \tag{9}$$

When microphones are placed close together as shown in Fig. 1, the transfer paths from a source to each microphone become identical if the effect of reverberation can be ignored. Thus, the following equalities hold with regard to the transfer function*:

$$H_{11}(z) = H_{12}(z) \equiv H_1(z), \qquad H_{21}(z) = H_{22}(z) \equiv H_2(z) \tag{10}$$

Hence, Eq. (9) becomes

$$\begin{bmatrix} X_1(z) \\ X_2(z) \end{bmatrix} = \begin{bmatrix} g_1(z;\theta_{11}) & g_1(z;\theta_{21}) \\ g_2(z;\theta_{12}) & g_2(z;\theta_{22}) \end{bmatrix} \begin{bmatrix} H_1(z) & 0 \\ 0 & H_2(z) \end{bmatrix} \begin{bmatrix} S_1(z) \\ S_2(z) \end{bmatrix} \tag{11}$$

It is found from Eq. (11) that, as a result of proximity placement, only the transfer function gj of the microphone due to the directional characteristic contributes to the convolutive mixing of the observed signals. Because this transfer function depends only on the frequency characteristic of the directional microphone, it is expected to be of lower order than the transfer function obtained by the conventional method without closely placed microphones, and the number of parameters to be estimated in the separation is expected to be smaller. In this way, the present method brings the convolutive mixture to a mixed state that is close to an instantaneous mixture.

*It is experimentally shown in Ref. 9 that Eq. (10) holds approximately even if there are deviations in the heights of the microphones as well as in reverberation.
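The consequence of identical transfer paths can be illustrated numerically. In the sketch below, which is only an illustration, the directivity is assumed to be a frequency-flat cardioid and the source directions of ±45° are hypothetical (neither is taken from the paper), while the room responses H1 and H2 are long random filters. Because the mixing matrix then factors into a directivity matrix G times a diagonal matrix of room responses, a single matrix inverse of G separates every bin.

```python
import numpy as np

def cardioid_gain(theta_rad):
    # Hypothetical frequency-flat cardioid directivity (an assumption).
    return 0.5 * (1.0 + np.cos(theta_rad))

def close_mixing_matrices(H1, H2, mic_dirs, src_dirs):
    """Per-bin mixing matrix A(w) = G diag(H1(w), H2(w)) for co-located microphones.

    H1, H2: (F,) room responses from source 1 and source 2 to the common point.
    Returns A with shape (F, 2, 2) and the directivity matrix G with shape (2, 2).
    """
    # G[j, i]: gain of microphone j toward source i
    G = np.array([[cardioid_gain(src - mic) for src in src_dirs]
                  for mic in mic_dirs])
    H_diag = np.zeros((len(H1), 2, 2), dtype=complex)
    H_diag[:, 0, 0] = H1
    H_diag[:, 1, 1] = H2
    return G @ H_diag, G          # (2, 2) @ (F, 2, 2) broadcasts over bins

rng = np.random.default_rng(0)
H1 = np.fft.fft(rng.standard_normal(512))     # long room response, source 1
H2 = np.fft.fft(rng.standard_normal(512))     # long room response, source 2
A, G = close_mixing_matrices(H1, H2,
                             mic_dirs=np.deg2rad([30.0, -30.0]),
                             src_dirs=np.deg2rad([45.0, -45.0]))
P = np.linalg.inv(G) @ A                      # one short separator for all bins
print(np.allclose(P[:, 0, 1], 0.0), np.allclose(P[:, 1, 0], 0.0))
```

The outputs are the sources filtered by H1 and H2, but they are no longer mixed. With real microphones gj depends on frequency, so a short filter rather than a single matrix is needed, which is the tap reduction discussed in this section.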


4. Experiment 1

When BSS is applied in the time domain to the mixture signal observed by the microphones placed close together, it is shown in Ref. 9 that the number of taps needed for separation is reduced.

In order to verify the performance when BSS is applied in the frequency domain to a mixture signal observed by closely spaced microphones, experiments were performed with varying filter lengths.

As performance metrics, the separation performance (SNR), the computation time, and the permutation problem are addressed. The SNR can be found from the following, if s(t) and n(t) are the target signal sound and the interference sound:

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_t s^2(t)}{\sum_t n^2(t)} \quad [\mathrm{dB}] \tag{12}$$

When the separation performance is derived, the output order is corrected manually in any frequency bin where it is wrong, so that the separation performance is obtained without permutation error. When the source signals are played back one at a time, recorded with the microphones, and passed through the separation filter, it can be determined how much of each source signal is present in Yi(ω, t). Hence, manual interchange is possible. Independently of the separation performance determined in this manner, the output sequence error rate is derived separately. Since more than 90% of the computation time is spent in updating the separation filter, the computation time considered in this paper is the number of iterations until the separation filter converges. The number of updating operations is defined as the average, over the frequency bins, of the number of iterations until Eq. (7) is satisfied.
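The SNR used as the separation performance metric can be computed as in the following minimal sketch. It assumes the usual energy-ratio definition in decibels, with the target and interference components of an output available separately, as they are when each source is played back alone. The function name is illustrative.

```python
import numpy as np

def snr_db(target, interference):
    """SNR in dB between the target component s(t) and the interference n(t)."""
    signal_energy = np.sum(np.square(target))
    noise_energy = np.sum(np.square(interference))
    return 10.0 * np.log10(signal_energy / noise_energy)

# A target with ten times the amplitude of the interference gives 20 dB.
n = np.ones(1000)
print(snr_db(10.0 * n, n))
```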

4.1. Experimental method

In the experiments, there were two microphones and two sources. The experiments were carried out in both a real environment (a room with reverberation) and a room without reverberation with mixed signals recorded by a proximity arrangement and a nonproximity arrangement. The room with reverberation had a width of 7.4 m, a depth of 14.0 m, and a height of 2.7 m, and the reverberation time was 0.55 second. The microphones and sources were placed at the center of the room as shown in Figs. 2(a) and 2(b). Figures 3 and 4 present the directivity and the frequency characteristics of the microphones used. The orientations of the microphones were 30° and –30° in the proximity arrangement and 0° for both in the nonproximity arrangement, where the frontal direction of the microphone array was chosen as 0°. The two sources were 12 combinations of four voices (two male Japanese and two female Japanese voices). The sampling frequency was 16 kHz and the measurement time was 10 seconds. The mixed signals obtained in this manner were separated by changing the length of the estimated separation filter as 4, 8, 16, 32, 64, 128, and 256.

Fig. 2. The arrangement of microphones and speakers.

Fig. 3. The directivity pattern of the used microphone.

Fig. 4. The frequency characteristics of the used microphone.

The procedure for the separation processing was as follows.

• X(ω, m) was derived by taking the Fourier transform of the mixed signal x(t) divided into short time frames.

• By using the separation algorithm described in Section 2, W(ω) was obtained. The parameters used were µ = 0.1 and η = 100. The initial value of the first frequency bin of W(ω) was given appropriately. The initial values of the subsequent bins were the converged values of the adjacent bin.

• The scaling problem was resolved by the minimal distortion principle. A correct solution was given to frequency bins with incorrect output sequence.

• By taking the inverse Fourier transform of the separation signal Y(ω, m) derived in the frequency domain, the time domain separation signal y(t) was obtained.
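The second step of the procedure, in which each bin starts from the converged value of the adjacent bin, can be sketched as a loop over frequency bins. The following is an illustration under assumptions, not the authors' implementation: the nonlinearity is the polar form tanh(η|Y|) exp(j arg Y) after Ref. 14, and the stopping rule (largest element of |∆W| below `tol`) together with `tol` and `max_iter` are hypothetical stand-ins, since the exact convergence criterion is not reproduced here.

```python
import numpy as np

def separate_bins(X, W_init, mu=0.1, eta=100.0, tol=1e-4, max_iter=500):
    """Estimate W(w) bin by bin, warm-starting each bin from the previous one.

    X: (F, N, frames) short-time spectra of the mixed signals.
    Returns W with shape (F, N, N) and the iteration count of each bin.
    """
    F, N, n_frames = X.shape
    W_all = np.zeros((F, N, N), dtype=complex)
    iterations = np.zeros(F, dtype=int)
    W = np.array(W_init, dtype=complex)       # initial value for the first bin
    for f in range(F):
        for it in range(1, max_iter + 1):
            Y = W @ X[f]
            phi = np.tanh(eta * np.abs(Y)) * np.exp(1j * np.angle(Y))
            delta = mu * (np.eye(N) - phi @ Y.conj().T / n_frames) @ W
            W = W + delta
            if np.max(np.abs(delta)) < tol:   # assumed stopping rule
                break
        W_all[f] = W                          # W carries over to bin f + 1
        iterations[f] = it
    return W_all, iterations
```

The mean of `iterations` over the bins corresponds to the computation time measure used in this paper. Scaling by the minimal distortion principle and the inverse transform would follow as separate steps.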

4.2. Experimental results and discussion

4.2.1. Relationship between number of taps and separation performance

Figures 5 and 6 show the separation performance (SNR) when the solution to the permutation problem is given and no output sequence error is generated. Figure 5 shows the results in a room with reverberation and Fig. 6 shows those for the case without reverberation. The horizontal axis represents the filter length. The SNR prior to separation was 3.41 and 0.94 dB in the room with reverberation for the proximity and nonproximity arrangements and 4.39 and 1.31 dB in an anechoic chamber for the proximity and nonproximity arrangements.

In the case with reverberation, the separation performance was better in the nonproximity arrangement if the number of taps was large. In order to achieve separation in the case with a complex mixing process including reverberation, the information on the directivity characteristic of the microphone is not sufficient. Information on the arrival time difference between the microphones is considered to be necessary. In contrast, in the anechoic chamber, the mixing process is simple and the information on the directivity characteristic is sufficient. Hence the separation performance is considered to be better in the proximity arrangement. The above results show that the frequency domain BSS in the proximity arrangement is useful in an anechoic chamber and in calm outdoor conditions with almost no wind or reverberation.

Fig. 5. Separation performance versus filter length (in a reverberation chamber).

Fig. 6. Separation performance versus filter length (in an anechoic chamber).

Next, let us compare the improvement of the separation performance. In each of the combinations of the reverberation room and anechoic chamber with proximity and nonproximity arrangements, the SNR at 256 taps was considered as the saturated value of the separation performance. The separation performance reaches 90% of the saturated value at 32 taps for the proximity arrangement and at 64 taps for the nonproximity arrangement in the reverberation room, and at 64 taps for the proximity arrangement and 128 taps for the nonproximity arrangement in the anechoic chamber. The 90% values are used instead of the saturated values for evaluation because accurate specification of the saturated values is difficult and a deterioration of the separation performance by about 10% from the saturated value is not considered seriously detrimental. In the reverberation room, the separation performance is better with fewer taps in the proximity arrangement than in the nonproximity arrangement. Thus, the reduction of the number of taps demonstrated for the proximity arrangement in the time domain in Ref. 9 is also demonstrated in the frequency domain BSS.
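The 90% criterion used above can be expressed as a small helper. This is a sketch under the assumption that the 90% level is taken on the SNR values in dB and that the SNR at the longest filter is the saturated value. The numbers in the usage example are hypothetical and are not read from Figs. 5 and 6.

```python
def taps_at_fraction(taps, snr_values_db, fraction=0.9):
    """Smallest filter length whose SNR reaches `fraction` of the saturated SNR.

    taps and snr_values_db are ordered by increasing filter length; the last
    SNR value (longest filter) is treated as the saturated value.
    """
    saturated = snr_values_db[-1]
    for n_taps, snr in zip(taps, snr_values_db):
        if snr >= fraction * saturated:
            return n_taps
    return taps[-1]

# Hypothetical SNR curve for illustration only.
taps = [4, 8, 16, 32, 64, 128, 256]
snr_curve = [2.0, 4.0, 6.5, 9.2, 9.8, 9.9, 10.0]
print(taps_at_fraction(taps, snr_curve))
```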

4.2.2. Computation time

Figures 7 and 8 present comparisons of the computation time. The computation time was evaluated in terms of the number of iterations until the calculation of the separation filter by Eq. (5) converged. In either room, the convergence was faster in the proximity arrangement than in the nonproximity arrangement regardless of the number of taps. This indicates that the mixtures are more similar in adjacent bins in the case of the proximity arrangement than in the nonproximity arrangement. Also, in the case of the reverberation room, the number of iterations reached a minimum at a number of taps for which the separation performance was about 90% of the saturated value. In the anechoic chamber, the results were similar to those for the reverberation room. The reason is considered to be as follows. If the number of taps is small, the number of taps of the separation filter is not sufficient, so that convergence to an optimum separation filter is not reached and the number of iterations increases. If the number of taps is large, the number of iterations increases because the extra filter coefficients fail to converge.

However, in the case of the proximity arrangement in an anechoic chamber, the number of iterations decreases with the number of taps, so that the results are different from those above.

Fig. 7. Iteration times versus filter length (in a reverberation chamber).

Fig. 8. Iteration times versus filter length (in an anechoic chamber).

4.2.3. Permutation problem

Figures 9 and 10 show the permutation error rate. The permutation error rate is high in the nonproximity arrangement but there is little error in the proximity arrangement, with negligible influence on the separation performance. The reason is as follows. In the nonproximity arrangement, the complex transfer characteristics from the source to the microphone caused by reverberation must be taken into account. On the other hand, in the proximity arrangement, there is no need to consider the transfer characteristics from the source to the microphone; only the frequency characteristics of the microphone must be considered. Further, as shown in Fig. 4, the frequency characteristics of the microphone are not steep. Hence, the use of the converged value in the previous frequency bin as the initial value for each frequency bin is considered effective in the derivation of the separation filter.

Fig. 9. Permutation error rate versus filter length (in a reverberation chamber).

Fig. 10. Permutation error rate versus filter length (in an anechoic chamber).
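A permutation error rate of the kind plotted in Figs. 9 and 10 can be computed once the contribution of each source to each output is known per bin, for example by passing each source alone through the separation filter. The sketch below covers the two-source case and is illustrative only; the array layout and the function name are assumptions.

```python
import numpy as np

def permutation_error_rate(P):
    """Fraction of frequency bins whose two outputs are in swapped order.

    P[f, i, j] is the power of source j found in output i at bin f (2 x 2 case).
    A bin counts as permuted when the swapped assignment carries more power
    than the straight one.
    """
    straight = P[:, 0, 0] + P[:, 1, 1]
    swapped = P[:, 0, 1] + P[:, 1, 0]
    return float(np.mean(swapped > straight))

# Four bins, of which the third has its outputs swapped.
P = np.array([
    [[9.0, 1.0], [1.0, 9.0]],
    [[8.0, 2.0], [1.0, 7.0]],
    [[1.0, 9.0], [8.0, 2.0]],
    [[6.0, 1.0], [2.0, 9.0]],
])
print(permutation_error_rate(P))
```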

5. Experiment 2

The number of frames used for iteration of the separation matrix is inversely proportional to the shift used in the short-time Fourier transform. The more frames are used for iteration of the separation matrix, the higher the reliability of separation, but the longer the time for iteration. In order to study the relationship between the number of frames and the performance, the following experiment was conducted.

5.1. Experimental method

In order to study the relationship between the performance and the number of frames used in the iterations of the separation matrix, separation experiments were carried out for the case in which the amount of shift of the frame was one-half of the filter length and for the case in which the shift was fixed at 8 points (0.5 ms) regardless of the filter length, when the Fourier transform was carried out for each short time frame. The filter lengths for evaluation of the separation performance were 16, 32, 64, 128, and 256. In the experiment, only a reverberation room was used. The other experimental conditions were identical to those in Experiment 1.
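The effect of the shift on the number of frames can be worked out directly. The sketch below assumes that only complete frames are used, with no padding at the ends of the signal, which is an assumption about the framing. The signal length of 160,000 samples follows from the 16 kHz sampling frequency and the 10 second measurement time.

```python
def num_frames(n_samples, frame_len, shift):
    """Number of complete frames of length frame_len taken every `shift` samples."""
    if n_samples < frame_len:
        return 0
    return (n_samples - frame_len) // shift + 1

n_samples = 16000 * 10                      # 10 s at 16 kHz
for L in (16, 32, 64, 128, 256):
    half_shift = num_frames(n_samples, L, L // 2)
    fixed_shift = num_frames(n_samples, L, 8)
    print(L, half_shift, fixed_shift)
```

For a filter length of 256, the fixed shift of 8 points gives 19,969 frames against 1,249 for the half-length shift, roughly 16 times as many, while for a filter length of 16 the two settings coincide.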

5.2. Experimental results and discussion

In Figs. 11 and 12 the separation performance is compared between a shift of one-half of the filter length and a constant shift, for the proximity and nonproximity arrangements in a reverberation room. It is evident that the separation performance differs little. Next, a comparison of the computation time is shown in Tables 1 and 2. In the tables, L is the filter length of the separation filter. To eliminate dependence on the computational environment, the computation time was normalized so that the maximum was 1. The time required for iteration of the separation matrix in the case of a constant shift is much larger than that when the shift is one-half of the filter length. This is because the number of frames used for separation increases if the amount of shift is constant.

Fig. 11. Comparison of separation performance between variable and fixed shifts (close microphone arrangement).

Fig. 12. Comparison of separation performance between variable and fixed shifts (separated microphone arrangement).

Table 1. Comparison of computation time between variable and fixed shifts (close microphone arrangement)

Table 2. Comparison of computation time between variable and fixed shifts (separated microphone arrangement)

Since sound information for the prior word is needed when a human generates voice, a certain level of overlap between frames is considered necessary. The above result shows that the amount of shift for the Fourier transform of each short time frame can be one-half of the filter length (in this case, the overlap between the frames is one-half of the filter length). However, since a sufficiently long data sequence of 10 seconds was used in this experiment, the statistics are considered sufficient even if the size of the shift is one-half of the filter length. The detailed relationship between the data length and the amount of shift remains to be determined.

6. Conclusions

Performance evaluations of frequency domain BSS for directional microphones were carried out from the points of view of separation performance, computation time, and permutation. First, with regard to separation performance, the nonproximity arrangement is superior with many taps and the proximity arrangement is better with fewer taps in a reverberation room. In an anechoic chamber, the proximity arrangement is superior. With regard to the computational time, the proximity arrangement is better than the nonproximity arrangement regardless of the number of taps in both types of rooms. The permutation error is substantially smaller and the influence on the separation performance is smaller in the proximity arrangement than in the nonproximity arrangement.

When the Fourier transform is taken by segmenting the signal into short time frames, the amount of shift per frame is varied so that the number of frames used for separation is changed. As the number of frames used for separation is increased, the reliability of separation is improved. However, the number of frames has a negligible effect on the separation performance.

Acknowledgment. This work was supported in part by the Ministry of Education and Science 21st Century COE Program on “Intelligent Media Integration for Social Information Infrastructure.”

REFERENCES

1. Lee T-W. Independent component analysis theory and applications. Kluwer Academic Publishers; 1998.

2. Hyvärinen A, Karhunen J, Oja E. Independent component analysis. John Wiley & Sons; 2001.

3. Mansour A, Barros AK, Ohnishi N. Blind separation of sources: Methods, assumptions and applications. IEICE Trans Fundam 2000;E83-A:1498–1511.

4. Matsuoka K, Ohya M, Kawamoto M. A neural net for blind separation of nonstationary signals. Neural Networks 1995;8:411–419.

5. Kawamoto M, Barros AK, Mansour A, Matsuoka K, Ohnishi N. Blind signal separation for convolved non-stationary signals. IEICE Trans 1999;J82-A:1320–1328.

6. Kawamoto M, Matsuoka K, Ohnishi N. A method of blind separation for convolved nonstationary signals. Neurocomput 1998;22:157–171.

7. Kawamoto M, Matsuoka K, Ohnishi N. Method of blind separation in frequency domain. Tech Rep IEICE EA99-7, 1999.

8. Ikeda S, Murata N. A method of ICA in time-frequency domain. Proc ICA ’99, p 364–370.

9. Katayama Y, Ito M, Takeuchi Y, Matsumoto T, Kudo H, Ohnishi N, Mukai T. Closely arranged directional microphone for source separation: Effectiveness in reduction of the number of taps and preventing factors. Proc ICA2004, p 129–135.

10. Taniguchi T, Kajita S, Yenia H, Takeda K, Itakura F. Evaluation of the source separation method based minimizing mutual information under real environments. Tech Rep IEICE 1996;SP96-61.

11. Taniguchi T, Kajita S, Takeda K, Itakura F. Blind signal separation for recognizing overlapped speech. J Acoust Soc Jpn (E) 1998;19:385–390.

12. Sanchis JM, Castells F, Rieta JJ. Convolutive acoustic mixtures approximation to an instantaneous model using a stereo boundary microphone configuration. ICA2004, p 881–888.

13. Ito M, Kawamoto M, Mukai T, Ohnishi N. A solution to blind separation of moving sources. Trans Soc Instrum Control Eng 2005;41:691–701.

14. Sawada H, Mukai R, Araki S, Makino S. Polar coordinate based nonlinear function for frequency domain blind source separation. IEICE Trans Fundam 2003;E86-A:590–596.

15. Matsuoka K, Nakashima S. Minimal distortion principle for blind source separation. Proc ICA2001, p 722–727.


AUTHORS (from left to right)

Hiroki Shimizu (student member) graduated from the Department of Information Engineering, Nagoya University, in 2004 and entered the M.S. program at the Graduate School of Information Science. He has been engaged in research on sound source separation.

Masanori Ito (student member) graduated from the Department of Information Engineering, Nagoya University, in 2001, completed the M.S. program at the Graduate School of Engineering, and enrolled in the doctoral program at the Graduate School of Information Science. Since 2005, he has been a research fellow of the Japan Society for the Promotion of Science. His research concerns sound source separation.

Yoshinori Takeuchi (member) graduated from the Department of Information Engineering, Nagoya University, in 1994, completed the M.S. program in information engineering and the doctoral program in 1996 and 1999, and became a research fellow of the Japan Society for the Promotion of Science. In 2000, he became a research associate at Nagoya University, where he was appointed an associate professor in 2004. He is now an associate professor in the Information Security Promotion Organization, Nagoya University. He has been engaged in research and education on active vision and robotics. He holds a D.Eng. degree, and is a member of the Japan Robotics Society and IEEE.

Tetsuya Matsumoto (member) graduated from the Department of Electrical Engineering, Nagoya University, in 1982, completed the M.S. program in 1984, and joined Toshiba. He completed the doctoral program at Nagoya University in 1993 and became a research associate at the Information Processing Education Center of Nagoya University. He is now a research associate at the Graduate School of Information Science, Nagoya University. He has been engaged in research on image processing and neural networks. He holds a D.Eng. degree, and is a member of the Artificial Intelligence Society and the Japan Neural Network Society.

Hiroaki Kudo (member) graduated from the Department of Electrical Engineering, Nagoya University, in 1991, completed the M.S. program in electrical engineering and the doctoral program in 1993 and 1996, and became a research associate at Nagoya University. He became a lecturer in 1999 and an associate professor in 2000. He has been engaged in research on measurement and modeling of visual information processing functions and computer vision. He holds a D.Eng. degree.

Noboru Ohnishi (member) graduated from the Department of Electrical Engineering, Nagoya University, in 1973, completed the M.S. program in 1975, and became a research staff member at the Disaster Rehabilitation Engineering Center of the Labor and Welfare Association Labor. In 1986, he became a lecturer on the Faculty of Engineering, Nagoya University, where he was subsequently appointed an associate professor. From 1993 to 2001, he had an adjunct appointment as a team leader at the Biometric Control Research Center of RIKEN. In 1994, he became a professor in the Graduate School of Information Science. He has been engaged in research and education on biological information processing, computer vision and hearing, and welfare engineering. He is a recipient of Measurement and Automatic Control Society Awards (Technical Award in 1996 and Best Paper Award in 1999). He holds a D.Eng. degree, and is a member of the Information Processing Society, IEEJ, SICE, the Robotics Society, the Japan Neural Network Society, the Image and Information Media Society, and IEEE.
