regularized superresolution-based binaural signal separation with nonnegative matrix factorization
TRANSCRIPT
Regularized Superresolution-Based
Binaural Signal Separation
with Nonnegative Matrix Factorization
Daichi Kitamura, Hiroshi Saruwatari,
Yusuke Iwao, Kiyohiro Shikano
(Nara Institute of Science and Technology, Nara, Japan)
Kazunobu Kondo, Yu Takahashi
(Yamaha Corporation Research & Development Center, Shizuoka, Japan)
Outline
• 1. Research background
• 2. Conventional method
– Nonnegative matrix factorization
– Penalized supervised nonnegative matrix factorization
– Directional clustering
– Hybrid method
• 3. Proposed method
– Regularized superresolution-based nonnegative matrix
factorization
• 4. Experiments
• 5. Conclusions
Background
• Music signal separation technologies have received much attention.
– Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) has been a very active area of research.
• The extraction performance of NMF markedly degrades when the mixture contains many sources.
We propose a new method for multichannel signal separation with NMF that utilizes both the spectral and spatial cues included in mixtures of multiple instruments.
NMF
• NMF is a type of sparse representation algorithm that
decomposes a nonnegative matrix into two nonnegative
matrices. [D. D. Lee, et al., 2001]
[Figure: the observed matrix 𝒀 (spectrogram; frequency × time, amplitude) is approximated by the product of the basis matrix 𝑭 (spectral bases) and the activation matrix 𝑮 (time-varying gains): 𝒀 ≈ 𝑭𝑮.]
Ω: Number of frequency bins
𝑇: Number of frames
𝐾: Number of bases
𝒀: Observed matrix
𝑭: Basis matrix
𝑮: Activation matrix
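The decomposition of 𝒀 into 𝑭 and 𝑮 can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the squared Euclidean distance and the standard multiplicative updates of Lee and Seung; the slides do not state which distance is used, and the function names `nmf` and `demo_nmf` are hypothetical.

```python
import numpy as np

def nmf(Y, K, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix Y (Omega x T) as F (Omega x K) @ G (K x T)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, K)) + eps   # spectral bases
    G = rng.random((K, n_frames)) + eps # time-varying gains
    for _ in range(n_iter):
        # Multiplicative updates: every factor stays nonnegative.
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
    return F, G

def demo_nmf():
    """Factorize a toy rank-3 'spectrogram' and report the relative error."""
    rng = np.random.default_rng(1)
    Y = rng.random((16, 3)) @ rng.random((3, 40))
    F, G = nmf(Y, K=3)
    rel_err = np.linalg.norm(Y - F @ G) / np.linalg.norm(Y)
    return F, G, rel_err
```

Here Ω = 16, T = 40, and K = 3 are arbitrary toy sizes.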
Penalized Supervised NMF (PSNMF)
• In PSNMF, the observed matrix 𝒀 is decomposed under the condition that the supervised basis matrix 𝑭 of the target sound is known in advance. [Yagi, et al., 2012]
– Training process: the supervised bases 𝑭 of the target sound are trained from a supervision sound.
– Separation process: fix the trained bases 𝑭 and update the other matrices. The bases of the other sounds are forced to become uncorrelated with 𝑭.
Problem of PSNMF: When the signal includes many sources,
the extraction performance markedly degrades.
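The separation process of PSNMF can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of [Yagi, et al., 2012]: the model is taken to be 𝒀 ≈ 𝑭𝑮 + HU, where the names H (bases of the other sounds) and U (their activations) are introduced here, the distance is squared Euclidean, and the penalty is assumed to be mu · ‖𝑭ᵀH‖² with a hypothetical weight `mu`.

```python
import numpy as np

def psnmf_separate(Y, F, L, mu=0.1, n_iter=200, eps=1e-12, seed=0):
    """Fit Y ~= F @ G + H @ U with the supervised bases F held fixed.

    H (Omega x L) and U (L x T) model the non-target sounds. The assumed
    penalty mu * ||F.T @ H||_F^2 discourages H from resembling F.
    Returns the target estimate F @ G and the remainder H @ U.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, L)) + eps
    U = rng.random((L, n_frames)) + eps
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y) / (F.T @ X + eps)
        X = F @ G + H @ U
        # The penalty gradient mu * F @ F.T @ H goes into the denominator.
        H *= (Y @ U.T) / (X @ U.T + mu * (F @ (F.T @ H)) + eps)
        X = F @ G + H @ U
        U *= (H.T @ Y) / (H.T @ X + eps)
    return F @ G, H @ U
```

Only G, H, and U are updated; F never changes, which is what makes the method supervised.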
Directional Clustering
• Directional clustering can estimate the sources and their directions in a multichannel signal. [Araki, et al., 2007] [Miyabe, et al., 2009]
• This method can separate sources by using the spatial information in an observed signal.
[Figure: source components plotted by L-ch input signal against R-ch input signal and grouped around centroid vectors.]
Problem of directional clustering:
This method cannot separate sources in the same direction.
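A minimal sketch of directional clustering, assuming k-means with cosine similarity on unit-length amplitude vectors (|L|, |R|). The slides do not specify the clustering algorithm, so this choice, the centroid initialization, and the function name `directional_clustering` are assumptions.

```python
import numpy as np

def directional_clustering(L_mag, R_mag, n_clusters=3, n_iter=20, eps=1e-12):
    """Cluster time-frequency bins by direction; return labels and centroids.

    Each bin is the 2-D vector (|L|, |R|) scaled to unit length, so only the
    level ratio between the channels (the panning direction) matters.
    """
    X = np.stack([L_mag.ravel(), R_mag.ravel()], axis=1)
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    # Start with centroids spread between "all left" (0) and "all right" (pi/2).
    angles = np.linspace(0.0, np.pi / 2, n_clusters + 2)[1:-1]
    C = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    for _ in range(n_iter):
        labels = np.argmax(X @ C.T, axis=1)  # nearest centroid by cosine similarity
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):
                m = members.sum(axis=0)
                C[c] = m / (np.linalg.norm(m) + eps)
    labels = np.argmax(X @ C.T, axis=1)
    return labels.reshape(L_mag.shape), C
```

With three clusters, label 0 is the left side, 1 the center, and 2 the right side, so `labels == 1` gives a binary mask for the center direction. Two sources panned to the same direction receive the same label, which is the limitation stated above.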
Hybrid method
• The conventional hybrid method applies PSNMF after directional clustering. [Iwao, et al., 2012]
• This method consists of two techniques.
– Directional clustering (spatial separation)
– PSNMF (source separation)
[Figure: conventional hybrid method; the L and R channels pass through directional clustering and then PSNMF.]
Problem of hybrid method
• The signal extracted by the hybrid method suffers from considerable distortion caused by the binary masking in directional clustering.
• The signal in the target direction, which is obtained by directional clustering, has many spectral chasms.
• As a result, the resolution of the spectrogram is degraded.
[Figure: directional clustering. The input spectrogram is multiplied by a binary mask using the Hadamard product (product of each element) to give the separated cluster. In the mask, 1 marks the target direction and 0 marks other directions.]
Binary mask (frequency × time):
1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0
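The masking step can be reproduced with the binary mask shown on the slide. The sketch assumes that the rows are frequency bins and the columns are time frames; the function names are hypothetical.

```python
import numpy as np

# Binary mask transcribed from the slide (rows: frequency, columns: time).
MASK = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0],
])

def apply_binary_mask(spectrogram, mask):
    """Hadamard (element-wise) product of a spectrogram and a binary mask."""
    return spectrogram * mask

def chasms_per_frame(mask):
    """Number of masked-out bins (chasms) in each time frame."""
    return (1 - mask).sum(axis=0)
```

For this mask, 26 of the 42 bins are set to zero, which illustrates how much of the target spectrogram can be lost as chasms.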
Proposed hybrid method
[Figure: signal flow of the two hybrid methods.
Conventional hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component → PSNMF (L-ch and R-ch) → ISTFT → mixing → extracted signal.
Proposed hybrid method: input stereo signal (L-ch, R-ch) → STFT → directional clustering → center component and index of center cluster → superresolution-based SNMF (L-ch and R-ch) → ISTFT → mixing → extracted signal.]
The proposed hybrid method employs a new supervised NMF algorithm as an alternative to the conventional PSNMF.
Regularized superresolution-based NMF
• In the proposed supervised NMF, the spectral chasms are treated as unseen observations by using an index matrix.
[Figure: the separated cluster (frequency × time) contains chasms, which are treated as unseen observations and recorded in the index matrix.]
Index matrix (frequency × time):
1 0 0 0 0 0 0
0 1 1 0 0 1 1
1 0 0 0 0 0 0
0 1 0 1 1 0 1
1 0 0 0 0 0 0
1 1 1 0 1 1 0
Regularized superresolution-based NMF
• The spectrogram of the target sound is reconstructed using better-matched bases because the chasms are treated as unseen.
• The components of the target sound that were lost in directional clustering can be extrapolated using the supervised bases.
[Figure: superresolution using the supervised bases restores the separated cluster, which contains chasms, to the reconstructed spectrogram (frequency × time).]
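The extrapolation idea can be sketched with a simplified model. The assumptions here are that the cluster contains only the target sound, the distance is squared Euclidean, and the index matrix uses 1 for observed bins and 0 for chasms; the function names are hypothetical and this is not the full algorithm of the slides.

```python
import numpy as np

def superresolution_snmf(Y, I, F, n_iter=500, eps=1e-12, seed=0):
    """Estimate activations G from observed bins only, then extrapolate with F @ G.

    Y : separated cluster (Omega x T), zero at the chasms
    I : index matrix (Omega x T), 1 = observed, 0 = chasm (unseen)
    F : supervised bases (Omega x K), kept fixed
    """
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + eps
    for _ in range(n_iter):
        # Only bins with I == 1 contribute to the fit.
        G *= (F.T @ (I * Y)) / (F.T @ (I * (F @ G)) + eps)
    return F @ G, G

def demo_superresolution():
    """Relative error of the restored spectrogram inside the chasms."""
    rng = np.random.default_rng(0)
    F = rng.random((20, 3))
    target = F @ rng.random((3, 30))                     # ground-truth spectrogram
    I = (rng.random(target.shape) < 0.6).astype(float)   # about 40% chasms
    restored, _ = superresolution_snmf(I * target, I, F)
    chasm = 1.0 - I
    return np.linalg.norm(chasm * (target - restored)) / np.linalg.norm(chasm * target)
```

Because the chasms are excluded from the fit instead of being treated as zeros, F @ G fills them in with values consistent with the observed bins of the same frame.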
Regularized superresolution-based NMF
• Signal flow of the proposed hybrid method
[Figure: three histograms of the frequency of source components over direction (left, center, right).
(a) Observed spectra: the target source lies in the target (center) direction.
(b) After directional clustering: center sources lose some of their components.
(c) After superresolution-based SNMF: the target source is extrapolated.]
Regularized superresolution-based NMF
• Basis extrapolation has an inherent problem.
• If almost all time-frequency components of a frame are unseen, which means that the index entries are almost all zero, a large extrapolation error may occur.
• It is therefore necessary to regularize the extrapolation.
[Figure: spectrogram (frequency 0 to 4 kHz, time 0 to 4 s) showing an extrapolation error, in which the activation is incorrectly modified, at an almost unseen frame of the separated cluster.]
Regularized superresolution-based NMF
• We propose two types of regularizations.
– Regularization of the temporal continuity (refers to the previous frame)
– Regularization of the norm minimization
𝑰: Index matrix
𝑖𝜔,𝑡: Entry of index matrix 𝑰
𝑔𝑘,𝑡: Entry of matrix 𝑮
𝑓𝜔,𝑘: Entry of matrix 𝑭
Overline: Binary complement
The intensity of these regularizations is proportional to the number of chasms in each frame.
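One plausible reading of the two regularizations is sketched below. The exact formulas are assumptions chosen to be consistent with the symbols listed above: the temporal-continuity term weights the squared change of the activations between adjacent frames by the number of chasms in the frame, and the norm-minimization term sums the squared extrapolated spectrum 𝑭𝑮 over the chasms. The function names are hypothetical.

```python
import numpy as np

def chasm_weights(I):
    """Number of chasms (zeros of the index matrix) in each frame."""
    return (1.0 - I).sum(axis=0)

def temporal_continuity_penalty(G, I):
    """Sum over frames t >= 1 of (#chasms in t) * ||g_t - g_{t-1}||^2."""
    w = chasm_weights(I)[1:]
    diff = G[:, 1:] - G[:, :-1]
    return float((w * (diff ** 2).sum(axis=0)).sum())

def norm_minimization_penalty(F, G, I):
    """Squared magnitude of the extrapolated spectrum F @ G inside the chasms."""
    return float((((1.0 - I) * (F @ G)) ** 2).sum())
```

Both terms vanish when the index matrix is all ones and grow with the number of chasms in a frame, so frames that are almost unseen are regularized most strongly.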
Regularized superresolution-based NMF
• The cost function in regularized superresolution-based NMF is defined using the index matrix. It contains
– a regularization term,
– a penalty term to force 𝑭 and the bases of the other sounds to become uncorrelated with each other, and
– weighting parameters for these terms.
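A cost function with this structure could be written as below. This is a sketch and not the equation from the slide: the symbols H and U (bases and activations of the other sounds), the divergence D, the regularizer R, and the weights λ and μ are names assumed here for illustration.

```latex
\mathcal{J} = \sum_{\omega,t} i_{\omega,t}\,
  \mathcal{D}\!\left( y_{\omega,t} \,\middle\|\,
    \sum_{k} f_{\omega,k}\, g_{k,t} + \sum_{l} h_{\omega,l}\, u_{l,t} \right)
  + \lambda\, \mathcal{R}(\boldsymbol{G})
  + \mu\, \left\| \boldsymbol{F}^{\mathsf{T}} \boldsymbol{H} \right\|_{\mathrm{F}}^{2}
```

The index entry multiplies the divergence, so chasms (index 0) do not contribute to the fit, while the regularization and penalty terms are added with their own weights.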
Regularized superresolution-based NMF
• The update rules that minimize the cost function are obtained
as follows:
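As a rough illustration of update rules of this kind, the sketch below uses multiplicative updates derived for a squared-Euclidean version of the cost with norm-minimization regularization. These are not necessarily the rules shown on the slide: the distance, the names H, U, `lam`, and `mu`, and the exact form of the regularizer are all assumptions.

```python
import numpy as np

def regularized_sr_snmf(Y, I, F, L=2, lam=0.1, mu=0.1, n_iter=300, eps=1e-12, seed=0):
    """Masked supervised NMF: Y ~= F @ G + H @ U on observed bins, F fixed.

    lam weights an assumed norm-minimization term on F @ G inside the chasms;
    mu weights the assumed penalty ||F.T @ H||_F^2.
    Returns the extrapolated target F @ G and the remainder H @ U.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    Ibar = 1.0 - I                      # binary complement of the index matrix
    IY = I * Y
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, L)) + eps
    U = rng.random((L, n_frames)) + eps
    for _ in range(n_iter):
        X = F @ G + H @ U
        # Positive gradient parts go to the denominator, negative to the numerator.
        G *= (F.T @ IY) / (F.T @ (I * X) + lam * (F.T @ (Ibar * (F @ G))) + eps)
        X = F @ G + H @ U
        H *= (IY @ U.T) / ((I * X) @ U.T + mu * (F @ (F.T @ H)) + eps)
        X = F @ G + H @ U
        U *= (H.T @ IY) / (H.T @ (I * X) + eps)
    return F @ G, H @ U
```

Setting `lam=0` removes the regularization, which corresponds to the unregularized variant of the method.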
Evaluation experiment
• We compared four methods.
– Conventional hybrid method using PSNMF (Conventional method)
– Proposed hybrid method using superresolution-based NMF without
regularization (Proposed method 1)
– Proposed hybrid method using superresolution-based NMF with
regularization of the temporal continuity (Proposed method 2)
– Proposed hybrid method using superresolution-based NMF with
regularization of the norm minimization (Proposed method 3)
Evaluation experiment
• We used stereo-panning signals and binaural-recorded signals containing four instruments, oboe (Ob.), flute (Fl.), trombone (Tb.), and piano (Pf.), generated by a MIDI synthesizer.
• The sources are mixed with equal power.
• The target source is always located in the center direction (no. 1).
• We used the same type of MIDI sounds of the target instruments as supervision for the training process.
– Supervision sound: two-octave notes that cover all notes of the target signal.
[Figure: layout of source positions no. 1 to 4 between left and right; the target source is at the center (no. 1).]
Experimental results (panning signal)
• Average SDR, SIR, and SAR scores for each method, where the 4 instruments are shuffled with 12 combinations.
[Figure: bar charts of SDR [dB] (axis 0 to 12), SIR [dB] (axis 0 to 24), and SAR [dB] (axis 0 to 10) for each method; higher is better.]
Proposed method 1: no regularization
Proposed method 2: regularization of temporal continuity
Proposed method 3: regularization of norm minimization
SDR: quality of the separated target sound
SIR: degree of separation between the target and other sounds
SAR: absence of artificial distortion
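The three scores can be computed along the lines of the usual BSS Eval decomposition. The sketch below is a simplified version that allows no time-invariant filtering of the sources; the slides do not say which implementation was used, so the function name and this simplification are assumptions.

```python
import numpy as np

def bss_eval_instantaneous(estimate, sources, target_index=0):
    """Return (SDR, SIR, SAR) in dB for one estimated signal.

    sources  : array (n_sources, n_samples) of the true source signals
    estimate : array (n_samples,) of the separated target signal
    """
    target = sources[target_index]
    # Part of the estimate explained by the true target.
    s_target = (estimate @ target) / (target @ target) * target
    # Projection of the estimate onto the span of all true sources.
    coeffs, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    projection = sources.T @ coeffs
    e_interf = projection - s_target   # leakage from the other sources
    e_artif = estimate - projection    # residual that no source explains

    def db(num, den):
        return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar
```

SDR accounts for both error terms, SIR only for interference from other sources, and SAR only for artifacts, which matches the qualitative descriptions above.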
Experimental results (binaural signal)
• Average SDR, SIR, and SAR scores for each method, where the 4 instruments are shuffled with 12 combinations.
[Figure: bar charts of SDR [dB] (axis 0 to 10), SIR [dB] (axis 0 to 20), and SAR [dB] (axis 0 to 6) for each method; higher is better.]
Proposed method 1: no regularization
Proposed method 2: regularization of temporal continuity
Proposed method 3: regularization of norm minimization
SDR: quality of the separated target sound
SIR: degree of separation between the target and other sounds
SAR: absence of artificial distortion
Conclusions
• We propose a new supervised NMF algorithm, a superresolution-based method, for the hybrid method to separate stereo or binaural signals.
• The proposed hybrid method separates the target signal with higher performance than the conventional method.
• The regularization of norm minimization is effective for the proposed supervised NMF algorithm.
Thank you for your attention!