-
GCT634: Musical Applications of Machine Learning
Introduction
Juhan Nam
Graduate School of Culture Technology, KAIST
-
Who We Are
• Instructor
- Juhan Nam (남주한)
- Assistant Professor in GSCT, KAIST
- Music and Audio Computing Lab: http://mac.kaist.ac.kr
• TAs
- Jongpil Lee (이종필), Ph.D. Student in GSCT, KAIST
- Soonbeom Choi (최순범), Ph.D. Student in GSCT, KAIST
-
Music
• One of the most widely enjoyed forms of cultural content and activity
KAIST Art and Music Festivals (KAMF)
http://times.kaist.ac.kr/news/articleView.html?idxno=1801
-
Music and Computers
• The use of computers is essential in musical activities
-
Music and Computers
• The role of computers is to represent musical data in a digital form and process it according to the target task.
Audio Compression
-
Music and Computers
• The role of computers is to represent musical data in a digital form and process it according to the target task.
Recording, Mixing, Digital Audio Effects (EQ, Reverb, …)
-
Music and Computers
• The role of computers is to represent musical data in a digital form and process it according to the target task.
Control Data
Sound Synthesis
-
Music and Computers
• The processor is designed by humans based on
- Digital signal processing (DSP): filters, Fourier transform, …
- Acoustics: psychoacoustics, musical acoustics, …
• The processor is deterministic and interpretable
Input data → Processor → Output data
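As a concrete illustration, such a deterministic and interpretable processor can be as simple as a moving-average low-pass filter; the sketch below (plain NumPy, with an arbitrary filter length and a synthetic test signal chosen purely for illustration) smooths a noisy input:

```python
import numpy as np

def moving_average(x, n=5):
    """Simple FIR low-pass filter: each output sample is the mean
    of an n-sample neighborhood of the input."""
    kernel = np.ones(n) / n
    return np.convolve(x, kernel, mode="same")

# A slow sine plus noise: the filter attenuates the fast noise
# while passing the slow sine component largely unchanged.
t = np.linspace(0, 1, 1000)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=t.size)
smoothed = moving_average(noisy, n=9)
```

The behavior is fully predictable from the kernel: there is no training, only a fixed, human-designed rule.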
-
Music Data and Process
[Figure: data/process model after F. Richard Moore — composer → symbolic representation → performer → temporal control → instrument → source sound → room → sound field → listener (perception/cognition); the composer and performer draw on a “musical” knowledge base, the instrument and room on a “physical” knowledge base.]
-
Music Data and Process
The role of computers has traditionally focused on the acoustic processing stages
-
Music Data and Process
However, computers can be used to conduct “human-level processing”
-
Human-Level Processing
-
Musical Information
• Factual information
- Track ID, artist, year, composer
• Score-level information
- Melody, rhythm, chords, structure
• Semantic information
- Genre, mood, instrument, text descriptions
-
Human Auditory System
http://www.slideshare.net/Daritsetseg/brainstem-auditory-evoked-responses-baer-or-abr-45762118
Brain (Software)
Ear (Hardware)
Design processors to mimic the functions of this complex system?
-
Music Processing Modules in Brain
…cognition [Per01, 03, Zat02, Ter_]. There are two main sources of evidence: studies with brain-damaged patients and neurological imaging experiments in healthy subjects.
Accidental brain damage in adulthood may selectively affect musical abilities but not, e.g., speech-related abilities, and vice versa. Moreover, studies of brain-damaged patients have revealed something about the internal structure of the music cognition system. Figure 2 shows the functional architecture that Peretz and colleagues have derived from case studies of specific music impairments in brain-damaged patients. The “breakdown pattern” of different patients was studied by presenting them with specific music-cognition tasks, and the model in Fig. 2 was then inferred based on the assumption that a specific impairment may be due to a damaged processing component (box) or a broken flow of information (arrow) between components. The detailed line of argument underlying the model can be found in [Per01].
In Fig. 2, the acoustic analysis module is assumed to be common to all acoustic stimuli (not just music) and to perform segregation of sound mixtures into distinct sound sources. The subsequent two entities carry out pitch organization and temporal organization. These two are viewed as parallel and largely independent subsystems, as supported by studies of patients who suffer from difficulties dealing with pitch variations but not with temporal variations, or vice versa [Bel99, Per01]. In music performance or in perception, either of the two can be selectively lost [Per01]. The musical lexicon is characterized by Peretz et al. as containing representations of all the musical phrases a person has heard during his or her lifetime [Per03]. In some cases, a patient cannot recognize familiar music but can still process musical information otherwise adequately.
Figure 2. Functional modules of the music processing facility in the human brain, as proposed by Peretz et al. (after [Per03]; only the parts related to music processing are reproduced here). The model has been derived from case studies of specific impairments of musical abilities in brain-damaged patients [Per01, 03]. See text for details.
[Figure 2 modules: acoustic input → acoustic analysis, feeding temporal organization (rhythm analysis, meter analysis) and pitch organization (contour analysis, interval analysis, tonal encoding); these connect to the musical lexicon and emotion expression analysis, leading to vocal plan formation, with singing and tapping as outputs.]
Klapuri, PhD thesis (2004); redrawn from Peretz and Coltheart (2003)
-
Machine Learning (supervised learning)
• The processor (called a model or hypothesis) is learned using
- Training data (input and output pairs)
- Optimization algorithms
• Given a new input, the output can be predicted by the model
• The model can be a black box, i.e. not interpretable (e.g. deep neural networks)

Input data → Processor (model) → Output data

How can machine learning be applied to music?
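The learning loop above can be sketched generically; the toy below uses closed-form least-squares linear regression as a stand-in for "training" a larger model, with made-up data (nothing music-specific is assumed):

```python
import numpy as np

# Training data: inputs paired with the desired outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

# "Optimization": here a closed-form least-squares fit plays the
# role that iterative training plays for a deep network.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Given new input, predict the output from the learned model.
x_new = np.array([1.0, 1.0, 1.0])
y_pred = x_new @ w  # close to 1.0 - 2.0 + 0.5 = -0.5
```

Unlike the hand-designed DSP processor, the mapping here is determined by the data, not by explicit rules.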
-
Music Listening
• The volume of music content in online streaming music services is enormous
- Spotify: 30M songs, 2B playlists, 20K new songs per day (2017)
- YouTube: 300+ hours of video uploaded per minute (2015)
-
Music Listening
• How can computers search or recommend songs that satisfy my musical taste or context?
Personal Information (musical taste, context)
-
Collaborative Filtering (CF)
• Basic idea
- User data: play history and song ratings
- Formulated as a matrix factorization problem
- Cold start: newly released songs cannot be recommended
- Popularity bias: songs in the short head tend to be recommended more than those in the long tail

Person A: I like songs A, B, C and D.
Person B: I like songs A, B, C and E.
Person A: Really? You should check out song D.
Person B: Wow, you also should check out song E.
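A minimal sketch of the matrix-factorization formulation, assuming a toy 4×4 user-song rating matrix and plain gradient descent (the latent dimension, learning rate, and iteration count are arbitrary illustrative choices, not those of any production system):

```python
import numpy as np

# Toy user-song rating matrix (0 = unobserved). Rows: users, cols: songs.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
mask = R > 0

# Factorize R ≈ U @ V.T with a small latent dimension, by gradient
# descent on the observed entries only.
rng = np.random.default_rng(0)
k = 2
U = 0.1 * rng.normal(size=(4, k))
V = 0.1 * rng.normal(size=(4, k))
lr = 0.02
for _ in range(3000):
    E = mask * (R - U @ V.T)   # error on observed ratings only
    U += lr * (E @ V)
    V += lr * (E.T @ U)

# Missing entries are filled in by the learned factors: these are
# the recommendations (high predicted rating = recommend).
R_hat = U @ V.T
```

Note the cold-start problem is visible here: a song with no observed ratings would contribute nothing to the gradient, so its factor would never be learned.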
-
Content Analysis
• Human expert analysis
- Pandora's Music Genome Project (1999)
- 450 musical attributes in various categories
- Generates playlists based on the musical attributes and user feedback
- Good for music discovery
- Labor-intensive and expensive
Pandora Internet Radio
-
Content Analysis
• Automatic music classification
- Machine learning approach: predict genre, mood or other song descriptions from audio

Input data → Model → Output data
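As an illustrative sketch only (not an actual auto-tagging system): a single hand-crafted feature, the spectral centroid, separating two invented "classes" of synthetic tones with a nearest-centroid classifier. All tone frequencies and class names below are made up for the example:

```python
import numpy as np

def spectral_centroid(x, sr=8000):
    """A single timbre feature: the magnitude-weighted mean frequency."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

def make_tone(f0, sr=8000, dur=0.5):
    t = np.arange(int(sr * dur)) / sr
    return np.sin(2 * np.pi * f0 * t)

# Toy "training set": labeled examples from two synthetic classes.
train = [(make_tone(f), "low") for f in (200, 250, 300)] + \
        [(make_tone(f), "high") for f in (2000, 2500, 3000)]

# "Training": compute the mean feature per class (nearest-centroid).
centroids = {}
for label in ("low", "high"):
    feats = [spectral_centroid(x) for x, lab in train if lab == label]
    centroids[label] = np.mean(feats)

def classify(x):
    """Predict the label of unseen audio from its feature."""
    c = spectral_centroid(x)
    return min(centroids, key=lambda lab: abs(c - centroids[lab]))
```

Real genre/mood classifiers use far richer features (or learn them with deep networks), but the train-then-predict structure is the same.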
-
Music Auto-Tagging (KAIST, MACLab)
-
Content Analysis
• Practical systems use all available data
- User data, audio, lyrics, metadata, …
- Advanced learning algorithms are developed to combine the multi-modal data
Spotify’s Discover Weekly
-
Music Performance
• Music performance is a great activity, but we need assistance
- Practicing is lonely
- We need evaluation while practicing
- Musical buddies are not always available
- Amateurs are compared to recordings of professionals
• How can computers make people more engaged in music performance?
-
Automatic Music Transcription (AMT)
• Make computers extract score-level information from audio
- Notes: onset, duration and velocity
- Tempo, beat, chords and structure

Input data → Model → Output data
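One building block of transcription is onset detection; below is a minimal sketch using spectral flux (frame size, hop, and peak-picking rule are arbitrary illustrative choices, and the "audio" is a synthetic tone, not a real recording):

```python
import numpy as np

def spectral_flux_onsets(x, frame=512, hop=256):
    """Detect note onsets as frames where the magnitude spectrum rises
    sharply (positive spectral flux). Returns approximate sample positions."""
    n = (len(x) - frame) // hop
    mags = [np.abs(np.fft.rfft(x[i * hop:i * hop + frame])) for i in range(n)]
    flux = np.array([np.sum(np.maximum(m1 - m0, 0.0))
                     for m0, m1 in zip(mags, mags[1:])])
    onsets = []
    for j, f in enumerate(flux):
        pos = (j + 1) * hop
        # Keep strong peaks, suppressing detections closer than one frame.
        if f > 0.5 * flux.max() and (not onsets or pos - onsets[-1] > frame):
            onsets.append(pos)
    return onsets

# A toy signal: silence, then a 440 Hz tone starting at sample 4000.
sr = 8000
x = np.zeros(sr)
t = np.arange(sr - 4000) / sr
x[4000:] = np.sin(2 * np.pi * 440 * t)
onsets = spectral_flux_onsets(x)
```

Full transcription systems combine many such detectors (pitch, onset, beat) or replace them with learned models.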
-
Yousician
https://www.youtube.com/watch?v=e8yvcVWLYdY
-
Jameasy
https://www.youtube.com/watch?v=Mx84uRADAJ8
-
Audio-to-Score Alignment
• Highly associated with automatic music transcription
• Applications
- Score following
- Automatic page turner
- Auto accompaniment
- Performance analysis
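Audio-to-score alignment is commonly formulated with dynamic time warping (DTW); here is a minimal sketch on 1-D toy sequences (the "score" and "performance" data are invented for illustration, standing in for real feature sequences):

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic time warping between two 1-D feature sequences.
    Returns the optimal alignment path as (i, j) index pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# "Score" vs a "performance" of the same melody played twice as slowly:
score = np.array([60, 62, 64, 65], dtype=float)         # MIDI pitches
perf = np.array([60, 60, 62, 62, 64, 64, 65, 65], dtype=float)
path = dtw_path(score, perf)
```

The recovered path tells a score follower which score position corresponds to each performance frame, which is exactly what a page turner or auto-accompanist needs.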
-
https://www.youtube.com/watch?v=Yf05nzix3_w
Score Following / Automatic Page Turner (JKU)
-
Automatic Accompaniment (R. Dannenberg)
https://www.youtube.com/watch?v=RnjoxwY3RfA
-
Automatic Accompaniment (Nakamura et al.)
https://www.youtube.com/watch?v=gasgH0A-m00
-
Music Composition
• Automatic composition by computers has a long history
- The area is often called "algorithmic composition"
- A typical approach generates notes using a random process governed by a set of rules (e.g. a Markov model)
• Style imitation
- David Cope's EMI (Experiments in Musical Intelligence) (1980s)
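A rule-based random process of the kind described can be sketched as follows; the transition rule (moves of at most two scale degrees) is a made-up example for illustration, not Cope's method or any published rule set:

```python
import random

# C major scale as MIDI note numbers.
scale = [60, 62, 64, 65, 67, 69, 71, 72]

def next_notes(pitch):
    """Hand-designed rule: allowed successors are at most
    two scale steps away from the current note."""
    i = scale.index(pitch)
    lo, hi = max(0, i - 2), min(len(scale) - 1, i + 2)
    return [p for p in scale[lo:hi + 1] if p != pitch]

def generate(start=60, length=16, seed=0):
    """Random walk over the scale, constrained by the rule."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        melody.append(rng.choice(next_notes(melody[-1])))
    return melody

melody = generate()
```

The output is random but rule-bound: every interval obeys the hand-written constraint, which is the essence of classic algorithmic composition.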
-
Augmented Transition Networks (David Cope)
-
Music Composition
• Machine learning has fueled the area with data-driven approaches
- The model is typically trained to predict the current note from the previous notes (e.g. an auto-regressive model)

Input data → Model → Output data
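A minimal data-driven sketch: counting note bigrams in a tiny invented corpus yields an order-1 auto-regressive next-note predictor. A real system would use far more context (e.g. a recurrent network) and a large score dataset; the corpus below is made up:

```python
from collections import Counter, defaultdict

# Tiny training corpus of note sequences (MIDI pitches).
corpus = [
    [60, 62, 64, 65, 67, 65, 64, 62, 60],
    [60, 62, 64, 62, 60, 62, 64, 65, 67],
    [67, 65, 64, 62, 60, 62, 64, 62, 60],
]

# "Training": count how often each note follows each previous note,
# i.e. estimate an order-1 auto-regressive model from data.
counts = defaultdict(Counter)
for seq in corpus:
    for prev, cur in zip(seq, seq[1:]):
        counts[prev][cur] += 1

def predict(prev):
    """Predict the most likely current note given the previous note."""
    return counts[prev].most_common(1)[0][0]
```

Contrast with the rule-based sketch above: here the transition behavior is estimated from data rather than written down by hand.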
-
AI Duet (Google Magenta)
https://experiments.withgoogle.com/ai/ai-duet
-
“Daddy’s Car”: Sony CSL’s Flow Machines
https://www.helloworldalbum.net/
-
HumOn
https://www.youtube.com/watch?v=wj1r9YJ6INA
-
Sound Synthesis
• Musical instruments
- https://magenta.tensorflow.org/nsynth
- https://magenta.tensorflow.org/nsynth-instrument
• Singing voice
- http://www.dtic.upf.edu/~mblaauw/NPSS/
-
Course Objective
• Understanding of machine learning and its musical applications, particularly the following topics
- Digital representations of music and audio
- Music classification
- Automatic music transcription: rhythmic, tonal and polyphonic analysis
- Generation of music and audio (if time permits)
• Hands-on experience with the Python language and machine learning libraries
• Experience of the full cycle of research
-
Pre-requisite
• Linear Algebra
• Probability and Statistics
• Digital Signal Processing: DFT and Filters
• Programming Language: Python
-
Course Information
• http://mac.kaist.ac.kr/~juhan/gct634/