
  • GCT634: Musical Applications of Machine Learning
    Introduction

    Graduate School of Culture Technology, KAIST
    Juhan Nam

  • Who We Are

    • Instructor
      - Juhan Nam (남주한)
      - Assistant Professor in GSCT, KAIST
      - Music and Audio Computing Lab: http://mac.kaist.ac.kr

    • TAs
      - Jongpil Lee (이종필), Ph.D. student in GSCT, KAIST
      - Soonbeom Choi (최순범), Ph.D. student in GSCT, KAIST

  • Music

    • One of the most widely enjoyed forms of cultural content and activity.

    KAIST Art and Music Festivals (KAMF)

    http://times.kaist.ac.kr/news/articleView.html?idxno=1801

  • Music and Computers

    • The use of computers is essential in musical activities

  • Music and Computers

    • The role of computers is to represent musical data in a digital form and process it according to the target task.

    Audio Compression

  • Music and Computers

    • The role of computers is to represent musical data in a digital form and process it according to the target task.

    Recording, Mixing, Digital Audio Effects (EQ, Reverb, …)

  • Music and Computers

    • The role of computers is to represent musical data in a digital form and process it according to the target task.

    [Diagram] Control Data → Sound Synthesis

  • Music and Computers

    • The processor is designed by humans based on
      - Digital signal processing (DSP): filters, Fourier transform, …
      - Acoustics: psychoacoustics, musical acoustics, …

    • The processor is deterministic and interpretable

    [Diagram] Input data → Processor → Output data
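
    A minimal sketch of such a hand-designed, deterministic processor: a classical low-pass filter. The signal and filter settings below are invented for illustration.

```python
# A deterministic, interpretable processor: a hand-designed low-pass
# filter. Its behavior is fully determined by the DSP design choices
# (filter order and cutoff), not learned from data.
import numpy as np
from scipy.signal import butter, lfilter

sr = 44100                                  # sample rate (Hz)
t = np.arange(sr) / sr                      # one second of samples
x = np.sin(2 * np.pi * 440 * t) \
    + 0.1 * np.random.randn(sr)             # input: 440 Hz tone + noise

# 4th-order Butterworth low-pass filter with a 1 kHz cutoff
b, a = butter(4, 1000 / (sr / 2), btype="low")
y = lfilter(b, a, x)                        # output data
```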

  • Music Data and Process

    [Diagram after F. Richard Moore: data and process in music. Composer → (symbolic representation) → Performer → (temporal control) → Instrument → (source sound) → Room → (sound field) → Listener → (perception/cognition), drawing on a "musical" knowledge base and a "physical" knowledge base]

  • Music Data and Process

    The role of computers has so far focused mainly on acoustic processing.

    [Same diagram as above]

  • Music Data and Process

    However, computers can also be used to perform "human-level processing".

    [Same diagram as above]

  • Human-Level Processing

  • Musical Information

    • Factual information
      - Track ID, artist, year, composer
    • Score-level information
      - Melody, rhythm, chords, structure
    • Semantic information
      - Genre, mood, instrument, text descriptions

  • Human Auditory System

    [Figure: the human auditory system, with the ear as "hardware" and the brain as "software"]
    http://www.slideshare.net/Daritsetseg/brainstem-auditory-evoked-responses-baer-or-abr-45762118

    Can we design processors that mimic the functions of this complex system?

  • Music Processing Modules in Brain


    …nition [Per01, 03, Zat02, Ter_]. There are two main sources of evidence: studies with brain-damaged patients and neurological imaging experiments in healthy subjects.

    An accidental brain damage at the adult age may selectively affect musical abilities but not, e.g., speech-related abilities, and vice versa. Moreover, studies of brain-damaged patients have revealed something about the internal structure of the music cognition system. Figure 2 shows the functional architecture that Peretz and colleagues have derived from case studies of specific music impairments in brain-damaged patients. The "breakdown pattern" of different patients was studied by presenting them with specific music-cognition tasks, and the model in Fig. 2 was then inferred based on the assumption that a specific impairment may be due to a damaged processing component (box) or a broken flow of information (arrow) between components. The detailed line of argument underlying the model can be found in [Per01].

    In Fig. 2, the acoustic analysis module is assumed to be common to all acoustic stimuli (not just music) and to perform segregation of sound mixtures into distinct sound sources. The subsequent two entities carry out pitch organization and temporal organization. These two are viewed as parallel and largely independent subsystems, as supported by studies of patients who suffer from difficulties in dealing with pitch variations but not with temporal variations, or vice versa [Bel99, Per01]. In music performance or in perception, either of the two can be selectively lost [Per01]. The musical lexicon is characterized by Peretz et al. as containing representations of all the musical phrases a person has heard during his or her lifetime [Per03]. In some cases, a patient cannot recognize familiar music but can still process musical information otherwise adequately.

    Figure 2. Functional modules of the music processing facility in the human brain as proposed by Peretz et al. (after [Per03]; only the parts related to music processing are reproduced here). The model has been derived from case studies of specific impairments of musical abilities in brain-damaged patients [Per01, 03]. See text for details.

    [Figure 2: Acoustic input → Acoustic analysis → Pitch organization (contour analysis, interval analysis, tonal encoding) and Temporal organization (rhythm analysis, meter analysis) → Musical lexicon and Emotion expression analysis → Vocal plan formation → Singing; Tapping]

    Klapuri, PhD thesis (2004); redrawn from Peretz and Coltheart (2003)

  • Machine Learning (supervised learning)

    • The processor (called a model or hypothesis) is learned using
      - Training data (inputs and outputs)
      - Optimization algorithms

    • Given a new input, the output can be predicted by the model
    • The model can be a black box, i.e., not interpretable (e.g., deep neural networks)

    [Diagram] Input data → Processor (model) → Output data
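
    A minimal sketch of this fit/predict loop, using toy random data (any feature/label pairs would do):

```python
# Supervised learning: the processor (model) is fitted to training
# input/output pairs, then predicts outputs for unseen inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))         # toy inputs: 100 x 8 features
y_train = (X_train[:, 0] > 0).astype(int)   # toy binary outputs

model = LogisticRegression()
model.fit(X_train, y_train)                 # learned via optimization

X_new = rng.normal(size=(5, 8))             # new, unseen inputs
print(model.predict(X_new))                 # predicted outputs
```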

    How can machine learning be applied to music?

  • Music Listening

    • The volume of music content in online music streaming services is enormous
      - Spotify: 30M songs, 2B playlists, 20k new songs per day (2017)
      - YouTube: more than 300 hours of video uploaded per minute (2015)

  • Music Listening

    • How can computers find or recommend songs that match my musical taste or context?

    Personal information (musical taste, context)

  • Collaborative Filtering (CF)

    • Basic idea
      - User data: play history and song ratings
      - Formulated as a matrix factorization problem (a sketch follows below)
      - Cold start: newly released songs cannot be recommended
      - Popularity bias: songs in the short head tend to be recommended more than those in the long tail

    Person A: I like songs A, B, C and D.
    Person B: I like songs A, B, C and E.
    Person A: Really? You should check out song D.
    Person B: Wow, you should also check out song E.
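
    A minimal matrix-factorization sketch of this basic idea. The toy 4×4 user-song matrix, factor size, and learning rate are invented for illustration:

```python
# Collaborative filtering via matrix factorization: approximate the
# observed user-song matrix R as U @ V.T, then read recommendations
# off the reconstructed entries that were unobserved.
import numpy as np

R = np.array([[5, 4, 0, 1],                 # rows: users, cols: songs
              [4, 5, 1, 0],                 # 0 = not played (unknown)
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
observed = R > 0

k, lr, reg = 2, 0.01, 0.1                   # latent dims, step, L2 penalty
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, k))      # user factors
V = rng.normal(scale=0.1, size=(4, k))      # song factors

for _ in range(2000):                       # gradient descent on observed cells
    E = observed * (R - U @ V.T)
    U += lr * (E @ V - reg * U)
    V += lr * (E.T @ U - reg * V)

print(np.round(U @ V.T, 1))                 # scores include unheard songs
```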

  • Content Analysis

    • Human expert analysis
      - Pandora's Music Genome Project (1999)
      - 450 musical attributes in various categories
      - Generates playlists based on the musical attributes and user feedback
      - Good for music discovery
      - Costly and labor-intensive

    Pandora Internet Radio

  • Content Analysis

    • Automatic music classification
      - Machine learning approach: predict genre, mood, or other song descriptions from audio (a sketch follows below)

    [Diagram] Input data → Model → Output data

  • Music Auto-Tagging (KAIST, MAC Lab)

  • Content Analysis

    • Practical systems use all available data
      - User data, audio, lyrics, metadata, …
      - Advanced learning algorithms are being developed to combine the multi-modal data

    Spotify’s Discover Weekly

  • Music Performance

    • Music performance is a great activity, but we often need assistance
      - Practicing is lonely
      - We need evaluation while practicing
      - Musical buddies are not always available
      - Amateurs are compared to recordings of professionals

    • How can computers make people more engaged in music performance?

  • Automatic Music Transcription (AMT)

    • Make computers extract score-level information from audio (a sketch follows below)
      - Notes: onset, duration, and velocity
      - Tempo, beat, chords, and structure

    [Diagram] Input data → Model → Output data

  • Yousician: https://www.youtube.com/watch?v=e8yvcVWLYdY

  • Jameasy: https://www.youtube.com/watch?v=Mx84uRADAJ8

  • Audio-to-Score Alignment

    • Closely related to automatic music transcription

    • Applications (a DTW alignment sketch follows below)
      - Score following
      - Automatic page turner
      - Auto accompaniment
      - Performance analysis
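
    A minimal alignment sketch: dynamic time warping (DTW) over chroma features, aligning a performance to a synthesized rendition of the score (file names are placeholders):

```python
# Audio-to-score alignment sketch: chroma features are robust to
# timbre, so DTW over chroma can match a performance to a reference.
import librosa

y_perf, sr = librosa.load("performance.wav", sr=22050)     # placeholder
y_ref, _ = librosa.load("score_rendition.wav", sr=22050)   # placeholder

C_perf = librosa.feature.chroma_cqt(y=y_perf, sr=sr)
C_ref = librosa.feature.chroma_cqt(y=y_ref, sr=sr)

# wp is the warping path: pairs of corresponding (ref, perf) frames
D, wp = librosa.sequence.dtw(X=C_ref, Y=C_perf)

# Convert matched frames to seconds, e.g., to drive a score follower
times = librosa.frames_to_time(wp, sr=sr)
print(times[:5])   # each row: (reference time, performance time)
```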

  • Score Following / Automatic Page Turner (JKU): https://www.youtube.com/watch?v=Yf05nzix3_w

  • Automatic Accompaniment (R. Dannenberg): https://www.youtube.com/watch?v=RnjoxwY3RfA

  • Automatic Accompaniment (Nakamura et al.): https://www.youtube.com/watch?v=gasgH0A-m00

  • Music Composition

    • Automatic composition by computers has a long history
      - The area is often called "algorithmic composition"
      - A typical approach generates notes using a random process governed by a set of rules (e.g., a Markov model; a sketch follows below)
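
    A minimal sketch of such a rule-based random process: a first-order Markov chain over note names (the transition table is invented for illustration):

```python
# Rule-based algorithmic composition: sample the next note from a
# hand-specified transition distribution over the current note.
import random

transitions = {                             # P(next note | current note)
    "C": [("D", 0.5), ("E", 0.3), ("G", 0.2)],
    "D": [("E", 0.6), ("C", 0.4)],
    "E": [("G", 0.5), ("D", 0.3), ("C", 0.2)],
    "G": [("C", 0.7), ("E", 0.3)],
}

def generate(start="C", length=16):
    melody = [start]
    for _ in range(length - 1):
        notes, probs = zip(*transitions[melody[-1]])
        melody.append(random.choices(notes, weights=probs)[0])
    return melody

print(" ".join(generate()))
```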

    • Style imitation
      - David Cope's EMI (Experiments in Musical Intelligence) (1980s)

  • Augmented Transition Networks (David Cope)

  • Music Composition

    • Machine learning has fueled the area with data-driven approaches
      - The model is typically trained to predict the current note from the previous notes (e.g., an auto-regressive model; a sketch follows below)

    [Diagram] Input data → Model → Output data

  • AI Duet (Google Magenta): https://experiments.withgoogle.com/ai/ai-duet

  • “Daddy’s car”: Sony CSL Lab’s Flow Machines

    https://www.helloworldalbum.net/

  • HumOn: https://www.youtube.com/watch?v=wj1r9YJ6INA

  • Sound Synthesis

    • Musical instruments
      - https://magenta.tensorflow.org/nsynth
      - https://magenta.tensorflow.org/nsynth-instrument

    • Singing voice
      - http://www.dtic.upf.edu/~mblaauw/NPSS/

  • Course Objective

    • Understanding of machine learning and its musical applications, particularly the following topics
      - Digital representations of music and audio
      - Music classification
      - Automatic music transcription: rhythmic, tonal, and polyphonic analysis
      - Generation of music and audio (if time permits)

    • Hands-on experience with the Python language and machine learning libraries

    • Gain experience with the full cycle of research

  • Prerequisites

    • Linear Algebra

    • Probability and Statistics

    • Digital Signal Processing: DFT and Filters

    • Programming Language: Python

  • Course Information

    • http://mac.kaist.ac.kr/~juhan/gct634/