
  • GCT634: Musical Applications of Machine Learning
    Introduction

    Graduate School of Culture Technology, KAIST
    Juhan Nam

  • Who We Are

    • Instructor
      - Juhan Nam (남주한)
      - Assistant Professor in GSCT, KAIST
      - Music and Audio Computing Lab: http://mac.kaist.ac.kr

    • TAs
      - Jongpil Lee (이종필), Ph.D. student in GSCT, KAIST
      - Soonbeom Choi (최순범), Ph.D. student in GSCT, KAIST

  • Music

    • One of the most widely enjoyed forms of cultural content and activity.

    KAIST Art and Music Festivals (KAMF)

    http://times.kaist.ac.kr/news/articleView.html?idxno=1801

  • Music and Computers

    • The use of computers is essential in musical activities

  • Music and Computers

    • The role of computers is to represent musical data in a digital form and process it according to the target task.

    Audio Compression

  • Music and Computers

    • The role of computers is to represent musical data in a digital form and process it according to the target task.

    Recording, Mixing, Digital Audio Effects (EQ, Reverb, …)

  • Music and Computers

    • The role of computers is to represent musical data in a digital form and process it according to the target task.

    [Diagram] Control Data → Sound Synthesis

  • Music and Computers

    • The processor is designed by humans based on
      - Digital signal processing (DSP): filters, Fourier transform, …
      - Acoustics: psychoacoustics, musical acoustics, …

    • The processor is deterministic and interpretable

    [Diagram] Input data → Processor → Output data
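
    A minimal sketch of such a hand-designed, deterministic processor: a classical low-pass filter. The signal and filter settings below are invented for illustration.

```python
# A deterministic, interpretable processor: a hand-designed low-pass
# filter. Its behavior is fully determined by the DSP design choices
# (filter order and cutoff), not learned from data.
import numpy as np
from scipy.signal import butter, lfilter

sr = 44100                                  # sample rate (Hz)
t = np.arange(sr) / sr                      # one second of samples
x = np.sin(2 * np.pi * 440 * t) \
    + 0.1 * np.random.randn(sr)             # input: 440 Hz tone + noise

# 4th-order Butterworth low-pass filter with a 1 kHz cutoff
b, a = butter(4, 1000 / (sr / 2), btype="low")
y = lfilter(b, a, x)                        # output data
```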

  • Music Data and Process

    [Diagram after F. Richard Moore: data and process in music. Composer → (symbolic representation) → Performer → (temporal control) → Instrument → (source sound) → Room → (sound field) → Listener → (perception/cognition), drawing on a "musical" knowledge base and a "physical" knowledge base]

  • Music Data and Process

    The role of computers has so far focused mainly on acoustic processing.

    [Same diagram as above]

  • Music Data and Process

    However, computers can also be used to perform "human-level processing".

    [Same diagram as above]

  • Human-Level Processing

  • Musical Information

    • Factual information
      - Track ID, artist, year, composer
    • Score-level information
      - Melody, rhythm, chords, structure
    • Semantic information
      - Genre, mood, instrument, text descriptions

  • Human Auditory System

    [Figure: the human auditory system, with the ear as "hardware" and the brain as "software"]
    http://www.slideshare.net/Daritsetseg/brainstem-auditory-evoked-responses-baer-or-abr-45762118

    Can we design processors that mimic the functions of this complex system?

  • Music Processing Modules in Brain


    …nition [Per01, 03, Zat02, Ter_]. There are two main sources of evidence: studies with brain-damaged patients and neurological imaging experiments in healthy subjects.

    An accidental brain damage at the adult age may selectively affect musical abilities but not, e.g., speech-related abilities, and vice versa. Moreover, studies of brain-damaged patients have revealed something about the internal structure of the music cognition system. Figure 2 shows the functional architecture that Peretz and colleagues have derived from case studies of specific music impairments in brain-damaged patients. The "breakdown pattern" of different patients was studied by presenting them with specific music-cognition tasks, and the model in Fig. 2 was then inferred based on the assumption that a specific impairment may be due to a damaged processing component (box) or a broken flow of information (arrow) between components. The detailed line of argument underlying the model can be found in [Per01].

    In Fig. 2, the acoustic analysis module is assumed to be common to all acoustic stimuli (not just music) and to perform segregation of sound mixtures into distinct sound sources. The subsequent two entities carry out pitch organization and temporal organization. These two are viewed as parallel and largely independent subsystems, as supported by studies of patients who suffer from difficulties in dealing with pitch variations but not with temporal variations, or vice versa [Bel99, Per01]. In music performance or in perception, either of the two can be selectively lost [Per01]. The musical lexicon is characterized by Peretz et al. as containing representations of all the musical phrases a person has heard during his or her lifetime [Per03]. In some cases, a patient cannot recognize familiar music but can still process musical information otherwise adequately.

    Figure 2. Functional modules of the music processing facility in the human brain as proposed by Peretz et al. (after [Per03]; only the parts related to music processing are reproduced here). The model has been derived from case studies of specific impairments of musical abilities in brain-damaged patients [Per01, 03]. See text for details.

    [Figure 2: Acoustic input → Acoustic analysis → Pitch organization (contour analysis, interval analysis, tonal encoding) and Temporal organization (rhythm analysis, meter analysis) → Musical lexicon and Emotion expression analysis → Vocal plan formation → Singing; Tapping]

    Klapuri, PhD thesis (2004); redrawn from Peretz and Coltheart (2003)

  • Machine Learning (supervised learning)

    • The processor (called a model or hypothesis) is learned using
      - Training data (inputs and outputs)
      - Optimization algorithms

    • Given a new input, the output can be predicted by the model
    • The model can be a black box, i.e., not interpretable (e.g., deep neural networks)

    [Diagram] Input data → Processor (model) → Output data
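
    A minimal sketch of this fit/predict loop, using toy random data (any feature/label pairs would do):

```python
# Supervised learning: the processor (model) is fitted to training
# input/output pairs, then predicts outputs for unseen inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))         # toy inputs: 100 x 8 features
y_train = (X_train[:, 0] > 0).astype(int)   # toy binary outputs

model = LogisticRegression()
model.fit(X_train, y_train)                 # learned via optimization

X_new = rng.normal(size=(5, 8))             # new, unseen inputs
print(model.predict(X_new))                 # predicted outputs
```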

    How can machine learning be applied to music?

  • Music Listening

    • The volume of music content in online music streaming services is enormous
      - Spotify: 30M songs, 2B playlists, 20k new songs per day (2017)
      - YouTube: more than 300 hours of video uploaded per minute (2015)

  • Music Listening

    • How can computers find or recommend songs that match my musical taste or context?

    Personal information (musical taste, context)

  • Collaborative Filtering (CF)

    • Basic idea
      - User data: play history and song ratings
      - Formulated as a matrix factorization problem (a sketch follows below)
      - Cold start: newly released songs cannot be recommended
      - Popularity bias: songs in the short head tend to be recommended more than those in the long tail

    Person A: I like songs A, B, C and D.
    Person B: I like songs A, B, C and E.
    Person A: Really? You should check out song D.
    Person B: Wow, you should also check out song E.
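
    A minimal matrix-factorization sketch of this basic idea. The toy 4×4 user-song matrix, factor size, and learning rate are invented for illustration:

```python
# Collaborative filtering via matrix factorization: approximate the
# observed user-song matrix R as U @ V.T, then read recommendations
# off the reconstructed entries that were unobserved.
import numpy as np

R = np.array([[5, 4, 0, 1],                 # rows: users, cols: songs
              [4, 5, 1, 0],                 # 0 = not played (unknown)
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
observed = R > 0

k, lr, reg = 2, 0.01, 0.1                   # latent dims, step, L2 penalty
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, k))      # user factors
V = rng.normal(scale=0.1, size=(4, k))      # song factors

for _ in range(2000):                       # gradient descent on observed cells
    E = observed * (R - U @ V.T)
    U += lr * (E @ V - reg * U)
    V += lr * (E.T @ U - reg * V)

print(np.round(U @ V.T, 1))                 # scores include unheard songs
```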

  • Content Analysis

    • Human expert analysis
      - Pandora's Music Genome Project (1999)
      - 450 musical attributes in various categories
      - Generates playlists based on the musical attributes and user feedback
      - Good for music discovery
      - Costly and labor-intensive

    Pandora Internet Radio

  • Content Analysis

    • Automatic music classification
      - Machine learning approach: predict genre, mood, or other song descriptions from audio (a sketch follows below)

    [Diagram] Input data → Model → Output data

  • Music Auto-Tagging (KAIST, MAC Lab)

  • Content Analysis

    • Practical systems use all available data
      - User data, audio, lyrics, metadata, …
      - Advanced learning algorithms are being developed to combine the multi-modal data

    Spotify’s Discover Weekly

  • Music Performance

    • Music performance is a great activity, but we often need assistance
      - Practicing is lonely
      - We need evaluation while practicing
      - Musical buddies are not always available
      - Amateurs are compared to recordings of professionals

    • How can computers make people more engaged in music performance?

  • Automatic Music Transcription (AMT)

    • Make computers extract score-level information from audio (a sketch follows below)
      - Notes: onset, duration, and velocity
      - Tempo, beat, chords, and structure

    [Diagram] Input data → Model → Output data

  • Yousician: https://www.youtube.com/watch?v=e8yvcVWLYdY

  • Jameasy: https://www.youtube.com/watch?v=Mx84uRADAJ8

  • Audio-to-Score Alignment

    • Closely related to automatic music transcription

    • Applications (a DTW alignment sketch follows below)
      - Score following
      - Automatic page turner
      - Auto accompaniment
      - Performance analysis
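
    A minimal alignment sketch: dynamic time warping (DTW) over chroma features, aligning a performance to a synthesized rendition of the score (file names are placeholders):

```python
# Audio-to-score alignment sketch: chroma features are robust to
# timbre, so DTW over chroma can match a performance to a reference.
import librosa

y_perf, sr = librosa.load("performance.wav", sr=22050)     # placeholder
y_ref, _ = librosa.load("score_rendition.wav", sr=22050)   # placeholder

C_perf = librosa.feature.chroma_cqt(y=y_perf, sr=sr)
C_ref = librosa.feature.chroma_cqt(y=y_ref, sr=sr)

# wp is the warping path: pairs of corresponding (ref, perf) frames
D, wp = librosa.sequence.dtw(X=C_ref, Y=C_perf)

# Convert matched frames to seconds, e.g., to drive a score follower
times = librosa.frames_to_time(wp, sr=sr)
print(times[:5])   # each row: (reference time, performance time)
```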

  • Score Following / Automatic Page Turner (JKU): https://www.youtube.com/watch?v=Yf05nzix3_w

  • Automatic Accompaniment (R. Dannenberg): https://www.youtube.com/watch?v=RnjoxwY3RfA

  • Automatic Accompaniment (Nakamura et al.): https://www.youtube.com/watch?v=gasgH0A-m00

  • Music Composition

    • Automatic composition by computers has a long history
      - The area is often called "algorithmic composition"
      - A typical approach generates notes using a random process governed by a set of rules (e.g., a Markov model; a sketch follows below)
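
    A minimal sketch of such a rule-based random process: a first-order Markov chain over note names (the transition table is invented for illustration):

```python
# Rule-based algorithmic composition: sample the next note from a
# hand-specified transition distribution over the current note.
import random

transitions = {                             # P(next note | current note)
    "C": [("D", 0.5), ("E", 0.3), ("G", 0.2)],
    "D": [("E", 0.6), ("C", 0.4)],
    "E": [("G", 0.5), ("D", 0.3), ("C", 0.2)],
    "G": [("C", 0.7), ("E", 0.3)],
}

def generate(start="C", length=16):
    melody = [start]
    for _ in range(length - 1):
        notes, probs = zip(*transitions[melody[-1]])
        melody.append(random.choices(notes, weights=probs)[0])
    return melody

print(" ".join(generate()))
```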

    • Style imitation
      - David Cope's EMI (Experiments in Musical Intelligence) (1980s)

  • Augmented Transition Networks (David Cope)

  • Music Composition

    • Machine learning has fueled the area with data-driven approaches
      - The model is typically trained to predict the current note from the previous notes (e.g., an auto-regressive model; a sketch follows below)

    [Diagram] Input data → Model → Output data

  • AI Duet (Google Magenta): https://experiments.withgoogle.com/ai/ai-duet

  • “Daddy’s car”: Sony CSL Lab’s Flow Machines

    https://www.helloworldalbum.net/

  • HumOn: https://www.youtube.com/watch?v=wj1r9YJ6INA

  • Sound Synthesis

    • Musical instruments
      - https://magenta.tensorflow.org/nsynth
      - https://magenta.tensorflow.org/nsynth-instrument

    • Singing voice
      - http://www.dtic.upf.edu/~mblaauw/NPSS/

  • Course Objective

    • Understanding of machine learning and its musical applications, particularly the following topics
      - Digital representations of music and audio
      - Music classification
      - Automatic music transcription: rhythmic, tonal, and polyphonic analysis
      - Generation of music and audio (if time permits)

    • Hands-on experience with the Python language and machine learning libraries

    • Gain experience with the full cycle of research

  • Prerequisites

    • Linear Algebra

    • Probability and Statistics

    • Digital Signal Processing: DFT and Filters

    • Programming Language: Python

  • Course Information

    • http://mac.kaist.ac.kr/~juhan/gct634/