Postgraduate thesis: Controllable music generation via deep learning methods

Title: Controllable music generation via deep learning methods
Authors: Zeng, Te [曾特]
Advisors: Kao, CM; Lau, FCM
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zeng, T. [曾特]. (2024). Controllable music generation via deep learning methods. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The field of automatic music generation has been an attractive area of exploration since its inception. With the rapid progression of deep learning technologies, new vistas have opened, leading to innovative methodologies that expand our ability to understand, transcribe, and generate music. However, maintaining long-term coherence within generated musical pieces remains a persistent challenge. This thesis addresses this issue by demonstrating how deep learning-based methods can produce more controllable and interactive music through human involvement. The thesis begins by introducing RLChord, a novel reinforcement learning-based approach that explores the hierarchical structures within melodies to craft corresponding chord accompaniments. RLChord employs well-crafted reward functions to split the melody sequence into smaller substructures -- phrases and segments -- which are crucial for enhancing the performance of melody harmonization tasks. Following this, we present EmoMusVAE, a system for generating music conditioned on specific emotions. To mitigate the common KL collapse issue in variational autoencoder (VAE) frameworks, an auxiliary discriminator is incorporated, ensuring that the encoder fully leverages the emotional conditions. This method is capable of generating music that is both musically and emotionally coherent, and it also enables style transfer tasks. As interactive music composition and the demand for controllable music generation grow, the risk of infringing copyright by reharmonizing or rearranging established works moves to the forefront of legal and ethical considerations, raising the need for cover version identification systems. To address this, we introduce OpenCover, a Transformer-based method for cover version identification that outperforms CNN models in capturing the long-range dependencies often found in song structure variations such as reordered verses and choruses. Moreover, we introduce a novel loss function, MAPLoss, to optimize the model in alignment with the actual ranking metric, improving the performance of cover song identification. Through these contributions, the thesis advances the field of controllable music generation by enhancing the controllability and coherence of generated music and addressing proprietary concerns in interactive music composition.
Degree: Doctor of Philosophy
Subjects: Computer music; Deep learning (Machine learning)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/354680
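The abstract above outlines EmoMusVAE as an emotion-conditioned variational autoencoder whose encoder is kept from ignoring the condition (KL collapse) by an auxiliary discriminator. The sketch below is only a minimal illustration of that general recipe, not the thesis's model; every module name, dimension, and loss weight here is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CondVAE(nn.Module):
    """Toy emotion-conditioned VAE with an auxiliary emotion discriminator."""
    def __init__(self, vocab=128, emb=64, hidden=256, latent=32, n_emotions=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        # Decoder is conditioned on the latent code plus a one-hot emotion label.
        self.decoder = nn.GRU(emb + latent + n_emotions, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)
        # Auxiliary discriminator: predicts the emotion from the latent code, so the
        # encoder cannot learn a posterior that discards the emotional condition.
        self.disc = nn.Linear(latent, n_emotions)

    def forward(self, tokens, emotion_onehot):
        # tokens: (B, T) note/event ids; emotion_onehot: (B, n_emotions) float one-hot.
        x = self.embed(tokens)
        _, h = self.encoder(x)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation
        cond = torch.cat([z, emotion_onehot], dim=-1)
        cond_seq = cond.unsqueeze(1).expand(-1, x.size(1), -1)
        dec, _ = self.decoder(torch.cat([x, cond_seq], dim=-1))   # teacher forcing
        return self.out(dec), mu, logvar, self.disc(z)

def loss_fn(logits, tokens, mu, logvar, emo_logits, emotion_ids, beta=0.1, gamma=1.0):
    recon = F.cross_entropy(logits.transpose(1, 2), tokens)           # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())     # KL regulariser
    aux = F.cross_entropy(emo_logits, emotion_ids)                    # auxiliary discriminator
    return recon + beta * kl + gamma * aux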

 

DC Field / Value
dc.contributor.advisor: Kao, CM
dc.contributor.advisor: Lau, FCM
dc.contributor.author: Zeng, Te
dc.contributor.author: 曾特
dc.date.accessioned: 2025-03-03T06:20:29Z
dc.date.available: 2025-03-03T06:20:29Z
dc.date.issued: 2024
dc.identifier.citation: Zeng, T. [曾特]. (2024). Controllable music generation via deep learning methods. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/354680
dc.description.abstract: The field of automatic music generation has been an attractive area of exploration since its inception. With the rapid progression of deep learning technologies, new vistas have opened, leading to innovative methodologies that expand our ability to understand, transcribe, and generate music. However, maintaining long-term coherence within generated musical pieces remains a persistent challenge. This thesis addresses this issue by demonstrating how deep learning-based methods can produce more controllable and interactive music through human involvement. The thesis begins by introducing RLChord, a novel reinforcement learning-based approach that explores the hierarchical structures within melodies to craft corresponding chord accompaniments. RLChord employs well-crafted reward functions to split the melody sequence into smaller substructures -- phrases and segments -- which are crucial for enhancing the performance of melody harmonization tasks. Following this, we present EmoMusVAE, a system for generating music conditioned on specific emotions. To mitigate the common KL collapse issue in variational autoencoder (VAE) frameworks, an auxiliary discriminator is incorporated, ensuring that the encoder fully leverages the emotional conditions. This method is capable of generating music that is both musically and emotionally coherent, and it also enables style transfer tasks. As interactive music composition and the demand for controllable music generation grow, the risk of infringing copyright by reharmonizing or rearranging established works moves to the forefront of legal and ethical considerations, raising the need for cover version identification systems. To address this, we introduce OpenCover, a Transformer-based method for cover version identification that outperforms CNN models in capturing the long-range dependencies often found in song structure variations such as reordered verses and choruses. Moreover, we introduce a novel loss function, MAPLoss, to optimize the model in alignment with the actual ranking metric, improving the performance of cover song identification. Through these contributions, the thesis advances the field of controllable music generation by enhancing the controllability and coherence of generated music and addressing proprietary concerns in interactive music composition.
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Computer music
dc.subject.lcsh: Deep learning (Machine learning)
dc.title: Controllable music generation via deep learning methods
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Computer Science
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044791811903414
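The abstract (also recorded above) states that MAPLoss trains the cover identification model in alignment with the ranking metric it is evaluated on, mean average precision. A common way to make MAP differentiable is a smooth-rank surrogate; the sketch below illustrates that generic construction for binary relevance labels and is not claimed to be the thesis's actual MAPLoss formulation (the temperature and the exact form are assumptions).

import torch

def smooth_ap_loss(scores: torch.Tensor, labels: torch.Tensor, tau: float = 0.01) -> torch.Tensor:
    # scores: (N,) similarity of each candidate track to the query.
    # labels: (N,) 1.0 for true cover versions, 0.0 for non-covers.
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)            # diff[i, j] = s_j - s_i
    above = torch.sigmoid(diff / tau)                           # soft "j ranked above i"
    above = above * (1.0 - torch.eye(len(scores), device=scores.device))  # drop j == i
    pos = labels.float()
    rank_all = 1.0 + above.sum(dim=1)                           # approximate rank of item i
    rank_pos = 1.0 + (above * pos.unsqueeze(0)).sum(dim=1)      # rank counting positives only
    # Precision at each positive, averaged over positives -> smooth average precision.
    ap = ((rank_pos / rank_all) * pos).sum() / pos.sum().clamp(min=1.0)
    return 1.0 - ap                                             # minimise 1 - AP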

Export: this record is available via the OAI-PMH interface in XML formats, or in other non-XML formats.
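For the XML route, a record like this one can be harvested with an ordinary OAI-PMH GetRecord request. The verb and parameter names below are defined by OAI-PMH 2.0, but the base URL and the OAI identifier are assumptions (the repository's own Identify and ListIdentifiers responses give the real values).

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://hub.hku.hk/oai/request"   # assumed endpoint for the repository
OAI_ID = "oai:hub.hku.hk:10722/354680"        # assumed identifier derived from the handle

params = {
    "verb": "GetRecord",
    "identifier": OAI_ID,
    "metadataPrefix": "oai_dc",               # simple Dublin Core, mandatory in OAI-PMH
}
url = BASE_URL + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as resp:
    tree = ET.parse(resp)

# Print every Dublin Core element in the response (dc:title, dc:creator, ...).
DC_NS = "{http://purl.org/dc/elements/1.1/}"
for elem in tree.getroot().iter():
    if elem.tag.startswith(DC_NS):
        print(elem.tag[len(DC_NS):], ":", (elem.text or "").strip())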