Conference Paper: Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners

Title: Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
Authors: Chen, Z; Shen, Y; Ding, M; Chen, Z; Zhao, H; Learned-Miller, EG; Gan, C
Issue Date: 18-Jun-2023
Abstract

Optimization in multi-task learning (MTL) is more challenging than single-task learning (STL), as the gradient from different tasks can be contradictory. When tasks are related, it can be beneficial to share some parameters among them (cooperation). However, some tasks require additional parameters with expertise in a specific type of data or discrimination (specialization). To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad'). This structure allows us to formalize cooperation and specialization as the process of matching experts and tasks. We optimize this matching process during the training of a single model. Specifically, we incorporate mixture of experts (MoE) layers into a transformer model, with a new loss that incorporates the mutual dependence between tasks and experts. As a result, only a small set of experts are activated for each task. This prevents the sharing of the entire backbone model between all tasks, which strengthens the model, especially when the training set size and the number of tasks scale up. More interestingly, for each task, we can extract the small set of experts as a standalone model that maintains the same performance as the large model. Extensive experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
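The abstract describes the mechanism but this page carries no code, so below is a minimal, hedged PyTorch sketch of the general idea it outlines: a pool of expert MLPs shared across tasks, a per-task router that activates only a few of them, and a mutual-information-style regularizer that ties tasks to experts. This is not the authors' implementation; all names and hyperparameters here (TaskMoE, mi_regularizer, top_k, the 4x MLP expansion) are illustrative assumptions.

```python
# Hedged sketch only: a task-conditioned MoE block plus a regularizer that
# rewards high mutual dependence between tasks and experts, in the spirit of
# the abstract above. Not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskMoE(nn.Module):
    """Mixture-of-experts feed-forward block with one router per task (illustrative)."""

    def __init__(self, dim, num_experts=8, num_tasks=4, top_k=2):
        super().__init__()
        # Shared pool of expert MLPs (cooperation across tasks).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # One lightweight router per task, so each task picks its own small expert subset.
        self.routers = nn.ModuleList([nn.Linear(dim, num_experts) for _ in range(num_tasks)])
        self.top_k = top_k

    def forward(self, x, task_id):
        # x: (batch, tokens, dim)
        probs = F.softmax(self.routers[task_id](x), dim=-1)        # (B, T, E)
        gate_vals, gate_idx = probs.topk(self.top_k, dim=-1)       # top-k experts per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Gate for expert e is its top-k routing weight where selected, else 0.
            gate = ((gate_idx == e).to(x.dtype) * gate_vals).sum(dim=-1, keepdim=True)
            if gate.any():
                out = out + gate * expert(x)
        # Mean routing probability per expert for this task; feeds the regularizer below.
        usage = probs.mean(dim=(0, 1))                              # (E,)
        return out, usage


def mi_regularizer(usage_per_task, eps=1e-9):
    """Negative mutual information between tasks and experts (to be minimized).

    usage_per_task: (num_tasks, num_experts); row t approximates P(expert | task t).
    Maximizing I(T; E) = H(E) - H(E | T) balances overall expert load while
    making each task rely on its own small, specialized subset of experts.
    """
    p_e_given_t = usage_per_task / (usage_per_task.sum(dim=-1, keepdim=True) + eps)
    p_e = p_e_given_t.mean(dim=0)                                   # uniform prior over tasks
    h_e = -(p_e * (p_e + eps).log()).sum()
    h_e_given_t = -(p_e_given_t * (p_e_given_t + eps).log()).sum(dim=-1).mean()
    return -(h_e - h_e_given_t)
```

In such a setup, training would accumulate the per-task `usage` vectors into a (num_tasks, num_experts) matrix and add a small multiple of `mi_regularizer(...)` to the sum of task losses; at deployment, the few experts a task actually routes to could be pulled out as a compact standalone sub-network, mirroring the extraction idea described in the abstract. For the exact loss and architecture, see the paper at the DOI in the record below.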


Persistent Identifier: http://hdl.handle.net/10722/333857

 

DC Field | Value | Language
dc.contributor.author | Chen, Z | -
dc.contributor.author | Shen, Y | -
dc.contributor.author | Ding, M | -
dc.contributor.author | Chen, Z | -
dc.contributor.author | Zhao, H | -
dc.contributor.author | Learned-Miller, EG | -
dc.contributor.author | Gan, C | -
dc.date.accessioned | 2023-10-06T08:39:39Z | -
dc.date.available | 2023-10-06T08:39:39Z | -
dc.date.issued | 2023-06-18 | -
dc.identifier.uri | http://hdl.handle.net/10722/333857 | -
dc.description.abstract | Optimization in multi-task learning (MTL) is more challenging than single-task learning (STL), as the gradient from different tasks can be contradictory. When tasks are related, it can be beneficial to share some parameters among them (cooperation). However, some tasks require additional parameters with expertise in a specific type of data or discrimination (specialization). To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad'). This structure allows us to formalize cooperation and specialization as the process of matching experts and tasks. We optimize this matching process during the training of a single model. Specifically, we incorporate mixture of experts (MoE) layers into a transformer model, with a new loss that incorporates the mutual dependence between tasks and experts. As a result, only a small set of experts are activated for each task. This prevents the sharing of the entire backbone model between all tasks, which strengthens the model, especially when the training set size and the number of tasks scale up. More interestingly, for each task, we can extract the small set of experts as a standalone model that maintains the same performance as the large model. Extensive experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach. | -
dc.language | eng | -
dc.relation.ispartof | The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) (18/06/2023-22/06/2023, Vancouver) | -
dc.title | Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners | -
dc.type | Conference_Paper | -
dc.identifier.doi | 10.48550/arXiv.2212.08066 | -
