Real-time end-to-end video text spotter with contrastive representation learning

Wu, W; Li, Z; Li, J; Shen, C; Zhou, H; Gao, T; Wang, Z; Luo, P

Publisher Website: 10.48550/arXiv.2207.08417

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Real-time end-to-end video text spotter with contrastive representation learning

Title	Real-time end-to-end video text spotter with contrastive representation learning
Authors	Wu, W; Li, Z; Li, J; Shen, C; Zhou, H; Gao, T; Wang, Z; Luo, P
Issue Date	2022
Publisher	IEEE.
Citation	17th European Conference on Computer Vision (ECCV) (Hybrid), Tel Aviv, Israel, October 23-27, 2022. In Proceedings of the European Conference on Computer Vision (ECCV). DOI: http://dx.doi.org/10.48550/arXiv.2207.08417
Abstract	Video text spotting (VTS) is the task of simultaneously detecting, tracking, and recognizing text in video. Existing video text spotting methods typically rely on sophisticated pipelines and multiple models, which makes them ill-suited to real-time applications. Here we propose a real-time end-to-end video text spotter with Contrastive Representation learning (CoText). Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time, end-to-end trainable framework. 2) With contrastive learning, CoText models long-range dependencies and learns temporal information across multiple frames. 3) A simple, lightweight architecture is designed for effective and accurate performance, including GPU-parallel detection post-processing and a CTC-based recognition head with Masked RoI. Extensive experiments show the superiority of our method. In particular, CoText achieves a video text spotting IDF1 of 72.0% at 41.0 FPS on ICDAR2015video, an improvement of 10.5% and 32.0 FPS over the previous best method.
Description	Oral
Persistent Identifier	http://hdl.handle.net/10722/315797

DC Field	Value	Language
dc.contributor.author	Wu, W	-
dc.contributor.author	Li, Z	-
dc.contributor.author	Li, J	-
dc.contributor.author	Shen, C	-
dc.contributor.author	Zhou, H	-
dc.contributor.author	Gao, T	-
dc.contributor.author	Wang, Z	-
dc.contributor.author	Luo, P	-
dc.date.accessioned	2022-08-19T09:04:37Z	-
dc.date.available	2022-08-19T09:04:37Z	-
dc.date.issued	2022	-
dc.identifier.citation	17th European Conference on Computer Vision (ECCV) (Hybrid), Tel Aviv, Israel, October 23-27, 2022. In Proceedings of the European Conference on Computer Vision (ECCV)	-
dc.identifier.uri	http://hdl.handle.net/10722/315797	-
dc.description	Oral	-
dc.description.abstract	Video text spotting (VTS) is the task of simultaneously detecting, tracking, and recognizing text in video. Existing video text spotting methods typically rely on sophisticated pipelines and multiple models, which makes them ill-suited to real-time applications. Here we propose a real-time end-to-end video text spotter with Contrastive Representation learning (CoText). Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time, end-to-end trainable framework. 2) With contrastive learning, CoText models long-range dependencies and learns temporal information across multiple frames. 3) A simple, lightweight architecture is designed for effective and accurate performance, including GPU-parallel detection post-processing and a CTC-based recognition head with Masked RoI. Extensive experiments show the superiority of our method. In particular, CoText achieves a video text spotting IDF1 of 72.0% at 41.0 FPS on ICDAR2015video, an improvement of 10.5% and 32.0 FPS over the previous best method.	-
dc.language	eng	-
dc.publisher	IEEE.	-
dc.relation.ispartof	Proceedings of the European Conference on Computer Vision (ECCV)	-
dc.rights	Proceedings of the European Conference on Computer Vision (ECCV). Copyright © IEEE.	-
dc.title	Real-time end-to-end video text spotter with contrastive representation learning	-
dc.type	Conference_Paper	-
dc.identifier.email	Luo, P: pluo@hku.hk	-
dc.identifier.authority	Luo, P=rp02575	-
dc.identifier.doi	10.48550/arXiv.2207.08417	-
dc.identifier.hkuros	335582	-
dc.publisher.place	Israel	-
