The Visual Geometry Group in the Department of Engineering Science at the University of Oxford has developed a model called Magi that automatically transcribes comic pages into text and generates a script from them.
The model generates scripts fully automatically by recognizing the panels, text blocks and characters on a comic page. Its main functions include panel detection, which locates the individual panels on a page, and text block detection, which locates the text blocks within panels, usually containing dialogue or narration. In addition, the model detects the characters drawn on the page and clusters them by identity, so that different characters can be told apart.
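The identity clustering step can be illustrated with a small sketch. It assumes that some upstream component has already produced a pairwise affinity score between every two character detections on a page. The function name, the affinity matrix and the 0.5 threshold are all hypothetical, and grouping by connected components is only one plausible way to turn such scores into clusters, not necessarily how Magi does it.

```python
def cluster_characters(affinity, threshold=0.5):
    """Group detections whose pairwise affinity exceeds `threshold`.

    `affinity` is a symmetric n x n matrix of scores in [0, 1]
    (hypothetical input; a real system would predict these scores
    from the character crops). Returns one cluster label per detection.
    """
    n = len(affinity)
    parent = list(range(n))  # union-find forest, one node per detection

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair that is judged to be the same character.
    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] > threshold:
                parent[find(j)] = find(i)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

For four detections where the first two and the last two look alike, the function returns two clusters, for example `[0, 0, 1, 1]`.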
The Magi model also associates text with speakers, determining which text was spoken by which character on the page, ensuring the accuracy of the script. At the same time, the model sorts the text blocks in the order in which they are read to ensure that the narrative logic of the script is consistent with the original comic, allowing the reader to experience the comic story in its entirety by reading the text.
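To make the ordering and speaker association steps concrete, here is a minimal sketch that turns detected boxes into script lines. Everything in it is an assumption for illustration: the `Box` class, the `order_panels` and `build_script` helpers, the right-to-left manga reading order, and the nearest-character rule for choosing a speaker. The actual model predicts these associations rather than using a distance heuristic.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page pixel coordinates (hypothetical)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    def contains(self, point):
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def order_panels(panels, row_tolerance=20.0):
    """Sort panels top-to-bottom, then right-to-left within each row."""
    rows = []
    for panel in sorted(panels, key=lambda p: p.y1):
        # Panels whose top edges are close together share a row.
        if rows and abs(rows[-1][0].y1 - panel.y1) <= row_tolerance:
            rows[-1].append(panel)
        else:
            rows.append([panel])
    return [p for row in rows for p in sorted(row, key=lambda p: -p.x2)]


def build_script(panels, texts, characters):
    """Return script lines from (Box, text) and (Box, name) pairs."""
    ordered = order_panels(panels)

    def panel_index(box):
        for i, panel in enumerate(ordered):
            if panel.contains(box.center):
                return i
        return len(ordered)  # text outside every panel is read last

    def reading_key(item):
        box, _ = item
        # Within a panel: higher text first, then right-most first.
        return (panel_index(box), box.y1, -box.x2)

    lines = []
    for box, content in sorted(texts, key=reading_key):
        cx, cy = box.center
        # Heuristic: the speaker is the character nearest to the text block.
        _, speaker = min(
            characters,
            key=lambda c: hypot(c[0].center[0] - cx, c[0].center[1] - cy),
            default=(None, "Unknown"),
        )
        lines.append(f"{speaker}: {content}")
    return lines
```

The distance rule is deliberately simple and will fail on crowded panels or off-panel speakers, which is one reason a learned association is attractive.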
In addition to the Magi model itself, the project includes a dataset called Mangadex-1.5M, which contains about 1.5 million comic pages covering a wide range of genres and art styles. The dataset is used to train the Magi model on the tasks involved in automatically understanding comic pages and generating scripts from them: panel detection, text block and character detection, character identity clustering, and text-to-speaker association.
Through this project, the researchers hope to advance automated processing and comprehension techniques in the field of comics.
Paper: https://arxiv.org/abs/2401.10224