Temporal Image Caption Retrieval

Challenges for Natural Language Processing (CNLPS)

Multimodal models, especially those combining vision and text, are gaining wide recognition; a recent example is the image generation model DALL·E 2 [1]. One multimodal challenge is Text-Image retrieval: retrieving an image for a text query, or retrieving a text for a given image. In this challenge, we introduce a task in the Text-Image retrieval setup that additionally extends the modalities with temporal data.
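To make the retrieval setup concrete, here is a minimal sketch that ranks candidate images for a text query by the cosine similarity of their embedding vectors. It assumes that texts and images have already been embedded into a shared vector space (as CLIP-style encoders do). The functions `cosine` and `rank_images` and the toy vectors are hypothetical, and the temporal component of the task is not modeled. This illustrates the general technique only; it is not the challenge's baseline or evaluation procedure.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_vec: Sequence[float], image_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return image ids ordered from most to least similar to the query embedding."""
    return sorted(
        image_vecs,
        key=lambda img_id: cosine(query_vec, image_vecs[img_id]),
        reverse=True,
    )


# Hypothetical 3-dimensional embeddings; real encoders produce much larger vectors.
query = [1.0, 0.0, 0.5]
images = {
    "img_a": [0.9, 0.1, 0.4],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.5, 0.5, 0.5],
}
print(rank_images(query, images))  # ['img_a', 'img_c', 'img_b']
```

Retrieving a text for a given image is symmetric: embed the image as the query and rank candidate text embeddings the same way.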

Language models rarely use any input information other than text, such as the document timestamp or other metadata. However, models trained on a snapshot of data may be of limited use: for example, when factual knowledge is required but the facts change over time (answering the questions “Who is the president of the U.S.A.?” and “Who was the president of the U.S.A. in 1950?”), or when word semantics change (the meaning of the word “gay” shifted from “cheerful” to referring to homosexuality).

The presented task is based on the Chronicling America [2] and Challenging America [3] projects. Chronicling America is an open database of over 16 million pages of digitized historic American newspapers covering 274 years. Challenging America is a set of temporal challenges built from the Chronicling America dataset.

The detailed specification of the task and the description of the evaluation procedure are available here.

Citations

[1] Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125.
[2] Lee, B. C. G., Mears, J., Jakeway, E., Ferriter, M., Adams, C., Yarasavage, N., ... & Weld, D. S. (2020). The Newspaper Navigator Dataset: Extracting Headlines and Visual Content from 16 Million Historic Newspaper Pages in Chronicling America. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20). Association for Computing Machinery, New York, NY, USA, 3055–3062.
[3] Pokrywka, J., Graliński, F., Jassem, K., Kaczmarek, K., Jurkiewicz, K., & Wierzchoń, P. (2022). Challenging America: Modeling language in longer time scales. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 737–749).

Submissions

Solutions to the task are to be submitted via the challenge on the Gonito platform. To be included in the final ranking, participants are expected to provide a report describing their solution. The reports should conform to the requirements for papers submitted to the FedCSIS conference and should not exceed 4 pages.

Important Dates

Feb 13, 2023: Training data available
May 17, 2023: Test data available
June 14, 2023: Deadline for submitting the results
June 16, 2023: Announcement of the final results; invitations to submit papers sent
July 9, 2023: Deadline for submitting invited papers
July 16, 2023: Author notification
July 31, 2023: Final paper submission, registration
Sept 20, 2023: FedCSIS conference