Temporal Image Caption Retrieval

Challenges for Natural Language Processing (CNLPS)

Multimodal models, especially those combining vision and text, are gaining wide recognition. A recent example is the image generation model DALL·E 2 [1]. One multimodal task is text-image retrieval: retrieving an image for a text query, or retrieving a text for a given image. In this challenge, we introduce a task in the text-image retrieval setup that additionally extends the modalities with temporal data.

Language models rarely use input information beyond the text itself, such as a document timestamp or other metadata. However, models trained on snapshots of data may be of limited use, for example when factual knowledge is required but the facts change over time (compare the questions “Who is the president of the U.S.A.?” and “Who was the president of the U.S.A. in 1950?”), or when the meaning of words changes (the word “gay” shifted from meaning “cheerful” to referring to homosexuality).

The presented task is based on the Chronicling America [2] and Challenging America [3] projects. Chronicling America is an open database of over 16 million pages of digitized historic American newspapers covering 274 years. Challenging America is a set of temporal challenges built from the Chronicling America dataset.

The detailed specification of the task and the description of the evaluation procedure are available here.


[1] Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125.
[2] Lee, B. C. G., Mears, J., Jakeway, E., Ferriter, M., Adams, C., Yarasavage, N., ... & Weld, D. S. (2020). The Newspaper Navigator Dataset: Extracting Headlines and Visual Content from 16 Million Historic Newspaper Pages in Chronicling America. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM '20). Association for Computing Machinery, New York, NY, USA, 3055–3062.
[3] Pokrywka, J., Gralinski, F., Jassem, K., Kaczmarek, K., Jurkiewicz, K., & Wierzchoń, P. (2022, July). Challenging America: Modeling language in longer time scales. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 737-749).


Solutions for the task are to be submitted via the challenge on the Gonito platform. To be included in the final ranking, participants are expected to provide a report describing their solution. Reports should conform to the requirements for papers submitted to the FedCSIS conference and should not exceed 4 pages.

Important Dates

  • Feb 13, 2023: Training data available
  • May 17, 2023: Test data available
  • June 14, 2023: Deadline for submitting the results
  • June 16, 2023: Announcement of the final results, sending invitations for submitting papers
  • July 9, 2023: Deadline for submitting invited papers
  • July 16, 2023: Author notification
  • July 31, 2023: Final paper submission, registration
  • Sept 20, 2023: FedCSIS conference


  • Jakub Pokrywka, Adam Mickiewicz University, Poland
  • Piotr Wierzchoń, Adam Mickiewicz University, Poland
  • Krzysztof Jassem, Adam Mickiewicz University, Poland
