NewsEye: A Digital Investigator for Historical Newspapers
The NewsEye project aims to improve access to the early European press from the period 1850-1950 for researchers, library users and the general public. Using 15 million pages digitised by the national libraries of Austria, Finland and France, NewsEye developed automatic tools based on artificial intelligence for character recognition, analysis of newspaper structure (extraction of themes; identification of articles) and multilingual content processing (recognition of mentions of people, places and organisations; analysis of opinion; text mining). It offers a toolkit for large-scale analysis of digitised newspapers in different languages and from a variety of sources.
The consortium involved three national libraries (National Library of Austria; National Library of Finland; National Library of France), four research groups in the humanities and social sciences, and four in computer science (University of La Rochelle, France; University of Helsinki, Finland; University of Innsbruck, Austria; University of Rostock, Germany; University Paul-Valéry Montpellier, France; University of Vienna, Austria). The majority of the project’s funding came from the European Union’s Horizon 2020 research and innovation programme.
The consortium developed new approaches to content analysis and exploration adapted to historical documents. The project has resulted in new knowledge in history, literature, gender studies and media analysis, on corpora in French, German, Finnish and Swedish, indicating that the advances made by the NewsEye project can benefit all disciplines in the social sciences and humanities, whatever the language of the sources studied.
The project has advanced the state of the art in document analysis and natural language processing and achieved numerous international benchmarks, enabling a high level of understanding of textual content despite imperfect digitisation and character recognition. This paves the way for numerous advances in the automatic analysis of old documents, an area in which the major AI large language models (such as ChatGPT) are not competitive. The project has also led to the creation of the NewsEye platform, a digital library adapted to historical newspaper collections, incorporating a dynamic text analysis toolbox and a personal research assistant.
As a research project, NewsEye has exceeded all its objectives in terms of scientific output, with 64 publications in conference proceedings and 13 in journals or book chapters by the end of the project, and many more subsequently.
“NewsEye is an excellent contribution to open science at the European level and beyond. It is a best practice for the digital humanities and should be considered a baseline for future research in this area. Its use of AI technology specifically responds to the needs of a very specific element of heritage, the Early European press. It is a strong example of how these innovative tools, which are not yet well implemented in the heritage sector, can be utilised for analysis and as a means through which new groups, including young people, can engage with the material. Its integration into an existing European platform and its extensive efforts in knowledge transfer and dissemination has ensured that NewsEye has been accepted by a community of users at a very large scale”, the Jury said.
Contact: Antoine Doucet, La Rochelle Université | newseye-communication@ml.univ-lr.fr | www.newseye.eu