In the upcoming months, the OPERAS Innovation Lab will observe projects and use cases developed by various stakeholders collaborating in GRAPHIA, a Horizon Europe-funded project running from January 2025 to December 2027. The GRAPHIA consortium (Knowledge Graphs, AI Services and Next Generation Instrumentation for R&D in Social Sciences and Humanities) aims to build the first comprehensive social sciences and humanities knowledge graph (SSH KG), enriched with Artificial Intelligence (AI) and Large Language Models (LLMs).

The project brings together 21 partners. The consortium unites five SSH initiatives (OPERAS, ERIH-S, EHRI, GGP and RESILIENCE) from the European Strategy Forum on Research Infrastructures (ESFRI) with eight private entities, ranging from innovative SMEs to large industry associations. Together, they will co-develop new technologies for public research infrastructures (RIs) and open new business avenues for industry.

By integrating fragmented SSH data into a unified, machine-readable format, GRAPHIA aims to enhance data visualisation and analysis capabilities. This will help researchers uncover patterns and insights in unstructured data, shedding light on complex social phenomena and cultural trends.

A key component of GRAPHIA is the SSH Citation Index, a framework for extracting and enriching citation data across all SSH disciplines. By making these citation links available, the index aims to speed up access to, and understanding of, existing literature, making research more efficient.

GRAPHIA also develops a series of use cases that demonstrate the real-world potential of its technologies. These include AI-driven discovery of research trends, automated mapping of scholarly networks across disciplines, and tools that improve multilingual access to SSH resources.

Six innovative initiatives

As part of the GRAPHIA project, six pilot initiatives are exploring how AI, knowledge graphs, and LLMs can advance research in the social sciences and humanities. Spanning automated thematic analysis, geospatial data extraction, archival classification, impact assessment, metadata generation, and the linking of publications to survey data, these pilots demonstrate practical applications, with a strong focus on multilingualism, open access, and measurable research impact.

Soon, project leaders will showcase their tools and services on the Observatory website. Stay tuned for updates!

1. TALLMesh: Thematic Information Extraction from Interview Data

Developed by Abertay University, TALLMesh is a lightweight, browser-based application that uses LLMs to conduct thematic analysis of qualitative data such as interviews. Designed with a focus on multilingual capability and open access, the tool aims to help researchers identify key themes across languages and formats. Planned next steps include refining the GUI, expanding the analytical algorithms, and releasing the tool as open source.

2. Genius Loci: AI-powered Knowledge Graph Data Extraction for Geospatial Data

Genius Loci’s “Genius Property” platform integrates geospatial and socio-cultural data to analyse real estate contexts. Through GRAPHIA, the platform uses LLMs to extract and structure facts from SSH knowledge graphs, generating metrics like environmental impact and territorial attractiveness. The pilot also explores integrating these insights into visual, ontology-driven property reports.

3. EHRI-Pilot: AI-powered Multilabel Classification for Archival Data

Led by the European Holocaust Research Infrastructure (EHRI), this pilot addresses the challenge of subject term classification in Holocaust-related archives. Building on tools such as Annif, it develops LLM-based multilabel classifiers that assign controlled-vocabulary terms to archival descriptions in multiple languages. Human-in-the-loop interfaces, with both supervised and fully automated modes, aim to improve usability and cataloguing efficiency.

4. IMeTo: Impact Measurement Tool

Jointly developed by IBL-PAN and PSNC, IMeTo evaluates the societal and economic impact of scientific research. By fine-tuning LLMs on a corpus drawn from Poland’s RADON database, the tool helps institutions assess both the potential and the actual impact of their research outputs. It works either as a plugin or as a standalone application and is intended to support current research information systems (CRIS) and research administrators.

See more in the Observatory portfolio: https://lab.operas-eu.org/portfolio/

5. Creating Automatic Object Descriptions

This pilot by PSNC automates the creation of metadata for digital libraries and repositories. It combines enrichment tools with named entity recognition (NER) to produce descriptive metadata for 2D and 3D objects, improving their discoverability in search. The solution is being tested on the Polish platforms dLibra and the Digital Libraries Federation to improve metadata quality at scale.

6. AI-powered Information Extraction for Survey Data Usage in Publications

Developed by GESIS and GGP, this pilot refines NLP pipelines that link scholarly publications to the survey datasets and variables they use. Drawing on tools such as GEFUREX, VADIS, and Outcite, the project builds searchable links between surveys and research outputs, increasing the visibility of datasets and enabling more targeted literature and data searches.
