Patryk Hubar-Kołodziejczyk, Dariusz Perliński

Research evaluation in the social sciences and humanities (SSH) still struggles to capture many forms of societal impact. Standard metrics such as citation counts, journal impact factors, and h-indices work best in fields where influence travels through formal, indexed publication channels. In SSH, however, the most consequential forms of impact often emerge elsewhere: a local government revises its statute after consulting with university researchers, or a parliamentary committee draws on a sociologist’s policy brief. This kind of influence is real, but it does not register in bibliometric databases.

IMeTo (Impact Measurement Tool) is being developed to address this problem in a concrete institutional setting. Its current focus is the Polish research ecosystem, where universities must translate diverse and often dispersed evidence into formal impact documentation. What makes IMeTo particularly interesting is not only the problem it addresses, but also the architecture it proposes: lightweight, task-specific models are used to first extract structured facts, and only then does a large language model (LLM) generate a coherent impact statement. This makes the workflow more controlled, more transparent, and potentially easier to scale than fully generative approaches.

Adam Mickiewicz University, Collegium Minus, Poznań (photo by Anastasiya Lvova)

A concrete problem in a concrete ecosystem

In Poland, the national research evaluation framework requires universities to document the societal impact of research in a structured way. These reports are expected to describe beneficiaries, mechanisms of change, supporting evidence, and the broader reach of a given activity. In practice, preparing such documentation is labour-intensive.

Researchers rarely describe their work in these administrative categories. Evidence of impact is often dispersed across publications, media mentions, policy materials, institutional records, and correspondence. As a result, the burden falls on researchers, administrators, and evaluation specialists, who must translate complex academic work into formats that fit formal reporting requirements.

This is the specific problem IMeTo is designed to address. The system is being shaped around a real workflow, real institutional needs, and real reporting constraints. Similar tensions between qualitative impact and formal evaluation also appear in other contexts, including the UK Research Excellence Framework (REF) and Horizon Europe. Even so, IMeTo is best understood first as a response to a concrete local problem, and only then as a model with possible wider relevance.

From publications to structured impact documentation

The central task of IMeTo is to help turn a researcher’s body of work into structured, evidence-based impact documentation. Such documentation could support a funding application, an institutional evaluation report, or other forms of public or strategic communication.

The workflow begins with document and metadata ingestion. A researcher uploads publications or connects an ORCID profile, and the system retrieves metadata from sources such as OpenAlex and Crossref. At this stage, however, the material remains only raw input: abstracts, PDFs, author information, and citation-related metadata. To become useful for impact reporting, this material must first be transformed into structured, impact-relevant information. That is where IMeTo’s architecture becomes important.
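The ingestion step can be sketched in a few lines. The snippet below is a minimal illustration, assuming OpenAlex's public works endpoint and its `author.orcid` filter; the normalised field names (`title`, `year`, `doi`, `cited_by`) mirror OpenAlex's response schema, and nothing here is IMeTo's actual implementation.

```python
"""Sketch of metadata ingestion from an ORCID iD via OpenAlex.

Assumptions: OpenAlex's /works endpoint with the `author.orcid`
filter; the record fields follow OpenAlex's JSON response schema.
"""
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS = "https://api.openalex.org/works"

def works_url(orcid: str, per_page: int = 25) -> str:
    """Build the OpenAlex query URL for all works by a given ORCID iD."""
    params = {
        "filter": f"author.orcid:{orcid}",
        "per-page": str(per_page),
    }
    return f"{OPENALEX_WORKS}?{urllib.parse.urlencode(params)}"

def fetch_works(orcid: str) -> list[dict]:
    """Retrieve and lightly normalise work metadata (network call)."""
    with urllib.request.urlopen(works_url(orcid)) as resp:
        payload = json.load(resp)
    return [
        {
            "title": w.get("title"),
            "year": w.get("publication_year"),
            "doi": w.get("doi"),
            "cited_by": w.get("cited_by_count"),
        }
        for w in payload.get("results", [])
    ]
```

At this point the records are still raw bibliographic metadata; the transformation into impact-relevant structure happens in the next stage.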

The core innovation: extract first, generate second

The main innovation of IMeTo is not simply that it addresses impact measurement in SSH. That challenge is already well recognised. What makes the system distinctive is the way it approaches the task technically.

Instead of passing raw documents directly to an LLM, IMeTo first uses lightweight, task-specific models to perform structured information extraction. These models identify elements such as methodology, studied populations, possible forms of impact, and potential beneficiaries. A separately trained model is intended to recognise the structure of Polish impact reports and detect elements such as mechanisms of change or evidence of impact in existing documentation.
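To make this concrete, the output of such an extraction step might look like the schema below. This is an illustrative sketch only: all field and category names are assumptions, not IMeTo's actual data model. The key idea it demonstrates is that every extracted fact keeps a pointer back to its source passage.

```python
"""Illustrative schema for what a task-specific extractor might emit.
All names (categories, fields) are assumptions for illustration."""
from dataclasses import dataclass, field

@dataclass
class ExtractedFact:
    """One structured fact, kept traceable to its source passage."""
    category: str    # e.g. "methodology", "beneficiaries", "mechanism_of_change"
    value: str       # the extracted text, normalised
    source_doc: str  # identifier of the source document
    char_span: tuple[int, int]  # character offsets within the source text
    confidence: float           # extractor's score in [0, 1]

@dataclass
class ImpactRecord:
    """Aggregated extraction output for one body of work."""
    facts: list[ExtractedFact] = field(default_factory=list)

    def by_category(self, category: str) -> list[ExtractedFact]:
        """Return all facts of one category, e.g. for report sections."""
        return [f for f in self.facts if f.category == category]
```

A downstream component can then ask an `ImpactRecord` for, say, all `beneficiaries` facts when assembling the corresponding section of a report.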

Only after this extraction step does the LLM enter the process. It works not on raw source material, but on structured facts. This design is deliberate. Many generic AI writing tools rely primarily on prompting LLMs over raw text. While this can produce fluent summaries, it does not necessarily provide strong traceability to source material.

By constraining the LLM to structured inputs, the generated output becomes easier to trace back to source material and easier to verify. This design aims to make the workflow more auditable, more resource-efficient, and easier to scale than fully generative pipelines that depend on large models at every stage. In this sense, IMeTo is not only a response to a reporting challenge in SSH, but also an experiment in building a more controlled and modular AI workflow for scholarly support.
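The "generate second" step can be sketched as prompt assembly over structured facts. In the minimal example below, the LLM never sees raw documents: its prompt is built only from extracted facts, each tagged with an identifier and a source document, so claims in the draft can be traced back. The prompt wording and the fact format are assumptions for illustration, not IMeTo's actual prompts.

```python
"""Sketch of constrained generation: the LLM prompt is assembled only
from structured facts, each carrying a source marker for traceability.
Prompt wording and fact format are illustrative assumptions."""

def build_prompt(facts: list[dict]) -> str:
    """Render extracted facts into a constrained drafting prompt."""
    lines = [
        "Draft an impact statement using ONLY the facts below.",
        "Cite each fact by its [id] marker; do not add new claims.",
        "",
    ]
    for f in facts:
        lines.append(f"[{f['id']}] ({f['category']}) {f['value']} "
                     f"-- source: {f['source_doc']}")
    return "\n".join(lines)

# Hypothetical facts, echoing the examples earlier in this post.
facts = [
    {"id": "F1", "category": "beneficiaries",
     "value": "municipal social services",
     "source_doc": "policy_brief_2023.pdf"},
    {"id": "F2", "category": "mechanism_of_change",
     "value": "local statute revised after researcher consultation",
     "source_doc": "council_minutes.pdf"},
]
prompt = build_prompt(facts)
```

Because each claim in the generated draft must cite an `[id]`, a reviewer can check any sentence against the named source document, which is the traceability property the architecture is after.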

Why this matters for GRAPHIA

The connection to GRAPHIA should be made explicit. IMeTo is relevant to the project not only because it serves SSH researchers, but also because it demonstrates how AI methods, structured information extraction, and heterogeneous scholarly data can be combined into a practical workflow.

This is where project synergies become concrete. In the GRAPHIA ecosystem, IMeTo can draw on project-related services such as the SSH Citation Index API and scholarly data available through GoTriple (a discovery platform provided by OPERAS), while also benefiting from broader integration with open scholarly infrastructures and national research information systems. Depending on the task and data availability, this may involve combining publication metadata, citation information, open-access discovery, and institutional research records from different sources.

This matters because impact documentation does not rely on a single type of evidence. It depends on linking publication metadata, citation-related information, and other contextual signals that may come from different systems. Seen from this perspective, IMeTo is not just a writing tool. It is a layered service that can combine extraction, enrichment, and generation across multiple data sources relevant to SSH.

That makes it a useful example of the kind of knowledge-based instrumentation GRAPHIA aims to support. It shows how dispersed scholarly material can be transformed into outputs that are more structured, reusable, and interpretable.

Positioning IMeTo in the current landscape

IMeTo sits at the intersection of several existing tool categories, including bibliometric analytics platforms, research information systems, and general-purpose AI writing tools. Each of these addresses part of the broader problem, but none is designed specifically for evidence-based SSH impact documentation in a concrete institutional workflow.

Bibliometric platforms are strong in quantitative, citation-based analysis, but they are less suited to forms of qualitative impact that do not leave a clear trace in indexed literature. Research information systems can store and organise institutional data, yet they do not necessarily support the transformation of dispersed evidence into structured impact statements. General-purpose AI writing tools can generate fluent text, but without a workflow grounded in extracted evidence they offer limited traceability and verification.

For this reason, IMeTo is best understood not as a replacement for these systems, but as a complementary layer focused on a more specific task: helping translate heterogeneous scholarly evidence into structured impact documentation. This appears to be the area in which IMeTo can make a practical contribution.

Where the project stands now

IMeTo is currently at the proof-of-concept stage. A working prototype of the narrative-generation pipeline already combines document parsing, metadata enrichment, fact extraction, similarity search against the POL-on corpus, LLM-based synthesis, and confidence scoring. The fine-tuning pipeline for the extraction models is being prepared, with the next milestone being the assembly of suitable training data. The components are being connected through MCP (Model Context Protocol), which supports communication between the orchestrating language model and specialised tools. From here, the goal is to improve the extraction pipeline and move toward a more integrated system in which the language model can coordinate these tools across the full workflow.
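One of the listed components, confidence scoring, can be illustrated with a small sketch. The idea is to blend the extractor's own confidence with how strongly similar reports in the reference corpus support a claim, and to flag weak claims for human review. The 0.6/0.4 weighting and the 0.7 threshold are purely illustrative assumptions, not IMeTo's actual scoring scheme.

```python
"""Hypothetical confidence score for one generated claim: blend the
extractor's confidence with corpus support from similarity search.
The weights and threshold are illustrative assumptions."""

def claim_confidence(extraction_conf: float,
                     similar_report_scores: list[float]) -> float:
    """Blend extraction confidence with best corpus support, in [0, 1]."""
    support = max(similar_report_scores, default=0.0)
    return round(0.6 * extraction_conf + 0.4 * support, 3)

def needs_review(score: float, threshold: float = 0.7) -> bool:
    """Flag low-confidence claims for a human evaluator."""
    return score < threshold
```

A claim backed by a strong extractor score and a close match in the corpus would pass silently, while a claim with no corpus support would be routed to a researcher or evaluation specialist, in line with the human-in-the-loop role described below.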

IMeTo will not replace human judgment in evaluating research. What it can do is take over the most labour-intensive parts of the documentation process: gathering metadata, extracting structured facts, finding comparable examples, and drafting initial narratives that a researcher or evaluation specialist can then refine. The point is to reduce the time spent on mechanical parts of the process so that researchers and evaluation specialists can focus more on interpretation and judgment.
