Authors: Ljiljana Jertec Musap, Draženko Celjak, Petra Čačić, Ivan Jelenić
University of Zagreb, University Computing Centre SRCE
Reviewers: Ronald Snijder, Jadranka Stojanovski
Challenges in Open Access Publishing
In the evolving landscape of academic publishing, the push towards open access (OA) has gained significant momentum, particularly with the advent of initiatives like Plan S. Launched by cOAlition S, Plan S strongly recommends the use of standardised, machine-readable formats such as the Journal Article Tag Suite (JATS) XML, which ensures that research articles are accessible and interoperable across different platforms. The CRAFT-OA project also recognised the significance of JATS XML for long-term content preservation and reusability. It was highlighted as one of the key technical standards for Open Access journals in the project’s deliverable Report on Standards for Best Publishing Practices and Basic Technical Standards, Inspired by the FAIR principles.
JATS offers numerous advantages, including standardised tagging, enhanced interoperability, and support for multi-channel publishing. Its flexible structure accommodates different types of journal articles and ensures their long-term preservation. Additionally, JATS improves the discoverability of articles by providing precise indexing and enabling data retrieval, making it an invaluable standard in modern academic publishing.

However, formatting an article in JATS XML, or converting one from another format, can be complex, time-consuming, and expensive. Editors often face technical difficulties when trying to incorporate it into their publishing process. A 2020 survey by Scholastica found that, despite the numerous potential advantages of XML, fewer than half of the 63 publishers surveyed reported producing full-text XML article files. Furthermore, a 2022 survey revealed that there has been no increase in the production of full-text XML articles since 2020. JATS XML adds another layer of complexity to OA publishing, making it particularly challenging for smaller publishers and academic journals to maintain compliance.
Our Solution: The JATS XML Converter Service
To address these challenges, the University of Zagreb University Computing Centre – SRCE drew on its extensive experience in developing and maintaining the Portal of Croatian Scientific and Professional Journals – HRČAK and created the JATS XML Converter Service, a service designed to simplify the conversion of academic manuscripts from DOCX to JATS XML. The service grew out of the increasing need for automated XML generation, especially given the challenges journals face in keeping up with Plan S principles and recommendations. Creating JATS XML with the Converter Service is intuitive and straightforward. Editors need to ensure that manuscripts are prepared and formatted using basic Microsoft Word styles and features, for which simple guidelines are provided. The manuscript, along with the required metadata about the journal and article, is then uploaded to the JATS XML Converter Service, where it is automatically converted into JATS XML. The process is demonstrated in a short video tutorial. To use the service, users must have an active electronic identity that is part of the global authentication and authorisation system, eduGAIN.
The technology behind the JATS XML Converter Service is based on the Pandoc universal document converter, which produced the most accurate results among the open-source converters in our tests. For parsing references, the service relies on AnyStyle, an open-source tool. All components are integrated and managed within the Laravel PHP framework.
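To give a feel for the Pandoc step, here is a minimal sketch of a DOCX to JATS conversion driven from Python. It is not the service's actual pipeline, which is built in Laravel and also handles journal metadata and reference parsing with AnyStyle. The function names and the file names `manuscript.docx` and `manuscript.xml` are placeholders chosen for illustration, and the sketch assumes the `pandoc` executable is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(docx_path: str, xml_path: str) -> list[str]:
    """Build a Pandoc command line that converts a DOCX file to JATS XML."""
    return [
        "pandoc",
        docx_path,
        "--from", "docx",
        "--to", "jats",      # Pandoc's JATS writer
        "--standalone",      # emit a complete document rather than a fragment
        "--output", xml_path,
    ]


def convert(docx_path: str, xml_path: str) -> bool:
    """Run the conversion; return False if Pandoc or the input file is missing."""
    if shutil.which("pandoc") is None or not Path(docx_path).is_file():
        return False
    subprocess.run(build_pandoc_command(docx_path, xml_path), check=True)
    return True


if __name__ == "__main__":
    # Placeholder file names; replace with a real manuscript to try it out.
    print(" ".join(build_pandoc_command("manuscript.docx", "manuscript.xml")))
```

Output produced this way would normally still need the journal and article metadata added and the reference list checked, which is the part the Converter Service automates.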
With this solution, editors are no longer burdened by the technical intricacies of XML formatting, allowing them to focus on the content rather than the format.
HRČAK and Support for JATS
HRČAK is a central portal of Croatian professional and scientific journals that serves as a publishing platform for over 545 scientific and professional journals, with over 299,000 full-text articles available in OA. Alongside its role as a central portal providing OA to journals across various disciplines, the service also offers technical support to journal editors and promotes best practices in OA scientific publishing. This includes the use of ORCID identifiers, publishing associated datasets and linking papers to them, using Creative Commons licenses, and publishing in machine-readable formats like JATS XML. HRČAK is operated by SRCE, with various experts in information and library science as well as representatives from prominent Croatian journals also involved in its development.
The story of HRČAK and its support for JATS XML began in 2017 through the OpenAIRE2020 project, the FP7 Post-Grant Open Access Pilot. The project financed technical enhancements to journals and platforms that support Open Access publishing without Article Processing Charges (APCs) and promote interoperability with the infrastructure being developed through OpenAIRE projects. One of the proposed technical enhancements for HRČAK was the ability to publish XML files alongside PDFs, enabling content mining of the articles. Initially, the responsibility for creating and publishing JATS files fell on the journals’ editors, who had to find the resources to prepare JATS XML for uploading to HRČAK. In 2023, HRČAK introduced automated conversion features for JATS, but these were fully integrated into HRČAK and available exclusively to Croatian journals. By offering the JATS XML Converter Service, SRCE is sharing the technical solution created for HRČAK with all journals, enabling them to comply with requirements like those of Plan S or PubMed Central.
The JATS XML Converter Service was launched at the end of July 2024, and in its first three months, 21 authenticated users from 18 different organisations used it to generate their XML files.
Performance Evaluation and Future Plans
As part of the development and testing phase, a student intern at SRCE conducted an extensive evaluation of the JATS XML Converter Service’s performance. Testing was conducted on a sample of 250 DOCX files and the results were as follows:
- Successfully converted (no edits required): 41 cases (16%)
- Satisfactory (edits requiring up to 15 minutes of work): 145 cases (58%)
- Unsuccessful (extensive edits required): 64 cases (26%).
These results demonstrate that while the service is effective in most cases, there is room for improvement, particularly in reducing the number of cases that require substantial manual intervention. These insights will guide future updates and adjustments of the JATS XML Converter Service (e.g., improved reference recognition or enhanced section identification) to minimise manual effort and increase the percentage of successfully converted articles.
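The percentages above follow directly from the reported counts. The short check below recomputes them and adds up the share of files that needed at most 15 minutes of editing. The category keys are labels chosen here for illustration; only the counts come from the evaluation.

```python
# Counts reported for the evaluation sample of 250 DOCX files.
results = {
    "successful": 41,      # no edits required
    "satisfactory": 145,   # edits requiring up to 15 minutes of work
    "unsuccessful": 64,    # extensive edits required
}

total = sum(results.values())


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


shares = {name: share(count, total) for name, count in results.items()}

# Files that were usable after little or no manual work.
usable = share(results["successful"] + results["satisfactory"], total)

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
print(f"usable with at most 15 minutes of edits: {usable:.1f}%")
```

Taken together, the first two categories cover roughly three quarters of the sample, which is what the claim of effectiveness in most cases rests on.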
The future of the service depends on resources, which are currently quite limited. The SRCE team is aware of other solutions available for similar purposes, but we lack the resources to clarify the differences among them. We welcome collaboration with partners interested in comparing tools and/or collaborating with us on the development of the JATS XML Converter Service through joint projects. Such comparisons and their results would provide the community with a clear understanding of the tools and services available to them, along with their strengths and weaknesses. Additionally, tool maintainers would gain valuable feedback on how to improve their solutions and potentially establish mutual collaborations.
SRCE’s Broader Role and Services
SRCE plays a pivotal role in supporting Croatia’s research and higher education system. It is a central infrastructure institution that plans, develops, and improves e-infrastructure and digital services for the needs of the academic and scientific community in Croatia and provides support in their usage. SRCE’s service catalogue offers a wide range of services and resources, including advanced computing, authentication and authorisation infrastructure, digital repository and OA platforms, key information systems for research and higher education, comprehensive e-learning solutions, and many other relevant services. Those services foster a modern, integrated academic environment that enhances research, collaboration, and educational delivery for students, educators, researchers, and the public.
The JATS XML Converter Service provides a way to enable or enhance the machine-readability of journal articles, helping scientific publishing keep up with the possibilities that technological development provides and meet the good practices and recommendations of initiatives like Plan S. By simplifying the XML conversion process, the service could increase the impact of scientific publishing and make it more ready for Artificial Intelligence (AI).
SRCE remains committed to continuously improving the JATS XML Converter Service and invites editors, publishers, and publishing experts to actively participate in its refinement. More real-world testing is essential to address any issues that may not be apparent during development, and feedback from the scientific and academic community will help steer the service’s development. This input can help define future development directions for the JATS XML Converter Service, such as the creation of an API to enable more extensive automated conversion of articles into JATS XML format. Additionally, the development of a plugin for OJS could be explored, leveraging the service for conversion while sourcing metadata for the front section directly from OJS itself.
We invite journal editors and publishers to visit and test the JATS XML Converter Service. If you would like to join the effort to improve the future of Open Access publishing, feel free to contact the HRČAK team.
About the authors
Ljiljana Jertec Musap is a project lead at the University of Zagreb University Computing Centre (SRCE). Her work focuses on the planning, specification, development, user support, and education for the national repository infrastructure DABAR, and the Portal of Croatian Scientific and Professional Journals – HRČAK. She is the national RDA Coordinator for Croatia and actively contributes to the EU-funded CRAFT-OA project. Her expertise includes data and digital asset management, encompassing storage, description, and long-term preservation. Additionally, she is involved in advancing open science initiatives and improving scientific communication.
Draženko Celjak is the head of the Data Management Department at the University of Zagreb University Computing Centre (SRCE). He aims to advance open science through the development of essential infrastructure. His responsibilities include managing the national repository infrastructure DABAR, the Portal of Croatian Scientific and Professional Journals – HRČAK, and the national cloud storage service PUH. He is the national RDA Coordinator for Croatia and leads the SRCE team in the EU-funded CRAFT-OA project. He participates in the activities of the national Open Science Cloud initiative (HR-OOZ), and as part of EOSC-related activities, he works on establishing the Croatian national node, and integrating it into the EOSC federation.
Petra Čačić is an Information Specialist for Open Science and Digital Repositories at the University of Zagreb University Computing Centre (SRCE). Her work includes developing, maintaining, and improving the national repository infrastructure DABAR, and the Portal of Croatian Scientific and Professional Journals – HRČAK, providing user support, conducting webinars, and holding online consultations. She actively participates in the EU-funded CRAFT-OA project, contributing to its work packages and ensuring SRCE’s compliance. Additionally, she supports the national RDA node and contributes to the Croatian Open Science Cloud Initiative (HR-OOZ). Her expertise includes open science advocacy, research data management, digital repositories, and scientific publishing, with a strong focus on user education, troubleshooting, and fostering an understanding of FAIR principles.
Ivan Jelenić is a project lead at the University of Zagreb University Computing Centre (SRCE). His work focuses on the technical development, maintenance, and improvement of web applications for the national repository infrastructure DABAR, the national cloud storage service PUH, and the Portal of Croatian Scientific and Professional Journals – HRČAK. His expertise includes web technologies such as PHP and the Laravel framework.
