Smart Web-based Suggestion Chatbot for Research Data Management

A project of the 2nd BioHackathon Germany

Bielefeld

Approach:

1. Data Collection and Integration:

Gather relevant information from reliable sources such as FAIRsharing, RDMkit, FAIR Cookbook, Helmholtz Metadata Collaboration, RDMO, and NFDI sources. Integrate the collected data to create a comprehensive knowledge base for the chatbot.
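The integration step can be pictured as mapping each source's records onto one shared schema. The following is a minimal sketch under stated assumptions, not the project's actual design: the `KnowledgeEntry` fields, the `normalize` and `build_knowledge_base` helpers, and the per-source key names are all hypothetical, since the sources listed above each expose their content in their own format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    """One normalized knowledge-base record (hypothetical schema)."""
    source: str  # e.g. "RDMkit" or "FAIRsharing"
    title: str
    text: str
    url: str = ""


def _clean(value):
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(str(value).split())


def normalize(source, raw, title_key, text_key, url_key="url"):
    """Map a source-specific dict onto the shared schema.

    Key names differ per source, so the caller passes them in explicitly.
    """
    return KnowledgeEntry(
        source=source,
        title=_clean(raw.get(title_key, "")),
        text=_clean(raw.get(text_key, "")),
        url=str(raw.get(url_key, "")),
    )


def build_knowledge_base(entries):
    """Merge entries from all sources, keeping the first of each (source, title) pair."""
    unique = {}
    for entry in entries:
        unique.setdefault((entry.source, entry.title.lower()), entry)
    return list(unique.values())
```

Each source would then need only a thin adapter that names its own title and text fields, while the chatbot works against a single record type.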

2. Natural Language Processing (NLP) Implementation:

Develop NLP algorithms to effectively understand and interpret user queries. Extract key concepts and user intents from the queries. Match user queries with appropriate responses from the knowledge base.
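As an illustration of the matching step, the sketch below ranks knowledge-base entries against a query using TF-IDF weights and cosine similarity, written with the Python standard library only. This is a deliberately simple stand-in and an assumption on our part: the proposal does not specify the matching technique, and an LLM- or embedding-based approach could replace it. The stopword list and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "i", "my", "for", "to",
             "of", "how", "what", "which", "can", "and", "in"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def weight(tokens, idf):
    """TF-IDF vector as a dict; terms unknown to the index are ignored."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def build_index(documents):
    """documents maps an entry id to its text. Returns (idf, vectors)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    n_docs = len(tokenized)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {doc_id: weight(tokens, idf) for doc_id, tokens in tokenized.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, idf, vectors):
    """Return (entry id, score) of the closest entry, or (None, 0.0) if nothing overlaps."""
    query_vector = weight(tokenize(query), idf)
    scored = [(cosine(query_vector, vec), doc_id) for doc_id, vec in vectors.items()]
    if not scored:
        return None, 0.0
    score, doc_id = max(scored)
    return (doc_id, score) if score > 0.0 else (None, 0.0)
```

A query that shares no terms with any entry returns `None`, which gives the chatbot a natural point to ask the user to rephrase instead of guessing.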

3. Interactive Chatbot Development:

Design and implement an intuitive, user-friendly chatbot interface. Integrate the NLP algorithms and the knowledge base to provide tailored recommendations for RDM best practices, policies, and available tools. Enable interactive communication between the chatbot and the user to address specific RDM queries.

To keep the chatbot's information up to date, we will use the APIs of the sources mentioned above to fetch their current content at runtime. The chatbot can then draw on the most recent guidelines, policies, and tools from these sources and offer researchers the latest and most relevant recommendations for their data management needs. If dynamic retrieval through APIs proves technically infeasible, we will fall back to a data dump as the knowledge base for the chatbot.
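The fallback from runtime API retrieval to a data dump could be structured as follows. This is a rough sketch under assumptions: the URL and dump path are placeholders supplied by the caller, the function names are hypothetical, and the real sources may require authentication or paging that is not shown here.

```python
import json
import urllib.request
from pathlib import Path


def fetch_json(url, timeout=5.0):
    """Fetch and decode a JSON document over HTTP(S)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def load_entries(api_url, dump_path, fetch=fetch_json):
    """Return (entries, origin): live API data if reachable, else the local dump."""
    try:
        return fetch(api_url), "api"
    except (OSError, ValueError) as error:
        # Network errors and timeouts are OSError subclasses; malformed JSON
        # raises json.JSONDecodeError, which is a ValueError subclass.
        print(f"Live fetch failed ({error}); falling back to {dump_path}")
    return json.loads(Path(dump_path).read_text(encoding="utf-8")), "dump"
```

Returning the origin alongside the data lets the chatbot tell the user whether an answer is based on live content or on a possibly older snapshot. The `fetch` parameter is injectable so the fallback path can be exercised without network access.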

While the chatbot may not reach its final version within the short timeframe of the hackathon, our goal is to develop a functional prototype that demonstrates its capabilities. This prototype will serve as a foundation for future development, refinement, and expansion.

This project aligns with the broader topic of using Artificial Intelligence (AI) in data management practices, an area actively explored by de.NBI and ELIXIR. By leveraging Large Language Models (LLMs), the chatbot is intended to improve the efficiency and accuracy of data transformation and to promote interdisciplinary collaboration and scientific advancement in research.

The team brings together expertise from DataPLANT and the ELIXIR Plant Sciences Community. Collaborative efforts during the BioHackathon will focus on developing the chatbot and on highlighting the innovative use of chatbots in data management practices.

Project leads: Xiaoran Zhou (FZJ / DataPLANT, x.zhou@fz-juelich.de); Sebastian Beier (FZJ / ELIXIR-DE, s.beier@fz-juelich.de)