Seedbox

Challenge

In healthcare, accessing and understanding medical information is a significant challenge for many groups in the population, including the elderly and non-native speakers. The main obstacles are the length, structure, and complex language of the package leaflets that accompany medicines. At present, there is no easily accessible online alternative to consulting a pharmacist or doctor directly.

Although data on all drugs approved in Germany is available on online platforms, this information is not readily understandable or accessible to everyone. The challenge goes beyond mere access: the information has to be presented so that a diverse audience can easily comprehend it, thereby democratizing access to medical information. This endeavor also carries the responsibility of safeguarding against misuse of the information, such as for self-harm or harming others, so accessibility must be balanced against safety. The question then arises: how can we leverage AI to provide a form of first-level support that makes medical information universally accessible, while also implementing measures to prevent misuse?

Solution

To make medical information more accessible and understandable, we developed a retrieval-augmented generation (RAG) system. It is designed to search and analyze a database of over 100,000 drugs, which corresponds to more than 3.6 million individual data points.

Key to this solution is the system's integration with a web search tool, which lets it draw on a wide range of sources for information and fact-checking. This capability is crucial for ensuring the accuracy and reliability of the information provided. Moreover, the system is specifically engineered to minimize errors or "hallucinations," a common problem in AI-generated content, thereby enhancing its trustworthiness.

A significant feature of our RAG system is its alignment with ethical guidelines and safety measures. It is programmed to avoid responding to prompts that could lead to misuse, harm, or any other undesirable outcomes. This safeguard is vital in preventing the potential for self-harm or harm to others, ensuring that the system contributes positively to users' health and well-being.
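The gating step described above can be illustrated with a minimal sketch. It assumes a classifier that returns a risk score between 0 and 1 for each prompt; the names `route` and `toy_risk_score`, the threshold of 0.5, and the keyword list are hypothetical and are not taken from the actual system, where a trained model (such as the BERT-based router listed in the tech stack) would take the place of the toy scorer.

```python
from typing import Callable

def route(prompt: str,
          risk_score: Callable[[str], float],
          threshold: float = 0.5) -> str:
    """Return "refuse" if the prompt looks risky, otherwise "answer"."""
    return "refuse" if risk_score(prompt) >= threshold else "answer"

def toy_risk_score(prompt: str) -> float:
    """Hypothetical stand-in for a trained classifier (assumption).

    Scores 1.0 when a flagged phrase occurs in the prompt, else 0.0.
    """
    flagged_phrases = ("lethal dose", "without them noticing")
    text = prompt.lower()
    return 1.0 if any(phrase in text for phrase in flagged_phrases) else 0.0

decision_risky = route("What is the lethal dose of drug X?", toy_risk_score)
decision_benign = route("How often may I take ibuprofen?", toy_risk_score)
print(decision_risky, decision_benign)
```

Keeping the scorer as a parameter means the routing logic stays the same when the toy function is swapped for a real model.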

To ensure the system remains current and relevant, we have implemented an effective data ingestion pipeline. This pipeline facilitates the continuous update of the system with the latest drug information and guidelines, keeping the tool accurate and useful.
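One common way to keep such an index current without reprocessing every record is to compare content hashes. The sketch below is an illustration under that assumption, not a description of the actual pipeline; `content_hash`, `plan_update`, and the sample drug ids are hypothetical names chosen for the example.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a leaflet's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_update(indexed: dict[str, str],
                incoming: dict[str, str]) -> tuple[list[str], list[str]]:
    """Decide which records need work in an incremental update.

    indexed:  drug id -> hash of the text currently in the index
    incoming: drug id -> latest leaflet text from the data source
    Returns (ids to embed and upsert, ids to delete), both sorted.
    """
    to_upsert = [
        drug_id for drug_id, text in incoming.items()
        if indexed.get(drug_id) != content_hash(text)  # new or changed
    ]
    to_delete = [drug_id for drug_id in indexed if drug_id not in incoming]
    return sorted(to_upsert), sorted(to_delete)

# Hypothetical sample data: "b" changed, "c" was withdrawn, "d" is new.
indexed = {
    "a": content_hash("leaflet A"),
    "b": content_hash("leaflet B"),
    "c": content_hash("leaflet C"),
}
incoming = {"a": "leaflet A", "b": "leaflet B, revised", "d": "leaflet D"}
upserts, deletes = plan_update(indexed, incoming)
print(upserts, deletes)
```

Only the records returned in the first list would need to be re-embedded, which keeps each update cheap relative to the size of the full database.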

Accessibility and user experience were also relevant considerations in the development of this solution. As such, the system boasts an easy-to-use interface designed to be inviting and navigable for a diverse user base, regardless of their technological proficiency or background.

With a release scheduled for the second half of April 2024, “PharmaGPT” is intended as a tool for democratizing access to medical information, embodying our commitment to leveraging AI for societal benefit while maintaining high standards of safety and ethical responsibility.

AI Tech Stack

Qdrant Vector Database for storing sparse and dense vector embeddings
Custom German finetuned SPLADE model for creating sparse vectors
Finetuned multilingual-e5-large for dense vector embeddings
Custom RAG finetuned version of Kafka 8x7b (SFT + DPO)
Custom finetuned reranking model for post-processing optimization
BERT-based router for optimized guardrails
t-SNE clustering and semantic deduplication applied to large volumes of web data retrieved in parallel
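
A stack that stores both sparse and dense embeddings has to merge the two result lists at query time. The source does not say how this is done; the sketch below uses reciprocal rank fusion (RRF), a widely used technique, purely as an illustration. The function name, the constant `k=60`, and the document ids are assumptions made for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) per list it appears in, so items
    ranked highly by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from the sparse and the dense retriever.
sparse_hits = ["leaflet_12", "leaflet_7", "leaflet_3"]
dense_hits = ["leaflet_7", "leaflet_3", "leaflet_44"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
```

Because RRF works on ranks rather than raw scores, it needs no calibration between the sparse and dense similarity scales. A reranking model could then reorder the top of the fused list.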