Data & AI

Algorithm sorts 18,000 documents at LEO Pharma

With a combination of machine learning, automation and co-creation, NNIT and LEO Pharma have worked together to pioneer a new method for ensuring better order and structure in thousands of business-critical documents. The project is expected to free up resources worth a double-digit million DKK amount.

Document management is a common challenge in the life science industry. Most pharmaceutical companies store thousands of documents on platforms like Veeva Vault. These include work instructions and standard operating procedures (SOPs), which are critical for the production of medicines. But keeping duplicates, test files and long-forgotten drafts, some with fewer than 100 characters, from getting mixed in among the valid documents requires pharma companies to devote significant resources to maintaining document compliance.
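The kind of clutter described above can be illustrated with a small first-pass filter. This is only a sketch under stated assumptions, not the method used in the project: the `Doc` type, the `flag_candidates` function and the exact-match duplicate check are hypothetical, and only the 100-character threshold comes from the article.

```python
import hashlib
from dataclasses import dataclass

# Threshold taken from the article's "fewer than 100 characters".
MIN_CHARS = 100


@dataclass
class Doc:
    """Hypothetical minimal document record: an ID and its extracted text."""
    doc_id: str
    text: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return " ".join(text.lower().split())


def flag_candidates(docs):
    """Return {doc_id: reason} for documents that look like cleanup candidates.

    Flags very short documents and exact duplicates (after normalization).
    Documents that are not flagged are simply absent from the result.
    """
    first_seen = {}  # content hash -> ID of the first document with that content
    flags = {}
    for doc in docs:
        body = normalize(doc.text)
        if len(body) < MIN_CHARS:
            flags[doc.doc_id] = "too_short"
            continue
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in first_seen:
            flags[doc.doc_id] = f"duplicate_of:{first_seen[digest]}"
        else:
            first_seen[digest] = doc.doc_id
    return flags
```

A filter like this only catches the easy cases. Near-duplicates and outdated but well-formed documents still need content-aware classification and human review.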

In an effort to innovate their document management practices and optimize cleanup efforts, LEO Pharma has worked with NNIT to take a groundbreaking approach to the challenge: employing AI to assist document management and make it more user-friendly.

By a conservative estimate, the overall project is expected to free up resources worth a double-digit million DKK amount, which can then be devoted to more productive tasks. The initiative is also intended to pave the way for optimization of internal processes and a more data-driven approach to document management in general.

50 years of cleanup
The idea of using AI came after LEO Pharma calculated that it would take approximately 50 years for a human to review and clean up the nearly 18,000 documents one by one.

– We quickly realized how long it would take to perform the cleanup manually. Just keeping track of which documents we have already been through is a challenge. To make the process more manageable, we decided to look into the possibilities of using AI and getting an algorithm to help with the sorting, says Josefine Hedehus, Project Manager, Quality Systems & Business Process Management at LEO Pharma.

Together with her colleague Torben Craner, she leads the cleanup project, which extends across LEO Pharma's global organization and involves a number of stakeholders, especially from the business areas responsible for many documents.

Co-creation throughout the project
As part of the cleanup process, LEO Pharma wanted to adjust the way the documents were organized so that the content was grouped into relevant subcategories. They therefore needed a partner who was familiar not only with AI and Veeva, but also with the work processes and regulatory requirements that apply to a life sciences organization.

– In addition to their experience with Veeva, NNIT also has great insight into working with document management and in-depth knowledge of GxP. From the beginning there was a good atmosphere and a good drive in the project. The team from NNIT were good at asking clarifying questions so that we could get the sorting calibrated correctly. Along the way, they offered ideas and observations that we had not considered, so the whole process was characterized by co-creation and commitment beyond all expectation, says Torben Craner.

Man and machine working together
Based on the initial matching of expectations, NNIT designed a Data Quality solution based on Amazon Web Services (AWS) technology which uses natural language processing (NLP). This means that the AI algorithm can not only recognize individual words but also interpret and analyze the text based on context and natural language so that it can scan the documents and extract structured content directly from the text.

The result of the algorithmic sorting is displayed in a user-friendly AI training application where subject matter experts (SMEs) from LEO Pharma can review and qualify the documents. This allows users to give feedback directly to the algorithm and make it even more accurate. Once the documents are properly classified, sorting can be further automated.

An important takeaway was that it worked well to combine the algorithm with human users and automation. The raw data cannot stand alone. For example, much of the content is LEO-specific, which their SMEs need to understand and qualify. We ran a series of workshops where we reviewed the rough sorting, and about 90 percent of the content was properly sorted after the first round of input from the SMEs.
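The review loop described above, where the algorithm proposes a category and an SME confirms or corrects it, can be sketched roughly as follows. This is an illustration only: the `ReviewQueue` class, its method names and the category strings are hypothetical, and the article does not describe how the actual training application stores feedback.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    """Hypothetical store for algorithm predictions and SME decisions."""
    predictions: dict  # doc_id -> category proposed by the algorithm
    labels: dict = field(default_factory=dict)  # doc_id -> category qualified by an SME

    def review(self, doc_id, sme_category=None):
        """Record an SME decision.

        Passing no category confirms the prediction; passing one corrects it.
        """
        predicted = self.predictions[doc_id]
        self.labels[doc_id] = predicted if sme_category is None else sme_category

    def agreement_rate(self):
        """Share of reviewed documents where the SME kept the predicted category."""
        if not self.labels:
            return 0.0
        agreed = sum(
            1 for doc_id, category in self.labels.items()
            if self.predictions[doc_id] == category
        )
        return agreed / len(self.labels)

    def training_examples(self):
        """SME-qualified (doc_id, category) pairs that could be used to retrain a classifier."""
        return sorted(self.labels.items())
```

Tracking the agreement rate per review round gives a simple way to see whether SME feedback is improving the sorting over time.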

Improves the quality of documents used for training
The positive effects of the new document management process can also be felt in other areas of LEO Pharma’s business, such as the quality of documents used for training.

– Training is an important part of GxP. By implementing a more structured clean-up process, we ensure that the quality of the documents we use for training is improving, which will increase the quality of the part of our training that deals with Read and Understand, says Josefine Cramer Hedehus.

After a total of 20 weeks of start-up and implementation, the solution has now been handed over to LEO Pharma, where it is used for further cleaning and reorganization of documents and as part of the overall data reporting.


The Data Quality solution for LEO Pharma consists of various AWS services, including Amazon Textract, Amazon Simple Storage Service (S3) and Amazon Comprehend Medical, and is hosted on AWS cloud infrastructure.
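How the services listed above might fit together can be sketched in a few lines of Python with boto3. The article does not describe the actual wiring, so everything here is an assumption: the function names, the score threshold, the region and the flow of one S3 object through Textract and then Comprehend Medical. Only the two API calls, `detect_document_text` and `detect_entities_v2`, are real boto3 operations. Both have input size and format limits that a production pipeline would need to handle.

```python
def extract_text(textract, bucket, key):
    """Run Textract's synchronous text detection on one S3 object and join its LINE blocks."""
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines = [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]
    return "\n".join(lines)


def extract_entities(comprehend_medical, text, min_score=0.8):
    """Return (category, text) pairs for entities at or above an assumed confidence threshold."""
    response = comprehend_medical.detect_entities_v2(Text=text)
    return [
        (entity["Category"], entity["Text"])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]


def process_document(textract, comprehend_medical, bucket, key):
    """Hypothetical per-document step: extract text, then extract entities from it."""
    text = extract_text(textract, bucket, key)
    entities = extract_entities(comprehend_medical, text) if text else []
    return {"key": key, "chars": len(text), "entities": entities}


def make_clients(region="eu-west-1"):
    """Create the two service clients. The region is an assumption, not from the article."""
    import boto3  # imported lazily so the functions above can be exercised with stubs

    return (
        boto3.client("textract", region_name=region),
        boto3.client("comprehendmedical", region_name=region),
    )
```

Passing the clients in as arguments keeps the processing functions independent of AWS credentials, so they can be tested with stub objects before being pointed at a real bucket.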