From Hype to Reality: Architectural, Strategic, and Data Lessons for AI in Pharma
Why do so many AI initiatives stall before delivering real value?
In this article, we explore three foundational areas that shape real-world AI adoption in regulated industries.
In short:
If you want AI to achieve real business impact in a regulated environment, you must focus on these three factors:
System architecture: AI agents must adhere to the same standards as any other validated system and operate as part of a governed ecosystem with clearly defined roles, permissions, and rules for collaboration. Safeguards and systems that maintain reliability, traceability, and compliance are essential.
Strategic capability building: AI must transition from isolated projects to a coherent strategy that encompasses your entire organization. This includes a long-term roadmap, leadership focus, training, and the right tools and pathways for successful AI adoption.
Data readiness: Often overlooked and underestimated, data is the main factor for success or failure. Expect to spend most of your time on data discovery, investigation, and cleaning, and only a small fraction on actual AI model development. In addition, all data used for model testing must have its source, labeling accuracy, exclusion rationale, and independence documented and justified.
Leaders must get these three areas right to build AI systems that are reliable, compliant, and genuinely value-creating.
#1: Robust AI agent ecosystems for regulated environments
Regulated environments demand meticulous attention to system architecture, permission boundaries, and operational safeguards for AI agents. These must comply with relevant legislation, such as EU GMP Annex 22 expectations for traceability, the EU AI Act's requirements for high-risk systems, and GDPR requirements for storing sensitive data.
Building robust agent ecosystems isn't just about technical implementation; it's about creating systems that maintain reliability, traceability, and compliance even as they operate with increasing autonomy.
Permissions and access
Apply the principle of least privilege: agents should access only the data and tools necessary for their roles. Conduct thorough impact and dependency mapping to mitigate cascading failures. For instance, a clinical-trial protocol analysis agent should not have access to patient databases, and a data-formatting agent must not trigger errors affecting downstream analyses. Access-control measures and audit trails are required for all agent interactions involving GxP data.
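To make this concrete, here is a minimal sketch of deny-by-default, least-privilege access control with an audit trail. The role names, resources, and log format are illustrative assumptions, not a specific product's API.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent_audit")

@dataclass(frozen=True)
class AgentRole:
    name: str
    allowed_resources: frozenset  # explicit allow-list; everything else is denied

ROLES = {
    # Hypothetical roles: the protocol-analysis agent never sees patient data.
    "protocol_analyzer": AgentRole("protocol_analyzer", frozenset({"trial_protocols"})),
    "data_formatter": AgentRole("data_formatter", frozenset({"staging_tables"})),
}

def authorize(agent: str, resource: str, action: str) -> bool:
    """Deny by default; log every decision for the GxP audit trail."""
    role = ROLES.get(agent)
    granted = role is not None and resource in role.allowed_resources
    audit_log.info("agent=%s action=%s resource=%s granted=%s",
                   agent, action, resource, granted)
    return granted

# Usage: the protocol analyzer is refused access to patient records.
assert not authorize("protocol_analyzer", "patient_db", "read")
assert authorize("data_formatter", "staging_tables", "write")
```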
Orchestration patterns
Agent-to-agent interaction directly influences system reliability. Collaborative agents may share tasks (e.g., data extraction and analysis) if data-access rules are respected. For critical decisions, independent verification is preferable: agents should work in parallel, without sharing outputs, and have results reviewed by humans when inconsistencies arise. Governance should define how discrepancies between code behavior and documentation are resolved, ensuring the validated documentation remains the source of truth.
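As a rough illustration, the sketch below runs two hypothetical agents on the same task in parallel, compares their independent outputs, and escalates any disagreement to a human reviewer. The agent functions are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def agent_a(task: str) -> str:
    return "dose escalation cohort: 12 subjects"  # placeholder result

def agent_b(task: str) -> str:
    return "dose escalation cohort: 12 subjects"  # placeholder result

def verified_answer(task: str) -> str:
    # Run in parallel; neither agent receives the other's output.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(agent_a, task)
        future_b = pool.submit(agent_b, task)
        result_a, result_b = future_a.result(), future_b.result()
    if result_a != result_b:
        # Inconsistency: stop and route to a human, never auto-resolve.
        raise RuntimeError(f"Agents disagree ({result_a!r} vs {result_b!r}); "
                           "escalating to human review.")
    return result_a

print(verified_answer("Extract cohort size from protocol section 4.2"))
```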
Technical infrastructure and safeguards
Scaling from proof-of-concept to production demands standardized communication and auditability. Implement frameworks such as Model Context Protocol (MCP) to manage how agents access external tools. All actions involving databases should be logged, with rollback mechanisms and deletion safeguards for critical data.
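The following sketch shows one way such safeguards might look in practice, using Python's built-in sqlite3 as a stand-in for a validated system: every action is written to an audit trail, and failed transactions are rolled back. The table names and log format are assumptions for illustration.

```python
import sqlite3
from contextlib import contextmanager
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_records (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE audit_trail (ts TEXT, agent TEXT, action TEXT, outcome TEXT)")
conn.commit()

@contextmanager
def audited_action(agent: str, action: str):
    """Log every attempt; roll back the transaction if anything fails."""
    ts = datetime.now(timezone.utc).isoformat()
    try:
        yield conn
        conn.commit()
        outcome = "committed"
    except Exception:
        conn.rollback()  # rollback safeguard: no partial writes survive
        outcome = "rolled_back"
        raise
    finally:
        conn.execute("INSERT INTO audit_trail VALUES (?, ?, ?, ?)",
                     (ts, agent, action, outcome))
        conn.commit()

with audited_action("data_formatter", "insert batch record"):
    conn.execute("INSERT INTO batch_records (status) VALUES ('released')")

print(conn.execute("SELECT * FROM audit_trail").fetchall())
```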
Data governance and privacy controls
Protect sensitive data such as patient records or proprietary formulations. Prevent cross-context data sharing: anonymized trial data, for example, must not move to less secure environments. Favor API-based structured access over GUI navigation to improve accuracy, maintain auditability, and meet data integrity and ALCOA++ principles. Standardized schemas and metadata improve transparency and reduce regulatory risk.
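As a simple illustration, the sketch below validates records against a fixed schema and attaches attributable, time-stamped metadata before anything is stored. The field names are assumptions, not a standard schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class TrialRecord:
    subject_id: str      # pseudonymized identifier, never a real name
    visit: str
    result_mg_dl: float  # units fixed in the schema, not in free text

def ingest(record: TrialRecord, source_system: str, user: str) -> dict:
    """Wrap a schema-validated record with attributable, time-stamped metadata."""
    if not record.subject_id.startswith("SUBJ-"):
        raise ValueError("subject_id must be pseudonymized (SUBJ-...)")
    return {
        "data": asdict(record),
        "metadata": {  # who, when, and where the data came from
            "source": source_system,
            "recorded_by": user,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        },
    }

print(ingest(TrialRecord("SUBJ-0042", "V3", 5.4), "EDC", "agent:data_formatter"))
```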
Monitoring and system health
Monitoring must go beyond uptime. Track agent decision logic, resource use, and interaction frequency. Failures (e.g., inaccessible resources) should trigger explicit alerts; agents must not make assumptions or default decisions based on incomplete data. Establish workflow hierarchies and trigger mechanisms aligned to agent roles to prevent circular dependencies and maintain predictable operational behavior.
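A minimal sketch of the "fail loudly" principle: when a resource is unreachable, the agent raises an explicit alert rather than substituting a default. The alert hook is a hypothetical placeholder for a real monitoring integration.

```python
def alert(message: str) -> None:
    # Placeholder: in production this would page the monitoring system.
    print(f"ALERT: {message}")

def fetch_reference_ranges(source_available: bool) -> dict:
    if not source_available:
        # Never fall back to an assumed default in a GxP workflow.
        alert("reference-range service unreachable")
        raise ConnectionError("Reference ranges unavailable; halting workflow.")
    return {"glucose_mg_dl": (70, 99)}

try:
    fetch_reference_ranges(source_available=False)
except ConnectionError as exc:
    print(f"Workflow halted: {exc}")
```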
#2: Building strategic AI capabilities
AI initiatives should align with core business goals, not operate as isolated projects. A unified strategy allows each project to build a foundation for the next, creating reusable assets and shared knowledge.
Address cross-organizational needs
Target problems that affect multiple departments to avoid fragmentation. For instance, if document analysis is required across teams, implement a centralized NLP pipeline rather than individual tools. Establish consistent Model Context Protocol (MCP) standards, unified data schemas, and governance early to prevent redundant or conflicting development.
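As a toy illustration of the centralized approach, the sketch below exposes a single shared analysis service with one output schema that every department calls; the "NLP" logic is a trivial stand-in for a real pipeline.

```python
from collections import Counter

class SharedDocumentService:
    """Single pipeline, single schema; every team gets the same output shape."""
    def analyze(self, department: str, text: str) -> dict:
        words = text.lower().split()
        return {
            "department": department,  # caller identity, for governance
            "token_count": len(words),
            "top_terms": Counter(words).most_common(3),
        }

service = SharedDocumentService()
# Regulatory and manufacturing reuse the same pipeline and output schema.
print(service.analyze("regulatory", "deviation report deviation approved"))
print(service.analyze("manufacturing", "batch record batch released"))
```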
AI initiatives should align with core business goals, not operate as isolated projects.
Christof Wascher, Consulting Director R&D, NNIT
Build institutional infrastructure
Develop a common AI framework trained on internal SOPs, GMP guidelines, and project histories. This ensures new solutions build on validated process knowledge rather than generic AI capabilities. Define and enforce technical standards for data formats, APIs, and security controls to prevent integration challenges. These practices support scalability and compliance across the organization.
#3: The "data" box illusion
Every AI flowchart contains a simple box labeled "data" feeding into layers of sophisticated processing. This innocent rectangle masks a dangerous assumption: that data exists in a clean, standardized, ready-to-consume format. This assumption routinely derails timelines and budgets.
Strategic AI adoption begins with data readiness. If your organization is serious about AI capabilities, you must invest in comprehensive data governance, standardization, and quality control before launching model development.
Understand the 90/10 rule
The "garbage in, garbage out" principle gains relevance as systems grow more sophisticated. Yet project plans consistently allocate 90% of resources to model development while assuming data preparation happens instantly.
NNIT’s experience working with AI projects indicates an inverse truth: 90% of project time involves data discovery, investigation, and cleaning, while actual model development requires only 10%. This ratio holds remarkably constant across projects, yet planning documents rarely reflect it. Instead, teams start behind schedule because patient IDs don't match across systems or critical fields contain unexpected values.
Data harmonization compounds the challenge. Individual departments often maintain well-organized datasets for their specific needs: clinical trials in one format, regulatory submissions in another, and manufacturing records in a third. Each makes perfect sense in isolation, but combining them requires extensive translation work.
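A small sketch of what that translation work looks like: per-source field mappings into one shared schema, with provenance preserved. The field names for the two hypothetical source systems are assumptions.

```python
# Hypothetical field names for departmental systems that describe
# the same concepts differently.
FIELD_MAPS = {
    "clinical":   {"subj": "subject_id", "visit_dt": "event_date"},
    "regulatory": {"patient_ref": "subject_id", "date": "event_date"},
}

def harmonize(source: str, record: dict) -> dict:
    """Translate a source record into the shared schema, keeping provenance."""
    mapping = FIELD_MAPS[source]
    out = {mapping[key]: value for key, value in record.items() if key in mapping}
    out["_source"] = source  # provenance survives the translation
    return out

print(harmonize("clinical", {"subj": "SUBJ-0042", "visit_dt": "2024-03-01"}))
print(harmonize("regulatory", {"patient_ref": "SUBJ-0042", "date": "2024-03-01"}))
```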
Build an AI-ready foundation first
Pre-emptive data preparation accelerates everything downstream, because teams with clean, standardized datasets can focus on innovation instead of data hygiene.
This investment in data quality yields the greatest return if it is seen as institutional infrastructure rather than overhead for individual AI projects. And well-structured, documented data doesn’t just benefit AI. The same cleaning effort also reduces errors in routine analyses and simplifies compliance audits.
The cost of rushed data cleaning
Rushed data cleaning introduces bias and undocumented changes that can invalidate results. Regulators emphasize traceable data manipulations, test-data independence, and ongoing drift monitoring.
To remain compliant:
Allocate time for root cause analysis.
Include SMEs to validate data context.
Apply change control for all transformations, as sketched below.
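The sketch below shows one way change control might be wired into a cleaning pipeline: every transformation runs through a single function that records what changed, why, and who approved it. The record format is an illustrative assumption.

```python
from datetime import datetime, timezone

change_log: list[dict] = []

def apply_transformation(data: list, fn, rationale: str, approved_by: str) -> list:
    """Apply a cleaning step and append an auditable change record."""
    before = len(data)
    result = fn(data)
    change_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "step": fn.__name__,
        "rationale": rationale,      # root cause, not just "bad rows"
        "approved_by": approved_by,  # SME sign-off on the data context
        "rows_before": before,
        "rows_after": len(result),
    })
    return result

def drop_missing_ids(rows: list) -> list:
    return [r for r in rows if r.get("subject_id")]

rows = [{"subject_id": "SUBJ-0042"}, {"subject_id": None}]
rows = apply_transformation(rows, drop_missing_ids,
                            rationale="IDs lost in legacy system migration",
                            approved_by="clinical-data SME")
print(change_log)
```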
Where to start?
By now, you probably know that turning these reflections into action doesn’t require a “big bang” transformation. It does require a deliberate first step in each of the three areas:
Map your agent ecosystem: Start with an inventory of existing and planned agents, their permissions, and the systems they touch. Classify them by risk, define who may initiate actions vs. only verify, and put basic monitoring in place for high-impact workflows (a minimal inventory sketch follows this list).
Define a strategic AI operating model: Agree on a reference architecture (models, platforms, integration standards) and a simple governance structure for AI: who owns the platform, who prioritizes use cases, and how value and risk are assessed across R&D, clinical, and commercial.
Start building a clean shared data foundation: Select one or two high-value domains, such as trial data combined with safety or manufacturing data, and invest in making them AI-ready, with clear ownership, harmonized structures, documented business rules, and quality metrics that are visible to both domain and technical teams.
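For the first step, here is a minimal sketch of what an agent inventory with risk classification could look like; the agent names, fields, and risk tiers are illustrative assumptions.

```python
# Hypothetical inventory entries: which systems each agent touches, whether
# it may initiate actions or only verify, and a simple risk tier.
AGENT_INVENTORY = [
    {"name": "protocol_analyzer", "systems": ["document_store"],
     "may_initiate": False, "risk": "medium"},
    {"name": "batch_release_assistant", "systems": ["MES", "LIMS"],
     "may_initiate": True, "risk": "high"},
]

# High-risk workflows get monitoring first.
for agent in sorted(AGENT_INVENTORY, key=lambda a: a["risk"] != "high"):
    print(f"{agent['name']}: risk={agent['risk']}, initiates={agent['may_initiate']}")
```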
At NNIT, we are ready to assist and guide you toward a robust AI agent ecosystem that can scale safely within a regulated life sciences environment.