An AI-native research product that cuts synthesis time for analysts by 88%

How I led the end-to-end design for one of the earliest successful AI products, which helped reposition BlueDot from a data provider to a decision partner

Built in 2023

Working PoC in 4 months

Shipped within 6 months

Shows the product workflow of querying for information and taking action on it

Sparknotes

Timeline

Core development Mar-Sep 2023

Team Composition

PM・4 Eng・SME・Design Lead

My Role

Co-defined strategy and vision

Led exploration, prototyping, and usability and product testing

Led design, incl. optimizing the design system and interaction patterns for generative AI

Impact

Reduced the time-to-task for research from 15 hours to less than 3 hours a week

Also built a use case for AI among multiple clients and gave BlueDot vital insights to build AI workflows and enrich datasets using AI

Methodologies Implemented

Information architecture optimized for Gen AI

Nomenclature and language for probabilistic systems

Transparency & trust

Interaction and motion design

Task flow and error recovery

Emotion management design

Usability optimizations

Context & Problem

Analysts burnt up to 15 hours a week reading and manually synthesizing data

There were too many sources to go through, which became a bottleneck

Who are the analysts?

Public health analysts

Pharma researchers and sales

Enterprise risk managers

Where did the research time go?

Scanning news articles, alerts, and case trends.

Manually synthesizing that information.

What did analysts produce?

Recommendations for decision-makers in their organizations

This usually took the form of regular reports and bulletins.

Strategic Bet

AI will transform research from a burdensome task into an engaging part of building recommendations

I built a case for LLMs with the CPO during our strategic realignment retreat, to elevate BlueDot from a data provider to a partner in decisions

Why use generative AI?

Changing landscape, caused by a post-COVID budget squeeze

Clients had less budget to purchase data and intelligence products, invest in in-house analysts, and expand processes to get more insights.

Rising demand for curated insights

Raw data didn't cut it anymore: given the pressure to optimize processes, there was increasing demand for curated insights. Sales of custom reports had gone up significantly year-over-year.

Capitalize on the first-mover window

Assistant was built in 2023, when LLMs were a new technology, so even if we failed fast, it was a solid strategy to get ahead of the curve. Also, competitors were over-invested in their own proprietary technologies, while BlueDot, primarily a provider of structured data, could move quickly to adopt AI.

Exploration & Core Insight

The risk of one wrong answer outweighed the upside of speed

I led the research, in partnership with an internal SME and an AI engineer, to assess the impact of using AI at the heart of the product. I prioritized understanding how common workflows could evolve and the risks specific to our high-stakes and regulated domain. What I learnt was clear: balancing speed with trust was key.

Architectural Decision

Use AI to understand intent only

The LLM would map user queries to specific BlueDot datasets rather than generating its own answers.

How did I draw this conclusion?

Primary research

Analysts cared most about research outcomes and accountability. We responded to this with digestible artifacts and the ability to manually validate outputs.

Secondary research

At the time, LLMs performed best at parsing user intent. But there wasn’t enough specialized training data yet to accurately respond to high-stakes questions.

I mapped the user journey and co-defined the architecture

User prompts in natural language

Map user intent to dataset(s)

Clarify till LLM has what it needs

AI Workflow

Extract data and pick the output format

Programmed Workflow

This pattern of "AI handling the thinking and programmed workflows supporting the output" is now seen in tools like Claude, validating the early bet.
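The split between the AI workflow and the programmed workflow can be sketched roughly as follows. This is a minimal illustration, not the production design: the dataset names, the required parameters, and every function name are hypothetical, and the LLM step is replaced by simple keyword matching so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical dataset catalogue; names and parameters are illustrative only.
DATASETS = {
    "case_counts": {"required": ["disease", "location"]},
    "news_alerts": {"required": ["disease"]},
}


@dataclass
class Intent:
    dataset: str
    params: dict = field(default_factory=dict)

    def missing(self) -> list:
        """Required parameters the user has not supplied yet."""
        required = DATASETS[self.dataset]["required"]
        return [p for p in required if p not in self.params]


def parse_intent(query: str) -> Intent:
    """Stand-in for the LLM step: map free text to a dataset and parameters.

    A real system would call a model here (assumption); keyword matching
    keeps the sketch runnable without one.
    """
    text = query.lower()
    dataset = "news_alerts" if "news" in text or "alert" in text else "case_counts"
    params = {}
    for disease in ("dengue", "measles"):
        if disease in text:
            params["disease"] = disease
    for location in ("brazil", "canada"):
        if location in text:
            params["location"] = location
    return Intent(dataset, params)


def clarifying_question(intent: Intent) -> Optional[str]:
    """AI workflow: ask for the first missing parameter, or None if complete."""
    missing = intent.missing()
    return f"Which {missing[0]} are you interested in?" if missing else None


def run_programmed_workflow(intent: Intent) -> dict:
    """Programmed workflow: deterministic extraction and output-format choice.

    No generated text is returned; the result only describes what to pull
    from the curated dataset and how to present it.
    """
    output_format = "chart" if intent.dataset == "case_counts" else "list"
    return {"dataset": intent.dataset, "filters": intent.params, "format": output_format}


def answer(query: str, clarifications: Optional[dict] = None) -> dict:
    """Clarify until the intent is complete, then hand off to programmed code."""
    intent = parse_intent(query)
    intent.params.update(clarifications or {})
    question = clarifying_question(intent)
    if question:
        return {"clarify": question}
    return run_programmed_workflow(intent)
```

In this sketch, `answer("Show dengue cases")` returns a clarifying question about the location rather than guessing, while `answer("Show dengue cases", {"location": "brazil"})` returns a deterministic extraction plan. The point of the design is that the model never writes the answer itself.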

Design Decisions

Feasibility

Core mechanics of the product

Usability

Intent management & new chat pattern

Interaction

Flow, prompt engineering & reliability

Emotion

Lexicon & response tone

Workflow

Output artifacts & value

Initial prototyping

I structured the layout so outputs were the centre of gravity on the screen

Using learnings from my contextual inquiry, I placed the chat and output in dedicated zones to mirror how analysts compared multiple artifacts when building their understanding of an event.

Early client testing

Users immediately pushed past basic questions, so we adapted our development approach

We built a scrappy version of the product in 3 weeks to test how clients engaged with LLMs. Testers asked complex questions that combined datasets and referenced earlier questions in the chat, something our RAG-based service was not built to handle.

Working within 2023 limits

The early testing let us assess which trade-offs to make: I worked with engineering to map technical feasibility to the experiences worth preserving.

Feasibility Decision 1

One question = One LLM session

Priority:

Giving each question a reliable context window

Context windows were small enough that the instructions needed to make the interaction reliable took over half the available memory. We couldn't guarantee a consistent number of conversation turns within a session. So I co-developed a pattern with the tech lead that let users seamlessly switch from one LLM session to the next (with a new context window each time), while the UI stacked questions in the frontend as if it were all a single chat session.

Trade-off:

Each question was a silo

This hindered exploration, as users could not compare data or build on earlier questions, but cost and latency also needed to be considered. So I proposed we solve this problem in a future release, assuming foundation models would improve over time.
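The one-question-per-session pattern from Feasibility Decision 1 can be sketched as below. It is a simplified illustration under assumptions: the class names, the placeholder instructions, and the stubbed model reply are all hypothetical, and clarification turns inside a single question are left out for brevity.

```python
from dataclasses import dataclass, field
from itertools import count

# Placeholder for the reliability instructions that filled much of the window.
SYSTEM_INSTRUCTIONS = "Map the question to a dataset. Ask if anything is unclear."


@dataclass
class LLMSession:
    """One isolated context window: system instructions plus a single question."""
    session_id: int
    messages: list = field(default_factory=list)

    def ask(self, question: str) -> str:
        # The window only ever holds the instructions and this one question.
        self.messages = [("system", SYSTEM_INSTRUCTIONS), ("user", question)]
        # Stand-in for a model call; a real client would send self.messages.
        return f"[session {self.session_id}] answer to: {question}"


class ChatThread:
    """What the UI renders: one continuous chat stacked over many sessions."""

    def __init__(self):
        self._ids = count(1)
        self.turns = []  # (question, answer, session_id) in display order

    def ask(self, question: str) -> str:
        session = LLMSession(next(self._ids))  # fresh context window per question
        reply = session.ask(question)
        self.turns.append((question, reply, session.session_id))
        return reply
```

Each call to `ChatThread.ask` starts a new session, so no earlier question leaks into the next context window. That is also where the silo trade-off comes from: the thread looks continuous to the user, but the model sees each question in isolation.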

Feasibility Decision 2

Ask more clarifications to maintain trust

Priority:

Ensuring the LLM reliably understands intent

Early testing made it clear that clarifications were annoying, but wrong answers, mostly caused by misunderstanding the user's intent, were what testers called out as frustrating.

Trade-off:

Added friction and annoyance

I partnered with the tech lead to influence stakeholders to take a more careful approach when answering queries. More clarifications were a cost worth paying to keep brand trust high.

Launch & Adoption

We released a controlled MVP in 4 months and launched in 6 months

We provided early access to 5 marquee clients, and their response was overwhelmingly positive.

Given the newness of the technology among career analysts, we needed targeted onboarding and training. I worked with CS to build sample queries users could ask.

Further design and model refinements continued for another 6 months, through February 2024.

Launch Decision

MVP = Minimum Valuable Product

We didn't release Assistant till we were satisfied with the output quality.

This was a joint decision between the PM, tech lead, epidemiologist and me.

Media coverage

Highlighting the use case of this technology

Read on Cohere's Blog

Recognition of BlueDot's recent market growth

Read on The Globe and Mail

Project Impact

Efficiency & reliability

88%

Reduced time-to-task

Down from 15 hours a week, for analysis and manual synthesis.

97%

Intent recognition accuracy

>90% at launch and 97% after post-launch refinement.

< 1 min

Time-to-first-artifact

Compared to 30 to 45 minutes (including manual processing).

Adoption & behaviour

Rapid adoption of AI into user workflows

The library of sample queries that I developed with CS made it easier for career analysts to adopt LLMs. Users quickly built the confidence to do multi-step research workflows.

Default product interface

Within a quarter, adoption of the tool reached over 85% among clients with the enhanced package. This, in part, helped leadership rethink the pricing tiers and make Assistant available by default to most clients.

Strategy & business

Internal expansion of AI

Learnings from Assistant helped various teams expand AI workflows internally and enrich datasets using AI (notably using AI to make global news more relevant by reducing hundreds of articles into a handful of topics).

Data provider to a partner in decisions

Assistant became the first step in a broader shift to reposition BlueDot. Its learnings and initial reception helped BlueDot make concerted efforts towards expanding into a value model based on intelligence.

Reflections

Trust and curation are foundational considerations in the age of AI design

I was lucky to find myself in a team culture where design was seen as critical to this new product. In 2023, when global teams were racing to integrate AI into their products, considerations like curating results and maintaining trust were not prominent in community conversations. As the technology has matured, these opinions have aged well. Frontier AI labs are increasingly training foundation models on specialized datasets, and tools like Claude have validated the demand for specialized workflows at scale. The lesson I carry is to keep the user's experience at the centre of any product, not the technology.

If I had a do-over, I'd be more ambitious with my decisions

Good design decisions are dated; I try to record every major decision I make so I can reflect on constraints in hindsight and react to ones whose assumptions are past their expiry. One such case is sacrificing the speed of prototyping to avoid the risk of hallucinations. I would have created a "labs" section and let ambitious users play around knowing the risks. This would have allowed us to capture a lot of first-hand AI interaction data and, in turn, understand the real-world use cases clients were focussed on.

I'd also apply different weightages when adapting the mental model for users

Another dated assumption I had was that users need a lot more guidance and clarifications. I'm a big proponent of product builders understanding the curse of knowledge. But I find the opposite is also sometimes true, where creators assume users need more context than is necessary. I made this mistake by being verbose and biasing towards user confirmations. I designed the interaction assuming I was building trust, but I unwittingly communicated to users that we lacked confidence in the model or our product. I now challenge myself to test my decisions on both sides of this spectrum.

I've since grown in my convictions that design is critical for AI products

We built a product on top of a foundation model, something most products are doing. But the language, error recovery, and emotion management are the real value we offer to users. These are all fundamental UX considerations, which all companies investing in probabilistic outputs should adopt.

Rahul Malkani

Senior Product Designer