Microsoft Foundry

This project was conducted in collaboration with the course HCDE 517: Usability Testing, taught by Katya Cherukumilli at the University of Washington.

IMPACT

  • Influenced implementation of 2 high-priority recommendations for 80k+ users

  • Evaluated the effectiveness of Foundry's discovery and exploration features

  • Delivered a series of design recommendations to a team of 30+ people

TEAM

4 Usability Researchers

ROLE

Usability Researcher

TIMELINE

Jan-Mar 2026

10 Weeks

SKILLS

Usability Testing

Qualitative Data Collection

Data Triangulation

COLLABORATORS

1 Product Designer

1 Product Manager

1 Research Ops Manager

1 User Researcher

BACKGROUND

Foundry is a comprehensive platform to build, deploy, and manage generative AI applications and agents.

Our client at Microsoft Core AI was most interested in gauging how easily users could complete their tasks with the discovery tools provided to them, and in uncovering whether these tools matched their expectations. They expressed that their "north star metric" was conversion rate.

In other words…

Can users successfully discover AI models that suit their needs, and what do their typical workflows look like?

PARTICIPANT PROFILE

Because Foundry is a specialized platform for AI developers, we used a screener questionnaire on User Interviews to select participants based on their AI development experience.


To reduce bias, we excluded those with UX research experience from the participant pool. A total of 8 participants were interviewed.

4 Students

Undergraduates & Graduates

4 Professionals

Working full-time in tech

  • Background in software development


  • 1+ year of experience using and developing generative AI tools


  • Thorough understanding of LLMs and model deployment workflow

CRITERIA


RESEARCH GOALS

Our study plan began with translating business goals into research goals.

Business Goals

The "north star" metric is conversion rate.




Research Goals

Understand how users:

  • Navigate the platform

  • Evaluate & compare model information

  • Determine the most suitable model



METHODOLOGY

60 Minute Sessions


Virtual Moderated Test


Structured Interview & Direct Observation Tasks

Post-Task Questionnaire


FINDINGS & DESIGN RECOMMENDATIONS

Foundry's discovery tools were misaligned with user intent and fell short in clarity and in how they represented information.

While our users could technically get around just fine by relying on their existing knowledge of current AI models, their experience searching for and comparing these models could be improved.


We organized the usability issues that we found based on the following criteria: high priority (prevents completion of a task), medium priority (causes significant delay), and low priority (minor effects on usability).
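As an illustration only, the three criteria above can be read as a simple decision rule that checks the most severe effect first. The sketch below assumes each issue is judged on two yes/no observations; the names `Priority` and `classify_issue` are hypothetical and were not part of the study itself.

```python
from enum import Enum


class Priority(Enum):
    HIGH = "high"      # prevents completion of a task
    MEDIUM = "medium"  # causes significant delay
    LOW = "low"        # minor effects on usability


def classify_issue(prevents_completion: bool, causes_significant_delay: bool) -> Priority:
    """Map an observed usability issue to a priority level (hypothetical helper)."""
    # Check the most severe effect first so a blocking issue is never downgraded.
    if prevents_completion:
        return Priority.HIGH
    if causes_significant_delay:
        return Priority.MEDIUM
    return Priority.LOW
```

Under this reading, an issue that both delays and blocks a task is still rated high priority, because the blocking check runs first.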

HIGH PRIORITY

MEDIUM PRIORITY

LOW PRIORITY

Issue 1: 50% of users expressed a desire for an AI-assisted search tool, despite there already being one.

Many participants used the "Search with AI" tool as a traditional search bar, most likely because its visual design resembles the familiar pattern of regular search bars. Because they queried it like a regular search bar, they failed to notice the tool's AI capabilities.

EVIDENCE

  • 4 out of 8 users prompted the search bar with single keywords such as "models" and "chatbots"


  • Only 1 user actually interacted with the "Ask AI" icon next to the search bar

OBSERVATIONS

“So many models to choose from. If only there was a way for me to share my use case with an AI model built in… and have it recommend me a model.” - Participant 4


“It's a lot of stuff to compare and contrast by hand… It would actually be useful to have a chat assistant, honestly. I give you my requirements and then you give me what you think would be best.” - Participant 2

DESIGN RECOMMENDATIONS

BEFORE

Search bar design with AI tool button next to it


AFTER

AI tool made more obvious based on well-known design patterns for chatbots

Issue 2: The "Compare Models" tool was easy to use, but difficult to find.

Despite 100% of our users reporting that "Compare Models" was easy to use, their time on task told a different story:

Additionally, the main difference in time on task depended on which page our participants started from:

This suggests that the "Compare Models" tool had better visibility on the "Model Catalog" page than anywhere else. As a result, we made the following design recommendations:

DESIGN RECOMMENDATIONS

BEFORE

No CTA to "Compare Models" on Model Specifications page

AFTER

Surface CTA for "Compare Models" early in the page

Medium Priority

IMPACT

As of April 2026, the Microsoft Core AI team has pushed the following updates based on our findings and design recommendations:

Surfacing actionable model exploration CTA much earlier in the home page (High Priority Issue 2)

Hover tooltips for benchmarks and explanations of what they mean (Medium Priority Issue 3)

Reflections

1. Precise wording is everything

When my team was drafting screener questions, we used words such as "familiarity" to ask users to describe their experience with AI development tools. However, we didn't realize that "familiarity" means something different to each person. In usability testing, participants need a shared baseline when they evaluate themselves, or their self-reports introduce bias. In the end, we opted to ask users for their years of experience.