Presentation: Dynamic Moments: Weaving LLMs into Deep Personalization at DoorDash

Summary

Sudeep Das and Pradeep Muthukrishnan explain the shift from static merchandising to dynamic, moment-aware personalization at DoorDash. They share how LLMs generate natural-language "consumer profiles" and content blueprints, while traditional deep learning handles last-mile ranking. This hybrid approach allows the platform to adapt to short-lived user intent and massive catalog abundance.

Bio

Sudeep Das is the Head of Machine Learning and Artificial Intelligence, New Business Verticals, at DoorDash. He is a frequent speaker at RecSys, SIGIR, ICML, ReWork, MLConf, Nordic Media Conference, and other ML conferences. Pradeep Muthukrishnan is the Head of Growth for New Business Verticals at DoorDash, where he leads end-to-end growth efforts across personalization, targeting, and new product experiences.

About the conference

Software is changing the world. QCon San Francisco empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

**Sudeep Das**: I'm Sudeep.

**Pradeep Muthukrishnan**: I'm Pradeep.

**Sudeep Das**: We are going to take you through a journey of how we are reimagining personalization at DoorDash, and something we'll talk about a lot is hyper-personalization. In the last week, how many of you have ordered something from DoorDash? How many of you have ordered something that's not restaurant food? Over time, when I do this talk, I've asked this question and the fraction of hands on the non-restaurant side has gone up. We're doing something right. Why are we doing this? Basically, we're doing this to capture all shoppable moments. The word moments here is tagged in a different color because we're going to talk a lot about moments today, and we'll come to something called dynamic moments. Essentially, the idea is for DoorDash to be your local commerce buddy. Everything that's sold around you, be it restaurants or grocery, convenience, alcohol, flowers, pets, you name it, we can deliver it and you can shop it on the platform. That's the goal. That's the idea of the product. That comes with a cost.

Once you have too much stuff on your platform, it means that you're now dealing with abundance. When there is abundance, there is the question of, how do I choose the thing that I really need? How do I find and show you, the customer, the thing that you need at this very moment? If you think about this, this is an age-old problem of personalization. Add to this the fact that now we have millions of users, thousands of merchants, a massive catalog. We need to understand every user. We need to understand every item we carry. Also, we need to understand that consumers have affinities towards merchants.

Then add to that the fact that people's interests and intent shift very quickly. It's late night, you're watching Netflix, and then you're getting hungry, you want to order something, so for late-night snacks, we need to understand the intent in that moment. Maybe you have the flu, you want the medicines right now. Then, maybe it's an event like Black Friday, and you have been looking at headphones and TVs. We need to understand that there's this latent interest that you're expressing, and then serve it back up to you when the moment is right.

This is not, as I said, new in personalization. I've been in personalization for many years. It started with this Netflix Prize where we had matrix factorization. People did a lot of LDA type things in the early days. Then came the wave of deep learning-based personalization. We looked at all these revolutions like wide plus deep models, MTML, two-tower embeddings for retrieval. When you think about those types of personalization, what I'm calling classic personalization here, they were basically doing a couple of things right. One is this learning from within the system. Basically, those systems learn from engagement and somewhat from product metadata.

Mostly, it's doing something akin to collaborative filtering at the end of the day. It's people like you, they buy this thing. Or if you buy this thing, you might be interested in this other thing that's similar. The problem with that type of personalization, which is something we still need, is that it doesn't always meet the customer at the moment of need. Right now, I am hungry late night. I want some snacks. Understanding that in the moment context, and making personalization be dynamic in that context, is something that classic personalization cannot do very well.

The second thing that classic personalization cannot do very well is bringing in the world knowledge. It's essentially like if you have something, let's say DoorDash onboarded Best Buy yesterday, and there is no engagement yet on the platform. Still, I should be able to know, given your other purchases, what are the things that you should be interested in. This is where world knowledge comes in, in the sense like generative AI or LLMs have this world knowledge embedded into them. You can actually blend the classic personalization with LLMs to enhance that experience.

The other thing that LLMs are really good at is basically looking at your past behavior and also the behavior you are expressing within this session to understand what your intent is. Not only does it understand the intent, but it also can express it back in natural language, essentially something that is explainable back to the consumer. This is one of the themes of this talk: how we're actually layering that intelligence on top of the classic personalization systems that we have, to make personalization grow from the left, which is classic, "people like you like X", to the right, which is extremely moment-aware and hyper-personalized to you. It's basically moving the narrative from "people like you like X" to "you need X now". How we get there is the story we're going to tell today.

Why Static Personalization is Not an Answer

Going back to static personalization, or the old-school merchandising strategies that people used to have: it doesn't really work anymore, and it doesn't really work in the context of DoorDash and its massive catalog today. It's basically because most of these systems only learn from long-term interests. Long-term interests are good. They tell you a lot about you, but they cannot react fast enough when your interests are changing. If it's Black Friday, you saw a deal, you looked at it, you spent like 30 seconds on a product details page, the system is not going to be reactive enough to throw it back at you when you come to the app.

Then, there is also the thing about a lot of merchandising still used to get done by humans or by systems that are very editorial, which is something obviously you need, but at the same time you need to be able to generate a lot of content to meet all the consumers. This generative piece is something also getting solved by actually adding LLMs into the mix. That's something that we'll talk about later in more detail. What static personalization cannot do and what I'm trying to illustrate here on the right-hand side, so basically when you look at a billboard on the highway. This is the one billboard for everyone. This billboard can only be contextual or personalized to the average person out there, but not to me versus you. That billboard does not dynamically change given who looks at it.

Similarly, if you think of a store, a physical store, the store manager is basically putting out deals and promos within the store. They're reorganizing the locations of the things in the store to sell better, but they're also thinking about the average consumer because there's no way they can create 5 million replicas of the store for 5 million consumers that walk in. Instead, it's on the consumer to walk the path that they think is the personalized path for them. Whereas in the digital world like in DoorDash, we have the ability to actually give you your personalized path within the app without you having to actually think about it. This is where basically we can go back and serve these high intent moments like Black Friday. Let's say we have Black Friday coming up. We know a lot about you. Then I'll talk about an example of how an ideal experience looks like, but combining old school ML with generative AI and with some technologies that we'll talk about, Pradeep will touch in more detail, we can make these magical experiences happen now.

Dynamic Moments - What are They?

I talked about dynamic moments, so just a little bit to formalize the idea. What are these? Basically, these are short-lived, high context events that the consumer is expressing. It could be a movie night for someone. It could be something very personal like your birthday. It could also be something like a social event, a nationwide event like Black Friday. These moments are where we need to meet the consumer and we need to understand that things are never static. Even within the context, in this short period of a Black Friday, a single consumer's intent might also change dynamically. They may be interested in the TV now. Tomorrow, they might be interested in this other headphone. User interest changes, seasonalities come in and go.

Then the other thing that is very important for these kinds of events, especially events like Black Friday, is that you have access to and can bubble up the right deals and promotions at the right moment to the right customer. Essentially, the thing that we codified here is: whatever life throws at you, we adapt in real time. The ingredients of this are what we're going to go into in a little bit more detail.

The Ideal Experience

Before we go there, let's walk through an ideal experience. Basically, I talked about all the foundational thinking behind this hyper-personalization and gave you some framework, but to make it really visceral, let's talk through an ideal example. This is Alice. What we know about Alice is like, Alice has been on DoorDash for many years. We know about their restaurant history. Now that we are starting to serve up retail offerings on the platform, we have seen that she has ordered chargers, cables, electronic essentials. She's aware that we are selling electronic items. We also see that she has been browsing these little high-end headphones on the app. We see that she has shown interest in Apple AirPods Max, Bose QC Ultra, Beats over-the-ear headphones.

One thing you can deduce just from this is that she's not interested in earbuds, and that she's interested in noise-canceling headphones. Basically, from this behavior, we can deduce that there's a strong pattern of over-ear, noise-canceling headphones for Alice. Now comes the Black Friday moment. What do you want to do? Because we know that Alice has this propensity towards electronics, but also has shown strong interest towards these high-end noise-canceling headphones, in that moment, we want to meet her by showing her the assets and the offerings that will help us win the moment, knowing all the recent behavior that she has expressed.

Basically, convert that interest into a purchase and help her find the thing that she's interested in with the minimum amount of effort. What would something like that look like? Alice opens the DoorDash app on Black Friday in the morning. She sees this page which is flooded with electronic deals, and it has honed in on the fact that she has been looking at headphones recently. Also, it knows that she likes noise-canceling, over-the-ear headphones. It's not only just headphones, also, because you don't want to actually have a whole page of headphones, which does not have any variety. It also understands that she has expressed interest in TVs, and in general there are some other deals on the page.

As you see here, she may have expressed some interest in video games. There's also a healthy amount of exploration that you need to throw into this, because it cannot always all be targeting. You should also be exploring a little bit on this page. Then you'll see, also, that we understand her affinity towards merchants, in this case Best Buy and Home Depot, and we're trying to guide her towards those. This is what an ideal experience would look like.

To make that ideal experience happen, we need certain ingredients. The ingredients, from the left to right, is, first, we need to understand, which headphones are over-ear and noise-canceling? Then we also need to know which headphones have deals on them right now. This goes to rich product understanding, and also intelligence of the inventory. What do we have now? What has deals and things like that? The second piece is the user side of the same story, which is deep understanding of the user. What are the different interests of this user? What's their affinity towards certain brands, merchants, certain categories of items? We need to know all of that, and also derive that in the moment as they're doing things on the app.

Then there's the moment awareness, which is a segue from the last thing I said. The system needs to know the event that's happening now. It could be your birthday, and we might just be aware of the birthday. It could be something like Black Friday. Then we also need to hydrate that moment with the fact that she came from looking at this ad on this Bowers & Wilkins headphone. Then the last piece of this puzzle is blending what we can do offline and what we can do real-time, because if you want to do everything real-time with LLMs, it can get very costly very quickly. There are also latency concerns. The beauty of this, and the things that we'll describe, is that the system needs to be built in a way that the real-time piece and the offline piece blend seamlessly and produce this final moment that we talked about.

At a very high level, the architecture looks like the following. We have this layer, which is the product understanding layer. I'll talk about how we have actually done a lot of generative AI to understand products. Then the second piece is user understanding, which is where we know about the user and we spit out things called narratives about the user. The third piece is where, knowing both the user and the item, I want to pull in the things that are relevant to you and then rank them properly. This is where traditional ML shines: two-tower embeddings and multi-task deep learning models can be very efficient and low latency. Then the last piece is this dynamic content. Now that I have all these ingredients, how do I create content that speaks both to the event that's happening, like Black Friday, and to the session that's happening right now?
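A minimal sketch of those four layers might look like the following; every name here (`Item`, `ConsumerProfile`, `build_page`, the facet keys) is an illustrative stand-in, not DoorDash's actual services or schema.

```python
from dataclasses import dataclass

# Illustrative stand-ins for the four layers described in the talk.

@dataclass
class Item:
    name: str
    attributes: dict          # output of the product-understanding layer

@dataclass
class ConsumerProfile:
    narrative: str            # LLM-generated narrative about the user
    facets: dict              # structured facets usable for retrieval

def retrieve(items, wanted):
    """Pull items whose attributes match the profile's facets."""
    return [i for i in items
            if all(i.attributes.get(k) == v for k, v in wanted.items())]

def rank(candidates):
    """Stand-in for the low-latency deep ranker (here: alphabetical)."""
    return sorted(candidates, key=lambda i: i.name)

def build_page(profile, items, moment):
    """Dynamic-content layer: combine the moment, profile, and ranked items."""
    chosen = rank(retrieve(items, profile.facets))
    return {"title": f"{moment}: picks for you",
            "items": [i.name for i in chosen]}

catalog = [
    Item("Bose QC Ultra", {"form_factor": "over-ear", "anc": True}),
    Item("AirPods Pro", {"form_factor": "earbud", "anc": True}),
]
alice = ConsumerProfile(
    narrative="Prefers premium over-ear noise-canceling headphones.",
    facets={"form_factor": "over-ear", "anc": True},
)
page = build_page(alice, catalog, "Black Friday")
```

The point of the shape, not the toy logic: product understanding supplies `attributes`, user understanding supplies the profile, retrieval plus ranking selects items, and the dynamic-content step wraps them in moment-aware copy.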

On the product knowledge side, one of the huge shifts that DoorDash has gone through recently is moving from a human-centric system to an AI-driven system. I would just give you some numbers. There's a task that we used to do. Let's call it extraction. Basically, for this headphone, I need to extract the fact whether it's noise canceling, what's the brand, what's the color? Is it over-the-ear or is it an earbud, things like that?

Similarly, for grocery, for everything we sell, like a bag of chips, what's the brand, what's the flavor? Does it have allergens in it, things like that? There's a certain task that used to take 28 days; we do that exact same thing in 2 days today. That shows the power of automation through AI for these kinds of tasks. What did we need to do to get there? We started with vanilla LLMs. They get you somewhere. It's the 80 of the 80/20. We cannot actually sell something for which the allergen was wrongly extracted, because someone might actually die, so we need to be very precise. We set pretty high bars on the accuracy of these systems, which meant that we had to go and do fine-tuning. We did a lot of our fine-tuning in-house.

Then we also ground these systems with RAG. Basically, we let the system know about our ontology, taxonomy, and the products we carry so that it can make an informed decision about the thing it's extracting. Last but not the least, a lot of mom-and-pop stores, they send us a spreadsheet that has very abbreviated things that a human even won't be able to decipher. How do we actually then go and hydrate that thing? This is where agentic processes come in. Basically, with that text, if you go and search on Google, normally, it finds a lot of information that's very relevant, even if it's highly abbreviated.

Then from all the information we gather from Google, and also, we can go to the merchant's website and scrape the data, things like that, we can bring in a bunch of information over which an LLM can reason and then extract the right information. By doing this, we have been able to do small merchant onboarding. We have been able to move it from a purely human-driven process, because those items are so hard, to something that's very much AI-driven with a human in the loop. The story is human in the loop. Once you've extracted this, it helps both the consumer and the Dasher when they're shopping for you, because we can tell the Dasher exactly the thing that you're looking for, the variants and so on.

Consumer Profiles - The Core Pillar

Pradeep Muthukrishnan: Now let's talk about how we build the other core pillar that is needed in order to provide this hyper-personalized experience, and that's consumer profiles. For people who've been working in the industry for a long time, consumer profiles are nothing really new. They have existed in the industry for ages. In the early 2000s, it was mostly feature vectors. You essentially represented every consumer with these immense feature vectors, and you would just try to pump in as much as possible about how they engaged with different pages, different taxonomies, different categories. Did they click on this item versus that item? You would basically try to pump in as much of the data as possible and hope that your ML model has enough expressiveness to be able to learn all of it. That worked for a bit. Then we basically said, we need something a little more expressive, but not as expansive, not something that just keeps getting bigger forever, because the model doesn't have enough ability to learn from these huge feature vector representations.

Then we moved to embeddings. This was probably around 2010, 2015. We started working a lot more on the embedding side, and we said we're going to have multiple different embeddings to represent the consumer profiles. That worked for a bit as well. I think what all of them basically still didn't really work with is it was all data learned from your own app and your app engagement, but it didn't have any of the external real-world information, which is where LLMs do really well.

The latest trend has been like, let's just represent consumer profiles in just plain old English, and it has enough expressiveness. You should be able to write anything that you'd want about what you think this user is interested in, what their preferences are when it comes to stores, categories, item level, brand level preferences, whatever it is, you can just write that out. That's one part of the puzzle.

The other part where it helps is explaining these recommendations back to the user. Now you actually have something that traditional machine learning couldn't do really well: how do you explain these recommendations, once you make them, back to the user? If you have this profile stored as LLM narratives, then you can use some of that to explain back to the consumer why they're seeing this piece of content. That has an immense impact on their engagement with the recommendations that you show.

These consumer profiles that we talk about, we group them into different memory blocks about dietary habits, household information, category preferences, item brand preferences, taxonomy preferences, and so on. It becomes a shared primitive that we use across all of our different problems, whether it is generating notifications for you, whether it is recommending different carousels or collections of items on the homepage for you, or whether it is for different ranking problems or search, what have you. These become like a core pillar that you essentially use across the entire company there.

Here's an example profile, continuing the example of Alice. This is a piece of the narrative that would exist within the taxonomy preference. Alice tends to purchase really last-minute electronics, but also has shown interest in premium over-ear, noise-canceling headphones like Apple, Bose, Beats, and Bowers & Wilkins. You can also extract structured facets from this and store them as well, so that it makes it easier for you at retrieval time. Whether it's electronics taste or preferred form factor, all of these things could just be attributes that you store as part of the profile as well. Then you can use it even for notifications, saying, I just want to target consumers whose electronics taste is premium. That makes it a lot easier for you to do retrieval or targeting of consumers, rather than just using the narrative snippets.
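A hypothetical shape for such a profile, narrative memory blocks plus structured facets extracted from them, could look like this; the block names and facet keys are assumptions for illustration, not DoorDash's schema.

```python
# Illustrative consumer profile: LLM-written narrative memory blocks,
# plus cheap structured facets derived from them for targeting/retrieval.
profile = {
    "consumer_id": "alice",
    "memory_blocks": {
        "taxonomy_preferences": (
            "Tends to purchase last-minute electronics; has shown interest "
            "in premium over-ear noise-canceling headphones (Apple, Bose, "
            "Beats, Bowers & Wilkins)."
        ),
        "dietary_habits": "No known restrictions.",
    },
    "facets": {
        "electronics_taste": "premium",
        "preferred_form_factor": "over-ear",
    },
}

def match_segment(profile, **required_facets):
    """Facet-based targeting, e.g. picking a notification audience,
    without touching the narrative text at all."""
    return all(profile["facets"].get(k) == v
               for k, v in required_facets.items())
```

Targeting "consumers whose electronics taste is premium" then becomes `match_segment(profile, electronics_taste="premium")` over the profile store, while the narrative blocks stay available for explanation and for prompting downstream LLM steps.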

Hyper-Personalized Merchandising

From the profile that we've generated for all the different consumers, how do you go about doing hyper-personalized merchandising? For every single consumer, every week, for different use cases, we essentially ask the LLMs: what should we show to this consumer, given that this is the profile, this is the long-term memory that we have about this consumer in terms of what their preferences are? What carousels would really make sense? These use cases could be evergreen use cases like grocery stock-up, or brunch basics, or nightly snacks. Or they could be in-the-moment use cases such as Black Friday, or back to school that's coming up soon, or flu season. What it outputs is essentially a notion of a content blueprint for different carousels: you say that this is the best way to sell grocery stock-up to you. For me, it could be premium seafood essentials, because I actually like to eat a lot more seafood. Maybe that's not the case with Sudeep. How do you populate that carousel itself? We say, here are the kinds of queries that you need to run in order to populate it.

The notion of having essentially just simple search as a way to populate these carousels is expansive enough for all our different use cases. You could essentially say, for any collection, what are the kinds of items that you need to be able to show within it? It becomes a pretty easy-to-use primitive. You can add additional constraints saying whether they are price sensitive or not, and that could change between carousels. Maybe I'm price sensitive for retail or electronics deals purchases, but not really for grocery. You can add brands, price, merchant, whatever it is. I have different store-level preferences or propensities. You could add all of those as constraints as well. You essentially get all of this for free from the LLM.
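One plausible shape for such a blueprint is below: a carousel title, the search queries used to populate it, and per-carousel constraints applied at serve time. The JSON layout and field names are guesses for illustration, not the actual format.

```python
# Hypothetical offline LLM output: a content blueprint for one carousel.
blueprint = {
    "carousel_title": "Premium Noise-Canceling Deals",
    "queries": [
        "over-ear noise canceling headphones",
        "bose headphones deal",
        "75 inch tv black friday",
    ],
    "constraints": {
        "price_sensitive": False,            # may differ per carousel
        "preferred_merchants": ["Best Buy"],
        "on_deal_only": True,
    },
}

def passes_constraints(item, constraints):
    """Apply blueprint constraints online, at populate time, so the
    carousel respects current inventory and deals."""
    if constraints.get("on_deal_only") and not item.get("on_deal"):
        return False
    merchants = constraints.get("preferred_merchants")
    if merchants and item.get("merchant") not in merchants:
        return False
    return True
```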

Offline vs. Online

In terms of what really happens offline versus online: LLMs are still not great to use online. You don't want them in your serving path itself, both for latency reasons and for cost reasons. Most of the content generation about what to showcase to every single consumer is generated offline. This is where your carousel config for every single consumer has like 40 to 50 different ideas of items that you want to be able to merchandise. That is all generated offline. What you do want to be able to do online is actually populating these carousels, because that needs to respect your inventory. That needs to respect what is on a deal versus not. It also needs to respect the user's intent in case it changes since they came onto the app. This is still based on everything that you've done until you've come onto the DoorDash app. Once you've come on, maybe your interest has changed. Maybe you're not looking for your headphones anymore, but you've just been searching for TVs. We want to be able to adapt to it. You still want the way you populate these carousels with items to be able to change.

The other good reason why you want to be able to do this is if they just change their address to some other location, then you need to be able to respect what is available within that address as well. Generating it end-to-end, everything offline, isn't really a good idea. You want to make sure that you have some notion of it. You say that this is the kind of stuff that we want to show to this user, be able to adapt to it based on whatever it is that has changed since the user has come on to the app. How do we incorporate those real-time signals that I was talking about? The long-term stuff is essentially all your profiles. In real-time behavior, maybe you've done a few different searches, we're trying out a bunch of different agents as well within the app.

If you log in to the app and you start chatting with an agent, and say that I'm hosting a taco night party right now, and one of my friends is allergic to cilantro, you want to make sure that you don't actually show cilantro, even if that was part of your grocery stock-up. Then you need to be able to do this blending so that your retrieval also covers the real-time intent; in this example, it was about 75-inch TVs for Alice, so we also want to show TVs, not just headphones.
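The profile/session blending could be as simple as a weighted combination of embedding vectors. This toy sketch shows the shape of it; the fixed linear blend and the 0.6 session weight are assumptions, not the actual blending logic.

```python
# Blend a long-term profile embedding with a real-time session-intent
# embedding; the session signal dominates when it exists.
def blend(profile_vec, session_vec, session_weight=0.6):
    if session_vec is None:           # no in-session signal yet
        return profile_vec
    return [
        (1 - session_weight) * p + session_weight * s
        for p, s in zip(profile_vec, session_vec)
    ]
```

With a profile vector leaning "headphones" (`[1.0, 0.0]`) and a session vector leaning "TVs" (`[0.0, 1.0]`), the blended vector tilts towards TVs while still carrying the long-term preference.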

How The e2e Experience Comes Together

How does all of this come together? From your mobile app, you have, essentially, your feed service, which is this orchestration layer that we have. All that it does is pass through all the different events and call the right microservices. You're streaming these consumer events, and you have this service, DoorDash Brain, which understands everything about the consumer, hosts your profiles and serves them, and in addition does the blending of your real-time intent. All the consumer events go to DoorDash Brain, which keeps track of how your intent is evolving, and we keep track of this through multiple different embeddings.

In real time, it is not really feasible to call an LLM and generate narratives. You try to, essentially, approximate the user's intent since they've come onto the app, largely through embeddings. Then you can batch update them back to your profile over a day or so. When someone says, I want my homepage right now, we essentially go and fetch the different carousels that we've already generated for this particular consumer. For each of those carousels, you need to make 10 to 20 different queries. In addition to that, if you want to expand it based on whatever the real-time intent has been, you add those as well. It becomes 20 to 30 different embedding retrievals that you would essentially have to do: go search for your item-level embeddings that you've stored in a different index and do this embedding-based retrieval, do some lexical search, and do your typical semantic search workflow as well. Then it returns all of those things.
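A toy version of the multi-query, embedding-based carousel population might look like this, with brute-force cosine similarity standing in for a real ANN index; the item names and vectors are made up.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Stand-in for the item-level embedding index.
ITEM_INDEX = {
    "Bose QC Ultra":   [0.9, 0.1],
    "AirPods Max":     [0.8, 0.2],
    "Samsung 75in TV": [0.1, 0.9],
}

def populate_carousel(query_vecs, k=2):
    """Run each query embedding against the index, take top-k per query,
    and dedupe across queries, preserving first-seen order."""
    seen, results = set(), []
    for q in query_vecs:
        ranked = sorted(ITEM_INDEX,
                        key=lambda name: cosine(q, ITEM_INDEX[name]),
                        reverse=True)
        for name in ranked[:k]:
            if name not in seen:
                seen.add(name)
                results.append(name)
    return results
```

In the real system each carousel fires 10 to 20 such queries (plus real-time expansions), and the deduped pool then goes to the traditional ranker rather than being shown in retrieval order.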

The other key point here is that when you return all of this, you still need to be able to do ranking of these things. The queries help you with a certain level of personalization, but your two-tower embedding models or your MTML ranker models, don't throw them away. You still need to use them, because they encode a lot more information about specific item preferences that you have. Especially for categories where there are often repeat purchases, like grocery, that is where part of the personalization happens as well; it's not just what the LLM did for you.

Why Evaluating This is Hard

The last part of it is like, we need to do evaluation right now. How do you evaluate this? Evaluating it is pretty hard because previously you had this cohort-level merchandising that you did. You knew exactly what different consumers were going to see. There were probably 30 different use cases and 100 different carousels that a human could go through and say, this was good, this is not good, great. Right now, for every single consumer, there's 50 different carousels which are brand new and which are going to say different things. The copy is going to be different.

The items are going to be different. How do you measure whether this is a good carousel or not? You need some way to do this systematically and be able to optimize the system as well, not just evaluation for the sake of evaluation. You can't just go based off vibes. We evaluate these things across three different axes today. This becomes your reward function for GEPA. We use GEPA right now within DSPy.

GEPA is Genetic-Pareto, which is one of the ways in which you can optimize these compound AI systems, where you have a lot of different moving parts: you have certain prompts, you have the profile piece itself, you have some retrieval logic, you have some ranking logic. You want to be able to treat each one of these things as parameters that you can optimize. GEPA essentially helps you do that. For it to do well, you need to say, what's the reward function? How am I supposed to say that this is a good outcome versus a bad outcome? We look at quantitative metrics, your typical rubrics. Does this match the personalization? Is this essentially what the user is likely to click on? If you have already been running this as an experiment, you can just use your actual online data. Then you use LLM-as-a-judge as well. This is the part where it feels like it's LLMs all the way down. LLMs generated it, and then you ask the LLM, was this good?

The LLM tells you some part of it. In addition to that, and this is the part where GEPA does really well, if you're able to collect human-annotated feedback, especially textual feedback, it does much better at optimizing your prompts. We work with a lot of human annotators, as well as our own employees through dogfooding, to collect this kind of textual feedback. It can be very broad. It doesn't need to be just numbers and scalar scores. You could ask, what do you think about this carousel? Or you could show a few carousels together and ask, if you saw this on the homepage, would it be a good or bad experience? The users can tell you whatever they think in plain text, and that becomes your reward function back into the GEPA optimizer.
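The three axes described above can be sketched as a single blended reward. This is a hypothetical illustration only: the function name, weights, and the [0, 1] scale are all assumptions, not DoorDash's actual setup.

```python
# Hypothetical sketch of a scalar reward combining the three evaluation
# axes: quantitative rubric metrics, an LLM-as-a-judge score, and a score
# distilled from human annotations. All names and weights are assumptions.

def carousel_reward(quant_metrics, judge_score, human_score,
                    weights=(0.5, 0.3, 0.2)):
    """All inputs are assumed to be in [0, 1]; returns a blended reward."""
    # Axis 1: average the quantitative rubrics (e.g. predicted CTR,
    # personalization match) into one number.
    quant = sum(quant_metrics.values()) / len(quant_metrics)
    w_quant, w_judge, w_human = weights
    return w_quant * quant + w_judge * judge_score + w_human * human_score


score = carousel_reward({"ctr": 0.8, "match": 0.6},
                        judge_score=0.9, human_score=1.0)  # 0.82
```

In practice an optimizer like GEPA can also consume the raw textual feedback alongside a scalar like this, which is exactly the point made above about plain-text annotations.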

What do we use GEPA to optimize? These are the parameters of the compound AI system that we are currently trying to tune. One of them is the prompts and templates. Maybe it figures out that within the primary category itself there needs to be enough variety, so the carousel doesn't just become all headphones, because maybe that's what a consumer said they didn't like.

Then it would go and change the prompt itself. Or take search term generation: are these search terms all retrieving basically the same set of items? You want to make sure you have more diversity there. Or you completely missed something about the user. Maybe they bought some Bose headphones six months ago, and you haven't captured that in your profile generation. The optimizer can tell you the profile should have been generated with that facet as well, and you expand to it. The last bit is the ranking objective itself. Your objective function for ranking has parameters: how much of it should be exploration versus exploitation? What should the value model say? With all of that, you optimize it.
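The search-term diversity problem mentioned above can be made concrete with a simple overlap check: if the generated queries all retrieve nearly the same items, flag them so the query-generation prompt can be rewritten. This is an illustrative sketch; the function names and the 0.5 threshold are assumptions.

```python
# Flag search terms whose retrieved item sets overlap too much, using mean
# pairwise Jaccard similarity. Purely illustrative; threshold is assumed.
from itertools import combinations


def mean_pairwise_jaccard(retrieved):
    """retrieved: dict mapping search term -> set of retrieved item IDs."""
    pairs = list(combinations(retrieved.values(), 2))
    if not pairs:
        return 0.0
    overlaps = [len(a & b) / len(a | b) for a, b in pairs]
    return sum(overlaps) / len(overlaps)


def needs_more_diversity(retrieved, threshold=0.5):
    # True when the terms mostly return the same items.
    return mean_pairwise_jaccard(retrieved) > threshold
```

A signal like this can feed back into the optimizer as one of the quantitative rubrics.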

How ML + LLMs Work Together

The last few slides I have are more about the lessons we've learned doing this. Yes, LLMs are great. They're very good at specific things, but so is traditional machine learning. It's a little sad that deep neural networks have come to be called traditional machine learning now, when they've only been around for 10 years or so. Don't throw these things away. Both need to work in unison for things to really work well.

Some of it is for pure latency reasons. Some of it is so you can actually explain the results. Some of it is that traditional models are not just less stochastic, you can actually optimize them. GEPA is great, but it's not doing any convex optimization over these things; it's doing evolution-inspired mutations here and there, with some hill climbing. Your traditional machine learning is good at ranking and more tightly constrained problems, whereas if you need something that generalizes really well, understands consumers much better, and creates much better profiles, you can't beat LLMs with traditional machine learning. They both have their strengths, and both need to work together for you to deliver good end-to-end consumer experiences.

Lessons Learned

That's basically what this slide says: LLMs shine at turning messy behavior into clean, understandable narratives. The deep learning models do really well when you have to optimize for very concrete metrics, whether it's CTR or CVR, under constraints. Don't let your LLMs do the last-mile ranking, and don't let your ML models do the content ideation piece.
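The division of labor in that lesson can be sketched as a small pipeline: the LLM does the content ideation from the natural-language profile, and a conventional ranker does the last-mile ordering. `llm_generate` and `ranker_score` below are stand-ins for real services; everything here is a hedged assumption, not DoorDash's implementation.

```python
# Minimal sketch of the LLM-for-ideation, deep-learning-for-ranking split.
# llm_generate and ranker_score are injected stubs for real services.

def build_carousel(profile_text, candidates, llm_generate, ranker_score, k=10):
    """LLM proposes the carousel concept and copy from the consumer profile;
    a traditional ranking model does the last-mile item ordering."""
    # Content ideation: turn the natural-language profile into a blueprint.
    blueprint = llm_generate(
        "Given this consumer profile, propose a carousel theme and title:\n"
        + profile_text
    )
    # Last-mile ranking: a conventional model optimizes the concrete
    # objective (CTR/CVR); the LLM never orders items.
    ranked = sorted(candidates, key=ranker_score, reverse=True)
    return {"blueprint": blueprint, "items": ranked[:k]}
```

With stubs plugged in, the structure is easy to exercise: a fixed blueprint string for the LLM and a score field for the ranker.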

The last bit is about organizational and product lessons. Invest in shared primitives: profiles, the product graph, evaluation frameworks. Make sure these are generic things that can be used across many different applications, not tightly coupled to the specific use case you built them for. A, anything else is not cost effective. B, there's a lot more to be won if you make them shared primitives. Treat the LLM plus deep learning integration as product work, not just infra work. When you do it as an end-to-end experience, that's what the consumer cares about. The consumer doesn't care that you did this as a cool piece of infra work. They care about what they see on the screen, and whether it makes sense.

Treat that as part of the product work itself. The last bit, and I've seen this at a lot of different companies through friends I've been talking to, is that people tend to start building these as large infra projects within the company. I think that doesn't really work. Whether it's profiles or eval frameworks, make sure you have real experiments where you can drive return on these investments soon enough. You'll also figure out much better how to build and generalize them as frameworks. Start a little small before you turn them into generic frameworks. The last thing I would say is, I hope you'll say that you love DoorDash because it gets you, and it gets you the things that you need.

Questions and Answers

**Participant 1**: I think you answered this in the online blending. If a customer is interacting with one of the suggestions, they purchased the product, you actually want to remove that. Is that where it's happening?

**Pradeep Muthukrishnan**: That part happens as part of the blending as well. We need to make sure not to show the same thing. Surprisingly, this has still not been fully solved, but I think now it's solvable. At least now it's purely an engineering problem.

**Participant 1**: You talked a lot about using LLMs, but you didn't talk a lot about agents. Do you actually do anything with agents, or are you just using LLMs in your offline process?

**Pradeep Muthukrishnan**: I think to start with, we've been saying, let's use this in a non-agentic way and get it to work first. Once we have that, then we want to make this, and the entire app, agentically merchandised.

**Sudeep Das**: We are developing agents. There's an internal team building agents and agentic primitives, and there are a lot of lessons we learned building those agents. Some of the things Pradeep talked about, like GEPA and prompt optimization, are proving very useful there too. Then there's the grounding of the agent with RAG, with this profile that we talked about. If you want the agent to pick the right item, so if I say I'm going to make chicken pasta tonight, it needs to know that Sudeep likes chicken thighs and only likes organic. The same primitive we built, this profile, can be pulled into the agent's context through RAG.

Then the agent can do this last-mile reasoning over which chicken to pick. The search will bring back, say, five or six things, and which one to pick in the context of the recipe I'm making is where LLMs shine, in the last reasoning layer. These are some of the things I can talk about right now, in the same spirit of how we're doing hyper-personalization. There's quite a bit of consumer-facing agentic work that we're developing. As I said, our catalog building process is extremely agentic, because there used to be a lot of manual work: a spreadsheet moving from X to Y can be completely agentified. We're doing those too. We're also thinking about agents that will help the Dasher shop in the store. You'll see a lot of those coming out.
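The grounding pattern described, pulling the relevant profile memory blocks into the agent's context through retrieval, can be sketched as follows. The function names and context format are illustrative assumptions; `retrieve` stands in for any similarity search over the profile blocks.

```python
# Sketch of grounding an agent with the consumer profile via RAG: fetch the
# profile facets relevant to the request and prepend them to the agent's
# context before its last-mile reasoning. All names here are assumed.

def ground_agent_context(request, profile_blocks, retrieve, top_k=3):
    """profile_blocks: natural-language memory-block strings for one user.
    retrieve: any similarity search returning the top_k most relevant blocks."""
    relevant = retrieve(request, profile_blocks, top_k)
    return ("User request: " + request + "\n"
            "Known preferences:\n"
            + "\n".join("- " + block for block in relevant))
```

With a trivial keyword-based `retrieve` stub, a chicken-pasta request would surface only the chicken-related preference block, leaving unrelated facets out of the context window.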

**Participant 2**: When you were discussing user feedback in that context, you were then also talking about, and I think you said this might be based on user feedback, maybe we want to, in the prompt, say, please include at least 30% of non-specified product to expand beyond just showing headphones. Does that imply that you modify the prompts on a per-user basis?

**Pradeep Muthukrishnan**: That becomes too expensive, so we still do a single prompt per use case, not per user.

**Sudeep Das**: I think the trick with these things is that the prompt is generic, but what you pull into the context as part of the answer generation process, there you have some leeway. That's another way you solve the personalization, within the context. We have tried that as well, and the agentic use case especially is really benefiting from it.
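The pattern just described, one shared prompt template per use case with per-user personalization injected through the retrieved context, looks roughly like this. The template text is an assumption for illustration.

```python
# One generic prompt per use case; only the retrieved per-user context
# varies. The template wording below is purely illustrative.

CAROUSEL_PROMPT = (
    "Generate a personalized carousel title for this shopper.\n"
    "Consumer context:\n{context}"
)


def render_prompt(profile_snippets):
    # The template is shared across all users; personalization comes only
    # from the snippets pulled into the context.
    return CAROUSEL_PROMPT.format(context="\n".join(profile_snippets))
```

This keeps prompt optimization (and its cost) at the use-case level while still producing a different effective input per user.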

**Participant 3**: Can you talk about how you update the plain-English user profile that you generate? Where does it originate from? How do you incorporate updates as it evolves? Is there any point at which actual user text is being input, which could cause prompt injection, jailbreaking, or state corruption that you would have to sanitize for?

**Sudeep Das**: How do you update the profile as new information comes in?

**Pradeep Muthukrishnan**: A, the profile: we think about it as having an initialization part, where you dump everything the consumer has done over the last year or so. What items have they bought? What search queries have they run? When they've gone through the end of the shopping flow and the Dasher says an item is not available, have they asked for substitutions, and what did they substitute with? Give all of that information and generate the different memory blocks. You don't need to send all of the data for all of the memory blocks either, because that gets too expensive. You can say, only the restaurant orders for your restaurant habits, for example, and only the new-verticals orders for your grocery shopping.

On top of that, in terms of how you update these things: each of these facets is one of the memory blocks I'm talking about, dietary habits, item-level preferences, brand-level preferences. They can be updated on different cadences. Not all of them need to be updated every day, or every time the user logs onto the app. For each one, we figure out what the right cadence is and what the right model is to update it. You do delta updates; you don't need to send the full year's history again, both for cost reasons and because you want to recency-weight the consumer's most recent interactions with your app. So: do delta updates, use the right model, and figure out the right cadence.
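A delta update with recency weighting, as described above, can be sketched with an exponential decay on event age: only the new events since the last refresh are sent, and recent ones count more. The data shapes and the 30-day half-life are assumptions for illustration.

```python
# Hypothetical delta update for one profile memory block: fold in only the
# new interactions, weighted by recency. Half-life value is an assumption.
import math


def delta_update(block, new_events, half_life_days=30.0):
    """block: dict facet -> accumulated score.
    new_events: list of (facet, age_days) tuples since the last refresh."""
    decay = math.log(2) / half_life_days
    for facet, age_days in new_events:
        # An event half_life_days old counts half as much as one from today.
        weight = math.exp(-decay * age_days)
        block[facet] = block.get(facet, 0.0) + weight
    return block
```

Because only the delta is processed, the full year of history never needs to be replayed, which matches the cost argument above.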

**Participant 4**: In that profile, do you separate the recommendations from your system from the actions the user took as a result of those recommendations, so that you don't pollute the user's original intent with just following recommendations? Let's say I bought gloves, you recommended I buy a cap, and I put a wool cap in my cart.

**Sudeep Das**: Where you're going is an age-old problem with any recommendation system: the pigeonhole effect. I like certain things, you recommend me more of those things, I buy those things, so I'm stuck. This is not going to go away in the LLM world either. The way to break that has always been to introduce exploration. Even in the LLM-generated world, what we give the LLM is the factual history of what you have bought and interacted with, and obviously behind that there are recommender systems. There's no way to break this cycle without introducing some exploration. That remains true in traditional recommendation systems and in these generative ones. From time to time, there has to be exploration happening on you, so we're discovering new interests and breaking that cycle. The same thing applies here.

**Pradeep Muthukrishnan**: Exactly. The exploration part is true, but in terms of how you do it, you can rely on the LLM to help you with the exploration as well. For example, when you ask it to generate different search queries to populate a carousel, you can tell the LLM to expand to adjacent categories this consumer might be interested in. If it comes to restaurants, this person seems to like Indian restaurants and Mexican restaurants; maybe Thai isn't too different, or there's a lot of overlap there. That comes from the LLM's real-world knowledge, and it starts recommending those other cuisines as well.

**Sudeep Das**: Actually, this is a really good question, because how we explore is also changing with LLMs, and I think it's a bit of an underexplored area today. The LLMs have so much world knowledge of the adjacencies. We used to do Q-learning and similar things to avoid exploring on items that are expensive and that users will not engage with, because that's wasted effort and it takes up real estate on the screen. With LLMs, there's an angle for doing much better guided exploration than we could in the classical ML world. I don't think it has been thought through or explored a lot yet. It's a great question; it planted a thought in my head.
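The exploration-blending idea from this exchange can be sketched as reserving a fixed share of carousel slots for exploration items, for example adjacent cuisines proposed by the LLM. The 20% slot share and function names are assumptions, not the described system.

```python
# Sketch of blending exploitation items with LLM-suggested exploration
# items: a fixed fraction of the k carousel slots goes to exploration.

def blend_with_exploration(exploit_items, explore_items, k=10,
                           explore_frac=0.2):
    """exploit_items: ranked items from the conventional recommender.
    explore_items: e.g. adjacent-category items suggested by an LLM."""
    # Reserve at least one slot for exploration when suggestions exist.
    n_explore = max(1, int(k * explore_frac)) if explore_items else 0
    n_exploit = k - n_explore
    return exploit_items[:n_exploit] + explore_items[:n_explore]
```

Guided exploration, as Sudeep notes, would amount to choosing `explore_items` from LLM world knowledge of adjacencies rather than uniformly at random.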

**See more presentations with transcripts**

Recorded at:


Apr 21, 2026