The last AI feature I shipped was a chat box. I am not proud of it. The product needed the user to pick a date, attach a file, confirm three line items, and hit submit. We shipped it as a chat. Users typed dates in seventeen different formats, attached files by pasting links to Drive, and asked the agent to “just do it,” which is exactly the message the agent could not act on. The drop-off was 60 percent before the user even got to the part where the agent would have helped.
The next version replaced the middle of the conversation with an actual form. The agent still talked to the user before and after. But the moment data needed to be collected, a real component appeared inside the chat with three labeled inputs and a submit button. The drop-off went to 12 percent. Same model. Same prompt. Same backend. The only thing that changed was that we stopped pretending text was the right surface for picking a date.
That is the entire pitch for generative UI. The model is good enough to plan, retrieve, and reason. The chat surface is the part that has been holding everything back. In 2026 the tooling to do better is finally production-ready, and the teams that figure out the patterns first are going to ship noticeably better products. This is what I have learned from putting generative UI into real apps.
What Generative UI Actually Means
The term gets used loosely. Some people mean “an LLM that picks which prebuilt component to render.” Some mean “an LLM that writes JSX at runtime and the browser executes it.” Some mean an entire interface composed by an agent on the fly. They are very different things with very different risk profiles.
The version that is shipping in production today, and the one I am writing about, is the first kind: an agent picks which components to render from a curated set, fills in the props, and the framework handles the rest. The components are real React (or Vue, or Svelte) components written by humans with normal styling, accessibility, and tests. The agent is just deciding which ones the user needs right now and what data to pass to them.
This is the pattern that scales because the surface area of “things that can go wrong” stays small. The agent cannot invent new components. It cannot ship arbitrary code. It works inside a registered toolbox of UI elements that you have already designed and tested. The freedom is in the composition and the data, not in the rendering.
The other versions exist. Runtime JSX generation works in narrow cases like internal admin tools where you can sandbox aggressively. Fully agent-composed interfaces are still mostly research. For a product you ship to customers, the curated-component pattern is where the value is, and it is what the major frameworks have settled on this year.
Why Chat Is Not Enough
You can build a chatbot in an afternoon. You can also tell, within thirty seconds of using one, why most of them feel bad. The reasons are structural, not stylistic.
Chat is linear. The user sees a stream of text from top to bottom and has to read all of it to find the part that matters. A response that includes three options buried in two paragraphs of explanation is harder to act on than three buttons.
Chat is ambiguous. “Yes, do the second one” requires the agent to remember that it offered three things, that the second one was the database migration, and that “do” means run it. A button labeled “Run migration” makes all of that explicit, so neither the user nor the agent has to reinterpret the message.
Chat is poor at structured data. Filling in a multi-field form via chat means the agent has to ask, parse, validate, and re-ask. Every back-and-forth is a chance for the user to give up. A form does that work in one render.
Chat is one-dimensional. It cannot show a chart, a map, a calendar, a side-by-side comparison, or a preview without either embedding an image (static, not interactive) or punting to a link (which loses the agent’s context).
Generative UI fixes these by letting the agent render the right shape of UI at the right time. A question becomes a form. A list becomes selectable cards. A confirmation becomes a modal with the actual diff. A schedule becomes a calendar. The agent’s job is to decide what shape the answer should take, not to render the answer as text and hope the user can parse it.
This connects directly to the same constraints I talked about in voice agents. The medium shapes what the agent can do well. Voice has to fit inside latency budgets. Chat has to fit inside what text can express. Generative UI lifts the latter constraint by putting actual UI back in the mix.
The Pattern That Works in Production
After shipping a few of these, the structure that holds up looks roughly the same regardless of framework.
You define a registry of components the agent can render. Each component has a stable name, a typed schema for its props, and a real implementation. The schema is what the model sees when it decides whether to call this component. The implementation is normal frontend code that knows nothing about the agent.
The agent’s tool calls become render calls. Instead of a tool that returns JSON the model summarizes into a chat message, you have a tool that returns a payload addressed to a specific component. The framework then mounts that component into the conversation thread with the props the agent specified.
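To make that concrete, here is a minimal sketch in TypeScript of a one-entry registry and a render call addressed to it, using zod for the prop schema. The shape and the names are hypothetical, not any specific framework’s API; each of the frameworks discussed below has its own version of this wiring.

```typescript
import { z } from "zod";
import type { ComponentType } from "react";

// The human-written component; it knows nothing about the agent.
// Declared here as a stand-in for a real implementation elsewhere in the app.
declare const AppointmentPicker: ComponentType<{
  slots: { startTime: string; durationMinutes: number; doctorName: string }[];
}>;

// One registry entry: a stable name, a typed prop schema the model sees,
// and the real implementation the framework mounts.
const registry = {
  appointmentPicker: {
    description: "Let the user pick one of several appointment slots",
    props: z.object({
      slots: z.array(
        z.object({
          startTime: z.string(),
          durationMinutes: z.number(),
          doctorName: z.string(),
        })
      ),
    }),
    component: AppointmentPicker,
  },
} as const;

// A render call is a tool result addressed to a registered component rather
// than text for the model to summarize. Props are validated before mounting.
type RenderCall = { component: keyof typeof registry; props: unknown };

function validateRenderCall(call: RenderCall) {
  return registry[call.component].props.parse(call.props);
}
```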
The conversation thread is a mix of text turns and rendered components. The user can interact with the components (click a button, fill a field, drag a slider), and those interactions feed back into the conversation as structured events. The agent sees “user submitted the form with values X” or “user clicked the second option,” not “user replied with some text.”
State lives in two places. Ephemeral interaction state (what the user is currently typing, which option they are hovering over) lives in the component itself. Durable conversation state (what has been rendered, what was confirmed, what data was collected) lives in the agent’s context so subsequent turns can reason about it.
The framework you pick is mostly responsible for wiring those four things together. The implementation differences between them are real but the conceptual model is the same across all of them.
Picking a Framework Without Regretting It Six Months Later
Five frameworks are worth paying attention to in 2026. They overlap in capability but they are aimed at different jobs.
Vercel AI SDK is the default if you are already in Next.js or any React app and you want to render components from your agent without picking up a new abstraction. The streaming primitives are excellent, the tool-call-to-component pattern is clean, and the integration with the rest of the AI SDK v6 means you get model routing, streaming text, and component rendering in the same API. If you are in the Vercel ecosystem already, start here.
CopilotKit is what you reach for if the AI is sitting alongside an existing app rather than being the whole app. It assumes you have a real product UI and you want to add an AI copilot that can render extra components, drive the existing UI, or run actions on behalf of the user. The mental model is “agent inside your app” rather than “app inside an agent.” For SaaS products adding AI features without rebuilding, this is the smoother path.
assistant-ui focuses hard on the chat surface itself, with strong primitives for messages, threads, attachments, and component rendering inside conversations. It is the choice when the chat thread is the product, not a side panel. The component model is similar to Vercel AI SDK’s but the chat-surface ergonomics are more polished.
Thesys / Crayon is the rapid-prototype option. It will get you to a working demo faster than anything else and the runtime is portable. The tradeoff is less control over the long tail. If you want to validate that generative UI works for your use case before committing, this is the cheapest way to find out.
Google A2UI is the protocol-first option. Instead of being a framework, it is a spec for how agents and UIs talk to each other, with implementations across multiple stacks. If you are building something that needs to render the same agent’s output across web, mobile, and a third-party surface, A2UI is the abstraction worth paying attention to. For a single-app shop it is overkill.
The decision between them looks something like this. If you are in React and you want one thing that handles models, streaming, and component rendering, pick Vercel AI SDK. If you have an existing product and want AI alongside it, pick CopilotKit. If you are building a chat-first product and want the chat to be excellent, pick assistant-ui. If you need to ship a demo by Friday, pick Thesys. If you are building cross-platform agents that need to render across surfaces you do not control, pick A2UI.
I have used three of these in production in the last six months and all three work. The framework choice matters less than the design discipline you bring to the components themselves.
Designing Components for Generative UI
This is the part that surprised me. The components you would design for a normal app are not always the right ones for an agent to call.
A normal form expects the user to know what they are filling in. A generative UI form will be rendered by an agent that may have already gathered partial information and may not have the rest. The form has to handle being pre-filled, validated, and submitted in any combination, with affordances for the user to edit fields the agent guessed wrong.
A normal table assumes the user wants to scan and filter. A generative UI table is being rendered because the agent already filtered for the user. It needs strong defaults, sensible truncation, and a way for the user to ask the agent to refine the query rather than fight with column toggles.
A normal modal interrupts the user to confirm something they initiated. A generative UI modal interrupts the user to confirm something the agent wants to do. The framing has to be different. “Run this migration?” is not the same as “Confirm migration of users table.” The agent’s intent has to be visible.
The principles that emerge from this:
Components in the agent’s toolbox should describe a single conversational move, not a UI panel. “Pick a date,” “confirm an action,” “review changes,” “compare options.” If you cannot describe what the component does in one verb-phrase, the agent will struggle to know when to render it.
Props should be concrete and typed. The agent will fill them in based on the schema you give it. Fuzzy schemas produce fuzzy renders. A field labeled data: any will be filled with garbage. A field labeled appointmentSlots: { startTime: string; durationMinutes: number; doctorName: string }[] will be filled correctly because the schema told the model exactly what it needed to produce.
Default to safe states on render. The component should be usable even if the agent gave it incomplete or weird data. Form fields should validate. Buttons should not nuke production on a single click. Confirmation should require an explicit action from the user, not just a render.
Let the user override the agent’s choices. The number one frustration with agent-rendered UI is when the agent picks the wrong option for the user and the user cannot fix it. Every component should expose the underlying choice so the user can adjust without going back to chat.
Style components for embedding, not for full-page layout. They will appear inside a conversation surface, possibly in a narrow column, possibly mixed with text turns. Constrain widths, use sensible padding, and assume the surrounding context is busy. Components designed for full-bleed dashboards look terrible when rendered inline in a chat.
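As a rough illustration of the last three principles, here is a hypothetical confirmation component: it tolerates odd data from the agent, shows the agent’s suggested choice while letting the user change it, and confirms nothing until the user explicitly clicks.

```tsx
import React, { useState } from "react";

type Slot = { startTime: string; durationMinutes: number; doctorName: string };

type ConfirmAppointmentProps = {
  slots: Slot[];
  suggestedIndex?: number;        // the agent's guess; the user can change it
  onConfirm: (slot: Slot) => void;
};

export function ConfirmAppointment({
  slots,
  suggestedIndex = 0,
  onConfirm,
}: ConfirmAppointmentProps) {
  // Tolerate weird agent output: clamp the suggested index into range.
  const safeIndex = Math.min(Math.max(suggestedIndex, 0), Math.max(slots.length - 1, 0));
  const [selected, setSelected] = useState(safeIndex);

  if (slots.length === 0) {
    // Stay usable even if the agent passed no data.
    return <p>No appointment slots were found. Want to try different dates?</p>;
  }

  return (
    <div className="gen-ui-card">
      {slots.map((slot, i) => (
        <label key={i}>
          <input type="radio" checked={i === selected} onChange={() => setSelected(i)} />
          {slot.doctorName}, {slot.startTime} ({slot.durationMinutes} min)
        </label>
      ))}
      {/* Rendering alone confirms nothing; the user has to click. */}
      <button onClick={() => onConfirm(slots[selected])}>Confirm this slot</button>
    </div>
  );
}
```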
State, Persistence, and the Hard Parts
The first generative UI demo you build will work because the conversation is one turn, the component is one render, and the state is in memory. The first one you ship to production will break because the conversation is twelve turns, the user reloaded the page halfway through, and the component the agent rendered six turns ago needs to still be there with its current values.
State management in generative UI has three layers and they all matter.
The thread state is the persistent record of what was said and what was rendered. This needs to live in a database, keyed by conversation ID, with enough structure that you can reconstruct the conversation including the rendered components and their final values. Storing this as a string of message text loses everything that makes generative UI work. Store it as structured turns, with component renders as first-class entries that include both the call payload and the latest interaction state.
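One way to shape that record, sketched in TypeScript. The entry types are illustrative, not any framework’s storage format; the point is that renders and interactions are structured entries keyed by conversation ID.

```typescript
// Component renders and user interactions are first-class entries,
// not message text the renders have been flattened into.
type ThreadEntry =
  | { kind: "text"; role: "user" | "assistant"; content: string }
  | {
      kind: "component_render";
      id: string;                  // stable id so later turns can reference it
      component: string;           // registry name, e.g. "appointmentPicker"
      props: unknown;              // the payload the agent sent
      interactionState?: unknown;  // latest values the user left in the component
    }
  | { kind: "interaction"; componentId: string; event: string; payload: unknown };

type Thread = {
  conversationId: string;          // the key the thread is persisted under
  entries: ThreadEntry[];
};
```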
The component state is what the user has been doing inside a rendered component. Form values, selected options, expanded sections. This state needs to flow back into the thread state when the user takes a meaningful action (submits the form, clicks confirm) and possibly when the user navigates away. The exact rules depend on the product, but the discipline is to be explicit about what survives and what does not.
The agent’s working memory is what the agent thinks it knows. After the user has interacted with a component, the agent’s next turn should be aware of what happened. This means the user’s interaction with the component has to be reflected back into the agent’s context as a new message or event. The framework usually handles this, but you have to feed it the right shape of data. “User submitted appointment form with date 2026-05-12 and provider Dr. Singh” is the kind of synthetic message the agent needs in its context to reason about the next step.
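A sketch of that reflection step, with hypothetical names. The structured values are serialized into the synthetic message so the model can reason over them on the next turn.

```typescript
type FormSubmission = {
  componentId: string;
  values: Record<string, string | number>;
};

// Turn a component interaction into a message the agent can reason about.
function toSyntheticMessage(submission: FormSubmission): { role: "user"; content: string } {
  const fields = Object.entries(submission.values)
    .map(([key, value]) => `${key}=${value}`)
    .join(", ");
  return {
    role: "user",
    content: `[form ${submission.componentId} submitted] ${fields}`,
  };
}

// toSyntheticMessage({ componentId: "appt-1", values: { date: "2026-05-12", provider: "Dr. Singh" } })
// -> { role: "user", content: "[form appt-1 submitted] date=2026-05-12, provider=Dr. Singh" }
```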
Persistence is not optional even for early-stage products. Users reload pages, lose connection, and come back hours later expecting the conversation to still be there. Building this in from day one is much cheaper than retrofitting it after launch. The same observability principles I talked about for debugging AI agents in production apply doubly here. You need to be able to replay a conversation including all the renders and interactions to understand why something went wrong.
Streaming, Latency, and the User Experience
Generative UI is not slower than text chat. Done right, it is faster, because a rendered form is faster to fill than a multi-turn back-and-forth to gather the same information.
Done wrong, it feels janky. The agent thinks for two seconds, then a component pops in fully formed, then the agent thinks for two more seconds, then a text response appears. The user has no idea what is happening between the renders.
A few patterns help.
Stream component scaffolds early. The framework can render the empty shell of the component as soon as the agent commits to rendering it, then fill in the props as the model produces them. The user sees the form appearing rather than waiting for it.
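A sketch of what that looks like from the component’s side, assuming the framework passes a status flag and partial props as the tool call streams in. The prop names and the StreamedProps wrapper are hypothetical.

```tsx
import React from "react";

type AppointmentFormProps = {
  patientName: string;
  suggestedDate: string;
  providerName: string;
};

type StreamedProps<P> = {
  status: "streaming" | "complete";
  props: Partial<P>;               // fills in as the model produces the tool call
};

export function AppointmentFormScaffold({ status, props }: StreamedProps<AppointmentFormProps>) {
  return (
    <div className="gen-ui-card">
      <h3>Book an appointment</h3>
      <Field label="Patient" value={props.patientName} />
      <Field label="Date" value={props.suggestedDate} />
      <Field label="Provider" value={props.providerName} />
      {/* Keep submit disabled until the agent has finished streaming props. */}
      <button disabled={status !== "complete"}>Confirm</button>
    </div>
  );
}

function Field({ label, value }: { label: string; value?: string }) {
  // A missing value renders as a skeleton placeholder instead of an empty gap.
  return (
    <label>
      {label}
      {value !== undefined ? <input defaultValue={value} /> : <div className="skeleton" />}
    </label>
  );
}
```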
Use optimistic transitions. When the user submits a form, render the success state immediately and roll back if the action fails. The model will catch up with the new state on the next turn.
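Sketched as a small React hook, assuming a submitAction that resolves once the backend confirms; the action itself is whatever call actually performs the work.

```ts
import { useState } from "react";

type SubmitState = "idle" | "optimistic" | "confirmed" | "failed";

// Usage: const { state, submit } = useOptimisticSubmit(() => confirmSlot(slot));
// where confirmSlot is a hypothetical call that performs the real action.
export function useOptimisticSubmit(submitAction: () => Promise<void>) {
  const [state, setState] = useState<SubmitState>("idle");

  async function submit() {
    // Show the success state immediately so the thread keeps moving.
    setState("optimistic");
    try {
      await submitAction();
      setState("confirmed");
    } catch {
      // Roll back so the user can retry; the agent's next turn should be told
      // that nothing was actually submitted.
      setState("failed");
    }
  }

  return { state, submit };
}
```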
Keep text turns short and fast. The text the agent says alongside renders should be conversational and brief. Long explanations are exactly what generative UI is supposed to replace, not supplement.
Show the agent’s intent during longer operations. When the agent is fetching data to render a component, an inline indicator like “looking up your appointments” beats a blank pause. This is the same pattern as conversational filler in voice agents, applied to a visual surface.
Where Generative UI Falls Down
It is not a universal answer. There are categories where it is the wrong tool.
Pure information retrieval where the answer is one paragraph is fine as text. Wrapping a sentence in a card adds complexity for no benefit.
Highly creative output like writing or code is better as text or as a code block. The user wants to read it linearly and possibly edit it, not interact with it as components.
Long-running background work should not be a component the user has to wait on. It should be a notification, a job, an email when done.
High-stakes irreversible actions deserve more friction than a rendered confirm button. Real two-step verification, typed confirmations, or human approval flows are appropriate when the cost of a wrong click is high.
Workflows where the user already knows the UI cold should not be reinvented. If your power users use the keyboard shortcuts and the dense table view, do not replace that with a chat thread that renders cards. Generative UI is for the part of the experience where the user does not know exactly what they want yet, or where the system has reasoning to do before the right UI is obvious.
The trap is to use generative UI everywhere because it is the new shiny thing. The right framing is to use it where the conversational layer adds real value and the rendered components remove friction the chat would have introduced.
What I Would Build First
If I were starting on a generative UI feature today, the rough plan would be:
Pick the smallest, highest-friction part of an existing flow. Not the whole product. One step where users get stuck. Add an agent that handles that step with a small set of rendered components. Measure completion before and after.
Keep the component registry small at first. Five components that work well beats twenty that the agent gets confused choosing between. The model’s accuracy at picking the right component drops as the registry grows. Start tight and expand carefully.
Make every rendered component testable in isolation. Storybook entries with realistic props. The components should make sense out of context, because that is how the agent will sometimes call them.
Build the conversation persistence before the second feature. Trying to add persistence after you have shipped two flows is more painful than building it once at the start.
Instrument everything. Which components were rendered, with what props, what the user did with them, what the agent’s next turn was. The data will tell you what the agent is good at and what it gets wrong, and you cannot improve either without seeing the full picture.
Resist the urge to ship a chat surface as the only UI. Generative UI works best when the chat is one of several entry points, not the front door. The user should still have direct buttons and menus that go straight to the same components, just without the agent’s involvement.
Where This Is Going
The frameworks are converging fast. The component-registry-plus-tool-call pattern is now standard across all of them, which means the lock-in cost of picking one is lower than it looks. Migrating between frameworks is mostly a matter of remapping component definitions and rewriting tool boilerplate.
Standards like A2UI are interesting because they hint at a future where the same agent can render across surfaces it does not own. Right now your agent renders into your app. The interesting horizon is your agent rendering into someone else’s app, or a system app like a notification center or a watch face. That is years away from being routine but the protocols are being written now.
The interface model is the next big shift in AI products. Models stopped being the bottleneck a while ago. Surfaces are the bottleneck now. Teams that figure out how to put the right component in front of the user at the right moment are going to ship products that feel meaningfully better than chat-only competitors, and the gap is going to be visible to users without them being able to articulate why.
The form that fixed my drop-off problem was a four-field React component that had been sitting in the codebase for months. The unlock was not building anything new. It was getting the agent to render it at the right moment instead of typing the same questions out as text. That is what generative UI is. Less new technology than new wiring of the technology you already have.
The work to do it well is real, but the payoff is one of the few places in AI engineering right now where the user can immediately tell that the product got better. That makes it worth doing.