An MCP server exposes tools that a model can call during a conversation, returning results according to the tool contracts. Those results can carry extra metadata (such as inline HTML) that the Apps SDK uses to render rich UI components, called widgets, alongside assistant messages.
When the connector is enabled, either because the prompt invoked it or through a UI interaction, ChatGPT calls your MCP server. OpenAI has defined some `_meta` fields your tool can respond with; the key one, `openai/outputTemplate`, tells the client to render a widget and points at the HTML for it, which your server exposes as a resource.
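For concreteness, here is a minimal sketch of that contract using the TypeScript MCP SDK. The meta key `openai/outputTemplate` and the `text/html+skybridge` MIME type come from the Apps SDK docs; the tool name, `ui://` URI, and HTML body are made up for this example:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "hello-widget", version: "1.0.0" });

// Register the widget HTML as an MCP resource. The client looks it up by the
// ui:// URI named in the tool's outputTemplate meta field below.
server.registerResource(
  "hello-widget",
  "ui://widget/hello.html", // hypothetical URI for this sketch
  {},
  async () => ({
    contents: [
      {
        uri: "ui://widget/hello.html",
        mimeType: "text/html+skybridge", // the Apps SDK's widget MIME type
        text: `<div id="root"></div><script>/* widget JS here */</script>`,
      },
    ],
  })
);

// The tool itself returns data; the _meta key tells the client which widget
// template to render that data with.
server.registerTool(
  "say_hello", // hypothetical tool name
  {
    title: "Say hello",
    _meta: { "openai/outputTemplate": "ui://widget/hello.html" },
    inputSchema: { name: z.string() },
  },
  async ({ name }) => ({
    content: [{ type: "text", text: `Hello, ${name}!` }],
    structuredContent: { greeting: `Hello, ${name}!` }, // handed to the widget
  })
);

await server.connect(new StdioServerTransport());
```

Note the split: the tool result carries the data (`structuredContent`), while the HTML lives in a separate resource that the client fetches once and can cache.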
In the current implementation, the client creates an iframe (or a webview on native) that loads a sandboxed environment, which in turn embeds another iframe with your HTML injected into it. That HTML can load remote resources, but only from domains allowlisted via another meta field.
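Continuing the sketch above, the allowlist appears to be declared per widget resource through an `openai/widgetCSP` meta field; the shape shown here (`connect_domains` / `resource_domains`) follows the Apps SDK docs, with hypothetical domains:

```ts
// Sketch: a widget resource that pulls a script from a CDN and fetches from
// an API, so both domains must be declared in the CSP allowlist.
server.registerResource(
  "map-widget",
  "ui://widget/map.html", // hypothetical URI
  {},
  async () => ({
    contents: [
      {
        uri: "ui://widget/map.html",
        mimeType: "text/html+skybridge",
        text: `<script src="https://cdn.example.com/map.js"></script>`,
        _meta: {
          "openai/widgetCSP": {
            // Domains the sandboxed iframe may make network requests to.
            connect_domains: ["https://api.example.com"],
            // Domains it may load scripts, styles, images, etc. from.
            resource_domains: ["https://cdn.example.com"],
          },
        },
      },
    ],
  })
);
```

Anything not on these lists is blocked by the sandbox's content security policy, which is presumably the point of the double-iframe setup.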
The docs mention returning resources, but the example returns a Rust source file as a resource, which is nonsensical in this context.
The end result seems similar to MCP-UI, but it's not clear how the two compare internally.