The model is the commodity. The glue code is the product.
Ground Model — Daily AI newsletter for builders who ship
Lead Story: AWS's Strands Glue Code Reveals Where Margins Actually Live
AWS published a detailed walkthrough on building custom model providers for Strands Agents using LLMs hosted on SageMaker endpoints. On the surface, it's a technical tutorial about parsing response formats. Read it through the commoditization lens and it's something more revealing.
What's actually happening: Organizations are deploying open models (Llama 3.1 in this case) on SageMaker via SGLang/vLLM, but these endpoints speak OpenAI-compatible formats while AWS's Strands agent framework expects Bedrock Messages API format. The solution? A custom parser layer that translates between them.
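The translation itself is mundane, which is the point. A minimal sketch of that parser layer, using simplified versions of the two response shapes (the actual Strands provider handles streaming, tool calls, and more; the field names here follow the public OpenAI chat-completion and Bedrock Converse formats, trimmed down):

```python
def openai_to_bedrock(openai_response: dict) -> dict:
    """Translate an OpenAI-style chat completion (what SGLang/vLLM
    endpoints emit) into a Bedrock Converse-style response.
    Simplified: text-only content, single choice, no streaming."""
    choice = openai_response["choices"][0]
    usage = openai_response.get("usage", {})
    # Partial map of OpenAI finish reasons onto Bedrock's vocabulary.
    stop_map = {"stop": "end_turn", "length": "max_tokens", "tool_calls": "tool_use"}
    return {
        "output": {
            "message": {
                "role": choice["message"]["role"],
                "content": [{"text": choice["message"]["content"]}],
            }
        },
        "stopReason": stop_map.get(choice.get("finish_reason"), "end_turn"),
        "usage": {
            "inputTokens": usage.get("prompt_tokens", 0),
            "outputTokens": usage.get("completion_tokens", 0),
        },
    }
```

Thirty lines of dictionary reshaping. That's the entire "integration" between a self-hosted open model and AWS's agent framework, which tells you how thin this layer really is.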
Stop and think about what this means. AWS is explicitly building infrastructure that makes the model layer hot-swappable. Llama today, Mistral tomorrow, your fine-tuned model next week — the Strands agent doesn't care. The parser adapts. The agent orchestration persists.
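The pattern is plain old dependency inversion. A toy sketch of the idea (this is not the actual Strands API; the class and method names here are hypothetical, chosen to show the shape):

```python
from typing import Protocol


class ModelProvider(Protocol):
    """Everything the agent needs from a model: one call signature."""
    def complete(self, messages: list[dict]) -> str: ...


class LlamaOnSageMaker:
    """Stand-in for a SageMaker-hosted open model (real version
    would call the endpoint through the parser layer)."""
    def complete(self, messages: list[dict]) -> str:
        return f"[llama] {messages[-1]['content']}"


class FineTunedModel:
    """Stand-in for your own fine-tune behind the same interface."""
    def complete(self, messages: list[dict]) -> str:
        return f"[custom] {messages[-1]['content']}"


class Agent:
    """Orchestration layer: owns tools and workflow, not the model."""
    def __init__(self, provider: ModelProvider):
        self.provider = provider

    def run(self, task: str) -> str:
        return self.provider.complete([{"role": "user", "content": task}])


# Swap the model; the agent code doesn't change.
print(Agent(LlamaOnSageMaker()).run("summarize"))  # [llama] summarize
print(Agent(FineTunedModel()).run("summarize"))    # [custom] summarize
```

Once your agent depends on an interface instead of a vendor SDK, the model underneath is a line of config.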
The builder's takeaway is stark: If you're building at the model layer, you're building a commodity. AWS, Google, and every cloud provider are racing to make model-swapping frictionless. The three-layer architecture in this post — Model Deployment Layer, Parser Layer, Agent Orchestration Layer — is a map of where value accrues:
- Model Deployment Layer: Commodity. Race to zero. You're competing with $0.25/1M token Gemini Flash-Lite.
- Parser/Translation Layer: Thin margin. Necessary but not defensible. This is plumbing.
- Agent Orchestration + Domain Logic Layer: This is where the margin lives. The tools you connect, the workflows you encode, the domain-specific decision trees that actually solve business problems.
This is also classic AWS playbook: open-source the agent SDK (Strands), commoditize the model layer, and capture margin on the infrastructure (SageMaker endpoints, Bedrock integration, compute). They want you building on their orchestration layer while they make the model beneath it irrelevant.
AWS is also pushing MCP integration with SageMaker — combining predictive ML models with LLM agents via an open protocol. Again: standardize the interfaces, commoditize the models, lock in on infrastructure.
What to do about it: If you're an AI product builder, audit your stack right now. How much of your value proposition disappears if someone swaps your underlying model for a cheaper one? If the answer is "most of it," you have a model-dependency problem, not a product. The defensible play is proprietary data pipelines, domain-specific tool integrations, and workflow logic that would take a competitor months to replicate — regardless of which model powers the reasoning underneath.
We've been saying it: cloud providers are weaponizing capital to embed AI into their platforms. This Strands architecture is exactly that strategy in action. The model is the loss leader. The orchestration platform is the lock-in.
Quick Hits
$1,500 to train a competitive text-to-image model. Photoroom's PRX research trained a diffusion model in 24 hours on 32 H200 GPUs. If a startup can train a competitive image model for the price of a used car, the "we trained an expensive model" moat is dead. Defensibility has to come from the application layer.
Gemini 3.1 Flash-Lite: $0.25 per million input tokens. Google's new bottom-tier model is designed for high-volume, cost-sensitive workloads. Google is explicitly competing on price in the commodity model market. Use cheap models for cheap tasks, expensive models for hard tasks. Model routing is becoming a core engineering competency.
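What "model routing" means in practice can be as crude as this sketch. Only the $0.25/1M Flash-Lite figure comes from the item above; the frontier price and the model names on the expensive tier are placeholder assumptions, and real routers use classifiers rather than length heuristics:

```python
# Per-1M-input-token prices. Flash-Lite figure from the item above;
# the frontier price is a placeholder assumption.
MODELS = {
    "cheap":    {"name": "gemini-flash-lite", "usd_per_1m_in": 0.25},
    "frontier": {"name": "frontier-model",    "usd_per_1m_in": 5.00},
}


def route(task: str, needs_reasoning: bool = False) -> str:
    """Crude router: reasoning-heavy or very long tasks go to the
    frontier tier, everything else to the cheap tier."""
    tier = "frontier" if needs_reasoning or len(task) > 2000 else "cheap"
    return MODELS[tier]["name"]
```

A 20x price gap between tiers means even a dumb router that sends 80% of traffic to the cheap model cuts the inference bill by most of an order of magnitude.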
GPT-5.2 announced. OpenAI dropped GPT-5.2 — details sparse, but the rapid cadence tells the real story: model generations are compressing. The gap between "frontier" and "good enough" shrinks with every release.
BNY Mellon goes all-in on OpenAI. The financial services giant is building "AI for everyone, everywhere" with OpenAI. Classic enterprise lock-in: embed into institutional workflows, capture proprietary financial data flows, make switching costs astronomical. We've tracked this pattern for months — it's accelerating.
OpenAI updates Model Spec with teen protections. New guardrails for younger users. Regulatory compliance is becoming product surface area. If you're shipping consumer-facing AI without age-gated behavior, you're building regulatory debt.
Company Watch: AWS Strands Agents
What it is: An open-source SDK from AWS for building AI agents, now with custom model provider support for SageMaker-hosted LLMs.
Why it matters: AWS is playing the classic platform game — open-source the developer tooling, commoditize the model layer, and monetize the infrastructure underneath. Strands + SageMaker + Bedrock creates a full-stack agent platform where AWS captures margin at every layer except the model itself.
The strategic read: AWS is betting that the orchestration framework — not the model — becomes the developer's primary dependency. If Strands becomes the default way enterprises build agents, AWS wins regardless of whether the model underneath is Llama, Mistral, or Claude. This is why they're investing in making model-swapping trivially easy.
Builder implication: If you're building agent infrastructure that competes with Strands, you need a differentiation story beyond "we also orchestrate LLM calls." Domain-specific agent templates, proprietary tool ecosystems, or vertical-specific compliance layers are the play. Generic orchestration is being given away for free.
Tool of the Day: ml-container-creator (awslabs)
AWS's tool for deploying custom LLM serving containers on SageMaker. Handles container packaging for SGLang, vLLM, and other serving frameworks so you can deploy open models to SageMaker endpoints without hand-rolling Docker configs.
Why builders should care: The fastest way to cut inference costs is to self-host open models for tasks that don't need frontier capabilities. This tool removes the DevOps friction. Not glamorous, but it's the infrastructure decision that separates profitable AI products from ones subsidizing OpenAI's revenue.
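The self-host-or-not decision reduces to break-even arithmetic. A sketch, reusing the $0.25/1M API price from the Quick Hits above and a hypothetical $4/hour GPU instance rate (real comparisons also need utilization, ops headcount, and throughput ceilings factored in):

```python
def breakeven_tokens_per_hour(api_usd_per_1m: float, gpu_usd_per_hour: float) -> float:
    """Tokens/hour at which a self-hosted endpoint's GPU cost
    matches per-token API pricing. Ignores utilization and ops cost."""
    return gpu_usd_per_hour / api_usd_per_1m * 1_000_000


# $0.25/1M API tokens vs an assumed $4/hour instance:
print(round(breakeven_tokens_per_hour(0.25, 4.0)))  # 16000000
```

Sixteen million tokens an hour is a lot of sustained traffic. Below that, the cheap API tier wins; above it, or when the API price is frontier-model pricing rather than Flash-Lite pricing, self-hosting starts paying for itself fast.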
Stat of the Day
$1,500 and 24 hours to train a competitive text-to-image diffusion model on 32 H200 GPUs — down from months and millions just 3 years ago. (Source: Photoroom/Hugging Face PRX Part 3)
The model is the commodity. The workflow is the product. The data is the moat. Build accordingly.