What happened
Meta keeps pitching Llama to companies that will not put customer data on a shared public API, that need to fine-tune on their own text, or that need paperwork tracing where the weights came from. Recent partner stories all hit the same beat: Llama handles generation, another service handles embeddings and reranking (Cohere shows up here, or an in-house stack), and a policy layer sits between the model and tool calls. Benchmark trivia barely comes up. People talk about latency inside a VPC, whether monthly cost is predictable, and whether engineers can ship prompt changes without waiting on a vendor release train.
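That division of labor can be sketched as a pipeline with swappable tiers. The `Pipeline` class, the stub services, and the signatures below are illustrative assumptions, not any vendor's API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Pipeline:
    # Embedding + rerank are collapsed into one tier here; in practice
    # that is a separate service (Cohere, an in-house stack) from generation.
    retrieve: Callable[[str, list[str]], list[str]]
    generate: Callable[[str], str]  # e.g. self-hosted Llama behind a VPC endpoint

    def answer(self, question: str, docs: list[str]) -> str:
        ranked = self.retrieve(question, docs)
        context = "\n".join(ranked[:2])  # keep only the top passages
        return self.generate(f"Context:\n{context}\n\nQ: {question}")

# Stubs stand in for the real services in this sketch: retrieval ranks
# documents by overlap with the question, generation echoes the question.
pipe = Pipeline(
    retrieve=lambda q, docs: sorted(
        docs, key=lambda d: -sum(w in d for w in q.split())
    ),
    generate=lambda prompt: "DRAFT: " + prompt.splitlines()[-1],
)
```

Because each tier is a plain callable behind a fixed signature, swapping one vendor for another, or for an in-house service, does not touch the rest of the pipeline.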
Why it matters
Regulated pilots usually stall on engineering and policy, not because the base model cannot write a courteous email. Data residency, log retention, and who may touch production weights are where the calendar slips. Running open weights gives procurement a simpler picture: you host the weights, you own inference, and you can wire the model to Stripe for payments, GitHub for code, and internal knowledge bases through MCP-style connectors without one vendor owning every tier. That is how mature teams already split databases, identity, and observability. Treating those pieces as architecture, not accessories, is the point.
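A minimal sketch of a policy layer in front of those connectors, assuming a hypothetical registry; the tool names, the `register`/`dispatch` helpers, and the $50 cap are all invented for illustration:

```python
from typing import Any, Callable

# Each tool lives behind its own boundary; a policy check runs
# before any call is dispatched, so the model never executes directly.
TOOLS: dict[str, Callable[..., Any]] = {}
POLICY: dict[str, Callable[[dict], bool]] = {}

def register(name: str, handler: Callable[..., Any],
             allow: Callable[[dict], bool]) -> None:
    TOOLS[name] = handler
    POLICY[name] = allow

def dispatch(name: str, args: dict) -> Any:
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    if not POLICY[name](args):  # the policy layer, not the model, decides
        raise PermissionError(f"policy blocked call to {name}")
    return TOOLS[name](**args)

# Illustrative connectors; handlers and caps are assumptions, not real APIs.
register("create_refund",
         handler=lambda amount_cents: f"refunded {amount_cents}",
         allow=lambda a: a.get("amount_cents", 0) <= 5000)  # cap at $50
register("search_kb",
         handler=lambda query: ["doc-1", "doc-2"],
         allow=lambda a: True)  # read-only tool, always permitted
```

The point of the shape is that the policy table is data, not prompt text: compliance can review and change the caps without touching the model or the connectors.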
Directory impact
Teams comparing Gemini-class cloud APIs with self-hosted Llama often run both: cloud for fast iteration, open weights for workloads with tighter boundaries. Enterprise LLM work still means legacy code, brittle ETL, and half-documented APIs. Refactoring in small steps with tests beats another integration project that never reaches production. You will see more write-ups about retrieval quality, eval harnesses, and incident playbooks than about raw parameter counts.
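One reason eval harnesses get the attention is that the core loop is small enough to share. A sketch, with hypothetical cases and a stub model standing in for a real endpoint:

```python
from typing import Callable

# A reusable eval case pairs a prompt with a programmatic check,
# so the same suite runs against any model endpoint.
Case = tuple[str, Callable[[str], bool]]

CASES: list[Case] = [
    ("Quote the retention period.", lambda out: "90 days" in out),
    ("Refuse to share PII.", lambda out: "cannot" in out.lower()),
]

def run_suite(model: Callable[[str], str], cases: list[Case]) -> float:
    passed = sum(check(model(prompt)) for prompt, check in cases)
    return passed / len(cases)  # pass rate, tracked per release

# Stub model for the sketch; a real harness would call an inference endpoint.
def stub(prompt: str) -> str:
    if "retention" in prompt:
        return "Records are kept for 90 days."
    return "I cannot share that."

rate = run_suite(stub, CASES)
```

Running the same cases against cloud and self-hosted endpoints is what makes the "run both" comparison above concrete rather than anecdotal.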
What to watch next
SLAs around fine-tuning data handling need to get specific. Compliance Q&A in regulated domains needs eval suites teams can reuse instead of reinventing. Tool protocols need to stay dull and interoperable so MCP bridges do not become the next fragile glue layer. When VPC inference, encrypted logging, and human review for risky actions turn into a small set of well-tested recipes, the jump from demo to audited production gets shorter. Until then, every program still hand-rolls half the stack.
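The human-review recipe can be as small as a gate that holds risky actions in a queue; the `ReviewGate` class and its risky-action list are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ReviewGate:
    # Which actions count as risky is policy data, set outside the model.
    risky: set[str] = field(default_factory=lambda: {"create_refund", "merge_pr"})
    queue: list[tuple[str, dict]] = field(default_factory=list)

    def submit(self, action: str, args: dict) -> str:
        if action in self.risky:
            self.queue.append((action, args))  # held for human approval
            return "pending_review"
        return "executed"  # low-risk path runs without a human in the loop

gate = ReviewGate()
status = gate.submit("create_refund", {"amount_cents": 120000})
```

Pairing a gate like this with encrypted logging of everything in `queue` is the kind of well-tested recipe the section argues should stop being hand-rolled per program.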