GPT-5.6 Access Limits, GLM-5.2 Hardware Math, Cerebras 750tps — AI Daily Jun 27
490 messages · 63 active members
Overview
Topics
Reports that GPT-5.6 (Fable 5) will be limited to ~20 approved US companies — with possible KYC and residency-based blocks — divided the chat. Builders argued this hands the lead to Chinese open-source models and accelerates local-model and Huawei/ARM hardware adoption, while others framed it as the cost of doing business under any administration.
@jasonakatiff posted detailed pricing and memory math for serving GLM-5.2 locally: an 8×H200 HGX node (~$370k, ~1.13TB HBM3e) as the baseline, B200/B300 for the full 1M context, or used 8×H100 with NVFP4 as the budget path. The counterargument: renting at $2.50–3.50/GPU/hr or using the GLM-5.2 API ($1.40/$4.40 per M tokens) wins unless you run at 24/7 utilization. The buy thesis is driven by fear that new models will go API-only and hardware prices will rise 10x.
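The buy-versus-rent argument comes down to simple arithmetic. Here is a minimal sketch using only the figures quoted in the thread. It ignores power, hosting, depreciation, and resale value, any of which would shift the result. The 141 GB per-GPU figure is the H200's published HBM3e capacity, and the function names are illustrative.

```python
# Break-even sketch for buying an 8xH200 node versus renting the same GPUs.
# Prices come from the thread; power, hosting, and resale are ignored (assumption).

NODE_PRICE_USD = 370_000      # ~$370k for an 8xH200 HGX node
GPUS_PER_NODE = 8
H200_HBM_GB = 141             # HBM3e capacity per H200
HOURS_PER_DAY = 24


def total_hbm_tb(gpus: int = GPUS_PER_NODE, gb_per_gpu: int = H200_HBM_GB) -> float:
    """Aggregate HBM across the node, in decimal terabytes."""
    return gpus * gb_per_gpu / 1000


def break_even_days(rate_per_gpu_hr: float, utilization: float = 1.0) -> float:
    """Calendar days until rental spend equals the node's purchase price.

    `utilization` is the fraction of each day the GPUs would actually be rented.
    """
    hourly_rental = rate_per_gpu_hr * GPUS_PER_NODE * utilization
    return NODE_PRICE_USD / hourly_rental / HOURS_PER_DAY


if __name__ == "__main__":
    print(f"Node HBM: {total_hbm_tb():.2f} TB")
    for rate in (2.50, 3.50):
        print(f"${rate:.2f}/GPU/hr at 24/7: {break_even_days(rate):.0f} days")
    print(f"$3.50/GPU/hr at 50% utilization: {break_even_days(3.50, 0.5):.0f} days")
```

At 24/7 utilization the purchase pays back in roughly 550 to 770 days at the quoted rental rates. At half utilization the payback stretches to about three years, which is why the rent-or-API side of the argument holds for anything short of constant load.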
@jcartu highlighted Cerebras + OpenAI hitting 750tps (Kimi at 2000tps on waferscale), with @jasonakatiff noting it outpaces human reading speed and asking who Anthropic's waferscale partner will be. @thewildzeno argued ASIC tiers will become premium subscription upsells that pressure NVIDIA, alongside a thesis that frontier generalists will coexist with distilled specialized models (coding-only, on-device voice/LLM). Samsung/Hynix worker payouts of $500k–$1M were cited as proof that memory pricing is wildly above cost.
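To put the throughput figures in concrete terms, the following sketch converts tokens per second into wall-clock time for a response and compares it with reading speed. The 2,000-token response length, the 250 words-per-minute reading speed, and the 1.3 tokens-per-word ratio are assumptions chosen for illustration, not numbers from the chat.

```python
# Rough latency arithmetic for the quoted throughput figures.
# Assumptions (not from the chat): reading speed and tokens-per-word ratio.

READING_WPM = 250          # assumed adult reading speed, words per minute
TOKENS_PER_WORD = 1.3      # assumed rule-of-thumb ratio for English text


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` output tokens at a steady decode rate."""
    return tokens / tokens_per_second


def reading_tokens_per_second(
    wpm: float = READING_WPM, tokens_per_word: float = TOKENS_PER_WORD
) -> float:
    """Approximate rate at which a human consumes tokens while reading."""
    return wpm * tokens_per_word / 60


if __name__ == "__main__":
    response_tokens = 2000  # assumed response length
    for tps in (750, 2000):
        secs = generation_seconds(response_tokens, tps)
        ratio = tps / reading_tokens_per_second()
        print(f"{tps} tps: {secs:.2f} s for {response_tokens} tokens, "
              f"~{ratio:.0f}x reading speed")
```

Under these assumptions a multi-step agent loop stops being bottlenecked on decoding and becomes bottlenecked on tool calls and human review instead.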
@tounano detailed a VPS+mosh+tmux workflow with a meta-orchestrator that spawns worktrees in ./.worktrees, assigns deterministic port pools, auto-prepends planning prompts, blinks idle tabs as a productivity signal, and merges/cleans on completion. @thewildzeno runs a comparable WSL-based setup with 20+ terminals and is migrating to native Linux + Superset. Kieran's 'oh my pi' similarly orchestrates Codex, Opus, Gemini Flash, and Grok in one terminal as a Claude-outage hedge.
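One way to get deterministic port pools per worktree is to hash the worktree name into a fixed slot. This is a sketch of the general idea, not @tounano's actual implementation; `BASE_PORT`, `POOL_SIZE`, `NUM_POOLS`, and `port_pool` are all hypothetical names and values.

```python
# Sketch of deterministic port-pool assignment for git worktrees.
# BASE_PORT, POOL_SIZE, and NUM_POOLS are hypothetical values, not from the chat.
import hashlib

BASE_PORT = 20000   # first port of the first pool
POOL_SIZE = 10      # ports reserved per worktree
NUM_POOLS = 1000    # pools available; highest port used is 29999


def port_pool(worktree_name: str) -> range:
    """Map a worktree name to a stable block of POOL_SIZE consecutive ports.

    hashlib is used instead of the built-in hash() because hash() on strings
    is salted per process and would give different ports on every run.
    """
    digest = hashlib.sha256(worktree_name.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:4], "big") % NUM_POOLS
    start = BASE_PORT + slot * POOL_SIZE
    return range(start, start + POOL_SIZE)


if __name__ == "__main__":
    for name in ("feature-login", "fix-billing"):
        pool = port_pool(name)
        print(f"./.worktrees/{name}: ports {pool.start}-{pool.stop - 1}")
```

Two names can hash to the same slot, so a real orchestrator would likely keep a small registry or check for collisions before handing out a pool.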
@drcopybymatt built a Duolingo competitor at an Austin hackathon using gpt-realtime + Azure pronunciation assessment in parallel (standard STT only resolves words, not pronunciation quality). @victorbrss validated Cartesia Sonic 3.5 as a cheaper ElevenLabs replacement, with @samtome noting open-source TTS is largely solved. @realcrischico benchmarked Grok Imagine (~17¢/video) vs Seedance 2 Mini (~$1) and shared a POV fixer-upper ad prompt; @iamgalba published Q-gates via Python runners for pre-ship campaign QA.
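The "in parallel" part of the voice pipeline can be sketched with `asyncio.gather`, which sends the same audio to both services at once so that scoring does not add latency to the conversational reply. Both service functions below are hypothetical stubs. They do not use the real gpt-realtime or Azure SDK calls, whose signatures are not given in the chat.

```python
# Concurrency sketch: run a conversational call and a pronunciation-scoring
# call on the same audio at the same time. Both calls are hypothetical stubs.
import asyncio


async def transcribe_and_reply(audio: bytes) -> str:
    """Placeholder for the conversational model call (hypothetical stub)."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return "reply text"


async def score_pronunciation(audio: bytes) -> dict:
    """Placeholder for the pronunciation-assessment call (hypothetical stub)."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return {"accuracy": None}  # a real service would return scores here


async def handle_utterance(audio: bytes) -> tuple[str, dict]:
    """Send one utterance to both services concurrently and collect both results."""
    reply, scores = await asyncio.gather(
        transcribe_and_reply(audio),
        score_pronunciation(audio),
    )
    return reply, scores


if __name__ == "__main__":
    reply, scores = asyncio.run(handle_utterance(b""))
    print(reply, scores)
```

With this shape, total latency per utterance is roughly the slower of the two calls rather than their sum.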
Key Takeaways
- GPT-5.6 access may be limited to ~20 approved companies with possible KYC and residency-based blocks — plan for Chinese open-source and local-model fallbacks.
- 8×H200 only beats renting/API for GLM-5.2 if you run it ~24/7 for over a year — otherwise $2.50–3.50/GPU/hr rentals or the $1.40/$4.40 per M-token API wins.
- Cerebras + OpenAI delivers ~750tps (Kimi at 2000tps) — design agent loops around the latency step-change, and expect ASIC tiers to become premium subscription upsells.
- Treat blinking idle tmux tabs as your productivity signal, and build the workflow around deterministic port pools per worktree, auto-merge/cleanup, and at least one agent always waiting on you.
- For pronunciation-scoring voice apps, pair gpt-realtime with Azure's pronunciation assessment model in parallel — standard STT only resolves words, not pronunciation quality.
Hot Threads
GPT-5.6 release timing and the approved-company access list
Buying an 8×H200 node for GLM-5.2 vs renting vs API-only future
Tmux orchestrator + VPS worktree workflow for parallel agents