Hermes Stack, GLM-5.2 Coding, Ornith-1.0 — AI Daily Jun 28

651 messages · 79 active members

Top contributors: @jcartu, @samb69, @robinroy

Overview

Today's 651 messages from 79 builders centered on three big threads: harness and model stack choices, agentic coding methodology, and AI voice tutors as a flagship 'actually useful' use case.

The Hermes agent stack dominated. Builders compared the three Mac apps against Telegram (Telegram wins for speed), dug into the new /moa mixture-of-agents debate/synthesis mode, and circulated a 15-level Hermes mastery roadmap. VPS + tmux + /remote-control emerged as the standard pattern for keeping agents coding while laptops are closed. Oh My Pi (OMP) pulled mindshare from OpenCode thanks to working progress bars, OAuth, and daily updates.

On models, the consensus stack crystallized: Opus 4.6/4.7 for scaffolding and PRDs (4.8 widely panned), GLM-5.2 running locally for actual coding (rated on par with Opus 4.6–4.7), and Gemini Flash 2.5 (270 tps, 1M context) as the orchestration brain for Hermes, with warnings that Google rate limits stay tight until roughly $1k of spend. DeepReinforce dropped Ornith-1.0, a self-scaffolding agentic-coding LLM with a frozen judge + deterministic monitor to block reward hacking.

Meanwhile, @Tz1888's 13-hour GPT-5.5 chain (plan → subagent review → implement → simplify → E2E with HTML reports → PR review) sparked a methodology debate, with @Anonymoushat arguing for OpenTelemetry + a fine-tuned SLM watching logs over bloated E2E suites.

On the builder/business side, @drcopybymatt's 3.5-hour voice language tutor hackathon (GPT realtime + Azure pronunciation) exposed tough economics: at ~5 hrs/user/month, costs break consumer pricing. Still, the group flagged AI tutoring at ~17¢/session as a genuine flagship use case (Stimuler highlighted).

Ops threads converged on treating DB migrations as gated release steps with ledger tables and drift-check commands. Claude Code rate limits are biting even at 2 parallel sessions, and Jason's anti-goals/directional-thinking philosophy resonated across the group.

Topics

Builders compared the three Hermes Mac apps vs Telegram (Telegram wins for speed), dug into the new /moa mixture-of-agents debate/synthesis mode (powerful but slow), and circulated a 15-level mastery roadmap covering foundation → leverage → autonomy. VPS + tmux + /remote-control became the standard pattern for keeping agents working when laptops are closed.
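The thread did not describe how Hermes implements /moa internally, so the following is only a generic mixture-of-agents loop for readers new to the idea: several "proposer" models draft answers independently, optionally revise after seeing each other's drafts (the debate step), and a single aggregator synthesizes the result. The `Model` callables, round structure, and prompt wording are all assumptions, stubbed with plain functions so the sketch runs without any API. The returned call count hints at why the mode feels slow: every extra proposer or debate round adds model calls before synthesis.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: prompt in, text out. Not the Hermes API.
Model = Callable[[str], str]


@dataclass
class MoAResult:
    answer: str
    drafts: List[str]
    calls: int  # total model invocations, a rough proxy for latency and cost


def mixture_of_agents(prompt: str, proposers: List[Model], aggregator: Model,
                      debate_rounds: int = 1) -> MoAResult:
    calls = 0
    # Round 0: every proposer drafts independently.
    drafts = []
    for model in proposers:
        drafts.append(model(prompt))
        calls += 1
    # Debate rounds: each proposer revises after seeing all current drafts.
    for _ in range(debate_rounds):
        context = "\n---\n".join(drafts)
        revised = []
        for model in proposers:
            revised.append(model(f"{prompt}\n\nOther drafts:\n{context}\n\nRevise your answer."))
            calls += 1
        drafts = revised
    # Synthesis: one aggregator call merges the final drafts.
    joined = "\n---\n".join(drafts)
    answer = aggregator(f"{prompt}\n\nSynthesize these drafts:\n{joined}")
    calls += 1
    return MoAResult(answer=answer, drafts=drafts, calls=calls)
```

With 3 proposers and one debate round this makes 7 sequential calls; a real implementation would likely run proposers in parallel, which cuts wall-clock time but not cost.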

@jcartu's stack hardened into a community pattern: Opus 4.6/4.7 for scaffolding/PRDs (4.8 widely panned), GLM-5.2 locally for coding (rated on par with Opus 4.6–4.7), and Gemini Flash 2.5 (270 tps, 1M ctx, near-perfect tool calls) for Hermes orchestration. DeepReinforce also dropped Ornith-1.0, a self-scaffolding agentic-coding LLM using a frozen judge + deterministic monitor to block reward hacking. GLM-5.5 weights are expected to drop in August.

@Tz1888 detailed a 13-hour GPT-5.5 chain (plan → subagent review → implement → simplify → E2E with HTML screenshots → PR review). @Anonymoushat pushed back that E2E suites are backward-looking and bloat context; instead, instrument with OpenTelemetry and pipe logs to a fine-tuned SLM that triggers Codex/Claude fixes in real time. @thewildzeno argued that production systems handling customer money need both. Claude Code rate limits are biting even at 2 parallel sessions.
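@Anonymoushat's alternative can be sketched as a loop over structured log records of the kind an OpenTelemetry pipeline might export, with a classifier deciding which errors deserve a fix attempt. This stdlib-only sketch does not use the OpenTelemetry SDK, and everything in it is hypothetical: the JSON field names, the `classify` stub standing in for the fine-tuned SLM, and the `dispatch_fix` callback standing in for whatever launches a Codex or Claude session. It deduplicates by error signature so a burst of identical failures triggers one fix, not dozens.

```python
import json
from typing import Callable, Iterable, List, Set


def watch_logs(lines: Iterable[str],
               classify: Callable[[dict], bool],
               dispatch_fix: Callable[[dict], None]) -> List[str]:
    """Scan JSON log lines; dispatch one fix per new error signature the classifier flags."""
    seen: Set[str] = set()
    dispatched: List[str] = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than crash the watcher
        if record.get("severity") not in ("ERROR", "FATAL"):
            continue
        # Signature = service + span + exception type (assumed field names).
        signature = f'{record.get("service")}:{record.get("span")}:{record.get("exception_type")}'
        if signature in seen:
            continue
        seen.add(signature)
        if classify(record):  # stand-in for the fine-tuned SLM's verdict
            dispatch_fix(record)
            dispatched.append(signature)
    return dispatched
```

Only the first occurrence of each signature is passed to the classifier, which keeps the SLM's input small; that is the context-focus argument from the thread applied to the watcher itself.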

@drcopybymatt prototyped a conversational voice tutor in a 3.5-hour hackathon using GPT realtime + Azure pronunciation analysis. Group consensus: personalized AI tutoring at ~17¢/session is a genuine 'actually useful' LLM application (Stimuler highlighted), but the economics break at ~5 hrs/user/month on consumer pricing. Members also noted that Duolingo optimizes for retention, not fluency, and that Pimsleur and live-conversation approaches work better.
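The "breaks at ~5 hrs/user/month" point is easy to sanity-check with back-of-envelope arithmetic. The helpers below are a hypothetical sketch: the caller supplies a per-minute inference cost (realtime voice plus pronunciation scoring, at whatever rates your providers actually charge) and a subscription price. No rates here come from the thread; a per-session figure like ~17¢ can only be converted to per-minute once you assume a session length.

```python
def tutor_unit_economics(hours_per_month: float, cost_per_minute: float,
                         price_per_month: float) -> dict:
    """Back-of-envelope monthly economics for one voice-tutor user.

    All inputs are caller-supplied assumptions; nothing here is a quoted rate.
    """
    if cost_per_minute <= 0:
        raise ValueError("cost_per_minute must be positive")
    minutes = hours_per_month * 60
    cost = minutes * cost_per_minute
    return {
        "minutes": minutes,
        "cost": cost,
        "margin": price_per_month - cost,
        # Usage level at which inference cost alone consumes the whole subscription.
        "breakeven_hours": price_per_month / (cost_per_minute * 60),
    }


def per_minute_from_session(cost_per_session: float, session_minutes: float) -> float:
    """Convert a per-session cost to per-minute, given an assumed session length."""
    return cost_per_session / session_minutes
```

Because the breakeven point is simply price ÷ (per-minute cost × 60), the most engaged learners are exactly the users who push a flat subscription into the red.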

@jasonakatiff hit schema changes being silently skipped when merging worktrees. @seekersight outlined a 6-step gated flow (deploy → migrate staging → smoke → promote → migrate prod → verify) with separate CI jobs. @navuud shared his agent's pattern: SQL files in the repo, a transactional apply script, a _ff_migrations ledger table, and a drift-check command that diffs repo vs DB and fails CI on unapplied migrations.
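@navuud's pattern can be approximated in a few dozen lines. This sketch uses Python's built-in `sqlite3` and keeps migrations in a dict rather than `.sql` files so it is self-contained; the `_ff_migrations` table name comes from the thread, while the columns and function names are hypothetical. Each migration runs in an explicit transaction together with its ledger insert, so a failing migration leaves neither schema changes nor a ledger row behind. `drift_check` returns repo migrations the database has not recorded; a CI job would fail when that list is non-empty.

```python
import sqlite3
from typing import Dict, List, Set

LEDGER_DDL = """
CREATE TABLE IF NOT EXISTS _ff_migrations (
    name TEXT PRIMARY KEY,
    applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions; BEGIN/COMMIT are issued explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(LEDGER_DDL)
    return conn


def applied(conn: sqlite3.Connection) -> Set[str]:
    return {row[0] for row in conn.execute("SELECT name FROM _ff_migrations")}


def apply_pending(conn: sqlite3.Connection, migrations: Dict[str, List[str]]) -> List[str]:
    """Apply unapplied migrations in name order, each in its own transaction."""
    done = applied(conn)
    newly: List[str] = []
    for name in sorted(migrations):
        if name in done:
            continue
        conn.execute("BEGIN")
        try:
            for stmt in migrations[name]:
                conn.execute(stmt)
            # Ledger row commits atomically with the schema change.
            conn.execute("INSERT INTO _ff_migrations (name) VALUES (?)", (name,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            # SQLite supports transactional DDL; not every database does
            # (MySQL, for example, implicitly commits on most DDL).
            conn.execute("ROLLBACK")
            raise
        newly.append(name)
    return newly


def drift_check(conn: sqlite3.Connection, migrations: Dict[str, List[str]]) -> List[str]:
    """Migrations present in the repo but missing from the ledger."""
    return sorted(set(migrations) - applied(conn))
```

Keeping apply and drift-check as separate entry points matches the thread's advice: deploys never apply schema changes implicitly, and CI can run the check on its own.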

Key Takeaways

  • Consensus production stack: Opus 4.6/4.7 for scaffolding, GLM-5.2 locally for coding, Gemini Flash 2.5 for Hermes orchestration. Opus 4.8 was widely panned, and GLM-5.5 weights are expected in August.
  • Run Hermes on a VPS with tmux + /remote-control to keep agents working when your laptop is closed; the new /moa mode adds debate/synthesis but is slow.
  • Consider OpenTelemetry + a fine-tuned SLM watching logs to trigger real-time Codex/Claude fixes instead of bloated E2E suites, which keeps context windows focused on functionality; for systems handling customer money, the thread argued for running both.
  • Ornith-1.0 uses a frozen LLM judge + deterministic monitor to block reward hacking in agentic coding RL — worth evaluating against existing harnesses.
  • Treat DB migrations as an explicit gated release step with a ledger table and drift-check CI command — never let deploys silently apply or skip schema changes.
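The gated release step above, in the six-step form @seekersight described, can be modeled as an ordered list of stages where each must succeed before the next runs. In practice these would be separate CI jobs; here each stage is a hypothetical callable returning `True`/`False` so the gating logic is visible. The stage names follow the thread; the functions and their signatures are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stage order from the thread: prod migrations only run after staging is proven.
RELEASE_STEPS = ["deploy", "migrate_staging", "smoke", "promote", "migrate_prod", "verify"]

Step = Tuple[str, Callable[[], bool]]


def make_pipeline(actions: Dict[str, Callable[[], bool]]) -> List[Step]:
    # Raises KeyError if any stage is missing, so a gate can't be skipped silently.
    return [(name, actions[name]) for name in RELEASE_STEPS]


def run_gated(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order; stop at the first failure.

    Returns (completed stage names, name of the failed stage or None).
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            ok = action()
        except Exception:
            ok = False  # a crashing stage counts as a failed gate
        if not ok:
            return completed, name
        completed.append(name)
    return completed, None
```

A failed smoke test therefore stops the release before promotion, and `migrate_prod` never runs against an unverified build.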

Hot Threads

@weslindquist started

Hermes Mac apps vs Telegram, /moa mode, and the 15-level mastery roadmap

22 replies · 7 participants

@drcopybymatt started

Building a voice-based AI language tutor: economics and approach

20 replies · 6 participants

@yangthegoat started

Why agents still miss edge cases: E2E chains vs OTEL + SLM observability

18 replies · 5 participants
