AI Insights & News

Insights for working smarter with AI

Practical AI news, automation tips, and real-world insights to help your business stay ahead.

We Benchmarked Our AI Agent Against Its Own Local LLM and the Results Blew Us Away

We ran 18 tests across 3 models: a cloud frontier model and two local LLMs on our own hardware. The $0 local model tied the cloud. Here's the full breakdown.

20 May 2026 · 7 min read

How We Optimised a 229 Billion Parameter AI Model on a Desktop Computer: A 12-Phase Journey

We deployed MiniMax M2.7 (229B params) on a single NVIDIA DGX Spark and spent a day optimising it. Thread tuning added 12% speed, --no-mmap cut cold start from 8 min to 90 seconds, and we discovered a GCC bug on Grace CPU. Full breakdown of what worked and what did not.

20 May 2026 · 13 min read

Google Just Published Its Official AI Search Optimisation Guide: Here Is What Actually Works

Google's May 2026 AI optimisation guide debunks AEO myths, says llms.txt and content chunking are unnecessary, and emphasises non-commodity content as the key to appearing in AI Overviews and AI Mode. Practical takeaways for Australian businesses.

19 May 2026 · 11 min read

We Hit 120 Tokens Per Second With 1 Million Token Context on a Single Desktop AI Computer

How we achieved 120 tok/s with 1 million token context on a single NVIDIA DGX Spark using Atlas and Qwen 3.6 NVFP4. Zero regression, 100% retrieval accuracy, zero per-token cost.

18 May 2026 · 15 min read

Why We Run Two AI Models on Two Desktop Computers Instead of One Big One

How a two-model private AI cluster using Qwen 3.6 (120 tok/s) for speed and Step 3.5 Flash (20.6 tok/s) for reasoning outperforms a single-model setup. Built on two NVIDIA DGX Sparks for $18K AUD with zero ongoing costs.

18 May 2026 · 12 min read

Harness Engineering: Why Your AI Agent Scaffolding Matters More Than the Model

A decent model with a great harness beats a great model with a bad harness. Harness engineering is the discipline of building the prompts, tools, hooks, sandboxes, and feedback loops that turn AI models into reliable agents.

12 May 2026 · 15 min read

The Transformation Paradox: Why Your AI-Ready Employees Are Being Held Back by Your Organisation

Microsoft surveyed 20,000 workers and found 58% are producing work they could not do a year ago. Yet only 13% are rewarded for reinventing work with AI. The problem is not your people. It is your organisation.

12 May 2026 · 13 min read

ISO 42001 AI Management System Requirements: What Organisations Building Agentic Employees Need to Know

Complete guide to ISO 42001 AI Management System requirements. Covers all 10 clauses, 39 Annex A controls, and practical implementation guidance for organisations deploying AI agents as digital employees.

12 May 2026 · 16 min read

How to Run DeepSeek V4 Flash Locally: ds4 Engine Runs Frontier AI on Your Laptop

The creator of Redis just built ds4, a custom inference engine that runs DeepSeek V4 Flash (284B parameters) locally on a 128GB MacBook. Here is why this changes everything for businesses that want frontier AI without the cloud.

10 May 2026 · 11 min read