STACKQUADRANT
Signal Desk

AI News Stream

Official research and engineering updates, ranked by relevance. Auto-scored from RSS feeds and refreshed continuously.

Active feeds: 50
Last update: May 3
Powered by: RSS ingest

Featured signals

Highest scoring items from the last sync.

Editorial analysis

Latest drops

All recent items in reverse chronological order.

Google Developers Blog · May 3
Closing the knowledge gap with agent skills
To bridge the gap between static model knowledge and rapidly evolving software practices, Google DeepMind developed a "Gemini API developer skill" that provides agents with live documentation and SDK guidance. Evaluation results show a massive performance boost, with the gemini-3.1-pro-preview model jumping from a 28.2% to a 96.6% success rate when equipped with the skill. This lightweight approach demonstrates how giving models strong reasoning capabilities and access to a "source of truth" can effectively eliminate outdated coding patterns.
Google Developers Blog · May 3
Subagents have arrived in Gemini CLI
Gemini CLI has introduced subagents, specialized expert agents that handle complex or high-volume tasks in isolated context windows to keep the primary session fast and focused. These agents can be customized via Markdown files, run in parallel to boost productivity, and are easily invoked using the @agent syntax for targeted delegation. This architecture prevents "context rot" by consolidating intricate multi-step executions into concise summaries for the main orchestrator.
Google Developers Blog · May 3
MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs
MaxText has introduced new support for Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on single-host TPU configurations, leveraging JAX and the Tunix library for high-performance model refinement. These features enable developers to easily adapt pre-trained models for specialized tasks and complex reasoning using efficient algorithms like GRPO and GSPO. This update streamlines the post-training workflow, offering a scalable path from single-host setups to larger multi-host configurations.
Google Developers Blog · May 3
Production-Ready AI Agents: 5 Lessons from Refactoring a Monolith
The blog post outlines the transition of a brittle sales research prototype into a robust production agent using Google’s Agent Development Kit (ADK). By replacing monolithic scripts with orchestrated sub-agents and structured Pydantic outputs, the developers eliminated silent failures and fragile parsing. Additionally, the post highlights the necessity of dynamic RAG pipelines and OpenTelemetry observability to ensure AI agents are scalable, cost-effective, and transparent in real-world applications.
Google Developers Blog · May 3
Building with Gemini Embedding 2: Agentic multimodal RAG and beyond
Google has announced the general availability of Gemini Embedding 2, a unified model that maps text, images, video, audio, and documents into a single semantic space. This model allows developers to process interleaved multimodal inputs in a single request, significantly improving performance for tasks like agentic RAG, visual search, and content moderation. By supporting over 100 languages and offering features like task-specific prefixes and Matryoshka dimensionality reduction, the model provides a highly efficient and accurate foundation for building complex AI agents.
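The Matryoshka dimensionality reduction mentioned above can be illustrated without calling the API at all: a Matryoshka-trained embedding's leading dimensions form a usable smaller embedding once re-normalized. A minimal sketch (the vector below is made up for illustration, not a real Gemini Embedding 2 output):

```python
import math

def truncate_embedding(vec, k):
    """Matryoshka-style reduction: keep the first k dimensions,
    then L2-normalize so cosine similarity still behaves."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

# Illustrative 8-dim embedding, truncated to 4 dims.
full = [0.4, 0.3, 0.2, 0.1, 0.05, 0.04, 0.03, 0.02]
small = truncate_embedding(full, 4)
print(len(small))                            # 4
print(round(sum(x * x for x in small), 6))   # 1.0
```

The payoff is storage and latency: smaller vectors index and search faster, at a modest accuracy cost that shrinks the more of the prefix you keep.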
Reddit r/MachineLearning · May 2
[P] I implemented a Meta paper
GitHub: genji970/Scaling-Test-Time-Compute-for-Agentic-Coding- (an implementation of the Meta AI paper at https://arxiv.org/abs/2604.16529v1). As far as I know, there is no public implementation of this paper yet, so I built a minimal research implementation of the core PDR+RTV pipeline. The project runs the gemini-3.1-pro model and is tested on the SWE benchmark (the paper uses one additional benchmark and further models, such as Opus). A Gemini API key is required to run it. Submitted by /u/Round_Apple2573.
Docker Blog · May 1
A Virtual Agent team at Docker: How the Coding Agent Sandboxes team uses a fleet of agents to ship faster
I work on Coding Agent Sandboxes, aka “sbx” at Docker. The project provides secure, microVM-based isolation for running AI coding agents like Claude Code, Gemini, Codex, Docker Agent and Kiro. Agents get full autonomy inside a sandbox (their own Docker daemon, network, filesystem) without touching your host system. Over the past couple of weeks, we...
AWS Machine Learning Blog · Apr 30
Reinforcement fine-tuning with LLM-as-a-judge
In this post, we take a deeper look at how RLAIF, or RL with an LLM-as-a-judge, works effectively with Amazon Nova models.
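The core loop of RL with an LLM-as-a-judge can be sketched abstractly: a judge scores each sampled response, and those scalars become the reward signal in place of a learned reward model. The judge below is a toy keyword heuristic standing in for what would be a model call in real RLAIF:

```python
def judge(prompt, response):
    """Toy stand-in for an LLM judge: rewards echoing prompt terms
    plus a small length bonus. In RLAIF this is a model call."""
    p = set(prompt.lower().split())
    r = set(response.lower().split())
    overlap = len(p & r) / max(len(p), 1)
    length_bonus = 0.5 if len(response.split()) >= 5 else 0.0
    return round(overlap + length_bonus, 3)

def rewards_for_group(prompt, candidates):
    """Score sampled responses; these scalars replace a learned
    reward model in the RL update."""
    return [judge(prompt, c) for c in candidates]

prompt = "Summarize the quarterly report"
candidates = [
    "The quarterly report shows revenue up ten percent.",
    "No.",
]
rewards = rewards_for_group(prompt, candidates)
print(rewards)  # the fuller answer scores higher
```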
AWS Machine Learning Blog · Apr 30
AWS Generative AI Model Agility Solution: A comprehensive guide to migrating LLMs for generative AI production
In this post, we introduce a systematic framework for LLM migration or upgrade in generative AI production, encompassing essential tools, methodologies, and best practices. The framework facilitates transitions between different LLMs by providing robust protocols for prompt conversion and optimization.
AWS Machine Learning Blog · Apr 30
Unleashing Agentic AI Analytics on Amazon SageMaker with Amazon Athena and Amazon Quick
This post demonstrates how the agentic AI assistant from Amazon Quick transforms data analytics into a self-service capability, using Amazon Simple Storage Service (Amazon S3) for storage, Amazon SageMaker and AWS Glue for the lakehouse, and Amazon Athena for serverless SQL querying across multiple storage formats (S3 Tables, Iceberg, and Parquet).
MIT Technology Review AI · Apr 30
This startup’s new mechanistic interpretability tool lets you debug LLMs
The San Francisco–based startup Goodfire just released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters—the settings that determine a model’s behavior—during training. This could give model makers more fine-grained control over how this technology is built than was once thought possible. Goodfire claims Silico…
Together AI Blog · Apr 30
Announcing Together AI and Adaption Partnership
Together AI and Adaption partner to bring Together Fine-Tuning natively into Adaptive Data, helping teams optimize datasets, run fine-tuning, evaluate results, and deploy stronger open models.
Apple Machine Learning Research · Apr 30
STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting STARFlow-V, a normalizing flow-based video generator with substantial benefits such as end-to-end learning, robust causal prediction, and native likelihood estimation…
OpenAI Blog · Apr 29
Where the goblins came from
How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.
Hugging Face Blog · Apr 29
Granite 4.1 LLMs: How They’re Built
No summary available.
NVIDIA Developer Blog · Apr 28
NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model
Agentic systems often reason across screens, documents, audio, video, and text within a single perception-to-action loop. However, they still rely on fragmented model chains: separate stacks for vision, audio, and text. This increases inference hops and orchestration complexity, driving up inference costs while weakening cross-modal context consistency. NVIDIA Nemotron 3 Nano Omni…
Together AI Blog · Apr 28
Together AI Brings NVIDIA Nemotron 3 Nano Omni to Developers on Day 0
NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads at scale.
GitLab Blog · Apr 28
GitLab and Anthropic: Governed AI for enterprise development
For enterprise and public sector leaders, the tension is familiar: software teams need to move faster with AI, while security, compliance, and regulatory expectations only get more stringent. GitLab deepens its Anthropic Claude integration so organizations get access to newly released Claude models inside GitLab's intelligent orchestration platform, where governance, compliance, and auditability already run. Claude powers capabilities across the GitLab Duo Agent Platform as the default model out of the box, across a variety of use cases from code generation and review to agentic chat and vulnerability resolution. If you've used GitLab Duo, you've already experienced how Duo agents automate workflows…
OpenAI Blog · Apr 28
Our commitment to community safety
Learn how OpenAI protects community safety in ChatGPT through model safeguards, misuse detection, policy enforcement, and collaboration with safety experts.
OpenAI Blog · Apr 28
OpenAI models, Codex, and Managed Agents come to AWS
OpenAI GPT models, Codex, and Managed Agents are now available on AWS, enabling enterprises to build secure AI in their AWS environments.
AWS Machine Learning Blog · Apr 27
Build Strands Agents with SageMaker AI models and MLflow
In this post, we demonstrate how to build AI agents using the Strands Agents SDK with models deployed on SageMaker AI endpoints. You will learn how to deploy foundation models from SageMaker JumpStart, integrate them with Strands Agents, and establish production-grade observability using SageMaker Serverless MLflow for agent tracing. We also cover how to implement A/B testing across multiple model variants, evaluate agent performance using MLflow metrics, and show how you can build, deploy, and continuously improve AI agents on infrastructure you control.
AWS Machine Learning Blog · Apr 27
How Popsa used Amazon Nova to inspire customers with personalised title suggestions
In this post, we share how we applied Amazon Bedrock and the Amazon Nova family of models to reimagine our Title Suggestion feature. By combining metadata, computer vision, and retrieval-augmented generative AI, we now automatically generate creative, brand-aligned titles and subtitles across 12 languages. Using the unified API of Amazon Bedrock, Anthropic’s Claude 3 Haiku, and Amazon Nova Lite and Pro, we improved quality, reduced cost, and cut response times. This resulted in higher customer satisfaction, measurable uplifts in engagement and purchase rates, and over 5.5 million personalised titles generated in 2025.
GitLab Blog · Apr 27
Give your AI agent direct, structured GitLab access with glab CLI
When teams use GitLab Duo, Claude, Cursor, and other AI assistants, more of the development workflow runs through an AI agent acting on your behalf: reading issues, reviewing merge requests, running pipelines, and helping you ship faster. Most developers are already using the GitLab CLI (glab) from the terminal to interact with GitLab. Combining the two is a natural next step. The problem is that without the right tools, AI agents are essentially guessing when it comes to your GitLab projects. They might hallucinate the details of an issue they've never seen, summarize a merge request based on stale training data rather than its actual state, or require you to manually copy context…
AWS Machine Learning Blog · Apr 24
Building Workforce AI Agents with Visier and Amazon Quick
In this post, we show how connecting the Visier Workforce AI platform with Amazon Quick through Model Context Protocol (MCP) gives every knowledge worker a unified agentic workspace to ask questions in. Visier helps ground the workspace in live workforce data and the organizational context that surrounds it while letting your users act on the conversational results without switching tools.
NVIDIA Developer Blog · Apr 23
Winning a Kaggle Competition with Generative AI–Assisted Coding
In March 2026, three LLM agents generated over 600,000 lines of code, ran 850 experiments, and helped secure a first-place finish in a Kaggle playground competition. Success in modern machine learning competitions is increasingly defined by how quickly you can generate, test, and iterate on ideas. LLM agents, combined with GPU acceleration, dramatically compress this loop. Historically…
AWS Open Source Blog · Apr 23
Decoupling Authorization at Scale: MongoDB Atlas and Cedar-Based Resource Policies
As organizations scale applications, managing authorization becomes increasingly complex. What starts as role-based permissions quickly evolves into intricate rules spanning multiple services, regions, and compliance requirements. Traditional approaches of embedding authorization logic in application code lead to fragmented policies scattered across codebases, making them difficult to maintain, audit, and scale. These challenges have become more […]
OpenAI Blog · Apr 23
Introducing GPT-5.5
Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.
AWS Machine Learning Blog · Apr 22
Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch
In this post, we walk through building a scalable, event-driven transcription pipeline that automatically processes audio files uploaded to Amazon Simple Storage Service (Amazon S3), and show you how to use Amazon EC2 Spot Instances and buffered streaming inference to further reduce costs.
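Independent of the AWS specifics, the buffered streaming idea reduces to reading fixed-size chunks and handing them to the transcriber as they arrive, so memory stays bounded for arbitrarily long audio. A sketch with an in-memory stand-in for the S3 object (the chunk size is illustrative):

```python
import io

def stream_buffers(fileobj, chunk_bytes=32_768):
    """Yield fixed-size buffers from an audio stream so the
    transcriber can start before the whole file is read."""
    while True:
        chunk = fileobj.read(chunk_bytes)
        if not chunk:
            break
        yield chunk

# Simulate a 100 KB audio object (e.g. freshly landed in S3).
fake_audio = io.BytesIO(b"\x00" * 100_000)
sizes = [len(c) for c in stream_buffers(fake_audio)]
print(sizes)  # [32768, 32768, 32768, 1696]
```

In the real pipeline the generator would wrap the S3 object's streaming body, and each buffer would be fed to the Parakeet-TDT inference endpoint.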
NVIDIA Developer Blog · Apr 22
Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron
Higher-order optimization algorithms such as Shampoo have been effectively applied in neural network training for at least a decade. These methods have achieved significant success more recently when applied to leading LLMs. In particular, Muon (MomentUm Orthogonalized by Newton-Schulz) was used to train some of today's best open source models, including Kimi K2 and GLM-5.
MIT Technology Review AI · Apr 22
AI needs a strong data fabric to deliver business value
Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey. But as AI becomes…
OpenAI Blog · Apr 22
Workspace agents
Learn how to build, use, and scale workspace agents in ChatGPT to automate repeatable workflows, connect tools, and streamline team operations.
OpenAI Blog · Apr 22
Speeding up agentic workflows with WebSockets in the Responses API
A deep dive into the Codex agent loop, showing how WebSockets and connection-scoped caching reduced API overhead and improved model latency.
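The connection-scoped caching idea can be modeled in a few lines: the server remembers what a given connection has already sent, so each subsequent request carries only the new messages instead of the full conversation history. The class and method names below are invented for illustration and are not the Responses API:

```python
class ConnectionScopedCache:
    """Toy model of the pattern: the server tracks the prefix of
    messages already seen on this connection, so each request only
    carries the new suffix (names are illustrative)."""
    def __init__(self):
        self.history = []

    def send_turn(self, full_conversation):
        # Only the messages the server hasn't seen travel over the wire.
        delta = full_conversation[len(self.history):]
        self.history = list(full_conversation)
        return delta

conn = ConnectionScopedCache()
convo = [{"role": "user", "content": "hi"}]
print(len(conn.send_turn(convo)))   # 1: first turn sends everything
convo += [{"role": "assistant", "content": "hello"},
          {"role": "user", "content": "refactor this"}]
print(len(conn.send_turn(convo)))   # 2: only the new messages
```

For agent loops that make hundreds of calls per session, shrinking each request from the full history to a delta is where the latency win comes from.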
OpenAI Blog · Apr 22
Introducing workspace agents in ChatGPT
Workspace agents in ChatGPT are Codex-powered agents that automate complex workflows, run in the cloud, and help teams scale work across tools securely.
OpenAI Blog · Apr 21
Introducing ChatGPT Images 2.0
ChatGPT Images 2.0 introduces a state-of-the-art image generation model with improved text rendering, multilingual support, and advanced visual reasoning.
Hugging Face Blog · Apr 21
QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard
No summary available.
GitLab Blog · Apr 21
GitLab + Amazon: Platform orchestration on a trusted AI foundation
If your team runs GitLab and has a strong AWS practice, a new combination of Duo Agent Platform and Amazon Bedrock is just for you. The model is simple: GitLab acts as your orchestration layer to help accelerate your entire software lifecycle with agentic AI, and Bedrock is designed to provide a secure, compliant foundation model layer with AI inference behind the scenes. GitLab Duo Agent Platform enables you to handle planning, merge pipelines, security scanning, vulnerability remediation, and more as part of your GitLab workflows, while the GitLab AI Gateway routes model calls to Bedrock (or GitLab-managed, Bedrock-backed endpoints, depending on your setup). That means you can build on the…
NVIDIA Developer Blog · Apr 21
Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson
The boom in open source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these models at the edge, enabling physical AI agents and autonomous robots to automate heavy-duty tasks. A key challenge is efficiently running multi-billion-parameter models on edge devices with limited memory. With ongoing constraints on…
NVIDIA Developer Blog · Apr 21
Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision
As LLMs transition from simple text generation to complex reasoning, reinforcement learning (RL) plays a central role. Algorithms like Group Relative Policy Optimization (GRPO) power this transition, enabling reasoning-grade models to continuously improve through iterative feedback. Unlike standard supervised fine-tuning, RL training loops are bifurcated into two distinct, high-intensity phases: a…
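GRPO's group-relative step is simple enough to sketch independently of FP8 or Megatron: rewards for a group of responses to the same prompt are normalized within the group, so each sample's advantage is measured against its siblings rather than against a learned value function:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: z-score each reward within its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled completions for one prompt, scored by a reward model.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advs])  # above-mean samples get positive advantage
```

Dropping the value network is what makes the generation and training phases so cleanly separable, which is exactly where end-to-end FP8 pays off.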
NVIDIA Developer Blog · Apr 20
Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments
AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating repetitive tasks, executing tasks, writing documentation, and more. OpenAI Codex, for example, is a coding agent designed to assist developers through tasks like code generation, debugging, and automated pull request (PR) creation.
NVIDIA Developer Blog · Apr 18
Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo
Coding agents are starting to write production code at scale. Stripe's agents generate 1,300+ PRs per week. Ramp attributes 30% of merged PRs to agents. Spotify reports 650+ agent-generated PRs per month. Tools like Claude Code and Codex make hundreds of API calls per coding session, each carrying the full conversation history. Behind every one of these workflows is an inference stack under…
Cloudflare Blog · Apr 17
Unweight: how we compressed an LLM 22% without sacrificing quality
Running LLMs across Cloudflare’s network requires us to be smarter and more efficient about GPU memory bandwidth. That’s why we developed Unweight, a lossless inference-time compression system that achieves up to a 22% model footprint reduction, so that we can deliver faster and cheaper inference than ever before.
NVIDIA Developer Blog · Apr 16
How to Build Vision AI Pipelines Using NVIDIA DeepStream Coding Agents
Developing real-time vision AI applications presents a significant challenge for developers, often demanding intricate data pipelines, countless lines of code, and lengthy development cycles. NVIDIA DeepStream 9 removes these development barriers using coding agents, such as Claude Code or Cursor, to help you easily create deployable, optimized code that brings your vision AI applications to…
Cloudflare Blog · Apr 16
Cloudflare’s AI Platform: an inference layer designed for agents
We're building AI Gateway into a unified inference layer for AI, letting developers call models from 14+ providers. New features include Workers AI binding integration and an expanded catalog with multimodal models.
Cloudflare Blog · Apr 16
Building the foundation for running extra-large language models
We built a custom technology stack to run fast large language models on Cloudflare’s infrastructure. This post explores the engineering trade-offs and technical optimizations required to make high-performance AI inference accessible.
Cloudflare Blog · Apr 16
Artifacts: versioned storage that speaks Git
Give your agents, developers, and automations a home for code and data. We’ve just launched Artifacts: Git-compatible versioned storage built for agents. Create tens of millions of repos, fork from any remote, and hand off a URL to any Git client.
OpenAI Blog · Apr 16
Introducing GPT-Rosalind for life sciences research
OpenAI introduces GPT-Rosalind, a frontier reasoning model built to accelerate drug discovery, genomics analysis, protein reasoning, and scientific research workflows.
GitLab Blog · Apr 16
Claude Opus 4.7 is now available in GitLab Duo Agent Platform
The GitLab Duo Agent Platform now supports Claude Opus 4.7, Anthropic's latest model, available today via model selection in Agentic Chat and across agent-powered workflows in your GitLab instance. For teams running agents across the full software delivery lifecycle, Opus 4.7 brings meaningful improvements to the tasks that matter most: the complex, multistep work that requires sustained reasoning, precise instruction following, and the ability to verify its own outputs before surfacing results. Stronger reasoning spans every agent workflow: the most significant gain is in how Opus 4.7 handles difficult, long-running work, where GitLab's internal evaluations showed improved performance…
DeepMind Blog · Apr 15
Gemini 3.1 Flash TTS: the next generation of expressive AI speech
Our newest audio model introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.
OpenAI Blog · Apr 15
The next evolution of the Agents SDK
OpenAI updates the Agents SDK with native sandbox execution and a model-native harness, helping developers build secure, long-running agents across files and tools.
OpenAI Blog · Apr 13
Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI
Cloudflare brings OpenAI’s GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security.
Twilio Blog · Apr 13
Orc-estrating an Optimized Workflow for Quickly Prototyping on Twilio with Claude Code Plugins
Learn to use Claude Code Plugins and some specialized AI subagents to help you with your Twilio app build.
NVIDIA Developer Blog · Apr 12
MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications
The release of MiniMax M2.7 adds enhancements to the popular MiniMax M2.5 model, built for agentic harnesses and other complex use cases in fields such as reasoning, ML research workflows, software engineering, and office work. The open weights release of MiniMax M2.7 is now available through NVIDIA and across the open source inference ecosystem. The MiniMax M2 series is a sparse mixture-of…
Salesforce Engineering Blog · Apr 10
Building a Distributed Persistent Queue That Scaled AI Workloads 5x Under LLM Rate Limits
By Karthik Premnath, Arie Kusnadi, and Felix Yu. In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Karthik Premnath, a lead software engineer on the Agentforce Sales Engagement team that engineered a distributed persistent queue that orchestrates AI workloads and human workflows within strict infrastructure limits […]
NVIDIA Developer Blog · Apr 9
Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP
Training LLMs requires periodic checkpoints. These full snapshots of model weights, optimizer states, and gradients are saved to storage so training can resume after interruptions. At scale, these checkpoints become massive (782 GB for a 70B model) and frequent (every 15-30 minutes), generating one of the largest line items in a training budget. Most AI teams chase GPU utilization…
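The excerpt doesn't show the nvCOMP calls, but the shape of the technique fits in a few lines. The sketch below substitutes the standard library's zlib for nvCOMP's GPU codecs (an assumption, chosen for portability) to show the lossless save/load round trip:

```python
import io
import pickle
import zlib

def save_checkpoint(state, fileobj, level=3):
    """Serialize and losslessly compress a training-state dict.
    zlib stands in here for a GPU codec such as nvCOMP provides."""
    raw = pickle.dumps(state)
    fileobj.write(zlib.compress(raw, level))
    return len(raw)

def load_checkpoint(fileobj):
    return pickle.loads(zlib.decompress(fileobj.read()))

# Toy "weights": highly regular tensors compress extremely well.
state = {"step": 1200, "weights": [0.0] * 10_000}
buf = io.BytesIO()
raw_size = save_checkpoint(state, buf)
buf.seek(0)
restored = load_checkpoint(buf)
print(restored == state)  # True: the round trip is lossless
```

The point of the article is that doing this on-GPU with nvCOMP hides the compression cost behind the write, so the savings on storage and transfer come nearly for free.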