This Week’s AI News: Claude 3.7 | GPT-4.5 | Alexa’s Next-Gen Assistant | & More

Every week in AI seems to outdo the last, and this week was no exception. Major new AI model releases, impressive benchmark results, and innovative applications made headlines. From OpenAI and Anthropic unveiling cutting-edge models (and upping the competition) to Amazon turbocharging Alexa with a powerful AI brain, there’s a lot to cover. Let’s dive into the biggest AI news of the week, with a technical lens but in plain English.


Anthropic Launches Claude 3.7 Sonnet and Claude Code

Anthropic kicked off the week with the release of Claude 3.7 Sonnet and Claude Code, two significant additions to its AI arsenal, emphasizing coding prowess and agentic capabilities.

Claude 3.7 Sonnet: This model is an evolution of Claude 3.5, with a clear focus on enhancing software engineering capabilities. On the SWE-Bench (Software Engineering Benchmark), Claude 3.7 Sonnet outshines competitors like DeepSeek R1, OpenAI’s O3 Mini, OpenAI’s O1, and its predecessor, Claude 3.5 Sonnet. It also excels in agentic tool use, enabling it to perform tasks autonomously—a feature that ties into Amazon’s announcement later this week. While it lags behind Grok 3 and OpenAI’s O3 Mini in graduate-level reasoning and math problem-solving, its coding-centric improvements make it a top choice for developers.




Extended Thinking Mode: A new feature allows Claude 3.7 Sonnet to allocate more time to reason through complex problems, akin to DeepSeek R1 and OpenAI’s O1/O3. This isn’t a separate model but an option to extend the same model’s processing time, yielding more thoughtful responses. For instance, a simple query like “How many R’s are in ‘strawberry’?” takes seconds longer but ensures accuracy, while complex prompts—like designing a protein-folding framework—benefit from up to 32 seconds of deliberation.
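Because extended thinking is a parameter on the same model rather than a separate endpoint, enabling it is a one-line change in an API request. Here is a minimal sketch of what such a request payload could look like, assuming the Anthropic Messages API's documented `thinking` parameter; the model ID and token budget are illustrative choices, not recommendations:

```python
# Sketch: toggling extended thinking on a Claude 3.7 Sonnet request.
# The same model handles both paths; "thinking" just grants it a
# token budget for visible deliberation before answering.

def build_request(prompt: str, think: bool = False,
                  budget_tokens: int = 16000) -> dict:
    """Build a Messages API payload, optionally with extended thinking."""
    payload = {
        "model": "claude-3-7-sonnet-20250219",
        "max_tokens": 20000,  # must exceed the thinking budget
        "messages": [{"role": "user", "content": prompt}],
    }
    if think:
        payload["thinking"] = {"type": "enabled",
                               "budget_tokens": budget_tokens}
    return payload

quick = build_request("How many R's are in 'strawberry'?")
deep = build_request("Design a protein-folding framework.", think=True)
```

The trade-off is exactly the one described above: the `deep` request spends extra seconds reasoning, while `quick` returns immediately.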

Claude Code: Alongside Claude 3.7, Anthropic announced Claude Code, a coding companion that you can run in your terminal. Install Claude Code in a project folder, and it gains access to your codebase, allowing it to read files, make suggestions, debug, and even write new code. Essentially, it’s like having an AI pair programmer with full context of your project. In just days since launch, developers have shown off incredible demos built with Claude Code. For example, people used single prompts to generate entire web applications and games: from a real estate website with modern design, to an animated weather app, to a 3D racing game dubbed “Claude Kart.” One demo had Claude Code create a self-aware Snake game that prints the snake’s humorous “thoughts” as it moves. Another user even made a simple 3D city simulation complete with moving people and changing shadows. These one-prompt creations highlight how powerful an AI coding assistant can be when it understands your entire project context. Claude Code is impressing the community and positioning Anthropic’s AI as a top tool for developers.
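Getting started is deliberately simple. A sketch of the install-and-run flow, based on Anthropic's published instructions (Node.js required); the project folder name is a placeholder:

```shell
# Install the Claude Code CLI globally, then launch it from inside
# the project you want it to work on.
npm install -g @anthropic-ai/claude-code
cd my-project
claude    # opens an interactive session with your codebase as context
```

Once running, you type natural-language requests ("fix the failing test", "add a dark mode toggle") and it reads, edits, and runs code in that folder.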

For more, visit Anthropic’s official site.

OpenAI Unveils GPT-4.5: Codename “Orion”

Not to be outdone, OpenAI answered back later in the week with the release of GPT-4.5, internally codenamed “Orion.” This new model is an interim step between GPT-4 and a potential future GPT-5, and OpenAI has been training it for over a year. While its training data cutoff is still 2023 (same as GPT-4), GPT-4.5 brings notable improvements in how it responds. OpenAI’s launch presentation emphasized improved “vibes” – in other words, GPT-4.5’s answers feel more natural, conversational, and human-like in tone. In side-by-side examples, GPT-4.5’s writing was more concise and fluid compared to earlier models.

Benchmark Comparison

| Metric | GPT-4.5 | GPT-4.0 | O1 | O3 Mini |
| --- | --- | --- | --- | --- |
| Simple QA (accuracy) | 62.5% | 38.6% | 47% | 15% |
| Hallucination rate | 37.1% | 61.8% | 44% | 80% |
| Math | 36.7% | 9.3% | — | 87.3% |
| Science | 71.4% | 53.6% | — | 79.7% |
| SWE-Bench | 38% | — | — | 61% |

Key Features: GPT-4.5 prioritizes conversational “vibes” over raw reasoning power. It scores 62.5% on a simple QA benchmark (vs. 38.6% for GPT-4.0, 47% for O1, and 15% for O3 Mini) and cuts hallucinations to 37.1% (vs. 80% for O3 Mini). However, it doesn’t compete with reasoning-heavy models like O3 Mini in math (36.7% vs. 87.3%) or science (71.4% vs. 79.7%), nor does it top SWE-Bench (38% vs. O3 Mini’s 61%).

Beyond style, GPT-4.5 made technical strides in reliability. On OpenAI’s internal evaluations, it scored much higher on simple question-answering accuracy than previous models, and it hallucinates (makes up incorrect facts) far less. In one factual Q&A benchmark, GPT-4.5 answered 62.5% of questions correctly, versus 38.6% for GPT-4.0 and even lower for older models. It also hallucinated only about 37% of the time on that test, whereas O3 Mini hallucinated roughly 80% of the time, a significant reduction in false information. Notably, however, OpenAI did not compare GPT-4.5 to any competitor models in its charts, only against its own previous versions.


Early hands-on testing reveals GPT-4.5’s niche: it shines in creative and conversational tasks. It’s excellent at brainstorming, writing in different styles, and being an engaging chat partner – to the point that OpenAI’s CEO Sam Altman remarked that GPT-4.5 is “the first model that feels like talking to a thoughtful person.” It’s less likely to over-explain or produce overly formal answers compared to GPT-4. For instance, when asked a casual question, GPT-4.5 responds with a brief, natural answer where GPT-4 might ramble or default to a formal tone. This “vibe” adjustment makes interactions feel smoother.

On the flip side, GPT-4.5 is not a giant leap in raw reasoning power. Altman cautioned that 4.5 is “not a reasoning model” and “won’t crush benchmarks.” Indeed, on complex math problems and certain logic puzzles, other specialized models like Grok 3 (from X.AI) or DeepSeek R1 still have the edge. OpenAI seems to position GPT-4.5 as a more intelligent communicator rather than a quantitative problem-solver. It’s a different kind of intelligence focused on conversation quality.

Currently, GPT-4.5 is only available to ChatGPT Pro subscribers (a $200/month tier for enterprise and power users). OpenAI cited extreme demand on its GPU servers as the reason for the limited rollout; the company literally ran out of GPUs to deploy it more widely at launch. It plans to add tens of thousands of GPUs in short order and roll GPT-4.5 out to all ChatGPT Plus ($20/month) users in the coming week, so broader access is imminent as infrastructure scales up. Notably, ChatGPT Plus users also got some new features this week (more on that below), but GPT-4.5 is the big-ticket item reserved for Pro users until OpenAI’s infrastructure catches up.

Check out OpenAI’s announcement for more details.



Amazon’s Alexa Plus: Claude-Powered Agentic AI

Amazon introduced Alexa Plus, a revamped voice assistant powered by Anthropic’s Claude, free for Prime members.

  • Technical Edge: Leveraging Claude 3.7’s agentic strengths, Alexa Plus can handle tasks like ordering Uber Eats or booking rides, connecting to third-party services with enhanced conversational fluency. This aligns with Anthropic’s focus on agentic tool use, making Alexa a more autonomous assistant.
  • Implications: The Claude integration suggests a deep partnership between Amazon and Anthropic, potentially reshaping the smart assistant market with AI-driven functionality.
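Under the hood, “agentic tool use” means the model is handed machine-readable tool definitions and emits structured calls that the host application executes. Below is a minimal sketch of that pattern; the `order_food` tool, its schema, and the handler are hypothetical stand-ins for illustration, not Amazon’s or Anthropic’s actual integration:

```python
# Sketch of the tool-use loop behind assistants like Alexa Plus:
# the model sees JSON-schema tool definitions, picks one, and the
# host app routes the call to real service code. Names here are
# hypothetical stand-ins.

TOOLS = [{
    "name": "order_food",
    "description": "Order a meal from a delivery service.",
    "input_schema": {
        "type": "object",
        "properties": {
            "restaurant": {"type": "string"},
            "item": {"type": "string"},
        },
        "required": ["restaurant", "item"],
    },
}]

def dispatch(tool_call: dict) -> str:
    """Route a model-issued tool call to the matching handler."""
    handlers = {
        "order_food": lambda a: f"Ordered {a['item']} from {a['restaurant']}",
    }
    return handlers[tool_call["name"]](tool_call["input"])

# A model response requesting a tool carries a name plus input payload:
result = dispatch({"name": "order_food",
                   "input": {"restaurant": "Thai Place", "item": "pad thai"}})
```

The model never executes anything itself; it only proposes structured calls, which is what makes the pattern practical to sandbox and audit.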

Learn more at Amazon’s blog.

ChatGPT and Grok Find Their Voice

Voice interaction was a theme this week. OpenAI expanded ChatGPT’s voice mode to more users. Previously, only paying users could use the impressive voice conversation feature (where you can talk to ChatGPT and it talks back in a natural voice). As of this week, free ChatGPT users have a chance to preview voice conversations, powered by a lightweight GPT-4 model. This means anyone can now chat with ChatGPT by voice on mobile and hear it respond, making the AI feel even more like a personal assistant.

Meanwhile, X.AI’s Grok 3 rolled out its own voice feature in the Grok app. This one has a twist: Grok’s voice has multiple personality modes or “voices” you can choose from. Some of the modes are quite colorful — names like Storyteller, Romantic, Meditation, Conspiracy, Not-a-Therapist, and even Unhinged or Sexy. These alter the style and tone of the AI’s spoken responses. For example, Unhinged mode will produce an extremely uncensored, brash style of speaking (complete with profanity and attitude). A quick demo showed the AI in Unhinged mode greeting the user with, “Yo, I’m fantastic, how about you? [Things are] hitting the fan out there or what?” – definitely not your typical polite assistant! While these modes are more for fun than utility (you probably don’t want your kids interacting with an AI in “Unhinged” voice), it shows how AI voice can be tuned for character and entertainment value.

Grok’s voice feature is currently available to users who subscribe to the highest tier of X (Twitter) Premium (around $30–$40/month) through the Grok app. It’s a novel experiment in making AI assistants not just smart, but entertaining. It wouldn’t be surprising if other AI platforms introduce similar personality-driven voices or styles in the future, especially for consumer-facing AI that people use at home.

Microsoft Bolsters Copilot with Free Features and New Models

Microsoft announced several updates to its Copilot ecosystem:

Free Features: Unlimited access to Think Deeper and voice mode, powered by OpenAI’s tech, offers a cost-free entry to advanced reasoning and conversational capabilities.

Phi-4 Multimodal and Phi-4 Mini: These new lightweight models (the former multimodal, the latter text-focused) are optimized for on-device use, targeting mobile apps and consumer hardware. They reflect the trend toward efficient, edge-based AI.

Mac App: Copilot is now available on macOS, broadening its reach to Apple users.

Details are at Microsoft’s Copilot page.




Apple Integrates Intelligence into Vision Pro

Apple revealed plans to roll out Apple Intelligence to the Apple Vision Pro, enhancing its mixed-reality headset with:

Features: Writing Tools, Image Playground, Genmoji, and a new Spatial Gallery, mirroring the AI capabilities already on iPhone.

Impact: While Vision Pro adoption remains niche, this update strengthens Apple’s AI ecosystem integration.

See more at Apple’s Vision Pro site.


Inception Labs’ Diffusion Language Model: Speed Redefined

Inception Labs unveiled a diffusion-based language model, adapting image generation techniques (e.g., Stable Diffusion) for language tasks, specifically coding.

Performance: It generates code at 1,000 tokens per second, five times faster than Qwen 2.5 Coder 7B (200 tokens/second). Benchmarks show it competing with top coding models, though newer releases like Claude 3.7 and GPT-4.5 haven’t yet been compared.

Potential: Currently code-focused, its speed could extend to other domains like creative writing.
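Inception Labs hasn’t published implementation details, but the general idea behind diffusion-style text generation can be shown with a toy: start from a fully masked sequence and reveal many positions per step, instead of emitting one token at a time as autoregressive models do. Everything below is a simplified illustration under that assumption, with a trivial stand-in for the model’s denoising step:

```python
import random

# Toy illustration of discrete-diffusion decoding: begin with an
# all-masked sequence and reveal several positions per step. A real
# diffusion LM predicts all masked tokens jointly each step; here the
# "model" simply copies from a known target string.

def diffusion_decode(target: list[str], tokens_per_step: int = 4,
                     seed: int = 0) -> tuple[list[str], int]:
    rng = random.Random(seed)
    seq = ["[MASK]"] * len(target)
    steps = 0
    while "[MASK]" in seq:
        masked = [i for i, t in enumerate(seq) if t == "[MASK]"]
        # Reveal up to `tokens_per_step` positions in parallel.
        for i in rng.sample(masked, min(tokens_per_step, len(masked))):
            seq[i] = target[i]
        steps += 1
    return seq, steps

tokens = "def add ( a , b ) : return a + b".split()  # 12 tokens
out, steps = diffusion_decode(tokens)  # finishes in 3 steps, not 12
```

The parallel reveal is where the speed comes from: total steps scale with sequence length divided by tokens revealed per step, rather than with sequence length itself.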

Try it at Inception Labs’ playground.


Rapid-Fire AI Updates

Here’s a roundup of additional announcements:

Google AI Studio: A branching feature lets users revisit and diverge from any conversation point, a likely trendsetter for other platforms.

Qwen QwQ-Max Preview: An open-source reasoning model from Alibaba’s Qwen team, competing with DeepSeek R1.

Meta: Plans for a standalone Meta AI app.

Ideogram 2a: A faster, more affordable text-to-image model, excelling at text-inclusive designs (10 seconds per image, or 5 with Turbo).

Magnific: A new structure reference feature, akin to Stable Diffusion’s ControlNet, enhances control over image generation.

Pika Labs Pika 2.2: Offers 10-second 1080p video generations with smooth keyframe transitions.

Onean AI: An open-source video platform rivaling Veo 2 and Sora in quality.

Kaa AI W 2.1: A video generation model with free trials, producing detailed motion.

Luma AI Dream Machine: Now generates audio for videos, enhancing immersion.

ElevenLabs Scribe: Claims top-tier speech-to-text accuracy.

Hume AI Octave: A text-to-speech model with contextual understanding and style prompting (e.g., emotional tones).

Perplexity Comet: A teased browser for agentic search.

Figure Robotics Helix: Home robots planned for 2025, with alpha testing this year.


AI’s Unstoppable Momentum

This week’s AI news highlights a dual focus: enhancing model capabilities (e.g., Claude 3.7, GPT-4.5) and integrating AI into consumer products (e.g., Alexa Plus, Vision Pro). Speed (Inception Labs), agentic functionality (Anthropic, Amazon), and multimodal applications (video, audio, robotics) are key trends. As benchmarks evolve and GPU resources scale, the AI landscape promises even more transformative advancements. Stay tuned to our magazine for the latest technical insights.


CLOXMAGAZINE, founded by CLOXMEDIA in the UK in 2022, is dedicated to empowering tech developers through comprehensive coverage of technology and AI. It delivers authoritative news, industry analysis, and practical insights on emerging tools, trends, and breakthroughs, keeping its readers at the forefront of innovation.


