OpenAI recently released six new models, giving AI users more options. I’ve tested all of them to see which ones perform best for creative writing tasks, which is what my channel focuses on. While the newer mini and nano models offer cost advantages, they don’t always deliver the quality needed for creative content.
After thorough testing, I narrowed down the standout performers to just two models: GPT-4.1 and O3. The other models (GPT-4.1 Mini, GPT-4.1 Nano, O4 Mini High, and O4 Mini) produced acceptable results but nothing remarkable. This highlights an important point – the “best” model often depends on your specific task. For routine tasks, a smaller model might save money while still performing adequately, but for creative writing where quality matters, the more powerful models usually win out.
Key Takeaways
- Different AI models excel at different tasks, so testing multiple options for your specific needs is worthwhile.
- More powerful models like GPT-4.1 and O3 generally produce better creative writing content than smaller, cheaper alternatives.
- For businesses and frequent users, matching the right-sized model to each task can significantly reduce costs without sacrificing quality.
Overview of OpenAI’s New Model Lineup
GPT-4.1
OpenAI released GPT-4.1 last week as part of their expanded model lineup. This is one of their most powerful options and stands out for creative writing tasks. While testing this model, I found it produced decent results for brainstorming activities like creating log lines and outlines, though I wouldn’t call it exceptional.
The log lines it generated included some cliché ideas like “a disgraced knight seeks redemption,” but others showed promise, such as a story about a princess sacrificed to a sea god who must save her kingdom. When creating outlines, GPT-4.1 provided detailed breakdowns including setting, character wants, obstacles, and flaws.

However, I noticed some issues with story cohesion. For example, it would introduce random elements (like a scarf) without proper setup or foreshadowing. While this model is better than what we had a year ago, it doesn’t quite match the quality of top creative writing models like 3.7 Sonnet or Gemini 2.5 Pro.
GPT-4.1 Mini
GPT-4.1 Mini offers a smaller, more cost-effective version of the main model. In my testing for creative writing tasks, this model didn’t particularly stand out. It produced acceptable content but lacked the depth and quality needed for serious creative work.
This model might be suitable for businesses looking to save money on AI usage, especially for routine tasks that don’t require premium quality. However, for writers focused on producing high-quality creative content, the Mini version’s limitations become apparent quickly.
GPT-4.1 Nano
The Nano version represents the smallest and most affordable option in the GPT-4.1 family. Like the Mini version, it didn’t impress me during creative writing tests. While functional, it falls short for users who need sophisticated narrative development or nuanced character creation.
The value of this model lies primarily in its cost-effectiveness rather than its creative capabilities. For simple writing tasks or initial drafts that will undergo substantial human editing, it could be a budget-friendly option.
O3
O3 emerged as one of the top performers in my testing, alongside GPT-4.1. Despite its unusual name, this model produced quality creative content. It handled complex narrative tasks well and generated more cohesive and interesting storylines than the smaller models.
For writers looking for a balance between performance and cost, O3 represents a solid option. It matched or exceeded GPT-4.1 in several creative writing scenarios, making it worth considering for serious creative projects.
O4 Mini High
O4 Mini High sits in the middle of OpenAI’s new lineup. During my testing, this model didn’t show any standout qualities for creative writing tasks. While it performed better than the smallest models, it didn’t justify its use over more powerful options for serious creative work.
This model might find its niche in specific technical applications, but for narrative development and creative content generation, other models in the lineup offer better results.
O4 Mini
The standard O4 Mini rounds out OpenAI’s six new models. Like several others in this release, it didn’t demonstrate exceptional capabilities for creative writing. Its performance was adequate but unremarkable compared to the top-tier models.
For budget-conscious users who need basic AI writing assistance, this model could serve as a starting point. However, those focused on producing high-quality creative content would likely find it limiting.
How to Judge AI Writing Tools
Why Creative Skills Matter
I’ve found that testing AI models for creative writing reveals their true abilities. Creative tasks push AI to its limits in ways other tasks don’t. When I evaluate models, I look for ones that can capture a unique voice, develop coherent storylines, and avoid generic content. The best AI tools show creativity through original ideas rather than recycling common plots and character types.
Different Models Excel at Different Tasks
Not all AI models perform equally across all writing tasks. In my testing of several new models, I noticed significant differences in quality:
| Task Type | Best-Performing Models | Weaker Models |
|---|---|---|
| Creative writing | GPT-4.1, O3 | Mini and Nano versions |
| Brainstorming | Results varied by specific prompt | Most struggled with originality |
| Plot outlining | Top models maintained coherence | Smaller models introduced random elements |
I recommend testing multiple models for your specific needs. A model that excels at writing dialogue might struggle with creating outlines. This case-by-case approach helps identify which tool works best for each writing task.
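If you want to run this kind of side-by-side test yourself, here’s a minimal sketch using the official openai Python SDK. To be clear about the assumptions: the model IDs (`gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `o3`, `o4-mini`) are the API names these models are commonly served under and may change or vary by account access, and the prompt is just a placeholder for your own log line or outline prompt.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Assumed API IDs for the models reviewed here; "O4 Mini High" appears to be
# a reasoning-effort setting on o4-mini rather than a separate model ID.
MODELS = ["gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano", "o3", "o4-mini"]

# Placeholder prompt: substitute your own log line or outline prompt.
PROMPT = "Write five log lines for an epic fantasy novel. Avoid common tropes."

def compare_models(prompt: str) -> dict[str, str]:
    """Send the same prompt to every model and collect each response."""
    results = {}
    for model in MODELS:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = response.choices[0].message.content
    return results

if __name__ == "__main__":
    for model, text in compare_models(PROMPT).items():
        print(f"--- {model} ---\n{text}\n")
```

Reading the responses side by side makes differences in originality and coherence much easier to spot than switching models one at a time in a chat window.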
Finding the Right Balance of Price and Quality
When choosing AI writing tools, cost matters alongside quality. Here’s what I’ve learned:
- For critical writing tasks – Use the most powerful models (like GPT-4.1)
- For routine tasks – Consider cheaper “mini” versions that save money
- For businesses – Test which tasks can use smaller models to reduce costs at scale
While cheaper models have improved dramatically (better than premium models from just a year ago), they still don’t match top-tier models for creative work. I’ve found the most expensive models typically deliver the best creative writing results, but the right choice depends on your specific needs and budget.
Performance Comparison of Tested Models
When looking at OpenAI’s six new models, I found significant differences in their creative writing abilities. After testing each one with the same prompts, I narrowed down to two standout performers while the others fell short for creative tasks. Though the mini and nano versions might save money, they simply don’t match the quality of their more powerful counterparts for creative writing tasks.
Best Performing Models
GPT-4.1 and O3 clearly outperformed the other models in my testing. While they cost more to use, their quality was noticeably superior. This reinforces my belief that choosing the right model depends on your specific needs – testing different models for your particular use case is always worthwhile.
Strengths of GPT-4.1
GPT-4.1 performed adequately on creative writing tasks, though I wouldn’t call it exceptional. For brainstorming prompts like creating log lines, outlines, and story beats, it was good but not great. Some of its story ideas felt clichéd, like “a disgraced knight seeks redemption” – something I’ve seen countless times before.
I did find a few promising concepts, such as a story about “a rebellious princess bargained as a sacrifice to an ancient sea god, who must save her kingdom from the deity she’s now bound to serve.” While vague, this showed potential if developed further.
The model’s outlining capabilities were similarly middling. It tended to provide atmospheric descriptions rather than proper scenes. For example, instead of writing a full prologue scene, it gave me cinematic-style opening images of a mountain city with guards overlooking a wasteland.
When creating chapter outlines, GPT-4.1 broke content down by storytelling elements like setting, want, obstacle, and flaw. However, these elements didn’t always connect cohesively. The “save the cat” moments often defaulted to characters doing something nice, despite my prompt explaining this wasn’t necessary.
Strengths of O3
O3 matched GPT-4.1 in overall performance for creative writing tasks. Both models stand as the clear winners among the six tested options.
For creative work, O3 consistently produced usable content that was significantly better than the mini and nano variants. While not quite reaching the heights of 3.7 Sonnet or Gemini 2.5 Pro (which I consider the current top models for creative projects), O3 demonstrated solid capabilities.
Its log lines avoided the worst clichés, and its story outlines maintained better internal consistency than some competitors. The model showed a reasonable understanding of narrative structure and could follow creative prompts effectively.
Limitations of Mini and Nano Models
The mini and nano versions (GPT-4.1 Mini, GPT-4.1 Nano, O4 Mini High, and O4 Mini) all fell short in creative writing tasks. Their outputs were forgettable and lacked the quality needed for serious creative work.
These models struggled with:
- Narrative coherence – introducing story elements without proper setup
- Originality – relying on more clichéd ideas and patterns
- Depth – producing shallow character motivations and world-building
- Consistency – maintaining story elements across scenes or chapters
While these smaller models might be suitable for simpler tasks or when budget is the primary concern, they simply don’t deliver the quality needed for creative writing that matches your vision.
I should note that these mini models aren’t bad in absolute terms – they’re actually better than top models from just a year ago. They’re certainly usable if cost is your main concern, but you’ll notice the difference in quality compared to their more powerful counterparts.
Looking Closer at AI Story Generators
Log Line Results Assessment
I tested six new AI models on creative writing tasks, focusing first on log lines. GPT-4.1 produced acceptable but uninspiring results. Many outputs felt generic, like “a disgraced knight seeks redemption” stories that lacked originality. One log line about “a rebellious princess bargaining with a sea god” showed potential, but needed development. Overall, the quality fell below what I’ve seen from models like 3.7 Sonnet and Gemini 2.5 Pro, which currently lead for creative writing tasks.
Story Structure Output Review
When examining outline capabilities, I found similar limitations. For prologues, GPT-4.1 gave atmospheric descriptions rather than complete scenes – showing “panoramic views” and “grim royal guards” but lacking narrative substance. Chapter outlines included story elements from my template (setting, want, obstacle, flaw), but didn’t connect cohesively. The “save the cat” moments were particularly problematic, defaulting to generic acts of kindness despite my prompt specifying various options for creating compelling characters.
Poor story logic was evident throughout. For example, the inciting incident mentioned “LRA’s scarf” as evidence in a plot against the protagonist, but this item hadn’t been introduced earlier. This lack of proper setup and foreshadowing stands in contrast to better narrative continuity I’ve seen from 3.7 Sonnet and Gemini 2.5 Pro.
Story Beat Performance
The beats prompt results were merely adequate. Most newer AI models can handle basic beat structure, but the quality varies significantly. While cheaper models might save money, they often lack the narrative sophistication needed for creative writing. For serious creative work, I found only GPT-4.1 and O3 worth deeper consideration among the six models tested.
I recommend testing different models for specific writing tasks. What works for outlining might not excel at character development. Businesses using AI at scale should identify which tasks can be handled by smaller models to save costs, while creative professionals should invest in premium models for critical creative tasks.
How These New Models Compare to Others
I’ve tested OpenAI’s new models against the industry leaders to see how they stack up for creative writing tasks. Let’s examine the performance of two standout AI writing assistants.
Sonnet 3.7
Sonnet 3.7 currently leads the pack for creative writing tasks. I found it excels at maintaining cohesive storytelling across multiple scenes. When working with the log line prompt, Sonnet creates more original and less clichéd ideas than OpenAI’s new offerings.
A key strength of Sonnet is its foreshadowing capability. Unlike GPT-4.1, which sometimes introduces random elements without setup (like a scarf appearing from nowhere during a pivotal scene), Sonnet builds narrative elements thoughtfully throughout the story.
The model also handles “save the cat” moments with more variety and nuance. While GPT-4.1 often defaults to characters “doing something nice,” Sonnet creates compelling character moments that feel organic to the story.
For outlining, Sonnet creates true scenes rather than just descriptive moments. This makes its outlines more immediately useful to writers who need fully formed narrative structures.
Gemini 2.5 Pro
Gemini 2.5 Pro performs neck-and-neck with Sonnet for creative writing tasks. The choice between these two often comes down to personal preference rather than clear performance differences.
I’ve found Gemini excels at brainstorming tasks, generating log lines and story concepts that avoid common clichés. Its fantasy concepts particularly shine with original elements that move beyond the standard “disgraced knight seeks redemption” tropes.
Gemini’s outlining capabilities produce cohesive narratives where scenes connect logically. Plot points build on previously established elements rather than introducing random new items.
For scene development, Gemini follows templates while maintaining natural storytelling flow. This balance makes it valuable for writers who need structural guidance without sacrificing creativity.
Both models significantly outperform OpenAI’s mini and nano models, which, while more affordable, can’t match the creative quality of these premium options.
Choosing the Right AI Models
When picking AI models for your work, making smart choices helps you get better results. I’ll share what I’ve learned about matching models to tasks and saving money.
Picking Models for Specific Tasks
Finding the right AI model depends on what you’re trying to do. I’ve tested the new OpenAI models and found big differences in how they perform for creative writing. While the mini and nano versions are cheaper, they don’t match the quality of more powerful models for creative tasks.
For creative writing, I found that GPT-4.1 and O3 stood out as the best performers among the six new models I tested. The others (GPT-4.1 Mini, GPT-4.1 Nano, O4 Mini High, and O4 Mini) produced acceptable content but nothing remarkable.
My key recommendation: Test multiple models for your specific task before deciding. One model might excel at writing while another handles coding better. I’ve seen this pattern consistently – what works great for one job might fail at another.
Optimizing Cost for Business Use
When using AI at scale, balancing quality and cost becomes crucial. Here’s a simple approach I recommend:
- Identify task requirements: Determine which tasks need premium quality and which can use cheaper models
- Match model to importance: Use powerful models (like GPT-4.1) for high-value work
- Use smaller models where possible: Save money by using mini or nano versions for simpler tasks
For businesses using AI heavily, this approach can dramatically reduce costs. The savings multiply when scaling up operations.
Cost-saving strategy: Test whether smaller models can handle routine tasks adequately. The quality difference might not matter for internal documents or first drafts, but could be critical for customer-facing content.
I find this balanced approach works best – premium models for important creative work, and more affordable options for everything else.
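As a rough sketch of what that routing can look like in code, here’s a minimal example. The tier names are hypothetical, and the model IDs are the API names these models are commonly served under, so adjust both to match your own tasks and access.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical tiers mapped to the cheapest model that handles them acceptably.
MODEL_FOR_TIER = {
    "premium": "gpt-4.1",       # customer-facing or serious creative work
    "routine": "gpt-4.1-mini",  # internal documents, first drafts
    "bulk": "gpt-4.1-nano",     # high-volume, low-stakes tasks
}

def run_task(tier: str, prompt: str) -> str:
    """Route a prompt to the model matched to its importance tier."""
    # Unknown tiers fall back to the premium model to protect quality.
    model = MODEL_FOR_TIER.get(tier, "gpt-4.1")
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example: a routine internal summary goes to the cheaper model.
print(run_task("routine", "Summarize these meeting notes in three bullets: ..."))
```

The point of this design is that the routing decision lives in one place: once a cheaper model proves good enough for a tier, changing a single entry updates every task at that tier.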
Key Findings on New AI Models
After testing OpenAI’s six new models, I’ve narrowed down the best options for creative writing tasks. While the mini and nano versions are cost-effective, they don’t deliver the high-quality creative writing that most of us need.
The two standout models from my testing were GPT-4.1 and O3. These models performed significantly better than GPT-4.1 Mini, GPT-4.1 Nano, O4 Mini High, and O4 Mini. Though they cost more, their performance justifies the price for serious creative work.
For the log line prompt, GPT-4.1 was acceptable but not exceptional. Many responses were clichéd, like “a disgraced knight seeks redemption.” Some ideas showed potential but needed more development. Both 3.7 Sonnet and Gemini 2.5 Pro still outperform these new models for creative projects.
The outlining capabilities were similarly mediocre. Instead of complete scenes, I often got descriptive snapshots. The models struggled with cohesive storytelling and proper foreshadowing, introducing random elements (like an unexplained scarf) without prior setup.
I recommend testing different models for specific tasks rather than assuming one is best for everything. For businesses using AI extensively, identify which tasks can be handled by smaller models to save money. For quality creative work, invest in the best model for that particular need.

