A widely shared r/ChatGPT post rounded up lifelike portraits and lifestyle shots as a reminder that near-photorealistic AI has been possible “since at least last summer.” The comment section became a crowd-sourced forensics class: people flagged subtle giveaways in hands and nails, janky keyboards, garbled UI on screens, odd background faces, blurry eyeglass lenses, and off-kilter teeth. The lesson wasn’t that every image is obviously fake; it’s that tiny local glitches can betray the whole picture.
Midjourney vs. DALL·E 3. Many redditors say Midjourney still wins on raw photorealism, while DALL·E 3 often feels more stylized—sometimes by design to reduce deepfake risk. DALL·E tends to excel at prompt following and conversational iteration. Quality can ebb and flow as models update, which fuels recurring “did they change something?” debates.
Local / open models (Flux, Stable Diffusion). A newer wave of posts highlights rapid gains from local or open models like Flux and modern Stable Diffusion checkpoints: better anatomy, stronger spatial reasoning, and improved prompt adherence. Multi-person scenes and complex overlaps still trip them up.
Specialists. Some users single out Ideogram for producing more “ordinary-looking” people and for strong text rendering, though this heavily depends on the use case.
Back on the detection side, none of those giveaways is conclusive on its own; they’re weak signals that add up, especially in crowded, complex scenes.
Skip the vague “photorealistic.” Instead, describe a photo. Talk like a photographer: focal length, aperture, ISO, lens and body (“50 mm f/1.8, ISO 200, soft window light”), candid atmosphere, natural skin texture, specular highlights, depth of field. And avoid negative prompts like “not CGI / not cartoonish,” which can backfire.
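To make that concrete, here is a minimal sketch of what such a prompt might look like when run against a local Stable Diffusion checkpoint through the Hugging Face diffusers library; the model ID, camera details, and sampler settings are illustrative placeholders, not recommendations pulled from the thread.

```python
# A minimal sketch, assuming the Hugging Face diffusers library and a
# locally available Stable Diffusion checkpoint (the model ID below is
# a placeholder, not one endorsed in the discussion).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Describe a photo, not "photorealism": camera, lens, light, texture.
prompt = (
    "candid portrait of a middle-aged man at a kitchen table, "
    "50 mm f/1.8, ISO 200, soft window light, natural skin texture, "
    "shallow depth of field, slight film grain"
)

# Note: no negative prompt along the lines of "not CGI / not cartoonish";
# the advice is to say what you want, not what you don't.
image = pipe(prompt, num_inference_steps=30, guidance_scale=6.5).images[0]
image.save("portrait.png")
```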
Even then, iterate: regenerate, zoom, and inspect; accept a hit rate (e.g., one keeper out of several); and refine composition and anatomy by being explicit about pose, angle, and occlusions.
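Iteration can be as mechanical as sweeping seeds and keeping the best frame. The sketch below reuses the hypothetical pipeline and prompt from the previous example and only varies the random seed; the batch size and output paths are arbitrary.

```python
# Continuing the sketch above: same prompt, different seeds, then inspect
# each candidate and keep the rare one that survives close scrutiny.
from pathlib import Path

out_dir = Path("candidates")
out_dir.mkdir(exist_ok=True)

for seed in range(8):  # expect a hit rate, not eight keepers
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(
        prompt,
        generator=generator,
        num_inference_steps=30,
        guidance_scale=6.5,
    ).images[0]
    # Zoom into hands, eyes, on-screen text, and background faces before keeping any.
    image.save(out_dir / f"seed_{seed:02d}.png")
```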
A recurring argument is that certain hosted models are intentionally constrained on photorealistic people to deter misuse (deepfakes, harassment). Whether or not that’s the full story, community consensus is that policy and product positioning meaningfully shape the “look” and limits users encounter.
Across late 2024 and into 2025, redditors note real progress: better skin micro-detail, fewer classic six-finger flubs, stronger spatial layout, and increasing text reliability in some systems. At the same time, single-subject portraits remain the easy case; complex social scenes, with hands touching, bodies overlapping, and lots of legible world detail, continue to reveal seams.
Photorealism is no longer a party trick, but it’s still uneven across tasks and models. If you need the cleanest “looks-like-a-photo” output right now, the community tends to reach for Midjourney or well-tuned local models; if you want frictionless prompting and editing, DALL·E inside a chat workflow is beloved—just expect stylistic guardrails. And whether you’re generating or scrutinizing, the best advice hasn’t changed: think like a photographer, inspect like a skeptic.