Quality Deep-Dive

Why Your Face Swap Looks Fake — And How to Fix It

Every tool built on InsightFace's inswapper_128 model — FaceFusion, Rope, Reactor, VisoMaster — shares the same 128×128 pixel bottleneck. This guide explains exactly why your results look plastic, and walks you through the settings, models, and workflows that produce photorealistic output.

Before: raw inswapper output, no enhancer (plastic skin look)
After: face enhancer plus optimized blend (photorealistic result)
Model Resolution: 128px
Current Realism Score: 63.3
Achievable Score: 90.2

Step 1 — Identify Your Problem

Quick Diagnosis: What's Wrong With Your Face Swap?

Face swap quality issues fall into two opposite camps. Find the symptom profile below that matches your output.

A. Waxy / Sticker Look (128 × 128)
  • No pores / micro-texture
  • Waxy, sticker-like skin
  • Flat lighting on face
  • Visible upscale blur
Cause: Raw 128px output upscaled without a face enhancer

vs

B. CGI / Over-Smooth (100% blend)
  • Hyper-sharp, synthetic pores
  • CGI / "Instagram filter" skin
  • Unnaturally crisp eyes
  • Uncanny valley effect
Cause: Face enhancer at 100% blend, so the AI hallucinated all texture


The Science

The 128×128 Bottleneck — Why Every Face Swap Tool Shares the Same Problem

Understanding the root cause helps you make informed decisions about your pipeline. Here's what's actually happening inside the model.

Every popular open-source face swap tool — FaceFusion, Rope, Reactor, VisoMaster — uses the same engine under the hood: InsightFace's inswapper_128.onnx model. The '128' in the name isn't just a version number. It's the resolution the model was trained on.

Face Swap Pipeline — Resolution at Each Stage

Detect (1920×1080) → Crop (512×512) → Swap, the bottleneck (128×128) → Upscale (512×512) → Paste (1920×1080)

1920 × 1080 → 128 × 128 → 1920 × 1080. The swap itself runs on 128 × 128 = 16,384 pixels, which is 6.25% of the 512 × 512 face crop (a 93.75% loss) and under 1% of a full 1080p frame, before being stretched back. This is why every inswapper result needs a face enhancer.
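The pixel counts behind the diagram can be checked with a few lines of Python. This is a sketch that assumes the stage sizes shown above (a 1920×1080 frame, a 512×512 crop, a 128×128 swap); the constant and function names are illustrative only.

```python
# Stage sizes taken from the pipeline diagram (width, height).
FRAME = (1920, 1080)   # full target frame
CROP = (512, 512)      # aligned face crop
SWAP = (128, 128)      # inswapper_128 working resolution


def pixels(size):
    """Total pixel count of a (width, height) pair."""
    width, height = size
    return width * height


def fraction_lost(before, after):
    """Share of pixels discarded when going from `before` to `after`."""
    return 1 - pixels(after) / pixels(before)


crop_loss = fraction_lost(CROP, SWAP)
frame_ratio = pixels(SWAP) / pixels(FRAME)
upscale_factor = pixels(CROP) // pixels(SWAP)

print(f"swap stage pixels: {pixels(SWAP):,}")
print(f"lost relative to the face crop: {crop_loss:.2%}")
print(f"swap face as a share of the full frame: {frame_ratio:.2%}")
print(f"crop pixels covered by each swapped pixel: {upscale_factor}")
```

Measured against the face crop, the swap stage keeps one pixel in sixteen. Measured against the whole frame the share is far smaller, but that comparison overstates the loss because the face never occupies the full frame.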

How the Pipeline Works

1

Face Detection

A face detector (RetinaFace, YOLO, or SCRFD) finds faces in your target image/video and crops them out.

2

Downscale to 128×128

The cropped face is resized to exactly 128×128 pixels, regardless of your source image resolution. Even in a 4K photo, the face the model sees is only 128 pixels wide.

3

Identity Transfer

The ArcFace encoder creates a 512-dimensional embedding of your source face. The ONNX decoder reconstructs a face that matches the target pose/expression but carries the source identity — all at 128×128.

4

Upscale & Paste

The tiny 128px face gets upscaled back to match the original crop size and blended onto the frame. This is where quality falls apart — you're stretching roughly 16,000 pixels to fill hundreds of thousands.
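Why the downscale and upscale steps erase fine texture can be shown with a toy example. The sketch below is an illustration, not FaceFusion's resizing code: it assumes box averaging for the downscale and nearest-neighbour repetition for the upscale, and it uses a 16×16 checkerboard as a stand-in for pore-level detail. Real pipelines use other interpolation filters, but the lost detail is equally unrecoverable.

```python
def box_downscale(image, factor):
    """Average each factor x factor block into a single pixel."""
    size = len(image) // factor
    out = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [
                image[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def nearest_upscale(image, factor):
    """Repeat every pixel `factor` times horizontally and vertically."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def contrast(image):
    """Difference between the brightest and darkest pixel."""
    flat = [value for row in image for value in row]
    return max(flat) - min(flat)


# Toy "skin texture": a one-pixel checkerboard of 0 and 255.
crop = [[255 if (x + y) % 2 else 0 for x in range(16)] for y in range(16)]

small = box_downscale(crop, 4)        # stands in for the resize to 128x128
restored = nearest_upscale(small, 4)  # stands in for the upscale before pasting

print("contrast before:", contrast(crop))
print("contrast after:", contrast(restored))
```

The restored image has the original dimensions, but every pixel carries the same averaged value. The texture is gone, and no amount of stretching brings it back, which is the gap a face enhancer later fills with synthesized detail.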

InsightFace's Own Benchmark Data

InsightFace published internal benchmarks comparing their open-source 128px model against the commercial 512px variant locked inside Picsi.ai. The numbers speak for themselves:

Model | Resolution | Realism ↑ | ID Score ↑ | Access
inswapper_128 | 128×128 | 63.3 | 52.8 | Open-source (free)
inswapper_512_live | 512×512 | 73.7 – 90.2 | 78.4 | Commercial only (Picsi.ai)

Realism scored by FID (Fréchet Inception Distance) — lower distance = more realistic. Scores normalized to 0–100 scale where 100 is indistinguishable from real. Source: InsightFace internal evaluation.

Realism Score Comparison (0–100)

inswapper_128 (128×128, open source): 63.3
inswapper_512 (512×512, commercial): 90.2 (+42%)

At the top of its range, the commercial 512px model scores 42% higher in realism (90.2 vs 63.3), but it's locked behind Picsi.ai. The open-source community is bridging this gap with 256px models.
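The 42% figure is a relative gain over the 128px baseline, and the benchmark table gives the 512px model a range rather than a single score. A short sketch, using only the numbers from the table (the function name is illustrative), shows both ends of that range:

```python
OPEN_SOURCE_128 = 63.3              # realism score of inswapper_128
COMMERCIAL_512_RANGE = (73.7, 90.2)  # realism range of inswapper_512_live


def relative_gain(baseline, score):
    """Percentage improvement of `score` over `baseline`."""
    return (score - baseline) / baseline * 100


low_gain, high_gain = (
    relative_gain(OPEN_SOURCE_128, score) for score in COMMERCIAL_512_RANGE
)
print(f"relative realism gain: {low_gain:.1f}% to {high_gain:.1f}%")
```

The headline number corresponds to the best case; at the low end of the reported range the gain is much smaller.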

The Next Generation Is Here

The open-source community hasn't stood still. ReSwapper (256px, MIT license) and FaceFusion's own HyperSwap models (256px, default in 3.x) are closing the gap. While they can't match the commercial 512px model, they represent a significant leap from the original 128px baseline.

The Core Insight

Two Roads to Plastic Skin

Here's the insight that most tutorials miss: plastic skin isn't one problem — it's two opposite problems that look deceptively similar. Most users are stuck at one extreme or the other.

0% Enhancer: no enhancement (waxy, plastic skin)
80% Enhancer: sweet spot (natural, optimal result)
100% Enhancer: over-enhanced (over-processed CGI look)

The Target Zone

Face enhancer at 65–80% blend. The restorer adds realistic texture while the original face data bleeds through to maintain natural variation. Skin looks real because it IS partially real.

Key Takeaway

The face enhancer is not a quality slider that you crank to max. It's a blend between the AI-reconstructed face and the original face data. The magic happens at 65–80%, where you get the restorer's texture without losing the natural imperfections that make faces look real.
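The blend can be pictured as a per-pixel weighted average. The sketch below assumes a plain linear mix between the unenhanced swap and the enhancer's output; the function name and the sample pixel values are made up for illustration, and a given tool may blend differently in detail.

```python
def blend_pixel(unenhanced, enhanced, blend_percent):
    """Mix two RGB pixels; blend_percent is the enhancer's share (0-100)."""
    if not 0 <= blend_percent <= 100:
        raise ValueError("blend_percent must be between 0 and 100")
    weight = blend_percent / 100
    return tuple(
        round(weight * e + (1 - weight) * u)
        for u, e in zip(unenhanced, enhanced)
    )


swapped = (200, 152, 132)   # hypothetical soft pixel from the upscaled swap
restored = (180, 120, 100)  # hypothetical textured pixel from the enhancer

for percent in (0, 75, 100):
    print(percent, blend_pixel(swapped, restored, percent))
```

At 0 the enhancer contributes nothing, at 100 it replaces the pixel outright, and at 75 a quarter of the unenhanced value still shows through.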

The Fix

Optimal FaceFusion Settings for Photorealistic Output

These settings are distilled from hundreds of community tests, InsightFace benchmarks, and our own A/B comparisons. Copy them directly into your FaceFusion configuration.


The #1 Mistake

Never run face enhancer at 100% blend. This is the single most common cause of 'fake-looking' results. At 100%, the enhancer overwrites all original face data with AI-hallucinated texture. Drop to 65–80% and you'll see an immediate improvement.

Default Settings: before optimization
Optimized Settings: after optimization (photorealistic output)

Recommended Settings for Image Face Swap

Face Swap Model

inswapper_128_fp16 (or HyperSwap_256 on 3.x)

fp16 uses half the VRAM with negligible quality loss. HyperSwap is preferred if available.

Face Enhancer

CodeFormer (preferred) or GFPGAN 1.4

CodeFormer preserves more identity fidelity. GFPGAN produces sharper but slightly more 'enhanced' results.

Enhancer Blend Ratio

70–80%

Start at 75%. If the result looks too synthetic, drop to 65%. If it looks too soft, nudge up to 80%. Never exceed 85%.

Face Detector

RetinaFace

More accurate face alignment than YOLO. Slower but produces better landmark mapping for the swap.

Pixel Boost

512 or 768

Going beyond 768 has quadratic processing cost with diminishing quality returns. 512 is the sweet spot for most use cases.

Face Detector Score

0.5 (default)

Lower if faces aren't being detected in difficult angles. Don't go below 0.3 or you'll get false positives.
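The blend-ratio tuning rule from the list above is simple enough to write down as a lookup. This is a sketch of that rule only; the symptom labels and function names are hypothetical, not FaceFusion options.

```python
START_BLEND = 75    # recommended starting point
SOFTER_BLEND = 65   # use when the result looks too synthetic
SHARPER_BLEND = 80  # use when the result looks too soft
HARD_CAP = 85       # never exceed this value


def suggest_blend(symptom=None):
    """Return the blend percentage to try for an observed symptom."""
    suggestions = {
        None: START_BLEND,
        "synthetic": SOFTER_BLEND,
        "soft": SHARPER_BLEND,
    }
    if symptom not in suggestions:
        raise ValueError(f"unknown symptom: {symptom!r}")
    return suggestions[symptom]


def clamp_blend(value):
    """Keep a manually chosen blend inside the 0 to HARD_CAP range."""
    return max(0, min(HARD_CAP, value))


print(suggest_blend(), suggest_blend("synthetic"), suggest_blend("soft"))
print(clamp_blend(100))
```

Treat the output as the next value to try, then judge the result by eye at 100% zoom.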

Settings Impact: Default vs Optimized

Setting | Default | Optimized | Visual Impact
Face Enhancer | None | GFPGAN 1.4 / CodeFormer | Massive: eliminates waxy/sticker look entirely
Blend Ratio | 100% | 70–80% | Critical: removes CGI/synthetic appearance
Pixel Boost | Off (128→target) | 512 or 768 | Significant: adds face detail before paste
Face Detector | YOLO | RetinaFace | Moderate: better landmark alignment
Face Mask Blur | 0 | 0.3 | Subtle: hides paste boundary artifacts

Model Deep-Dive

Face Swap & Enhancer Model Matrix

Not all models are equal. This matrix covers every swap model and face enhancer available in FaceFusion's ecosystem, with real-world quality and performance data.

Face Swap Models

Model | Res. | Quality | Speed | VRAM | Notes
inswapper_128 | 128px | ★★☆☆☆ | Fast | ~2 GB | Original model. Baseline quality. Widest compatibility.
inswapper_128_fp16 | 128px | ★★☆☆☆ | Fast | ~1 GB | Half-precision variant. Same quality, half the VRAM. Preferred over base.
ReSwapper 256 | 256px | ★★★☆☆ | Medium | ~3 GB | Open-source reproduction at 2× resolution. MIT license. Measurable quality improvement.
HyperSwap 256 (Recommended) | 256px | ★★★★☆ | Medium | ~3 GB | FaceFusion 3.x default. Best open-source quality currently available.
inswapper_512_live | 512px | ★★★★★ | Slow | N/A | Commercial model. Locked behind Picsi.ai. Gold standard for quality.

Face Enhancer Models

Model | Max Res. | Quality | Speed | Best For
GFPGAN 1.4 | 512px | ★★★★☆ | Fast | Video workflows. Most temporally stable. Community favorite.
CodeFormer (Recommended) | 512px | ★★★★★ | Medium | Image workflows. Best identity preservation. Handles occlusion well.
GPEN 256 | 256px | ★★★☆☆ | Fast | Low-VRAM systems. Lightweight but limited detail.
GPEN 512 | 512px | ★★★★☆ | Medium | Balanced option. Good detail without heavy VRAM cost.
GPEN 1024 | 1024px | ★★★★☆ | Slow | High-res photos. Excellent micro-detail reconstruction.
GPEN 2048 | 2048px | ★★★★★ | Very Slow | Print/production. Maximum detail but requires 8+ GB VRAM.
RestoreFormer++ | 512px | ★★★★☆ | Medium | Damaged/low-quality sources. Strongest restoration capability.

For most users: HyperSwap 256 + CodeFormer at 75% blend for images, GFPGAN 1.4 at 70% blend for video.
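The recommendation above can be captured as a small preset lookup. The sketch below uses the display names from the tables rather than real CLI identifiers, and the `Preset` class and `recommended_preset` function are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    swap_model: str  # display name from the swap model table
    enhancer: str    # display name from the enhancer table
    blend: int       # enhancer blend in percent


def recommended_preset(media):
    """Return the suggested starting preset for 'image' or 'video'."""
    if media == "image":
        return Preset("HyperSwap 256", "CodeFormer", 75)
    if media == "video":
        return Preset("HyperSwap 256", "GFPGAN 1.4", 70)
    raise ValueError(f"unsupported media type: {media!r}")


for kind in ("image", "video"):
    print(kind, recommended_preset(kind))
```

The video preset trades CodeFormer for GFPGAN 1.4 and drops the blend by five points, matching the guidance on temporal stability.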

Beyond Face Swap

Alternative Approaches to High-Quality Face Transfer

Traditional face swap (detect → swap → enhance) isn't the only game in town. These alternative methods can produce superior results for specific use cases — at the cost of more complexity.

Intermediate · ★★★★★

Flux 2 Klein + BFS LoRA

Uses Flux's powerful image generation backbone with a face-swap LoRA for identity transfer. Produces the most photorealistic single-image results currently possible in open source.

Intermediate · ★★★★☆

Wan2.1 VACE

Video-native face transfer using VACE, the all-in-one video creation and editing model built on Wan2.1. Generates entire video clips with identity transfer built into the generation process.

Beginner · ★★★★☆

ACE++ (Style Reference)

Uses style-reference conditioning to generate images that match a target identity. Less precise than face swap but more natural-looking, as the identity is baked into the generation rather than pasted on.

Advanced · ★★★★★

Custom LoRA Training

Train a face-specific LoRA on 15–30 photos of the target identity. The model learns the face at a deep level, producing the most consistent and highest-quality results across any pose, lighting, or expression.

Advanced · ★★★★★

DeepFaceLab (DFL)

The original deepfake tool. Trains a custom model for each source/target pair over hours. Produces the highest-quality video face swaps when given enough training time and data.

Tool Showdown

FaceFusion vs the Competition

Apart from DeepFaceLab, which trains its own model for each source/target pair, these tools use the same underlying inswapper model, but their UI, features, and default configurations produce very different experiences. Here's how they actually stack up.

Tool | Ease of Use | Max Quality | Speed | Active Dev | Models | Real-time | Platform
FaceFusion | ★★★★☆ | ★★★★☆ | ★★★★☆ | ★★★★★ | ★★★★★ | Yes (webcam) | Windows / Linux / macOS
Rope | ★★★★★ | ★★★☆☆ | ★★★★★ | ★★☆☆☆ | ★★☆☆☆ | Yes | Windows
VisoMaster | ★★★☆☆ | ★★★★☆ | ★★★☆☆ | ★★★★☆ | ★★★★☆ | No | Windows / Linux
DeepFaceLab | ★☆☆☆☆ | ★★★★★ | ★☆☆☆☆ | ★☆☆☆☆ | ★★★☆☆ | No | Windows
Reactor (SD extension) | ★★★☆☆ | ★★★☆☆ | ★★★☆☆ | ★★★☆☆ | ★★☆☆☆ | No | Cross-platform (A1111/Forge)

Power Users

The Advanced Quality Pipeline

For users who want absolute maximum quality, here's the full multi-stage pipeline used by professionals. This can be run in FaceFusion's CLI or as a ComfyUI node workflow.

ComfyUI Node Pipeline — Recommended Workflow

1/5 Face Detect: RetinaFace · 1080p
2/5 Face Swap: inswapper_128 · ONNX
3/5 Enhance: CodeFormer · 512px
4/5 Blend: face_enhancer_blend 70
5/5 Output: 1920×1080 · Final

The 5-Stage Pipeline

1

Face Detection & Alignment

Use RetinaFace with a detection score of 0.5. This gives the most accurate facial landmark mapping, which directly affects how well the swapped face aligns with the target's pose and expression. Poor alignment is the #2 cause of uncanny results after blend ratio.

--face-detector-model retinaface --face-detector-score 0.5
2

Face Swap at Native Resolution

Run the face swap with Pixel Boost set to 512. This tells FaceFusion to upscale the 128px model output to 512px before pasting, which gives the face enhancer more detail to work with in the next stage.

--face-swapper-pixel-boost 512x512
3

Face Enhancement with Controlled Blend

Apply CodeFormer (for images) or GFPGAN 1.4 (for video) at 70–75% blend. This is where the magic happens: the enhancer reconstructs realistic skin texture, pore patterns, and micro-details, while the 25–30% original face data prevents the result from looking synthetic.

--face-enhancer-model codeformer --face-enhancer-blend 75
4

Color Correction & Mask Refinement

Apply face mask blur at 0.3–0.5 to feather the paste boundary. If there's a color mismatch between the swapped face and surrounding skin, use FaceFusion's color correction option or run a manual color grade pass.

--face-mask-blur 0.3
5

Final Output & Quality Check

Export at your target resolution. For video, use the temp-frame-format PNG option for maximum quality (larger files but no compression artifacts). Always review the output at 100% zoom — artifacts that are invisible at overview can ruin close-ups.

--temp-frame-format png --output-video-quality 95

Full CLI Command
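One way to assemble the five stage flags into a single invocation is sketched below in Python. The flags are the ones listed in the stages above. The entry point is a placeholder assumption, since the launcher script and subcommand differ between FaceFusion releases, and the source, target, and output arguments are deliberately left out. The Pixel Boost value is passed in explicitly: FaceFusion releases we are aware of expect a WIDTHxHEIGHT string such as 512x512, so confirm the accepted values with your install's --help output.

```python
import shlex


def build_command(entry_point, pixel_boost):
    """Join the five stage flag groups into one argument list."""
    stage_flags = [
        # Stage 1: detection and alignment
        ["--face-detector-model", "retinaface", "--face-detector-score", "0.5"],
        # Stage 2: swap with Pixel Boost
        ["--face-swapper-pixel-boost", pixel_boost],
        # Stage 3: enhancement with controlled blend
        ["--face-enhancer-model", "codeformer", "--face-enhancer-blend", "75"],
        # Stage 4: mask refinement
        ["--face-mask-blur", "0.3"],
        # Stage 5: output quality
        ["--temp-frame-format", "png", "--output-video-quality", "95"],
    ]
    args = list(entry_point)
    for flags in stage_flags:
        args.extend(flags)
    return args


# ASSUMPTION: placeholder entry point; replace with your install's launcher
# and add your own source, target, and output arguments.
command = build_command(["python", "facefusion.py", "SUBCOMMAND"], "512x512")
print(shlex.join(command))
```

Keeping the arguments as a list rather than one long string makes it easy to swap a single stage, for example replacing codeformer with another enhancer for video, without retyping the whole command.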

ComfyUI Node Workflow

For ComfyUI users, the same pipeline can be built as a node graph: Load Image → FaceFusion Face Swap Node → CodeFormer Enhancement Node → Color Match Node → Save Image. The advantage of ComfyUI is that you can batch-process hundreds of images and fine-tune each stage independently. Popular node packs: ComfyUI-ReActor, ComfyUI-FaceRestore, ComfyUI-Impact-Pack.

FAQ

Frequently Asked Questions

  • The 'plastic skin' effect has two possible causes: (1) You're using the raw face swap output without a face enhancer — the 128×128 model can't produce realistic skin texture at higher resolutions. Fix: add GFPGAN 1.4 or CodeFormer as your face enhancer. (2) You're running the face enhancer at 100% blend — this overwrites all natural face variation with AI-hallucinated texture. Fix: reduce blend to 65–80%.

  • inswapper_128 is the face swap model created by InsightFace (the same team behind ArcFace). It was trained on 128×128 images because that was the practical resolution limit when the model was developed: higher-resolution models require substantially more training data and compute. InsightFace has a 512px commercial model (inswapper_512_live), but it's locked behind their Picsi.ai app.

  • The rubber face effect is usually caused by over-enhancement. Reduce your face enhancer blend ratio from 100% to 70–75%. If you're stacking multiple enhancers, remove all but one. Also check if you're applying sharpening filters after the face swap — these amplify the synthetic look.

  • For images: CodeFormer. It preserves more of the original identity and handles partially occluded faces better. For video: GFPGAN 1.4. It produces more temporally stable results with less frame-to-frame flickering. Both should be used at 65–80% blend, never 100%.

  • Start at 75% and adjust from there. If the result looks too synthetic/CGI, drop to 65%. If it looks too soft/waxy, nudge up to 80%. The optimal value depends on your source image quality and the specific enhancer model. Never exceed 85% — beyond that you lose the natural texture bleed-through that makes faces look real.

  • Yes, significantly. Pixel Boost upscales the 128px model output before pasting it onto the target frame. At 512, you get 4× the linear resolution (16× the pixels); at 768, 6× (36× the pixels). However, the cost grows quadratically: 768 takes roughly 2.3× longer than 512, and 1024 takes 4×. For most use cases, 512 is the sweet spot between quality and speed.

  • Video face swap has an additional challenge: temporal consistency. The face swap is applied independently to each frame, so slight variations in face detection, enhancement, and blending create visible flickering. Fixes: use GFPGAN (more temporally stable than CodeFormer), reduce blend ratio by 5% versus your image setting, and use RetinaFace for more consistent face detection across frames.

  • InsightFace's inswapper_512_live exists but is commercially locked behind their Picsi.ai app. You can't download or use it in FaceFusion. Open-source alternatives at higher resolution include ReSwapper (256px, MIT license) and FaceFusion's HyperSwap (256px, default in 3.x). These don't reach 512px quality but are a significant improvement over the 128px baseline.

  • ReSwapper is an open-source reproduction of the inswapper architecture trained at 256×256 resolution (2× the original). Created by researcher somanchiu on GitHub, it's available under MIT license. It produces measurably better output than inswapper_128 but requires more VRAM (~3 GB vs ~2 GB). If your hardware supports it, yes — it's a free quality upgrade.

  • Three strategies: (1) Use the fp16 model variant (inswapper_128_fp16) — same quality, half the VRAM. (2) Reduce Pixel Boost from 768 to 512 — minimal quality loss, significant VRAM savings. (3) Reduce execution threads to 1 — slower but uses the minimum amount of VRAM. Also ensure you're not running other GPU-intensive applications simultaneously.

  • The inswapper model struggles with extreme poses (profile views, looking up/down) because it was primarily trained on near-frontal faces. The identity embedding doesn't perfectly reconstruct features at oblique angles. Fixes: use a high-quality frontal source photo, enable face detection for all angles, and consider using multiple source photos at different angles if your tool supports it.

  • FaceFusion itself is open-source, but the inswapper_128 model has a non-commercial research license from InsightFace. For commercial work, you'd need to either license the model from InsightFace, use the commercially-licensed HyperSwap models in FaceFusion 3.x, or use alternative approaches like custom LoRA training that don't rely on inswapper.

  • FaceFusion, Rope, and VisoMaster all use the same inswapper_128 model. FaceFusion (Gradio web UI) has the most features, widest model support, and most active development. Rope is the simplest and fastest: click-and-go with real-time preview, but limited to Windows and with fewer options. VisoMaster offers the most advanced face-editing controls (landmark adjustment, manual masking) and is gaining community traction, but has a steeper learning curve.

  • Color mismatch happens when the source face has different skin tone, lighting, or white balance than the target. FaceFusion has a built-in color correction option — enable it in the face swap settings. For manual fixes: adjust the face mask blur (0.3–0.5) to better blend the edges, and consider a light color grade in post-processing to match the face to the scene.

  • Minimum: NVIDIA GPU with 4 GB VRAM (GTX 1650 or equivalent) for basic face swap. Recommended: 8 GB VRAM (RTX 3060/3070) for face swap + enhancer + pixel boost. Ideal: 12+ GB VRAM (RTX 3080/4070 Ti or better) for maximum settings with video processing. AMD GPUs work via DirectML but are slower. Apple Silicon Macs work via CoreML with decent performance on M1 Pro and above.
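The Pixel Boost ratios quoted in the list above follow from squaring the ratio of side lengths. The sketch below assumes processing time scales with the number of face pixels, which is a simplification; real timings also depend on the enhancer, the GPU, and I/O. The function names are illustrative.

```python
BASE_SIDE = 128  # side length of the inswapper_128 output


def linear_scale(boost_side):
    """How many times wider the boosted face is than the 128px output."""
    return boost_side / BASE_SIDE


def pixel_scale(boost_side):
    """How many times more pixels the boosted face has."""
    return linear_scale(boost_side) ** 2


def relative_cost(boost_side, reference_side=512):
    """Estimated cost versus a reference size, assuming cost tracks pixel count."""
    return (boost_side / reference_side) ** 2


for side in (512, 768, 1024):
    print(
        f"{side}: {linear_scale(side):g}x linear, "
        f"{pixel_scale(side):g}x pixels, "
        f"{relative_cost(side):.2f}x the cost of 512"
    )
```

Under this assumption 768 costs 2.25 times as much as 512 and 1024 costs 4 times as much, in line with the rough figures given in the answer on Pixel Boost.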

Ready to Get Started

Start Creating Photorealistic Face Swaps

Apply everything you've learned in this guide. FaceFusion's web interface lets you configure all the settings we've covered — face enhancers, blend ratios, pixel boost, and model selection — without touching a command line.
