2D to 3D Generator — Internal Code Documentation

The Shoe 3D Generator runs on the wearfits-genai-api backend, which also serves the GenAI Try-On product. This document focuses on the shoe-3d-specific code paths while describing the shared infrastructure both products rely on.


Architecture

The application follows a layered architecture optimised for the Cloudflare Workers environment, where each request must complete within the platform's CPU and wall-clock limits. Long-running AI workloads are offloaded via Cloudflare Queues to a background processor, which in turn delegates GPU-intensive tasks to a Modal.com Python backend.

```mermaid
graph TD
    Entry[src/index.ts] --> App[src/app.ts]
    App --> Middleware[src/api/middleware/]
    App --> Routes[src/api/routes/]
    Routes --> Controllers[src/api/controllers/]
    Controllers --> Services[src/services/]
    Services --> Providers[src/providers/]
    Services --> Storage[src/storage/]

    subgraph "Async Loop"
        Queue[Cloudflare Queue] --> Consumer[src/queue/consumer.ts]
        Consumer --> Processor[src/queue/processor.ts]
        Processor --> ServiceLogic[src/services/]
    end
```

Layer Responsibilities

| Layer | Path | Role |
|---|---|---|
| API | src/api/ | HTTP concerns: routing (Hono), schema validation (Zod), OpenAPI generation |
| Service | src/services/ | Business logic orchestration across providers and storage |
| Provider | src/providers/ | Normalised connectors to external AI services (Fal.ai, OpenRouter, OpenAI) |
| Queue | src/queue/ | Background job processing for long-running tasks that exceed Worker limits |
| Storage | src/storage/ | Abstractions over Cloudflare KV and R2 |
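The controller/service split can be sketched as plain functions. This is an illustrative, framework-free example, not the real classes under src/: names like Shoe3DService, submitJob, and QueueLike are assumptions chosen to mirror the diagram above.

```typescript
// Hypothetical sketch of the layer separation: the controller owns HTTP
// concerns; the service orchestrates the queue behind an interface.
interface QueueLike {
  send(message: { jobId: string; imageUrls: string[] }): Promise<void>;
}

class Shoe3DService {
  constructor(private queue: QueueLike, private newId: () => string) {}

  // Enqueue a generation job and hand back its ID for later polling.
  async submitJob(imageUrls: string[]): Promise<string> {
    const jobId = this.newId();
    await this.queue.send({ jobId, imageUrls });
    return jobId;
  }
}

// Controller-level function: translate a parsed request body into a
// service call and an HTTP-shaped response (202 + job ID).
async function handleSubmit(
  service: Shoe3DService,
  body: { imageUrls: string[] },
): Promise<{ status: number; jobId: string }> {
  const jobId = await service.submitJob(body.imageUrls);
  return { status: 202, jobId };
}
```

Because the queue is injected, the service layer stays testable without Cloudflare bindings, which is the main point of the layering.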

The high-level system topology below shows how edge computing (Cloudflare Workers) relates to external AI infrastructure (Modal, Fal.ai):

```mermaid
graph TD
    User[Web/Mobile Client] -->|HTTP| CFW[Cloudflare Workers - Hono]
    CFW -->|Auth| DB[(Prisma/PostgreSQL)]
    CFW -->|Queue Job| Queue[Cloudflare Queue]
    Queue -->|Process| Consumer[Queue Consumer]
    Consumer -->|External AI| Fal[Fal.ai - 3D Gen]
    Consumer -->|External AI| OpenRouter[OpenRouter - LLMs]
    Consumer -->|Heavy GPU| Modal[Modal.com - Python Backend]
    Modal -->|Body Mesh/Pose| SAM3D[SAM3D/MHR]
    Consumer -->|Storage| R2[(Cloudflare R2)]
    Consumer -->|Status| KV[(Cloudflare KV)]
    User -->|Poll Status| CFW
```

Directory Structure

```
src/
├── api/
│   ├── routes/          # Hono route definitions using zod-openapi
│   ├── controllers/     # Route handlers; translate HTTP requests to service calls
│   ├── middleware/      # Auth, logging, rate limiting, error handling
│   └── views/           # Server-side rendered HTML (monitoring dashboard)
├── services/            # Core business logic (virtual try-on, shoe 3D, caching)
├── providers/           # Connectors for AI (Fal.ai, OpenRouter) and other APIs
├── queue/               # Queue consumer and job processing logic
├── schemas/             # Shared Zod schemas for validation and API contracts
├── storage/             # R2 and KV storage implementations and factory
└── utils/               # Generic helpers (crypto, URL, image, etc.)
```

Data Model

The API uses a minimal Prisma schema whose primary purpose is validating SaaS user API keys.

```mermaid
erDiagram
    User ||--o{ ApiKey : "owns"
    User {
        string id PK
        string name
        string email
        string image
        datetime createdAt
    }
    ApiKey {
        string id PK
        string key "unique"
        boolean isActive
        datetime expiresAt
        int usageCount
        int maxUsage
        string userId FK
    }
```
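A validation predicate over the ApiKey fields above could look like the following. This is a hedged sketch: the real check lives in UserApiKeyService, and the helper name isKeyUsable is illustrative; only the field names come from the schema.

```typescript
// Field names follow the Prisma ApiKey model above; null means "no limit".
interface ApiKey {
  key: string;
  isActive: boolean;
  expiresAt: Date | null;
  usageCount: number;
  maxUsage: number | null;
}

// A key is usable when it is active, unexpired, and under its usage quota.
function isKeyUsable(apiKey: ApiKey, now: Date = new Date()): boolean {
  if (!apiKey.isActive) return false;
  if (apiKey.expiresAt !== null && apiKey.expiresAt <= now) return false;
  if (apiKey.maxUsage !== null && apiKey.usageCount >= apiKey.maxUsage) return false;
  return true;
}
```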

Transient state lives outside the relational database:

  • Job Status — Cloudflare KV namespace WF_GENAI_JOB_STATUS
  • Result Assets — Cloudflare R2 bucket wf-genai-results
  • Metadata — TTL and owner hash stored as R2 object metadata alongside each result
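The transient-state layout can be sketched against minimal KV/R2-like interfaces (the real code goes through the bindings in src/storage/). The object key results/{jobId}.glb and the storeResult helper are assumptions for illustration; the TTL and owner hash in R2 custom metadata follow the bullet above.

```typescript
// Minimal stand-ins for the Cloudflare KV and R2 binding surfaces used here.
interface KvLike {
  put(key: string, value: string): Promise<void>;
}
interface R2Like {
  put(
    key: string,
    body: ArrayBuffer | string,
    opts?: { customMetadata?: Record<string, string> },
  ): Promise<void>;
}

// Write the result asset to R2 (with TTL + owner hash as custom metadata)
// and record completion in the job-status KV namespace.
async function storeResult(
  kv: KvLike,
  r2: R2Like,
  jobId: string,
  glb: ArrayBuffer | string,
  ownerHash: string,
  ttlSeconds: number,
): Promise<void> {
  const resultKey = `results/${jobId}.glb`; // hypothetical key layout
  await r2.put(resultKey, glb, {
    customMetadata: { ownerHash, ttl: String(ttlSeconds) },
  });
  await kv.put(jobId, JSON.stringify({ status: "completed", resultKey }));
}
```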

Shoe 3D Generation Flow

The shoe-3d pipeline is a multi-stage process that moves from raw product photos through image correction, 3D generation, texture enhancement, and mesh optimisation before delivering a GLB file and webhook notification.

```mermaid
sequenceDiagram
    participant User
    participant Worker as API Worker
    participant Queue as Job Queue
    participant Modal as Modal Backend
    participant Fal as Fal.ai
    participant R2 as Storage (R2)

    User->>Worker: POST /shoe-3d (submit images)
    Worker->>Queue: Submit Job
    Worker-->>User: 202 Accepted (Job ID)

    Queue->>Worker: Consume Job
    Worker->>Modal: Correct Image (optional)
    Worker->>Fal: Submit 3D Generation
    Fal-->>Worker: GLB Result
    Worker->>Modal: Render Views & Enhance Texture (optional)
    Worker->>Worker: Optimise / Decimate Mesh
    Worker->>R2: Upload GLB & Preview Grid
    Worker->>User: Webhook Result (job.completed)
```

The caller receives an immediate 202 Accepted with a job ID and is notified of completion via webhook (or can poll GET /api/v1/jobs/{id}).
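A client that prefers polling over webhooks could loop on the jobs endpoint like this. The response shape (status plus an optional result URL) is an assumption; the fetcher is injected so the sketch stays runnable without a live API.

```typescript
// Assumed job-status payload returned by GET /api/v1/jobs/{id}.
type JobStatus = {
  status: "queued" | "processing" | "completed" | "failed";
  resultUrl?: string;
};

// Poll until the job reaches a terminal state or attempts run out.
async function pollJob(
  fetchJson: (url: string) => Promise<JobStatus>,
  jobId: string,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<JobStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const job = await fetchJson(`/api/v1/jobs/${jobId}`);
    if (job.status === "completed" || job.status === "failed") return job;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`job ${jobId} did not finish within ${maxAttempts} polls`);
}
```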


Texture Enhancement Pipeline

Texture enhancement is triggered by options.enhanceTexture: true on the shoe-3d request. Quality is controlled by options.genaiQuality:

| Mode | genaiQuality | Views | Grid Size | Timeout |
|---|---|---|---|---|
| Default | "default" | 4-view | 2048 × 2048 | ~3 min |
| High | "high" (default) | 6-view | 2048 × 3072 | ~5 min |

4-view layout: 3 side views at 30°/150°/270° (15° elevation) + top-down → 2×2 grid

6-view layout: 4 side views at 0°/90°/180°/270° + top + bottom → 2×3 grid
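The two layouts can be written down as (azimuth, elevation) camera lists. The angles for the 4-view layout and the 6-view azimuths come straight from the text; the 0° elevation for the 6-view side cameras and the data structure itself are assumptions.

```typescript
interface View {
  azimuthDeg: number;
  elevationDeg: number;
}

// Camera list per quality mode, matching the layouts described above.
function viewLayout(quality: "default" | "high"): View[] {
  if (quality === "default") {
    // 4-view: 3 side views at 15° elevation + one top-down view (2×2 grid)
    return [
      { azimuthDeg: 30, elevationDeg: 15 },
      { azimuthDeg: 150, elevationDeg: 15 },
      { azimuthDeg: 270, elevationDeg: 15 },
      { azimuthDeg: 0, elevationDeg: 90 }, // top-down
    ];
  }
  // 6-view: 4 side views + top + bottom (2×3 grid);
  // 0° side elevation is an assumption, not stated in the text.
  return [
    { azimuthDeg: 0, elevationDeg: 0 },
    { azimuthDeg: 90, elevationDeg: 0 },
    { azimuthDeg: 180, elevationDeg: 0 },
    { azimuthDeg: 270, elevationDeg: 0 },
    { azimuthDeg: 0, elevationDeg: 90 },  // top
    { azimuthDeg: 0, elevationDeg: -90 }, // bottom
  ];
}
```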

Pipeline Steps

  1. Render views — transparent background renders (1024×1024 each) via a single shared GL context
  2. Compose grid — views are composited on a white background (AI models handle white better than transparency)
  3. Send to Gemini — grid is sent to Gemini Flash (or Pro for packshot correction in high-quality mode) via OpenRouter with reference packshot images
  4. Validate grid alignment — per-view silhouette IoU check; computes an N×N IoU matrix and finds the optimal view permutation. If a swap is detected (requires >0.5 total IoU improvement to avoid false positives on white shoes), views are remapped automatically
  5. Logo refinement (optional, options.refineLogo: true) — a second AI pass focused solely on logos, brand text, and emblems, run between validation and projection so there is only one projection pass onto the original GLB
  6. Project back — enhanced views are projected onto the GLB texture using the shared projection algorithm (vectorised NumPy, normalDotView falloff, texture dilation)
  7. Color correction — three-stage correction: polynomial color mapping (K-Means swatches in Lab space), L-gamma brightness lift, and selective white boost for near-white pixels
  8. Auto smooth normals — angle-based normal smoothing (30° threshold); edges below the threshold receive smooth averaged normals, edges above remain sharp
  9. Compress GLB — texture resizing via v1-compress-glb to meet the WEARFITS 30 MB upload limit
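The alignment check in step 4 can be sketched as follows, assuming views arrive as flattened boolean silhouette masks (the real code operates on rendered images; mask representation and function names here are illustrative). It builds the N×N IoU matrix, brute-forces the permutation with the highest total IoU, and remaps only when that beats the identity assignment by more than 0.5.

```typescript
type Mask = boolean[]; // flattened silhouette mask, one bool per pixel

// Intersection-over-union between two equally sized masks.
function iou(a: Mask, b: Mask): number {
  let inter = 0, union = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] && b[i]) inter++;
    if (a[i] || b[i]) union++;
  }
  return union === 0 ? 0 : inter / union;
}

// All permutations of [0..n-1]; fine for n ≤ 6 views.
function permutations(n: number): number[][] {
  if (n === 1) return [[0]];
  const out: number[][] = [];
  for (const p of permutations(n - 1)) {
    for (let i = 0; i <= p.length; i++) {
      out.push([...p.slice(0, i), n - 1, ...p.slice(i)]);
    }
  }
  return out;
}

// Returns the view order to apply: best[i] = index of the rendered view
// that enhanced view i actually matches. Falls back to identity unless
// the improvement exceeds 0.5 (guards against false swaps on white shoes).
function bestViewOrder(enhanced: Mask[], rendered: Mask[]): number[] {
  const n = enhanced.length;
  const m = enhanced.map((e) => rendered.map((r) => iou(e, r)));
  const identityScore = m.reduce((s, row, i) => s + row[i], 0);
  let best = enhanced.map((_, i) => i);
  let bestScore = identityScore;
  for (const p of permutations(n)) {
    const score = p.reduce((s, j, i) => s + m[i][j], 0);
    if (score > bestScore) { bestScore = score; best = p; }
  }
  return bestScore - identityScore > 0.5 ? best : enhanced.map((_, i) => i);
}
```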

Modal Endpoints

Seven endpoints are deployed to Modal.com and run on T4 GPU hardware with ModernGL + EGL:

| Endpoint | URL Slug | Description |
|---|---|---|
| Pose Transfer | v1-pose-transfer | MHR pose transfer |
| Body Mask from Size | v1-body-mask-from-size | Nearest-neighbour lookup against BodyM dataset |
| Image Resize | v1-image-resize | Resize and format conversion |
| Compose Grid | v1-compose-grid | Garment grid composition |
| Render GLB | v1-render-glb | Render GLB from 6 angles as a grid image (GPU) |
| Texture Enhance GLB | v1-texture-enhance-glb | Full AI texture enhancement pipeline (GPU) |
| Compress GLB | v1-compress-glb | Auto smooth normals + texture resize |

Base URL pattern: https://wearfits--{endpoint}.modal.run
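A small, typed helper can enforce that pattern; only the documented slugs and base-URL template are used, and the helper name modalUrl is an assumption.

```typescript
// The seven deployed endpoint slugs, as listed in the table above.
type ModalEndpoint =
  | "v1-pose-transfer"
  | "v1-body-mask-from-size"
  | "v1-image-resize"
  | "v1-compose-grid"
  | "v1-render-glb"
  | "v1-texture-enhance-glb"
  | "v1-compress-glb";

// Build the full Modal URL from the documented base-URL pattern.
function modalUrl(endpoint: ModalEndpoint): string {
  return `https://wearfits--${endpoint}.modal.run`;
}
```

Using a union type means a typo in a slug fails at compile time rather than as a 404 at runtime.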

GPU rendering uses GPURenderWorker backed by ModernGL with EGL. GL contexts are shared across views within a single request to avoid redundant context creation and destruction.


Tech Stack

| Component | Technology |
|---|---|
| API framework | Hono with zod-openapi |
| Runtime | Cloudflare Workers (TypeScript) |
| Database | Prisma (PostgreSQL) via Prisma Accelerate |
| Storage | Cloudflare R2 (assets) + KV (job status) |
| Queues | Cloudflare Queues |
| 3D processing | @gltf-transform/core, meshoptimizer |
| GPU backend | Modal.com (Python 3.10+), ModernGL + EGL, T4 GPU |
| AI providers | Fal.ai (3D generation), OpenRouter / Gemini (texture enhancement) |
| Testing | Vitest with @cloudflare/vitest-pool-workers |

Resilience

524 Timeout Handling

Cloudflare imposes a ~100 s origin timeout. The 6-view texture enhancement pipeline regularly exceeds this, causing a 524 error to be returned to the worker even though Modal continues executing.

Resolution strategy:

  1. The worker passes a job_id to Modal before the request is dispatched
  2. Modal always writes status.json to R2 at pose-transfer/texture-enhance-{job_id}-status/result.json upon completion
  3. On a 524 or timeout, the worker begins polling R2 for status.json every 10 seconds
  4. When status.json appears, it is treated identically to a successful HTTP response (same payload structure)
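The recovery loop in steps 3–4 could look like the sketch below. The R2 reader is injected and the interval is configurable so the example runs standalone; the key layout follows step 2, while the function name and attempt cap are assumptions.

```typescript
// Poll R2 for the status.json Modal writes on completion, after the HTTP
// request to Modal has already 524'd at the Cloudflare edge.
async function awaitModalStatus<T>(
  readJson: (key: string) => Promise<T | null>,
  jobId: string,
  intervalMs = 10_000, // "every 10 seconds" per the text
  maxAttempts = 60,
): Promise<T> {
  const key = `pose-transfer/texture-enhance-${jobId}-status/result.json`;
  for (let i = 0; i < maxAttempts; i++) {
    const status = await readJson(key);
    // Same payload structure as a successful HTTP response, so the caller
    // can treat it identically.
    if (status !== null) return status;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`no status.json for job ${jobId} after ${maxAttempts} polls`);
}
```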

Fal.ai 504 Handling

Fal.ai occasionally reports a job as COMPLETED but returns 502/503/504 (downstream_service_unavailable) when the worker fetches the result — indicating the downstream worker lost the output.

The fal-3d.ts provider distinguishes RESULT_UNAVAILABLE (gateway error, re-submittable) from RESULT_FETCH_FAILED (other fetch failure). On RESULT_UNAVAILABLE, the shoe-3d polling loop automatically re-submits the same generation (reusing the fal CDN image URL, no re-upload) up to 2 times using withTracedRetry. Submit options are shared between the original and retry submissions to avoid configuration drift.


Known Technical Debt

High Priority

  • Usage tracking — UserApiKeyService does not yet increment usageCount in the database after successful key validation (src/services/user-api-key-service.ts).
  • Multi-material textures — The GPU renderer in Modal handles multi-material GLBs incorrectly: all meshes share the first texture (tools/pose-transfer/gpu_uv_renderer.py).

Medium Priority

  • OpenRouter model IDs — Model IDs are hardcoded and may change without notice; consider making them configurable via KV or environment variables.
  • Webhook scaling — WebhookService may hit memory limits when handling large metadata objects.

Maintenance

  • Localhost CORS — Explicit localhost CORS allowance in src/app.ts should be removed after the testing period.
  • Direct Gemini API — PhotoAnalysisService currently routes through OpenRouter; implementing direct Gemini API calls would reduce external dependency.