
Modal Backend Worker

Python-based backend services deployed on Modal.com, providing:

  • Pose Transfer - Body pose manipulation using Meta's MHR model
  • Image Resize - Image resizing and format conversion
  • Garment Grid Composition - Combine garment images into labeled grids

Production Endpoints (8 deployed)

Base URL: https://wearfits--{endpoint}.modal.run
Authorization: Bearer <POSE_TRANSFER_API_KEY>
| Endpoint | Path | Description |
| --- | --- | --- |
| Pose Transfer | v1-pose-transfer | Transfer pose to body mesh |
| Body Mask from Size | v1-body-mask-from-size | Find matching body mask from measurements |
| Semantic Split GLB | v1-semantic-split-glb | Extract material masks (e.g. shoe sole) using AI. Automatically concatenates multi-mesh inputs. |
| Render GLB | v1-render-glb | Render GLB from 6 angles as grid image (GPU) |
| Texture Enhance GLB | v1-texture-enhance-glb | AI texture enhancement (4 or 6 views). Automatically merges multi-mesh GLBs while preserving dynamic base texture resolution. |
| Image Resize | v1-image-resize | Resize and convert images |
| Compose Grid | v1-compose-grid | Create labeled garment grid |
| Compress GLB | v1-compress-glb | Compress GLB by resizing textures |
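
As a rough illustration of how the base URL pattern and Bearer header combine, here is a minimal Python sketch using only the standard library. The helper names build_request and call_endpoint are hypothetical and not part of this repository; the 120 s timeout is likewise an assumption.

```python
import json
import urllib.request

# Pattern taken from the "Base URL" line above.
BASE_URL_TEMPLATE = "https://wearfits--{endpoint}.modal.run"


def build_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST for one endpoint path."""
    return urllib.request.Request(
        BASE_URL_TEMPLATE.format(endpoint=endpoint),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def call_endpoint(endpoint: str, payload: dict, api_key: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body (timeout is an assumed value)."""
    request = build_request(endpoint, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, call_endpoint("v1-image-resize", {"image_url": "https://example.com/photo.jpg"}, api_key) would target the Image Resize path from the table.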

Local-Only Endpoints

These endpoints are available via modal serve for development but are not deployed to production (to stay within Modal's 8 endpoint limit):

| Endpoint | Path | Description |
| --- | --- | --- |
| Texture Render | v1-texture-render | Render view + UV map for headless texture projection (GPU) |
| Texture Project | v1-texture-project | Project image(s) onto GLB texture with edge smoothing (GPU) |
| Health Check | v1-health | Service health status (services check config instead) |

Architecture

┌─────────────────────────────────────┐
│  Cloudflare Worker (TypeScript)     │
│  /api/v1/virtual-fitting            │
└─────────────┬───────────────────────┘
              │ HTTP POST + Bearer Auth
┌─────────────────────────────────────┐
│  Modal.com Worker (Python)          │
│  wearfits-tools                     │
│                                     │
│  Services:                          │
│  - MHR pose transfer (PyMomentum)   │
│  - PIL image processing             │
│  - Results uploaded to R2           │
└─────────────────────────────────────┘

API Reference

POST / (Pose Transfer)

Transfer pose to a source body mesh while preserving body shape and facial expression.

Headers:

Content-Type: application/json
Authorization: Bearer <POSE_TRANSFER_API_KEY>

Request:

{
  "source_glb_url": "https://example.com/body.glb",
  "pose_id": "standing_arms_down",
  "render": true,
  "render_width": 1024,
  "render_height": 1024,
  "render_format": "webp",
  "render_quality": 85
}

Or with reference mesh instead of cached pose:

{
  "source_glb_url": "https://example.com/body.glb",
  "reference_glb_url": "https://example.com/reference.glb",
  "render": true
}

Parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| source_glb_url | string | Yes | - | URL to source GLB (provides body shape) |
| pose_id | string | No* | - | Name of cached pose (e.g., "standing_arms_down") |
| reference_glb_url | string | No* | - | URL to reference GLB (provides pose) |
| render | boolean | No | true | Render visualization image |
| render_width | integer | No | 1024 | Visualization width in pixels |
| render_height | integer | No | 1024 | Visualization height in pixels |
| render_format | string | No | "webp" | Output format: "webp", "png", "jpg" |
| render_quality | integer | No | 85 | Quality for lossy formats (1-100) |

*Either pose_id or reference_glb_url is required.
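
A client can enforce the "either pose_id or reference_glb_url" rule before calling the endpoint. The sketch below is an illustration only: build_pose_transfer_payload is a hypothetical helper, and the server's own validation may differ (for example in how it treats requests that supply both fields).

```python
# Render-related fields listed in the parameter table above.
RENDER_OPTIONS = {"render", "render_width", "render_height", "render_format", "render_quality"}


def build_pose_transfer_payload(source_glb_url, pose_id=None, reference_glb_url=None, **render_options):
    """Assemble a Pose Transfer request body, checking the documented requirements."""
    if not source_glb_url:
        raise ValueError("source_glb_url is required")
    if pose_id is None and reference_glb_url is None:
        raise ValueError("either pose_id or reference_glb_url is required")
    unknown = set(render_options) - RENDER_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    payload = {"source_glb_url": source_glb_url}
    if pose_id is not None:
        payload["pose_id"] = pose_id
    if reference_glb_url is not None:
        payload["reference_glb_url"] = reference_glb_url
    payload.update(render_options)  # omitted options fall back to server defaults
    return payload
```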

Response (200):

{
  "status": "completed",
  "job_id": "a1b2c3d4e5f6",
  "output_glb_url": "r2://wf-genai-results/pose-transfer/a1b2c3d4e5f6/result.glb",
  "visualization_url": "r2://wf-genai-results/pose-transfer/a1b2c3d4e5f6-depth/result.webp",
  "processing_time_ms": 7800
}

Note: URLs are in r2:// format. The Cloudflare Worker resolves these to signed public URLs.
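
For consumers that need to handle r2:// values themselves, the URL can be split into a bucket and an object key. This is a sketch under the assumption that the first path component is the bucket name (it matches R2_BUCKET_NAME in the examples); split_r2_url is a hypothetical helper, not the Worker's actual resolver.

```python
from urllib.parse import urlparse


def split_r2_url(url: str) -> tuple[str, str]:
    """Split r2://<bucket>/<key> into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "r2" or not parsed.netloc:
        raise ValueError(f"not an r2:// URL: {url!r}")
    # The leading slash belongs to the URL syntax, not to the object key.
    return parsed.netloc, parsed.path.lstrip("/")
```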


POST / (Cache Pose)

Cache a pose from a reference GLB and return the .npz contents, ready to be saved under poses/.

Headers:

Content-Type: application/json
Authorization: Bearer <POSE_TRANSFER_API_KEY>

Request:

{
  "reference_glb_url": "https://example.com/reference.glb",
  "pose_id": "my_new_pose",
  "pose_description": "Standing with arms out"
}

Response (200):

{
  "pose_id": "my_new_pose",
  "description": "Standing with arms out",
  "npz_base64": "<base64 data>"
}

Save to file:

curl -s https://wearfits--v1-cache-pose.modal.run \
  -H "Authorization: Bearer <POSE_TRANSFER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"reference_glb_url":"https://.../reference.glb","pose_id":"my_new_pose","pose_description":"Standing with arms out"}' \
  | python -c '
# The script is passed with -c so that stdin stays connected to the curl output.
import base64, json, sys
data = json.load(sys.stdin)
with open("tools/pose-transfer/poses/my_new_pose.npz", "wb") as f:
    f.write(base64.b64decode(data["npz_base64"]))
print("Saved tools/pose-transfer/poses/my_new_pose.npz")
'


POST / (Body Mask from Size)

Find the closest matching body mask from the BodyM dataset based on measurements. This endpoint requires R2 because SAM3D consumes mask URLs; base64 responses are not supported here.

Headers:

Content-Type: application/json
Authorization: Bearer <POSE_TRANSFER_API_KEY>

Request:

{
  "measurements": {
    "height": 170,
    "chest": 90,
    "waist": 70,
    "hip": 95,
    "inseam": 78
  },
  "gender": "female"
}

Parameters:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| measurements.height | number | Yes | Body height in cm (140-210) |
| measurements.chest | number | No | Chest circumference in cm |
| measurements.waist | number | No | Waist circumference in cm |
| measurements.hip | number | No | Hip circumference in cm |
| measurements.inseam | number | No | Inseam length in cm |
| gender | string | No | "male" or "female" for gender-specific matching |

Height plus at least one other measurement is required.
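
The measurement rules can be checked client-side before sending a request. The following is a minimal sketch; validate_measurements is a hypothetical helper, and whether the server rejects heights outside 140-210 cm in exactly this way is an assumption based on the table.

```python
OPTIONAL_FIELDS = ("chest", "waist", "hip", "inseam")


def validate_measurements(measurements: dict, gender=None) -> dict:
    """Return a request body for Body Mask from Size, or raise ValueError."""
    height = measurements.get("height")
    if height is None:
        raise ValueError("measurements.height is required")
    if not 140 <= height <= 210:
        raise ValueError("measurements.height must be within 140-210 cm")

    extras = {name: measurements[name] for name in OPTIONAL_FIELDS
              if measurements.get(name) is not None}
    if not extras:
        raise ValueError("at least one measurement besides height is required")
    if gender not in (None, "male", "female"):
        raise ValueError('gender must be "male" or "female"')

    payload = {"measurements": {"height": height, **extras}}
    if gender is not None:
        payload["gender"] = gender
    return payload
```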

Response (200):

{
  "front_mask_url": "r2://wf-genai-results/pose-transfer/mask-abc123-front/result.png",
  "side_mask_url": "r2://wf-genai-results/pose-transfer/mask-abc123-side/result.png",
  "cache_key": "abc123",
  "matched_subject": {
    "subject_id": "KIL2didw2nNdy66kJxKb6pXjzOtMfl8BG-bWtdlKy-k",
    "distance": 0.036,
    "gender": "female",
    "measurements": {
      "height": 169.5,
      "chest": 89.2,
      "waist": 71.1,
      "hip": 94.3,
      "inseam": 77.8
    }
  },
  "processing_time_ms": 1200
}

Note: If R2 is not configured, this endpoint returns 503 because SAM3D must fetch masks via URL.

Dataset: BodyM dataset with 2,018 subjects (heights 141-198cm). Requests with heights outside this range are matched to the closest available body.


POST / (Render GLB)

Render a GLB 3D model from 6 angles (front, back, left, right, top, bottom) and return a composite grid image. Useful for quick quality inspection of 3D models.

Headers:

Content-Type: application/json
Authorization: Bearer <POSE_TRANSFER_API_KEY>

Request (with URL):

{
  "glb_url": "https://example.com/model.glb",
  "width": 512,
  "height": 512,
  "format": "webp",
  "quality": 85,
  "upload": true
}

Request (with base64):

{
  "glb_base64": "<base64-encoded GLB data>",
  "width": 512,
  "height": 512,
  "format": "png",
  "upload": false
}

Parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| glb_url | string | No* | - | URL to GLB file |
| glb_base64 | string | No* | - | Base64-encoded GLB data |
| width | integer | No | 512 | Width of each render in pixels |
| height | integer | No | 512 | Height of each render in pixels |
| format | string | No | "webp" | Output format: "webp", "png", "jpg" |
| quality | integer | No | 85 | Quality for lossy formats (1-100) |
| upload | boolean | No | true | Upload to R2 or return base64 |

*Either glb_url or glb_base64 is required.

Response (upload=true):

{
  "status": "completed",
  "url": "r2://wf-genai-results/render-glb-abc123/result.webp",
  "width": 1576,
  "height": 1104,
  "processing_time_ms": 2500
}

Response (upload=false):

{
  "status": "completed",
  "base64": "/9j/4AAQSkZJRgABAQAA...",
  "data_url": "data:image/webp;base64,/9j/4AAQSkZJRgABAQAA...",
  "width": 1576,
  "height": 1104,
  "processing_time_ms": 2500
}

Grid Layout:

┌─────────────────────────────────────────────────┐
│   Front      │    Back      │    Left           │
├──────────────┼──────────────┼───────────────────┤
│   Right      │    Top       │    Bottom         │
└──────────────┴──────────────┴───────────────────┘

Grid dimensions are (3 × width + padding) × (2 × height + padding + labels)
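
The padding and label sizes are not stated, but they can be inferred from a response. The sketch below assumes the sample response above (1576 × 1104) was produced with the 512 × 512 cells from the sample request; grid_overhead is a hypothetical helper.

```python
def grid_overhead(grid_width, grid_height, cell_width, cell_height, cols=3, rows=2):
    """Return (horizontal, vertical) pixels not occupied by the rendered cells."""
    horizontal = grid_width - cols * cell_width   # total padding across the row
    vertical = grid_height - rows * cell_height   # total padding plus label bands
    return horizontal, vertical


# Sample response dimensions with 512 x 512 cells:
print(grid_overhead(1576, 1104, 512, 512))  # (40, 80)
```

Under that assumption, 40 px of each row is padding and 80 px of the height is padding plus labels.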


POST / (Semantic Split GLB)

Extract semantic material masks from a 3D GLB model using GenAI and OpenCV. The endpoint renders 6 orthogonal and angled views of the model, uses a fast generative vision pipeline to identify segments (e.g. shoe upper vs. sole), projects the segment contours back into the model's original UV space, and saves the result as a solid-color PNG mask.

Generally triggered via the Cloudflare API endpoint /api/v1/texture-painter/split-materials using the multiview_ai method.

Headers:

Content-Type: application/json
Authorization: Bearer <POSE_TRANSFER_API_KEY>

Request:

{
  "glb_url": "https://example.com/model.glb",
  "texture_type": "shoe",
  "num_views": 6,
  "fov": 45
}

Parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| glb_url | string | Yes | - | URL to GLB file |
| texture_type | string | No | "shoe" | Object type prompt guiding semantic logic (e.g., shoe, shirt) |
| num_views | integer | No | 6 | Number of rendered views to analyze |
| fov | number | No | 45 | Field of view in degrees |

Response:

{
  "status": "completed",
  "texture_base64": "iVBORw0KGgo...",
  "mask_width": 2048,
  "mask_height": 2048,
  "processing_time_ms": 17850
}


POST / (Texture Render)

Render a GLB model for headless texture projection workflow. Returns a textured view (PNG) and UV map (Float32 binary) that can be used to project 2D edits back onto the 3D model's textures.

This endpoint enables the same texture projection workflow as the browser-based Texture Painter tool, but via API for automation and AI-driven texture editing.

Headers:

Content-Type: application/json
Authorization: Bearer <POSE_TRANSFER_API_KEY>

Request:

{
  "glb_url": "https://example.com/model.glb",
  "camera_position": [0, 0, 3],
  "camera_target": [0, 0, 0],
  "fov": 45,
  "width": 1024,
  "height": 1024
}

Parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| glb_url | string | Yes | - | URL to GLB file |
| camera_position | array | No | [0, 0, 3] | Camera position [x, y, z] |
| camera_target | array | No | [0, 0, 0] | Point camera looks at [x, y, z] |
| fov | number | No | 45 | Field of view in degrees |
| width | integer | No | 1024 | Render width in pixels |
| height | integer | No | 1024 | Render height in pixels |

Response (with R2):

{
  "status": "completed",
  "view_url": "https://api.wearfits.com/files/signed?key=...",
  "uv_map_url": "https://api.wearfits.com/files/signed?key=...",
  "uv_map_width": 1024,
  "uv_map_height": 1024,
  "mesh_info": [
    {
      "index": 0,
      "name": "geometry_0",
      "texture_size": [2048, 2048],
      "has_texture": true,
      "has_uvs": true
    }
  ],
  "processing_time_ms": 3200
}

Response (without R2):

{
  "status": "completed",
  "view_base64": "/9j/4AAQSkZJRgABAQAA...",
  "uv_map_base64": "AAAAAAAAAAAAAAAA...",
  "uv_map_width": 1024,
  "uv_map_height": 1024,
  "mesh_info": [...],
  "processing_time_ms": 3200
}

UV Map Format:

The UV map is a raw Float32 binary file with 4 channels per pixel (RGBA):

| Channel | Value | Description |
| --- | --- | --- |
| R | 0.0-1.0 | U texture coordinate |
| G | 0.0-1.0 | V texture coordinate |
| B | mesh_index/255 | Mesh index (1-indexed, 0 = background) |
| A | 0.0-1.0 | normal·view (for falloff, 1 = facing camera) |
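
To make the channel layout concrete, here is a small sketch that reads one pixel from the raw buffer with the standard library. It assumes little-endian floats, row-major pixel order and no file header, none of which is stated above; decode_uv_pixel is a hypothetical helper.

```python
import struct

BYTES_PER_PIXEL = 4 * 4  # four Float32 channels (R, G, B, A)


def decode_uv_pixel(buf: bytes, width: int, height: int, x: int, y: int):
    """Return the UV data at pixel (x, y), or None for background pixels."""
    if len(buf) != width * height * BYTES_PER_PIXEL:
        raise ValueError("buffer size does not match uv_map_width x uv_map_height")
    offset = (y * width + x) * BYTES_PER_PIXEL  # assumed row-major layout
    u, v, b, facing = struct.unpack_from("<4f", buf, offset)  # assumed little-endian
    mesh_index = round(b * 255)  # B stores mesh_index / 255
    if mesh_index == 0:
        return None  # 0 marks background
    return {"u": u, "v": v, "mesh_index": mesh_index, "facing": facing}
```

The width and height arguments correspond to uv_map_width and uv_map_height in the response.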

Usage with TextureProjectionService:

// 1. Render view and UV map
const renderResult = await textureProjection.renderViewAndUVMap({
  glbUrl: 'https://example.com/model.glb',
  camera: { position: [0, 0, 3], target: [0, 0, 0], fov: 45 },
  width: 1024,
  height: 1024
});

// 2. Edit the view (e.g., via AI or manual editing)
const editedImageUrl = await editImage(renderResult.viewUrl);

// 3. Project edits back onto GLB (include meshInfo for reliable texture mapping)
const modifiedGlb = await textureProjection.projectImage({
  glbUrl: 'https://example.com/model.glb',
  editedImageUrl: editedImageUrl,
  uvMapUrl: renderResult.uvMapUrl,
  uvMapWidth: renderResult.uvMapWidth,
  uvMapHeight: renderResult.uvMapHeight,
  meshInfo: renderResult.meshInfo  // Pass through for name-based texture alignment
});

POST / (Texture Project)

Project one or more 2D images onto a GLB model's texture using GPU-accelerated UV rendering with edge smoothing and texture dilation. Supports single or multi-projection (multiple images from different cameras applied sequentially in one call).

Camera position can be provided in the request JSON per projection, or extracted automatically from PNG metadata (wearfits-projection tEXt chunk embedded by the browser texture painter). JSON takes precedence over PNG metadata.

Headers:

Content-Type: application/json
Authorization: Bearer <POSE_TRANSFER_API_KEY>

Request (single projection — backward compatible):

{
  "glb_url": "https://example.com/model.glb",
  "projection_url": "https://example.com/edited-view.png",
  "camera_position": [0, 0, 3],
  "camera_target": [0, 0, 0],
  "fov": 45,
  "width": 1024,
  "height": 1024
}

Request (multi-projection):

{
  "glb_url": "https://example.com/model.glb",
  "projections": [
    {
      "image_url": "https://example.com/front-edit.png",
      "camera_position": [0, 0, 3],
      "camera_target": [0, 0, 0],
      "fov": 45
    },
    {
      "image_url": "https://example.com/back-edit.png",
      "camera_position": [0, 0, -3],
      "camera_target": [0, 0, 0],
      "fov": 45
    },
    {
      "image_url": "https://example.com/side-edit-with-metadata.png"
    }
  ],
  "width": 1024,
  "height": 1024,
  "strip_pbr": true
}

Parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| glb_url | string | Yes | - | URL to GLB file (or ZIP containing GLB) |
| projection_url | string | No* | - | Single projection image URL (backward compat) |
| projections | array | No* | - | Multi-projection array (see below) |
| camera_position | array | No | [0,0,3] | Camera [x,y,z] for single projection |
| camera_target | array | No | [0,0,0] | Camera look-at for single projection |
| fov | number | No | 45 | FOV for single projection |
| width | integer | No | 1024 | Render width |
| height | integer | No | 1024 | Render height |
| uv_supersample | integer | No | 2 | UV map supersampling factor |
| falloff_start_angle | number | No | 70 | Edge falloff start angle (degrees) |
| falloff_end_angle | number | No | 85 | Edge falloff end angle (degrees) |
| dilation_iterations | integer | No | 8 | Texture dilation passes |
| strip_pbr | boolean | No | true | Remove PBR textures for flat look |

*Either projection_url or projections is required.

Projection object fields:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| image_url | string | Yes | - | URL to RGBA PNG projection image |
| camera_position | array | No | From PNG metadata | Camera [x,y,z] position |
| camera_target | array | No | From PNG metadata | Camera look-at target |
| fov | number | No | From PNG metadata, then 45 | Field of view |

Camera resolution order: JSON fields → PNG wearfits-projection tEXt metadata → skip projection.
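
The resolution order can be sketched as a small function. This is an illustration, not the code in projection_pipeline.py: resolve_camera is a hypothetical name, the per-field fallback is an assumption (the pipeline may instead take the whole camera from one source), and the [0, 0, 0] target fallback is borrowed from the single-projection default.

```python
from typing import Optional

DEFAULT_FOV = 45  # documented last-resort value for fov


def _first_set(*values):
    """Return the first value that is not None."""
    for value in values:
        if value is not None:
            return value
    return None


def resolve_camera(projection: dict, png_metadata: Optional[dict]) -> Optional[dict]:
    """Resolve a projection's camera: JSON fields, then PNG metadata, else None (skip)."""
    camera = (png_metadata or {}).get("camera") or {}
    position = _first_set(projection.get("camera_position"), camera.get("position"))
    if position is None:
        return None  # no camera in request or PNG metadata: projection is skipped
    return {
        "position": position,
        "target": _first_set(projection.get("camera_target"), camera.get("target"), [0, 0, 0]),
        "fov": _first_set(projection.get("fov"), camera.get("fov"), DEFAULT_FOV),
    }
```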

Response:

{
  "status": "completed",
  "glb_url": "https://api.wearfits.com/files/signed?key=...",
  "view_url": "https://api.wearfits.com/files/signed?key=...",
  "projections": [
    {"index": 0, "status": "success", "projected_pixels": 45230},
    {"index": 1, "status": "success", "projected_pixels": 38100},
    {"index": 2, "status": "skipped", "reason": "No camera position in request or PNG metadata"}
  ],
  "total_projected_pixels": 83330,
  "processing_time_ms": 8500
}

Per-projection status values:

| Status | Description |
| --- | --- |
| success | Projection applied, projected_pixels shows count |
| skipped | No camera found or no image URL, reason explains why |
| error | Projection failed (download error, rendering error), reason explains why |
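
Because a multi-projection call can partially succeed, callers may want to inspect the per-projection statuses rather than only the top-level status. A minimal sketch (summarize_projections is a hypothetical helper operating on the response shape shown above):

```python
def summarize_projections(response: dict) -> dict:
    """Group per-projection results by status and total the projected pixels."""
    summary = {"projected_pixels": 0, "skipped": [], "errors": []}
    for item in response.get("projections", []):
        if item["status"] == "success":
            summary["projected_pixels"] += item.get("projected_pixels", 0)
        elif item["status"] == "skipped":
            summary["skipped"].append((item["index"], item.get("reason")))
        else:  # "error"
            summary["errors"].append((item["index"], item.get("reason")))
    return summary
```

Applied to the sample response, the two successful projections add up to the reported total_projected_pixels of 83330, and index 2 appears in the skipped list.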

PNG Camera Metadata Format:

The browser texture painter embeds camera state in PNG tEXt chunks with key wearfits-projection:

{
  "version": 1,
  "camera": {
    "position": [x, y, z],
    "target": [x, y, z],
    "fov": 45
  },
  "resolution": 1024,
  "timestamp": "2026-01-27T12:00:00.000Z"
}
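
For reference, the metadata can be read back without any imaging library by walking the PNG chunk list. The sketch below follows the PNG chunk layout (length, type, data, CRC) and assumes the tEXt payload is the JSON document shown above; read_projection_metadata is a hypothetical helper and does not verify CRCs.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_projection_metadata(png_bytes: bytes, keyword: str = "wearfits-projection"):
    """Return the parsed JSON from the matching tEXt chunk, or None if absent."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack_from(">I4s", png_bytes, pos)
        data = png_bytes[pos + 8 : pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt data is "keyword NUL text", both Latin-1 encoded.
            key, _, text = data.partition(b"\x00")
            if key.decode("latin-1") == keyword:
                return json.loads(text.decode("latin-1"))
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # skip length + type + data + CRC
    return None
```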


POST /resize (Image Resize)

Resize and convert image format. Useful for normalizing person photos and garment images.

Request:

{
  "image_url": "https://example.com/photo.jpg",
  "max_width": 1024,
  "max_height": 1024,
  "format": "webp",
  "quality": 85,
  "fit": "contain",
  "upload": true
}

Parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| image_url | string | Yes | - | URL of image to resize |
| max_width | integer | No | 1024 | Maximum output width |
| max_height | integer | No | 1024 | Maximum output height |
| format | string | No | "webp" | Output format: "webp", "png", "jpg" |
| quality | integer | No | 85 | Quality for lossy formats (1-100) |
| fit | string | No | "contain" | Resize mode (see below) |
| upload | boolean | No | true | Upload to R2 or return base64 |

Fit Modes:

| Mode | Description |
| --- | --- |
| contain | Fit within box, maintain aspect ratio, don't upscale |
| cover | Fill box, crop excess, maintain aspect ratio |
| exact | Exact size (may distort aspect ratio) |
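
The three modes differ only in how the target size is computed. The following sketch shows one plausible reading of the table; it is not the code in image_utils.py, plan_resize is a hypothetical name, and the rounding choices (and whether cover upscales small images) are assumptions.

```python
import math
from fractions import Fraction


def plan_resize(src_w: int, src_h: int, max_w: int, max_h: int, fit: str = "contain"):
    """Return (resize_to, output_size) as (width, height) tuples for a fit mode."""
    if fit == "exact":
        # Stretch straight to the box; aspect ratio may change.
        return (max_w, max_h), (max_w, max_h)
    if fit == "contain":
        # Largest scale that fits the box, capped at 1 so images are never upscaled.
        scale = min(Fraction(max_w, src_w), Fraction(max_h, src_h), Fraction(1))
        size = (max(1, round(src_w * scale)), max(1, round(src_h * scale)))
        return size, size
    if fit == "cover":
        # Smallest scale that fills the box; the excess is cropped afterwards.
        scale = max(Fraction(max_w, src_w), Fraction(max_h, src_h))
        resized = (math.ceil(src_w * scale), math.ceil(src_h * scale))
        return resized, (max_w, max_h)
    raise ValueError(f"unknown fit mode: {fit!r}")
```

With a 4000 × 3000 source and a 1024 × 1024 box, contain yields 1024 × 768, which matches the dimensions in the sample response below.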

Response (upload=true):

{
  "status": "completed",
  "url": "r2://wf-genai-results/resize-abc123/result.webp",
  "width": 1024,
  "height": 768,
  "format": "webp",
  "processing_time_ms": 450
}

Response (upload=false):

{
  "status": "completed",
  "base64": "/9j/4AAQSkZJRgABAQAA...",
  "data_url": "data:image/webp;base64,/9j/4AAQSkZJRgABAQAA...",
  "width": 1024,
  "height": 768,
  "format": "webp",
  "processing_time_ms": 450
}


POST /compose-grid (Garment Grid)

Compose multiple garment images into a labeled grid for AI try-on. Images can be provided as HTTP/HTTPS URLs or base64 data URLs.

Request:

{
  "rows": [
    {
      "label": "TOP GARMENT",
      "images": [
        "https://example.com/shirt-front.jpg",
        "https://example.com/shirt-back.jpg"
      ]
    },
    {
      "label": "BOTTOM GARMENT",
      "images": ["https://example.com/pants.jpg"]
    },
    {
      "label": "SHOES",
      "images": ["https://example.com/shoes.jpg"]
    }
  ],
  "max_cell_width": 1024,
  "max_cell_height": 1024,
  "min_cell_size": 512,
  "format": "webp",
  "quality": 85,
  "upload": true
}

Parameters:

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| rows | array | Yes | - | List of row definitions |
| rows[].label | string | Yes | - | Row label (e.g., "TOP GARMENT") |
| rows[].images | array | Yes | - | Image URLs (1-2 per row) |
| max_cell_width | integer | No | 1024 | Max width per cell |
| max_cell_height | integer | No | 1024 | Max height per cell |
| min_cell_size | integer | No | 512 | Min size for shorter edge |
| format | string | No | "webp" | Output format |
| quality | integer | No | 85 | Quality for lossy formats |
| upload | boolean | No | true | Upload to R2 or return base64 |

Standard Labels:

| Label | Use For |
| --- | --- |
| TOP GARMENT | Shirts, blouses, jackets, sweaters |
| BOTTOM GARMENT | Pants, skirts, shorts |
| FULL BODY GARMENT | Dresses, jumpsuits, rompers |
| SHOES | Any footwear |

Grid Layout:

┌─────────────────────────────────────────────┐
│              TOP GARMENT                    │
├─────────────────┬───────────────────────────┤
│   [image 1]     │   [image 2]               │
├─────────────────┴───────────────────────────┤
│            BOTTOM GARMENT                   │
├─────────────────┬───────────────────────────┤
│   [image 1]     │                           │
├─────────────────┴───────────────────────────┤
│                SHOES                        │
├─────────────────┬───────────────────────────┤
│   [image 1]     │                           │
└─────────────────┴───────────────────────────┘

Response:

{
  "status": "completed",
  "url": "r2://wf-genai-results/grid-abc123/result.webp",
  "width": 2088,
  "height": 1600,
  "rows": 3,
  "format": "webp",
  "processing_time_ms": 1200
}


GET /health

Health check endpoint (no auth required).

Response:

{
  "status": "ok",
  "engine": "ready"
}


Deployment

Prerequisites

  1. Modal CLI installed and authenticated to the wearfits workspace:

    pip install modal
    python -m modal setup --profile wearfits
    python -m modal profile activate wearfits

Important: Secrets are workspace-specific. Verify you're in the wearfits workspace:

  • Dashboard: https://modal.com/apps/wearfits/main/deployed
  • CLI: python -m modal profile list should show wearfits as active

  2. Modal secret wearfits-r2 with R2 credentials:

    modal secret create wearfits-r2 \
      R2_ENDPOINT_URL=https://<account_id>.r2.cloudflarestorage.com \
      R2_ACCESS_KEY_ID=<access_key> \
      R2_SECRET_ACCESS_KEY=<secret_key> \
      R2_BUCKET_NAME=wf-genai-results

  3. Modal secret wearfits-api with API key:

    modal secret create wearfits-api \
      POSE_TRANSFER_API_KEY=<your_api_key>


Deploy

cd tools/pose-transfer
modal deploy modal_app.py

Example output (the endpoints created depend on which ones modal_app.py currently enables):

✓ Created web endpoint => https://wearfits--v1-pose-transfer.modal.run
✓ Created web endpoint => https://wearfits--v1-render-glb.modal.run
✓ Created web endpoint => https://wearfits--v1-texture-render.modal.run
✓ Created web endpoint => https://wearfits--v1-texture-project.modal.run
✓ Created web endpoint => https://wearfits--v1-image-resize.modal.run
✓ Created web endpoint => https://wearfits--v1-compose-grid.modal.run
✓ Created web endpoint => https://wearfits--v1-health.modal.run

Test Deployment

# Health check
curl https://wearfits--v1-health.modal.run

# Pose transfer
curl -X POST https://wearfits--v1-pose-transfer.modal.run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "source_glb_url": "https://example.com/body.glb",
    "pose_id": "standing_arms_down"
  }'

# Image resize
curl -X POST https://wearfits--v1-image-resize.modal.run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "image_url": "https://example.com/photo.jpg",
    "max_width": 1024,
    "format": "webp"
  }'

# Garment grid
curl -X POST https://wearfits--v1-compose-grid.modal.run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "rows": [
      {"label": "TOP GARMENT", "images": ["https://example.com/shirt.jpg"]}
    ]
  }'

# Render GLB (6-angle grid)
curl -X POST https://wearfits--v1-render-glb.modal.run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "glb_url": "https://example.com/model.glb",
    "width": 512,
    "height": 512,
    "format": "webp"
  }'

# Texture render (for projection workflow)
curl -X POST https://wearfits--v1-texture-render.modal.run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "glb_url": "https://example.com/model.glb",
    "camera_position": [0, 0, 3],
    "camera_target": [0, 0, 0],
    "width": 1024,
    "height": 1024
  }'

# Texture project (single projection)
curl -X POST https://wearfits--v1-texture-project.modal.run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "glb_url": "https://example.com/model.glb",
    "projection_url": "https://example.com/edited-view.png",
    "camera_position": [0, 0, 3],
    "camera_target": [0, 0, 0],
    "fov": 45
  }'

# Texture project (multi-projection)
curl -X POST https://wearfits--v1-texture-project.modal.run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "glb_url": "https://example.com/model.glb",
    "projections": [
      {"image_url": "https://example.com/front.png", "camera_position": [0,0,3], "camera_target": [0,0,0], "fov": 45},
      {"image_url": "https://example.com/back.png", "camera_position": [0,0,-3], "camera_target": [0,0,0], "fov": 45}
    ]
  }'

View Logs

modal app logs wearfits-tools

Files

| File | Description |
| --- | --- |
| modal_app.py | Modal app definition with all web endpoints (CPU + GPU workers) |
| pose_transfer.py | Core pose transfer logic using MHR |
| pymomentum_fitting.py | Meta's PyMomentum-based fitting (hierarchical IK solver) |
| mask_lookup.py | Body mask lookup from measurements (BodyM dataset) |
| gpu_uv_renderer.py | GPU UV map + textured view rendering (ModernGL/EGL, NVIDIA T4 via EGL ICD). Functions accept ctx param for context reuse. |
| projection_pipeline.py | Headless texture projection pipeline (multi-projection, shared GL context, PNG metadata) |
| texture_enhance_pipeline.py | AI texture enhancement: render grid (4 or 6 views), enhance with AI, validate/fix swaps, project back. Per-step timing logged. |
| color_correction.py | Polynomial color correction (Finlayson 2015) + luminance/white boost |
| texture_projection.py | GLB loading, UV map rendering, view rendering for texture projection |
| file_assets.py | Asset file paths for fitting |
| render_utils.py | Mesh rendering utilities (multi-angle grids) |
| image_utils.py | Image processing utilities (resize, format conversion) |
| http_utils.py | HTTP download utilities with retry logic |
| test_projection_pipeline.py | Tests for projection pipeline (local + Modal GPU) |
| poses/ | Cached pose files (.npz format) |
| assets/ | Fitting assets (head_hand_mask.npz) |

Cached Poses

Pre-extracted poses in poses/ directory:

| Pose ID | Description |
| --- | --- |
| default | Alias for system default pose (configurable via DEFAULT_POSE_ID, currently girl_pose) |
| standing_arms_down | Natural standing with arms relaxed at sides |
| man_pose | Male standing pose |
| girl_pose | Female standing pose |
| shoe_girl_pose | Standing looking down at shoes, one leg forward - ideal for shoe try-on |

Use the helper script to create a pose template directly from an image:

npx tsx scripts/create-pose-template.ts <image_path> <pose_id> [description]

# Example:
npx tsx scripts/create-pose-template.ts assets/test/my-pose.jpg my_pose "Standing with hands on hips"

The script will:

  1. Upload the image to fal.ai
  2. Extract body mesh using SAM3D
  3. Cache the pose via Modal and save the .npz file

Then redeploy: modal deploy modal_app.py

Add New Pose Manually

If you already have a GLB file with the desired pose:

  1. Cache via Modal v1-cache-pose and save to poses/ (the script is passed with -c so that stdin stays connected to the curl output):

    curl -s https://wearfits--v1-cache-pose.modal.run \
      -H "Authorization: Bearer <POSE_TRANSFER_API_KEY>" \
      -H "Content-Type: application/json" \
      -d '{"reference_glb_url":"https://.../reference.glb","pose_id":"my_new_pose","pose_description":"Pose description"}' \
      | python -c 'import base64, json, sys; data = json.load(sys.stdin); open("tools/pose-transfer/poses/my_new_pose.npz", "wb").write(base64.b64decode(data["npz_base64"])); print("Saved tools/pose-transfer/poses/my_new_pose.npz")'

  2. Commit the new .npz file to poses/
  3. Redeploy: modal deploy modal_app.py

Local Development

cd tools/pose-transfer
modal serve modal_app.py
# Creates temporary endpoints: https://wearfits--v1-{endpoint}-dev.modal.run

This uses the same container image as production with all dependencies.

CLI (Requires Local Dependencies)

Local CLI requires conda dependencies that are difficult to install outside Modal:

# Requires conda/mamba environment with:
# - pymomentum-cpu (from conda-forge)
# - ezc3d, assimp, urdfdom, suitesparse

python pose_transfer.py \
  --source body.glb \
  --pose standing_arms_down \
  --output posed.glb \
  --render

python pose_transfer.py --list-poses

Environment Variables

Set via Modal secrets:

| Secret | Variable | Description |
| --- | --- | --- |
| wearfits-r2 | R2_ENDPOINT_URL | Cloudflare R2 S3-compatible endpoint |
| wearfits-r2 | R2_ACCESS_KEY_ID | R2 access key |
| wearfits-r2 | R2_SECRET_ACCESS_KEY | R2 secret key |
| wearfits-r2 | R2_BUCKET_NAME | R2 bucket name for results |
| wearfits-api | POSE_TRANSFER_API_KEY | API key for authentication |

Integration with WEARFITS API

The Cloudflare Worker calls Modal endpoints via service classes:

// In wrangler.jsonc
"POSE_TRANSFER_API_URL": "https://wearfits--v1-pose-transfer.modal.run"
"POSE_TRANSFER_API_KEY": "<set via wrangler secret>"

// Pose Transfer Service
import { createPoseTransferService } from './services/pose-transfer-service';
const poseTransfer = createPoseTransferService(env);
await poseTransfer.applyPose({ sourceGlbUrl, poseId: 'standing_arms_down' });

// Image Processing Service
import { createImageProcessingService } from './services/image-processing-service';
const imageService = createImageProcessingService(env);
await imageService.resize(imageUrl, { maxWidth: 1024, format: 'webp' });
await imageService.composeGrid(rows, { format: 'webp' });

The API then resolves r2:// URLs to signed public URLs using the R2 storage service.


Image Processing Details

Resolution Standards

| Image Type | Resolution | Format |
| --- | --- | --- |
| Silhouette visualization | 1024×1024 | WebP |
| Person/selfie (normalized) | 1024×1024 box | WebP |
| Garment grid | ~2048×2048 max | WebP |
| Individual garments | 512-1024 shorter edge | WebP |

WebP Benefits

  • 60-70% smaller file size compared to PNG
  • Lossy and lossless compression support
  • Transparency support (unlike JPEG)
  • Fast decoding in modern browsers

MHR Model Details

The pose transfer manipulates MHR parameters using PyMomentum's hierarchical IK solver:

| Parameter | Size | Source |
| --- | --- | --- |
| identity_coeffs | 45 | Preserved from source (body shape) |
| lbs_model_params | 204 | From reference/cache (pose) |
| face_expr_coeffs | 72 | Preserved from source (expression) |

Fitting Stages:

  1. Stage 0: Face rigid transformation
  2. Stage 1.0: Face identity
  3. Stage 1.1: Face expression
  4. Stage 2: Body rigid transform
  5. Stage 3: Torso and limb roots
  6. Stage 4: Full limbs (excluding hands)
  7. Stage 5: All parameters

Input format: GLB from SAM3D (geometry_0 = body mesh, 18439 vertices at LOD1)
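
The parameter table can be read as a simple recombination rule: shape and expression come from the source, pose from the reference. The sketch below illustrates only that bookkeeping with plain lists; it is not the PyMomentum fitting itself, and combine_parameters is a hypothetical helper.

```python
# Vector sizes from the parameter table above.
PARAM_SIZES = {"identity_coeffs": 45, "lbs_model_params": 204, "face_expr_coeffs": 72}


def combine_parameters(source: dict, reference: dict) -> dict:
    """Keep shape and expression from the source, take the pose from the reference."""
    for label, params in (("source", source), ("reference", reference)):
        for name, size in PARAM_SIZES.items():
            if len(params[name]) != size:
                raise ValueError(f"{label}.{name} must have {size} values")
    return {
        "identity_coeffs": list(source["identity_coeffs"]),        # body shape
        "lbs_model_params": list(reference["lbs_model_params"]),   # pose
        "face_expr_coeffs": list(source["face_expr_coeffs"]),      # expression
    }
```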


Troubleshooting

GPU Rendering (NVIDIA EGL)

The gpu_image in modal_app.py includes an NVIDIA EGL ICD config file at /usr/share/glvnd/egl_vendor.d/10_nvidia.json. This is critical: without it, libglvnd only finds Mesa's EGL and falls back to llvmpipe software rendering (100-1000x slower). Modal mounts NVIDIA drivers at runtime, but the container needs the ICD JSON to discover them. On startup, GPURenderWorker.setup() logs GL_RENDERER; verify that it shows Tesla T4, not llvmpipe.

GL contexts are shared across multiple renders via the ctx parameter on render_lit_view_transparent(), render_view_and_uv_map_gpu(), and _render_result_view(). The texture enhancement pipeline creates one context for all view renders and one for all projection renders, avoiding redundant context creation/destruction.

Texture Enhancement 524 Resilience

The v1-texture-enhance-glb endpoint sits behind Cloudflare's proxy on modal.run (which has a ~100s origin timeout). For 6-view mode, the total pipeline time (~150s) exceeds this limit, causing Cloudflare to return HTTP 524. However, Modal continues executing and uploads results to R2.

The worker handles this via R2 polling:

  1. Worker generates a job_id and passes it to Modal in the request body
  2. Modal uses this job_id for R2 paths and writes a status.json to pose-transfer/texture-enhance-{job_id}-status/result.json after completion
  3. On 524 or timeout, the worker polls R2 for status.json every 10s
  4. status.json contains all result URLs (enhanced GLB, grid input/output, per-view URLs, timing, validation)
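
The polling fallback can be sketched as a small loop. The real worker is TypeScript; this Python version only illustrates the control flow. poll_for_status and fetch_json are hypothetical names, fetch_json stands in for whatever reads an object from R2 (returning None while it does not exist), and the 300 s overall limit is an assumption.

```python
import time
from typing import Callable, Optional


def status_key(job_id: str) -> str:
    """R2 key of the status file written by Modal after completion."""
    return f"pose-transfer/texture-enhance-{job_id}-status/result.json"


def poll_for_status(
    job_id: str,
    fetch_json: Callable[[str], Optional[dict]],
    interval_s: float = 10.0,   # documented polling interval
    max_wait_s: float = 300.0,  # assumed overall limit
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll R2 until the status file appears, then return its contents."""
    waited = 0.0
    while True:
        status = fetch_json(status_key(job_id))
        if status is not None:
            return status
        if waited >= max_wait_s:
            raise TimeoutError(f"no status for job {job_id} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Because the job_id is generated by the caller, the status key is known before the request is sent, which is what makes recovery possible after a 524.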

Cold Start Slow (~25s)

First request after idle loads the MHR model and initializes PyMomentum. Subsequent requests are fast (~8s). The scaledown_window=60 keeps containers warm for 60 seconds.

"pymomentum-cpu not found"

Ensure pymomentum-cpu is in the micromamba_install list in modal_app.py (not pip_install).

"FBX file not found"

The MHR assets were not downloaded correctly. Check that the run_commands step in modal_app.py downloads and unzips them to /root/assets/.

R2 Upload Fails

Check Modal secrets are correctly configured:

modal secret list
modal secret show wearfits-r2

Note: v1-body-mask-from-size returns 503 if R2 is not configured, because SAM3D requires mask URLs.

401 Unauthorized

Missing or invalid API key. Ensure Authorization: Bearer <key> header is included.

Image Processing Errors

  • "image_url is required" - Missing required parameter
  • "Failed to fetch" - Image URL not accessible from Modal servers
  • "No valid images found" - All image URLs failed to load

References