Modal Backend Worker
Python-based backend services deployed on Modal.com, providing:
- Pose Transfer: body pose manipulation using Meta's MHR model
- Image Resize: image resizing and format conversion
- Garment Grid Composition: combining garment images into labeled grids
Production Endpoints (8 deployed)
| Endpoint | Path | Description |
|---|---|---|
| Pose Transfer | `v1-pose-transfer` | Transfer pose to body mesh |
| Body Mask from Size | `v1-body-mask-from-size` | Find matching body mask from measurements |
| Semantic Split GLB | `v1-semantic-split-glb` | Extract material masks (e.g. shoe sole) using AI. Automatically concatenates multi-mesh inputs. |
| Render GLB | `v1-render-glb` | Render GLB from 6 angles as grid image (GPU) |
| Texture Enhance GLB | `v1-texture-enhance-glb` | AI texture enhancement (4 or 6 views). Automatically merges multi-mesh GLBs while preserving dynamic base texture resolution. |
| Image Resize | `v1-image-resize` | Resize and convert images |
| Compose Grid | `v1-compose-grid` | Create labeled garment grid |
| Compress GLB | `v1-compress-glb` | Compress GLB by resizing textures |
Local-Only Endpoints
These endpoints are available via `modal serve` for development but are not deployed to production (to stay under Modal's 8-endpoint limit):
| Endpoint | Path | Description |
|---|---|---|
| Texture Render | `v1-texture-render` | Render view + UV map for headless texture projection (GPU) |
| Texture Project | `v1-texture-project` | Project image(s) onto GLB texture with edge smoothing (GPU) |
| Health Check | `v1-health` | Service health status (services check config instead) |
Architecture
┌─────────────────────────────────────┐
│ Cloudflare Worker (TypeScript) │
│ /api/v1/virtual-fitting │
└─────────────┬───────────────────────┘
│ HTTP POST + Bearer Auth
▼
┌─────────────────────────────────────┐
│ Modal.com Worker (Python) │
│ wearfits-tools │
│ │
│ Services: │
│ - MHR pose transfer (PyMomentum) │
│ - PIL image processing │
│ - Results uploaded to R2 │
└─────────────────────────────────────┘
API Reference
POST / (Pose Transfer)
Transfer pose to a source body mesh while preserving body shape and facial expression.
Headers:

Authorization: Bearer <API_KEY>
Content-Type: application/json
Request:
{
"source_glb_url": "https://example.com/body.glb",
"pose_id": "standing_arms_down",
"render": true,
"render_width": 1024,
"render_height": 1024,
"render_format": "webp",
"render_quality": 85
}
Or with reference mesh instead of cached pose:
{
"source_glb_url": "https://example.com/body.glb",
"reference_glb_url": "https://example.com/reference.glb",
"render": true
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `source_glb_url` | string | Yes | - | URL to source GLB (provides body shape) |
| `pose_id` | string | No* | - | Name of cached pose (e.g., `"standing_arms_down"`) |
| `reference_glb_url` | string | No* | - | URL to reference GLB (provides pose) |
| `render` | boolean | No | `true` | Render visualization image |
| `render_width` | integer | No | `1024` | Visualization width in pixels |
| `render_height` | integer | No | `1024` | Visualization height in pixels |
| `render_format` | string | No | `"webp"` | Output format: `"webp"`, `"png"`, `"jpg"` |
| `render_quality` | integer | No | `85` | Quality for lossy formats (1-100) |
*Either `pose_id` or `reference_glb_url` is required.
Response (200):
{
"status": "completed",
"job_id": "a1b2c3d4e5f6",
"output_glb_url": "r2://wf-genai-results/pose-transfer/a1b2c3d4e5f6/result.glb",
"visualization_url": "r2://wf-genai-results/pose-transfer/a1b2c3d4e5f6-depth/result.webp",
"processing_time_ms": 7800
}
Note: URLs are in `r2://` format. The Cloudflare Worker resolves these to signed public URLs.
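The resolution step can be sketched as a URL rewrite; a minimal sketch (the signed-URL endpoint shape is an assumption based on the `https://api.wearfits.com/files/signed?key=...` responses shown later in this document; the real worker signs keys with its R2 storage service):

```python
from urllib.parse import quote


def resolve_r2_url(url: str, public_base: str = "https://api.wearfits.com/files/signed") -> str:
    """Turn an r2://<bucket>/<key> reference into a fetchable URL.

    Hypothetical helper: the real Cloudflare Worker signs the key with its
    R2 storage service; here we only rewrite the URL shape.
    """
    if not url.startswith("r2://"):
        return url  # already a plain URL, pass through unchanged
    bucket, _, key = url[len("r2://"):].partition("/")
    # The bucket is implied by the worker's R2 binding; only the key is passed on.
    return f"{public_base}?key={quote(key, safe='')}"
```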
POST / (Cache Pose)
Cache a pose from a reference GLB and return the `.npz` contents for the `poses/` directory.
Headers:

Authorization: Bearer <POSE_TRANSFER_API_KEY>
Content-Type: application/json
Request:
{
"reference_glb_url": "https://example.com/reference.glb",
"pose_id": "my_new_pose",
"pose_description": "Standing with arms out"
}
Response (200):
{
"pose_id": "my_new_pose",
"description": "Standing with arms out",
"npz_base64": "<base64 data>"
}
Save to file:
curl -s https://wearfits--v1-cache-pose.modal.run \
  -H "Authorization: Bearer <POSE_TRANSFER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"reference_glb_url":"https://.../reference.glb","pose_id":"my_new_pose","pose_description":"Standing with arms out"}' \
  -o /tmp/cache_pose.json
python3 - <<'PY'
import base64, json
with open("/tmp/cache_pose.json") as f:
    data = json.load(f)
with open("tools/pose-transfer/poses/my_new_pose.npz", "wb") as f:
    f.write(base64.b64decode(data["npz_base64"]))
print("Saved tools/pose-transfer/poses/my_new_pose.npz")
PY
POST / (Body Mask from Size)
Find the closest matching body mask from the BodyM dataset based on measurements. This endpoint requires R2 because SAM3D consumes mask URLs; base64 responses are not supported here.
Headers:

Authorization: Bearer <API_KEY>
Content-Type: application/json
Request:
{
"measurements": {
"height": 170,
"chest": 90,
"waist": 70,
"hip": 95,
"inseam": 78
},
"gender": "female"
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| `measurements.height` | number | Yes | Body height in cm (140-210) |
| `measurements.chest` | number | No | Chest circumference in cm |
| `measurements.waist` | number | No | Waist circumference in cm |
| `measurements.hip` | number | No | Hip circumference in cm |
| `measurements.inseam` | number | No | Inseam length in cm |
| `gender` | string | No | `"male"` or `"female"` for gender-specific matching |
At least `height` and one other measurement are required.
Response (200):
{
"front_mask_url": "r2://wf-genai-results/pose-transfer/mask-abc123-front/result.png",
"side_mask_url": "r2://wf-genai-results/pose-transfer/mask-abc123-side/result.png",
"cache_key": "abc123",
"matched_subject": {
"subject_id": "KIL2didw2nNdy66kJxKb6pXjzOtMfl8BG-bWtdlKy-k",
"distance": 0.036,
"gender": "female",
"measurements": {
"height": 169.5,
"chest": 89.2,
"waist": 71.1,
"hip": 94.3,
"inseam": 77.8
}
},
"processing_time_ms": 1200
}
Note: If R2 is not configured, this endpoint returns 503 because SAM3D must fetch masks via URL.
Dataset: BodyM dataset with 2,018 subjects (heights 141-198cm). Heights outside this range match to closest available body.
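The matching step amounts to a nearest-neighbor search over the provided measurements; a minimal sketch, assuming a normalized Euclidean distance (the actual metric and normalization are internal to mask_lookup.py and may differ):

```python
import math


def match_subject(query: dict, subjects: list[dict]) -> dict:
    """Pick the dataset subject whose measurements are closest to the query.

    Sketch only: uses a normalized Euclidean distance over whichever
    measurements the query provides; the scales below are illustrative
    assumptions, not values from mask_lookup.py.
    """
    # Rough per-measurement spreads (cm) used to normalize each dimension.
    scales = {"height": 60.0, "chest": 40.0, "waist": 40.0, "hip": 40.0, "inseam": 30.0}

    def distance(subject: dict) -> float:
        keys = [k for k in query if k in subject and k in scales]
        return math.sqrt(sum(((query[k] - subject[k]) / scales[k]) ** 2 for k in keys) / len(keys))

    return min(subjects, key=distance)


subjects = [
    {"subject_id": "a", "height": 169.5, "chest": 89.2, "waist": 71.1},
    {"subject_id": "b", "height": 185.0, "chest": 102.0, "waist": 88.0},
]
best = match_subject({"height": 170, "chest": 90, "waist": 70}, subjects)
```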
POST / (Render GLB)
Render a GLB 3D model from 6 angles (front, back, left, right, top, bottom) and return a composite grid image. Useful for quick quality inspection of 3D models.
Headers:

Authorization: Bearer <API_KEY>
Content-Type: application/json
Request (with URL):
{
"glb_url": "https://example.com/model.glb",
"width": 512,
"height": 512,
"format": "webp",
"quality": 85,
"upload": true
}
Request (with base64):
{
"glb_base64": "<base64-encoded GLB data>",
"width": 512,
"height": 512,
"format": "png",
"upload": false
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `glb_url` | string | No* | - | URL to GLB file |
| `glb_base64` | string | No* | - | Base64-encoded GLB data |
| `width` | integer | No | `512` | Width of each render in pixels |
| `height` | integer | No | `512` | Height of each render in pixels |
| `format` | string | No | `"webp"` | Output format: `"webp"`, `"png"`, `"jpg"` |
| `quality` | integer | No | `85` | Quality for lossy formats (1-100) |
| `upload` | boolean | No | `true` | Upload to R2 or return base64 |
*Either `glb_url` or `glb_base64` is required.
Response (upload=true):
{
"status": "completed",
"url": "r2://wf-genai-results/render-glb-abc123/result.webp",
"width": 1576,
"height": 1104,
"processing_time_ms": 2500
}
Response (upload=false):
{
"status": "completed",
"base64": "/9j/4AAQSkZJRgABAQAA...",
"data_url": "data:image/webp;base64,/9j/4AAQSkZJRgABAQAA...",
"width": 1576,
"height": 1104,
"processing_time_ms": 2500
}
Grid Layout:
┌─────────────────────────────────────────────────┐
│ Front │ Back │ Left │
├──────────────┼──────────────┼───────────────────┤
│ Right │ Top │ Bottom │
└──────────────┴──────────────┴───────────────────┘
Grid dimensions are (3 × width + padding) × (2 × height + padding + labels)
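That formula can be made concrete; a sketch where `padding=10` and `label_h=25` are inferred from the documented 512×512 example (which yields 1576×1104), not read from render_utils.py:

```python
def grid_size(width: int, height: int, cols: int = 3, rows: int = 2,
              padding: int = 10, label_h: int = 25) -> tuple[int, int]:
    """Estimate the 6-view grid canvas size.

    Padding surrounds and separates cells; each row carries a label strip.
    The default padding/label_h values are assumptions that reproduce the
    1576x1104 output documented for 512x512 renders.
    """
    grid_w = cols * width + (cols + 1) * padding
    grid_h = rows * (height + label_h) + (rows + 1) * padding
    return grid_w, grid_h
```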
POST / (Semantic Split GLB)
Extract semantic material masks from a 3D GLB model using GenAI and OpenCV. Renders 6 orthogonal and angled views of the model, uses a fast generative vision pipeline to identify segments (e.g. shoe upper vs. sole), and maps the segment contours back into the original UV map via projection, saving the result as a solid-color PNG map.
Generally triggered via the Cloudflare API `/api/v1/texture-painter/split-materials` with method `multiview_ai`.
Headers:

Authorization: Bearer <API_KEY>
Content-Type: application/json
Request:

{
"glb_url": "https://example.com/model.glb",
"texture_type": "shoe",
"num_views": 6,
"fov": 45
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `glb_url` | string | Yes | - | URL to GLB file |
| `texture_type` | string | No | `"shoe"` | Object type prompt guiding semantic logic (e.g., shoe, shirt) |
| `num_views` | integer | No | `6` | Number of rendered views to analyze |
| `fov` | number | No | `45` | Field of view in degrees |
Response:
{
"status": "completed",
"texture_base64": "iVBORw0KGgo...",
"mask_width": 2048,
"mask_height": 2048,
"processing_time_ms": 17850
}
POST / (Texture Render)
Render a GLB model for headless texture projection workflow. Returns a textured view (PNG) and UV map (Float32 binary) that can be used to project 2D edits back onto the 3D model's textures.
This endpoint enables the same texture projection workflow as the browser-based Texture Painter tool, but via API for automation and AI-driven texture editing.
Headers:

Authorization: Bearer <API_KEY>
Content-Type: application/json
Request:
{
"glb_url": "https://example.com/model.glb",
"camera_position": [0, 0, 3],
"camera_target": [0, 0, 0],
"fov": 45,
"width": 1024,
"height": 1024
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `glb_url` | string | Yes | - | URL to GLB file |
| `camera_position` | array | No | `[0, 0, 3]` | Camera position [x, y, z] |
| `camera_target` | array | No | `[0, 0, 0]` | Point the camera looks at [x, y, z] |
| `fov` | number | No | `45` | Field of view in degrees |
| `width` | integer | No | `1024` | Render width in pixels |
| `height` | integer | No | `1024` | Render height in pixels |
Response (with R2):
{
"status": "completed",
"view_url": "https://api.wearfits.com/files/signed?key=...",
"uv_map_url": "https://api.wearfits.com/files/signed?key=...",
"uv_map_width": 1024,
"uv_map_height": 1024,
"mesh_info": [
{
"index": 0,
"name": "geometry_0",
"texture_size": [2048, 2048],
"has_texture": true,
"has_uvs": true
}
],
"processing_time_ms": 3200
}
Response (without R2):
{
"status": "completed",
"view_base64": "/9j/4AAQSkZJRgABAQAA...",
"uv_map_base64": "AAAAAAAAAAAAAAAA...",
"uv_map_width": 1024,
"uv_map_height": 1024,
"mesh_info": [...],
"processing_time_ms": 3200
}
UV Map Format:
The UV map is a raw Float32 binary file with 4 channels per pixel (RGBA):
| Channel | Value | Description |
|---|---|---|
| R | 0.0-1.0 | U texture coordinate |
| G | 0.0-1.0 | V texture coordinate |
| B | mesh_index/255 | Mesh index (1-indexed, 0 = background) |
| A | 0.0-1.0 | normal·view (for falloff, 1 = facing camera) |
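A consumer can decode the raw Float32 buffer with the standard library alone; a minimal sketch (row-major RGBA and little-endian float32, matching a raw Float32Array dump, are assumptions to verify against actual endpoint output):

```python
import struct


def sample_uv(uv_bytes: bytes, width: int, x: int, y: int):
    """Read one RGBA float32 pixel from the raw UV map buffer.

    Returns (u, v, mesh_index, falloff); mesh_index 0 means background.
    Assumes row-major pixel order and little-endian float32.
    """
    offset = (y * width + x) * 4 * 4  # 4 channels x 4 bytes each
    r, g, b, a = struct.unpack_from("<4f", uv_bytes, offset)
    mesh_index = round(b * 255)  # B channel stores mesh_index/255 (0 = background)
    return r, g, mesh_index, a


# Build a 2x1 test buffer: one background pixel, then a pixel on mesh 1.
buf = struct.pack("<8f", 0, 0, 0, 0, 0.25, 0.75, 1 / 255, 1.0)
```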
Usage with TextureProjectionService:
// 1. Render view and UV map
const renderResult = await textureProjection.renderViewAndUVMap({
glbUrl: 'https://example.com/model.glb',
camera: { position: [0, 0, 3], target: [0, 0, 0], fov: 45 },
width: 1024,
height: 1024
});
// 2. Edit the view (e.g., via AI or manual editing)
const editedImageUrl = await editImage(renderResult.viewUrl);
// 3. Project edits back onto GLB (include meshInfo for reliable texture mapping)
const modifiedGlb = await textureProjection.projectImage({
glbUrl: 'https://example.com/model.glb',
editedImageUrl: editedImageUrl,
uvMapUrl: renderResult.uvMapUrl,
uvMapWidth: renderResult.uvMapWidth,
uvMapHeight: renderResult.uvMapHeight,
meshInfo: renderResult.meshInfo // Pass through for name-based texture alignment
});
POST / (Texture Project)
Project one or more 2D images onto a GLB model's texture using GPU-accelerated UV rendering with edge smoothing and texture dilation. Supports single or multi-projection (multiple images from different cameras applied sequentially in one call).
Camera position can be provided in the request JSON per projection, or extracted automatically from PNG metadata (wearfits-projection tEXt chunk embedded by the browser texture painter). JSON takes precedence over PNG metadata.
Headers:

Authorization: Bearer <API_KEY>
Content-Type: application/json
Request (single projection — backward compatible):
{
"glb_url": "https://example.com/model.glb",
"projection_url": "https://example.com/edited-view.png",
"camera_position": [0, 0, 3],
"camera_target": [0, 0, 0],
"fov": 45,
"width": 1024,
"height": 1024
}
Request (multi-projection):
{
"glb_url": "https://example.com/model.glb",
"projections": [
{
"image_url": "https://example.com/front-edit.png",
"camera_position": [0, 0, 3],
"camera_target": [0, 0, 0],
"fov": 45
},
{
"image_url": "https://example.com/back-edit.png",
"camera_position": [0, 0, -3],
"camera_target": [0, 0, 0],
"fov": 45
},
{
"image_url": "https://example.com/side-edit-with-metadata.png"
}
],
"width": 1024,
"height": 1024,
"strip_pbr": true
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `glb_url` | string | Yes | - | URL to GLB file (or ZIP containing GLB) |
| `projection_url` | string | No* | - | Single projection image URL (backward compat) |
| `projections` | array | No* | - | Multi-projection array (see below) |
| `camera_position` | array | No | `[0,0,3]` | Camera [x,y,z] for single projection |
| `camera_target` | array | No | `[0,0,0]` | Camera look-at for single projection |
| `fov` | number | No | `45` | FOV for single projection |
| `width` | integer | No | `1024` | Render width |
| `height` | integer | No | `1024` | Render height |
| `uv_supersample` | integer | No | `2` | UV map supersampling factor |
| `falloff_start_angle` | number | No | `70` | Edge falloff start angle (degrees) |
| `falloff_end_angle` | number | No | `85` | Edge falloff end angle (degrees) |
| `dilation_iterations` | integer | No | `8` | Texture dilation passes |
| `strip_pbr` | boolean | No | `true` | Remove PBR textures for flat look |
*Either `projection_url` or `projections` is required.
Projection object fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `image_url` | string | Yes | - | URL to RGBA PNG projection image |
| `camera_position` | array | No | From PNG metadata | Camera [x,y,z] position |
| `camera_target` | array | No | From PNG metadata | Camera look-at target |
| `fov` | number | No | From PNG metadata, then `45` | Field of view |
Camera resolution order: JSON fields → PNG `wearfits-projection` tEXt metadata → skip projection.
Response:
{
"status": "completed",
"glb_url": "https://api.wearfits.com/files/signed?key=...",
"view_url": "https://api.wearfits.com/files/signed?key=...",
"projections": [
{"index": 0, "status": "success", "projected_pixels": 45230},
{"index": 1, "status": "success", "projected_pixels": 38100},
{"index": 2, "status": "skipped", "reason": "No camera position in request or PNG metadata"}
],
"total_projected_pixels": 83330,
"processing_time_ms": 8500
}
Per-projection status values:
| Status | Description |
|---|---|
| `success` | Projection applied; `projected_pixels` shows count |
| `skipped` | No camera found or no image URL; `reason` explains why |
| `error` | Projection failed (download error, rendering error); `reason` explains why |
PNG Camera Metadata Format:
The browser texture painter embeds camera state in PNG tEXt chunks with key wearfits-projection:
{
"version": 1,
"camera": {
"position": [x, y, z],
"target": [x, y, z],
"fov": 45
},
"resolution": 1024,
"timestamp": "2026-01-27T12:00:00.000Z"
}
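The tEXt chunk can be read without an image library by walking PNG chunks directly; a stdlib-only sketch (assumes an uncompressed tEXt chunk, as described above, rather than zTXt/iTXt):

```python
import json
import struct
import zlib


def read_projection_metadata(png: bytes):
    """Scan PNG chunks for a tEXt entry keyed 'wearfits-projection'."""
    assert png[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    pos = 8
    while pos < len(png):
        length, ctype = struct.unpack_from(">I4s", png, pos)
        data = png[pos + 8 : pos + 8 + length]
        if ctype == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            if keyword == b"wearfits-projection":
                return json.loads(text.decode("latin-1"))
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return None


def _chunk(ctype: bytes, data: bytes) -> bytes:
    """Assemble a PNG chunk: length, type, data, CRC over type+data."""
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(
        ">I", zlib.crc32(ctype + data)
    )


# Minimal synthetic PNG (signature + tEXt + IEND) for demonstration only.
meta = {"version": 1, "camera": {"position": [0, 0, 3], "target": [0, 0, 0], "fov": 45}}
payload = b"wearfits-projection\x00" + json.dumps(meta).encode()
png = b"\x89PNG\r\n\x1a\n" + _chunk(b"tEXt", payload) + _chunk(b"IEND", b"")
```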
POST /resize (Image Resize)
Resize and convert image format. Useful for normalizing person photos and garment images.
Request:
{
"image_url": "https://example.com/photo.jpg",
"max_width": 1024,
"max_height": 1024,
"format": "webp",
"quality": 85,
"fit": "contain",
"upload": true
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `image_url` | string | Yes | - | URL of image to resize |
| `max_width` | integer | No | `1024` | Maximum output width |
| `max_height` | integer | No | `1024` | Maximum output height |
| `format` | string | No | `"webp"` | Output format: `"webp"`, `"png"`, `"jpg"` |
| `quality` | integer | No | `85` | Quality for lossy formats (1-100) |
| `fit` | string | No | `"contain"` | Resize mode (see below) |
| `upload` | boolean | No | `true` | Upload to R2 or return base64 |
Fit Modes:
| Mode | Description |
|---|---|
| `contain` | Fit within box, maintain aspect ratio, don't upscale |
| `cover` | Fill box, crop excess, maintain aspect ratio |
| `exact` | Exact size (may distort aspect ratio) |
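The three modes reduce to a small dimension calculation; a sketch of the presumed behavior (the actual logic lives in image_utils.py and may round differently):

```python
def fit_dimensions(src_w: int, src_h: int, max_w: int, max_h: int, fit: str = "contain"):
    """Compute output size for the documented fit modes.

    contain: fit inside the box, keep aspect ratio, never upscale.
    cover:   fill the box (excess is cropped afterwards), keep aspect ratio.
    exact:   force the box size, possibly distorting.
    """
    if fit == "exact":
        return max_w, max_h
    scale_w, scale_h = max_w / src_w, max_h / src_h
    if fit == "contain":
        scale = min(scale_w, scale_h, 1.0)  # cap at 1.0: don't upscale
    elif fit == "cover":
        scale = max(scale_w, scale_h)
    else:
        raise ValueError(f"unknown fit mode: {fit}")
    return round(src_w * scale), round(src_h * scale)
```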
Response (upload=true):
{
"status": "completed",
"url": "r2://wf-genai-results/resize-abc123/result.webp",
"width": 1024,
"height": 768,
"format": "webp",
"processing_time_ms": 450
}
Response (upload=false):
{
"status": "completed",
"base64": "/9j/4AAQSkZJRgABAQAA...",
"data_url": "data:image/webp;base64,/9j/4AAQSkZJRgABAQAA...",
"width": 1024,
"height": 768,
"format": "webp",
"processing_time_ms": 450
}
POST /compose-grid (Garment Grid)
Compose multiple garment images into a labeled grid for AI try-on. Images can be provided as HTTP/HTTPS URLs or base64 data URLs.
Request:
{
"rows": [
{
"label": "TOP GARMENT",
"images": [
"https://example.com/shirt-front.jpg",
"https://example.com/shirt-back.jpg"
]
},
{
"label": "BOTTOM GARMENT",
"images": ["https://example.com/pants.jpg"]
},
{
"label": "SHOES",
"images": ["https://example.com/shoes.jpg"]
}
],
"max_cell_width": 1024,
"max_cell_height": 1024,
"min_cell_size": 512,
"format": "webp",
"quality": 85,
"upload": true
}
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `rows` | array | Yes | - | List of row definitions |
| `rows[].label` | string | Yes | - | Row label (e.g., "TOP GARMENT") |
| `rows[].images` | array | Yes | - | Image URLs (1-2 per row) |
| `max_cell_width` | integer | No | `1024` | Max width per cell |
| `max_cell_height` | integer | No | `1024` | Max height per cell |
| `min_cell_size` | integer | No | `512` | Min size for shorter edge |
| `format` | string | No | `"webp"` | Output format |
| `quality` | integer | No | `85` | Quality for lossy formats |
| `upload` | boolean | No | `true` | Upload to R2 or return base64 |
Standard Labels:
| Label | Use For |
|---|---|
| `TOP GARMENT` | Shirts, blouses, jackets, sweaters |
| `BOTTOM GARMENT` | Pants, skirts, shorts |
| `FULL BODY GARMENT` | Dresses, jumpsuits, rompers |
| `SHOES` | Any footwear |
Grid Layout:
┌─────────────────────────────────────────────┐
│ TOP GARMENT │
├─────────────────┬───────────────────────────┤
│ [image 1] │ [image 2] │
├─────────────────┴───────────────────────────┤
│ BOTTOM GARMENT │
├─────────────────┬───────────────────────────┤
│ [image 1] │ │
├─────────────────┴───────────────────────────┤
│ SHOES │
├─────────────────┬───────────────────────────┤
│ [image 1] │ │
└─────────────────┴───────────────────────────┘
Response:
{
"status": "completed",
"url": "r2://wf-genai-results/grid-abc123/result.webp",
"width": 2088,
"height": 1600,
"rows": 3,
"format": "webp",
"processing_time_ms": 1200
}
GET /health
Health check endpoint (no auth required).
Response:
Deployment
Prerequisites
- Modal CLI installed and authenticated to the wearfits workspace.
  Important: Secrets are workspace-specific. Verify you're in the wearfits workspace:
  - Dashboard: https://modal.com/apps/wearfits/main/deployed
  - CLI: `python -m modal profile list` should show `wearfits` as active
- Modal secret `wearfits-r2` with R2 credentials
- Modal secret `wearfits-api` with API key
Deploy

From tools/pose-transfer, run `modal deploy modal_app.py`.
Output:
✓ Created web endpoint => https://wearfits--v1-pose-transfer.modal.run
✓ Created web endpoint => https://wearfits--v1-render-glb.modal.run
✓ Created web endpoint => https://wearfits--v1-texture-render.modal.run
✓ Created web endpoint => https://wearfits--v1-texture-project.modal.run
✓ Created web endpoint => https://wearfits--v1-image-resize.modal.run
✓ Created web endpoint => https://wearfits--v1-compose-grid.modal.run
✓ Created web endpoint => https://wearfits--v1-health.modal.run
Test Deployment
# Health check
curl https://wearfits--v1-health.modal.run
# Pose transfer
curl -X POST https://wearfits--v1-pose-transfer.modal.run \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d '{
"source_glb_url": "https://example.com/body.glb",
"pose_id": "standing_arms_down"
}'
# Image resize
curl -X POST https://wearfits--v1-image-resize.modal.run \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d '{
"image_url": "https://example.com/photo.jpg",
"max_width": 1024,
"format": "webp"
}'
# Garment grid
curl -X POST https://wearfits--v1-compose-grid.modal.run \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d '{
"rows": [
{"label": "TOP GARMENT", "images": ["https://example.com/shirt.jpg"]}
]
}'
# Render GLB (6-angle grid)
curl -X POST https://wearfits--v1-render-glb.modal.run \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d '{
"glb_url": "https://example.com/model.glb",
"width": 512,
"height": 512,
"format": "webp"
}'
# Texture render (for projection workflow)
curl -X POST https://wearfits--v1-texture-render.modal.run \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d '{
"glb_url": "https://example.com/model.glb",
"camera_position": [0, 0, 3],
"camera_target": [0, 0, 0],
"width": 1024,
"height": 1024
}'
# Texture project (single projection)
curl -X POST https://wearfits--v1-texture-project.modal.run \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d '{
"glb_url": "https://example.com/model.glb",
"projection_url": "https://example.com/edited-view.png",
"camera_position": [0, 0, 3],
"camera_target": [0, 0, 0],
"fov": 45
}'
# Texture project (multi-projection)
curl -X POST https://wearfits--v1-texture-project.modal.run \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <API_KEY>" \
-d '{
"glb_url": "https://example.com/model.glb",
"projections": [
{"image_url": "https://example.com/front.png", "camera_position": [0,0,3], "camera_target": [0,0,0], "fov": 45},
{"image_url": "https://example.com/back.png", "camera_position": [0,0,-3], "camera_target": [0,0,0], "fov": 45}
]
}'
View Logs

Stream logs with `modal app logs wearfits-tools`.
Files
| File | Description |
|---|---|
| `modal_app.py` | Modal app definition with all web endpoints (CPU + GPU workers) |
| `pose_transfer.py` | Core pose transfer logic using MHR |
| `pymomentum_fitting.py` | Meta's PyMomentum-based fitting (hierarchical IK solver) |
| `mask_lookup.py` | Body mask lookup from measurements (BodyM dataset) |
| `gpu_uv_renderer.py` | GPU UV map + textured view rendering (ModernGL/EGL, NVIDIA T4 via EGL ICD). Functions accept `ctx` param for context reuse. |
| `projection_pipeline.py` | Headless texture projection pipeline (multi-projection, shared GL context, PNG metadata) |
| `texture_enhance_pipeline.py` | AI texture enhancement: render grid (4 or 6 views), enhance with AI, validate/fix swaps, project back. Per-step timing logged. |
| `color_correction.py` | Polynomial color correction (Finlayson 2015) + luminance/white boost |
| `texture_projection.py` | GLB loading, UV map rendering, view rendering for texture projection |
| `file_assets.py` | Asset file paths for fitting |
| `render_utils.py` | Mesh rendering utilities (multi-angle grids) |
| `image_utils.py` | Image processing utilities (resize, format conversion) |
| `http_utils.py` | HTTP download utilities with retry logic |
| `test_projection_pipeline.py` | Tests for projection pipeline (local + Modal GPU) |
| `poses/` | Cached pose files (`.npz` format) |
| `assets/` | Fitting assets (`head_hand_mask.npz`) |
Cached Poses
Pre-extracted poses in poses/ directory:
| Pose ID | Description |
|---|---|
| `default` | Alias for system default pose (configurable via `DEFAULT_POSE_ID`, currently `girl_pose`) |
| `standing_arms_down` | Natural standing with arms relaxed at sides |
| `man_pose` | Male standing pose |
| `girl_pose` | Female standing pose |
| `shoe_girl_pose` | Standing looking down at shoes, one leg forward; ideal for shoe try-on |
Add New Pose from Image (Recommended)
Use the helper script to create a pose template directly from an image:
npx tsx scripts/create-pose-template.ts <image_path> <pose_id> [description]
# Example:
npx tsx scripts/create-pose-template.ts assets/test/my-pose.jpg my_pose "Standing with hands on hips"
The script will:
1. Upload the image to fal.ai
2. Extract body mesh using SAM3D
3. Cache the pose via Modal and save the .npz file
Then redeploy: modal deploy modal_app.py
Add New Pose Manually
If you already have a GLB file with the desired pose:
- Cache via Modal `v1-cache-pose` and save to `poses/`, using the same curl + Python snippet shown under "POST / (Cache Pose)" above
- Commit the new `.npz` file to `poses/`
- Redeploy: `modal deploy modal_app.py`
Local Development
Modal Serve (Recommended)
cd tools/pose-transfer
modal serve modal_app.py
# Creates temporary endpoints: https://wearfits--v1-{endpoint}-dev.modal.run
This uses the same container image as production with all dependencies.
CLI (Requires Local Dependencies)
Local CLI requires conda dependencies that are difficult to install outside Modal:
# Requires conda/mamba environment with:
# - pymomentum-cpu (from conda-forge)
# - ezc3d, assimp, urdfdom, suitesparse
python pose_transfer.py \
--source body.glb \
--pose standing_arms_down \
--output posed.glb \
--render
python pose_transfer.py --list-poses
Environment Variables
Set via Modal secrets:
| Secret | Variable | Description |
|---|---|---|
| `wearfits-r2` | `R2_ENDPOINT_URL` | Cloudflare R2 S3-compatible endpoint |
| `wearfits-r2` | `R2_ACCESS_KEY_ID` | R2 access key |
| `wearfits-r2` | `R2_SECRET_ACCESS_KEY` | R2 secret key |
| `wearfits-r2` | `R2_BUCKET_NAME` | R2 bucket name for results |
| `wearfits-api` | `POSE_TRANSFER_API_KEY` | API key for authentication |
Integration with WEARFITS API
The Cloudflare Worker calls Modal endpoints via service classes:
// In wrangler.jsonc
"POSE_TRANSFER_API_URL": "https://wearfits--v1-pose-transfer.modal.run"
"POSE_TRANSFER_API_KEY": "<set via wrangler secret>"
// Pose Transfer Service
import { createPoseTransferService } from './services/pose-transfer-service';
const poseTransfer = createPoseTransferService(env);
await poseTransfer.applyPose({ sourceGlbUrl, poseId: 'standing_arms_down' });
// Image Processing Service
import { createImageProcessingService } from './services/image-processing-service';
const imageService = createImageProcessingService(env);
await imageService.resize(imageUrl, { maxWidth: 1024, format: 'webp' });
await imageService.composeGrid(rows, { format: 'webp' });
The API then resolves r2:// URLs to signed public URLs using the R2 storage service.
Image Processing Details
Resolution Standards
| Image Type | Resolution | Format |
|---|---|---|
| Silhouette visualization | 1024×1024 | WebP |
| Person/selfie (normalized) | 1024×1024 box | WebP |
| Garment grid | ~2048×2048 max | WebP |
| Individual garments | 512-1024 shorter edge | WebP |
WebP Benefits
- 60-70% smaller file size compared to PNG
- Lossy and lossless compression support
- Transparency support (unlike JPEG)
- Fast decoding in modern browsers
MHR Model Details
The pose transfer manipulates MHR parameters using PyMomentum's hierarchical IK solver:
| Parameter | Size | Source |
|---|---|---|
| `identity_coeffs` | 45 | Preserved from source (body shape) |
| `lbs_model_params` | 204 | From reference/cache (pose) |
| `face_expr_coeffs` | 72 | Preserved from source (expression) |
Fitting Stages:
1. Stage 0: Face rigid transformation
2. Stage 1.0: Face identity
3. Stage 1.1: Face expression
4. Stage 2: Body rigid transform
5. Stage 3: Torso and limb roots
6. Stage 4: Full limbs (excluding hands)
7. Stage 5: All parameters
Input format: GLB from SAM3D (geometry_0 = body mesh, 18439 vertices at LOD1)
Troubleshooting
GPU Rendering (NVIDIA EGL)
The gpu_image in modal_app.py includes an NVIDIA EGL ICD config file at /usr/share/glvnd/egl_vendor.d/10_nvidia.json. This is critical — without it, libglvnd only finds Mesa's EGL → llvmpipe software rendering (100-1000x slower). Modal mounts NVIDIA drivers at runtime but the container needs the ICD JSON to discover them. On startup, GPURenderWorker.setup() logs GL_RENDERER — verify it shows Tesla T4, not llvmpipe.
GL contexts are shared across multiple renders via the ctx parameter on render_lit_view_transparent(), render_view_and_uv_map_gpu(), and _render_result_view(). The texture enhancement pipeline creates one context for all view renders and one for all projection renders, avoiding redundant context creation/destruction.
Texture Enhancement 524 Resilience
The v1-texture-enhance-glb endpoint sits behind Cloudflare's proxy on modal.run (which has a ~100s origin timeout). For 6-view mode, the total pipeline time (~150s) exceeds this limit, causing Cloudflare to return HTTP 524. However, Modal continues executing and uploads results to R2.
The worker handles this via R2 polling:
1. Worker generates a job_id and passes it to Modal in the request body
2. Modal uses this job_id for R2 paths and writes a status.json to pose-transfer/texture-enhance-{job_id}-status/result.json after completion
3. On 524 or timeout, the worker polls R2 for status.json every 10s
4. status.json contains all result URLs (enhanced GLB, grid input/output, per-view URLs, timing, validation)
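The steps above can be sketched as a small polling helper; an illustrative sketch, assuming `status_url` is a resolvable (signed) URL for the status.json object and that a 404 simply means the job is still running (the real worker uses its R2 service bindings, not plain HTTP):

```python
import json
import time
import urllib.error
import urllib.request


def poll_texture_enhance_status(status_url: str, interval_s: int = 10,
                                timeout_s: int = 300) -> dict:
    """Poll R2 for the status.json a texture-enhance job writes on completion.

    Hypothetical helper: retries every interval_s seconds until the object
    appears (404 = still running) or timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(status_url) as resp:
                return json.load(resp)  # result URLs, timing, validation
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # a real error, not "still running"
        time.sleep(interval_s)
    raise TimeoutError("texture-enhance job did not publish status.json in time")
```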
Cold Start Slow (~25s)
First request after idle loads the MHR model and initializes PyMomentum. Subsequent requests are fast (~8s). The `scaledown_window=60` setting keeps containers warm for 60 seconds between requests.
"pymomentum-cpu not found"
Ensure pymomentum-cpu is in the micromamba_install list in modal_app.py (not pip_install).
"FBX file not found"
MHR assets not downloaded correctly. Check the run_commands in modal_app.py downloads and unzips to /root/assets/.
R2 Upload Fails
Check Modal secrets are correctly configured:
v1-body-mask-from-size will return 503 if R2 is missing because SAM3D requires mask URLs.
401 Unauthorized
Missing or invalid API key. Ensure Authorization: Bearer <key> header is included.
Image Processing Errors
- "image_url is required" - Missing required parameter
- "Failed to fetch" - Image URL not accessible from Modal servers
- "No valid images found" - All image URLs failed to load