This tool demonstrates the kind of visual understanding that advanced systems such as Sora 2 depend on. Upload an image and provide candidate labels for zero-shot classification. Note: Running Sora 2 (text-to-video) directly in the browser via transformers.js is not currently feasible because of its computational cost. Instead, the tool uses a lightweight CLIP model (Xenova/clip-vit-base-patch32) for fast visual analysis.
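For readers curious how CLIP-style zero-shot classification turns an image and a list of labels into scores, the sketch below illustrates the general idea: compare the image embedding with each label's text embedding by cosine similarity, scale the similarities, and apply a softmax. It is not this tool's actual code. The 3-dimensional embeddings are invented toy values (real CLIP embeddings have hundreds of dimensions), the `logitScale` default of 100 is an assumption about the model's temperature, and the helper names (`normalize`, `dot`, `softmax`, `zeroShotScores`) are hypothetical.

```typescript
// Toy illustration of CLIP-style zero-shot scoring (not the app's actual code).

/** Scale a vector to unit length so a dot product equals cosine similarity. */
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / norm);
}

/** Dot product of two equal-length vectors. */
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

/** Numerically stable softmax: subtract the max logit before exponentiating. */
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

interface LabelScore {
  label: string;
  score: number;
}

/**
 * Score each candidate label against one image embedding.
 * logitScale = 100 is an assumed temperature, not a value read from the model.
 */
function zeroShotScores(
  imageEmbedding: number[],
  textEmbeddings: number[][],
  labels: string[],
  logitScale: number = 100
): LabelScore[] {
  const img = normalize(imageEmbedding);
  const logits = textEmbeddings.map((t) => logitScale * dot(img, normalize(t)));
  const probs = softmax(logits);
  return labels
    .map((label, i) => ({ label, score: probs[i] }))
    .sort((a, b) => b.score - a.score); // highest score first
}

// Hypothetical toy embeddings, chosen so the image sits closest to "cat".
const labels = ["cat", "dog", "car"];
const imageEmbedding = [0.9, 0.1, 0.0];
const textEmbeddings = [
  [1.0, 0.0, 0.0], // "cat"
  [0.0, 1.0, 0.0], // "dog"
  [0.0, 0.0, 1.0], // "car"
];

const results = zeroShotScores(imageEmbedding, textEmbeddings, labels);
console.log(results);
```

In a real run, the embeddings would come from the CLIP image and text encoders rather than being typed in by hand. In transformers.js, the `zero-shot-image-classification` pipeline handles the encoding and scoring internally and returns a similar list of label and score pairs.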
1. Settings
2. Input Image & Labels
3. Analysis Result