Uses Gemini's multimodal capabilities to understand and edit images via natural language. The model takes the source image and a text prompt describing the desired edit, then generates a new image with the changes applied.
Semantic masking: Rather than supplying a precise pixel mask, describe what to change in your prompt. The model understands context and can target specific regions.
Optional mask images: You can still provide a mask image (white = edit area) as a visual hint, but it's not required. Descriptive prompts often work better.
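As a rough illustration of the inputs described above (source image, text prompt, optional mask), the sketch below assembles a JSON request body in the shape of the public Gemini REST `generateContent` format. It is not taken from the skill's code. The function name `build_edit_request`, the PNG MIME type, and the wording of the mask hint are all assumptions made for this example, and no network call is made.

```python
import base64
import json
from typing import Optional


def build_edit_request(image_png: bytes, prompt: str,
                       mask_png: Optional[bytes] = None) -> dict:
    """Assemble a generateContent-style body for an image edit (hypothetical helper)."""

    def inline(data: bytes) -> dict:
        # Images travel as base64 text inside an inline_data part.
        return {
            "inline_data": {
                "mime_type": "image/png",  # assumption: inputs are PNG
                "data": base64.b64encode(data).decode("ascii"),
            }
        }

    # The prompt carries the semantic description of the edit.
    parts = [{"text": prompt}, inline(image_png)]

    if mask_png is not None:
        # Optional visual hint; this hint wording is an assumption.
        parts.append({"text": "Mask image: white marks the area to edit."})
        parts.append(inline(mask_png))

    return {"contents": [{"parts": parts}]}


if __name__ == "__main__":
    # Placeholder bytes stand in for real PNG data.
    body = build_edit_request(b"<png bytes>", "Replace the red car with a blue bicycle")
    print(json.dumps(body, indent=2))
```

With no mask, the body has two parts (prompt and image). Passing `mask_png` appends a short text hint and the mask as a second image, which mirrors the "visual hint, not required" role described above.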
This skill should be used when the user asks to "edit an image", "modify a photo", "inpaint", "outpaint", "extend an image", "replace object in image", "add element to image", "resize image for social media", "crop image", "adapt image for Twitter", "convert image to OG format", or needs AI-powered image editing with masks. Source: b-open-io/gemskills.