January 15, 2011 - Dark Side of the Moon today released the Kimi MultimodalityImage Understanding Model APIThe new multimodal picture comprehension model moonshot-v1-vision-preview(hereinafter referred to as "Vision model") completes the multimodal capabilities of the moonshot-v1 model family.
Description of model capabilities
Image Recognition
Vision models are equipped with image recognition capabilities, recognizing complex details and nuances in images, be it food or animals, and being able to distinguish between similar but not identical objects.
In the example below, 16 similar images of blueberry muffins and chihuahuas that are harder for the human eye to distinguish have been officially pieced together, with the Vision model recognizing and labeling the image types in order.Whether it's a blueberry muffin or a Chihuahua, the model can accurately differentiate and identify the.
Text recognition and comprehension
Vision models have advanced image recognition capabilities that are more accurate than ordinary document scanning and OCR recognition software in OCR text recognition and image understanding scenarios.Handwritten scribbles such as receipts / courier bills can be accurately recognized..
Taking this bar chart of "A student's final exam results" as an example, the official asked the model to extract and analyze the exam results and analyze the bar chart from the perspective of aesthetic style. The Vision model is also able to accurately identify the score values corresponding to each subject name in the bar chart and do a comparison of the scores, and at the same time, it can identify the style formatting and color of the bar chart.
model billing
Vision models are billed on a per-volume basisThe price of the model call varies according to the model selected, with the following distinctions:
Model | Billing Unit | price |
moonshot-v1-8k-vision-preview | 1M tokens | ¥12.00 |
moonshot-v1-32k-vision-preview | 1M tokens | ¥24.00 |
moonshot-v1-128k-vision-preview | 1M tokens | ¥60.00 |
-
many rounds of dialogue
-
streaming output
-
Tool Call
-
JSON Mode
-
Partial Mode
-
Internet search: not supported
-
Context Caching:Creating a Context Cache with image content is not supported.The Vision model can be called with a Cache that has already been created.
-
URL-formatted images: not supported, currently only base64-encoded image content is supported
-
Support for organizational project management functions
-
Support for one business entity to authenticate multiple accounts
-
Add File file resource management function: intuitively manage and view file resources.
-
Optimize mouse hover copy for resource management list
-
Context Caching has been released to full users.
-
Cache renewals are no longer charged for creation