Kimi Multimodal Image Understanding Model API Released, 1M tokens priced from $12

January 15, 2011 - Dark Side of the Moon today released the Kimi MultimodalityImage Understanding Model APIThe new multimodal picture comprehension model moonshot-v1-vision-preview(hereinafter referred to as "Vision model") completes the multimodal capabilities of the moonshot-v1 model family.

Description of model capabilities

Image Recognition

Kimi Multimodal Image Understanding Model API Released, 1M tokens priced from $12

Vision models are equipped with image recognition capabilities, recognizing complex details and nuances in images, be it food or animals, and being able to distinguish between similar but not identical objects.

In the example below, 16 similar images of blueberry muffins and chihuahuas that are harder for the human eye to distinguish have been officially pieced together, with the Vision model recognizing and labeling the image types in order.Whether it's a blueberry muffin or a Chihuahua, the model can accurately differentiate and identify the.

Text recognition and comprehension

Vision models have advanced image recognition capabilities that are more accurate than ordinary document scanning and OCR recognition software in OCR text recognition and image understanding scenarios.Handwritten scribbles such as receipts / courier bills can be accurately recognized..

Kimi Multimodal Image Understanding Model API Released, 1M tokens priced from $12

Taking this bar chart of "A student's final exam results" as an example, the official asked the model to extract and analyze the exam results and analyze the bar chart from the perspective of aesthetic style. The Vision model is also able to accurately identify the score values corresponding to each subject name in the bar chart and do a comparison of the scores, and at the same time, it can identify the style formatting and color of the bar chart.

Kimi Multimodal Image Understanding Model API Released, 1M tokens priced from $12

model billing

Vision models are billed on a per-volume basisThe price of the model call varies according to the model selected, with the following distinctions:

Model Billing Unit price
moonshot-v1-8k-vision-preview 1M tokens ¥12.00
moonshot-v1-32k-vision-preview 1M tokens ¥24.00
moonshot-v1-128k-vision-preview 1M tokens ¥60.00
Description of model constraints
Features supported by the Vision visual model include:
  • many rounds of dialogue
  • streaming output
  • Tool Call
  • JSON Mode
  • Partial Mode
The following features are not supported or partially supported at this time:
  • Internet search: not supported
  • Context Caching:Creating a Context Cache with image content is not supported.The Vision model can be called with a Cache that has already been created.
  • URL-formatted images: not supported, currently only base64-encoded image content is supported
Other Platform Updates
  • Support for organizational project management functions
  • Support for one business entity to authenticate multiple accounts
  • Add File file resource management function: intuitively manage and view file resources.
  • Optimize mouse hover copy for resource management list
  • Context Caching has been released to full users.
  • Cache renewals are no longer charged for creation
statement:The content is collected from various media platforms such as public websites. If the included content infringes on your rights, please contact us by email and we will deal with it as soon as possible.
Information

XF Starfire Deep Reasoning Model X1 Released: The Only Nationally Produced Arithmetic Training, the First in China for Many Indicators

2025-1-15 11:06:49

Information

To outperform OpenAI GPT-4, Meta spares Llama 3 training using controversial data

2025-1-15 21:03:41

Search