Dark Side of the Moon's Kimi Open Platform Launches "Context Caching" Public Beta: Long-Text Model Costs Cut by Up to 90%

Yesterday, the Kimi Open Platform under Dark Side of the Moon announced that Context Caching has entered public beta. Without changing the API price, this technology can cut developers' cost of using the long-text flagship models by up to 90%, while significantly improving the model's response speed.

Context Caching is an efficient data-management technique that lets the system pre-store large amounts of data or information likely to be requested frequently. When the same information is requested again, the system can serve it directly from the cache instead of recomputing it or retrieving it from the original source, saving both time and resources. Context Caching is particularly well suited to scenarios with frequent requests that repeatedly reference a large, fixed initial context, where it can significantly reduce the cost of long-text models and improve efficiency.
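To make the workflow concrete, below is a minimal sketch in Python of the cache-then-query pattern: create a cache once for the large fixed context, then reference it in subsequent chat requests. The endpoint path, request fields, and the cache-referencing message convention shown here are assumptions for illustration, not a confirmed specification of the platform's API; consult the Kimi Open Platform documentation for the actual interface.

```python
import requests

API_BASE = "https://api.moonshot.cn/v1"  # Kimi Open Platform base URL
API_KEY = "sk-..."                       # your platform API key
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: create a cache holding the large, fixed context (e.g. a product
# manual) so later requests neither resend nor recompute it.
# NOTE: the "/caching" path and its fields are assumptions for illustration.
cache = requests.post(
    f"{API_BASE}/caching",
    headers=HEADERS,
    json={
        "model": "moonshot-v1-128k",
        "messages": [
            {"role": "system", "content": open("manual.txt").read()},
        ],
        "ttl": 3600,  # cache lifetime in seconds (hypothetical field)
    },
).json()

# Step 2: ask questions against the cached context. Only the new
# (incremental) tokens are billed at the model's original price, plus a
# small per-call fee; the cached manual is not re-billed as input.
answer = requests.post(
    f"{API_BASE}/chat/completions",
    headers=HEADERS,
    json={
        "model": "moonshot-v1-128k",
        "messages": [
            # Hypothetical convention for referencing the cache by id:
            {"role": "cache", "content": f"cache_id={cache['id']};reset_ttl=3600"},
            {"role": "user", "content": "What is the warranty period?"},
        ],
    },
).json()
print(answer["choices"][0]["message"]["content"])
```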


Specifically, context caching applies to scenarios with frequent requests that repeatedly reference a large initial context, and brings two effects:

Cost reduction of up to 90%: in scenarios where many questions are asked about a fixed document, context caching saves substantial cost. For example, with a hardware product manual of about 90,000 words, pre-sales support staff need to run many rounds of intensive Q&A within a short period; with context caching, the cost drops to about 10% of the original.

First-token latency reduced by 83%: a request to a 128k model typically takes 30 seconds to return the first token. With context caching, first-token latency drops to about 5 seconds on average, a reduction of roughly 83%.

The billing model of Context Caching consists of the following three parts:

Cache creation fee:

A call to the cache-creation API that successfully creates a cache is billed by the actual number of tokens in the cache. 24 yuan/M tokens

Cache storage fee:

While the cache is alive, a storage fee is billed per minute. 10 yuan/M tokens/minute

Cache call fee (two components):

Incremental tokens in a cache call are billed at the model's original price.

While the cache is alive, if a user requests a successfully created cache through the chat interface and the chat message content matches a live cache, a call fee is charged per call. 0.02 yuan/call
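To make the billing concrete, here is a back-of-envelope calculation in Python for the 90,000-word manual scenario, using the three prices quoted above. The manual's token count (~100k tokens), the model's original price (60 yuan per million tokens), the number of questions, and the tokens per question are assumptions for illustration, not figures from the announcement.

```python
# Cost comparison for repeated Q&A over one large fixed document.
MANUAL_TOKENS = 100_000          # assumed token count of the manual
MODEL_PRICE = 60 / 1_000_000     # assumed original price, yuan per token
CREATE_PRICE = 24 / 1_000_000    # cache creation, 24 yuan / M tokens
STORE_PRICE = 10 / 1_000_000     # storage, 10 yuan / M tokens / minute
CALL_FEE = 0.02                  # per matched cache call, yuan
N_CALLS = 100                    # assumed questions while the cache lives
MINUTES = 60                     # assumed cache lifetime used, minutes
QUESTION_TOKENS = 200            # assumed incremental tokens per question

# Without caching: the full manual is billed as input on every request.
no_cache = N_CALLS * (MANUAL_TOKENS + QUESTION_TOKENS) * MODEL_PRICE

# With caching: pay once to create, pay storage per minute, then only the
# incremental tokens plus the per-call fee on each request.
with_cache = (
    MANUAL_TOKENS * CREATE_PRICE
    + MANUAL_TOKENS * STORE_PRICE * MINUTES
    + N_CALLS * (QUESTION_TOKENS * MODEL_PRICE + CALL_FEE)
)

print(f"without cache: {no_cache:.2f} yuan")  # ~601 yuan
print(f"with cache:    {with_cache:.2f} yuan")  # ~66 yuan
print(f"ratio:         {with_cache / no_cache:.0%}")
```

Under these assumptions the cached cost comes out to roughly 11% of the uncached cost, consistent with the "about 10%" figure quoted above.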
