-
The open source multimodal behemoth is here! Meta will launch the Llama 3 405B model on July 23
Meta is about to make something big again! They are about to release an open source language model called Llama3405B, which is not only their largest model to date, but also the largest open source language model in history. This behemoth, with a staggering 405 billion parameters, can shuttle between images and text, completely subverting the old calendar that can only process text. Key points: Meta will release Llama3405B, a multimodal model with 405 billion parameters, on July 23. The decision to open source Llama3405B and its weights may completely…- 5.8k
-
Google launches multimodal VLOGGER AI: making static portraits move and "talk"
Google recently published a blog post on its GitHub page, introducing the VLOGGER AI model. Users only need to input a portrait photo and an audio content, and the model can make these characters "animate" and read the audio content with facial expressions. VLOGGER AI is a multimodal diffusion model suitable for virtual portraits. It is trained using the MENTOR database, which contains more than 800,000 portraits and more than 2,200 hours of videos, allowing VLOGGER to generate different...- 2.6k
-
Back in the game! Gemini-Pro's multimodal capabilities are on par with GPT-4V
The recent Gemini-Pro evaluation report shows that it has made significant progress in the multimodal field, comparable to GPT-4V, and even better in some aspects. First, in the comprehensive performance on the multimodal proprietary benchmark MME, Gemini-Pro surpassed GPT-4V with a high score of 1933.4, showing its comprehensive advantages in perception and cognition. Among the 37 visual understanding tasks, Gemini-Pro performed outstandingly in tasks such as text translation, color/landmark/person recognition, and OCR, showing its excellent ability in the basic perception field. …- 2.8k
-
Gemini: An AI assistant developed by Google for writing, planning, and learning
Gemini is a new generation of artificial intelligence system launched by Google DeepMind. It is capable of multimodal reasoning and supports seamless interaction between text, images, video, audio, and code.Gemini has surpassed its previous state in many areas such as language comprehension, reasoning, math, and programming, making it one of the most powerful AI systems to date. Gemini is available in three different sizes to meet a wide range of needs, from edge computing to the cloud, and can be used in a wide range of applications, including creative design, writing assistance, question answering, code generation, and more. Gemini Features Writing Assistant: Ge...- 2k
❯
Search
Scan to open current page
Top
Checking in, please wait
Click for today's check-in bonus!
You have earned {{mission.data.mission.credit}} points today!
My Coupons
-
¥CouponsLimitation of useExpired and UnavailableLimitation of use
before
Limitation of usePermanently validCoupon ID:×Available for the following products: Available for the following products categories: Unrestricted use:Available for all products and product types
No coupons available!
Unverify
Daily tasks completed: