Apple's AIM autoregressive vision model confirms that performance scales with model size

Researchers at Apple have pre-trained image models using autoregressive image modeling (AIM). AIM can effectively exploit large amounts of uncurated image data, and its training methodology and stability are similar to those of recent large language models (LLMs). This observation is consistent with previous findings on scaling large language models.

Although the models used in the paper's experiments are limited in size, whether this scaling law still holds at larger parameter scales remains to be explored. The researchers' pre-training objective follows the standard autoregressive formulation applied to sequences of image patches, and a series of experiments verifies that model capacity can readily be scaled up to billions of parameters while maintaining strong performance on downstream tasks.
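To make the objective concrete, the sketch below shows one minimal way an autoregressive objective over image patches can be set up: patches are ordered in a fixed raster sequence, a causal mask restricts each position to earlier patches, and the model regresses the next patch's pixel values. This is an illustrative assumption, not Apple's released code; names such as `PatchAutoregressor`, the patch size, and the regression loss are hypothetical choices.

```python
# Minimal sketch of autoregressive pre-training on image patch sequences.
# Assumptions: PyTorch, a small causal Transformer, pixel-regression loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchAutoregressor(nn.Module):
    def __init__(self, patch_dim: int, num_patches: int, d_model: int = 256,
                 n_heads: int = 4, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)                  # patch pixels -> token
        self.pos = nn.Parameter(torch.zeros(1, num_patches, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, patch_dim)                   # predicts next patch pixels
        self.num_patches = num_patches

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_dim), flattened in raster order.
        x = self.embed(patches) + self.pos
        # Causal mask: patch i may only attend to patches 0..i (autoregressive).
        mask = torch.triu(torch.full((self.num_patches, self.num_patches),
                                     float("-inf"), device=patches.device), diagonal=1)
        x = self.encoder(x, mask=mask)
        return self.head(x)


def autoregressive_loss(model: PatchAutoregressor, patches: torch.Tensor) -> torch.Tensor:
    # Predict patch t+1 from patches up to t; regress directly on pixel values.
    pred = model(patches)
    return F.mse_loss(pred[:, :-1], patches[:, 1:])


if __name__ == "__main__":
    # Example: a 224x224 RGB image split into 16x16 patches -> 196 patches of dim 768.
    batch, num_patches, patch_dim = 2, 196, 16 * 16 * 3
    model = PatchAutoregressor(patch_dim, num_patches)
    dummy = torch.randn(batch, num_patches, patch_dim)
    loss = autoregressive_loss(model, dummy)
    loss.backward()
    print(f"autoregressive patch loss: {loss.item():.4f}")
```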


In addition, the researchers examined multiple aspects of training ViT models with autoregressive objectives and revisited prior work. Their experiments show that, throughout training, improvements in the pre-training objective translate directly into better downstream performance, and that both the pre-training loss and downstream task accuracy improve as model capacity increases. This mirrors the trend observed in LLMs.

Among AIM's design choices, beyond scaling the model width, the researchers deliberately adopted a simple design that uses multi-layer perceptron (MLP) blocks to process each patch independently. They also emphasize that the scale of the models studied is limited, and that validating this scaling behavior on models with larger parameter counts is left for further exploration.
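The sketch below illustrates what "MLP blocks that process each patch independently" can mean in practice: every operation acts only along the feature dimension, so no information is exchanged between patch positions. The class names (`MLPBlock`, `PatchwiseMLPHead`) and sizes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an MLP head whose blocks act on each patch token independently.
# All layers operate on the feature dimension only, so patches never mix (assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLPBlock(nn.Module):
    def __init__(self, dim: int, hidden_mult: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, hidden_mult * dim)
        self.fc2 = nn.Linear(hidden_mult * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim); LayerNorm/Linear apply per patch position.
        return x + self.fc2(F.gelu(self.fc1(self.norm(x))))


class PatchwiseMLPHead(nn.Module):
    def __init__(self, dim: int, patch_dim: int, n_blocks: int = 2):
        super().__init__()
        self.blocks = nn.Sequential(*[MLPBlock(dim) for _ in range(n_blocks)])
        self.out = nn.Linear(dim, patch_dim)   # maps each token back to patch pixels

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.out(self.blocks(tokens))


if __name__ == "__main__":
    head = PatchwiseMLPHead(dim=256, patch_dim=16 * 16 * 3)
    tokens = torch.randn(2, 196, 256)          # (batch, patches, features) from the trunk
    print(head(tokens).shape)                  # torch.Size([2, 196, 768])
```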

The paper's experimental results show that vision models also follow the pattern of "more parameters, stronger performance": autoregressive training scales well for image models and can meet the requirements of learning visual features. This offers a new research direction for improving and optimizing image models in the future.
