Google has released the MediaPipe LLM Inference API, which makes it easier for developers to run large language models locally on phones, PCs, and other devices. Google has focused on optimizing the on-device stack, including new operations, quantization, caching, and weight sharing. MediaPipe currently supports four models, Gemma, Phi-2, Falcon, and Stable LM, running on web, Android, and iOS; Google plans to extend support to more platforms.
Demo code:
https://github.com/googlesamples/mediapipe/tree/main/examples/llm_inference
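The demo repository above includes a web example. As a rough illustration of how the browser-side API is used, here is a minimal sketch assuming the @mediapipe/tasks-genai npm package and a locally hosted, quantized Gemma model file; the model path and option values are illustrative, not taken from the article.

```typescript
// Minimal sketch of on-device text generation with the MediaPipe LLM Inference
// API on the web. Assumes the @mediapipe/tasks-genai npm package and a Gemma
// model bundle served alongside the page; names and options are illustrative.
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

async function runLocalLlm(prompt: string): Promise<string> {
  // Load the WASM assets that back the GenAI tasks.
  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
  );

  // Create the inference task, pointing at a locally hosted model file
  // (e.g. a 4-bit quantized Gemma 2B bundle; this path is hypothetical).
  const llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: '/models/gemma-2b-it-gpu-int4.bin' },
    maxTokens: 512,   // combined prompt and response token budget
    topK: 40,
    temperature: 0.8,
  });

  // Generation runs fully on-device; the completed text is returned.
  return llm.generateResponse(prompt);
}

runLocalLlm('Summarize the benefits of on-device inference in two sentences.')
  .then((text) => console.log(text));
```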