
Apple Intelligence Models

Last updated on December 20, 2024

Bhuwan Bhatt

Modeling overview for the Apple foundation models

Introduction

Since its announcement at the 2024 Worldwide Developers Conference (WWDC), Apple Intelligence has made headlines across major tech news platforms and social media.

Unlike general-purpose models such as Gemini and ChatGPT, Apple Intelligence consists of several generative models that are fast, efficient, and tailored to integrate into Apple users' daily lives. These models, called Apple Foundation Models (AFMs), are optimized for tasks such as writing and refining text, summarizing notifications, generating playful images, and automating actions across apps.

In this article, we’ll explore the architecture, data practices, and optimization strategies behind Apple Intelligence. We'll also highlight how Apple balances performance with its core commitment to user privacy.

The Models Behind Apple Intelligence

Apple Intelligence operates with two core models:

  • AFM-on-device: A lightweight ~3-billion-parameter model optimized for edge devices.
  • AFM-server: A robust server-based model for more intensive tasks.

Beyond these, Apple Intelligence also includes a coding model for developers and a diffusion model for generating visual content.

AFM models follow four key responsible AI principles:

  • Empower users with intelligent tools: AI should meet user needs responsibly.
  • Represent users: Build products that represent users authentically and avoid perpetuating stereotypes and biases.
  • Design with care: Minimize potential misuse or harm.
  • Protect privacy: Treat user data with the utmost caution.

Let’s dive deeper into the architecture powering these models.

Architecture

AFM models use a decoder-only transformer architecture with several efficiency-oriented design choices:

  • Memory-efficient design: A shared input/output embedding matrix reduces the parameter count and memory footprint.
  • Training stability: Pre-normalization with RMSNorm and query/key normalization stabilize training.
  • Efficient attention: Grouped-query attention (GQA) shrinks the key/value cache, reducing memory use while maintaining quality.
  • Enhanced activation: SwiGLU activation in the feed-forward layers improves efficiency.
  • Long-context support: Rotary positional embeddings (RoPE) support long inputs.
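
Two of the components listed above, RMSNorm and grouped-query attention, are small enough to sketch in plain Python. This is a minimal illustration of the general techniques, not Apple's implementation. The function names and all sizes (8 query heads, 2 key/value heads, and so on) are assumptions chosen for readability and are not the published AFM dimensions.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]

def kv_head_for(query_head, num_query_heads, num_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into key/value groups")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size

def kv_cache_entries(num_kv_heads, head_dim, seq_len, num_layers):
    """Number of cached scalars: one key and one value tensor per layer."""
    return 2 * num_kv_heads * head_dim * seq_len * num_layers

# Illustrative sizes only (assumptions, not the AFM configuration).
NUM_Q, NUM_KV, HEAD_DIM, SEQ_LEN, LAYERS = 8, 2, 64, 1024, 4

normed = rms_norm([3.0, 4.0], gain=[1.0, 1.0], eps=0.0)
mapping = [kv_head_for(h, NUM_Q, NUM_KV) for h in range(NUM_Q)]

# Cache size with one KV head per query head versus with shared KV heads.
full_cache = kv_cache_entries(NUM_Q, HEAD_DIM, SEQ_LEN, LAYERS)
gqa_cache = kv_cache_entries(NUM_KV, HEAD_DIM, SEQ_LEN, LAYERS)
print(normed, mapping, full_cache // gqa_cache)
```

With these toy sizes, every group of four query heads reads the same cached keys and values, so the key/value cache is four times smaller than it would be with one key/value head per query head.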

The table below outlines the AFM-on-device dimensions:

AFM-on-device dimensions

Apple employs runtime-swappable adapters, enabling a single model to specialize in dozens of tasks without bloating its architecture. Here’s an overview of the adapter-based design:

Architecture of Apple Intelligence with adapters
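
To make the idea of runtime-swappable adapters concrete, here is a minimal sketch assuming a low-rank (LoRA-style) adapter added to one frozen linear layer. The class `AdaptedLinear`, its methods, the task names, and the tiny weight values are all hypothetical and exist only for illustration.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

class AdaptedLinear:
    """A frozen base weight plus an optional low-rank update selected at runtime."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # out_dim x in_dim, shared by every task
        self.adapters = {}              # task name -> (down, up, scale)
        self.active = None              # None means "use the base model only"

    def register(self, task, down, up, scale=1.0):
        # down: rank x in_dim, up: out_dim x rank (rank is much smaller than the dims)
        self.adapters[task] = (down, up, scale)

    def activate(self, task):
        if task is not None and task not in self.adapters:
            raise KeyError(task)
        self.active = task

    def forward(self, x):
        y = matvec(self.base_weight, x)
        if self.active is None:
            return y
        down, up, scale = self.adapters[self.active]
        delta = matvec(up, matvec(down, x))  # low-rank correction up @ (down @ x)
        return [yi + scale * di for yi, di in zip(y, delta)]

# Hypothetical tasks with rank-1 adapters on a 2x2 identity base layer.
layer = AdaptedLinear([[1.0, 0.0], [0.0, 1.0]])
layer.register("summarize", down=[[1.0, 1.0]], up=[[0.5], [0.0]])
layer.register("proofread", down=[[1.0, -1.0]], up=[[0.0], [2.0]])

x = [2.0, 3.0]
base_out = layer.forward(x)
layer.activate("summarize")
summarize_out = layer.forward(x)
layer.activate("proofread")
proofread_out = layer.forward(x)
print(base_out, summarize_out, proofread_out)
```

The base weights never change. Switching tasks only changes which small pair of matrices is applied, which is why one model can serve many tasks without growing much.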

Optimizations for Speed and Efficiency

Apple Intelligence is designed for everyday use on resource-constrained edge devices. To achieve high performance with minimal latency and power consumption, Apple employs:

  • Quantization: Reduces model size with little loss in accuracy, using mixed-precision techniques that compress weights to an average of about 3.5 bits per weight.
  • Talaria tool: Analyzes model latency and power use to guide the choice of bit rate.
  • Adapters: Help a small model approach the quality of larger ones on specific tasks while keeping it efficient.
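
A fractional figure such as 3.5 bits per weight is an average over weights stored at different precisions. The sketch below shows that arithmetic and a generic symmetric uniform quantizer. Both are assumptions for illustration: the 4-bit/2-bit split and the quantizer are not taken from the article and are not meant to describe Apple's actual scheme.

```python
def average_bits(mix):
    """Average bits per weight for (bit_width, fraction_of_weights) pairs."""
    if abs(sum(frac for _, frac in mix) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(bits * frac for bits, frac in mix)

def quantize(weights, bits):
    """Symmetric uniform quantization to signed integer codes with one shared scale."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]

def max_error(weights, bits):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    codes, scale = quantize(weights, bits)
    restored = dequantize(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))

# Assumed split: 75% of weights at 4 bits, 25% at 2 bits.
avg = average_bits([(4, 0.75), (2, 0.25)])

# Toy weights (made up) to compare reconstruction error at two bit widths.
weights = [0.62, -0.30, 0.07, -0.45]
err_4bit = max_error(weights, 4)
err_2bit = max_error(weights, 2)
print(avg, err_4bit, err_2bit)
```

Under the assumed split the average works out to 3.5 bits per weight. The error comparison shows the trade-off that mixed precision manages: fewer bits save memory but reconstruct each weight less precisely.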

Data: Quality Over Quantity

Generative models are data-hungry, but the quality of the training data matters as much as its quantity. The data used to train AFM includes:

  • Web pages: Publicly available information crawled by Applebot, Apple's web crawler.
  • Licensed data from publishers: A limited amount of high-quality data licensed from publishers.
  • Open-source datasets: Publicly available datasets and code repositories on GitHub.

Notably, data from users was not used to train AFM. Explicit and otherwise inappropriate content, personally identifiable information, profanity, and unsafe material were removed from the data before training. In addition to human-generated data, synthetic data was used to improve data quality and diversity.
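
The article does not describe how this filtering is implemented, so the following is only a toy sketch of the general idea: drop documents that contain blocklisted terms, and mask obvious personally identifiable information in the rest. The regular expressions, the placeholder blocklist, and the function names are all assumptions. A production pipeline would rely on far more robust classifiers and rules.

```python
import re

# Deliberately simple patterns; real PII detection is much more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Placeholder terms standing in for a real profanity/unsafe-content list.
BLOCKLIST = {"badword", "slur"}

def scrub_pii(text):
    """Replace email addresses and phone-like numbers with neutral tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def is_clean(text):
    """Return False if any blocklisted term appears as a whole word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return not any(tok in BLOCKLIST for tok in tokens)

def filter_corpus(docs):
    """Drop unsafe documents, then mask PII in the ones that remain."""
    return [scrub_pii(d) for d in docs if is_clean(d)]

# Made-up example documents.
docs = [
    "Contact me at jane.doe@example.com or +1 (555) 010-9999.",
    "This sentence contains a badword and is dropped.",
    "Transformers use attention.",
]
cleaned = filter_corpus(docs)
print(cleaned)
```

Filtering whole documents for unsafe content while masking PII in place keeps otherwise useful text in the corpus, which is one plausible way to balance data quality against data volume.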

Performance Highlights

AFM powers several applications within the Apple ecosystem that involve tasks such as writing, following instructions, solving math problems, using external tools, and more. Let's look at how AFM-powered applications perform in each of these areas:

  • User preference: In human evaluations, AFM is frequently preferred over competing models such as Mistral, Gemma, and GPT.
Comparison of AFM with other models
  • Instruction following: AFM outperforms competitors in understanding and executing user prompts.
Instruction following capabilities of AFM
  • Tool use: AFM-server achieves superior accuracy in selecting tools for specific tasks.
Tool Use via Function Calling Benchmarks results
  • Writing tasks: AFM models lead in summarization, composition, and more.
AFM performance on writing tasks
  • Math performance: AFM-on-device surpasses Mistral-7B and Gemma-7B on math benchmarks.
AFM performance on math tasks

Conclusion

Apple Intelligence represents a significant advance in AI tailored to the Apple ecosystem. By combining the efficiency of AFM-on-device with the processing power of AFM-server, Apple has integrated AI that enhances the user experience across its devices while prioritizing privacy, security, and responsible AI practices. These models, built on a refined transformer architecture, use memory and compute optimizations to make high-quality, real-time AI interactions possible even on edge devices.

Rigorous data curation, the exclusion of user data from training, and the use of synthetic data allow Apple Intelligence to serve users without compromising their personal information. Through AFM's architecture, optimizations, and data handling practices, Apple aims to set a standard for user-centric AI that is both powerful and secure.

Bhuwan Bhatt

Bhuwan Bhatt, a Machine Learning Engineer with over 5 years of industry experience, is passionate about solving complex challenges at the intersection of machine learning and Python programming. Bhuwan has contributed his expertise to leading companies, driving innovation in AI/ML projects. Beyond his professional endeavors, Bhuwan is deeply committed to sharing his knowledge and experiences with others in the field. He firmly believes in continuous improvement, striving to grow by 1% each day in both his technical skills and personal development.

Footnotes

  1. Apple. (2024). Apple Intelligence Foundation Language Models. https://arxiv.org/abs/2407.21075