
Comparing online LLMs

AI

by Djimit

Introduction

Large language models (LLMs) have rapidly developed into key technologies across a wide range of sectors, from business automation and software development to creative content production and scientific research. The proliferation of LLM providers and their respective models has produced a complex, fast-moving landscape. This report presents an in-depth, evidence-based comparative analysis of leading online LLM platforms and tools, including OpenAI (ChatGPT), Anthropic (Claude), Google (Gemini), Mistral, Cohere, Meta (LLaMA), Aleph Alpha, and Perplexity. The goal is to map these LLMs and their suitability for specific tasks and user contexts. The analysis draws on an extensive review of model architectures, performance benchmarks, deployment options, and strategic and ethical considerations.

1. Landscape & capabilities

The current LLM market is characterized by rapid evolution, with vendors continually introducing new models and model variants featuring improved architectures, larger context windows, and more advanced capabilities.

1.1. OpenAI (ChatGPT)

OpenAI offers a diversified lineup of models, primarily within the GPT and o-series families.

1.2. Anthropic (Claude)

Anthropic’s Claude models are known for their focus on safety and helpfulness, with a strong emphasis on “Constitutional AI”.

1.3. Google (Gemini and Gemma)

Google’s offering spans the powerful Gemini models and the openly released Gemma models.

1.4. Mistral AI

Mistral AI has quickly positioned itself as a major player, with a focus on open-source models and efficient architectures.

1.5. Cohere

Cohere focuses on enterprise-grade LLMs, with a strong emphasis on retrieval-augmented generation (RAG), tool use, and multilinguality.

1.6. Meta (LLaMA)

Meta’s LLaMA models have been highly influential in the open-source LLM community.

1.7. Aleph Alpha (Luminous and Pharia)

Aleph Alpha, a German company, emphasizes data sovereignty and transparency, targeting the European market.

1.8. Perplexity AI

Perplexity AI positions itself as an AI search engine, offering access to both its own models and third-party models.

2. Use case fit matrix

Selecting the right LLM for a given task requires weighing each model’s strengths against the requirements of the use case. The matrix and analysis below are based on reported benchmark results and qualitative assessments.

2.1. Benchmark overview

LLM benchmarks such as MMLU, GSM8K, HumanEval, and MT-Bench provide standardized tests for measuring performance. Recent reports (Q1/Q2 2025) show that AI models are mastering benchmarks ever faster, with open-weight models closing the gap to closed models. Performance at the top is converging, and increasingly challenging benchmarks keep being proposed as traditional ones such as MMLU, GSM8K, and HumanEval become saturated.
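The Arena ELO figures cited below come from pairwise human preference votes between anonymized models. As a simplified illustration only (the actual Chatbot Arena leaderboard fits a Bradley–Terry model over all votes rather than running online Elo, and the K-factor here is an arbitrary choice), a single Elo update after one head-to-head comparison looks like this:

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """One Elo update after a head-to-head vote.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    Returns the updated ratings (new_a, new_b).
    """
    # Expected win probability of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Two equally rated models; A wins, so A gains half the K-factor.
print(elo_update(1000.0, 1000.0, 1.0))  # → (1016.0, 984.0)
```

A 400-point rating gap corresponds to roughly a 10:1 expected win ratio, which is why differences of 50-100 Arena points between frontier models are meaningful.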

Key benchmarks include MMLU (broad knowledge), GSM8K (grade-school math), HumanEval (code generation), and MT-Bench (multi-turn dialogue).

Recent benchmark results (selection, Q1/Q2 2025):

| Model | MMLU (%) (shots) | MMLU-Pro (%) | GSM8K (%) (shots) | HumanEval (%) (pass@1) | MT-Bench (score) | Arena ELO |
|---|---|---|---|---|---|---|
| OpenAI GPT-4.1 | 90.2 | – | – | 54.6 (SWE-Bench) | – | 1366 |
| OpenAI GPT-4o | 85.7 (1-shot) | 74.68 | 94.2 | 90.2 (0-shot) | 9.32 | 1408 |
| OpenAI o3 | 91.8 | 83.6 | 96 (Codeforces ELO 2706) | 69.1 (SWE-Bench) | +12/−11 (o1-preview) | 1413 |
| OpenAI o4-mini | – | – | 93.4 (AIME 2024, no tools) | 68.1 (SWE-Bench) | +12/−9 (o1-mini) | 1351 |
| Anthropic Claude 3.7 Sonnet | ~91 | 82.7 (Thinking) | – | 70.3 (SWE-Bench, scaffold) | +3/−3 | 1300 (thinking-32k) |
| Anthropic Claude 3 Opus | 88.2 (5-shot CoT) | – | 95.0 (0-shot CoT) | 84.9 (0-shot) | 9.45 | 1247 |
| Google Gemini 2.5 Pro | – | 84.1 (Exp) | – | 63.8 (SWE-Bench, custom agent) | +5/−4 (Exp-0827) | 1446 (Preview-05-06) |
| Google Gemini Ultra (1.0) | 90.04 | – | 94.4 | 74.4 | – | not directly listed in recent Arena |
| Meta Llama 4 Maverick | 85.5 | 79.4 | 61.2 (MATH) | 43.4 (LiveCodeBench) | 8.84 | 1270 |
| Meta Llama 4 Scout | – | 69.6 | – | 32.8 (LiveCodeBench) | 7.89 | not in recent Arena top |
| Meta Llama 3.1 70B | 86 | – | 94.8 | 80.5 | – (Llama 3.1 405B: +4/−4) | 1248 |
| Mistral Large 2 (2407/Nov ’24) | 84.0 | 69.7 | 93.0 | 92.0 | 8.63 | 1252 |
| Mistral Mixtral 8x22B Instruct | 77.81 | 28.89 | 74.15 | – | 8.66 | not in Arena top |
| Mistral Codestral (22B) | – | – | – | 81.1 | – | not in Arena top |
| Cohere Command A | 85.5 | – | 80.0 (MATH) | 86.2 (MBPP+) | – | 1306 |
| Cohere Command R+ | – | 43.9 | – | – | – | not in Arena top |
| Aleph Alpha Pharia-1-LLM-7B-control | 48.4 (5-shot) | – | 1.4 (5-shot) | – | – | not in Arena |
| Perplexity Sonar-Reasoning-Pro-High | – | – | – | – | – | 1136 (Search Arena) |

Note: benchmark results can vary with the specific test configuration (e.g., number of shots, prompting technique). SWE-Bench and LiveCodeBench scores are typically reported as pass@1 or percentage of problems resolved. MT-Bench scores are reported either as Elo ratings or on a 1-10 scale.
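The pass@1 metric mentioned above generalizes to pass@k: given n generated samples per problem, of which c pass the unit tests, the HumanEval paper’s unbiased estimator is pass@k = 1 − C(n−c, k)/C(n, k). A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: n samples per problem, c of them correct."""
    if n - c < k:
        # Too few failures to fill a k-sample draw without a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 10 samples, 3 correct: pass@1 is simply the success rate, 0.3.
print(pass_at_k(10, 3, 1))
```

Averaging this quantity over all benchmark problems gives the reported pass@k score; the estimator avoids the bias of simply checking whether any of the first k samples passes.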

2.2. Qualitative assessment and use-case fit

2.2.1. Complex reasoning and problem solving

Models such as OpenAI’s o-series (o3, o4-mini) and GPT-4.1, Anthropic’s Claude 3 Opus and 3.7 Sonnet (with “extended thinking”), and Google’s Gemini 2.5 Pro (a “thinking model”) are specifically designed for, or show strong performance on, complex reasoning tasks. Meta’s Llama 4 Maverick and Mistral Large 2 also show improved reasoning capabilities. Cohere’s Command A is strong on enterprise-oriented reasoning tasks.

2.2.2. Legal analysis, compliance, and document review

For legal tasks, precision, long-context processing, and reliability are critical.

2.2.3. Software development and code generation

Performance on benchmarks such as HumanEval, SWE-Bench, and LiveCodeBench is indicative here.

2.2.4. Business automation, RAG, and agentic workflows

LLMs are increasingly used to automate business processes, often in combination with retrieval-augmented generation (RAG) and tool use.
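A RAG pipeline retrieves relevant documents and prepends them to the prompt before generation. The sketch below uses a toy bag-of-words retriever purely for illustration; production systems use dense embeddings and a vector store, and all function names here are our own, not any vendor’s API:

```python
from collections import Counter
from math import sqrt

def vectorize(text: str) -> Counter:
    """Toy term-frequency vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep the top_k."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:top_k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```

The grounding step is what reduces hallucination: the model is instructed to answer from retrieved passages rather than from parametric memory alone.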

2.2.5. Creative writing, ideation, and content generation

The subjectivity of creative tasks makes benchmarking difficult, but there are useful indicators.

2.2.6. Customer service and advanced conversational AI

Context retention, coherence, and the ability to follow instructions are essential here.

2.2.7. Research, document summarization, and knowledge extraction

Long context windows are of great importance here.

2.2.8. Multilingual translation and localization

Models are evaluated on benchmarks such as FLORES-200 and WMT.

2.2.9. Mathematical problem solving

Benchmarks such as GSM8K, MATH, and AIME are the standard here.

3. Deployment models

LLM platforms offer a range of deployment modalities to meet different user needs and technical requirements.

3.1. Web UI (ChatGPT, Claude, Gemini, Le Chat, Perplexity)

Directly accessible web interfaces provide a low-threshold way to interact with LLMs.

3.2. API-based

APIs offer the flexibility to integrate LLM capabilities into custom applications and workflows.
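Most providers expose a REST endpoint in the chat-completions style popularized by OpenAI, which many other vendors and self-hosted servers mirror. A minimal sketch, assuming an OpenAI-compatible endpoint; the URL and model name are placeholders, not recommendations:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"  # any OpenAI-compatible endpoint

def build_chat_request(model: str, system: str, user: str, temperature: float = 0.2) -> dict:
    """Assemble a chat-completion payload in the widely adopted OpenAI format."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload with bearer-token auth and return the parsed JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction from transport makes it easy to swap providers: pointing API_URL at a different compatible host is often the only change needed.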

3.3. Self-hosted / on-premise

This route suits organizations that require maximum control and data sovereignty.

3.4. Device-native models

LLMs that run directly on end-user devices offer advantages in privacy and latency.

3.5. Enterprise-Readiness

For business use, factors such as logging, security controls, rate limits, SLAs, and compliance (GDPR, SOC 2, HIPAA) are crucial.

4. Considerations

The rapid development and deployment of LLMs raise significant strategic and ethical considerations, including governance, transparency, safety, and the impact of regulation such as the EU AI Act.

4.1. Governance and transparency

Key dimensions here include open-weight availability, auditability and explainability, red teaming and safety testing, and alignment strategy.

4.2. Regulatory implications by geography

Key topics here include the EU AI Act, data sovereignty, and the solutions providers offer for regional compliance.

5. Comparative evaluation framework

To give a holistic picture of the LLM providers’ relative positioning, a comparative evaluation framework is used. This framework assesses models along multiple axes.
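One simple way to operationalize such a multi-axis framework is a weighted score per model. The axes, weights, and scores below are invented purely for illustration; in practice they would come from the benchmark table and your own requirements:

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average over evaluation axes; scores are on a 0-10 scale."""
    total_w = sum(weights.values())
    return sum(scores[axis] * weights[axis] for axis in weights) / total_w

# Hypothetical models and axis scores, not real benchmark data.
models = {
    "Model A": {"reasoning": 9, "cost": 4, "compliance": 6},
    "Model B": {"reasoning": 7, "cost": 8, "compliance": 9},
}
# Weights encode use-case priorities, e.g. an EU enterprise weighting compliance highly.
weights = {"reasoning": 0.5, "cost": 0.2, "compliance": 0.3}

ranking = sorted(models, key=lambda m: weighted_score(models[m], weights), reverse=True)
```

With these weights the cheaper, more compliant model edges out the stronger reasoner, which illustrates the point of the framework: the “best” model is a function of the weights, not an absolute property.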

5.1. Axes of comparison

Claude 3 Sonnet vs Opus: Comprehensive AI Model Comparison – AI-Pro.org, https://ai-pro.org/learn-ai/articles/claude-3-sonnet-vs-opus-unveiling-the-differences-between-anthropics-advanced-ai-models/ 236. Benchmarking GPT‑4o – How Well Does It Follow Complex Prompts?, https://www.softforge.co.uk/blogs/all-topics/benchmarking-gpt-4o-how-well-does-it-follow-complex-prompts 237. Introduction to Claude 3 – ChatGPT and Generative AI Legal …, https://law-arizona.libguides.com/c.php?g=1301273&p=10306029 238. Anthropic’s new Claude 3.7 Sonnet AI model beats all LLMs at legal tasks – Robin AI, https://www.robinai.com/newsroom/anthropics-claude-87-better-at-legal-tasks 239. Unlocking the Power of AI in Law: Top 10 GPT-4O Applications for …, https://www.cimphony.ai/insights/quick-unlocking-the-power-of-ai-in-law-top-10-gpt-4o-applications-for-lawyers 240. Evaluating DeepSeek for Legal Research: Capabilities, Risks, and Comparisons, https://www.criminallawlibraryblog.com/evaluating-deepseek-for-legal-research-capabilities-risks-and-comparisons/ 241. LegalBench – Vals AI, https://www.vals.ai/benchmarks/legal_bench-02-25-2025 242. Legal AI Benchmarking: Evaluating Long Context Performance for …, https://www.thomsonreuters.com/en-us/posts/innovation/legal-ai-benchmarking-evaluating-long-context-performance-for-llms/ 243. 7 Best AI Tools for Legal Research for 2025 – Wordsmith AI, https://www.wordsmith.ai/blog/7-best-ai-tools-for-legal-research-for-2025 244. Using AI for Legal Research: How It Works and Best Solutions – Springs, https://springsapps.com/knowledge/using-ai-for-legal-research-how-it-works-and-best-solutions 245. Gemini 2.5 Pro Preview: even better coding performance – Google Developers Blog, https://developers.googleblog.com/en/gemini-2-5-pro-io-improved-coding-performance/ 246. 
Google’s Gemini 2.5 Pro model tops LMArena by close to 40 points – R&D World, https://www.rdworldonline.com/googles-gemini-2-5-pro-model-tops-lmarena-by-40-points-outperforms-competitors-in-scientific-reasoning/ 247. Gemini 2.5 Pro: Features, Tests, Access, Benchmarks & More | DataCamp, https://www.datacamp.com/blog/gemini-2-5-pro 248. Google releases Gemini 2.5 as model competition continues – ContentGrip, https://www.contentgrip.com/google-releases-gemini-2-5/ 249. Introducing web search on the Anthropic API, https://www.anthropic.com/news/web-search-api 250. Anthropic Introduces Web Search Functionality for Claude Models – InfoQ, https://www.infoq.com/news/2025/05/anthropic-web-search/ 251. Cohere: The Secure AI Platform for Enterprise, https://cohere.com/ 252. Automating LLM Performance Evaluation: Reducing Time from 2 …, https://www.allganize.ai/en/blog/automating-llm-performance-evaluation-reducing-time-from-2-hours-to-10-minutes 253. RAG systems: Best practices to master evaluation for accurate and reliable AI., https://cloud.google.com/blog/products/ai-machine-learning/optimizing-rag-retrieval 254. Consistently Hallucination-Proof Your LLMs with Automated RAG | Kong Inc., https://konghq.com/blog/enterprise/automated-rag-hallucination-proof-llms 255. What are RAG models? A guide to enterprise AI in 2025 – Glean, https://www.glean.com/blog/rag-models-enterprise-ai 256. 8 High-Impact Use Cases of RAG in Enterprises, https://www.signitysolutions.com/blog/use-cases-of-rag-in-enterprises 257. Top 9 RAG Tools to Boost Your LLM Workflows, https://lakefs.io/blog/rag-tools/ 258. 5 LLM Evaluation Tools You Should Know in 2025 – Humanloop, https://humanloop.com/blog/best-llm-evaluation-tools 259. The best large language models (LLMs) in 2025 – YourGPT, https://yourgpt.ai/blog/growth/best-llms 260. Meet Claude – Anthropic, https://www.anthropic.com/claude 261. 
Evaluating Creative Short Story Generation in Humans and Large Language Models – arXiv, https://arxiv.org/html/2411.02316v4 262. (PDF) Evaluating Creativity: Can LLMs Be Good Evaluators in Creative Writing Tasks?, https://www.researchgate.net/publication/389726188_Evaluating_Creativity_Can_LLMs_Be_Good_Evaluators_in_Creative_Writing_Tasks 263. Customer support agent – Anthropic API, https://docs.anthropic.com/en/docs/about-claude/use-case-guides/customer-support-chat 264. ChatGPT – OpenAI, https://openai.com/chatgpt/overview/ 265. Evaluating LLMs for Text Summarization: An Introduction – SEI Blog, https://insights.sei.cmu.edu/blog/evaluating-llms-for-text-summarization-introduction/ 266. SATS: simplification aware text summarization of scientific documents – Frontiers, https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2024.1375419/full 267. Distilling Structured Rationale from Large Language Models to Small Language Models for Abstractive Summarization – AAAI Publications, https://ojs.aaai.org/index.php/AAAI/article/view/34727/36882 268. Exploring large language models for summarizing and interpreting an online brain tumor support forum, https://pmc.ncbi.nlm.nih.gov/articles/PMC12034948/ 269. An Empirical Comparison of Text Summarization: A Multi-Dimensional Evaluation of Large Language Models – arXiv, https://arxiv.org/html/2504.04534v1 270. aclanthology.org, https://aclanthology.org/2025.naacl-long.280.pdf 271. Mufu: Multilingual Fused Learning for Low- Resource Translation with LLM – OpenReview, https://openreview.net/pdf?id=0eMsrRMmCw 272. Niek/chatgpt-web: ChatGPT web interface using the OpenAI API – GitHub, https://github.com/Niek/chatgpt-web 273. A Complete Guide to the Anthropic API vs Claude Web Interface – Lamatic.ai Labs, https://blog.lamatic.ai/guides/anthropic-api-vs-claude/ 274. Use the Gemini web app – Computer – Google Help, https://support.google.com/gemini/answer/13275745?hl=en&co=GENIE.Platform%3DDesktop 275. 
Sign in to the Gemini web app – Google Help, https://support.google.com/gemini/answer/13278668?hl=en 276. 10 Tasks Mistral’s Le Chat Handles Better Than Any Other AI – Analytics Vidhya, https://www.analyticsvidhya.com/blog/2025/02/mistrals-le-chat/ 277. Le Chat enterprise AI assistant | Mistral AI, https://mistral.ai/products/le-chat 278. Differences Between Azure OpenAI GPT-4o and OpenAI’s Public GPT-4o – Learn Microsoft, https://learn.microsoft.com/en-us/answers/questions/2153786/differences-between-azure-openai-gpt-4o-and-openai 279. Build with Claude – Anthropic, https://www.anthropic.com/api 280. Generative AI in Google Workspace Privacy Hub, https://support.google.com/a/answer/15706919?hl=en 281. Google Introduces Unified Security Platform and AI-Driven Security Agents | MSSP Alert, https://www.msspalert.com/news/google-introduces-unified-security-platform-and-ai-driven-security-agents 282. Gemini API using Vertex AI in Firebase – Google, https://firebase.google.com/docs/vertex-ai 283. Vertex AI Vs. You AI/Mind Studio: Key Feature Comparison – SmythOS, https://smythos.com/ai-agents/comparison/vertex-ai-vs-you-ai-mind-studio-key-feature-comparison/ 284. Build with Gemini on Google Cloud | Gemini API | Google AI for Developers, https://ai.google.dev/gemini-api/docs/migrate-to-cloud 285. Deployment Options – Overview — Cohere, https://docs.cohere.com/v2/docs/deployment-options-overview 286. Introduction to Cohere on Azure AI Foundry (v1 API), https://docs.cohere.com/v1/docs/cohere-on-azure/cohere-on-azure-ai-foundry 287. Best Tools for Self-Hosted LLM in 2025, https://research.aimultiple.com/self-hosted-llm 288. 6 reasons banks opt for private AI deployments – Cohere, https://cohere.com/blog/private-ai-deployments-for-banks 289. Private Deployments for Ultimate AI Security – Cohere, https://cohere.com/private-deployments 290. Unveiling the Landscape of LLM Deployment in the Wild: An Empirical Study – arXiv, https://arxiv.org/html/2505.02502v1 291. 
Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices – arXiv, https://arxiv.org/html/2410.03613v2 292. On-Device AI: Building Smarter, Faster, And Private Applications – Smashing Magazine, https://www.smashingmagazine.com/2025/01/on-device-ai-building-smarter-faster-private-applications/ 293. Google Deploys On-Device AI to Thwart Scams on Chrome and Android, https://www.infosecurity-magazine.com/news/google-ai-gemini-nano-scams-chrome/ 294. Google deploys Gemini Nano in Chrome to protect users from online scams, https://indianexpress.com/article/technology/tech-news-technology/google-gemini-nano-chrome-boost-scam-detection-9992385/ 295. Phi Open Models – Small Language Models | Microsoft Azure, https://azure.microsoft.com/en-us/products/phi 296. microsoft/Phi-3-mini-128k-instruct – Hugging Face, https://huggingface.co/microsoft/Phi-3-mini-128k-instruct 297. Large Language Models and Regulations: Navigating the Ethical and Legal Landscape, https://scytale.ai/resources/large-language-models-and-regulations-navigating-the-ethical-and-legal-landscape/ 298. How to Use Large Language Models (LLMs) with Enterprise and Sensitive Data, https://www.startupsoft.com/llm-sensitive-data-best-practices-guide/ 299. Enterprise privacy at OpenAI | OpenAI, https://openai.com/enterprise-privacy 300. Safety & responsibility | OpenAI, https://openai.com/safety 301. Privacy policy | OpenAI, https://openai.com/policies/privacy-policy 302. Introducing data residency in Europe – OpenAI, https://openai.com/index/introducing-data-residency-in-europe/ 303. OpenAI Introduces European Data Residency for Enterprises and Developers – Maginative, https://www.maginative.com/article/openai-introduces-european-data-residency-for-enterprises-and-developers/ 304. Custom Data Retention Controls for Claude Enterprise | Anthropic Privacy Center, https://privacy.anthropic.com/en/articles/10440198-custom-data-retention-controls-for-claude-enterprise 305. 
GDPR Compliance Showdown: A Side-by-Side Comparison of Microsoft Copilot, ChatGPT, Claude & Gemini, https://pivotaledge.ai/blog/ai-assistant-gdpr-compliance-showdown 306. Are there AI models for analyzing GCP logs? : r/googlecloud – Reddit, https://www.reddit.com/r/googlecloud/comments/1ind3i5/are_there_ai_models_for_analyzing_gcp_logs/ 307. Amazon Bedrock vs Azure OpenAI vs Google Vertex AI: An In-Depth Analysis​, https://www.cloudoptimo.com/blog/amazon-bedrock-vs-azure-openai-vs-google-vertex-ai-an-in-depth-analysis/ 308. Legit SLA Management & Governance – Built for Enterprise-Scale AppSec, https://www.legitsecurity.com/blog/legit-sla-management-and-governance 309. The Security Risks of Using LLMs in Enterprise Applications – Coralogix, https://coralogix.com/ai-blog/the-security-risks-of-using-llms-in-enterprise-applications/ 310. Connect 2024: The responsible approach we’re taking to generative AI – Meta AI, https://ai.meta.com/blog/responsible-ai-connect-2024/ 311. Expanding our open source large language models responsibly – Meta AI, https://ai.meta.com/blog/meta-llama-3-1-ai-responsibility/ 312. Best Open Source LLMs in 2025: Top Models for AI Innovation – TRG Datacenters, https://www.trgdatacenters.com/resource/best-open-source-llms-2025/ 313. Responsible Generative AI Toolkit | Google AI for Developers – Gemini API, https://ai.google.dev/responsible 314. Responsible AI Progress Report – Google AI, https://ai.google/static/documents/ai-responsibility-update-published-february-2025.pdf 315. Gemma model card | Google AI for Developers – Gemini API, https://ai.google.dev/gemma/docs/model_card 316. gemma-7b Model by Google – NVIDIA NIM APIs, https://build.nvidia.com/google/gemma-7b/modelcard 317. Shared responsibility model | Aleph Alpha Docs, https://docs.aleph-alpha.com/products/pharia-ai/installation/before-you-start/shared-responsibility-model/ 318. 
If you thought training AI models was hard, try building enterprise apps with them – The Register, https://www.theregister.com/2025/02/23/aleph_alpha_sovereign_ai/ 319. Concept-Level Explainability for Auditing & Steering LLM Responses ! This paper contains model-generated content that might be offensive.! – arXiv, https://arxiv.org/html/2505.07610v1 320. Concept-Level Explainability for Auditing & Steering LLM Responses – arXiv, https://www.arxiv.org/pdf/2505.07610 321. Best Tools for LLM Observability: Monitor & Optimize LLMs – PromptLayer, https://blog.promptlayer.com/best-tools-to-measure-llm-observability/ 322. LLM Observability Tools: 2025 Comparison – lakeFS, https://lakefs.io/blog/llm-observability-tools/ 323. Anthropic’s Citations API: A Step Forward for AI Agent Explainability | Anthus, https://anth.us/blog/null/ 324. Sharing new open source protection tools and advancements in AI privacy and security, https://ai.meta.com/blog/ai-defenders-program-llama-protection-tools/ 325. Global Trends in AI Governance – World Bank Documents & Reports, https://documents1.worldbank.org/curated/en/099120224205026271/pdf/P1786161ad76ca0ae1ba3b1558ca4ff88ba.pdf 326. Who is Responsible? Data, Models, or Regulations, A Comprehensive Survey on Responsible Generative AI for a Sustainable Future. – arXiv, https://arxiv.org/html/2502.08650v4 327. Ensuring Explainability and Auditability in Generative AI Copilots for FinCrime Investigations, https://lucinity.com/blog/ensuring-explainability-and-auditability-in-generative-ai-copilots-for-fincrime-investigations 328. OpenAI: G7 Hiroshima AI Process (HAIP) Transparency Report, https://transparency.oecd.ai/reports/b167db92-67c8-47d8-966a-427e2ce8c008 329. OpenAI’s Approach to External Red Teaming for AI Models and Systems – arXiv, https://arxiv.org/html/2503.16431v1 330. Responsible Scaling Policy | Anthropic, https://anthropic.com/responsible-scaling-policy 331. 
Charting a Path to AI Accountability – Anthropic, https://www.anthropic.com/news/charting-a-path-to-ai-accountability 332. Introducing Cohere’s Secure AI Frontier Model Framework, https://cohere.com/blog/secure-model-framework 333. LLM security and safety: responsible AI at NeurIPS 2024 – Capital One, https://www.capitalone.com/tech/ai/llm-safety-security-neurips-2024/ 334. Responsible AI – AI Index, https://aiindex.stanford.edu//uploads/2024/04/HAI_AI-Index-Report-2024_Chapter3.pdf 335. Is perplexity safe to use? A comprehensive guide to AI tool safety – BytePlus, https://www.byteplus.com/en/topic/498504 336. www.ibm.com, https://www.ibm.com/think/insights/eu-ai-act#:~:text=The%20EU%20AI%20Act%20requires,model%20evaluation%2C%20documentation%20and%20reporting. 337. The EU AI Act – Ten key things to know – TLT LLP, https://www.tlt.com/insights-and-events/insight/the-eu-ai-act—ten-key-things-to-know/ 338. European Union Artificial Intelligence Act: a guide, https://www.twobirds.com/-/media/new-website-content/pdfs/capabilities/artificial-intelligence/european-union-artificial-intelligence-act-guide.pdf 339. General-Purpose AI Models in the AI Act – Questions & Answers | Shaping Europe’s digital future, https://digital-strategy.ec.europa.eu/en/faqs/general-purpose-ai-models-ai-act-questions-answers 340. EU AI Act: A Guide to Navigating AI Regulation in Europe – Appinventiv, https://appinventiv.com/blog/ai-regulation-and-compliance-in-europe/ 341. Mistral AI Models Fail Key Safety Tests, Report Finds – Bank Info Security, https://www.bankinfosecurity.com/mistral-ai-models-fail-key-safety-tests-report-finds-a-28358 342. Terms of use | Mistral AI, https://mistral.ai/terms 343. A Review of Large Language Models (LLMs) Development – Preprints.org, https://www.preprints.org/frontend/manuscript/55011f84f4aa54ecbce871965cfa42d5/download_pub 344. 
AI Privacy Risks & Mitigations – Large Language Models (LLMs) – European Data Protection Board, https://www.edpb.europa.eu/system/files/2025-04/ai-privacy-risks-and-mitigations-in-llms.pdf 345. Data Sovereignty and AI: Why You Need Distributed Infrastructure – The Equinix Blog, https://blog.equinix.com/blog/2025/05/14/data-sovereignty-and-ai-why-you-need-distributed-infrastructure/ 346. Artificial Intelligence (AI) Policy | Agile Business Consortium, https://www.agilebusiness.org/copyright-legal-policies/artificial-intelligence-ai-policy.html 347. Understanding European Tech Sovereignty: Challenges and Opportunities – Hivenet, https://www.hivenet.com/post/understanding-european-tech-sovereignty-challenges-and-opportunities 348. OpenAI claims GPT-4.1 sets new 90%+ standard in MMLU reasoning benchmark, https://www.rdworldonline.com/openai-claims-gpt-4-1-sets-new-90-standard-in-mmlu-reasoning-benchmark/ 349. arXiv:2503.16416v1 [cs.AI] 20 Mar 2025, https://arxiv.org/pdf/2503.16416 350. arXiv:2503.01763v1 [cs.CL] 3 Mar 2025, https://arxiv.org/pdf/2503.01763?
