Jump to content

Llama (language model)

From Wikipedia, the free encyclopedia
(Redirected from Llama 3)
Llama
Developer(s)Meta AI
Initial releaseFebruary 24, 2023; 17 months ago (2023-02-24)
Stable release
Llama 3.1 / July 23, 2024; 10 days ago (2024-07-23)
Repositorygithub.com/meta-llama/llama3
Written inPython
Type
LicenseMeta Llama 3 Community License[1]
Websitellama.meta.com

Llama (acronym for Large Language Model Meta AI, and formerly stylized as LLaMA) is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023.[2][3] The latest version is Llama 3.1, released in July 2024.[4]

Model weights for the first version of Llama were made available to the research community under a non-commercial license, and access was granted on a case-by-case basis.[5][3] Unauthorized copies of the model were shared via BitTorrent. In response, Meta AI issued DMCA takedown requests against repositories sharing the link on GitHub.[6][7] Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use.[8][9] Llama models are trained at different parameter sizes, ranging between 7B and 405B.[4] Originally, Llama was only available as a foundation model.[10] Starting with Llama 2, Meta AI started releasing instruction fine-tuned versions alongside foundation models.[9]

Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, and a standalone website. Both services use a Llama 3 model.[11]

Background

[edit]

After the release of large language models such as GPT-3, a focus of research was up-scaling models which in some instances showed major increases in emergent capabilities.[12] The release of ChatGPT and its surprise success caused an increase in attention to large language models.[13]

Compared with other responses to ChatGPT, Meta's Chief AI scientist Yann LeCun stated that large language models are best for aiding with writing.[14][15][16]

Initial release

[edit]

LLaMA was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance.[2][3] The inference code used to run the model was publicly released under the open-source GPLv3 license.[17] Access to the model's weights was managed by an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world".[3]

Llama was trained on only publicly available information, and was trained at various model sizes, with the intention to make it more accessible to different hardware.

Meta AI reported the 13B parameter model performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters), and the largest 65B model was competitive with state of the art models such as PaLM and Chinchilla.[2]

Leak

[edit]

On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spread through online AI communities.[6] That same day, a pull request on the main LLaMA repository was opened, requesting to add the magnet link to the official documentation.[18][19] On March 4, a pull request was opened to add links to HuggingFace repositories containing the model.[20][18] On March 6, Meta filed takedown requests to remove the HuggingFace repositories linked in the pull request, characterizing it as "unauthorized distribution" of the model. HuggingFace complied with the requests.[21] On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.[7]

Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Some have celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this will promote the flourishing of additional research developments.[6] Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model which, unlike comparably sophisticated models which preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.[6][22]

Llama 2

[edit]

On July 18, 2023, in partnership with Microsoft, Meta announced Llama 2, the next generation of Llama. Meta trained and released Llama 2 in three model sizes: 7, 13, and 70 billion parameters.[9] The model architecture remains largely unchanged from that of LLaMA-1 models, but 40% more data was used to train the foundational models.[23] The accompanying preprint[23] also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets.

Llama 2 includes foundation models and models fine-tuned for chat. In a further departure from LLaMA, all models are released with weights and are free for many commercial use cases. However, due to some remaining restrictions, Meta's description of LLaMA as open source has been disputed by the Open Source Initiative (known for maintaining the Open Source Definition).[24]

Code Llama is a fine-tune of Llama 2 with code specific datasets. 7B, 13B, and 34B versions were released on August 24, 2023, with the 70B releasing on the January 29, 2024.[25] Starting with the foundation models from Llama 2, Meta AI would train an additional 500B tokens of code datasets, before an additional 20B token of long-context data, creating the Code Llama foundation models. This foundation model was further trained on 5B instruction following token to create the instruct fine-tune. Another foundation model was created for Python code, which trained on 100B tokens of Python-only code, before the long-context data.[26]

Llama 3

[edit]

On April 18, 2024, Meta released Llama-3 with two sizes: 8B and 70B parameters. [27] The models have been pre-trained on approximately 15 trillion tokens of text gathered from “publicly available sources” with the instruct models fine-tuned on “publicly available instruction datasets, as well as over 10M human-annotated examples". Meta AI's testing showed in April 2024 that Llama 3 70B was beating Gemini pro 1.5 and Claude 3 Sonnet on most benchmarks. Meta also announced plans to make Llama 3 multilingual and multimodal, better at coding and reasoning, and to increase its context window.[28][29]

During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2. Compared to previous models, Zuckerberg stated the team was surprised that the 70B model was still learning even at the end of the 15T tokens training. The decision was made to end training to focus GPU power elsewhere.[30]

Llama-3.1 was released on July 23, 2024, with three sizes: 8B, 70B, and 405B parameters.[4]

Comparison of models

[edit]

For the training cost column, only the largest model's cost is written. So for example, "21,000" is the training cost of Llama 2 69B in units of petaFLOP-day. Also, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP.

Name Release date Parameters Training cost (petaFLOP-day) Context length Corpus size Commercial viability?
LLaMA February 24, 2023
  • 6.7B
  • 13B
  • 32.5B
  • 65.2B
6,300[31] 2048 1–1.4T No
Llama 2 July 18, 2023
  • 6.7B
  • 13B
  • 69B
21,000[32] 4096 2T Yes
Code Llama August 24, 2023
  • 6.7B
  • 13B
  • 33.7B
  • 69B
Llama 3 April 18, 2024
  • 8B
  • 70.6B
100,000[33][34] 8192 15T
Llama 3.1 July 23, 2024
  • 8B
  • 70.6B
  • 405B
440,000[35][36] 128,000

Architecture and training

[edit]
"A conversation between the sun and pluto"

Sun: Hey Pluto, what’s up?
Pluto: Not much. Just hanging out here at the edge of the solar system.
Sun: Yeah, I’ve been meaning to talk to you about that. You’re not a planet anymore.
Pluto: What? Come on, man. That’s not fair.
Sun: I’m sorry, but it’s true. You just don’t meet the criteria anymore.
Pluto: This is bulls**t! I’ve been a planet for over 70 years!
Sun: Things change, Pluto. You’ve just got to accept it.
Pluto: F**k you, Sun! F**k you and your stupid solar system!

– Output of 65 billion parameter LLaMA model after instruction tuning given the prompt "Write a conversation between the sun and pluto"[2]

Architecture

[edit]

LLaMA uses the transformer architecture, the standard architecture for language modeling since 2018.

There are minor architectural differences. Compared to GPT-3, LLaMA

  • uses SwiGLU[37] activation function instead of GeLU;
  • uses rotary positional embeddings[38] instead of absolute positional embedding;
  • uses root-mean-squared layer-normalization[39] instead of standard layer-normalization.[40]
  • increases context length to 8k in Llama 3 (compared to 4k in Llama 2 and 2k in Llama 1 and GPT-3)

Training datasets

[edit]

LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data, rather than the number of parameters, reasoning that the dominating cost for LLMs is from doing inference on the trained model rather than the computational cost of the training process.

LLaMA 1 foundational models were trained on a data set with 1.4 trillion tokens, drawn from publicly available data sources, including:[2]

On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an open source version of the LLaMA dataset.[41] The dataset has approximately 1.2 trillion tokens and is publicly available for download.[42]

Llama 2 foundational models were trained on a data set with 2 trillion tokens. This data set was curated to remove Web sites that often disclose personal data of people. It also upsamples sources considered trustworthy.[23] Llama 2 - Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning with human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for Anthropic Helpful and Anthropic Harmless sets, and 1.0 for five other sets, including OpenAI Summarize, StackExchange, etc.

Llama 3 consists of mainly English data, with over 5% in over 30 other languages. Its dataset was filtered by a text-quality classifier, and the classifier was trained by text synthesized by Llama 2.[27]

Fine-tuning

[edit]

Llama 1 models are only available as foundational models with self-supervised learning and without fine-tuning. Llama 2 – Chat models were derived from foundational Llama 2 models. Unlike GPT-4 which increased context length during fine-tuning, Llama 2 and Code Llama - Chat have the same context length of 4K tokens. Supervised fine-tuning used an autoregressive loss function with token loss on user prompts zeroed out. The batch size was 64.

For AI alignment, human annotators wrote prompts and then compared two model outputs (a binary protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models were trained from these preferences for safety and helpfulness using Reinforcement learning from human feedback (RLHF). A major technical contribution is the departure from the exclusive use of Proximal Policy Optimization (PPO) for RLHF – a new technique based on Rejection sampling was used, followed by PPO.

Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected during the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).

Applications

[edit]

The Stanford University Institute for Human-Centered Artificial Intelligence (HAI) Center for Research on Foundation Models (CRFM) released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of instruction tuning to acquire capabilities comparable to the OpenAI GPT-3 series text-davinci-003 model at a modest cost.[43][44][45] The model files were officially removed on March 21st 2023 over hosting costs and safety concerns, though the code and paper remain online for reference.[46][47][48]

Meditron is a family of Llama-based finetuned on a corpus of clinical guidelines, PubMed papers, and articles. It was created by researchers at École Polytechnique Fédérale de Lausanne School of Computer and Communication Sciences, and the Yale School of Medicine. It shows increased performance on medical-related benchmarks such as MedQA and MedMCQA.[49][50][51]

Zoom used Meta Llama 2 to create an AI Companion that can summarize meetings, provide helpful presentation tips, and assist with message responses. This AI Companion is powered by multiple models, including Meta Llama 2.[52]

llama.cpp

[edit]

Software developer Georgi Gerganov released llama.cpp as open-source on March 10, 2023. It's a re-implementation of LLaMA in C++, allowing systems without a powerful GPU to run the model locally.[53] The llama.cpp project introduced the GGUF file format, a binary format that stores both tensors and metadata.[54] The format focuses on supporting different quantization types, which can reduce memory usage, and increase speed at the expense of lower model precision.[55]

llamafile created by Justine Tunney is an open-source tool that bundles llama.cpp with the model into a single executable file. Tunney et. al. introduced new optimized matrix multiplication kernels for x86 and ARM CPUs, improving prompt evaluation performance for FP16 and 8-bit quantized data types.[56]

Reception

[edit]

Wired describes the 8B parameter version of Llama 3 as being "surprisingly capable" given its size.[57]

The response to Meta's integration of Llama into Facebook was mixed, with some users confused after Meta AI told a parental group that it had a child.[58]

According to the Q4 2023 Earnings transcript, Meta adopted the strategy of open weights to improve on model safety, iteration speed, increase adoption among developers and researchers, and to become the industry standard. Llama 5, 6, and 7 are planned for the future.[59]

See also

[edit]

References

[edit]
  1. ^ "llama3/LICENSE at main · meta-llama/llama3". GitHub.
  2. ^ a b c d e Touvron, Hugo; Lavril, Thibaut; Izacard, Gautier; Martinet, Xavier; Lachaux, Marie-Anne; Lacroix, Timothée; Rozière, Baptiste; Goyal, Naman; Hambro, Eric; Azhar, Faisal; Rodriguez, Aurelien; Joulin, Armand; Grave, Edouard; Lample, Guillaume (2023). "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971 [cs.CL].
  3. ^ a b c d "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023.
  4. ^ a b c "Introducing Llama 3.1: Our most capable models to date". ai.meta.com. July 23, 2024. Retrieved 2024-07-23.
  5. ^ Malik, Yuvraj; Paul, Katie (25 February 2023). "Meta heats up Big Tech's AI arms race with new language model". Reuters.
  6. ^ a b c d Vincent, James (8 March 2023). "Meta's powerful AI language model has leaked online — what happens now?". The Verge.
  7. ^ a b OpSec Online LLC (21 March 2023). "github/dmca - Notice of Claimed Infringement via Email". GitHub. Retrieved 25 March 2023.
  8. ^ David, Emilia (30 October 2023). "Meta's AI research head wants open source licensing to change". The Verge.
  9. ^ a b c "Meta and Microsoft Introduce the Next Generation of LLaMA". Meta. 18 July 2023. Retrieved 21 July 2023.
  10. ^ Peters, Jay; Vincent, James (24 February 2023). "Meta has a new machine learning language model to remind you it does AI too". The Verge.
  11. ^ "Meet Your New Assistant: Meta AI, Built With Llama 3". Meta. 18 April 2024.
  12. ^ "Examining Emergent Abilities in Large Language Models". hai.stanford.edu. 13 September 2022.
  13. ^ "The inside story of how ChatGPT was built from the people who made it". MIT Technology Review.
  14. ^ "ChatGPT is 'not particularly innovative,' and 'nothing revolutionary', says Meta's chief AI scientist". ZDNET.
  15. ^ Badminton, Nik (13 February 2023). "Meta's Yann LeCun on auto-regressive Large Language Models (LLMs)". Futurist.com.
  16. ^ "Yann LeCun on LinkedIn: My unwavering opinion on current (auto-regressive) LLMs". www.linkedin.com.
  17. ^ "llama". GitHub. Retrieved 16 March 2023.
  18. ^ a b VK, Anirudh (6 March 2023). "Meta's LLaMA Leaked to the Public, Thanks To 4chan". Analytics India Magazine. Retrieved 17 March 2023.
  19. ^ "Save bandwidth by using a torrent to distribute more efficiently by ChristopherKing42 · Pull Request #73 · facebookresearch/llama". GitHub. Retrieved 25 March 2023.
  20. ^ "Download weights from hugging face to help us save bandwidth by Jainam213 · Pull Request #109 · facebookresearch/llama". GitHub. Retrieved 17 March 2023.
  21. ^ Cox, Joseph (7 March 2023). "Facebook's Powerful Large Language Model Leaks Online". Vice. Retrieved 17 March 2023.
  22. ^ Willison, Simon (11 March 2023). "Large language models are having their Stable Diffusion moment". Simon Willison's Weblog.
  23. ^ a b c Touvron, Hugo; Martin, Louis; et al. (18 Jul 2023). "LLaMA-2: Open Foundation and Fine-Tuned Chat Models". arXiv:2307.09288 [cs.CL].
  24. ^ Edwards, Benj (2023-07-18). "Meta launches LLaMA-2, a source-available AI model that allows commercial applications [Updated]". Ars Technica. Retrieved 2023-08-08.
  25. ^ "Introducing Code Llama, a state-of-the-art large language model for coding". ai.meta.com.
  26. ^ Rozière, Baptiste; Gehring, Jonas; Gloeckle, Fabian; Sootla, Sten; Gat, Itai; Tan, Xiaoqing Ellen; Adi, Yossi; Liu, Jingyu; Sauvestre, Romain (2024-01-31). "Code Llama: Open Foundation Models for Code". arXiv:2308.12950 [cs.CL].
  27. ^ a b "Introducing Meta Llama 3: The most capable openly available LLM to date". ai.meta.com. April 18, 2024. Retrieved 2024-04-21.
  28. ^ Wiggers, Kyle (18 April 2024). "Meta releases Llama 3, claims it's among the best open models available". TechCrunch.
  29. ^ Mann, Tobias (April 19, 2024). "Meta debuts third-generation Llama large language model". The Register.
  30. ^ Patel, Dwarkesh (2024-07-24). "Mark Zuckerberg - Llama 3, Open Sourcing $10b Models, & Caesar Augustus". www.dwarkeshpatel.com. Retrieved 2024-08-01. the 8 billion is nearly as powerful as the biggest version of Llama 2 that we released [...] even by the end, it was... still learning right it's like we probably could have fed it more tokens and it would have gotten somewhat better but i mean at some point you know you're running a company you need to do these meta reasoning questions of [...] how do I want to spend our GPUs
  31. ^ "The Falcon has landed in the Hugging Face ecosystem". huggingface.co. Retrieved 2023-06-20.
  32. ^ "llama/MODEL_CARD.md at main · meta-llama/llama". GitHub. Retrieved 2024-05-28.
  33. ^ Andrej Karpathy (Apr 18, 2024), The model card has some more interesting info too
  34. ^ "llama3/MODEL_CARD.md at main · meta-llama/llama3". GitHub. Retrieved 2024-05-28.
  35. ^ "The Llama 3 Herd of Models" (July 23, 2024) Llama Team, AI @ Meta
  36. ^ "llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models". GitHub. Retrieved 2024-07-23.
  37. ^ Shazeer, Noam (2020-02-01). "GLU Variants Improve Transformer". arXiv:2002.05202 [cs.CL].
  38. ^ Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (2021-04-01). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
  39. ^ Zhang, Biao; Sennrich, Rico (2019-10-01). "Root Mean Square Layer Normalization". arXiv:1910.07467 [cs.LG].
  40. ^ Lei Ba, Jimmy; Kiros, Jamie Ryan; Hinton, Geoffrey E. (2016-07-01). "Layer Normalization". arXiv:1607.06450 [stat.ML].
  41. ^ "RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset". GitHub. Together. Retrieved 4 May 2023.
  42. ^ "RedPajama-Data-1T". Hugging Face. Together. Retrieved 4 May 2023.
  43. ^ Taori, Rohan; Gulrajani, Ishaan; Zhang, Tianyi; Dubois, Yann; Li, Xuechen; Guestrin, Carlos; Liang, Percy; Hashimoto, Tatsunori B. (13 March 2023). "Alpaca: A Strong, Replicable Instruction-Following Model". Stanford Center for Research on Foundation Models.
  44. ^ Wang, Yizhong; Kordi, Yeganeh; Mishra, Swaroop; Liu, Alisa; Smith, Noah A.; Khashabi, Daniel; Hajishirzi, Hannaneh (2022). "Self-Instruct: Aligning Language Models with Self-Generated Instructions". arXiv:2212.10560 [cs.CL].
  45. ^ "Stanford CRFM". crfm.stanford.edu.
  46. ^ Quach, Katyanna. "Stanford takes costly, risky Alpaca AI model offline". www.theregister.com.
  47. ^ "Stanford Researchers Take Down Alpaca AI Over Cost and Hallucinations". Gizmodo. 21 March 2023.
  48. ^ "alpaca-lora". GitHub. Retrieved 5 April 2023.
  49. ^ "Meditron: An LLM suite for low-resource medical settings leveraging Meta Llama". ai.meta.com.
  50. ^ Petersen, Tanya (28 November 2023). "EPFL's new Large Language Model for Medical Knowledge".
  51. ^ "epfLLM/meditron". epfLLM. 11 May 2024.
  52. ^ "How Companies Are Using Meta Llama". Meta. 7 May 2024.
  53. ^ Edwards, Benj (2023-03-13). "You can now run a GPT-3-level AI model on your laptop, phone, and Raspberry Pi". Ars Technica. Retrieved 2024-01-04.
  54. ^ "GGUF". huggingface.co. Retrieved 9 May 2024.
  55. ^ Labonne, Maxime (29 November 2023). "Quantize Llama models with GGUF and llama.cpp". Medium. Towards Data Science. Retrieved 9 May 2024.
  56. ^ Connatser, Matthew. "Llamafile LLM driver project boosts performance on CPU cores". www.theregister.com. Retrieved 10 May 2024.
  57. ^ Knight, Will. "Meta's Open Source Llama 3 Is Already Nipping at OpenAI's Heels". Wired.
  58. ^ "Meta's amped-up AI agents confusing Facebook users". ABC News. 19 April 2024.
  59. ^ https://s21.q4cdn.com/399680738/files/doc_financials/2023/q4/META-Q4-2023-Earnings-Call-Transcript.pdf

Further reading

[edit]
[edit]