ELMo

Architecture of ELMo. It first maps input tokens to embedding vectors through an embedding layer (essentially a lookup table), then applies a pair of forward and backward LSTMs to produce two sequences of hidden vectors, then applies another pair of forward and backward LSTMs to those, and so on.
How a token is transformed successively across the layers of ELMo. At the start, the token is converted to an embedding vector by a linear layer. In the next layer, a forward LSTM produces one hidden vector while a backward LSTM produces another. Each subsequent layer's pair of LSTMs produces a further pair of hidden vectors, and so on.

ELMo (embeddings from language model) is a word embedding method for representing a sequence of words as a corresponding sequence of vectors.[1] It was created by researchers at the Allen Institute for Artificial Intelligence[2] and the University of Washington, and was first released in February 2018. It is a bidirectional LSTM that takes character-level inputs and produces word-level embeddings.

Architecture

ELMo is a multilayered bidirectional LSTM on top of a token embedding layer. The full embedding of a token is the concatenation of the token embedding with the outputs of all the LSTM layers. Because this full embedding is large, it is typically mapped through a trainable linear matrix (a "projection matrix") to produce a task-specific embedding.
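
The following is a minimal, illustrative sketch of this architecture in PyTorch. It is not the published implementation: the ELMoLike class, layer sizes, and vocabulary size are hypothetical, and a plain embedding lookup stands in for ELMo's character-level input network.

# Hypothetical sketch of an ELMo-style encoder: an embedding layer,
# a stack of bidirectional LSTMs, and a trainable projection matrix.
import torch
import torch.nn as nn

class ELMoLike(nn.Module):
    def __init__(self, vocab_size=10000, dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # token embedding layer (lookup table)
        # Each layer is a bidirectional LSTM producing forward and backward hidden vectors.
        self.bilstms = nn.ModuleList([
            nn.LSTM(dim if i == 0 else 2 * dim, dim,
                    batch_first=True, bidirectional=True)
            for i in range(num_layers)
        ])
        # Projection from the concatenation of the token embedding and all
        # LSTM-layer outputs down to a smaller, task-specific embedding.
        self.projection = nn.Linear(dim + num_layers * 2 * dim, dim)

    def forward(self, token_ids):                    # token_ids: (batch, seq_len)
        x = self.embed(token_ids)                    # (batch, seq_len, dim)
        layer_outputs = [x]
        h = x
        for lstm in self.bilstms:
            h, _ = lstm(h)                           # (batch, seq_len, 2 * dim)
            layer_outputs.append(h)
        full = torch.cat(layer_outputs, dim=-1)      # full (large) embedding per token
        return self.projection(full)                 # task-specific embedding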

After the ELMo model is trained, its parameters are frozen. The projection matrix is then trained to minimize the loss on a specific language task. This is an early example of pretraining.
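
Continuing the hypothetical sketch above, freezing the pretrained encoder and training only the projection matrix could look as follows:

# After pretraining (omitted), freeze everything except the projection.
model = ELMoLike()
for name, p in model.named_parameters():
    if not name.startswith("projection"):
        p.requires_grad = False                      # freeze embeddings and LSTMs

# Only the projection matrix receives gradient updates on the downstream task.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)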

Comparison

Like BERT (but unlike the word embeddings produced by "bag of words" approaches, and earlier vector approaches such as Word2Vec and GloVe), ELMo embeddings are context-sensitive, producing different representations for words that share the same spelling but have different meanings (homonyms) such as "bank" in "river bank" and "bank balance".[3]
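
As a toy illustration of this context sensitivity, one can reuse the hypothetical ELMoLike sketch above with a made-up vocabulary: a static lookup assigns "bank" the same vector in both sentences, while the contextual output differs because each word's neighbours feed into the biLSTM states.

# Toy vocabulary and two sentences containing the homonym "bank".
vocab = {"the": 0, "river": 1, "bank": 2, "balance": 3, "is": 4, "low": 5}
s1 = torch.tensor([[vocab[w] for w in ["the", "river", "bank", "is", "low"]]])
s2 = torch.tensor([[vocab[w] for w in ["the", "bank", "balance", "is", "low"]]])

model = ELMoLike(vocab_size=len(vocab))
with torch.no_grad():
    static_1 = model.embed(s1)[0, 2]                 # "bank" from the lookup table
    static_2 = model.embed(s2)[0, 1]
    contextual_1 = model(s1)[0, 2]                   # "bank" after the biLSTM stack
    contextual_2 = model(s2)[0, 1]

print(torch.allclose(static_1, static_2))            # True: identical static vectors
print(torch.allclose(contextual_1, contextual_2))    # False: context-dependent vectors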

ELMo's innovation stems from its use of bidirectional language models, which, unlike their unidirectional predecessors, process text in both the forward and backward directions. By considering a word's full context, to its left and to its right, bidirectional models capture a more comprehensive representation of its meaning, enabling ELMo to encode nuances that unidirectional models might miss.[4]

References

  1. ^ Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018). "Deep contextualized word representations". arXiv:1802.05365 [cs.CL].
  2. ^ "AllenNLP - ELMo — Allen Institute for AI".
  3. ^ "How to use ELMo Embedding in Bidirectional LSTM model architecture?". www.insofe.edu.in. 2020-02-11. Retrieved 2023-04-04.
  4. ^ Van Otten, Neri (26 December 2023). "Embeddings from Language Models (ELMo): Contextual Embeddings A Powerful Shift In NLP".