Word Vector Relationships

When navigating the landscape of text representation, grasping how words relate to each other in vector form is essential. In the upcoming video, Anup Surendran dives into the history of word vectors and takes a closer look at Google's groundbreaking Word2Vec project. Why are vector relationships so critical, and what biases can they carry?

Let's find out!

Here, Anup traces the evolution of word vectors, emphasizing the milestone that is Google's Word2Vec project. One of the standout features of Word2Vec is vector arithmetic, allowing us to reason about words mathematically.

For example, the vector arithmetic "King - Man + Woman ≈ Queen" showcases this property: subtracting the vector for "Man" from "King" and adding the vector for "Woman" yields a point in the embedding space whose nearest neighbor is the vector for "Queen".
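A minimal sketch of this arithmetic, assuming the gensim library is installed (`pip install gensim`) and its pretrained Google News Word2Vec vectors can be downloaded (roughly 1.6 GB):

```python
# Word2Vec vector arithmetic with gensim's pretrained Google News vectors.
# Assumes gensim is installed and the model download is acceptable (~1.6 GB).
import gensim.downloader as api

# Load 300-dimensional word vectors trained on the Google News corpus.
model = api.load("word2vec-google-news-300")

# "king" - "man" + "woman": gensim sums the positive vectors, subtracts the
# negative ones, and returns the nearest words by cosine similarity.
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # Typically [('queen', ~0.71)]
```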

The video also explores the role of word vector relationships in similarity search, a key capability behind large language models. Anup also addresses an important caveat: the inherent biases these models can absorb from the data they are trained on.

Understanding these relationships and their implications deepens our grasp of Large Language Models and equips us to use them more responsibly. 🌐

💡 A practical insight

You'll often hear the terms "vector embeddings" and "word vectors" used interchangeably in the context of LLMs. These vector embeddings are stored in vector indexes, specialized data structures engineered to ensure rapid and relevant data access using these embeddings.
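To make that concrete, here is a minimal brute-force similarity search in plain NumPy. A real vector index (for example, FAISS or Annoy) accelerates exactly this nearest-neighbor lookup; the tiny 4-dimensional vectors below are illustrative placeholders, not real embeddings:

```python
# A toy brute-force similarity search over stored embeddings.
# Vector indexes speed up this same nearest-neighbor query at scale;
# the 4-dimensional vectors here are illustrative placeholders.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these are embeddings of three stored documents.
index = {
    "doc_royalty": np.array([0.9, 0.1, 0.0, 0.2]),
    "doc_sports":  np.array([0.1, 0.8, 0.3, 0.0]),
    "doc_cooking": np.array([0.0, 0.2, 0.9, 0.1]),
}

query = np.array([0.85, 0.15, 0.05, 0.1])  # embedding of the user's query

# Rank every stored vector by similarity to the query (an O(n) scan).
ranked = sorted(index.items(),
                key=lambda kv: cosine_similarity(query, kv[1]),
                reverse=True)
print(ranked[0][0])  # doc_royalty
```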

How to Choose the Right Vector Embeddings Model

Selecting the appropriate model for generating embeddings is an intriguing topic in its own right. It's essential to recognize that this domain has no one-size-fits-all solution: a glance at the MTEB Leaderboard on Hugging Face reveals a variety of embedding models, each tailored for specific applications. Currently, OpenAI's text-embedding-ada-002 stands out as the go-to model for producing efficient vector embeddings from diverse data, whether structured or unstructured. We'll delve deeper into using it in the tutorials toward the end of this course.

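As a preview, here is a hedged sketch of generating an embedding with text-embedding-ada-002, assuming the openai Python package (v1 or later) is installed and an OPENAI_API_KEY environment variable is set:

```python
# A minimal sketch of producing a vector embedding with text-embedding-ada-002.
# Assumes `pip install openai` (v1+) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="King - Man + Woman = Queen",
)

embedding = response.data[0].embedding  # a list of floats; 1,536 dims for ada-002
print(len(embedding))  # 1536
```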