Question 1

What is RAG and what vocabulary describes it in English?

Accepted Answer

RAG stands for Retrieval-Augmented Generation. The vocabulary includes chunking, embedding, vector store, cosine similarity, retrieval pipeline, re-ranking, and hybrid search. Engineers describe RAG as fetching the top-k most semantically similar chunks and injecting them into the prompt as context before the LLM generates its response.
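
The retrieve-then-inject flow described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the names `retrieve_top_k`, `build_prompt`, and `CHUNKS` are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks, k=2):
    """Inject the retrieved chunks into the prompt as context."""
    context = "\n".join(f"- {c}" for c in retrieve_top_k(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical, already-chunked documents.
CHUNKS = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "To request a refund, email support with your order number.",
]

print(build_prompt("How do I request a refund?", CHUNKS, k=1))
```

In a real system the chunks would be embedded once and stored in a vector store, and the finished prompt would be sent to the LLM.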

Question 2

What does fine-tuning mean and when is it recommended over prompt engineering?

Accepted Answer

Fine-tuning means further training a pre-trained model on a domain-specific dataset to adapt its behavior, vocabulary, or tone. Engineers recommend it when the desired behavior cannot be reliably achieved through prompting alone, for example when the model must consistently produce outputs in a very specific format.

Question 3

What is an embedding and how is the concept explained in English?

Accepted Answer

An embedding is a numerical vector representation of text that captures semantic meaning. Texts with similar meanings have embeddings that lie close together in vector space. Engineers explain retrieval as embedding both the query and the documents, then returning the documents whose vectors are nearest to the query vector.
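
"Close together in vector space" is usually measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors purely for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions. The names `EMBEDDINGS` and `nearest` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hand-written toy vectors (assumption: a real model would produce these).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

def nearest(query_vec, embeddings):
    """Return the key whose vector is most similar to the query vector."""
    return max(embeddings, key=lambda name: cosine_similarity(query_vec, embeddings[name]))

# "cat" should be closer to "kitten" than to "invoice".
print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["kitten"]))
print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["invoice"]))
```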

Question 4

What does prompt engineering mean and which vocabulary terms are essential?

Accepted Answer

Prompt engineering is the practice of crafting and iterating on inputs to an LLM to reliably produce desired outputs. Key vocabulary includes system prompt, few-shot examples, chain-of-thought, temperature, top-p, context window, prompt injection, and jailbreak.
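
Two of the terms above, temperature and top-p, are sampling parameters rather than parts of the prompt text. The following simplified sketch shows what they do to a next-token probability distribution. It assumes a tiny list of logits and is not how any particular provider implements sampling.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
print(softmax_with_temperature(logits, temperature=0.5))
print(softmax_with_temperature(logits, temperature=2.0))
print(top_p_filter([0.5, 0.3, 0.2], top_p=0.7))
```

A low temperature concentrates probability on the top token, while a low top-p removes the unlikely tail before sampling.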

Question 5

What is function calling (tool use) in LLM applications?

Accepted Answer

Function calling allows an LLM to request the execution of predefined functions by outputting structured JSON describing the function name and arguments. The application executes the function and returns the result to the model, which integrates the tool response into its final answer.
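
The application side of this loop can be sketched as a small dispatcher. The JSON shape used here (`name` plus `arguments`) is an assumption for illustration, since each provider defines its own schema, and `get_weather` is a hypothetical tool that returns canned data.

```python
import json

def get_weather(city):
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return {"city": city, "temperature_c": 21}

# Registry of functions the model is allowed to request.
TOOLS = {"get_weather": get_weather}

def execute_tool_call(model_output):
    """Parse the model's structured request, run the tool, and return a JSON result."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call["arguments"])
    return json.dumps(result)

# Pretend the model emitted this structured request.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(execute_tool_call(model_output))
```

The returned JSON string is what the application would send back to the model as the tool response.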

Question 6

How is LLM application evaluation discussed in English?

Accepted Answer

Evaluation vocabulary includes faithfulness, answer relevance, context recall, hallucination rate, LLM-as-judge, and eval suite. Teams set up automated evaluation pipelines that score model responses against a golden dataset to track quality over time.
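
A tiny eval suite might look like the sketch below. It scores responses by token overlap with a reference answer, which is only a crude proxy: real pipelines often use metrics such as faithfulness or an LLM-as-judge instead. `GOLDEN`, `fake_model`, and the 0.5 threshold are all hypothetical.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(response, reference):
    """Fraction of reference tokens that appear in the response."""
    ref = tokens(reference)
    return len(ref & tokens(response)) / len(ref) if ref else 0.0

def run_eval(model_fn, golden, threshold=0.5):
    """Score every golden example and summarize (assumes golden is non-empty)."""
    scores = [overlap_score(model_fn(ex["question"]), ex["expected"]) for ex in golden]
    passed = sum(s >= threshold for s in scores)
    return {"mean_score": sum(scores) / len(scores), "pass_rate": passed / len(scores)}

# Hypothetical golden dataset.
GOLDEN = [
    {"question": "What does RAG stand for?", "expected": "Retrieval-Augmented Generation"},
    {"question": "What is a context window?", "expected": "maximum number of tokens"},
]

def fake_model(question):
    """Stand-in for a real LLM call."""
    answers = {"What does RAG stand for?": "RAG stands for retrieval-augmented generation."}
    return answers.get(question, "I do not know.")

print(run_eval(fake_model, GOLDEN))
```

Running the same suite after each prompt or model change is what lets a team track quality over time.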

Question 7

What is LLMOps and what does it involve?

Accepted Answer

LLMOps is the set of practices for deploying, monitoring, and iterating on LLM-powered applications in production. Key activities include prompt versioning, model registry management, latency and cost observability, A/B testing between model versions, and alerting on quality degradation.

Question 8

What is an AI agent in the context of LLM applications?

Accepted Answer

An AI agent is an LLM-powered system that autonomously takes actions such as calling tools, querying databases, or browsing the web to complete multi-step tasks. Vocabulary includes ReAct pattern, memory types, planning step, reflection loop, and multi-agent orchestration.
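
The ReAct pattern alternates thought, action, and observation until the model produces a final answer. The sketch below shows only the control loop: `scripted_model` is a hard-coded stand-in for an LLM, and `calculator` is a hypothetical tool, so the reasoning itself is not real.

```python
def calculator(expression):
    """Hypothetical tool: adds two integers given as 'a+b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))

TOOLS = {"calculator": calculator}

def scripted_model(history):
    """Stand-in for an LLM: chooses the next step based on observations so far."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return {"thought": "I need to add the numbers.",
                "action": "calculator", "input": "19+23"}
    return {"thought": "I have the result.",
            "final_answer": observations[-1].split(": ", 1)[1]}

def run_agent(task, model, max_steps=5):
    """Loop thought -> action -> observation until a final answer or the step limit."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        step = model(history)
        history.append(f"Thought: {step['thought']}")
        if "final_answer" in step:
            return step["final_answer"], history
        observation = TOOLS[step["action"]](step["input"])
        history.append(f"Action: {step['action']}({step['input']})")
        history.append(f"Observation: {observation}")
    return None, history  # step limit reached without an answer

answer, history = run_agent("What is 19 + 23?", scripted_model)
print(answer)
print("\n".join(history))
```

The `max_steps` limit is a common safeguard against an agent that never reaches a final answer.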

Question 9

What does context window mean and why does it matter for application architecture?

Accepted Answer

The context window is the maximum number of tokens an LLM can process in a single inference call, counting both the prompt and the generated output. Application architecture must stay within this limit, which influences chunking strategies, conversation memory management, and the amount of retrieved context injected into each prompt.
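
One common form of conversation memory management is to keep only the most recent messages that fit a token budget. The sketch below assumes a whitespace word count as a rough stand-in for a real tokenizer; the function names and the `reserved_for_output` parameter are hypothetical.

```python
def count_tokens(text):
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())

def trim_history(messages, max_tokens, reserved_for_output=0):
    """Keep the newest messages that fit the budget, preserving their order."""
    budget = max_tokens - reserved_for_output
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

messages = ["one two three", "four five", "six seven eight nine"]
print(trim_history(messages, max_tokens=8, reserved_for_output=2))
```

Reserving part of the window for the model's output avoids filling the entire limit with input.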

Question 10

What vocabulary is used to describe vector databases in English?

Accepted Answer

Vector database vocabulary includes index, upsert, nearest neighbor search, approximate nearest neighbor (ANN), namespace, metadata filtering, and embedding dimensionality. Popular systems include Pinecone, Weaviate, and Qdrant, as well as pgvector, which is an extension for PostgreSQL.
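
Several of these terms can be illustrated with a toy in-memory store. This sketch is not the API of any of the systems named above: `ToyVectorStore` is hypothetical, and it performs exact brute-force nearest neighbor search, whereas production systems typically use an ANN index for speed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Hypothetical in-memory store with namespaces, upsert, and metadata filtering."""

    def __init__(self):
        self._namespaces = {}

    def upsert(self, namespace, item_id, vector, metadata=None):
        """Insert the item, or overwrite it if the id already exists."""
        self._namespaces.setdefault(namespace, {})[item_id] = (vector, metadata or {})

    def query(self, namespace, vector, top_k=3, where=None):
        """Return ids of the top_k most similar items that match the metadata filter."""
        where = where or {}
        results = []
        for item_id, (stored, metadata) in self._namespaces.get(namespace, {}).items():
            if all(metadata.get(key) == value for key, value in where.items()):
                results.append((cosine(vector, stored), item_id))
        results.sort(reverse=True)
        return [item_id for _, item_id in results[:top_k]]

store = ToyVectorStore()
store.upsert("docs", "a", [1.0, 0.0], {"lang": "en"})
store.upsert("docs", "b", [0.9, 0.1], {"lang": "de"})
store.upsert("docs", "c", [0.0, 1.0], {"lang": "en"})
print(store.query("docs", [1.0, 0.0], top_k=2))
print(store.query("docs", [1.0, 0.0], top_k=2, where={"lang": "en"}))
```

Here the embedding dimensionality is 2; a real index requires every vector to share the dimensionality of the embedding model that produced it.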

LLM Application Development Language
