Haystack 2.0 is a component-based framework for building production RAG and NLP pipelines. These exercises cover the @component decorator, pipeline wiring with connect(), DocumentSplitter for chunking, DocumentStore backends, and the separation between indexing and querying pipelines.
1 / 5
In Haystack 2.0, what is the fundamental building block of a Pipeline?
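In Haystack 2.0, the building block is the Component: a Python class decorated with @component that implements a run() method. Its input ports (Haystack calls them sockets) are inferred from the parameters of run(). Its output ports are declared with the @component.output_types decorator, and run() returns a dictionary keyed by those output names. Pipelines connect components by wiring output ports to input ports, which gives type-checked, composable data flow.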
2 / 5
A developer builds a RAG pipeline in Haystack 2.0 and calls pipeline.connect('embedder.embedding', 'retriever.query_embedding'). What does this do?
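pipeline.connect() wires an output port of one component to an input port of another using 'component_name.port_name' dot notation. Here, the embedder's embedding output feeds the retriever's query_embedding input. These connections define the pipeline's data-flow graph. Haystack checks that the connected ports have compatible types when connect() is called and raises an error if they do not. Configuration mistakes therefore surface before any data runs through the pipeline.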
3 / 5
Which Haystack component handles chunking documents into smaller pieces before embedding?
DocumentSplitter is the Haystack component that splits Documents into smaller chunks. Its split_by parameter selects the unit (for example words, sentences, or passages). split_length sets how many units go into each chunk, and split_overlap sets how many units neighboring chunks share. Proper chunking is critical for RAG quality. Chunks must be small enough to stay semantically focused, yet large enough to give the LLM sufficient context.
4 / 5
What does a DocumentStore provide in a Haystack pipeline?
A DocumentStore is Haystack's abstraction over storage backends. Examples are InMemoryDocumentStore, which is built in, and ElasticsearchDocumentStore and QdrantDocumentStore, which ship as separate integration packages. A DocumentStore stores Document objects. Through Retriever components built for that specific store, it supports BM25 keyword search, vector similarity search, or hybrid retrieval, depending on what the backend implements.
5 / 5
In Haystack 2.0, what distinguishes an indexing pipeline from a querying pipeline?
A typical Haystack RAG application uses two separate pipelines. The indexing pipeline takes raw documents, cleans, splits, and embeds them, and then writes them to a DocumentStore. The querying pipeline takes a user question, embeds it, and retrieves relevant documents from the store. It then passes those documents together with the question, usually through a PromptBuilder, to a Generator component that produces the final answer.