Learn to use pipeline(), InferenceClient, model IDs, and task types, and understand the difference between the Inference API and Inference Endpoints
1 / 5
What does the pipeline() function provide in the Hugging Face Transformers library?
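pipeline(): A high-level helper that bundles preprocessing (tokenization), model inference, and postprocessing behind a single call, so you pass raw input and get task-formatted output. Example (JavaScript syntax, as used by Transformers.js): const pipe = await pipeline('text-classification', 'distilbert-base-uncased-finetuned-sst-2-english'); const result = await pipe('I love this!'); This returns something like [{ label: 'POSITIVE', score: 0.99 }]. Available tasks include text-generation, translation, summarization, image-classification, automatic-speech-recognition, and more.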
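To make the "one call, three stages" idea concrete, here is a toy sketch of a pipeline-like function. It is not how the Transformers library is implemented: the word lists, the stand-in model, and the helper names (preprocess, forward, postprocess, toyPipeline) are all invented for illustration.

```javascript
// Toy illustration only: this is NOT the Transformers implementation.
// It mirrors the three stages a pipeline hides: preprocess, forward, postprocess.

// Stage 1: turn raw text into lowercase word tokens
// (real tokenizers produce subword IDs instead).
function preprocess(text) {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Stage 2: a stand-in "model" that returns two raw scores: [negative, positive].
const POSITIVE_WORDS = new Set(["love", "great", "good"]);
const NEGATIVE_WORDS = new Set(["hate", "bad", "awful"]);

function forward(tokens) {
  let neg = 0;
  let pos = 0;
  for (const token of tokens) {
    if (POSITIVE_WORDS.has(token)) pos += 2;
    if (NEGATIVE_WORDS.has(token)) neg += 2;
  }
  return [neg, pos];
}

// Stage 3: softmax the raw scores and report the best label,
// using the same result shape as the card's example.
function postprocess(logits) {
  const labels = ["NEGATIVE", "POSITIVE"];
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);
  const best = probs.indexOf(Math.max(...probs));
  return [{ label: labels[best], score: probs[best] }];
}

// toyPipeline is a hypothetical helper that chains the stages into one callable.
function toyPipeline() {
  return (text) => postprocess(forward(preprocess(text)));
}

const classify = toyPipeline();
const toyResult = classify("I love this!");
console.log(toyResult[0].label, toyResult[0].score.toFixed(2));
```

The caller only sees classify(text); swapping the tokenizer or model would not change the call site, which is the convenience a real pipeline() offers.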
2 / 5
What is the InferenceClient in the Hugging Face JS SDK?
InferenceClient: The client class exported by the @huggingface/inference package for running models on hosted infrastructure over HTTP. Example: import { InferenceClient } from '@huggingface/inference'; const client = new InferenceClient(process.env.HF_TOKEN); const result = await client.textClassification({ model: 'distilbert/distilbert-base-uncased-finetuned-sst-2-english', inputs: 'Great product!' }); No GPU or local model weights are required, because inference runs on Hugging Face's side. The client exposes methods for the tasks the Inference API supports.
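For intuition about what a client like this does on your behalf, the sketch below builds (but does not send) the kind of HTTP request involved. Everything here is an assumption for illustration: buildInferenceRequest is a hypothetical helper, not part of @huggingface/inference, and the base URL is the classic hosted Inference API address, which may differ from what the current SDK uses.

```javascript
// Assumption: base URL of the classic hosted Inference API.
// Check the current Hugging Face docs before relying on it.
const ASSUMED_BASE_URL = "https://api-inference.huggingface.co/models";

// buildInferenceRequest is a hypothetical helper, not part of the SDK.
// It returns the URL and fetch() options without performing any network call.
function buildInferenceRequest(modelId, inputs, token) {
  if (!token) throw new Error("An access token is required");
  return {
    url: `${ASSUMED_BASE_URL}/${modelId}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs }),
    },
  };
}

const hfRequest = buildInferenceRequest(
  "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
  "Great product!",
  "hf_example_token" // placeholder, not a real token
);
console.log(hfRequest.url);
// To actually send it (requires network and a valid token):
//   const response = await fetch(hfRequest.url, hfRequest.init);
```

In practice the InferenceClient is preferable, since it handles the request details and response parsing for each task.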
3 / 5
How do model IDs on Hugging Face follow a naming convention?
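HF model IDs: The format is username-or-org/model-name, for example facebook/bart-large-cnn. Some older models can also be referenced by a bare name with no namespace, such as gpt2. Fine-tuned variants usually append a descriptor to the base model name: in facebook/bart-large-cnn, bart-large is the base and cnn indicates the fine-tuning dataset (CNN/DailyMail). This is a naming convention, not something the Hub enforces. The same ID is used in pipeline(task, modelId), AutoModel.from_pretrained(modelId), and the Inference API URL path.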
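The ID format can be illustrated with a small parser. parseModelId is a hypothetical helper written for this card, not a Hugging Face API, and it only checks the shape of the string, not whether the model exists.

```javascript
// parseModelId is a hypothetical helper for illustration; it is not a Hugging Face API.
// It splits "namespace/name" and accepts bare legacy IDs such as "gpt2".
function parseModelId(modelId) {
  const parts = modelId.split("/");
  if (parts.length > 2 || parts.some((part) => part.length === 0)) {
    throw new Error(`Invalid model ID: ${modelId}`);
  }
  // One segment means an ID with no namespace.
  return parts.length === 2
    ? { namespace: parts[0], name: parts[1] }
    : { namespace: null, name: parts[0] };
}

const bart = parseModelId("facebook/bart-large-cnn");
const gpt2 = parseModelId("gpt2");
console.log(bart.namespace, bart.name); // facebook bart-large-cnn
console.log(gpt2.namespace, gpt2.name); // null gpt2
```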
4 / 5
What are tasks in the Hugging Face ecosystem?
HF tasks: A task defines a model's input/output contract. text-generation takes a prompt and returns a continuation. summarization takes long text and returns a summary. fill-mask takes text containing a mask token (such as [MASK] for BERT-style models) and returns candidate tokens for the masked position. A model's task is declared by the pipeline_tag field in its model card metadata. The Inference API routes requests to the appropriate pipeline based on the task.
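The idea of a task as a contract can be sketched as a lookup table plus an input check. The table and checkTaskInput below are hypothetical and cover only the three tasks named on this card; the default mask token "[MASK]" is an assumption that holds for BERT-style models, while other models use different tokens.

```javascript
// Hypothetical contract table covering only the tasks named on this card.
const TASK_CONTRACTS = new Map([
  ["text-generation", { input: "prompt text", output: "generated continuation" }],
  ["summarization", { input: "long text", output: "summary" }],
  ["fill-mask", { input: "text containing a mask token", output: "candidate tokens" }],
]);

// checkTaskInput is a hypothetical helper: it returns true when the text
// satisfies the input side of the task's contract.
// Assumption: "[MASK]" is the mask token; pass another one for other models.
function checkTaskInput(task, text, maskToken = "[MASK]") {
  if (!TASK_CONTRACTS.has(task)) {
    throw new Error(`Unknown task: ${task}`);
  }
  if (text.length === 0) return false;
  if (task === "fill-mask") return text.includes(maskToken);
  return true;
}

console.log(checkTaskInput("fill-mask", "Paris is the [MASK] of France.")); // true
console.log(checkTaskInput("fill-mask", "Paris is the capital of France.")); // false
console.log(checkTaskInput("summarization", "A long article...")); // true
```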
5 / 5
What is the difference between the Inference API and Inference Endpoints on Hugging Face?
Inference API vs Endpoints: The free Inference API is ideal for prototyping: it runs requests on shared infrastructure, so requests may queue and latency varies. Inference Endpoints provisions dedicated instances (on AWS, Azure, or GCP) running a specific model you choose. You are billed for instance uptime at an hourly rate. Endpoints can scale to zero when idle, which pauses billing but means the next request waits for a cold start. Use Endpoints for production traffic requiring consistent latency.
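The effect of uptime-based billing and scale-to-zero can be shown with simple arithmetic. The hourly rate below is a made-up placeholder, not a real Hugging Face price, and estimateMonthlyCost is a hypothetical helper that ignores cold-start time and any minimum billing increments.

```javascript
// estimateMonthlyCost is a hypothetical helper.
// It multiplies an hourly rate by the hours the instance is actually up.
function estimateMonthlyCost(hourlyRateUsd, activeHoursPerDay, days = 30) {
  return hourlyRateUsd * activeHoursPerDay * days;
}

const PLACEHOLDER_RATE_USD = 1.0; // made-up rate for illustration only

// Always-on endpoint: billed 24 hours a day.
const alwaysOnCost = estimateMonthlyCost(PLACEHOLDER_RATE_USD, 24);

// Endpoint that scales to zero and is active about 6 hours a day.
const scaleToZeroCost = estimateMonthlyCost(PLACEHOLDER_RATE_USD, 6);

console.log(alwaysOnCost); // 720
console.log(scaleToZeroCost); // 180
```

The trade-off is latency: the cheaper configuration pays for its savings with a cold start on the first request after an idle period.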