At standup, a dev defines the API surface for a model in BentoML. Which abstraction do they use?
In BentoML you define a Service that declares API endpoints and orchestrates model logic. It is the central unit you decorate and deploy. The Service ties together input/output handling and the underlying model.
2 / 5
During a design review, the team wants the heavy model inference isolated for independent scaling. Which BentoML concept?
A BentoML runner wraps model inference so it can run in its own worker and scale independently of the API logic. The Service calls runners for predictions. This separation improves resource utilization.
3 / 5
In a PR review, a dev needs to declare dependencies and the service entrypoint for packaging. Which file holds this?
The bentofile.yaml declares the service entrypoint, Python packages, and other build options. BentoML reads it to assemble the deployable artifact. It is the manifest for reproducible builds.
4 / 5
During a code review, the team turns the built artifact into a container image. Which command/concept applies?
BentoML lets you containerize a built Bento into an OCI image with bentoml containerize. The resulting image can run anywhere containers run. This bridges from artifact to deployable image.
5 / 5
An incident report references serving open LLMs via a BentoML companion project. Which is it?
OpenLLM is a BentoML project for running and serving open-source LLMs with an OpenAI-compatible API. It builds on the Bento packaging model. Teams use it to self-host open models with minimal setup.