BentoML-Based AI Inference System Performance Analysis
A study on arXiv (2604.20420) addresses the underexplored area of AI inference deployment by analyzing a BentoML-based serving system developed in collaboration with graphworks.ai. Using a pre-trained RoBERTa sentiment-analysis model, the authors established baseline performance under three realistic workload scenarios: traffic patterns following gamma and exponential distributions simulated steady, bursty, and high-intensity conditions. Key metrics, including latency percentiles and throughput, were collected to identify bottlenecks in the inference pipeline.
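The workload-generation idea above can be sketched in a few lines: gamma-distributed inter-arrival gaps give a steady, low-variance request stream, while exponential (memoryless) gaps produce bursty clustering at the same average rate. A minimal stdlib sketch follows; the distribution parameters are illustrative assumptions, not values from the study.

```python
import random

def interarrival_times(n, pattern, seed=0):
    """Generate n request inter-arrival gaps (seconds) for a workload pattern.

    'steady' uses a gamma distribution (low variance around the mean gap);
    'bursty' uses an exponential distribution (memoryless, clustered arrivals).
    The rates below (~11 req/s) are illustrative, not from the study.
    """
    rng = random.Random(seed)
    if pattern == "steady":
        # gamma with shape k=9, scale 0.01 -> mean gap k*scale = 0.09 s
        return [rng.gammavariate(9, 0.01) for _ in range(n)]
    if pattern == "bursty":
        # exponential with the same mean gap (0.09 s) but much higher variance
        return [rng.expovariate(1 / 0.09) for _ in range(n)]
    raise ValueError(f"unknown pattern: {pattern}")
```

Replaying such gap sequences against the service lets the same average request rate be tested under very different variance profiles, which is what separates the steady from the bursty scenario.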
Key facts
- Study investigates performance and optimization of a BentoML-based AI inference system
- Collaboration with graphworks.ai
- Uses pre-trained RoBERTa sentiment analysis model
- Three realistic workload scenarios for baseline performance
- Traffic patterns follow gamma and exponential distributions
- Simulates steady, bursty, and high-intensity workloads
- Key metrics: latency percentiles and throughput
- Identifies bottlenecks in inference pipeline
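The metrics listed above can be computed from a batch of measured request latencies. The sketch below assumes the nearest-rank percentile method, a common convention; the study does not specify its interpolation scheme, and the function name is hypothetical.

```python
def summarize(latencies_ms, duration_s):
    """Compute latency percentiles and throughput from one measurement window.

    latencies_ms: per-request latencies in milliseconds.
    duration_s:   wall-clock length of the measurement window in seconds.
    """
    s = sorted(latencies_ms)

    def pct(p):
        # nearest-rank percentile: smallest sample with at least p% of
        # observations at or below it -> rank = ceil(n * p / 100)
        rank = -(-(len(s) * p) // 100)
        return s[max(rank - 1, 0)]

    return {
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "throughput_rps": len(s) / duration_s,
    }
```

Tail percentiles (p95/p99) rather than the mean are the standard way to surface queueing bottlenecks, since bursty arrivals inflate the tail long before they move the average.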
Entities
Institutions
- arXiv
- graphworks.ai