novita

deepseek/deepseek-r1-distill-llama-8b

Context	32K
Input	text
Output	text
Tool calling	Supported

About this model

DeepSeek-R1-Distill-Llama-8B is an 8B-parameter Llama-architecture model trained by knowledge distillation from DeepSeek-R1. It is available in FP8 (H100 or newer GPUs only) and FP16 precision, and supports a 32K-token context window.

Capabilities

text

Available through the unified API

Quick start

curl https://api.oneinfer.ai/v1/chat/completions \
  -H "Authorization: Bearer $ONEINFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "24c0b83803ff4ac8b32693aee54098e8",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
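
For readers who prefer Python over curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names build_request and send are hypothetical, and the response is returned as raw parsed JSON because this page does not document the response schema.

```python
import json
import urllib.request

# Endpoint and model ID are copied from the curl example above.
API_URL = "https://api.oneinfer.ai/v1/chat/completions"
MODEL_ID = "24c0b83803ff4ac8b32693aee54098e8"


def build_request(prompt, api_key):
    """Build the POST request (hypothetical helper; mirrors the curl call)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON body as-is.

    The response shape is not documented here, so no fields are assumed.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a valid key in the ONEINFER_API_KEY environment variable):
#   import os
#   reply = send(build_request("Hello!", os.environ["ONEINFER_API_KEY"]))
#   print(json.dumps(reply, indent=2))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.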

Providers

Available routing options for this model through OneInfer.

novita

24c0b83803ff4ac8b32693aee54098e8

Available

Input	$0.040 / 1M tokens
Output	$0.040 / 1M tokens
Routing	OneInfer optimized

Pricing

Current OneInfer pricing for this model.

Usage	Price
Input tokens	$0.040 / 1M
Output tokens	$0.040 / 1M
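
As a rough illustration of how the per-token rates in the table translate into a bill, the sketch below multiplies token counts by the listed prices. The function name estimate_cost_usd is hypothetical, and the calculation assumes simple linear billing with no minimums, rounding rules, or discounts, none of which this page specifies.

```python
from decimal import Decimal

# Rates taken from the pricing table above (USD per 1M tokens).
INPUT_PRICE_PER_MILLION = Decimal("0.040")
OUTPUT_PRICE_PER_MILLION = Decimal("0.040")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate cost in USD, assuming purely linear per-token billing."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    return (input_cost + output_cost) / TOKENS_PER_MILLION


# Example: 1,000,000 input tokens plus 500,000 output tokens
# costs 0.04 + 0.02 = 0.06 USD under these assumptions.
example_cost = estimate_cost_usd(1_000_000, 500_000)
```

Decimal is used instead of float so that small per-request amounts accumulate without binary rounding drift.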

Performance

Published evaluation results associated with this model.

Metric	Value
MMLU	71.5
GSM8K	82.3
HumanEval	61.4
Tokens/sec (A100)	110
