AI Endpoints¶
OVHcloud AI Endpoints provides an OpenAI-compatible API for running LLM inference. This provider wraps that API in Airflow operators and hooks, so you can run chat completions and embeddings from your workflows.
Overview¶
AI Endpoints follows the OpenAI API specification, making it easy to migrate existing workflows or use familiar patterns.
Base URL: https://oai.endpoints.kepler.ai.cloud.ovh.net/v1
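Because the API follows the OpenAI specification, you can also reach it with the standard OpenAI Python client outside of Airflow, which is handy for a quick smoke test before wiring up a DAG. A minimal sketch (the token is a placeholder; the `openai` package is assumed to be installed and is not part of this provider):

```python
# Quick connectivity check against AI Endpoints with the OpenAI client.
from openai import OpenAI

client = OpenAI(
    base_url="https://oai.endpoints.kepler.ai.cloud.ovh.net/v1",
    api_key="your-api-token-here",  # placeholder: substitute your token
)
response = client.chat.completions.create(
    model="Meta-Llama-3_3-70B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```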
Connection Setup¶
Create an Airflow connection with your API token:
airflow connections add ovh_ai_endpoints_default \
    --conn-type generic \
    --conn-password your-api-token-here
See Getting Started for detailed configuration options.
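If you prefer configuration through the environment, the same connection can be declared with Airflow's standard `AIRFLOW_CONN_<CONN_ID>` convention. A sketch using the connection-URI form, with the API token carried in the password field (token is a placeholder):

```bash
# Equivalent connection defined as an environment variable (URI form).
export AIRFLOW_CONN_OVH_AI_ENDPOINTS_DEFAULT='generic://:your-api-token-here@'
```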
Chat Completions¶
Generate text responses using large language models.
Basic Usage¶
from airflow import DAG
from apache_airflow_provider_ovhcloud_ai.operators.ai_endpoints import (
    OVHCloudAIEndpointsChatCompletionsOperator,
)
from datetime import datetime

with DAG(
    dag_id='chat_completion_example',
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    chat = OVHCloudAIEndpointsChatCompletionsOperator(
        task_id='generate_response',
        model='Meta-Llama-3_3-70B-Instruct',
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain machine learning in simple terms."},
        ],
        temperature=0.7,
        max_tokens=500,
    )
Message Roles¶
Messages use a role-based format:
| Role | Description |
|---|---|
| `system` | Sets the behavior and context for the assistant |
| `user` | Messages from the user |
| `assistant` | Previous responses from the model (for conversation history) |
Multi-Turn Conversations¶
chat = OVHCloudAIEndpointsChatCompletionsOperator(
    task_id='multi_turn_chat',
    model='Meta-Llama-3_3-70B-Instruct',
    messages=[
        {"role": "system", "content": "You are a Python expert."},
        {"role": "user", "content": "How do I read a CSV file?"},
        {"role": "assistant", "content": "You can use pandas: `pd.read_csv('file.csv')`"},
        {"role": "user", "content": "How do I filter rows where column 'age' > 30?"},
    ],
)
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | Required | Model name to use |
| `messages` | `List[Dict]` | Required | List of message objects |
| `temperature` | `float` | `1.0` | Controls randomness (0-2) |
| `max_tokens` | `int` | `None` | Maximum tokens to generate |
| `top_p` | `float` | `None` | Nucleus sampling parameter |
| `stop` | `str` or `List[str]` | `None` | Stop sequences |
| `ovh_conn_id` | `str` | `'ovh_ai_endpoints_default'` | Airflow connection ID |
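The `stop` and `top_p` parameters are easy to overlook; a short sketch combining them (the prompt text and `task_id` are illustrative):

```python
# Stop generation at the first blank line and use nucleus sampling.
chat = OVHCloudAIEndpointsChatCompletionsOperator(
    task_id='list_three_facts',
    model='Meta-Llama-3_3-70B-Instruct',
    messages=[{"role": "user", "content": "List three facts about DNS, one per line."}],
    top_p=0.9,          # sample from the top 90% probability mass
    stop=["\n\n"],      # truncate at the first blank line
)
```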
Response Structure¶
The operator returns a response matching the OpenAI format:
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "Meta-Llama-3_3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
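Since the operator returns this dict (and Airflow pushes operator return values to XCom by default), downstream code can read the reply directly. A minimal sketch; the helper name is illustrative, not part of the provider:

```python
# Read the assistant text out of an OpenAI-style chat completion dict.
def extract_reply(response: dict) -> str:
    usage = response.get("usage", {})
    print(f"Total tokens: {usage.get('total_tokens')}")
    # The first choice holds the assistant message.
    return response["choices"][0]["message"]["content"]
```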
Embeddings¶
Create vector representations of text for semantic search, clustering, or classification.
Basic Usage¶
from apache_airflow_provider_ovhcloud_ai.operators.ai_endpoints import (
    OVHCloudAIEndpointsEmbeddingOperator,
)

embed = OVHCloudAIEndpointsEmbeddingOperator(
    task_id='create_embedding',
    model='BGE-M3',
    input="Apache Airflow is a workflow orchestration tool",
)
Batch Embeddings¶
Embed multiple texts in a single request:
embed = OVHCloudAIEndpointsEmbeddingOperator(
    task_id='batch_embeddings',
    model='BGE-M3',
    input=[
        "First document to embed",
        "Second document to embed",
        "Third document to embed",
    ],
)
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | Required | Embedding model name |
| `input` | `str` or `List[str]` | Required | Text(s) to embed |
| `ovh_conn_id` | `str` | `'ovh_ai_endpoints_default'` | Airflow connection ID |
Response Structure¶
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0045, ...]  # Vector of floats
    }
  ],
  "model": "BGE-M3",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}
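Each item in `data` carries an `index` matching the position of the corresponding input, so batch results can be paired back to their source texts. As a sketch of the semantic-search use case mentioned above, a plain-Python cosine similarity over two returned vectors (`response` stands in for the operator result):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Standard cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pair each vector with its input position via the 'index' field.
vectors = {item["index"]: item["embedding"] for item in response["data"]}
score = cosine_similarity(vectors[0], vectors[1])
```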
Using the Hook Directly¶
For more control, use the OVHCloudAIEndpointsHook in Python operators:
from airflow import DAG
from airflow.operators.python import PythonOperator
from apache_airflow_provider_ovhcloud_ai.hooks.ai_endpoints import OVHCloudAIEndpointsHook


def custom_ai_logic(**context):
    hook = OVHCloudAIEndpointsHook(ovh_conn_id='ovh_ai_endpoints_default')

    # Chat completion
    response = hook.chat_completion(
        model='Meta-Llama-3_3-70B-Instruct',
        messages=[
            {"role": "user", "content": "Summarize this text: ..."}
        ],
        temperature=0.3,
        max_tokens=100,
    )
    summary = response['choices'][0]['message']['content']

    # Create embedding of the summary
    embedding_response = hook.create_embedding(
        model='BGE-M3',
        input=summary,
    )

    return {
        'summary': summary,
        'embedding': embedding_response['data'][0]['embedding'],
    }


with DAG(...) as dag:
    task = PythonOperator(
        task_id='custom_ai_task',
        python_callable=custom_ai_logic,
    )
Jinja Templating¶
Operators support Jinja templating for dynamic values:
from airflow import DAG
from apache_airflow_provider_ovhcloud_ai.operators.ai_endpoints import (
    OVHCloudAIEndpointsChatCompletionsOperator,
)
from datetime import datetime

with DAG(
    dag_id='templated_example',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',
    catchup=False,
) as dag:
    chat = OVHCloudAIEndpointsChatCompletionsOperator(
        task_id='analyze_daily_data',
        model='{{ var.value.llm_model }}',  # From Airflow Variables
        messages=[
            {
                "role": "system",
                "content": "You are a data analyst."
            },
            {
                "role": "user",
                "content": "Generate a report summary for {{ ds }}"  # Execution date
            }
        ],
    )
Templatable Fields¶
- `model`
- `messages`
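Because `messages` is templatable, one task's output can feed another's prompt without a Python callable. A sketch assuming a hypothetical upstream task with `task_id='extract_data'` that pushed a string to XCom:

```python
# The upstream task_id 'extract_data' is illustrative; its XCom value
# is rendered into the prompt at runtime via Jinja.
summarize = OVHCloudAIEndpointsChatCompletionsOperator(
    task_id='summarize_extracted_data',
    model='Meta-Llama-3_3-70B-Instruct',
    messages=[
        {
            "role": "user",
            "content": "Summarize: {{ ti.xcom_pull(task_ids='extract_data') }}",
        }
    ],
)
```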
Available Models¶
Refer to the OVHcloud AI Endpoints Catalog for the latest models.
Popular Chat Models¶
| Model | Description |
|---|---|
| `Meta-Llama-3_3-70B-Instruct` | Meta's Llama 3.3 70B instruction-tuned model |
| `gpt-oss-120b` | Large open-source model |
| `Mixtral-8x22B-Instruct-v0.1` | Mixtral mixture-of-experts model |
Embedding Models¶
| Model | Description |
|---|---|
| `BGE-M3` | Multilingual embedding model |
| `bge-multilingual-gemma2` | Multilingual Gemma-based embeddings |
Best Practices¶
1. Use Appropriate Temperatures¶
- 0.0 - 0.3: Deterministic outputs (code generation, factual answers)
- 0.5 - 0.7: Balanced creativity (general tasks)
- 0.8 - 1.0+: Creative outputs (brainstorming, creative writing)
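For example, a fact-extraction task and a brainstorming task might use the same model at opposite ends of the range. A sketch (prompts and task IDs are illustrative):

```python
extract = OVHCloudAIEndpointsChatCompletionsOperator(
    task_id='extract_fields',
    model='Meta-Llama-3_3-70B-Instruct',
    messages=[{"role": "user", "content": "Extract the invoice number from: ..."}],
    temperature=0.0,  # near-deterministic output
)

brainstorm = OVHCloudAIEndpointsChatCompletionsOperator(
    task_id='brainstorm_names',
    model='Meta-Llama-3_3-70B-Instruct',
    messages=[{"role": "user", "content": "Suggest ten product names."}],
    temperature=0.9,  # more varied, creative output
)
```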
2. Set Max Tokens Appropriately¶
Avoid unnecessary costs by setting max_tokens based on expected output length.
3. Use XCom for Chaining Tasks¶
def use_previous_response(**context):
    ti = context['ti']
    response = ti.xcom_pull(task_ids='chat_task')
    content = response['choices'][0]['message']['content']
    # Process content...
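Wired into a DAG, the callable runs downstream of the chat operator. A sketch, assuming a chat operator assigned to a variable `chat_task` with `task_id='chat_task'` earlier in the same DAG:

```python
from airflow.operators.python import PythonOperator

# Run the callable after the chat task so its XCom value is available.
process = PythonOperator(
    task_id='process_response',
    python_callable=use_previous_response,
)
chat_task >> process
```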
4. Handle Rate Limits¶
For high-volume workflows, consider adding retries:
from datetime import timedelta

chat = OVHCloudAIEndpointsChatCompletionsOperator(
    task_id='chat_with_retry',
    model='Meta-Llama-3_3-70B-Instruct',
    messages=[...],
    retries=3,
    retry_delay=timedelta(seconds=30),
)
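Airflow's standard `BaseOperator` arguments also allow exponential backoff between retries, which suits 429 responses. Building on the example above:

```python
from datetime import timedelta

chat = OVHCloudAIEndpointsChatCompletionsOperator(
    task_id='chat_with_backoff',
    model='Meta-Llama-3_3-70B-Instruct',
    messages=[...],
    retries=5,
    retry_delay=timedelta(seconds=10),
    retry_exponential_backoff=True,        # 10s, 20s, 40s, ... between attempts
    max_retry_delay=timedelta(minutes=5),  # cap the wait time
)
```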
Error Handling¶
The operators raise AirflowException on failures. Common errors:
| Error | Cause | Solution |
|---|---|---|
| API key not found | Missing connection password | Check connection configuration |
| 401 Unauthorized | Invalid API token | Verify your token is correct |
| 429 Too Many Requests | Rate limit exceeded | Add retries with backoff |
| Model not found | Invalid model name | Check available models in the catalog |
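When using the hook directly, you can catch these failures and degrade gracefully instead of failing the task run. A minimal sketch, assuming the hook surfaces the same AirflowException as the operators:

```python
from airflow.exceptions import AirflowException

def safe_chat(hook, messages):
    # Fall back to None rather than failing the whole task run
    # (whether that is appropriate depends on your pipeline).
    try:
        return hook.chat_completion(
            model='Meta-Llama-3_3-70B-Instruct',
            messages=messages,
        )
    except AirflowException as exc:
        print(f"AI Endpoints call failed: {exc}")
        return None
```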