
Scheduler fails to handle Kubernetes 429 responses with non-JSON body #49244

@whynick1

Description


Apache Airflow version

2.10.5

If "Other Airflow 2 version" selected, which one?

No response

What happened?

The body of a Kubernetes API error response is not guaranteed to be JSON. For example, a 429 response may carry a plain-text body, as in the example below.

kubernetes.client.exceptions.ApiException: (429)
Reason: Too Many Requests
HTTP response headers: HTTPHeaderDict({'Content-Type': 'text/plain; charset=utf-8', 'Retry-After': '4', ...})
HTTP response body: Too many requests, please try again later.

This leads to an unhandled exception in the Airflow scheduler (code), causing the scheduler pod to restart:

except ApiException as e:
    # Raises json.JSONDecodeError when e.body is plain text, e.g. the 429 above
    body = json.loads(e.body)
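To see why a single plain-text response is enough to crash the loop, the standard-library sketch below feeds the body from the 429 response above through the same unguarded json.loads call. handle_api_error is a hypothetical name used only for illustration, not a function in Airflow. In the scheduler, the exception is raised inside the except ApiException block, so nothing catches it and, per this report, the pod restarts.

```python
import json

# Response body copied from the 429 example above (Content-Type: text/plain).
PLAIN_TEXT_BODY = "Too many requests, please try again later."


def handle_api_error(body):
    """Hypothetical stand-in for the excerpt above: assumes every body is JSON."""
    return json.loads(body)


caught = None
try:
    handle_api_error(PLAIN_TEXT_BODY)
except json.JSONDecodeError as err:
    # "T" cannot start any JSON value, so parsing fails at the very first character.
    caught = err
    print(f"{type(err).__name__}: {err}")

# A JSON body, by contrast, parses fine (illustrative payload).
print(handle_api_error('{"kind": "Status", "code": 409, "reason": "AlreadyExists"}'))
```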

What you think should happen instead?

Wrap the json.loads call in a try/except block to guard against malformed or non-JSON bodies:

except ApiException as e:
    try:
        body = json.loads(e.body)
    except json.JSONDecodeError:
        # The body is not valid JSON (e.g., a plain-text 429 from a rate limiter),
        # so use the raw body as the message instead.
        body = {"message": e.body}
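The same guard can be factored into a small helper. This is one possible shape, not Airflow's code: the name parse_error_body is hypothetical, and it deliberately goes a little beyond the proposal above. It also catches TypeError, on the assumption that ApiException.body can be None (the kubernetes Python client leaves it as None when the exception is built without an HTTP response), and it catches ValueError, the parent class of json.JSONDecodeError, which additionally covers bytes that are not valid UTF-8.

```python
import json
from typing import Any, Dict, Optional, Union


def parse_error_body(body: Optional[Union[str, bytes]]) -> Dict[str, Any]:
    """Hypothetical helper: return an API error body as a dict, never raising.

    A Kubernetes error response is usually a JSON Status object, but rate
    limiters and proxies may return plain text (as in the 429 above), and an
    ApiException built without an HTTP response has no body at all.
    """
    try:
        parsed = json.loads(body)
    except (ValueError, TypeError):
        # ValueError covers json.JSONDecodeError and undecodable bytes;
        # TypeError covers body=None.
        return {"message": body}
    # Valid JSON that is not an object (e.g. a bare string) is wrapped too,
    # so callers can always use dict access such as body.get("reason").
    return parsed if isinstance(parsed, dict) else {"message": parsed}


print(parse_error_body('{"kind": "Status", "reason": "AlreadyExists"}')["reason"])
print(parse_error_body("Too many requests, please try again later."))
```

Returning a dict in every case keeps downstream dict access unchanged, which is the same property the proposed {"message": e.body} fallback aims for.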

How to reproduce

  1. Configure an Airflow deployment using the KubernetesExecutor or a setup that interacts with the Kubernetes API (e.g., KubernetesPodOperator or the scheduler using the K8s client).
  2. Apply rate limiting on the Kubernetes API server (or simulate it via a proxy or ingress controller) so that repeated API calls trigger a 429 Too Many Requests response with a plain-text body.
  3. Trigger a task run.
  4. Observe a json.decoder.JSONDecodeError raised in the Airflow scheduler, after which the scheduler restarts.
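The steps above need a cluster and a rate limiter. As a rough, cluster-free approximation of step 2, the sketch below uses only the Python standard library to start a local HTTP server that answers every request the way the rate limiter in this report did (429, text/plain, Retry-After: 4). The handler class, the /api/v1/pods path and the use of urllib instead of the kubernetes client are illustrative assumptions; the sketch reproduces only the body-parsing failure, not the scheduler restart.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"Too many requests, please try again later."


class RateLimitedHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a rate-limiting proxy in front of the API server."""

    def do_GET(self):
        self.send_response(429, "Too Many Requests")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Retry-After", "4")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_error(url):
    """Return (status, content_type, body_text) for an HTTP error response."""
    # Bypass any proxy settings from the environment for the local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        opener.open(url, timeout=5)
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Content-Type"), err.read().decode("utf-8")
    raise AssertionError("expected an HTTP error response")


server = HTTPServer(("127.0.0.1", 0), RateLimitedHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, content_type, body = fetch_error(
        f"http://127.0.0.1:{server.server_port}/api/v1/pods"
    )
finally:
    server.shutdown()
    server.server_close()

try:
    parsed = json.loads(body)  # the unguarded call fails here
except json.JSONDecodeError:
    parsed = {"message": body}  # fallback from the proposed fix
print(status, content_type, parsed)
```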

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata


Assignees

No one assigned

    Labels

    area:Scheduler, area:core, kind:bug, needs-triage

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
