AIRFLOW__CORE__XCOM_BACKEND=path.to.your.module.S3XComBackend

4. Exclusive Production Optimization Design Patterns
By default, Airflow stores XComs in the metadata database via the BaseXCom class. This works well for development but has significant limitations in production. The most notable constraint is size: a limit of 48KB per stored value is commonly cited, and the effective ceiling depends on the database that backs the metadata store. The default backend can also become a bottleneck when a DAG produces a large number of XComs, slowing down the database and, by extension, the scheduler. These limitations are the primary reason for seeking a more "exclusive" and robust solution.
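One way to act on the size constraint is to measure a value before a task returns it. The following is a minimal sketch, assuming JSON serialization and the 48KB figure quoted above; the constant and the function name fits_in_xcom are hypothetical and not part of Airflow.

```python
import json

# Assumed threshold, taken from the 48KB figure quoted in the text.
XCOM_SIZE_LIMIT_BYTES = 48 * 1024


def fits_in_xcom(value, limit=XCOM_SIZE_LIMIT_BYTES):
    """Return True if the JSON-encoded value is no larger than `limit` bytes."""
    encoded = json.dumps(value).encode("utf-8")
    return len(encoded) <= limit
```

A task could call this helper before returning and fall back to writing the payload to external storage when the check fails.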
Instead of relying on the default return_value key, use specific keys for important metadata. This makes your DAG's "XCom" tab in the UI much easier to audit.
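As an illustration of named keys, here is a small sketch of a task callable. It assumes Airflow passes the task instance as ti from the execution context; the function name, the keys, and the sample rows are hypothetical.

```python
def extract_orders(ti):
    """Hypothetical task callable; Airflow supplies `ti` (the TaskInstance)."""
    rows = [{"id": 1}, {"id": 2}, {"id": 3}]  # stand-in for a real extraction step

    # Each named key is stored as its own XCom entry instead of one return_value.
    ti.xcom_push(key="row_count", value=len(rows))
    ti.xcom_push(key="source_table", value="orders")
```

Because the function returns None, nothing is stored under return_value, so only the two named entries exist for this task.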
An exclusive XCom design treats XCom as a read-only, immutable, single-purpose reference, never as a payload carrier. The word "exclusive" implies three constraints:
You can explicitly push data using the xcom_push method inside the function. This is useful if you need to push multiple values.
For traditional operators that support XCom pushes, you can use the .output property to refer to an XCom value without explicit xcom_pull calls. This is especially useful when chaining operators in a functional style.
This approach is particularly useful with the TaskFlow API, where a task's return value is automatically pushed to XCom. By explicitly passing these references as arguments to downstream tasks, you create an intuitive, functional declaration of your pipeline.
from airflow.decorators import task

@task
def load_data(row_count: int) -> None:
    print(f"Loaded {row_count} rows into destination")
If a Python task returns a value at the end of its function execution, Airflow automatically saves it as an XCom. If that data is not needed downstream, return None or set do_xcom_push=False in your operator configuration.
Do not treat the metadata database as a storage layer.
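The reference-only idea can be sketched in plain Python: the producer writes the payload to storage and hands over a small pointer, and the consumer resolves that pointer. This is an illustration under assumptions, not a definitive implementation; a local directory stands in for shared storage such as an object store, and the function names and the orders.json file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def extract_to_file(rows, directory):
    """Write the payload outside XCom and return only a small reference to it."""
    path = Path(directory) / "orders.json"
    path.write_text(json.dumps(rows))
    # Only this small dict would be stored as the XCom value.
    return {"path": str(path), "row_count": len(rows)}


def load_from_reference(ref):
    """Resolve the reference and read the actual payload from storage."""
    rows = json.loads(Path(ref["path"]).read_text())
    return len(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ref = extract_to_file([{"id": 1}, {"id": 2}], tmp)
        print(load_from_reference(ref))
```

In a real deployment the storage location would need to be reachable from every worker, which is why the pointer usually refers to shared or cloud storage rather than a local path.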
Using the execution context or task instance (ti) object directly within your operators.
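A short sketch of reading a value through the task instance follows. It assumes Airflow injects ti from the execution context and that an upstream task with the hypothetical task_id extract_orders pushed a row_count key; both names are illustrative.

```python
def validate_row_count(ti):
    """Hypothetical downstream callable that reads one named XCom value."""
    # `extract_orders` and `row_count` are assumed names for the upstream task and key.
    row_count = ti.xcom_pull(task_ids="extract_orders", key="row_count")
    if not row_count:
        raise ValueError("Upstream task reported no rows")
    print(f"Upstream extracted {row_count} rows")
```

Pulling a single named key keeps the dependency on the upstream task explicit and avoids reading more than the task needs.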