Push vs Pull

Source-initiated vs platform-initiated ingestion patterns.

Overview¶

Ingestion can be initiated by either the source system (push) or the platform (pull). Each approach has different trade-offs in terms of latency, complexity, and cost.

Push (Source-Initiated)¶

Architecture:

Source System → Webhook/API → Platform Ingestion Endpoint

Characteristics¶

Real-time delivery - Source controls timing
Event-driven - Data arrives as events happen
Lower latency - No polling delay

Pros¶

✅ Real-time delivery
✅ Source controls timing
✅ No polling overhead
✅ Event-driven architecture

Cons¶

❌ Source must handle retries
❌ Platform must scale for bursts
❌ Requires source system changes
❌ More complex error handling
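To illustrate what "source must handle retries" means in practice, here is a minimal sketch of source-side retry with exponential backoff. The function `send_with_retry`, its parameters, and the `send` callable are hypothetical names chosen for illustration; a real source would likely also add jitter and limit retries to transient errors.

```python
import time


def send_with_retry(send, payload, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(payload) until it succeeds, backing off exponentially.

    `send` is a hypothetical callable that raises on failure.
    Returns the number of attempts used; re-raises after the last failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: a sender that fails twice, then succeeds; sleep is stubbed to record delays.
calls = {"n": 0}
delays = []


def flaky_send(payload):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("platform unavailable")


attempts = send_with_retry(flaky_send, {"id": 1}, sleep=delays.append)
```

Because the same payload may be delivered more than once under this scheme, the platform side needs to tolerate duplicates.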

When to Use¶

Real-time requirements
Source has reliable infrastructure
You control the source system
Event-driven architecture

Implementation¶

Webhook endpoint:

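from fastapi import FastAPI, HTTPException

app = FastAPI()

# validate_request and process_data are application-specific helpers defined elsewhere.
@app.post("/ingest/{source}")
async def ingest_webhook(source: str, data: dict):
    # Reject invalid payloads with HTTP 400
    # (returning a (body, status) tuple does not set the status code in FastAPI)
    if not validate_request(source, data):
        raise HTTPException(status_code=400, detail="Invalid request")

    # Process data
    process_data(source, data)

    return {"status": "success"}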

Key considerations:

Idempotency keys - Deduplicate retries
Rate limiting - Prevent abuse
Authentication - Secure endpoints
Backpressure - Reject when overloaded
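Two of the considerations above, idempotency keys and backpressure, can be sketched without a web framework. This is a minimal in-memory illustration, assuming the source sends a unique idempotency key with each delivery. The names `IngestGate`, `max_in_flight`, and the returned status strings are hypothetical; a production version would keep seen keys in a shared store with expiry.

```python
from dataclasses import dataclass, field


@dataclass
class IngestGate:
    """Hypothetical in-memory gate: deduplicates retries and sheds load."""
    max_in_flight: int = 100          # assumed capacity limit
    in_flight: int = 0
    seen_keys: set = field(default_factory=set)

    def admit(self, idempotency_key: str) -> str:
        # Duplicate delivery (a source retry): acknowledge without reprocessing.
        if idempotency_key in self.seen_keys:
            return "duplicate"
        # Backpressure: refuse new work at capacity so the source retries later.
        # The key is not recorded, so the later retry is treated as new.
        if self.in_flight >= self.max_in_flight:
            return "overloaded"
        self.seen_keys.add(idempotency_key)
        self.in_flight += 1
        return "accepted"

    def done(self) -> None:
        # Call when processing of an accepted event finishes.
        self.in_flight = max(0, self.in_flight - 1)


gate = IngestGate(max_in_flight=1)
first = gate.admit("evt-1")    # new event, capacity available
retry = gate.admit("evt-1")    # same key delivered again
second = gate.admit("evt-2")   # capacity exhausted
gate.done()
third = gate.admit("evt-2")    # capacity freed, retry admitted
```

In an HTTP endpoint, "duplicate" would typically map to a success response and "overloaded" to 429 or 503, so a well-behaved source backs off and retries.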

Pull (Platform-Initiated)¶

Architecture:

Platform Scheduler → Query Source → Process Results

Characteristics¶

Scheduled execution - Platform controls timing
Polling-based - Query source periodically
Higher latency - Depends on polling frequency
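To make the polling delay concrete: if changes arrive at uniformly random times and query and processing time are ignored (both simplifying assumptions), a change waits on average half the polling interval, and at most one full interval, before it is picked up. The helper name below is illustrative.

```python
def polling_delay_seconds(interval_seconds: float) -> tuple[float, float]:
    """Return (average, worst-case) pickup delay for a fixed polling interval.

    Assumes changes arrive uniformly at random within the interval and
    ignores query and processing time.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return interval_seconds / 2, interval_seconds


average, worst = polling_delay_seconds(3600)  # hourly pull
```

For an hourly pull this gives an average delay of 30 minutes and a worst case of one hour.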

Pros¶

✅ Platform controls rate
✅ Easier backpressure
✅ No source system changes
✅ Simpler error handling

Cons¶

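❌ Polling overhead, including polls that return no new data
❌ May miss intermediate states that change between polls
❌ Higher latency, bounded by the polling interval
❌ May miss data if the source is unavailable and does not retain changes until the next poll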

When to Use¶

Batch processing
Source can't push
Rate limiting needed
Legacy systems

Implementation¶

Scheduled query:

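from datetime import datetime, timezone

import schedule

# query_source, process_data, and the checkpoint helpers are application-specific.
def pull_data():
    # Record the start time first so rows written during the pull are not skipped
    pull_started_at = datetime.now(timezone.utc)

    # Query only rows changed since the last successful pull
    data = query_source("SELECT * FROM table WHERE updated_at > ?", get_last_pull_time())

    # Process data
    process_data(data)

    # Advance the checkpoint only after processing succeeds
    update_last_pull_time(pull_started_at)

# Run hourly; a loop elsewhere must call schedule.run_pending()
schedule.every(1).hours.do(pull_data)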

Optimization:

Incremental queries - Only fetch new data
Parallel pulls - Multiple workers
Adaptive polling - Increase frequency when data available
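Adaptive polling can be sketched as a small function that shortens the interval after a poll that returned data and lengthens it after an empty one. The halving and doubling factors and the bounds below are assumptions for illustration, not recommended values.

```python
def next_interval(current: float, rows_fetched: int,
                  min_interval: float = 60.0, max_interval: float = 3600.0) -> float:
    """Halve the interval when the last poll returned data, double it otherwise.

    The result is clamped to [min_interval, max_interval] seconds.
    """
    if rows_fetched > 0:
        proposed = current / 2
    else:
        proposed = current * 2
    return max(min_interval, min(max_interval, proposed))


interval = 600.0
interval = next_interval(interval, rows_fetched=25)   # data found: poll sooner
busy = interval
interval = next_interval(interval, rows_fetched=0)    # nothing new: back off
interval = next_interval(interval, rows_fetched=0)    # still nothing: back off again
idle = interval
```

The bounds keep a busy source from being polled continuously and an idle one from being ignored for too long.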

Comparison¶

Aspect	Push	Pull
Latency	Low (real-time)	Higher (polling delay)
Complexity	Higher (source changes)	Lower (no source changes)
Cost	Higher (always-on)	Lower (scheduled)
Reliability	Depends on source	Platform-controlled
Scalability	Burst handling needed	Easier to scale

Hybrid Approach¶

Use both patterns:

Push for real-time, critical data
Pull for batch, non-critical data

Example:

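from fastapi import FastAPI
import schedule

app = FastAPI()

# Real-time events (push)
@app.post("/events")
async def ingest_events(data: dict):
    process_realtime_events(data)
    return {"status": "success"}

# Batch data (pull)
def pull_batch_data():
    process_batch_data()

# Run once a day; a loop elsewhere must call schedule.run_pending()
schedule.every().day.do(pull_batch_data)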

Decision Framework¶

Use Push when:

✅ Real-time requirement
✅ Source can push
✅ Event-driven architecture
✅ You control source system

Use Pull when:

✅ Batch acceptable
✅ Source can't push
✅ Rate limiting needed
✅ Legacy systems

Best Practices¶

Push Best Practices¶

Idempotency - Handle duplicate events
Rate limiting - Prevent abuse
Authentication - Secure endpoints
Backpressure - Reject when overloaded
Retry logic - Source should retry on failure
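One common way to secure a push endpoint is to have the source sign each request body with a shared secret and have the platform verify the signature. The sketch below uses only the Python standard library. The secret value, the payload, and the function names are hypothetical, and how the signature travels (for example, in a request header) is left open.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature the source would attach to a request."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(secret, body), signature)


secret = b"shared-secret"             # hypothetical; load from a secret store in practice
body = b'{"event": "order.created"}'
good = is_authentic(secret, body, sign(secret, body))
tampered = is_authentic(secret, body + b" ", sign(secret, body))
```

Using `hmac.compare_digest` rather than `==` avoids leaking information about the expected signature through comparison timing.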

Pull Best Practices¶

Incremental queries - Only fetch new data
Checkpointing - Track last processed record
Error handling - Retry on failures
Adaptive polling - Adjust frequency based on data availability
Parallel processing - Multiple workers for large sources
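Incremental queries and checkpointing work together: each pull reads only rows past the stored checkpoint and then advances it. The following is a minimal sketch using an in-memory SQLite database as a stand-in for the source. The table `source_rows`, the `checkpoint` dictionary, and the function name are hypothetical, and a real checkpoint would be persisted durably.

```python
import sqlite3

# Hypothetical source table, held in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_rows (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO source_rows (payload) VALUES (?)", [("a",), ("b",), ("c",)])

checkpoint = {"last_id": 0}   # in practice, persist this durably


def pull_increment(batch_size: int = 2) -> list:
    """Fetch rows past the checkpoint, then advance it to the last row fetched."""
    rows = conn.execute(
        "SELECT id, payload FROM source_rows WHERE id > ? ORDER BY id LIMIT ?",
        (checkpoint["last_id"], batch_size),
    ).fetchall()
    if rows:
        # Real processing would happen here, before the checkpoint moves,
        # so that a crash causes the batch to be read again rather than lost.
        checkpoint["last_id"] = rows[-1][0]
    return rows


first_batch = pull_increment()    # rows 1 and 2
second_batch = pull_increment()   # row 3
third_batch = pull_increment()    # nothing new
```

A monotonically increasing key is used here instead of a timestamp because it sidesteps clock differences between source and platform; with an `updated_at` column the same pattern applies, with more care around ties.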

Related Topics¶

Batch vs Streaming - Ingestion patterns
CDC - Change data capture
Data Architecture - Storage patterns
