Data Science Capstone — Delivery Hero / Foodpanda

Foodpanda Global Vendor Performance & Churn Risk Intelligence

An end-to-end data science pipeline across 11 Asian markets: ingesting 187,531 vendor records and 3,068,774 customer reviews, consolidating fragmented country-level CSVs into a unified analytics dataset, identifying cross-market quality patterns in a descriptive layer, and training a vendor churn-risk classifier (Random Forest, AUC 0.966) that predicts whether a vendor is at risk of falling below Foodpanda's quality threshold before the platform's official rating update has been triggered.

Python pandas scikit-learn Random Forest Gradient Boosting Matplotlib Multi-market Analytics Vendor Intelligence

Business Problem

Foodpanda's marketplace spans restaurants and dark kitchens across Southeast and South Asia, each market with a different competitive landscape, logistics infrastructure, and customer expectation baseline. When a vendor's quality deteriorates, the damage compounds: one customer's bad experience becomes platform abandonment when it recurs across multiple vendors in the same market.

The challenge is that Foodpanda's official AverageRating is a lagging indicator — it reflects the cumulative history of all reviews since the vendor joined the platform. A vendor that performed well for two years but has been receiving consistently poor scores for the past 90 days will still show a respectable headline rating. By the time the aggregate rating drops below the visibility threshold, thousands of customers have already been served a degraded experience.
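The masking effect is easy to demonstrate with toy numbers (illustrative only, not taken from the dataset): a vendor with two years of strong reviews followed by 90 days of poor ones still carries a healthy cumulative mean.

```python
import pandas as pd

# Toy review history: ~2 years of 4.5-star reviews, then ~90 days of 2.5-star
# reviews (roughly one review every three days). Numbers are illustrative.
scores = pd.Series([4.5] * 240 + [2.5] * 30)

cumulative = scores.mean()           # what a lagging headline rating reflects
recent     = scores.tail(30).mean()  # what customers actually experience now

print(round(cumulative, 2))  # → 4.28 — still looks respectable
print(round(recent, 2))      # → 2.5 — the degraded current experience
```

The gap between the two numbers is the early-warning window the classifier is built to exploit.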

This project answers three questions:

  1. Descriptive: What does the quality landscape look like across all 11 markets? Where are the structural, market-level quality gaps?
  2. Predictive: Can we identify vendors likely to cross the quality threshold using review-signal features alone — without relying on the lagging platform rating?
  3. Prescriptive: What is the market-specific intervention hierarchy, and what operational playbook should the Vendor Success team execute?

Dataset: Kaggle — Foodpanda Restaurant Reviews (bwandowando), scraped in 2025. 11 country files each for the vendor master and the customer review data. The Taiwan review file is structurally different (missing the restaurant_food sub-score column) — handled explicitly in the pipeline with pd.concat(join="outer").
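A minimal sketch of that schema-divergence handling, using two toy frames in place of the real country files: an outer column join keeps the union of columns and fills the missing sub-score with NaN.

```python
import pandas as pd

# Toy frames mimicking the divergence: the TW-style frame lacks restaurant_food.
th = pd.DataFrame({"overall": [5, 4], "restaurant_food": [5, 3]})
tw = pd.DataFrame({"overall": [4, 2]})  # no restaurant_food column

# join="outer" (also pandas' default for concat) keeps the union of columns
# and fills restaurant_food with NaN for the TW rows.
merged = pd.concat([th, tw], ignore_index=True, join="outer")
print(merged["restaurant_food"].isna().sum())  # → 2 (the two TW rows)
```

Downstream, those NaNs are median-imputed rather than dropped, so Taiwan's vendors stay in the modeling cohort.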

Pipeline Architecture & Tools

The pipeline is structured as four sequential stages, each with a clear analytical purpose:

Data Engineering

Loops over 22 country-partitioned CSV files (11 vendor master + 11 review files), injects a country dimension, handles schema divergence (TW missing column), and aggregates 3M reviews to vendor-level KPIs using groupby. Outputs global_vendors_enriched.csv.

Descriptive Analytics

Three publication-style dark-theme charts: a cross-market quality leaderboard, a food-type × market rating heatmap (top 10 categories), and a review-volume segmentation panel showing how quality profile varies with vendor maturity.

Predictive Modeling (scikit-learn)

Three models benchmarked head-to-head via 5-fold stratified cross-validation: Logistic Regression (baseline), Random Forest, and Gradient Boosting. Features are derived exclusively from the review data — not the platform rating — to avoid leakage and simulate an early-warning production use-case.

Prescriptive Framework

RF churn-risk scores bucketed into three intervention tiers (Immediate / Preventive / Quality Gate) with market-specific operational playbooks informed by the feature importance ranking.

The Code

Step 1 — Data consolidation across 11 markets Python
import os, glob
import pandas as pd

DATA_DIR = "kaggle/foodpanda"
COUNTRIES = ["BD","HK","KH","LA","MM","MY","PH","PK","SG","TH","TW"]

# ── Consolidate vendor master ──────────────────────────────────────────
restos_frames = []
for cc in COUNTRIES:
    df = pd.read_csv(f"{DATA_DIR}/{cc.lower()}_restos_2025.csv")
    df["country"] = cc
    restos_frames.append(df)

global_vendors = pd.concat(restos_frames, ignore_index=True)
# → 187,531 rows, 11 markets

# ── Consolidate reviews — TW is missing restaurant_food column ─────────
# join="outer" fills the missing column with NaN for TW rows.
reviews_frames = []
for cc in COUNTRIES:
    df = pd.read_csv(
        f"{DATA_DIR}/{cc.lower()}_reviews_2025.csv",
        usecols=lambda c: c in [
            "StoreId","createdAt","overall","rider","restaurant_food","likeCount"
        ],
    )
    df["country"] = cc
    reviews_frames.append(df)

global_reviews = pd.concat(reviews_frames, ignore_index=True, join="outer")
# → 3,068,774 rows

# ── Aggregate reviews → vendor KPIs ────────────────────────────────────
agg = (
    global_reviews.groupby("StoreId")
    .agg(
        review_count      = ("overall",         "count"),
        avg_overall       = ("overall",         "mean"),
        avg_rider         = ("rider",           "mean"),
        avg_food_rating   = ("restaurant_food", "mean"),
        pct_low_reviews   = ("overall",         lambda x: (x <= 2).mean()),
        total_likes       = ("likeCount",       "sum"),
        latest_review     = ("createdAt",       "max"),
        earliest_review   = ("createdAt",       "min"),
    )
    .reset_index()
)

# Review span in days — platform tenure proxy
agg["review_span_days"] = (
    (pd.to_datetime(agg["latest_review"], utc=True) -
     pd.to_datetime(agg["earliest_review"], utc=True))
    .dt.total_seconds() / 86400
).clip(lower=0)

# Merge enriched KPIs onto vendor master
df = global_vendors.merge(agg, on="StoreId", how="left")
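One sanity check worth running after the left merge (a toy sketch, with column names mirroring the pipeline above): vendors that have never been reviewed come out of the join with NaN KPIs, so review_count should be zero-filled before any thresholding. The merge's indicator parameter makes the coverage explicit.

```python
import pandas as pd

# Toy vendor master and review KPIs — vendor 2 has no reviews at all.
vendors = pd.DataFrame({"StoreId": [1, 2, 3]})
kpis    = pd.DataFrame({"StoreId": [1, 3], "review_count": [120, 45]})

# indicator=True adds a _merge column flagging rows with no KPI match.
df = vendors.merge(kpis, on="StoreId", how="left", indicator=True)
print((df["_merge"] == "left_only").sum())  # → 1 vendor without reviews

# Zero-fill so later filters like review_count >= 20 behave correctly.
df["review_count"] = df["review_count"].fillna(0).astype(int)
```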

Step 2 — Target definition & leakage-free feature engineering Python
# ── Target: AverageRating < 3.5 (Foodpanda's internal quality gate) ────
# AverageRating in the vendor master is NOT a raw mean of review scores.
# It is Foodpanda's time-weighted, Bayesian-smoothed platform rating.
# Our review-aggregated avg_overall will correlate with it, but not perfectly.
# This gap is exactly what enables early-warning detection.

# Cohort: vendors with ≥ 20 reviews — enough review signal to score reliably
model_df = df[df["review_count"] >= 20].copy()

model_df["at_risk"] = (model_df["AverageRating"] < 3.5).astype(int)
# → 879 at-risk vendors out of 38,517 with ≥ 20 reviews (2.3%)

# ── Feature matrix — AverageRating intentionally EXCLUDED ───────────────
# Including the platform rating would constitute label leakage.
# The model must learn to predict below-threshold ratings from review signals.
FEATURES = [
    "avg_overall",       # Mean review score (not identical to platform rating)
    "avg_rider",         # Delivery experience sub-score
    "avg_food_rating",   # Food quality sub-score (NaN for TW → median imputed)
    "pct_low_reviews",   # Share of 1–2 star reviews
    "total_likes",       # Community engagement signal
    "review_span_days",  # Platform tenure in days
    "Reviewers",         # Platform's own review count (vendor master)
    "review_count",      # Our aggregated count (review files)
    "country_enc",       # Market fixed effect (LabelEncoded)
]

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

# Market fixed effect — LabelEncode the country code
model_df["country_enc"] = LabelEncoder().fit_transform(model_df["country"])

imputer = SimpleImputer(strategy="median")
X_imp   = imputer.fit_transform(model_df[FEATURES])
y       = model_df["at_risk"].to_numpy()
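What the median imputer does to the TW rows can be shown on a toy column (illustrative values, not from the dataset):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy avg_food_rating column — the NaN stands in for a TW vendor.
col = np.array([[4.0], [3.0], [5.0], [np.nan]])

imp = SimpleImputer(strategy="median")
filled = imp.fit_transform(col)
print(filled.ravel())  # → [4. 3. 5. 4.] — NaN replaced by the column median
```

Median imputation is deliberately conservative here: it keeps TW vendors scoreable without letting the imputed value pull them toward either tail of the food-rating distribution.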

Step 3 — 5-fold stratified cross-validation across 3 models Python
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Logistic Regression": Pipeline([
        ("scaler", StandardScaler()),
        ("clf",    LogisticRegression(max_iter=500, class_weight="balanced")),
    ]),
    "Random Forest": RandomForestClassifier(
        n_estimators=200, max_depth=8, min_samples_leaf=20,
        class_weight="balanced", random_state=42, n_jobs=-1,
    ),
    "Gradient Boosting": GradientBoostingClassifier(
        n_estimators=150, max_depth=5, learning_rate=0.08,
        subsample=0.8, random_state=42,
    ),
}

results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X_imp, y, cv=cv,
        scoring=["roc_auc", "f1", "precision", "recall"],
    )
    results[name] = {k: v.mean() for k, v in scores.items() if k.startswith("test_")}

# Results — 5-fold mean:
# Logistic Regression:  AUC=0.939  F1=0.230  Prec=0.132  Rec=0.881
# Random Forest:        AUC=0.966  F1=0.401  Prec=0.266  Rec=0.813
# Gradient Boosting:    AUC=0.968  F1=0.482  Prec=0.616  Rec=0.398
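The 5-fold means above are the headline numbers, but cross_validate also returns the per-fold arrays, so fold-to-fold variability is one line away. A self-contained sketch on a synthetic imbalanced dataset (make_classification standing in for the vendor cohort — toy data, not the real features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Toy imbalanced cohort: ~5% positive class, 9 features like the real matrix.
X, y = make_classification(
    n_samples=2000, n_features=9, weights=[0.95], random_state=42
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1),
    X, y, cv=cv, scoring=["roc_auc"],
)

auc = scores["test_roc_auc"]          # one AUC per fold
print(f"AUC = {auc.mean():.3f} ± {auc.std():.3f}")
```

Reporting the fold standard deviation alongside the mean is cheap insurance that a headline AUC like 0.966 is stable rather than a lucky split.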

Step 4 — Risk tier assignment & per-market intervention counts Python
# Train final RF on full cohort for scoring
rf_final = RandomForestClassifier(
    n_estimators=200, max_depth=8, min_samples_leaf=20,
    class_weight="balanced", random_state=42, n_jobs=-1,
)
rf_final.fit(X_imp, y)

model_df["churn_risk_score"] = rf_final.predict_proba(X_imp)[:, 1]
model_df["risk_tier"] = pd.cut(
    model_df["churn_risk_score"],
    bins=[0, 0.3, 0.6, 1.0],
    labels=["Low", "Medium", "High"],
    include_lowest=True,  # a score of exactly 0 still lands in "Low"
)

# Per-market breakdown
risk_by_market = (
    model_df.groupby("country")
    .agg(
        vendors_scored = ("StoreId",           "count"),
        high_risk      = ("risk_tier",          lambda x: (x == "High").sum()),
        avg_score      = ("churn_risk_score",   "mean"),
    )
    .assign(high_risk_pct=lambda d: d["high_risk"] / d["vendors_scored"])
    .sort_values("high_risk_pct", ascending=False)
)
# Top result: Bangladesh — 31.2% of scored vendors in High risk tier

Visualizations

Four dark-theme charts trace the full analytical arc from market-level benchmarking to model interpretation. All figures are styled to match the portfolio's design token palette (--primary #0ea5e9, --card-bg #162236).

Cross-Market Quality Benchmark. Median vendor rating, share of ≥ 4 star vendors, and mean complaint-rate (reviews ≤ 2 stars) across 11 markets. Bangladesh and Pakistan show the highest structural quality gaps; Malaysia and Thailand lead in share of 4-star+ vendors.
Food Type × Market Rating Heatmap. Median rating for the top 10 food categories across all 11 markets. Dark cells indicate sparse or absent vendor presence; bright cells flag category–market combinations worth targeted vendor acquisition (e.g., Japanese cuisine in Bangladesh).
Review Volume Segmentation. Vendors bucketed by total review count into five maturity tiers. The chart reveals a critical insight: high-volume vendors (200+ reviews) have lower median ratings, not higher — suggesting that mature vendors attract stricter, more experienced raters, not that established vendors perform worse.
Churn Risk Model. 5-fold CV scorecard comparing Logistic Regression (AUC 0.939), Random Forest (AUC 0.966), and Gradient Boosting (AUC 0.968). Feature importance confirms that avg_overall and pct_low_reviews (review-signal proxies) dominate — validating the early-warning use-case.

Model Selection Rationale

All three models achieve competitive AUC scores (≥ 0.939), but they differ substantially on precision and recall — and that tradeoff has direct operational consequences.

Logistic Regression achieves the highest recall (0.881) at the cost of very low precision (0.132). In practice this means the Vendor Success team receives roughly 7 false alerts for every genuine at-risk vendor — a workload that quickly erodes confidence in the system and leads to alert fatigue.

Gradient Boosting flips that tradeoff: precision 0.616, recall 0.398. It would flag fewer vendors, but miss 60% of the actual at-risk population — unacceptable for a proactive intervention programme.

Random Forest sits in the middle (precision 0.266, recall 0.813) with the best AUC (0.966) and significantly better F1 than LR. At the platform scale of ~38K scored vendors, the RF flags approximately 2,700 vendors as high-priority, of which roughly 720 are genuine at-risk cases — a manageable daily ops workload. The RF's feature_importances_ provide the interpretability needed for the Vendor Success team to understand why a specific vendor was flagged.

The low absolute precision across all models is an artefact of class imbalance (2.3% base rate), not a model failure. At deployment, the RF's probability score would be used to rank vendors rather than hard-classify them — teams would work down a sorted priority list, not act on a binary flag.
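The ranked-triage idea is simple to sketch. With toy scores (a seeded simulation, not the real model output — the beta distributions are just a convenient way to give positives higher scores on average), precision within the top-k of a sorted list far exceeds the 2.3% base rate:

```python
import numpy as np

# Toy cohort: ~2.3% true at-risk, scores drawn so positives rank higher.
rng = np.random.default_rng(42)
y_true  = rng.random(10_000) < 0.023
y_score = np.where(y_true, rng.beta(5, 2, 10_000), rng.beta(2, 5, 10_000))

# Sort by descending risk and measure precision at workload-sized cutoffs.
order = np.argsort(-y_score)
prec = {k: y_true[order[:k]].mean() for k in (100, 500, 1000)}
for k, p in prec.items():
    print(f"precision@{k}: {p:.2f}")
```

The operational takeaway: pick k to match the Vendor Success team's weekly capacity and work the list from the top, rather than acting on a fixed 0.5 threshold.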

Conclusion

This project demonstrates an analytics stack that directly mirrors the work done inside a regional food delivery platform's data science function: ingesting messy, partitioned operational data; engineering a unified dataset from disparate country schemas; surfacing market heterogeneity through comparative visualisations; and deploying a risk classifier that enables the operations team to intervene before the customer damage is done.

The most commercially significant finding is the market-level quality gradient: Bangladesh has 14–31× the at-risk vendor concentration of Taiwan or Malaysia, and Pakistan's high review complaint rate is driven disproportionately by low rider scores — pointing to a logistics partner problem rather than a food quality problem. These are actionable, market-specific diagnoses — precisely the kind of insight a regional VP of Operations needs to allocate Vendor Success Manager headcount and define Q3 market intervention priorities.