ETL vs ELT: What Actually Differs in Data Processing
ETL and ELT aren't just shuffled letters. The order defines where heavy computation happens — and that choice affects cost, maintainability, and compliance.
In an architecture review, someone asked why the data pipeline was loading raw files directly into the warehouse before transforming them. The answer was "it's ELT, not ETL." The silence in the room made it clear half the team had never stopped to think about the difference — and the other half assumed it was just a matter of letter order.
It's not. The order matters, and the wrong decision can cost weeks of rework when volume scales.
The difference that matters in practice
ETL — Extract, Transform, Load — is the classical model. You extract data from the source, transform it in an intermediate layer (usually a dedicated processing server), and only then load the clean, structured result into the final destination.
ELT — Extract, Load, Transform — reverses the last two steps. You extract data and load it raw into the destination, and the transformation happens inside the warehouse itself, using its own computational power.
The practical difference isn't philosophical. It's about where the heavy computation happens: on an intermediate ETL server or inside the destination warehouse.
ETL: Source → [ETL Server: transforms] → Warehouse (clean data)
ELT: Source → Warehouse (raw data) → [Warehouse: transforms in place]
Why ETL dominated for so long
ETL was the natural answer when warehouses were expensive and computationally constrained. Storage in Oracle or Teradata cost real money per GB, and keeping dirty data there was waste. Pre-processing on a dedicated server before persisting made economic sense.
ETL transformation was handled by tools like Informatica, SSIS, or custom Python code. Data arriving in the warehouse was a finished product: validated, typed, with a defined schema, ready for analytical queries.
The problem: transformation logic lives in a place separate from the data. When business rules change — and they always do — you modify the ETL code, reprocess history, and hope the maintenance window fits within the SLA.
Why ELT makes more sense in modern pipelines
The shift came with BigQuery, Redshift, and Snowflake. Modern warehouses are elastic, scale compute separately from storage, and charge per query executed — not at prohibitive per-GB storage prices. The cost of storing raw data dropped dramatically.
Loading first and transforming later became both cheaper and more flexible:
You preserve the original data. If the transformation logic is wrong, you haven't lost the raw data. You rewrite the transformation and reprocess from scratch without going back to the original source.
Transformation uses warehouse power. A Snowflake or BigQuery cluster has more computational power than any dedicated ETL server at equivalent cost. Aggregations over billions of rows are what they're built for.
Faster iteration. With tools like dbt, transformation is SQL versioned in Git. You change logic, run dbt run, and within minutes have the updated result without server deployments.
When ETL still makes sense
ELT isn't always the right answer. There are cases where transformation must happen before loading:
Compliance and privacy (GDPR, LGPD). If raw data contains PII — social security numbers, card numbers, health data — you may not have legal permission to load that into a warehouse before anonymizing or masking. In that case, transformation must come first.
Heterogeneous sources with incompatible schemas. When aggregating dozens of sources with completely different formats (flat files, REST APIs, legacy FTP), an intermediate ETL server may make more sense than trying to resolve everything inside the warehouse with SQL.
Very low volume against expensive warehouses. For small pipelines running against pay-per-query or pay-per-processing databases, transforming before loading may be cheaper.
Low latency requirements. ETL with lightweight, well-optimized transformations can be faster than the load-then-transform cycle of ELT when you need data ready in seconds, not minutes.
The hybrid model you find in production
Real pipelines are rarely pure ETL or pure ELT. What you find in production looks more like this:
- Extract: collect data from the source (API, OLTP database, files)
- Light transform: minimal required cleaning — encoding, key deduplication, PII masking
- Load raw: load into the raw/landing layer of the warehouse
- Heavy transform: analytical transformations inside the warehouse via dbt or native SQL
This model is what the Medallion architecture (Bronze/Silver/Gold) in Databricks or dbt's staging/marts layers formalize. Raw data preserved in the Bronze/raw layer serves as a reprocessing point, and upper layers are materialized views with progressively refined logic.
-- Example dbt transformation in the silver layer
-- Raw data is already in the warehouse; logic lives in versioned SQL
SELECT
order_id,
customer_id,
-- normalize value that came as a string from the source
CAST(REPLACE(raw_amount, ',', '.') AS DECIMAL(10,2)) AS amount,
-- standardize timezone
CONVERT_TIMEZONE('UTC', order_ts) AS order_utc
FROM raw.orders
WHERE order_id IS NOT NULL
The role of modern tooling in this choice
The popularization of ELT owes a lot to dbt (data build tool). Before dbt, doing ELT meant manual SQL with no versioning, no tests, no documentation. dbt brought software engineering practices — Git, CI, testing — to SQL transformations.
Extraction tools like Airbyte, Fivetran, and Stitch automated the Extract+Load, leaving Transform as the only part requiring custom code. The pipeline becomes: Airbyte does EL, dbt does T.
That doesn't mean ETL is dead. Tools like Apache Spark, Flink, and streaming frameworks still do heavy transformations before loading when the use case demands — especially in real-time streaming pipelines where you need to enrich or aggregate events before persisting.
When ELT doesn't scale well
There's a problem few posts about ELT mention: if you load raw data without any quality controls, the warehouse becomes a data swamp. Raw tables with inconsistent schemas, duplicates, null values where there shouldn't be — and all the cleaning logic scattered across dozens of undocumented dbt models.
The discipline ETL enforced (transform before persisting) had a positive side effect: data errors were caught before entering the warehouse. With ELT, detection needs to be explicit — dbt tests, data contracts, validations in the extraction pipeline.
ELT with discipline is better than ETL. ELT without discipline is worse than any alternative.
Frequently asked questions
ETL or ELT — which is faster to implement?
It depends on context. For a simple pipeline with a single source and a defined destination, ELT with Airbyte + dbt can be running within hours. For integrating dozens of heterogeneous sources with complex business rules, ETL with custom transformation may be faster to maintain in the medium term — especially when the team knows Python or Java better than analytical SQL.
Can I use dbt with ETL?
dbt is a SQL transformation tool that runs inside the warehouse. By definition, it operates on the ELT model — data needs to be in the warehouse before you run dbt run. Using dbt means adopting ELT, at least for the transformation step.
What is EtLT?
It's a term for architectures that do a lightweight transformation before loading (PII masking, deduplication) and then full analytical transformations inside the warehouse. It's the hybrid model described above with its own name. Most production pipelines are EtLT in practice, even when the team doesn't use the term.
Do I need a dedicated server for ETL?
Not necessarily. ETL can run on Lambda/Cloud Functions for smaller volumes, or in containers orchestrated by Airflow/Prefect for larger ones. Dedicated ETL servers were necessary in an era of less elastic infrastructure. Today, transformation compute can be serverless or containerized with no fixed cost.
The right architecture depends on your warehouse, not the acronym
The question "ETL or ELT?" matters less than "where is my transformation compute cheapest, most maintainable, and most reliable?" If the warehouse has elastic compute and the team knows SQL — ELT with dbt is the obvious choice today. If data has strict privacy constraints or sources are too chaotic to load raw — ETL or the hybrid model makes more sense.
What no longer exists is a universal answer. ETL isn't legacy and ELT isn't modern by itself. They're tools, and like every tool, their value depends on how you use them.
Note: the editorial content ends here. What follows is a related tool recommendation.
Related tool
ELT pipelines often involve inspecting intermediate files — CSVs exported from sources, API JSON responses, table dumps. The CSV to JSON tool from Quick Tools converts these files directly in the browser without uploading to a server, useful for inspecting data samples before deciding on the load schema in the warehouse.
- 01 What DevOps Actually Is Beyond the Tools DevOps isn't a pipeline or a job title. It's shared ownership between the people who write code and the people who run it in production — and why most teams get it wrong.
- 02 VPS, VPC, and Dedicated Server: What's the Difference and When to Use Each VPS, VPC, and dedicated server appear side by side on every hosting comparison page — but they mean different things. Here's where the money goes and how to decide.