Bulk Load / COPY
High-throughput data ingestion from external files or streams, bypassing row-by-row INSERT overhead.
| Database | Status | Syntax Family | Min Version |
|---|---|---|---|
| PostgreSQL | full | copy | - |
| MySQL | full | load-data | - |
| MariaDB | full | load-data | - |
| Oracle | full | sqlldr | - |
| SQL Server | full | bulk-insert | - |
| SQLite | partial | cli-import | - |
| Redshift | full | copy-s3 | - |
| DB2 | full | load-import | - |
| Snowflake | full | copy-into | - |
| BigQuery | full | load-data | - |
Database Details
PostgreSQL
full
-- Server-side import from a file (requires superuser or pg_read_server_files role)
COPY orders FROM '/data/orders.csv' (FORMAT CSV, HEADER TRUE, DELIMITER ',');
-- Server-side export to a file
COPY orders TO '/data/orders_export.csv' (FORMAT CSV, HEADER TRUE);
-- Export query results
COPY (SELECT id, total FROM orders WHERE status = 'complete')
TO '/data/complete_orders.csv' (FORMAT CSV, HEADER TRUE);
-- Client-side import using psql meta-command (no superuser needed)
\copy orders FROM 'local_orders.csv' WITH (FORMAT CSV, HEADER TRUE)
-- Client-side export
\copy orders TO 'local_export.csv' WITH (FORMAT CSV, HEADER TRUE)
-- Binary format for maximum throughput
COPY orders FROM '/data/orders.bin' (FORMAT BINARY);
COPY orders TO '/data/orders.bin' (FORMAT BINARY);
- Server-side COPY reads/writes files on the database server; requires superuser or the pg_read_server_files / pg_write_server_files roles.
- \copy (psql meta-command, lowercase) uses client-side I/O; the file is on the client machine and no elevated privilege is needed.
- Binary format is the fastest option but is not human-readable and is PostgreSQL-version-specific.
- COPY supports NULL representation, quoting characters, escape characters, and encoding options, as sketched below.
- COPY is transactional; if it fails mid-file the entire COPY is rolled back.
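A minimal sketch combining those options (the file path, NULL marker, and encoding are illustrative assumptions):
-- Hypothetical file; adjust the options to your data
COPY orders FROM '/data/orders_latin1.csv' (
FORMAT CSV,
HEADER TRUE,
NULL 'N/A',        -- treat the string N/A as SQL NULL
QUOTE '"',         -- field quoting character
ESCAPE '\',        -- escape character inside quoted fields
ENCODING 'LATIN1'  -- encoding of the source file
);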
Docs: COPY, Populating a Database
MySQL
full
-- Server-side import (file must be on the MySQL server host)
LOAD DATA INFILE '/var/lib/mysql-files/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(id, customer_id, total, order_date);
-- Client-side import (file is on the client machine)
LOAD DATA LOCAL INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
IGNORE 1 LINES;
-- Export to CSV
SELECT id, customer_id, total
INTO OUTFILE '/var/lib/mysql-files/export.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM orders;
-- mysqlimport CLI (wraps LOAD DATA INFILE)
-- mysqlimport --local --fields-terminated-by=',' db_name orders.csv
- Server-side LOAD DATA INFILE requires the FILE privilege and the file must be in the directory specified by secure_file_priv.
- LOCAL INFILE transfers the file from client to server; may be disabled by server configuration (local_infile=OFF) or by the client driver.
- mysqlimport is the CLI utility that wraps LOAD DATA INFILE for batch imports.
- Multi-row INSERT with batching can approach LOAD DATA speeds for smaller datasets; see the sketch below.
- IGNORE N ROWS / LINES skips header rows.
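A minimal sketch of the batched multi-row INSERT alternative (values are illustrative):
-- One statement carrying many rows cuts round-trips and parse overhead
INSERT INTO orders (id, customer_id, total) VALUES
(1, 10, 99.99),
(2, 11, 149.50),
(3, 12, 19.95);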
Docs: LOAD DATA Statement, mysqlimport
MariaDB
full
-- LOAD DATA INFILE (same as MySQL)
LOAD DATA INFILE '/tmp/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Load XML file (MariaDB extension)
LOAD XML INFILE '/tmp/orders.xml'
INTO TABLE orders
ROWS IDENTIFIED BY '<order>';
-- Client-side
LOAD DATA LOCAL INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',';
-- Export
SELECT * INTO OUTFILE '/tmp/export.csv'
FIELDS TERMINATED BY ','
FROM orders;
- LOAD DATA INFILE and LOAD DATA LOCAL INFILE behave identically to MySQL.
- LOAD XML INFILE is a MariaDB extension that imports XML files, with each record identified by a configurable XML element.
- Generally faster than multi-row INSERT statements for large datasets.
- The secure_file_priv server variable controls which directories are accessible for server-side LOAD DATA; you can inspect its value as shown below.
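A quick check of that restriction (an empty value means server-side loads are unrestricted; NULL means they are disabled):
-- Shows the directory, if any, that server-side LOAD DATA may read from
SHOW VARIABLES LIKE 'secure_file_priv';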
Docs: LOAD DATA INFILE, LOAD XML
Oracle
full
-- SQL*Loader control file (orders.ctl):
-- LOAD DATA
-- INFILE 'orders.csv'
-- INTO TABLE orders
-- FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
-- (id, customer_id, total, order_date DATE 'YYYY-MM-DD')
-- Run SQL*Loader from the OS shell:
-- sqlldr userid=app/secret@mydb control=orders.ctl log=orders.log
-- Direct-path INSERT (bypasses buffer cache, minimal logging)
INSERT /*+ APPEND */ INTO orders_archive
SELECT * FROM orders WHERE order_date < DATE '2023-01-01';
COMMIT;
-- External Table: query a CSV as a table (no load step)
CREATE TABLE ext_orders (
id NUMBER,
customer_id NUMBER,
total NUMBER(12,2),
order_date DATE
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
SKIP 1
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
id, customer_id, total,
order_date DATE 'YYYY-MM-DD'
)
)
LOCATION ('orders.csv')
)
REJECT LIMIT UNLIMITED;
-- Data Pump export/import (Oracle-to-Oracle transfer)
-- OS shell: expdp app/secret TABLES=orders DUMPFILE=orders.dmp
-- OS shell: impdp app/secret TABLES=orders DUMPFILE=orders.dmp REMAP_SCHEMA=old:new
- SQL*Loader (sqlldr) is the bundled bulk load utility; it reads a control file describing the file format and column mappings.
- SQL*Loader supports conventional-path load (transactional, uses SQL INSERT) and direct-path load (writes formatted blocks directly, much faster but more restrictive).
- Direct-path INSERT (/*+ APPEND */ hint) bypasses the buffer cache and redo logging for maximum throughput; requires exclusive table access.
- External Tables allow querying flat files as read-only database tables without a load step, useful for ELT patterns; see the sketch below.
- Data Pump (expdp/impdp) is the modern Oracle-to-Oracle export/import facility, replacing the older exp/imp utilities.
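A minimal ELT sketch combining the external table and direct-path insert shown above (assumes orders and ext_orders have matching columns):
-- Direct-path load straight from the flat file via the external table
INSERT /*+ APPEND */ INTO orders
SELECT * FROM ext_orders;
COMMIT;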
SQL Server
full
-- BULK INSERT from a CSV file
BULK INSERT orders
FROM 'C:\data\orders.csv'
WITH (
FORMAT = 'CSV',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK
);
-- OPENROWSET with BULK to read and transform during insert
INSERT INTO orders (id, customer_id, total)
SELECT id, customer_id, total
FROM OPENROWSET(
BULK 'C:\data\orders.csv',
FORMATFILE = 'C:\data\orders.fmt',
FIRSTROW = 2
) AS src;
-- bcp utility (command line):
-- Import: bcp db.dbo.orders in orders.csv -S server -U user -P pass -c -t,
-- Export: bcp db.dbo.orders out orders.csv -S server -U user -P pass -c -t,
- BULK INSERT is the SQL-integrated bulk load command; it reads files from the server file system. Options for tolerating bad rows are sketched below.
- bcp (Bulk Copy Program) is the classic command-line tool for bulk import and export; it can read from/write to client-side files.
- OPENROWSET(BULK ...) enables inline bulk reads that can be filtered and transformed in a SELECT statement.
- The TABLOCK hint on BULK INSERT acquires a table lock and enables minimal logging for faster loads.
- SQL Server Integration Services (SSIS) is the full ETL platform for complex data movement workflows.
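BULK INSERT has documented error-handling options for diverting bad rows instead of aborting the whole load; a minimal sketch (the error-file path and error limit are illustrative):
BULK INSERT orders
FROM 'C:\data\orders.csv'
WITH (
FORMAT = 'CSV',
FIRSTROW = 2,
MAXERRORS = 10,                         -- tolerate up to 10 bad rows
ERRORFILE = 'C:\data\orders_errors.log' -- rejected rows are written here
);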
SQLite
partial
-- Import CSV in the sqlite3 CLI shell
.mode csv
.import orders.csv orders
-- Import with headers (sqlite3 3.32+)
.import --csv --skip 1 orders.csv orders
-- Application-level bulk insert: use a transaction for speed
BEGIN;
INSERT INTO orders (id, customer_id, total) VALUES (1, 10, 99.99);
INSERT INTO orders (id, customer_id, total) VALUES (2, 11, 149.50);
-- ... thousands more rows ...
COMMIT;
-- Prepared statement approach (in application code)
-- Use BEGIN / many binds / COMMIT for best throughput
- SQLite has no SQL-level COPY or LOAD DATA command.
- The .import command in the sqlite3 CLI shell handles CSV import.
- Applications should use prepared statements with explicit BEGIN/COMMIT transactions for bulk insert. Wrapping thousands of inserts in a single transaction is orders of magnitude faster than auto-commit.
- SQLite is not optimized for very large bulk loads; for multi-GB datasets consider batch transactions of 10,000–100,000 rows each.
- The WAL journal mode (PRAGMA journal_mode=WAL) can improve concurrent write throughput; see the sketch below.
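A minimal tuning sketch for a one-off bulk load; PRAGMA synchronous = NORMAL is an added assumption that a small durability trade-off is acceptable:
PRAGMA journal_mode = WAL;    -- enable write-ahead logging
PRAGMA synchronous = NORMAL;  -- fewer fsyncs; assumed-acceptable trade-off
BEGIN;
-- ... thousands of prepared-statement INSERTs ...
COMMIT;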
Redshift
full
-- COPY from S3 (CSV)
COPY orders
FROM 's3://mybucket/data/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
IGNOREHEADER 1
DELIMITER ',';
-- COPY from Parquet (columnar, most efficient)
COPY orders
FROM 's3://mybucket/data/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS PARQUET;
-- COPY with MANIFEST file (exact file list)
COPY orders
FROM 's3://mybucket/manifests/orders.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
MANIFEST
FORMAT AS CSV;
-- COPY from DynamoDB
COPY orders
FROM 'dynamodb://OrdersTable'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
READRATIO 50;
- COPY is the recommended way to load data into Redshift; it is much faster than INSERT statements.
- COPY loads in parallel from multiple S3 files. For best throughput, split source data so the number of files is a multiple of the number of slices in the cluster.
- Columnar file formats (Parquet, ORC) are the most efficient because Redshift can read only the columns it needs.
- Compressed delimited files (gzip, lzop, bzip2, zstd) are supported, but COPY does not detect the compression; name it explicitly (e.g. GZIP), as sketched below. Parquet and ORC carry their compression internally.
- MANIFEST files specify exact S3 file paths to load, preventing accidental inclusion of extra files.
- COPY also supports loading from DynamoDB, EMR (HDFS), and remote hosts (SSH).
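A minimal sketch for gzip-compressed CSVs, reusing the illustrative bucket and role from the examples above:
COPY orders
FROM 's3://mybucket/data/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
GZIP
IGNOREHEADER 1;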
DB2
full
-- IMPORT (transactional, uses SQL INSERT internally)
IMPORT FROM /data/orders.csv
OF DEL
MODIFIED BY COLDEL,
SKIPCOUNT 1
INSERT INTO orders;
-- LOAD (non-transactional, high speed, direct write)
LOAD FROM /data/orders.del
OF DEL
MODIFIED BY COLDEL, DUMPFILE=/data/orders.err
INSERT INTO orders
NONRECOVERABLE;
-- EXPORT to delimited file
EXPORT TO /data/orders_export.del
OF DEL
MODIFIED BY COLDEL,
SELECT * FROM orders;
-- db2move: multi-table export/import
-- OS shell: db2move MYDB export -tc orders,customers
-- OS shell: db2move NEWDB import
- LOAD is the high-speed bulk load utility; it writes formatted pages directly and does not use SQL INSERT, making it very fast.
- IMPORT is slower than LOAD but transactional; it uses SQL INSERT and can be rolled back.
- LOAD requires the table to be taken offline (or left only partially accessible) while it runs, and the table space is placed in a load-pending state until the load completes; tables with constraints can also be left in set-integrity-pending state (see the sketch below).
- NONRECOVERABLE option on LOAD skips logging for maximum speed; the table must be backed up afterward.
- db2move exports and imports multiple tables in a single operation, useful for database migration.
- The IBM Data Movement Tool provides a graphical interface for complex data movement scenarios.
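When the loaded table has check constraints or foreign keys, a typical follow-up takes it out of set-integrity-pending state (table name reused from the examples above):
-- Validate constraints after a LOAD
SET INTEGRITY FOR orders IMMEDIATE CHECKED;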
Snowflake
full
-- Stage a local file into an internal Snowflake stage
PUT file:///local/data/orders.csv @my_internal_stage;
-- Create a file format
CREATE FILE FORMAT my_csv_fmt
TYPE = CSV
FIELD_DELIMITER = ','
SKIP_HEADER = 1
NULL_IF = ('NULL', 'null');
-- COPY INTO table from internal stage
COPY INTO orders
FROM @my_internal_stage/orders.csv
FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt');
-- COPY INTO table from external S3 stage
COPY INTO orders
FROM @my_s3_stage
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
-- COPY INTO stage (export)
COPY INTO @my_s3_stage/export/
FROM (SELECT id, total FROM orders WHERE status = 'complete')
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);
-- Handle errors: skip bad rows
COPY INTO orders
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt')
ON_ERROR = SKIP_FILE;
- Files must be staged (PUT to an internal stage or referenced via an external stage) before COPY INTO can load them.
- Internal stages are Snowflake-managed storage; external stages reference S3, Azure Blob, or GCS directly.
- Supported formats: CSV, JSON (newline-delimited), Avro, ORC, Parquet, XML.
- Files are automatically decompressed (gzip, bzip2, zstd, Brotli, raw deflate) during load.
- ON_ERROR option controls behavior when bad rows are encountered: ABORT_STATEMENT (default), SKIP_FILE, CONTINUE.
- COPY INTO is idempotent by default; Snowflake tracks load metadata to prevent re-loading the same files.
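Two related, documented COPY INTO options, shown with the stage and file format names from above: FORCE overrides the load-metadata deduplication, and VALIDATION_MODE dry-runs a load.
-- Re-load files that load metadata would otherwise skip
COPY INTO orders
FROM @my_internal_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt')
FORCE = TRUE;
-- Dry run: report parse errors without loading any rows
COPY INTO orders
FROM @my_internal_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt')
VALIDATION_MODE = 'RETURN_ERRORS';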
BigQuery
full
-- SQL: LOAD DATA statement (GA since 2022)
LOAD DATA INTO mydataset.orders
FROM FILES (
format = 'CSV',
uris = ['gs://mybucket/orders/*.csv'],
skip_leading_rows = 1
);
-- LOAD DATA OVERWRITE (replace existing data)
LOAD DATA OVERWRITE mydataset.orders
FROM FILES (
format = 'PARQUET',
uris = ['gs://mybucket/orders/*.parquet']
);
-- bq CLI: load CSV from GCS
-- bq load --source_format=CSV --autodetect mydataset.orders gs://mybucket/orders/*.csv
-- bq CLI: load Parquet (schema inferred)
-- bq load --source_format=PARQUET mydataset.orders gs://mybucket/orders/*.parquet
-- Streaming insert via Storage Write API (real-time)
-- (application code, not SQL - uses google-cloud-bigquery client library)
- LOAD DATA is the SQL syntax for batch loading from Cloud Storage (added to BigQuery in 2022); it can also declare an explicit column list, as sketched at the end of this section.
- The bq load CLI command is the classic way to load data, supporting CSV, JSON, Avro, Parquet, and ORC from GCS.
- Parquet and Avro are the preferred formats; they carry schema information and load faster than CSV.
- The BigQuery Storage Write API (the successor to the legacy insertAll streaming API) enables high-throughput, low-latency streaming inserts.
- Data is loaded into columnar storage automatically; no index management or VACUUM is required.
- Loads are transactional; they fully succeed or are fully rolled back.
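A hedged sketch of LOAD DATA with an explicit column list instead of schema autodetection (the column names and types are illustrative):
-- Declare columns explicitly rather than relying on autodetect
LOAD DATA INTO mydataset.orders (id INT64, customer_id INT64, total NUMERIC)
FROM FILES (
format = 'CSV',
uris = ['gs://mybucket/orders/*.csv'],
skip_leading_rows = 1
);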