Bulk Load / COPY
High-throughput data ingestion from external files or streams, bypassing row-by-row INSERT overhead.
| Database | Status | Syntax Family | Min Version |
|---|---|---|---|
| PostgreSQL | full | copy | - |
| MySQL | full | load-data | - |
| MariaDB | full | load-data | - |
| Oracle | full | sqlldr | - |
| SQL Server | full | bulk-insert | - |
| SQLite | partial | cli-import | - |
| Redshift | full | copy-s3 | - |
| DB2 | full | load-import | - |
| Snowflake | full | copy-into | - |
| BigQuery | full | load-data | - |
Database Details
PostgreSQL
full
-- Server-side import from a file (requires superuser or pg_read_server_files role)
COPY orders FROM '/data/orders.csv' (FORMAT CSV, HEADER TRUE, DELIMITER ',');
-- Server-side export to a file
COPY orders TO '/data/orders_export.csv' (FORMAT CSV, HEADER TRUE);
-- Export query results
COPY (SELECT id, total FROM orders WHERE status = 'complete')
TO '/data/complete_orders.csv' (FORMAT CSV, HEADER TRUE);
-- Client-side import using psql meta-command (no superuser needed)
\copy orders FROM 'local_orders.csv' WITH (FORMAT CSV, HEADER TRUE)
-- Client-side export
\copy orders TO 'local_export.csv' WITH (FORMAT CSV, HEADER TRUE)
-- Binary format for maximum throughput
COPY orders FROM '/data/orders.bin' (FORMAT BINARY);
COPY orders TO '/data/orders.bin' (FORMAT BINARY);
- Server-side COPY reads/writes files on the database server; requires superuser or the pg_read_server_files / pg_write_server_files roles.
- \copy (psql meta-command, lowercase) uses client-side I/O; the file is on the client machine and no elevated privilege is needed.
- Binary format is the fastest option but is not human-readable and is PostgreSQL-version-specific.
- COPY supports NULL representation, quoting characters, escape characters, and encoding options, as sketched below.
- COPY is transactional; if it fails mid-file the entire COPY is rolled back.
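A minimal sketch combining those options (the file path, NULL marker, and encoding are illustrative assumptions):
-- Hypothetical file; adjust the options to your data
COPY orders FROM '/data/orders_latin1.csv' (
FORMAT CSV,
HEADER TRUE,
NULL 'N/A',        -- treat the string N/A as SQL NULL
QUOTE '"',         -- field quoting character
ESCAPE '\',        -- escape character inside quoted fields
ENCODING 'LATIN1'  -- encoding of the source file
);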
Docs: COPY, Populating a Database
MySQL
full
-- Server-side import (file must be on the MySQL server host)
LOAD DATA INFILE '/var/lib/mysql-files/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(id, customer_id, total, order_date);
-- Client-side import (file is on the client machine)
LOAD DATA LOCAL INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
IGNORE 1 LINES;
-- Export to CSV
SELECT id, customer_id, total
INTO OUTFILE '/var/lib/mysql-files/export.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM orders;
-- mysqlimport CLI (wraps LOAD DATA INFILE)
-- mysqlimport --local --fields-terminated-by=',' db_name orders.csv
- Server-side LOAD DATA INFILE requires the FILE privilege and the file must be in the directory specified by secure_file_priv.
- LOCAL INFILE transfers the file from client to server; may be disabled by server configuration (local_infile=OFF) or by the client driver.
- mysqlimport is the CLI utility that wraps LOAD DATA INFILE for batch imports.
- Multi-row INSERT with batching can approach LOAD DATA speeds for smaller datasets; see the sketch below.
- IGNORE N ROWS / LINES skips header rows.
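A minimal sketch of the batched multi-row INSERT alternative (values are illustrative):
-- One statement carrying many rows cuts round-trips and parse overhead
INSERT INTO orders (id, customer_id, total) VALUES
(1, 10, 99.99),
(2, 11, 149.50),
(3, 12, 19.95);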
Docs: LOAD DATA Statement, mysqlimport
MariaDB
full
-- LOAD DATA INFILE (same as MySQL)
LOAD DATA INFILE '/tmp/orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
-- Load XML file (MariaDB extension)
LOAD XML INFILE '/tmp/orders.xml'
INTO TABLE orders
ROWS IDENTIFIED BY '<order>';
-- Client-side
LOAD DATA LOCAL INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',';
-- Export
SELECT * INTO OUTFILE '/tmp/export.csv'
FIELDS TERMINATED BY ','
FROM orders;
- LOAD DATA INFILE and LOAD DATA LOCAL INFILE behave identically to MySQL.
- LOAD XML INFILE is a MariaDB extension that imports XML files, with each record identified by a configurable XML element.
- Generally faster than multi-row INSERT statements for large datasets.
- The secure_file_priv server variable controls which directories are accessible for server-side LOAD DATA; you can inspect its value as shown below.
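A quick check of that restriction (an empty value means server-side loads are unrestricted; NULL means they are disabled):
-- Shows the directory, if any, that server-side LOAD DATA may read from
SHOW VARIABLES LIKE 'secure_file_priv';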
Docs: LOAD DATA INFILE, LOAD XML
Oracle
full
-- SQL*Loader control file (orders.ctl):
-- LOAD DATA
-- INFILE 'orders.csv'
-- INTO TABLE orders
-- FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
-- (id, customer_id, total, order_date DATE 'YYYY-MM-DD')
-- Run SQL*Loader from the OS shell:
-- sqlldr userid=app/secret@mydb control=orders.ctl log=orders.log
-- Direct-path INSERT (bypasses buffer cache, minimal logging)
INSERT /*+ APPEND */ INTO orders_archive
SELECT * FROM orders WHERE order_date < DATE '2023-01-01';
COMMIT;
-- External Table: query a CSV as a table (no load step)
CREATE TABLE ext_orders (
id NUMBER,
customer_id NUMBER,
total NUMBER(12,2),
order_date DATE
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
SKIP 1
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
id, customer_id, total,
order_date DATE 'YYYY-MM-DD'
)
)
LOCATION ('orders.csv')
)
REJECT LIMIT UNLIMITED;
-- Data Pump export/import (Oracle-to-Oracle transfer)
-- OS shell: expdp app/secret TABLES=orders DUMPFILE=orders.dmp
-- OS shell: impdp app/secret TABLES=orders DUMPFILE=orders.dmp REMAP_SCHEMA=old:new
- SQL*Loader (sqlldr) is the bundled bulk load utility; it reads a control file describing the file format and column mappings.
- SQL*Loader supports conventional-path load (transactional, uses SQL INSERT) and direct-path load (writes formatted blocks directly, much faster but more restrictive).
- Direct-path INSERT (/*+ APPEND */ hint) bypasses the buffer cache and redo logging for maximum throughput; requires exclusive table access.
- External Tables allow querying flat files as read-only database tables without a load step, useful for ELT patterns; see the sketch below.
- Data Pump (expdp/impdp) is the modern Oracle-to-Oracle export/import facility, replacing the older exp/imp utilities.
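A minimal ELT sketch combining the external table and direct-path insert shown above (assumes orders and ext_orders have matching columns):
-- Direct-path load straight from the flat file via the external table
INSERT /*+ APPEND */ INTO orders
SELECT * FROM ext_orders;
COMMIT;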
SQL Server
full
-- BULK INSERT from a CSV file
BULK INSERT orders
FROM 'C:\data\orders.csv'
WITH (
FORMAT = 'CSV',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK
);
-- OPENROWSET with BULK to read and transform during insert
INSERT INTO orders (id, customer_id, total)
SELECT id, customer_id, total
FROM OPENROWSET(
BULK 'C:\data\orders.csv',
FORMATFILE = 'C:\data\orders.fmt',
FIRSTROW = 2
) AS src;
-- bcp utility (command line):
-- Import: bcp db.dbo.orders in orders.csv -S server -U user -P pass -c -t,
-- Export: bcp db.dbo.orders out orders.csv -S server -U user -P pass -c -t,
- BULK INSERT is the SQL-integrated bulk load command; it reads files from the server file system. Options for tolerating bad rows are sketched below.
- bcp (Bulk Copy Program) is the classic command-line tool for bulk import and export; it can read from/write to client-side files.
- OPENROWSET(BULK ...) enables inline bulk reads that can be filtered and transformed in a SELECT statement.
- The TABLOCK hint on BULK INSERT acquires a table lock and enables minimal logging for faster loads.
- SQL Server Integration Services (SSIS) is the full ETL platform for complex data movement workflows.
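BULK INSERT has documented error-handling options for diverting bad rows instead of aborting the whole load; a minimal sketch (the error-file path and error limit are illustrative):
BULK INSERT orders
FROM 'C:\data\orders.csv'
WITH (
FORMAT = 'CSV',
FIRSTROW = 2,
MAXERRORS = 10,                         -- tolerate up to 10 bad rows
ERRORFILE = 'C:\data\orders_errors.log' -- rejected rows are written here
);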
SQLite
partial
-- Import CSV in the sqlite3 CLI shell
.mode csv
.import orders.csv orders
-- Import with headers (sqlite3 3.32+)
.import --csv --skip 1 orders.csv orders
-- Application-level bulk insert: use a transaction for speed
BEGIN;
INSERT INTO orders (id, customer_id, total) VALUES (1, 10, 99.99);
INSERT INTO orders (id, customer_id, total) VALUES (2, 11, 149.50);
-- ... thousands more rows ...
COMMIT;
-- Prepared statement approach (in application code)
-- Use BEGIN / many binds / COMMIT for best throughput
- SQLite has no SQL-level COPY or LOAD DATA command.
- The .import command in the sqlite3 CLI shell handles CSV import.
- Applications should use prepared statements with explicit BEGIN/COMMIT transactions for bulk insert. Wrapping thousands of inserts in a single transaction is orders of magnitude faster than auto-commit.
- SQLite is not optimized for very large bulk loads; for multi-GB datasets consider batch transactions of 10,000–100,000 rows each.
- The WAL journal mode (PRAGMA journal_mode=WAL) can improve concurrent write throughput; see the sketch below.
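A minimal tuning sketch for a one-off bulk load; PRAGMA synchronous = NORMAL is an added assumption that a small durability trade-off is acceptable:
PRAGMA journal_mode = WAL;    -- enable write-ahead logging
PRAGMA synchronous = NORMAL;  -- fewer fsyncs; assumed-acceptable trade-off
BEGIN;
-- ... thousands of prepared-statement INSERTs ...
COMMIT;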
Redshift
full
-- COPY from S3 (CSV)
COPY orders
FROM 's3://mybucket/data/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
IGNOREHEADER 1
DELIMITER ',';
-- COPY from Parquet (columnar, most efficient)
COPY orders
FROM 's3://mybucket/data/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS PARQUET;
-- COPY with MANIFEST file (exact file list)
COPY orders
FROM 's3://mybucket/manifests/orders.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
MANIFEST
FORMAT AS CSV;
-- COPY from DynamoDB
COPY orders
FROM 'dynamodb://OrdersTable'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
READRATIO 50;
- COPY is the recommended way to load data into Redshift; it is much faster than INSERT statements.
- COPY loads in parallel from multiple S3 files. For best throughput, split source data so the number of files is a multiple of the number of slices in the cluster.
- Columnar file formats (Parquet, ORC) are the most efficient because Redshift can read only the columns it needs.
- Compressed delimited files (gzip, lzop, bzip2, zstd) are supported, but COPY does not detect the compression; name it explicitly (e.g. GZIP), as sketched below. Parquet and ORC carry their compression internally.
- MANIFEST files specify exact S3 file paths to load, preventing accidental inclusion of extra files.
- COPY also supports loading from DynamoDB, EMR (HDFS), and remote hosts (SSH).
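A minimal sketch for gzip-compressed CSVs, reusing the illustrative bucket and role from the examples above:
COPY orders
FROM 's3://mybucket/data/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
FORMAT AS CSV
GZIP
IGNOREHEADER 1;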
DB2
full
-- IMPORT (transactional, uses SQL INSERT internally)
IMPORT FROM /data/orders.csv
OF DEL
MODIFIED BY COLDEL,
SKIPCOUNT 1
INSERT INTO orders;
-- LOAD (non-transactional, high speed, direct write)
LOAD FROM /data/orders.del
OF DEL
MODIFIED BY COLDEL, DUMPFILE=/data/orders.err
INSERT INTO orders
NONRECOVERABLE;
-- EXPORT to delimited file
EXPORT TO /data/orders_export.del
OF DEL
MODIFIED BY COLDEL,
SELECT * FROM orders;
-- db2move: multi-table export/import
-- OS shell: db2move MYDB export -tc orders,customers
-- OS shell: db2move NEWDB import
- LOAD is the high-speed bulk load utility; it writes formatted pages directly and does not use SQL INSERT, making it very fast.
- IMPORT is slower than LOAD but transactional; it uses SQL INSERT and can be rolled back.
- LOAD requires the table to be taken offline (or left only partially accessible) while it runs, and the table space is placed in a load-pending state until the load completes; tables with constraints can also be left in set-integrity-pending state (see the sketch below).
- NONRECOVERABLE option on LOAD skips logging for maximum speed; the table must be backed up afterward.
- db2move exports and imports multiple tables in a single operation, useful for database migration.
- The IBM Data Movement Tool provides a graphical interface for complex data movement scenarios.
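When the loaded table has check constraints or foreign keys, a typical follow-up takes it out of set-integrity-pending state (table name reused from the examples above):
-- Validate constraints after a LOAD
SET INTEGRITY FOR orders IMMEDIATE CHECKED;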
Snowflake
full
-- Stage a local file into an internal Snowflake stage
PUT file:///local/data/orders.csv @my_internal_stage;
-- Create a file format
CREATE FILE FORMAT my_csv_fmt
TYPE = CSV
FIELD_DELIMITER = ','
SKIP_HEADER = 1
NULL_IF = ('NULL', 'null');
-- COPY INTO table from internal stage
COPY INTO orders
FROM @my_internal_stage/orders.csv
FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt');
-- COPY INTO table from external S3 stage
COPY INTO orders
FROM @my_s3_stage
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
-- COPY INTO stage (export)
COPY INTO @my_s3_stage/export/
FROM (SELECT id, total FROM orders WHERE status = 'complete')
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);
-- Handle errors: skip bad rows
COPY INTO orders
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt')
ON_ERROR = SKIP_FILE;
- Files must be staged (PUT to an internal stage or referenced via an external stage) before COPY INTO can load them.
- Internal stages are Snowflake-managed storage; external stages reference S3, Azure Blob, or GCS directly.
- Supported formats: CSV, JSON (newline-delimited), Avro, ORC, Parquet, XML.
- Files are automatically decompressed (gzip, bzip2, zstd, Brotli, raw deflate) during load.
- ON_ERROR option controls behavior when bad rows are encountered: ABORT_STATEMENT (default), SKIP_FILE, CONTINUE.
- COPY INTO is idempotent by default; Snowflake tracks load metadata to prevent re-loading the same files.
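Two related, documented COPY INTO options, shown with the stage and file format names from above: FORCE overrides the load-metadata deduplication, and VALIDATION_MODE dry-runs a load.
-- Re-load files that load metadata would otherwise skip
COPY INTO orders
FROM @my_internal_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt')
FORCE = TRUE;
-- Dry run: report parse errors without loading any rows
COPY INTO orders
FROM @my_internal_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt')
VALIDATION_MODE = 'RETURN_ERRORS';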
BigQuery
full
-- SQL: LOAD DATA statement (GA since 2022)
LOAD DATA INTO mydataset.orders
FROM FILES (
format = 'CSV',
uris = ['gs://mybucket/orders/*.csv'],
skip_leading_rows = 1
);
-- LOAD DATA OVERWRITE (replace existing data)
LOAD DATA OVERWRITE mydataset.orders
FROM FILES (
format = 'PARQUET',
uris = ['gs://mybucket/orders/*.parquet']
);
-- bq CLI: load CSV from GCS
-- bq load --source_format=CSV --autodetect mydataset.orders gs://mybucket/orders/*.csv
-- bq CLI: load Parquet (schema inferred)
-- bq load --source_format=PARQUET mydataset.orders gs://mybucket/orders/*.parquet
-- Streaming insert via Storage Write API (real-time)
-- (application code, not SQL - uses google-cloud-bigquery client library)
- LOAD DATA is the SQL syntax for batch loading from Cloud Storage (added to BigQuery in 2022); it can also declare an explicit column list, as sketched at the end of this section.
- The bq load CLI command is the classic way to load data, supporting CSV, JSON, Avro, Parquet, and ORC from GCS.
- Parquet and Avro are the preferred formats; they carry schema information and load faster than CSV.
- The BigQuery Storage Write API (the successor to the legacy insertAll streaming API) enables high-throughput, low-latency streaming inserts.
- Data is loaded into columnar storage automatically; no index management or VACUUM is required.
- Loads are transactional; they fully succeed or are fully rolled back.
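A hedged sketch of LOAD DATA with an explicit column list instead of schema autodetection (the column names and types are illustrative):
-- Declare columns explicitly rather than relying on autodetect
LOAD DATA INTO mydataset.orders (id INT64, customer_id INT64, total NUMERIC)
FROM FILES (
format = 'CSV',
uris = ['gs://mybucket/orders/*.csv'],
skip_leading_rows = 1
);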