csvclean
Description
Cleans a CSV file of common syntax errors:
- reports rows that have a different number of columns than the header row
- attempts to correct the CSV by joining short rows into a single row
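The column-count check described above can be sketched with Python's standard `csv` module. This is a minimal illustration, not csvclean's actual implementation: the function name `find_bad_rows` is hypothetical, and the row numbering (starting at 1 with the first row after the header) is an assumption chosen to mirror the example output further down.

```python
import csv
import io


def find_bad_rows(text):
    """Return (row_number, message) pairs for rows whose length differs from the header."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        # Empty input: there is no header to compare against.
        return []
    expected = len(header)
    errors = []
    # Assumption: rows are numbered from 1, starting with the first row after the header.
    for row_number, row in enumerate(reader, start=1):
        if len(row) != expected:
            errors.append(
                (row_number, f"Expected {expected} columns, found {len(row)} columns")
            )
    return errors


sample = "a,b,c\n1,2,3,4\n5,6\n7,8,9\n"
for row_number, message in find_bad_rows(sample):
    print(f"Line {row_number}: {message}")
```

The sketch only reports mismatched rows; it does not attempt the row-joining repair that csvclean performs.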
Note that every csvkit tool does the following:
- removes optional quote characters, unless the --quoting (-u) option is set to change this behavior
- changes the field delimiter to a comma, if the input delimiter is set with the --delimiter (-d) or --tabs (-t) options
- changes the record delimiter to a line feed (LF or \n)
- changes the quote character to a double-quotation mark, if the character is set with the --quotechar (-q) option
- changes the character encoding to UTF-8, if the input encoding is set with the --encoding (-e) option
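To make the normalization listed above concrete, here is a rough equivalent using only Python's standard `csv` module. It is a sketch under stated assumptions rather than csvkit's code: the function `normalize` and its parameters are hypothetical names that loosely correspond to the -e, -d and -q options.

```python
import csv
import io


def normalize(raw_bytes, encoding="utf-8", delimiter=",", quotechar='"'):
    """Re-serialize CSV bytes as comma-delimited, double-quoted, LF-terminated UTF-8."""
    text = raw_bytes.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can parse them itself.
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter=delimiter, quotechar=quotechar
    )
    out = io.StringIO(newline="")
    # The writer defaults to comma, double quote and QUOTE_MINIMAL, which drops
    # quotes that are not required; lineterminator="\n" forces LF record endings.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue().encode("utf-8")


# Semicolon-delimited, single-quoted, CRLF-terminated Latin-1 input.
raw = "name;note\r\n'Zoë';'x;y'\r\n".encode("latin-1")
print(normalize(raw, encoding="latin-1", delimiter=";", quotechar="'"))
```

Fields that still need quoting after the conversion, such as a value containing a comma, keep their quotes in the output.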
Outputs [basename]_out.csv and [basename]_err.csv, the former containing all valid rows and the latter containing all error rows along with line numbers and descriptions:
usage: csvclean [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
[-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H]
[-K SKIP_LINES] [-v] [-l] [--zero] [-V] [-n]
[FILE]
Fix common errors in a CSV file.
positional arguments:
FILE The CSV file to operate on. If omitted, will accept
input as piped data via STDIN.
optional arguments:
-h, --help show this help message and exit
-n, --dry-run Do not create output files. Information about what
would have been done will be printed to STDERR.
See also: Arguments common to all tools.
Examples
Test a file with known bad rows:
csvclean -n examples/bad.csv
Line 1: Expected 3 columns, found 4 columns
Line 2: Expected 3 columns, found 2 columns
To change the line ending from line feed (LF or \n) to carriage return and line feed (CRLF or \r\n) use:
csvformat -M $'\r\n' examples/dummy.csv
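For comparison, the same line-ending change can be approximated in Python with the standard `csv` module. This is only an illustrative sketch, not how csvformat is implemented, and the helper name `to_crlf` is hypothetical.

```python
import csv
import io


def to_crlf(text):
    """Rewrite CSV text so that every record ends with CRLF."""
    # newline="" prevents Python from translating line endings on read or write.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\r\n").writerows(reader)
    return out.getvalue()


print(repr(to_crlf("a,b,c\n1,2,3\n")))
```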