csvkit 2.0.1
About
csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
It is inspired by pdftk, GDAL and the original csvcut tool by Joe Germuska and Aaron Bycoffe.
Important links:
Documentation: https://csvkit.rtfd.org/
Repository: https://github.com/wireservice/csvkit
Schemas: https://github.com/wireservice/ffs
First time? See Tutorial.
Note
To change the field separator, line terminator, etc. of the output, you must use csvformat.
Note
csvkit, by default, sniffs CSV formats (it deduces whether commas, tabs or spaces delimit fields, for example) based on the first 1024 bytes, and performs type inference (it converts text to numbers, dates, booleans, etc.). These features are useful and work well in most cases, but occasional errors occur. If you don’t need these features, set --snifflimit 0 (-y 0) and --no-inference (-I).
Note
If you need to do more complex data analysis than csvkit can handle, use agate. If you need csvkit to be faster or to handle larger files, you may be reaching the limits of csvkit. Consider loading the data into SQL, or using qsv or xsv.
Note
Need to deduplicate or find fuzzy matches in your CSV data? Use csvdedupe and csvlink.
Why csvkit?
Because it makes your life easier.
Convert Excel to CSV:
in2csv data.xls > data.csv
Convert JSON to CSV:
in2csv data.json > data.csv
Print column names:
csvcut -n data.csv
Select a subset of columns:
csvcut -c column_a,column_c data.csv > new.csv
Reorder columns:
csvcut -c column_c,column_a data.csv > new.csv
Find rows with matching cells:
csvgrep -c phone_number -r "555-555-\d{4}" data.csv > new.csv
Convert to JSON:
csvjson data.csv > data.json
Generate summary statistics:
csvstat data.csv
Query with SQL:
csvsql --query "select name from data where age > 30" data.csv > new.csv
Import into PostgreSQL:
csvsql --db postgresql:///database --insert data.csv
Extract data from PostgreSQL:
sql2csv --db postgresql:///database --query "select * from data" > new.csv
And much more…
Table of contents
- Tutorial
- Reference
- Tips and Troubleshooting
- Contributing to csvkit
- Release process
- License
- Changelog
  - Unreleased
  - 2.0.1 - July 12, 2024
  - 2.0.0 - May 1, 2024
  - 1.5.0 - March 28, 2024
  - 1.4.0 - February 13, 2024
  - 1.3.0 - October 18, 2023
  - 1.2.0 - October 4, 2023
  - 1.1.1 - February 22, 2023
  - 1.1.0 - January 3, 2023
  - 1.0.7 - March 6, 2022
  - 1.0.6 - July 13, 2021
  - 1.0.5 - March 2, 2020
  - 1.0.4 - March 16, 2019
  - 1.0.3 - March 11, 2018
  - 1.0.2 - April 28, 2017
  - 1.0.1 - December 29, 2016
  - 1.0.0 - December 27, 2016
  - 0.9.1 - March 31, 2015
  - 0.9.0 - September 8, 2014
  - 0.8.0 - July 27, 2014
  - 0.7.3 - April 27, 2014
  - 0.7.2 - March 24, 2014
  - 0.7.1 - March 24, 2014
  - 0.7.0 - March 24, 2014
  - 0.6.1 - August 20, 2013
  - 0.6.0 - August 20, 2013
  - 0.5.0 - August 21, 2012
  - 0.4.4 - May 1, 2012
  - 0.4.3 - February 20, 2012
Citation
When citing csvkit in publications, you may use this BibTeX entry:
@Manual{csvkit,
  title = "csvkit",
  author = "Christopher Groskopf and contributors",
  year = "2016",
  url = "https://csvkit.readthedocs.org/"
}