csvkit 0.9.1

About

csvkit is a suite of utilities for converting to and working with CSV, the king of tabular file formats.

It is inspired by pdftk, gdal and the original csvcut utility by Joe Germuska and Aaron Bycoffe.

Why csvkit?

Because it makes your life easier.

Convert Excel to CSV:

in2csv data.xls > data.csv

Convert JSON to CSV:

in2csv data.json > data.csv

Print column names:

csvcut -n data.csv

Select a subset of columns:

csvcut -c column_a,column_c data.csv > new.csv

Reorder columns:

csvcut -c column_c,column_a data.csv > new.csv
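To make the effect of `-c` concrete, here is a rough sketch of selecting and reordering columns by name using only Python's standard library. It is not csvkit's implementation; the function name `cut_columns` and the sample data are assumptions made for illustration.

```python
import csv
import io


def cut_columns(csv_text, names):
    """Return CSV text holding only the named columns, in the order given."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops columns that were not requested.
    writer = csv.DictWriter(out, fieldnames=names, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample standing in for data.csv.
sample = "column_a,column_b,column_c\n1,2,3\n4,5,6\n"
reordered = cut_columns(sample, ["column_c", "column_a"])
print(reordered)
```

Selecting a subset and reordering are the same operation here: the output simply follows the order of the requested names.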

Find rows with matching cells:

csvgrep -c phone_number -r "555-555-\d{4}" data.csv > matching.csv
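The pattern needs to be quoted so that the shell passes the backslash through unchanged. As an illustration of what the regular expression selects, the following standard-library sketch filters rows on one column. It is not csvkit's code: `grep_rows` and the sample rows are hypothetical, and the use of `re.search` (match anywhere in the cell) is an assumption about the matching semantics.

```python
import csv
import io
import re


def grep_rows(csv_text, column, pattern):
    """Return the rows (as dicts) whose `column` value matches `pattern`."""
    regex = re.compile(pattern)
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the pattern may match anywhere inside the cell.
    return [row for row in reader if regex.search(row[column])]


# Hypothetical sample standing in for data.csv.
sample = (
    "name,phone_number\n"
    "Ann,555-555-1234\n"
    "Bob,555-123-9999\n"
    "Cy,555-555-0000\n"
)
matches = grep_rows(sample, "phone_number", r"555-555-\d{4}")
print([row["name"] for row in matches])
```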

Convert to JSON:

csvjson data.csv > data.json

Generate summary statistics:

csvstat data.csv

Query with SQL:

csvsql --query "select name from data where age > 30" data.csv > old_folks.csv
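For readers curious how a SQL query can run against a CSV file at all, here is a minimal sketch of the general idea: load the rows into an in-memory SQLite table, then query it. This is not how csvsql is implemented; `query_csv`, `coerce`, the sample data, and the naive number detection are all assumptions made for illustration. The table is called `data` to mirror the query above.

```python
import csv
import io
import sqlite3


def coerce(value):
    """Naive type guess: int, then float, else leave the string alone."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def query_csv(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table and run one query."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    # Convert numeric-looking cells so comparisons like "age > 30" are numeric.
    body = [[coerce(cell) for cell in row] for row in rows[1:]]
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join('"{}"'.format(name) for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), body
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical sample standing in for data.csv.
sample = "name,age\nAda,36\nBo,25\nCy,41\n"
result = query_csv(sample, "data", "select name from data where age > 30")
print(result)
```

The type coercion matters: if every cell were stored as text, SQLite would not compare `age` with `30` numerically.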

Import into PostgreSQL:

csvsql --db postgresql:///database --insert data.csv

Extract data from PostgreSQL:

sql2csv --db postgresql:///database --query "select * from data" > extract.csv

And much more...

Authors

The following individuals have contributed code to csvkit:

  • Christopher Groskopf
  • Joe Germuska
  • Aaron Bycoffe
  • Travis Mehlinger
  • Alejandro Companioni
  • Benjamin Wilson
  • Bryan Silverthorn
  • Evan Wheeler
  • Matt Bone
  • Ryan Pitts
  • Hari Dara
  • Jeff Larson
  • Jim Thaxton
  • Miguel Gonzalez
  • Anton Ian Sipos
  • Gregory Temchenko
  • Kevin Schaul
  • Marc Abramowitz
  • Noah Hoffman
  • Jan Schulz
  • Derek Wilson
  • Chris Rosenthal
  • Davide Setti
  • Gabi Davar
  • Sriram Karra
  • James McKinney
  • aarcro
  • Matt Dudys
  • Joakim Lundborg
  • Federico Scrinzi
  • Shane StClair
  • raistlin7447
  • Alex Dergachev
  • Jeff Paine
  • Jeroen Janssens
  • Sébastien Fievet
  • Travis Swicegood
  • Ryan Murphy
  • Diego Rabatone Oliveira
  • Matt Pettis
  • Tasneem Raja
  • Richard Low
  • Kristina Durivage
  • Espartaco Palma
  • pnaimoli
  • Michael Mior
  • Jennifer Smith
  • Antonio Lima
  • Dave Stanton

License

The MIT License

Copyright (c) 2014 Christopher Groskopf and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Changelog

0.9.1

  • Add Antonio Lima to AUTHORS.
  • Add support for ndjson. (#329)
  • Add missing docs for csvcut -C. (#227)
  • Reorganize docs so TOC works better. (#339)
  • Render docs locally with RTD theme.
  • Fix header in “tricks” docs.
  • Add install instructions to tutorial. (#331)
  • Add killer examples to doc index. (#328)
  • Reorganize doc index.
  • Fix broken csvkit module documentation. (#327)
  • Fix version of openpyxl to work around encoding issue. (#391, #288)

0.9.0

  • Write missing sections of the tutorial. (#32)
  • Remove -q arg from sql2csv (conflicts with common flag).
  • Fix csvjoin in the case where the left dataset has rows without all columns.
  • Rewrote tutorial based on LESO data. (#324)
  • Don’t error in csvjson if lat/lon columns are null. (#326)
  • Maintain field order in output of csvjson.
  • Add unit test for json in2csv. (#77)
  • Maintain key order when converting JSON into CSV. (#325)
  • Upgrade python-dateutil to version 2.2. (#304)
  • Fix sorting of columns with null values. (#302)
  • Added release documentation.
  • Fill out short rows with null values. (#313)
  • Fix unicode output for csvlook and csvstat. (#315)
  • Add documentation for --zero. (#323)
  • Fix Integrity error when inserting zero rows in database with csvsql. (#299)
  • Add Michael Mior to AUTHORS. (#305)
  • Add --count option to CSVStat.
  • Implement csvformat.
  • Fix bug causing CSVKitDictWriter to output ‘utf-8’ for blank fields.

0.8.0

  • Add pnaimoli to AUTHORS.
  • Fix column specification in csvstat. (#236)
  • Added “Tips and Tricks” documentation. (#297, #298)
  • Add Espartaco Palma to AUTHORS.
  • Remove unnecessary enumerate calls. (#292)
  • Deprecated DBF support for Python 3+.
  • Add support for Python 3.3 and 3.4 (#239)

0.7.3

  • Fix date handling with openpyxl > 2.0 (#285)
  • Add Kristina Durivage to AUTHORS. (#243)
  • Added Richard Low to AUTHORS.
  • Support SQL queries “directly” on CSV files. (#276)
  • Add Tasneem Raja to AUTHORS.
  • Fix off-by-one error in open ended column ranges. (#238)
  • Add Matt Pettis to AUTHORS.
  • Add line numbers flag to csvlook (#244)
  • Only install argparse for Python < 2.7. (#224)
  • Add Diego Rabatone Oliveira to AUTHORS.
  • Add Ryan Murphy to AUTHORS.
  • Fix DBF dependency. (#270)

0.7.2

  • Fix CHANGELOG for release.

0.7.1

  • Fix homepage url in setup.py.

0.7.0

  • Fix XLSX datetime normalization bug. (#223)
  • Add raistlin7447 to AUTHORS.
  • Merged sql2csv utility (#259).
  • Add Jeroen Janssens to AUTHORS.
  • Validate csvsql DB connections before parsing CSVs. (#257)
  • Clarify install process for Ubuntu. (#249)
  • Clarify docs for --escapechar. (#242)
  • Make import csvkit API compatible with import csv.
  • Update Travis CI link. (#258)
  • Add Sébastien Fievet to AUTHORS.
  • Use case-sensitive name for SQLAlchemy (#237)
  • Add Travis Swicegood to AUTHORS.

0.6.1

  • Add Chris Rosenthal to AUTHORS.
  • Fix multi-file input to csvsql. (#193)
  • Passing --snifflimit=0 to disable dialect sniffing. (#190)
  • Add aarcro to the AUTHORS file.
  • Improve performance of csvgrep. (#204)
  • Add Matt Dudys to AUTHORS.
  • Add support for --skipinitialspace. (#201)
  • Add Joakim Lundborg to AUTHORS.
  • Add --no-inference option to in2csv and csvsql. (#206)
  • Add Federico Scrinzi to AUTHORS file.
  • Add --no-header-row to all tools. (#189)
  • Fix csvstack blowing up on empty files. (#209)
  • Add Chris Rosenthal to AUTHORS file.
  • Add --db-schema option to csvsql. (#216)
  • Add Shane StClair to AUTHORS file.
  • Add --no-inference support to csvsort. (#222)

0.5.0

  • Implement geojson support in csvjson. (#159)
  • Optimize writing of eight bit codecs. (#175)
  • Created csvpy. (#44)
  • Support --not-columns for excluding columns. (#137)
  • Add Jan Schulz to AUTHORS file.
  • Add Windows scripts. (#111, #176)
  • csvjoin, csvsql and csvstack will no longer hold open all files. (#178)
  • Added Noah Hoffman to AUTHORS.
  • Make csvlook output compatible with emacs table markup. (#174)

0.4.4

  • Add Derek Wilson to AUTHORS.
  • Add Kevin Schaul to AUTHORS.
  • Add DBF support to in2csv. (#11, #160)
  • Support --zero option for zero-based column indexing. (#144)
  • Support mixing nulls and blanks in string columns.
  • Add --blanks option to csvsql. (#149)
  • Add multi-file (glob) support to csvsql. (#146)
  • Add Gregory Temchenko to AUTHORS.
  • Add --no-create option to csvsql. (#148)
  • Add Anton Ian Sipos to AUTHORS.
  • Fix broken pipe errors. (#150)

0.4.3

  • Begin CHANGELOG (a bit late, I’ll admit).
