Interesting Data Science Utilities
Please note that all opinions are those of the author.
Hacker News had an excellent article on utilities for working with large-scale CSV, TSV, and similar data. If you do this type of work a lot, or regularly look at sizable amounts of raw data, I’d be very surprised if you didn’t find a new tool here. The ones I’m looking at are visidata, octosql, and gron.
Here are some of the interesting takeaways on the tool front:
- http://jmespath.org/
- https://github.com/BurntSushi/xsv
- https://github.com/dinedal/textql
- https://github.com/n3mo/data-science
- https://stedolan.github.io/jq/
- https://gitlab.redox-os.org/redox-os/parallel
- https://github.com/willghatch/racket-rash
- https://visidata.org/
- https://github.com/tomnomnom/gron - JSON grep
- https://github.com/dflemstr/rq
- https://www.gnu.org/software/datamash/
- https://github.com/johnkerl/miller (originally written in C; Go as of Miller 6)
- https://github.com/mechatroner/RBQL
- https://github.com/shellbound/jwalk/
- https://www.rdocumentation.org/packages/plyr/
- https://github.com/google/crush-tools
- https://github.com/python-mario/mario (Python for manipulation)
- https://github.com/cube2222/octosql/ (SQL for manipulation)
- https://github.com/dkogan/vnlog
- https://csvkit.readthedocs.io/
- https://github.com/eBay/tsv-utils-dlang
- http://harelba.github.io/q/
- https://github.com/BatchLabs/charlatan
- https://github.com/dbohdan/sqawk
- https://github.com/benbernard/RecordStream
- https://github.com/noyesno/awka (awk)
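To give a feel for why a "JSON grep" like gron is handy, here is a minimal Python sketch of the underlying idea: flatten a JSON document into one path-per-line form so that plain substring matching works on it. This is only an illustration, not gron's implementation. The function names `flatten` and `grep_json` and the sample data are made up for this example, and details such as key ordering and quoting of unusual keys are simplified.

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node (loosely modeled on gron-style output)."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            # Assumption: keys are simple identifiers; no special quoting is done here.
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are rendered as JSON literals (strings quoted, null/true/false as-is).
        yield f"{path} = {json.dumps(value)};"


def grep_json(text, needle):
    """Return the flattened lines of a JSON string that contain `needle`."""
    return [line for line in flatten(json.loads(text)) if needle in line]


# Hypothetical sample document, used only for illustration.
sample = '{"user": {"name": "Ada", "tags": ["admin", "dev"]}}'

if __name__ == "__main__":
    for line in grep_json(sample, "tags"):
        print(line)
```

Because every line carries its full path, a match tells you where in the document a value lives, which is the main thing ordinary grep loses on nested JSON.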