Over my career I've made many mistakes, occasionally learned from them, and sometimes found useful software, tips, and resources. I don't expect to remember all or even most of these, so I compile everything here to have a quick and easy way to reference it on the web; hopefully it turns out to be helpful for others as well.
- PDB Cheatsheet from https://github.com/nblock/pdb-cheatsheet
- Pandas Cheatsheet from https://pandas.pydata.org/
- Unless you have a very good reason and purely numerical data, never use csv; saying a file is in csv format is not enough information to parse it (the delimiter, quoting, escaping, and encoding all vary between files)
- Default to json
- For large json files that are table-like (the root object is an array whose elements look like rows), consider JSON Lines (jsonl). Large JSON objects can be expensive to parse and make it difficult to run parallel jobs (e.g. Apache Spark reads line-delimited rows from text files)
- Zotero for organizing research papers
- MLFlow for experiment tracking
- Glances, a better top/htop (be sure to pip install nvidia-ml-py3 for GPU support)
- Plotnine for figures
- draw.io for diagrams
- Apache Spark for "Big Data"
- I use Arch Linux on machines I own.
- bat: cat replacement
- exa: ls replacement
- linuxbrew: package manager when I don't have sudo
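To make the JSON Lines suggestion above concrete, here is a minimal sketch using only the Python standard library. The helper names `write_jsonl` and `read_jsonl` and the sample rows are hypothetical, chosen for illustration; the point is that each row is parsed on its own, so a reader never has to load the whole file at once.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(rows, path):
    """Write an iterable of JSON-serializable rows, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # json.dumps escapes newlines inside strings, so each row stays on one line.
            f.write(json.dumps(row) + "\n")


def read_jsonl(path):
    """Yield rows one at a time instead of parsing one large JSON array."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate blank lines
                yield json.loads(line)


# Hypothetical table-like data (illustration only).
rows = [
    {"id": 1, "text": "first row, with a comma"},
    {"id": 2, "text": "second row\nwith a newline"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rows.jsonl"
    write_jsonl(rows, path)
    restored = list(read_jsonl(path))
    n_lines = len(path.read_text(encoding="utf-8").splitlines())
```

Because every line is a complete JSON value, the file can be split at line boundaries and handed to separate workers, and commas or newlines inside a field need no dialect-specific quoting rules.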
Tips from Others
UMD CLIP Resources
UMIACS offers long-term file storage and hosting through object stores, accessed with a set of S3-like utilities.
For the clip-quiz group specifically, the contents of the clip-quiz bucket should mirror the layout of /fs/clip-quiz to make storing and restoring files easy.
For example, moving the directory /fs/clip-quiz/code/old-big-project/ into the bucket could be done using:
cpobj -V -r -f /fs/clip-quiz/code/old-big-project clip-quiz:code/
- What is ~ in LaTeX? A non-breaking space: LaTeX will not break a line between alpha and beta when they are joined by ~
- Create PDF version of figures
- pdb is fantastic for command line debugging
- To start the debugger if allennlp errors: run ipython -m (which allennlp) -- train config.jsonnet and press c to continue when the terminal starts
- If Anaconda pip installations from source packages cause g++ errors like "file format not recognized", rename Anaconda's ld so that pip uses the system version: https://github.com/pytorch/pytorch/issues/16683#issuecomment-459982988
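The debugger tip above starts the debugger from the command line. The same "drop into the debugger only when something fails" idea can also be sketched from inside a script with the standard library's `pdb.post_mortem`. This is a minimal sketch, not how allennlp does it: the wrapper `run_with_postmortem` and the stand-in `train` function are hypothetical names, and the `debugger` parameter exists only so the hook can be swapped out in non-interactive runs.

```python
import pdb
import sys


def run_with_postmortem(func, *args, debugger=pdb.post_mortem, **kwargs):
    """Call func; if it raises, pass the traceback to the debugger, then re-raise."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # sys.exc_info()[2] is the traceback of the exception being handled.
        debugger(sys.exc_info()[2])
        raise


def train(config_path):
    # Hypothetical stand-in for a long-running training entry point.
    raise ValueError(f"bad config: {config_path}")


# Interactive use (commented out because it opens a pdb prompt):
# run_with_postmortem(train, "config.jsonnet")
```

Unlike starting the whole program under the debugger, nothing needs to be pressed at startup; the prompt only appears at the frame where the exception was raised.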