Learning Enough Python to Land a Job
Jeff Cogswell’s article “Learning Enough Python to Land a Job” calls out that Python is for more than web development with Django, Twisted, Flask, etc. As a data science practitioner mangling tens of gigabytes, if not tens of terabytes, of data daily from sensors deployed around the globe, I converted my code to Python after several years of hard-core Matlab use. The motivation was Python’s highly performant data science stack, which incorporates Pandas, SciPy, Numpy, h5py and other specialty Python modules.
- Port code gradually, even from very high-level languages like Matlab. Tools like f2py for Fortran 77/90+, SWIG (and numerous others) for C, Oct2Py for Matlab, etc. allow you to speedily and often straightforwardly integrate code from other popular languages.
- Get familiar with (in this order): Spyder, Numpy, Matplotlib, h5py, Scipy, and Pandas. For most real problems, you should be considering Pandas and h5py, which let you filter/select data before reading it all from disk. I personally prefer h5py over PyTables. The fastest data to load is the data you never load: superfluous data that you didn’t need would otherwise slow down the loading of the data you want.
- Get started in data analysis knowing not much more than how to use `dict()`, `list()`, and `numpy.array()` along with the standard basic functions and statements that one would use in Matlab or R, such as `sqrt()`, `for`, `if`, etc. Learn about Numpy and Pandas before dealing with generators, sets, list comprehensions, itertools, etc.
- When you find you have lots of heterogeneous but associated variables, particularly those associated by time, it’s time to use Pandas. Beyond 2-D DataFrames, `xarray` is the module to use. Pandas is awesome for loading and working with large heterogeneous datasets. Think of Pandas as SQL for doing computations.
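
As a rough illustration of the points above, here is a minimal sketch that loads only the needed columns, indexes heterogeneous variables by time, and then runs SQL-like selections and aggregations. The sensor log, its column names (`temperature_c`, `status`, `debug_blob`) and its values are all made up for this example, and an in-memory CSV stands in for a large file on disk. With HDF5 files the same idea applies through h5py slicing, which is not shown here.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sensor log; an in-memory CSV stands in for a large file on disk.
raw = io.StringIO(
    "time,temperature_c,status,debug_blob\n"
    "2020-01-01 00:00,20.5,ok,xxxx\n"
    "2020-01-01 00:30,21.0,ok,xxxx\n"
    "2020-01-01 01:00,19.5,fault,xxxx\n"
    "2020-01-01 01:30,22.0,ok,xxxx\n"
)

# Load only the columns we need; debug_blob never ends up in the DataFrame.
# Indexing by time keeps the heterogeneous variables aligned with each other.
df = pd.read_csv(
    raw,
    usecols=["time", "temperature_c", "status"],
    parse_dates=["time"],
    index_col="time",
)

# SQL-like selection, roughly: SELECT * FROM df WHERE status = 'ok'
ok = df[df["status"] == "ok"]

# SQL-like aggregation, roughly:
# SELECT status, AVG(temperature_c) FROM df GROUP BY status
mean_by_status = df.groupby("status")["temperature_c"].mean()

# Plain NumPy is still there for Matlab-style array math on a single variable.
temps = df["temperature_c"].to_numpy()
spread = np.sqrt(np.mean((temps - temps.mean()) ** 2))

print(ok)
print(mean_by_status)
print(spread)
```

Nothing here goes beyond a few basic calls, which is the point: a dict-like table, a boolean filter, and a `groupby` already cover a lot of everyday analysis.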
We hope this commentary on Python for data scientists and analysts considering the transition to Python from languages such as R, Matlab, Fortran, etc. has helped you.