Science Magazine on Open Data

Below are comments on the June 2014 AAAS Science article on open data and open code for scientists. The article refers to the E. White paper on nine easy ways to share code.

Maybe you don’t have the time to read those, so let me give you six quick tips to:

  • increase your citation count
  • increase your data usage (more citations)
  • increase your code sharing (more citations and lucrative job opportunities)
  1. Learn a popular science/engineering programming language. It will boost your productivity and job opportunities. The language you should be using is Python, which runs on everything from supercomputers down to the $5 Raspberry Pi Zero. Python can wrap or inline CUDA, Fortran, and C code for speed, while staying easy to write for controlling hardware, running simulations, and analyzing images and datasets of all sizes.
  2. Learn how to use HDF5. Use it right at your data acquisition source unless the application has a very high data rate. Avoid making up your own formats or storing data as text unless absolutely necessary.
  3. Learn how to use version control. Git is an excellent choice. It will save you massive amounts of time when a typo breaks your code and you didn’t keep a copy of the old version under another filename.
  4. Learn how to use GitHub and be appropriately prolific about posting your code there. This leads to visibility and opportunity.
  5. Put examples, plots, and documentation of how to install and use your code in a README file.
  6. Publish your data online. Don’t leave it on some RAID drive or USB drive somewhere. The drive will fail eventually or the room will get flooded. Worst of all are CDs and DVDs: they have a very short lifespan, less than the length of your PhD studies. Keep your data in multiple online places.
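
To make tip 2 concrete, here is a minimal sketch of writing an acquisition run to HDF5 from Python and reading it back. It assumes the h5py and NumPy packages are installed. The helper names `save_run` and `load_run`, the dataset path `raw/voltage`, the attribute names, and the instrument label are all made up for illustration; they do not come from the Science article or from any particular instrument.

```python
import os
import tempfile

import h5py
import numpy as np


def save_run(path, samples, sample_rate_hz, instrument):
    """Write one acquisition run to an HDF5 file, metadata included."""
    with h5py.File(path, "w") as f:
        # Intermediate groups such as "raw" are created automatically.
        dset = f.create_dataset("raw/voltage", data=samples, compression="gzip")
        # Attributes keep units and settings next to the numbers they describe.
        dset.attrs["units"] = "V"
        dset.attrs["sample_rate_hz"] = sample_rate_hz
        f.attrs["instrument"] = instrument


def load_run(path):
    """Read the samples back together with dataset and file attributes."""
    with h5py.File(path, "r") as f:
        dset = f["raw/voltage"]
        return dset[...], dict(dset.attrs), dict(f.attrs)


# Hypothetical run: 1000 samples of a synthetic signal in a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run_001.h5")
    samples = np.sin(np.linspace(0.0, 1.0, 1000))
    save_run(path, samples, sample_rate_hz=1000.0, instrument="hypothetical-daq")
    data, dset_attrs, file_attrs = load_run(path)

print(data.shape, dset_attrs["units"], file_attrs["instrument"])
```

Because the units and sample rate travel inside the file, whoever downloads the data later does not need a separate notes file to interpret it, which is much of the argument against homemade or text formats.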