Why Python?

An important element missing from most public health programs is an introduction to high-level, general-purpose programming languages such as Python, Java, and the C family. While I don’t think every public health or epidemiology student must master programming (especially those who focus more on qualitative than quantitative research), many would benefit greatly from learning basic to intermediate general programming skills. General-purpose programming offers much more than typical statistical packages (like SAS, SPSS, and Stata) can in data collection, storage, cleaning, manipulation, analysis, and visualization. Scaling up a research project into an actual package or product also becomes feasible, since automation is relatively easy to implement in a general-purpose language.
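To make the point about cleaning and automation concrete, here is a minimal sketch that uses only Python's standard library. The records and the helper names `tokenize` and `count_terms` are invented for illustration and do not come from any real study. It shows one small task of this kind, normalizing messy free text and tallying terms, and is not a recommended pipeline.

```python
import re
from collections import Counter

# Hypothetical free-text records, invented for illustration only.
RAW_RECORDS = [
    "  Flu shot at the clinic today! ",
    "flu SEASON is here...",
    "Got my FLU shot; arm is sore.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic words as a list."""
    return re.findall(r"[a-z]+", text.lower())


def count_terms(records):
    """Tally how often each word appears across all records."""
    counts = Counter()
    for record in records:
        counts.update(tokenize(record))
    return counts


term_counts = count_terms(RAW_RECORDS)

# Print the three most frequent terms with their counts.
for term, n in term_counts.most_common(3):
    print(term, n)
```

The same two functions would work unchanged on three records or three million, which is the kind of scaling that is hard to get from point-and-click workflows.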

In a Quora discussion on “How could learning Python benefit me as an epidemiologist?”, someone described knowing how to code as having “superpowers”. I can attest to that sense of awe, amazement, and joy from the first time I saw my working Python code run an algorithm that saved me what would literally have been years of work had the tasks been done by hand.

I decided to learn Python from scratch for my doctoral research, primarily because I needed to start collecting web data within a reasonable time frame. Compared to other programming languages, Python has a simple and readable syntax. It is well known for its user-friendliness and for how easily complete beginners can pick it up, yet it is powerful enough for all of the tasks I needed. It also has an excellent, supportive user base whose members not only help each other out but also share code generously (in the form of modules and GitHub packages) to avoid duplicated effort. Here are some arguments for learning Python:

Learning Resources

My research involves computationally intensive processes such as web crawling, collecting data from the Twitter API, natural language processing (NLP), sentiment analysis, machine learning, and visualization. Here is a list of resources I have come across that work well for my research objectives.

Interactive Q&A Forum

Important Python Modules

Other Resources

  • YouTube Python tutorial series (e.g. basic Python, sentiment analysis, NLTK, and machine learning) by Sentdex
  • YouTube machine learning lecture series by Dr. Andrew Ng
  • Book “Natural Language Processing with Python” (2009) by Steven Bird, Ewan Klein, and Edward Loper
  • Book “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 1st Edition” (2012) by Wes McKinney
  • Book “Automate the Boring Stuff with Python: Practical Programming for Total Beginners” (2015) by Al Sweigart
  • Book “Mastering Machine Learning With scikit-learn” (2014) by Gavin Hackeling
  • Book “Getting Started with Beautiful Soup” (2014) by Vineeth G. Nair

