During my undergraduate degree, we had one statistics module in our second year. While this module gave us a rather nice theoretical introduction to the basic statistical tests such as t-tests, the practical side was a little hazier. We had a few scheduled practical sessions in which, in theory, we would be introduced to the various options available for statistical analysis. In practice, however, we were encouraged very heavily to use SPSS. R was mentioned as an option, but only as a SCARY option. I had a brief look through some lists of commands, but without any real sense of direction, and soon reverted to using SPSS like the rest of my class.
(I should add that this has all changed a lot at my university: students now get introduced to R and other statistical techniques in their first year.)
It was only when doing my master's course that I first started getting to grips with R, my first basic coding language. While it was initially daunting, I soon found that it appealed to me for a variety of reasons, laziness being one of them. In SPSS, deciding to change a parameter on a test would require re-clicking through a whole host of dialogues. In R, a simple code change produces the new result immediately. The same was true of graphs. While SPSS had a friendly WYSIWYG interface, it could prove immensely fiddly when you had a particular graph in mind or needed to produce a series of identical graphs.
Beyond laziness, there was the fact that we started playing with real, messy datasets. The advantage of being able to manipulate large amounts of data quickly and efficiently soon became apparent, not to mention the various powerful tools R provided to analyse these datasets.
When I began my PhD I also started using Matlab to play with mathematical models. While similar to R, it had some annoying differences in syntax that frequently tripped me up when I was starting out. Still, I was very much coming round to the idea that “Code is Good” and decided that being able to do all this stuff would save me an infinite number of headaches. This very much proved to be the case, even for something as daft as wanting all the individual birds I had GPS coordinates for to share a similar naming convention. Suddenly I was playing around with regular expression commands to manipulate strings. I can’t pretend I 100% understand these even now, but I can see how powerful they can be.
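To give a flavour of that kind of tidy-up, here is a minimal sketch written in Python (which I'll get to below). The bird names, the target "BIRD_007" format and the standardise function are all invented for illustration; they are not my real IDs or the code I actually used.

```python
import re

# Hypothetical, made-up names: the real bird IDs are not shown in this post.
raw_names = ["bird_7", "Bird-12", "BIRD 3", "bird103"]


def standardise(name):
    """Return the name as 'BIRD_' plus a zero-padded three-digit number."""
    # Match "bird" in any case, then optional spaces, underscores or hyphens,
    # then capture the digits that identify the individual.
    match = re.fullmatch(r"bird[\s_-]*(\d+)", name.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognised bird name: {name!r}")
    return f"BIRD_{int(match.group(1)):03d}"


clean_names = [standardise(n) for n in raw_names]
print(clean_names)  # ['BIRD_007', 'BIRD_012', 'BIRD_003', 'BIRD_103']
```

Raising an error on anything that doesn't match is a deliberate choice in this sketch: with messy field data I'd rather be told about an odd name than have it silently renamed.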
Throughout my MSc and even into my PhD I knew others for whom code just would not click. I saw this again when demonstrating on undergrad and master's stats modules. Occasionally (especially at undergrad level) I saw students making self-fulfilling prophecies that anything involving writing code was not for them, and that engaging with it was therefore worthless. I’d always argue that it is definitely worth engaging with. The stuff commonly used by bioscientists is infinitely friendlier than “proper” programming languages once you get to know it, and is so ridiculously useful.
This brings me round to the main thrust of this blog post: Python. Python is Nice. It’s really really nice. You just won’t begin to imagine how very nice it is. R is powerful for statistics, but starts to fall down when you get into the realms of image analysis, file manipulation and so on. Matlab is pretty powerful and flexible, but costs MONEY.
Python is free, friendly and powerful. Using Python I have been able to chuck files around different directories automatically, process large data files, run Bayesian statistics and create some pretty graphs. Here is one that ended up getting cut from what I’m currently working on, so it’s probably OK to show.
The initial setup can seem slightly confusing, and I highly recommend using a distribution like WinPython, which installs everything required and comes with lots of useful packages (including the essential NumPy, which lets you do all the sorts of n-dimensional array manipulations that you might be used to from R or Matlab). Using a prepackaged version largely avoids the confusion I certainly puzzled over initially, between firing up Python from the Windows command line and using a Python interpreter. It also comes with Spyder, an interpreter/editor which makes writing and trialling big scripts extremely straightforward, in that it combines the script editor and console in the same way as Matlab or R Commander might.
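As a rough idea of what those NumPy manipulations look like, here is a small sketch. The numbers and the sites/days/readings layout are made up purely for illustration, and it assumes NumPy is installed (it is if you use a distribution like WinPython).

```python
import numpy as np

# Made-up numbers purely for illustration: 2 sites x 3 days x 4 readings.
readings = np.arange(24, dtype=float).reshape(2, 3, 4)

# Average over the last axis to get one mean per site per day.
daily_means = readings.mean(axis=2)

# Transposing swaps the axes, so rows become days and columns become sites.
by_day = daily_means.T

# A boolean mask selects elements, much like logical indexing in R or Matlab.
high = readings[readings > 20]

print(daily_means.shape)  # (2, 3)
print(by_day.shape)       # (3, 2)
print(high)               # [21. 22. 23.]
```

One thing that may trip you up coming from R or Matlab is that Python counts from 0, so axis=2 is the third dimension.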
Will Python be replacing R for me? No, definitely not. R does too much good statistical stuff and has too many clever people writing packages for it. Will I be using it to do some of the things I used to do that pushed R out of its comfort zone? Yes. Will I be using it instead of Matlab? Yes, because it is free!
I suppose the other main point of this rather rambly post is for students just starting out with all this: Don’t be afraid of code. Getting to grips with it is daunting at first, but incredibly rewarding in the long run.