Translating between Matlab and R

I mentioned in a previous post that I'd write a guide about translating Matlab code to work in R, so that others can avoid the mistakes I made. This should also function as an R user's guide to learning Matlab syntax, and vice versa. I hope some people find it useful!

Full article below.


Translation Issues

Where on earth did the time go? One minute I was looking at ice sculptures and wading through snow drifts and now the snow is gone (mostly) and various wildlife has emerged from hiding.

Look, beavers!

I’ve also managed to get out on the water myself. And eat bacon while doing it. Canadian bacon is Different.

River bacon!

Perhaps the main reason I’ve lost track of time in the last few weeks is that whether I’ve been awake or asleep, this keeps on flashing in front of my eyes.


I haven’t been able to escape it. My office desktop has looked like this on and off for the whole month. I hope to later show what this has resulted in, but in the meantime I’m going to moan about the amount of pain it’s caused me.

The main source of trouble was that this code was originally written in Matlab. I decided to save us from having to acquire a Matlab license by translating the many scripts that make up the code into R.

Initially this was tedious. There are enough syntax differences (not to mention different names of functions, etc.) between R and Matlab that I had to go through the code line by line. Then, even when I'd done this, a number of errors arose simply due to the differing ways that the two programs handle data. I'll post a guide based on what I learnt in a separate post, featuring fewer pictures of beavers.

Much hair pulling later I got the code running, fed it my data and got a result. These results were consistent with some previous findings obtained using simpler methods. So far so good.

I then decided that instead of feeding my data to the code all in one go, it would be useful to give it one day at a time and then collate the results. “Fine” I thought. “Just modify my overarching processing code, no trouble”.

I was wrong.

(Found via googling "evil Matlab")

Once again the way in which the two programs handle data required me to make a lot of modifications to the various scripts. Cue more hair pulling. I should also mention that I've written this code to be run in parallel, utilising all of my computer's cores to increase speed, which means R's normal debugging tools don't work.

Finally I got the code to run again and got a result. However, something had changed. A previously suggested relationship had completely reversed in direction. Was this simply due to the new way of feeding the data in? Or was it due to a bug in my code? Or due to me deleting some faulty data? I ran the code again using the original way of processing the data.

Even using parallel processing, this code can take anything from several hours to all night to run. This meant that getting results was a slow process. So after waiting several hours for the code to run again using the original data processing, once again I got results.

The relationship had flipped direction in these results too.

Don't panic!

This suggested that the changes I’d made to accommodate the new data processing method had resulted in a COMPLETELY DIFFERENT RESULT. On the one hand, this was good. It meant that the biologically unrealistic result was due to my error rather than a fundamental problem with the methods. On the other hand, this is the sort of thing that can wake a scientist up at night screaming. A series of small changes in the way data was analysed leading to completely misleading results. In this case we’d caught it before we went too far, but if we’d approached this naively it might have been very easy to miss.

So, now I needed to work out which of my changes had caused this. Luckily I save all my working files in Dropbox, which keeps a backup of all previous versions. I found a Word document containing graphs I'd made to show my supervisor before I'd made changes, and reverted all my code to a date before then. Then, one by one, I reinstated my changes.

In the end I pinned it down to one file. In that file, one line of code.

as.matrix(Y)

One line of code had resulted in huge, significant changes to my final result. As I said, the stuff of nightmares.
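For the curious, here's a Python/numpy sketch (not my actual R code) of the kind of silent coercion that can do this: whether a set of numbers is treated as a flat vector or a one-column matrix changes what the same operation computes, with no warning.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 0.5, 0.5])

# As flat 1-D vectors, "@" is an inner product: a single number.
inner = y @ w  # 3.0

# Coerce y into a 3x1 column matrix (loosely what as.matrix() can do to
# a vector in R) and the same "@" becomes an outer product: a 3x3 matrix.
Y = y.reshape(-1, 1)
outer = Y @ w.reshape(1, -1)
```

Downstream code that quietly accepts either shape will happily produce a numerically plausible but wrong answer, which is exactly why this took so long to track down.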

In the end I stripped out all the changes I'd made and carefully rewrote the scripts to deal with the new method of data processing. So my tale of woe has a happy ending: the code now works, and perhaps I'll even have some results soon. For everyone who made it this far, here is a view of Gatineau Park:


Snowy owls, snowmobile trails and beaver tails

So I think I’ve been in Canada for nearly two weeks now. I am gradually learning things.

For example, when someone talks about getting beaver tails, they actually mean some sort of delicious pastry:

DEEP FRIED BEAVER!

I also learned how to write some basic video analysis stuff in Python, and that a single chickadee in a controlled environment is an easier thing to track than multiple shags on the ocean. I learnt that they deliberately thicken the ice here:


This is so the canal (which I walk along on the way to the office) can be turned into a massive ice skating rink. It opened for the first time today. I saw many people gliding effortlessly along, as well as some children being dragged along on sleds, which looks more my speed.  Since I got here I have occasionally been asked if I skate, to which I give a rueful laugh. I don’t think me and skates would really mix.

Another thing I learned is that batteries drain very quickly in the cold weather. Today myself and the grad students in my research group went to see if our lab vehicle still worked after sitting in a garage for a month or so. The answer was, it didn’t.


Battery is flat.

However, once a man from the Canadian AA turned up and jump-started it, the car had to be driven somewhere to charge the battery. So I got to go on a jaunt to one of the field sites where chickadees are studied, alongside a snowmobile trail. This was great, as I was keen to get out of the city, see some Canadian countryside, and see some chickadees in the wild.


We found many chickadees, but also a bald eagle!


Eeeeeeagle!

Shortly after seeing this I left the trail to go and look at a heap of rusty farm machinery buried in the snow. When I came back to the path, I knew something was afoot. It was then that I learnt that Canadian ambushes are exceptionally polite, as I was warned I might want to put my binoculars away:

Arg! (Photo by Teri Jones)

It was then decided that we would head back toward Ottawa and try and find some snowy owls that had been reported in the area. On the way we would stop for “Timbits”. I did not know who Tim was, or the answers to any related questions. I found out:


It seems that Timbits are a big box full of the centres of various doughnuts! These tided us over as we headed toward where we might find snowy owls.

We stopped at the edge of a field in the general area where the owls had been seen. We all climbed out of the car and had a general scan of the trees and hedgerows. Nothing. We got back in the car to try a different spot. This was rather similar to my previous experience of attempting to see specific birds, so I wasn’t overly hopeful about finding a snowy owl in an expanse of snowy fields.

Then Shannon, who had been looking out of the moving car with her binoculars (I am fairly certain doing this would make me carsick), suddenly spotted something on an electricity pylon in the middle of a field. We parked as close as we could to have a look. The pylon was quite a distance from the road, but through our binoculars we could clearly see a snowy owl! I had never even seen an owl in daylight, let alone that clearly.

I tried to get a picture but even at maximum zoom it wasn’t enormously clear.

Owl?

Still, the view through the binoculars was great. We stood and watched the owl for a bit, before deciding to head back down the road to try and find the female that was also supposed to be about. Once again I was sceptical. I'd just finished muttering something along those lines when I suddenly had to ask:

“What’s that on the post?”


There, on a post right next to the road was another snowy owl. We parked up right next to it, getting a much better view than before. I decided I had to try and take a photo.

It was at this point I was once again reminded that batteries drain incredibly fast in the cold. Like the car earlier, my camera refused to start up. I fumbled with the various spare batteries. None of them worked. This was absolutely typical, but luckily the owl was fairly accommodating. Eventually, through luck and some strategic warming of the batteries, my camera finally fired up:

OWL!

Tomorrow: statistics course!

Python is Nice.

During my undergraduate degree, we had one statistics module in our second year. While this module gave us a rather nice theoretical introduction to basic statistical tests such as t-tests, the practical side was a little hazier. We had a few scheduled practical sessions in which, theoretically, we would be introduced to the various options for statistical analysis available. Practically, however, we were encouraged very heavily to use SPSS. R was mentioned as an option, but only as a SCARY option. I had a brief look through some lists of commands, but without any real sense of direction, and soon reverted to using SPSS like the rest of my class.

(I should add that this has all changed a lot at my university now: students get introduced to R and other statistical techniques in their first year.)

It was only when doing my masters course that I first started getting to grips with using R, my first basic coding language. While initially daunting I soon found that it appealed to me for a variety of reasons, laziness being one of them. In SPSS, deciding to change a parameter on a test would require re-clicking through a whole host of dialogues. In R a simple code change can produce the change immediately. The same was true of graphs. While SPSS had a friendly WYSIWYG interface, it could prove immensely fiddly when you had a particular graph in mind or if you needed to produce a series of identical graphs.

Beyond laziness, we started playing with real, messy datasets. The advantage of being able to manipulate large amounts of data quickly and efficiently soon became apparent, not to mention the various powerful tools R provided to analyse these datasets.

When I began my PhD I also started using Matlab to play with mathematical models. While similar to R, it had some annoying differences in syntax that frequently tripped me up when I was starting out. Still, I was very much coming round to the idea that "Code is Good" and decided that being able to do all this stuff would save me an infinite number of headaches. This very much proved to be the case, even for something as daft as wanting all the individual birds I had GPS coordinates for to share a similar naming convention. Suddenly I was playing around with regular expression commands to manipulate strings. I can't pretend I 100% understand these even now, but I can understand how powerful they can be.
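To give a flavour of the sort of thing I mean, here's a small Python sketch of renaming with a regular expression. The labels and the "B###" convention are invented for the example, not my actual data:

```python
import re

# Hypothetical bird labels from different field seasons, all meaning the same thing
labels = ["bird_07", "BIRD-12", "b 3"]

def standardise(label):
    # Grab the first run of digits and rebuild one consistent convention
    number = int(re.search(r"\d+", label).group())
    return f"B{number:03d}"

standardised = [standardise(l) for l in labels]  # ['B007', 'B012', 'B003']
```

The same idea works in R (`gsub`/`regmatches`) or Matlab (`regexp`); the pattern syntax is largely shared between them.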

I don’t know perl 😦

Throughout my MSc and even into my PhD I knew others for whom code just would not click. I saw this again when demonstrating on undergrad and masters stats modules. Occasionally (especially at undergrad level) I saw students making self-fulfilling prophecies that anything involving writing code was not for them, and that engaging with it was therefore worthless. I'd always argue that it is definitely worth engaging with. The stuff commonly used by bioscientists is infinitely friendlier than "proper" programming languages once you get to know it, and is so ridiculously useful.

This brings me round to the main thrust of this blog post: Python. Python is Nice. It's really, really nice. You just won't begin to imagine how very nice it is. R is powerful for statistics, but starts to fall down when you get into the realms of image analysis, file manipulation and so on. Matlab is pretty powerful and flexible, but costs MONEY.

Python is free, friendly and powerful. Using Python I have been able to chuck files around different directories automatically, process large data files, run Bayesian statistics and create some pretty graphs. Here is one that ended up getting cut from what I'm currently working on, so it's probably OK to show.
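The file-chucking side looks something like this (the directory layout and filenames here are made up for the demo, not my actual pipeline): sort exported CSVs into per-day folders based on their name prefix.

```python
import shutil
import tempfile
from pathlib import Path

# Fake a directory of exported files to sort (paths invented for the example)
root = Path(tempfile.mkdtemp())
(root / "day1_tracks.csv").write_text("x,y\n")
(root / "day2_tracks.csv").write_text("x,y\n")

# Move each file into a folder named after its day prefix
for f in root.glob("*_tracks.csv"):
    day = f.name.split("_")[0]  # "day1", "day2"
    (root / day).mkdir(exist_ok=True)
    shutil.move(str(f), str(root / day / f.name))

moved = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))
# ['day1/day1_tracks.csv', 'day2/day2_tracks.csv']
```

Once you have a loop like this, rerunning an analysis on a reorganised dataset stops being an afternoon of dragging files around.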


colours!

The initial setup can seem slightly confusing, and I highly recommend using a distribution like WinPython, which installs everything required and comes with lots of useful packages (including the essential numpy, which lets you do all the sorts of n-dimensional matrix manipulations you might be used to from R or Matlab). Using a prepackaged version largely avoids the confusion between firing up Python in the Windows command line vs. using a Python interpreter, which I certainly puzzled over initially. It also comes with an interpreter/editor, Spyder, which makes writing and trialling big scripts extremely straightforward, in that it combines the script editor and console in the same way as Matlab or R Commander might.
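For example, a couple of the numpy manipulations I had in mind (the array contents are invented for the demo):

```python
import numpy as np

# A stack of three 2x2 "frames" -- the kind of 3-D array you might use
# for values recorded over time
frames = np.arange(12).reshape(3, 2, 2)

per_frame_mean = frames.mean(axis=0)   # average across frames -> 2x2
first_rows = frames[:, 0, :]           # first row of every frame -> 3x2
reordered = frames.transpose(1, 2, 0)  # reorder axes, like Matlab's permute()
```

If you've indexed matrices in R or Matlab, this will feel familiar almost immediately (the main trap being that Python counts from 0, not 1).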

Will Python be replacing R for me? No, definitely not: R does too much good statistical stuff and has too many clever people writing packages for it. Will I be using it to do some of the things that used to push R out of its comfort zone? Yes. Will I be using it instead of Matlab? Yes, because it is free!

I suppose the other main point of this rather rambly post is for students just starting out with all this: Don’t be afraid of code. Getting to grips with it is daunting at first, but incredibly rewarding in the long run.


Back to Reality

When I eventually got back into the office after my trip to the USA, I turned on my computer and stared at it for a while, trying to remember how it worked.

Once the basics of how to operate a computer came back to me, I took a long hard look at where I am in my PhD. My basic thought process was something like this:

  • I have some stuff.
  • I need to write about this stuff.
  • Most of this stuff is going to require the doing of additional stuff before it is in a state where I can write about this stuff.
  • I REALLY need to write about this stuff.
  • I wonder where I have put all this stuff?

By which I mean I need to consolidate a few years' worth of data that has, in the past, been exported rather haphazardly to various folders in my Dropbox. I also probably need to redo some statistics to include additional data collected this year. I then need to try and hammer this out into writing that people other than myself can understand and find interesting.

Let the gathering of Excel files commence.

excel

Arg.


Pre-Manhattan Madness

Only two days before I have to try and achieve the necessary escape velocity to leave Cornwall. I am still scrambling to pull everything together for my trip to ISBE. I am still running my simulations, trying to find the best models. I might be up until the very last minute, when I have to stop and add the presented results to the talk.


Still optimising..

With the (rather important!) exception of the final results, my presentation is more or less done. I could surely benefit from practising it a few (many) times. I also need to generate a PDF version of my slides, just in case of technical problems. I very much hope there are no technical problems, as my talk has a fair few videos, and animations to make my equations friendly.


pleaseworkpleaseworkpleasework..

On the plus side, my virtual birds are behaving a lot better!

Virtual birds

A few months ago I gave a talk at the ASAB Easter conference in Sheffield about how I’m analysing the foraging rafts using collective behaviour. Of course, what with one thing and another, the work wasn’t as far along as I would have liked. After I got back from fieldwork and crunched the new data, I decided it was time to look at this work again.

One of the main reasons for pushing this work is that I am giving another talk on this at an international conference. In New York. In a few weeks. Do not let my restrained tones hide how terrified/excited I am about this.

The conference is ISBE, the International Society for Behavioral Ecology. I applied for a talk back at the beginning of the year and, several months later, here we are:



(Last talk of the day, hopefully some people will be awake)

I’d really better get some results together hadn’t I?

To recap what I had already done:

– Extracted positions of individual birds from video.

– Tracked these individuals and then used these positions to create trajectories.

– Ran these trajectories through a correction matrix, to account for any distortions the camera might introduce.

– Created code to remove unrealistic tracks.

– Wrote further functions that I can use to manually delete and merge tracks.

– Extracted the dive and surface events.

– Created various graphing functions, such as taking the corrected trajectories and plotting them back to their original positions to check my working.

What I have done in the last week (working frantically):

– Created a simulation based on zonal interaction models. Virtual birds that mooooooove.

– Wrote code that will compare the real data with my models, using things like radial density.

– Added the ability for my virtual birds to DIVE.

– Modified the simulation so the birds can react to these dive and surface events.
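For anyone wondering what "radial density" means here, a minimal Python sketch of the general idea (the function, the bird positions and the binning are my own for this example, not the actual analysis code):

```python
import numpy as np

def radial_density(positions, focal, bin_edges):
    """Neighbour distances from one focal bird, binned and normalised
    by annulus area, so real data and simulation output can be
    compared on the same footing. A simplified sketch only."""
    dists = np.linalg.norm(positions - positions[focal], axis=1)
    dists = dists[dists > 0]  # drop the focal bird itself
    counts, edges = np.histogram(dists, bins=bin_edges)
    areas = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    return counts / areas

# Toy flock: three birds near the origin, one further away
birds = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 0.0]])
density = radial_density(birds, focal=0, bin_edges=[0, 2, 4])
```

Averaging this over birds and frames gives a curve that a simulation either matches or doesn't.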

What am I doing now?

So the goal now is to produce results by choosing the simulation that best matches up with the data. In order to do this I need to choose the best "weight" for each of the things that might affect a bird's movement (the effect of repulsion from other birds vs. the effect of attraction to other birds, for example). I am currently exploring these weights manually, but eventually I need to run some code to optimise each of these weights for each piece of data.
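To give a feel for what those weights do, here's a stripped-down sketch of one zonal update step. The parameter names, values and starting positions are illustrative only, not the model I'm actually fitting:

```python
import numpy as np

def step(positions, w_repel=1.0, w_attract=0.5,
         repel_radius=1.0, attract_radius=5.0, dt=0.1):
    """One update of a toy zonal model: each bird moves away from very
    close neighbours (repulsion zone) and toward more distant ones
    (attraction zone). The real model has much more going on, diving
    and surfacing reactions included."""
    moves = np.zeros_like(positions)
    for i, p in enumerate(positions):
        diff = positions - p
        dist = np.linalg.norm(diff, axis=1)
        near = (dist > 0) & (dist < repel_radius)
        far = (dist >= repel_radius) & (dist < attract_radius)
        if near.any():
            moves[i] -= w_repel * diff[near].sum(axis=0)    # push apart
        if far.any():
            moves[i] += w_attract * diff[far].sum(axis=0)   # pull together
    return positions + dt * moves

birds = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]])
birds = step(birds)  # the close pair separates, the distant bird closes in
```

Tuning `w_repel` against `w_attract` (and the zone radii) is exactly the kind of weight exploration described above.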

I’d also like to add some more complicated diving rules (At the moment diving is totally random) where they will be influenced by other dives. I then need to come up with ways of checking this method of diving against the data, to determine how realistic this sort of behaviour is..

I fly out to New York in a few weeks. This is why I spend my weekends locked in the office.

Just to finish up here is a video from one of my early simulations with some random parameters. Those birds can’t seem to get away from each other fast enough! I suspect they might also need a bit more autonomous movement to stop them doing the crazy little dance when they can’t find other birds.