Translating between Matlab and R

I mentioned in a previous post that I’d write a guide about translating Matlab code to work in R, so that others can avoid the same mistakes I made. This should also function as an R user’s guide to learning Matlab syntax, and vice versa. I hope some people find it useful!

Full article below.


1. – Some syntax basics for absolute beginners.

I’m assuming most people who end up needing to do this will have some coding background, but I’ll run through some of the most common things that differ between the two languages. If you’re somewhat familiar with both you can probably skip this part. I’ll go over functions and flow control differences in another section.

Though I generally just code straight into R’s script window, I really recommend a better editor for this kind of work, something that will recognise R’s syntax and help you spot problems through colour and formatting. RStudio or Tinn-R are great options that you might already have. I personally use Notepad++.


 

Firstly, although many of us are taught to use

<-

for assignment in R, an equals sign, as in Matlab, does the same job for ordinary top-level assignments (see here for a discussion; the arrow is preferred mainly to avoid ambiguity with other uses of equals in R, such as naming function arguments), so there is no need to replace every assignment in your Matlab code.
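
The one place where the two forms do behave differently is inside a function call. As a minimal sketch (the variable names are purely illustrative, and local() is only there so the demo does not touch your workspace), = names an argument while <- really creates an object:

```r
# Illustrative only: compare '=' and '<-' inside a function call
flags <- local({
  e <- environment()                      # the temporary environment for this demo
  mean(x = c(1, 2, 3))                    # '=' just names the argument x
  after_equals <- exists("x", envir = e, inherits = FALSE)
  mean(x <- c(1, 2, 3))                   # '<-' creates x as a side effect
  after_arrow <- exists("x", envir = e, inherits = FALSE)
  c(after_equals = after_equals, after_arrow = after_arrow)
})
flags
```

For translated Matlab code this rarely matters, because Matlab assignments are always stand-alone statements.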


 

The symbol

;

is commonly used at the end of almost every Matlab command to suppress output and stop the code from flooding your console with useless information.

>> result=sum([1,2])

result =

     3

%prints result

>> result=sum([1,2]);

%does not print result

There is no need to remove these from the ends of lines, as a trailing semicolon has no effect in R. In R ‘ ; ’ marks the end of a command, allowing multiple commands to be placed on the same line like so:

> sum(c(1,1));sum(c(1,2))

[1] 2

[1] 3

Incidentally, R’s normal behaviour is to print nothing to the console when something is assigned to an object (the opposite of Matlab, which will echo everything, assigned or not, if you don’t suppress it). If you want to assign something to an object AND print it to the console you can just wrap the whole command in round brackets:

> result = sum(1,1,1)

#Prints nothing

> (result=sum(1,1,1))

[1] 3

 

Comments in matlab code look like this.

%This is a comment

%%%%%%%%%%%

% Anything after the % symbol is ignored!

% %%%%%%%%%%

The same comment in R looks like this.

#This is a comment

###########

# Anything after the # symbol is ignored!

# ##########

Hitting an unexpected % sign will make R stop with a syntax error, so make sure you replace every comment % with # if you want to keep the comments!
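
This particular replacement is easy to script. Below is a rough sketch of a hypothetical helper (convert_comment_lines is my own name, not a built-in) that only touches lines whose first non-blank character is %, so R operators such as %*% further along a line are left alone. Trailing comments after code are deliberately not handled and would still need checking by hand.

```r
# Hypothetical helper: turn Matlab full-line comments into R comments.
# Only the first % on a line that starts with % is swapped for #;
# everything after it is already inside the comment.
convert_comment_lines <- function(lines) {
  sub("^([[:space:]]*)%", "\\1#", lines)
}

example_lines <- c("%This is a comment",
                   "x = a %*% b",
                   "   % indented comment")
converted <- convert_comment_lines(example_lines)
converted
```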


 

I’m going to go into the different ways that R and Matlab deal with data in the next section, but for now: 1-dimensional data in Matlab can be created like this:

>>  result=[1,2,3,4,5]

result =

     1     2     3     4     5

>>  result=[1:5]

result =

     1     2     3     4     5

The equivalent commands to create 1d data (vectors) in R are:

> (result = c(1,2,3,4,5))

[1] 1 2 3 4 5

> (result = 1:5)

[1] 1 2 3 4 5

 

Similarly, one way of creating 2-dimensional data in Matlab is like this:

>> result=['A','B';'C','D']

result =

AB

CD

R’s equivalent is a bit clunkier unfortunately, requiring you to create a vector, then convert it into a matrix of a given size.

> matrix(c('A','B','C','D'),nrow=2,ncol=2)

     [,1] [,2]

[1,] "A"  "C"

[2,] "B"  "D"

But wait! That’s wrong! That isn’t the result the Matlab code produces. I’m going to go into some very important differences in how Matlab and R deal with data in the next section. For now, we can fix this by telling R to assign our vector by row.

> matrix(c('A','B','C','D'),nrow=2,ncol=2,byrow=T)

     [,1] [,2]

[1,] "A"  "B"

[2,] "C"  "D"

So when creating matrices from Matlab code, use the byrow=T argument to make sure your matrix doesn’t end up transposed.
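
If the byrow argument feels easy to forget, another option is to stack the rows explicitly with rbind(), which mirrors the way the Matlab literal is written one row at a time. A small sketch comparing the two:

```r
# Two ways of building the same 2 x 2 character matrix
by_row  <- matrix(c("A", "B", "C", "D"), nrow = 2, ncol = 2, byrow = TRUE)
stacked <- rbind(c("A", "B"),   # first Matlab row:  'A','B'
                 c("C", "D"))   # second Matlab row: 'C','D'
same_layout <- all(by_row == stacked)
same_layout
```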


 

In R we can get at indexes of 2-dimensional data like so

> result = my2ddata[1,2]

which gives you the first row, second column. The equivalent code in matlab is

>> result = my2ddata(1,2);

Fairly straightforward: square brackets replace round ones when indexing.


 

If we want to pull an entire row or column out of 2d data in matlab

>> result = my2ddata(:,2);

which returns the entire second column. The same is achieved in R like so:

> result = my2ddata[,2]

So, get rid of the colons when indexing.
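
To put the indexing rules side by side, here is a short sketch using a made-up 2 x 3 matrix, with the Matlab equivalent of each line in the comments:

```r
# Example data: rows are 1 2 3 and 4 5 6
my2ddata <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

single   <- my2ddata[1, 2]               # Matlab: my2ddata(1,2)
column2  <- my2ddata[, 2]                # Matlab: my2ddata(:,2)
row2     <- my2ddata[2, ]                # Matlab: my2ddata(2,:)
last_row <- my2ddata[nrow(my2ddata), ]   # Matlab: my2ddata(end,:)
```

R has no end keyword inside the brackets, so nrow() and ncol() (or length() for vectors) take its place.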


 

Boolean data is important when indexing data in both Matlab and R.

>> booleanresult = [true,true,false,false]

booleanresult =

1     1     0     0

This kind of Boolean data might have been obtained from a logic statement. Matlab’s logic statements are much like R’s, with a few important differences:

>> result = [2,2,3,3]

result =

     2     2     3     3

>> booleanresult = result==2

booleanresult =

     1     1     0     0

In R the statement would be exactly the same:

> (booleanresult = result==2)

[1]  TRUE  TRUE FALSE FALSE

However, if we wanted to INVERT these statements there is an important difference. Matlab uses the tilde ~ to signify a NOT:

>> ~booleanresult

ans =

     0     0     1     1

>> result~=2

ans =

     0     0     1     1

While R uses an exclamation mark:

> !booleanresult

[1] FALSE FALSE  TRUE  TRUE

> result!=2

[1] FALSE FALSE  TRUE  TRUE

So replace ~ in logic statements with !
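
Since these Boolean vectors are mostly used for indexing, here is a short sketch of how they pull values out of the same data in R, with the Matlab equivalents in the comments:

```r
result <- c(2, 2, 3, 3)
is_two <- result == 2

twos      <- result[is_two]    # Matlab: result(result==2)
not_twos  <- result[!is_two]   # Matlab: result(~(result==2))
positions <- which(is_two)     # Matlab: find(result==2)
```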


 

The two languages also differ in how they treat some operators. You’ll often see powers in Matlab written like this.

>> [2 2].^2

ans =

     4     4

The full stop tells Matlab to square each element of the data (Matlab can also square whole matrices; more on this just below). In R, ^ already works element by element, and leaving the full stop in will simply cause a syntax error.

Likewise in Matlab you’ll often see a multiplication or division look like this:

>> [3,1;3,1].*[2,2;2,2]

ans =

     6     2

     6     2

This is once again to tell Matlab to do an elementwise operation, that is, to multiply each number in one matrix by its equivalent in the other matrix. This is different to a matrix multiplication, which would look like this:

>> [3,1;3,1]*[2,2;2,2]

ans =

     8     8

     8     8

The easiest way to think of this is that the default in Matlab is to attempt to carry out Matrix operations, while the default of R is to carry out elementwise operations. So the equivalent to the previous two examples in R would be

> matrix(c(3,1,3,1),2,2,byrow=T)*matrix(c(2,2,2,2),2,2,byrow=T)

     [,1] [,2]

[1,]    6    2

[2,]    6    2

And

> matrix(c(3,1,3,1),2,2,byrow=T)%*%matrix(c(2,2,2,2),2,2,byrow=T)

     [,1] [,2]

[1,]    8    8

[2,]    8    8

R has to specifically be told to carry out a matrix multiplication by using the %*% command.

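So the main message of this segment is to check very carefully what sort of multiplication or division the Matlab code is meant to be doing, remove the full stops from elementwise operators, and use R’s own matrix operations where they are needed: %*% for matrix multiplication and solve() for matrix division. Be careful not to reach for %/%, which is integer division in R and has nothing to do with matrices.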
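
As a rough cheat sheet, the sketch below lines up the common Matlab operators with their R counterparts. The matrices A and B are made up for illustration; B is chosen to be invertible so that the division examples work.

```r
A <- matrix(c(3, 1, 3, 1), 2, 2, byrow = TRUE)
B <- matrix(c(2, 1, 1, 2), 2, 2, byrow = TRUE)   # invertible example matrix

elem_power <- A^2               # Matlab: A.^2
elem_times <- A * B             # Matlab: A.*B
elem_div   <- A / B             # Matlab: A./B
mat_times  <- A %*% B           # Matlab: A*B
right_div  <- A %*% solve(B)    # Matlab: A/B  (A times the inverse of B)
left_div   <- solve(B, A)       # Matlab: B\A  (solves B %*% X = A)

int_div    <- 7 %/% 2           # integer division, NOT a matrix operation
```

Multiplying the division results back by B should recover A, which makes a handy sanity check when translating.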


 

Finally, transposing matrices (flipping them on their side so that rows and columns swap places) can come up a lot in Matlab code.

>> result = [1,2;3,4]

result =

     1     2

     3     4

>> result'

ans =

     1     3

     2     4

As seen in this example, the ' symbol transposes the matrix. It should be noted that this only works on the object it’s next to in the statement, rather than the whole line. Thus the differences here:

>> [1,2;3,4]+[1,2;3,4]'

ans =

     2     5

     5     8

%vs

>> ([1,2;3,4]+[1,2;3,4])'

ans =

     2     6

     4     8

The same operation in R is carried out by

> (result = matrix(c(1,2,3,4),2,2,byrow=T))

     [,1] [,2]

[1,]    1    2

[2,]    3    4

> t(result)

     [,1] [,2]

[1,]    1    3

[2,]    2    4

So check for transposing, and replace the use of ' with t( ).


 

This hopefully covers most of the basics. I’ll detail differences in control flow and functions in the sections below. Potentially you could write a script to automate this stuff. There are some automated solutions out there, but the general consensus (which I share) is that they cause more problems than they solve.

 

2. – EVERYTHING IN MATLAB IS A MATRIX – NOT EVERYTHING IN R IS A MATRIX – THIS IS IMPORTANT.

Consider the following Matlab code where i changes from 2 to 1. i could be set as part of a loop, the result of another calculation or manual input.

>>i = 2

firstmat = [4,4,4;3,3,3;2,2,2];%make a matrix

secondmat = firstmat (1:i,:);%get rows 1 to i

x = size(secondmat,2); %number of columns of secondmat

thirdmat = secondmat'; %flip that matrix

fourthmat = firstmat*thirdmat; %do some matrix multiplication

sum(fourthmat(:))+x %sum of the entire matrix of fourthmat + number of columns of secondmat

ans =

   192

>> i = 1

% run same code again

ans =

   111

Note the different results.

If we translated the code to R it’d look something like this:

i = 2

firstmat = matrix(c(4,4,4,3,3,3,2,2,2),3,3,byrow=T) #make a matrix

secondmat = firstmat [1:i,] #get rows 1 to i

x = dim(secondmat)[2] #or ncol(secondmat) would work

thirdmat = t(secondmat) #flip that matrix

fourthmat = firstmat%*%thirdmat #do some matrix multiplication

sum(fourthmat)+x  #sum of the entire matrix of fourthmat + number of columns of secondmat

So, if i = 2.

> sum(fourthmat)+x

[1] 192

Great, that’s the same result. What about if i = 1.

Error in firstmat %*% thirdmat : non-conformable arguments

> sum(fourthmat)+x  #sum of the entire matrix of fourthmat + number of columns of secondmat

numeric(0)

Oh dear. Despite being the same code, we get an error in R when i is 1. Let’s debug by looking at this step by step and comparing to Matlab. We’ll skip the first line, as we know from running i = 2 and getting the same results as the original code that the initial matrix must be right.

>> secondmat = firstmat (1:i,:)%get rows 1 to i

secondmat =

     4     4     4

 

> (secondmat = firstmat [1:i,]) #get rows 1 to i

[1] 4 4 4

These look the same, right? Actually, this line is the root of our problem. While Matlab treats everything as a matrix no matter how many dimensions an object may have, R treats 1-dimensional data as a vector, and by default it silently drops a single row or column taken from a matrix down to a vector. A vector has no second dimension, so code that asks for one gets NULL or an error.

When we get to the next line:

>> x = size(secondmat,2) %number of columns of secondmat

x =

     3

 

> (x = dim(secondmat)[2]) #or ncol(secondmat) would work

NULL

Our attempt at trying to get hold of the number of columns produces a NULL value because as far as R is concerned, there isn’t one. This doesn’t throw an immediate error, but will go on to cause havoc later in the code.
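
A quick way to spot this while debugging is to ask R what it thinks each object is. The sketch below uses the same firstmat as the example; one_row and two_rows are just illustrative names:

```r
firstmat <- matrix(c(4, 4, 4, 3, 3, 3, 2, 2, 2), 3, 3, byrow = TRUE)

one_row  <- firstmat[1, ]     # what we get when i = 1
two_rows <- firstmat[1:2, ]   # what we get when i = 2

one_row_is_matrix  <- is.matrix(one_row)    # FALSE: it has become a vector
one_row_dim        <- dim(one_row)          # NULL: no rows or columns to speak of
one_row_length     <- length(one_row)       # 3
two_rows_is_matrix <- is.matrix(two_rows)   # TRUE
two_rows_dim       <- dim(two_rows)         # 2 3
```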

>> thirdmat = secondmat' %flip that matrix

thirdmat =

     4

     4

     4

 

> (thirdmat = t(secondmat)) #flip that matrix

     [,1] [,2] [,3]

[1,]    4    4    4

More alarm bells. While the transpose command has at least made thirdmat a matrix again (you can tell from the added column numbers) it has flipped the matrix completely the wrong way.

Then because thirdmat is the wrong shape to be multiplied by firstmat we get the error

Error in firstmat %*% thirdmat : non-conformable arguments

Finally because x is NULL, we can’t get an answer from our final line, even if fourthmat did exist properly.

numeric(0)

So the main problem is that something that the original code expected to be a matrix, stopped being a matrix. We therefore need to make the code more robust to stop this happening.

An initial fix might be to simply use R’s as.matrix() or matrix() commands.

> (secondmat = matrix(firstmat [1:i,])) #get rows 1 to i

     [,1]

[1,]    4

[2,]    4

[3,]    4

Well, secondmat is a matrix again, meaning that the next line will actually get a number. However, the matrix is oriented the wrong way (3 rows by 1 column instead of 1 row by 3 columns), so transposing it still gives the wrong shape and the error is still thrown.

This is once again down to differences in the default behaviour between the two programs.

1 dimensional data in matlab will look like this:

>> [1,2,3,4,5]

ans =

     1     2     3     4     5

As far as matlab is concerned, that is a matrix of 1 row and 5 columns. The equivalent conversion in R:

> matrix(c(1,2,3,4,5))

     [,1]

[1,]    1

[2,]    2

[3,]    3

[4,]    4

[5,]    5

Will make a matrix of 5 rows and 1 column. This is true even if you use the byrow=T argument.

So when I’ve been doing conversion work I add a function

mlmatrix<-function(x){

x<-t(matrix(x))

return(x)

}

> mlmatrix(c(1,2,3,4,5))

     [,1] [,2] [,3] [,4] [,5]

[1,]    1    2    3    4    5

Which creates a single-column matrix and immediately transposes it so it matches the Matlab default of a single row. When i = 1, this works great and gets us the desired result.

> (secondmat = mlmatrix(firstmat [1:i,])) #get rows 1 to i

     [,1] [,2] [,3]

[1,]    4    4    4

> (x = dim(secondmat)[2]) #or ncol(secondmat) would work

[1] 3

> (thirdmat = t(secondmat)) #flip that matrix

     [,1]

[1,]    4

[2,]    4

[3,]    4

> fourthmat = firstmat%*%thirdmat

> sum(fourthmat)+x  #sum of the entire matrix of fourthmat + number of columns of secondmat

[1] 111

But this is not the end of the problem!

If i = 2 again

> (secondmat = mlmatrix(firstmat [1:i,])) #get rows 1 to i

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    4    3    4    3    4    3

We’ve accidentally flattened our matrix! This should be a 2 x 3 matrix. Using as.matrix avoids the flattening, but we know that if it is used when i = 1, we end up with a matrix needing to be transposed. We’re back to square one.

> (secondmat =as.matrix(firstmat [1:i,])) #get rows 1 to i

     [,1] [,2] [,3]

[1,]    4    4    4

[2,]    3    3    3

This is one of the main issues of converting code between the two languages. One solution is to always convert to a matrix and set the number of rows.

> i=2

> (secondmat =matrix(firstmat [1:i,],nrow=i)) #get rows 1 to i

     [,1] [,2] [,3]

[1,]    4    4    4

[2,]    3    3    3

> i=1

> firstmat = matrix(c(4,4,4,3,3,3,2,2,2),3,3,byrow=T) #make a matrix

> (secondmat =matrix(firstmat [1:i,],nrow=i)) #get rows 1 to i

     [,1] [,2] [,3]

[1,]    4    4    4
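
A further option worth knowing about is the drop argument of R’s [ operator: with drop = FALSE, subsetting a matrix always returns a matrix, however many rows are selected. A minimal sketch, where run_example is just an illustrative wrapper around the example code:

```r
firstmat <- matrix(c(4, 4, 4, 3, 3, 3, 2, 2, 2), 3, 3, byrow = TRUE)

# Illustrative wrapper around the example, not part of the original script
run_example <- function(i) {
  secondmat <- firstmat[1:i, , drop = FALSE]  # stays a matrix even when i = 1
  x <- ncol(secondmat)                        # number of columns of secondmat
  fourthmat <- firstmat %*% t(secondmat)      # matrix multiplication
  sum(fourthmat) + x
}

run_example(2)  # 192, matching Matlab
run_example(1)  # 111, matching Matlab
```

This fixes the problem at the point where the dimension would otherwise be lost, so nothing downstream needs to know the value of i.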

However, an object may have been returned from one function, passed to another, stored in a list and passed on again, so you may have no easy way of knowing how big it is meant to be. In this case it is probably best to use if statements with checks like is.matrix() to find vectors and convert them as need be, leaving objects that are already matrices alone.
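
Such a check might look something like the sketch below. as_row_matrix is a hypothetical helper name, and it assumes that any plain vector was meant to be a Matlab-style single row; if the original code expected a column you would need nrow = length(x) instead.

```r
# Hypothetical helper: leave matrices alone, turn plain vectors into 1-row matrices
as_row_matrix <- function(x) {
  if (is.matrix(x)) {
    x                      # already a matrix, do nothing
  } else {
    matrix(x, nrow = 1)    # assume a Matlab-style row vector
  }
}

from_vector <- as_row_matrix(c(4, 4, 4))
from_matrix <- as_row_matrix(matrix(0, 2, 3))
dim(from_vector)   # 1 3
dim(from_matrix)   # 2 3
```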


 

Hopefully this example demonstrates the potential perils when converting complicated Matlab code to R, and the sorts of problems that might crop up and how to spot them.

 

3. – FLOW, FUNCTIONS AND WHERE TO STORE RESULTS

Going back to basic differences between R and Matlab again we have the different ways in which they use functions and the different options they have for storing data.

In matlab there are several options for storing matrices. Some of the most common are cells and structures. Structures are arrays with named fields. Bits of the structure can be accessed with their name.

>> result = struct('a',[],'b',[],'c',[])%structure containing 3 empty matrices called ‘a’,’b’ and ‘c’

result =

    a: []

    b: []

    c: []

>> result.a=[1,2,3;3,2,1];%store a 2 x 3 matrix in ‘a’

>> result.a %show what is stored in ‘a’

ans =

     1     2     3

     3     2     1

Many matlab scripts use structures to store options for functions. The best equivalent in R is probably using lists. For example:

> result=list(a=c(),b=c(),c=c()) #list containing empty objects called ‘a’,’b’ and ‘c’

> result

$a

NULL

$b

NULL

$c

NULL

> result$a<-matrix(c(1,2,3,3,2,1),2,3,byrow=T) #store a 2 x 3 matrix in ‘a’

> result$a #show what is stored in ‘a’

     [,1] [,2] [,3]

[1,]    1    2    3

[2,]    3    2    1
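
Because structures are so often used to carry options around, here is a small sketch of the same pattern with R lists. The option names (tol, maxit, verbose) are invented for illustration; modifyList() from the utils package overlays the user’s choices on a list of defaults.

```r
# Hypothetical option names, purely for illustration
default_opts <- list(tol = 1e-6, maxit = 100, verbose = FALSE)
user_opts    <- list(maxit = 500)            # the user only overrides one option

opts <- utils::modifyList(default_opts, user_opts)
opts$maxit   # 500, taken from user_opts
opts$tol     # 1e-06, kept from the defaults
```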

 

Cells are similar to structures, but individual elements do not have names. Cells can also have any number of dimensions.

>> result = cell(2,2)

result =

    []    []

    []    []

>> result{1,1}=[1,2,1,2]

>> result{1,1}

ans =

     1     2     1     2

To create a similar structure in R, there are a couple of options. Firstly we could use lists of lists to represent the different dimensions:

> result = lapply(1:2,function (x) (sapply(1:2,function (x) NULL)))#create 1 by 2 list, each cell containing a 1 by 2 empty list.

> result

[[1]]

[[1]][[1]]

NULL

[[1]][[2]]

NULL

[[2]]

[[2]][[1]]

NULL

[[2]][[2]]

NULL

> result[[1]][[1]]<-c(1,2,1,2)

> result[[1]][[1]]

[1] 1 2 1 2

Alternatively we could create a matrix of lists.

> result = matrix(list(),2,2)

> result

     [,1] [,2]

[1,] NULL NULL

[2,] NULL NULL

This looks like an empty matrix, but if we look closer:

> result[1,1]

[[1]]

NULL

And so we can assign to the lists in this matrix like so:

> result[1,1][[1]]<-c(1,2,1,2)

> result

     [,1]      [,2]

[1,] Numeric,4 NULL

[2,] NULL      NULL

Between these methods, we can replicate all the data structures used by Matlab.
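
For the matrix-of-lists approach, double square brackets with two indices read a little closer to Matlab’s curly braces. A short sketch, where cells is just an illustrative name:

```r
cells <- matrix(vector("list", 4), nrow = 2, ncol = 2)  # like Matlab's cell(2,2)

cells[[1, 1]] <- c(1, 2, 1, 2)    # Matlab: cells{1,1} = [1,2,1,2]
cells[[2, 2]] <- "some text"      # each cell can hold a different type

first_cell  <- cells[[1, 1]]            # Matlab: cells{1,1}
still_empty <- is.null(cells[[1, 2]])   # untouched cells stay NULL
```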


 

The differences in flow control between the two languages are pretty simple.

In matlab a loop looks like this:

result = zeros(1,10);

for i = 1:10

    result(i)=i+1;

end

While in R it looks like this:

result = rep (0,10)

for (i in 1:10){

     result[i]=i+1

}
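
Once a loop like this has been translated, it is often worth checking whether it can be replaced with a vectorised expression, since both languages work on whole vectors at once. A minimal sketch comparing the two forms:

```r
# Loop version, as translated from Matlab
result_loop <- rep(0, 10)
for (i in seq_along(result_loop)) {
  result_loop[i] <- i + 1
}

# Vectorised version: add 1 to every element in one go
result_vec <- (1:10) + 1

same_answer <- identical(result_loop, result_vec)
same_answer
```

Keeping the loop version until the translation is verified makes it easier to compare against the Matlab original line by line.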

Likewise, an if statement in Matlab looks like this:

if i == 1

    result = 'A';

elseif i == 2

    result = 'B';

else

     result = 'C';

end

While an if statement in R might look like this:

if(i == 1){

     result = "A"

}else if (i ==2){

     result = "B"

} else {

     result = "C"

}
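
One thing to watch for when translating if statements: Matlab will accept a whole array as the condition and treats it as true only when every element is nonzero, whereas R expects a single TRUE or FALSE. Wrapping the condition in all() or any() makes the intent explicit. A small sketch with made-up data:

```r
values <- c(2, 2, 3, 3)

# Matlab: if values > 1 ... end  (true only when every element passes)
if (all(values > 1)) {
  all_result <- "every element is greater than 1"
} else {
  all_result <- "at least one element is not greater than 1"
}

# Matlab: if any(values == 3) ... end
if (any(values == 3)) {
  any_result <- "found a 3"
} else {
  any_result <- "no 3 here"
}
```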

It’s a fairly straightforward change of adding curly brackets and tweaking the if or for statements slightly.


 

Functions also behave fairly similarly, but with slight differences in how you might load them and return results. Matlab can “see” all functions that are within the current working directory, while R needs to run the code that defines a function first, generally using the source() command if the function is stored in an external file. You can load a lot of scripts at once using a loop.

scriptplace<-file.path(getwd(),"SCRIPTFOLDER")

#only pick up files ending in .R, with their full paths
for(script in list.files(scriptplace,pattern="\\.[Rr]$",full.names=TRUE)){

     source(script)

}

A normal function in matlab might look like this

function [result1,result2,result3] = myfunction (input1, input2)

    result1 = input1+input2;

    result2 = input1-input2;

    result3 = input1*input2;

end

Which, once saved into your working directory as myfunction.m, could be called like this:

>> [r1 r2 r3]=myfunction(1,2)

r1 =

     3

r2 =

    -1

r3 =

     2

The same function in R would be slightly different. R functions return a single object, so we need to combine the three results into a list (for this simple function a plain vector would do just as well, but lists allow you to store objects of any type).

myfunction = function (input1,input2){

     result1 = input1+input2;

     result2 = input1-input2;

     result3 = input1*input2;

     allresult = list (result1,result2,result3)

     return(allresult)

}

If we saved this function as myfunction.R in the working directory

> source('myfunction.R')

> allr = myfunction(1,2)

> (r1=allr[[1]])

[1] 3

> (r2=allr[[2]])

[1] -1

> (r3=allr[[3]])

[1] 2

We pull our objects back out of the list and carry on.
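
Giving the list elements names saves you from having to remember which position holds which result. This is only a sketch; myfunction_named and the element names are illustrative choices rather than anything the original Matlab code dictates.

```r
# Variant of the function above that returns a named list
myfunction_named <- function(input1, input2) {
  list(sum        = input1 + input2,
       difference = input1 - input2,
       product    = input1 * input2)
}

allr <- myfunction_named(1, 2)
r1 <- allr$sum          # 3
r2 <- allr$difference   # -1
r3 <- allr$product      # 2
```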


 

That should hopefully cover most of the conversion of Matlab functions to their equivalent R functions. This is mostly straightforward busywork, but you’d be amazed how easy it is to miss a character or forget to convert something. When I start converting a Matlab function I usually make a note along the lines of “#need to return r1, r2, r3” to remind myself what the function needs to output. I recommend annotating code heavily to make sure you keep track of what the function is meant to do and how far you’ve got with conversion.

4. – Matlab quirks to watch out for

Finally, here are a few things to look out for when dealing with Matlab code. I’ll update this if I come up with any other ideas.


 

Matlab’s default behaviour for many functions such as sum ( ), max ( ) and min ( ) is to work along a single dimension (down each column of a matrix) rather than over the whole object:

>> result=[1,2,3;1,1,1];

>> sum(result)% default is sum of columns

ans =

     2     3     4

>> sum(result,2)%sum of rows

ans =

     6

     3

The sum of an entire matrix is

>> sum(result(:))

ans =

     9

This differs from R:

> result=matrix(c(1,2,3,1,1,1),2,3,byrow=T)

> sum(result)#default produces sum of whole matrix

[1] 9

> colSums(result)#sum of columns

[1] 2 3 4

> rowSums(result)#sum of rows

[1] 6 3

So check how many numbers a sum command is supposed to produce and if it’s meant to be summed over a particular dimension.

But wait! Did you notice something odd about that previous section? Take another look at the code to get the sum of rows in Matlab.

>> sum(result,2)%sum of rows

ans =

     6

     3

We chose the second dimension, which in most other cases means columns. So you might expect that command to produce the sum of columns instead. It clearly doesn’t.

I think Matlab’s logic is that you are summing along the second dimension, that is, moving across the columns within each row. However, when I was first doing some conversion and saw this I assumed that the code needed to get the sum of columns, used the colSums() command in R and thus caused havoc. It didn’t help that the author of the original code had called the variable the sums were assigned to colsums.

Make sure you know which dimension you are meant to be working in!
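
One way to keep the dimensions straight is to wrap the mapping in a small helper. ml_sum below is a hypothetical function of my own naming that mimics Matlab’s sum(x, dim) for matrices only, returning a row or column matrix as Matlab would. It does not reproduce Matlab’s special handling of row vectors, so treat it as a sketch.

```r
# Hypothetical helper mimicking Matlab's sum(x, dim) for matrices
ml_sum <- function(x, dim = 1) {
  if (dim == 1) {
    matrix(colSums(x), nrow = 1)   # Matlab sum(x,1): one total per column
  } else if (dim == 2) {
    matrix(rowSums(x), ncol = 1)   # Matlab sum(x,2): one total per row
  } else {
    stop("this sketch only handles dim = 1 or dim = 2")
  }
}

result <- matrix(c(1, 2, 3, 1, 1, 1), 2, 3, byrow = TRUE)
down_columns <- ml_sum(result)          # 1 x 3 matrix: 2 3 4
across_rows  <- ml_sum(result, 2)       # 2 x 1 matrix: 6 3
column_max   <- apply(result, 2, max)   # Matlab: max(result)
```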


 

The zeros ( ) command in matlab will, by default, produce a square matrix when given only one number.

>> zeros(2)

ans =

     0     0

     0     0

As this is often used for pre-allocation it’s an important one to keep an eye on. Replicate it using

> matrix (0, 2, 2)

     [,1] [,2]

[1,]    0    0

[2,]    0    0

Make sure you don’t fall into the trap of assuming that because only one number is given, the result will be a 1-d object of zeros.
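
If the Matlab code calls zeros ( ) a lot, a tiny wrapper can carry the square-by-default behaviour over to R. The function below is a hypothetical convenience, not something built into R:

```r
# Hypothetical wrapper: zeros(n) gives an n x n matrix, zeros(n, m) gives n x m
zeros <- function(n, m = n) {
  matrix(0, nrow = n, ncol = m)
}

square   <- zeros(2)       # Matlab: zeros(2)
row_of_0 <- zeros(1, 10)   # Matlab: zeros(1,10)
dim(square)     # 2 2
dim(row_of_0)   # 1 10
```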


 

Feel free to contact me if you have thoughts or suggestions about this guide!
