Solved – How to aggregate and merge these two data.frames

r

Here is the structure of my data.frames:

> str(c)
'data.frame':   21633 obs. of  20 variables:
 $ Trade        : num  7e+12 7e+12 7e+12 7e+12 7e+12 ...
     $ New          : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
 $ Amended      : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
     $ Unwound      : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
 $ Fixed        : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
     $ FX.Fixed     : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
 $ Branch       : Factor w/ 1 level "nil": 1 1 1 1 1 1 1 1 1 1 ...
     $ Currency     : Factor w/ 8 levels "AUD","EUR","USD",..: 1 1 1 1 1 1 1 3 1 1 ...
 $ Product      : Factor w/ 4 levels "A","B","C",..: 2 2 2 2 2 2 2 2 2 2 ...
     $ Today.NPV    : num  444 222 0 0 -77777 ...
 $ Today.Unreal : num  13.4 5555.54 0 0 6666.36 ...
     $ Today.Real   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ MTD.Real     : num  0 0 0 0 0 0 0 0 0 0 ...
     $ MTD.Unreal   : num  222 -333 0 0 -444 ...
 $ New.Deals.P.L: num  0 0 0 0 0 0 0 0 0 0 ...
     $ Amend.P.L    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Unwind.P.L   : num  0 0 0 0 0 0 0 0 0 0 ...
     $ Fixing.P.L   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Carry.P.L    : num  16.6 81.9 0 0 -319.3 ...
     $ FX.Fixing.P.L: num  0 0 0 0 0 0 0 0 0 0 ...
> str(predict)
'data.frame':   735755 obs. of  7 variables:
 $ X.Financial.Object.  : num  7e+12 7e+12 7e+12 7e+12 7e+12 ...
     $ X.Scaling.Currency.  : Factor w/ 7 levels "AUD","CAD","EUR",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ X.Maturity..date..   : Factor w/ 108 levels "12Aug2011","12Aug2014",..: 22 98 95 88 85 ...
     $ X.Rate.Change.       : num  0 0 0 0 0 0 0 0 0 0 ...
 $ X.Env.COB.Date.      : int  20110815 20110815 20110815 20110815 20110815  ...
     $ X.Predict.           : num  74.36 -3.84 16.77 4.66 11.88 ...
 $ X.Reporting.Currency.: Factor w/ 7 levels "AUD","CAD","EUR",..: 4 4 4 4 4 4 4 4 4 4 ...

Goal 1: aggregate the first data frame 'c' to sum all numeric values by Currency.

I do not need text fields other than the currency field

I tried to use

> aggregate(x=c, by=list("Currency"), FUN="sum")
Error in FUN(X[[1L]], ...) : arguments must have same length

It is apparently incorrect.

Goal 2: I would like to merge the data.frame c with predict using this relationship: "c.Trade == predict.Financial.Object"

I tried:

> cc = merge(c, predict, by=c("Trade","Financial.Object"))
Error in fix.by(by.x, x) : 'by' must specify valid column(s)

but it does not work.

One further question: why the field names in data frame 'predict' have the dot in them? What is the proper way to refer to them?

Best Answer

I am not an R expert, and since I don’t have your data, I cannot experiment with it. But here is what I would try:

1) I believe your x in the aggregate function should be c$Branch, not c.

2) I believe the argument by must only be used if the columns have the same name in both dataframes. Try using by.x="Trade", by.y="predict.Financial.Object". See documentation.

PS: Convention suggests writing cc <- merge(…) instead of cc = merge(…).

Related Question