I have a large CSV file with around 45,000 rows and about 6 columns.
The columns are set up like this:
Module, File-path, Status, Version-Old, Version-New, Change-diff
Change-diff is simply the result of Version-New minus Version-Old.
There are around 30 different modules and I want to perform analysis and create different graphs for each module based on the change-diff.
I also want to know the number of rows for each module; I guess I can just count how many times each module name appears in the Module column to work that out.
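As a rough illustration of that counting step, here is a minimal Python sketch using only the standard library. It assumes the header row matches the column names listed above and that fields are separated by a comma followed by a space; the three sample rows and the file name `data.csv` are placeholders, not the real file.

```python
import csv
import io
from collections import Counter

# Small placeholder sample using the column names from the question.
# For the real file, pass open("data.csv", newline="") instead
# (the file name "data.csv" is hypothetical).
sample = """Module, File-path, Status, Version-Old, Version-New, Change-diff
modA, modA/a/a, 1, 11, 19, 8
modA, modA/a/b, 1, 2, 4, 2
modB, modB/a/a, 1, 8, 9, 1
"""


def count_rows_per_module(handle):
    """Return a Counter mapping each module name to its number of rows."""
    # skipinitialspace=True drops the space that follows each comma,
    # so the header is read as "Module", "File-path", and so on.
    reader = csv.DictReader(handle, skipinitialspace=True)
    return Counter(row["Module"] for row in reader)


counts = count_rows_per_module(io.StringIO(sample))
for module, n in sorted(counts.items()):
    print(module, n)
```

The same count is a one-click pivot table in Excel; the script form is mainly useful if the per-module analysis is going to be automated anyway.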
I'm wondering whether there is a program that would make this job easier. I'm currently trying to plug it all into Excel, but I'm not sure how to create a graph for each module instead of one big graph for all modules.
Would R, SPSS or MATLAB be a better option? Or anything else?
EDIT:
Just expanding on what the data looks like.
Module, Path, Status, Old, New, Change
modA, modA/a/a, 1, 11, 19, 8
modA, modA/a/b, 1, 2, 4, 2
modA, modA/a/c, 1, 0, 1, 1
modA, modA/b/c, 0, 0, 1, 1
modA, modA/b/a, 1, 7, 12, 5
modB, modB/a/a, 1, 8, 9, 1
modB, modB/a/b, 0, 0, 1, 1
modB, modB/a/c, 1, 4, 10, 6
modC, modC/a/a, 0, 0, 4, 4
modC, modC/a/b, 1, 0, 3, 3
So I want to analyse all of modA based on Change (and possibly based on Old vs New etc.).
Then I want to analyse modB, modC, etc. in the same way but in separate graphs, generated automatically because of the amount of data I'm dealing with.
Thanks in advance
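One possible way to automate the per-module step described above is sketched below in Python. It is only an illustration under stated assumptions: the header is the one shown in the EDIT (`Module`, `Change`), the Change values are integers, and a histogram is just one guess at a useful graph. The function names, the `change_` file prefix, and the file name `changes.csv` are hypothetical; the plotting function needs matplotlib installed, while the grouping part uses only the standard library.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Sample rows copied from the EDIT in the question. For the real file, pass
# open("changes.csv", newline="") instead (the file name is hypothetical).
sample = """Module, Path, Status, Old, New, Change
modA, modA/a/a, 1, 11, 19, 8
modA, modA/a/b, 1, 2, 4, 2
modA, modA/a/c, 1, 0, 1, 1
modA, modA/b/c, 0, 0, 1, 1
modA, modA/b/a, 1, 7, 12, 5
modB, modB/a/a, 1, 8, 9, 1
modB, modB/a/b, 0, 0, 1, 1
modB, modB/a/c, 1, 4, 10, 6
modC, modC/a/a, 0, 0, 4, 4
modC, modC/a/b, 1, 0, 3, 3
"""


def changes_by_module(handle):
    """Group the integer Change values by module name."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, skipinitialspace=True):
        groups[row["Module"]].append(int(row["Change"]))
    return dict(groups)


def plot_each_module(groups, prefix="change_"):
    """Save one histogram PNG per module (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # write image files without needing a display
    import matplotlib.pyplot as plt

    for module, values in groups.items():
        fig, ax = plt.subplots()
        ax.hist(values)
        ax.set_title(f"{module}: distribution of Change")
        ax.set_xlabel("Change (New - Old)")
        ax.set_ylabel("Number of rows")
        fig.savefig(f"{prefix}{module}.png")
        plt.close(fig)  # free the figure before moving to the next module


groups = changes_by_module(io.StringIO(sample))
for module, values in groups.items():
    # Row count and mean Change for each module
    print(module, len(values), mean(values))
```

Calling `plot_each_module(groups)` would then write one image per module, so the roughly 30 modules never have to be separated by hand.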
Best Answer
Your question is a little ambiguous ("different graphs for each module" doesn't mean quite the same thing as "separate graphs for each module"), but my impression is that you're trying to create the same plot or set of plots using only the data from each module, for all of your modules. This is how I'd do that in R: