[Math] Abstract Algebra for Business Intelligence / Data Mining

abstract-algebra data-analysis

Twitter released its Algebird library, an abstract algebra library for Scala.

At a high level, what are the maths involved in abstract algebra that have an impact on aggregating large datasets?

Best Answer

Short answer: if your operations are commutative, the order of the inputs doesn't matter, which makes the operations much easier to schedule because you don't have to worry about ordering. This is a nice feature for parallel processing, since you need far less synchronization.
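To make the scheduling point concrete, here is a minimal Python sketch (not Algebird code; the helper names `chunked` and `reduce_in_chunks` are made up for illustration). It assumes the operation is associative, so each chunk can be reduced on its own. When the operation is also commutative, the partial results can be combined in whatever order they happen to arrive.

```python
import random
from functools import reduce
from operator import add


def chunked(xs, size):
    """Split xs into consecutive chunks of at most `size` elements."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]


def reduce_in_chunks(op, xs, size, shuffle_partials=False):
    """Reduce each chunk independently, then combine the partial results.

    shuffle_partials=True simulates workers finishing in arbitrary order;
    that is only safe when op is commutative as well as associative.
    """
    partials = [reduce(op, chunk) for chunk in chunked(xs, size)]
    if shuffle_partials:
        random.shuffle(partials)
    return reduce(op, partials)


numbers = list(range(1, 101))
words = ["a", "b", "c", "d", "e", "f"]

# Integer addition is associative and commutative: any order works.
total_sequential = reduce(add, numbers)
total_shuffled = reduce_in_chunks(add, numbers, 7, shuffle_partials=True)

# String concatenation is associative but not commutative:
# chunks are fine, but the partial results must stay in order.
concat_in_order = reduce_in_chunks(add, words, 2)
```

With addition the shuffled and sequential totals always agree. With concatenation, shuffling the partial results would scramble the string, so the scheduler has to keep track of chunk order.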

Rock-Paper-Scissors problem (add Spock and lizard for bonus points ;)

Say you have a long string of R, P, and S moves and want to know who is left standing. Assume the output is the same as if the game were played by repeatedly matching up the first two elements of the list until only one is left.

RPSSPSRSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPRPSSPRRRRSPSPSPPPRSPSPSRP

Multiplication table:

RR->R

PP->P

SS->S

RS->R

SR->R

SP->S

PS->S

RP->P

PR->P
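For reference, the sequential evaluation described above is just a left fold over the string using this table. A small Python sketch (the names `TABLE`, `play`, and `last_standing` are mine, picked for illustration):

```python
from functools import reduce

# The multiplication table above, keyed by the two-character matchup.
TABLE = {
    "RR": "R", "PP": "P", "SS": "S",
    "RS": "R", "SR": "R",
    "SP": "S", "PS": "S",
    "RP": "P", "PR": "P",
}


def play(a, b):
    """Return the survivor when a is matched against b."""
    return TABLE[a + b]


def last_standing(game):
    """Match up the first two elements repeatedly until one is left."""
    return reduce(play, game)
```

For example, `last_standing("RPS")` first plays R against P (P survives), then P against S, so S is left standing. This version is strictly left to right, which is exactly what makes the parallel version a puzzle.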

Sit down and figure out how you can evaluate a huge RPS string in parallel... (solution at the bottom)


Check out their test library for use cases: https://github.com/twitter/algebird/tree/develop/src/test/scala/com/twitter/algebird

Or the code itself: https://github.com/twitter/algebird/tree/develop/src/main/scala/com/twitter/algebird

It is a generic programming framework for data types that support the concepts of 0, 1, plus, times, inverse, ... The user has to write a wrapper in Scala that takes care of the memory management and carries out the operations. Depending on which concepts the data structure implements, it falls into some abstract algebra category (group, semigroup, monoid, ...), and you can then use algorithms specialized for that structure to achieve your computational goals.

For example, an operation that is commutative as well as associative may be faster to aggregate than one that is only associative, since you are allowed to reorder the elements; however, the framework needs to know this in order to make optimizations.
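To give a feel for the idea without claiming anything about Algebird's actual API, here is a toy monoid in Python. The `Monoid` class and the instances below are hypothetical and written only for this illustration: a data type supplies a zero and an associative plus, and a generic `sum` works for all of them.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """A zero element plus an associative binary operation (toy version)."""
    zero: T
    plus: Callable[[T, T], T]

    def sum(self, items: Iterable[T]) -> T:
        # Written once against the monoid interface; reused by every instance.
        return reduce(self.plus, items, self.zero)


# Three unrelated data types that all fit the same interface.
int_add = Monoid(0, lambda a, b: a + b)
int_max = Monoid(float("-inf"), max)
set_union = Monoid(frozenset(), lambda a, b: a | b)
```

The point is that `sum` never looks at what the elements are. Anything that can supply a zero and an associative plus gets the aggregation (and, in a real framework, the parallel scheduling) for free.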

IMHO a C++/Python/Java/Ruby framework based around parallel prefix would have been more useful. Also, the documentation is sort of lacking. I couldn't easily find a concise example of how to take your data structure and wrap it with all the functions it needs for the framework.


Parallel RPS solution.

Think of it as a three-state finite state machine with states R, P, S and transitions (inputs) R, P, S. First encode R -> 0, P -> 1, S -> 2.

Rock transform: {RR->R, PR->P, SR->R} | {00->0, 10->1, 20->0} | {0,1,0}

Paper transform: {RP->P, PP->P, SP->S} | {1,1,2}

Scissors transform: {RS->R, PS->S, SS->S} | {0,2,2}
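If you want to double-check the three transforms, they can be derived mechanically from the multiplication table. A short Python sketch (the names `TABLE`, `CODE`, and `transform_for` are illustrative): for each input symbol, list where it sends state 0, state 1, and state 2.

```python
# Multiplication table from the problem statement: current state + input.
TABLE = {
    "RR": "R", "PP": "P", "SS": "S",
    "RS": "R", "SR": "R",
    "SP": "S", "PS": "S",
    "RP": "P", "PR": "P",
}

STATES = "RPS"
CODE = {symbol: index for index, symbol in enumerate(STATES)}  # R->0, P->1, S->2


def transform_for(symbol):
    """Tuple whose i-th entry is the state reached from state i on `symbol`."""
    return tuple(CODE[TABLE[state + symbol]] for state in STATES)


TRANSFORMS = {symbol: transform_for(symbol) for symbol in STATES}
```

`TRANSFORMS` then holds one tuple per input symbol, matching the three triples written out above.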

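Now apply a parallel prefix to the transformations {0,1,0}, {1,1,2}, and {0,2,2} under transformation composition, just like permutation composition/multiplication where you compute f(g(x)). This works because composition of transformations is associative even though the RPS operation itself is not: (RS)P = RP = P, but R(SP) = RS = R.

How would Algebird help you with that? Eh... not sure.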
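Here is a rough Python sketch of that idea. To keep it short it does a chunked reduction rather than a full parallel prefix scan, since only the final survivor is asked for. The function names, the chunk size, and the use of a thread pool are my own choices for illustration. Each chunk is collapsed into a single composed transformation independently, and the per-chunk results are then composed in order.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

STATES = "RPS"                                   # R->0, P->1, S->2
TRANSFORM = {"R": (0, 1, 0), "P": (1, 1, 2), "S": (0, 2, 2)}
IDENTITY = (0, 1, 2)

GAME = "RPSSPSRSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPSRPRPSSPRRRRSPSPSPPPRSPSPSRP"


def then(f, g):
    """Compose two transformations: apply f first, then g."""
    return tuple(g[f[x]] for x in range(3))


def chunk_transform(chunk):
    """Collapse a chunk of inputs into one transformation."""
    return reduce(then, (TRANSFORM[c] for c in chunk), IDENTITY)


def last_standing_sequential(game):
    """Reference version: feed the inputs to the state machine one by one."""
    state = STATES.index(game[0])
    for c in game[1:]:
        state = TRANSFORM[c][state]
    return STATES[state]


def last_standing_parallel(game, chunk_size=8):
    """The first symbol is the start state; the rest are inputs, in chunks."""
    first, rest = game[0], game[1:]
    chunks = [rest[i:i + chunk_size] for i in range(0, len(rest), chunk_size)]
    with ThreadPoolExecutor() as pool:
        # Chunks are independent, so they can be processed concurrently;
        # map returns the results in chunk order, which composition needs.
        partials = list(pool.map(chunk_transform, chunks))
    total = reduce(then, partials, IDENTITY)
    return STATES[total[STATES.index(first)]]
```

Calling `last_standing_parallel(GAME)` should agree with `last_standing_sequential(GAME)` for any chunk size. Note that composition is associative but not commutative, so the chunks can be computed in any order but must be combined left to right.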