[Math] Normalize data with large spread in values.

data analysis, probability

I'm currently trying to rank a set of data. The issue is that my initial rank comes from a search on Google and the returned result set.

The values range from 33 all the way up to 1,580,000,000. This makes it very hard, at least with my skill set, to apply any sort of modifiers to these numbers.

What I'm wondering is whether there is a way to normalize the data into a narrow range. I do NOT care about the differences between the numbers, as long as the original order is kept.

I have no clue what tag to post this under, so I apologize for that right off the bat.

Thank you
Bruce

Best Answer

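What tools are at your disposal, and how large is the dataset? One easy way is to sort the data by the original rank and relabel each entry with its position (1, 2, 3, ...). This keeps the original order but squeezes the values into the range 1 to n, where n is the number of entries. I can show an easy way to do this in Excel, and I'll include some basic program code below that does the trick, too.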
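As a rough sketch of the same idea in Python (my choice of language; the question doesn't name one), the function below replaces each value by its 1-based position in ascending order while leaving the list in its original order. The names `relabel_by_rank` and `scale_to_unit_interval` are hypothetical, and the sample numbers are made up to span the 33 to 1,580,000,000 range from the question. One assumption worth flagging: tied values get the same (lowest) rank here, so equal inputs stay equal, whereas a plain sort-and-relabel gives them different consecutive ranks.

```python
def relabel_by_rank(values):
    """Replace each value by its 1-based position in ascending sorted order.

    ranks[i] is the rank of values[i], so the input order is preserved.
    Tied values share the smallest position they occupy (assumption: ties
    should stay equal after relabeling).
    """
    # Indices of the input, ordered by the value they point at (ascending).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        prev = order[position - 2] if position > 1 else None
        # Reuse the previous rank when this value ties with the one before it.
        if prev is not None and values[i] == values[prev]:
            ranks[i] = ranks[prev]
        else:
            ranks[i] = position
    return ranks


def scale_to_unit_interval(ranks):
    """Optionally divide by the number of items so every rank lands in (0, 1]."""
    n = len(ranks)
    return [r / n for r in ranks]


hits = [1580000000, 33, 250000, 33, 9100000]  # made-up sample values
ranks = relabel_by_rank(hits)
print(ranks)                          # [5, 1, 3, 1, 4]
print(scale_to_unit_interval(ranks))  # [1.0, 0.2, 0.6, 0.2, 0.8]
```

The division step is optional; it only matters if you want a fixed range regardless of how many results you have. If a larger value should get rank 1 instead, pass `reverse=True` to the `sorted` call.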

First, sort the data using the standard Excel sort menu.

[Screenshot: select "Sort"]

Configure the sort appropriately (ascending order of rank, has headers):

[Screenshot: configure sort options]

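Use Excel's AutoFill to relabel the data entries: type 1 and 2 into the first two cells of a new column, select both cells, and drag the fill handle down to the last row.

[Screenshot: relabel data]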

Done!

Pseudocode for a programmatic approach:

structure searchStruct {
   //Keep track of original rank, search text, and your new rank.
   long GoogleRank;  //64-bit: values reach ~1.58 billion, close to the 32-bit int limit
   String searchText;
   int myRank;
}

main{
  //Load in data
  searchStruct[] allData = <Some method of loading in data>;

  //Sort data
  sort allData by GoogleRank (ascending);

  //Relabel (the -1 for the array index is b/c of 0-based array)
  for (int i = 1; i <= allData.length; i++) {
    allData[i-1].myRank = i;
  }
}
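For a runnable version, here is one possible Python translation of the pseudocode above (a sketch, not the only way to do it). The fields are renamed to Python style (`google_rank`, `search_text`, `my_rank`), `relabel` is a hypothetical helper name, and the records in the `__main__` block are placeholders standing in for `<Some method of loading in data>`.

```python
from dataclasses import dataclass


@dataclass
class SearchStruct:
    # Keep track of original rank, search text, and your new rank.
    google_rank: int  # Python ints are unbounded, so 1,580,000,000 is safe
    search_text: str
    my_rank: int = 0


def relabel(all_data):
    """Sort records by google_rank (ascending) and number them 1, 2, 3, ..."""
    all_data.sort(key=lambda rec: rec.google_rank)
    # enumerate(..., start=1) replaces the manual i-1 index arithmetic.
    for i, rec in enumerate(all_data, start=1):
        rec.my_rank = i
    return all_data


if __name__ == "__main__":
    # Placeholder records; load your real data here instead.
    all_data = [
        SearchStruct(1580000000, "query a"),
        SearchStruct(33, "query b"),
        SearchStruct(250000, "query c"),
    ]
    for rec in relabel(all_data):
        print(rec.my_rank, rec.google_rank, rec.search_text)
```

Run directly, this prints `1 33 query b`, `2 250000 query c`, and `3 1580000000 query a` on separate lines. Because `list.sort` is stable, records with equal `google_rank` keep their input order and simply receive consecutive ranks.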