Solved – Visualizing large datasets (millions of data points) stored in files or in memory in Redis

Tags: c#, data visualization, large data, r, rapidminer

I am very active at StackExchange's QuantFinance forum but thought this question is more suitable to be asked here.

I am generating large time series data sets and storing them in memory in Redis (alternatively, I could save them to disk in any format), and I would like to visualize the data efficiently as part of my data mining task. The data has the following structure (but I could adjust the structure to fit the format of a capable visualization package):

column 1 key/index : millisecond precision time stamp (.Net long type)
column 2 – n : data containing time series which could be of type string, bool, double, int,…

I look for a visualization package with the following features:

  • Handle time series data that contain millions of data points
  • Able to either load the data set from file or access memory-mapped data (preferably Redis)
  • Freely select which variable to chart on x- or y-axes.
  • Chart several series on the same chart
  • Jump to specific windows within the complete data set to chart only subsets (selected by start and end time, i.e. column 1)
  • Different chart types (line, financial,…)
  • Quickly scroll through the complete data set by either loading the complete data set into memory or loading on demand (visible data only).
  • Flexible zooming in/out to highlight specific windows
  • Should be GUI based
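
To make the windowing and on-demand requirements above concrete, here is a minimal Python sketch of what I mean, not tied to any particular charting package. It assumes the timestamps are sorted ascending and held in a list; the function names `window` and `minmax_decimate` are hypothetical and only for illustration. The first function selects a subset by start and end time, and the second reduces the visible points to a per-bucket min and max so that a chart only has to draw a few points per pixel column.

```python
from bisect import bisect_left, bisect_right


def window(timestamps, values, start_ms, end_ms):
    """Return the slice of the series with start_ms <= t <= end_ms.

    Assumes `timestamps` is sorted ascending (millisecond integers).
    """
    lo = bisect_left(timestamps, start_ms)
    hi = bisect_right(timestamps, end_ms)
    return timestamps[lo:hi], values[lo:hi]


def minmax_decimate(timestamps, values, buckets):
    """Reduce a series to at most 2 * buckets points.

    The series is split into `buckets` contiguous chunks and only the
    minimum and maximum of each chunk are kept, in time order.
    """
    if buckets < 1:
        raise ValueError("buckets must be at least 1")
    n = len(values)
    if n <= 2 * buckets:
        # Already small enough to draw directly.
        return list(timestamps), list(values)
    out_t, out_v = [], []
    for b in range(buckets):
        lo = b * n // buckets
        hi = (b + 1) * n // buckets
        idx = range(lo, hi)
        i_min = min(idx, key=values.__getitem__)
        i_max = max(idx, key=values.__getitem__)
        # A set avoids emitting the same point twice when min == max.
        for i in sorted({i_min, i_max}):
            out_t.append(timestamps[i])
            out_v.append(values[i])
    return out_t, out_v
```

In a GUI, `buckets` would typically be set to roughly the chart width in pixels, and both functions would be re-run whenever the visible window changes through scrolling or zooming.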

I have looked at a couple solutions but am so far not satisfied:

  • R/RStudio: Capable analytical packages but very poor charting capabilities. I need to specify the subset in a script each time I want to look at a different part of the data, and zooming is poor. But it is great for getting the data in through memory-mapping packages.

  • RapidMiner: Looks promising in terms of visualization capabilities, but so far I am not sold on its data import (I am not aware that it can read data from a Redis in-memory server or from memory-mapped files). I am also a bit skeptical about performance, since I need to process data sets that potentially contain tens of millions of data points.

  • C# charting libraries: I generate the data within C# and store it in memory in Redis. So far only Sci-Charts (a commercial charting library) looks like it could handle the size of the data. The downside is that I would need to write a lot of code to achieve the charting features listed above.

  • Teafiles -> TeaHouse: Proprietary data format for time series. It seems to have excellent charting capabilities, but I am a bit worried that it is too closed-source (in terms of not being able to extend the charting capabilities). Also, I would need to save the data to disk in the TeaFile-specific format; there is no Redis support. While this is not a requirement, I am a bit put off because this is a pure charting application that offers no time series analysis tools at all.

I am running on Windows (store data from within a C# application) and therefore am not able to consider Linux or Unix based solutions.

Any pointers or ideas on what I should look at or focus on? Again, I like R a lot, but I just have not come across GUI-based charting packages that could handle what I am looking to do. I am sure, though, that I have not come across all packages that offer charting capabilities.

Thanks a lot for your recommendations.

Best Answer

Try Tableau. It's a nice commercial data visualization tool with rich charting and data filtering/aggregation capabilities, focused on big data. It's said that Tableau uses memory-mapped I/O. I'm not sure whether the free public version has all the features, but anyway...