I'm tracking data on a backup job that runs nightly on our server and using the historical data to predict data and job time growth. I have the following three data points for most of the records: Data backed up (in Bytes), Total Job Time (hh:mm:ss), and Transfer Speed (in Bytes/Minute).
Total Job Time does not equal (Data Backed Up)/(Transfer Speed) because there is necessary overhead for the job starting, transitioning, and completing. I have created a fourth data point, Work Time, which records the time spent actually transferring data and is calculated with the above formula, but it does not appear to relate directly or consistently to the Total Job Time. Server use, network latency, and other resource-bound factors all affect the relationship.
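To make the Work Time calculation concrete, here is a minimal Python sketch applied to three rows taken from the sample table further down. The function names `work_time_minutes` and `overhead_minutes` are just illustrative names, not part of any tool I use, and "overhead" here simply means whatever part of the Total Job Time the transfer itself does not account for.

```python
# Work Time = Data / Speed; overhead = Total Job Time - Work Time.
# The three records are copied from the sample table in the question.
records = [
    # (data_bytes, total_minutes, speed_bytes_per_min)
    (383542111073, 381.22, 1273000000),
    (391312996783, 383.15, 1282000000),
    (392386489223, 366.32, 1363000000),
]


def work_time_minutes(data_bytes, speed_bytes_per_min):
    """Time spent purely transferring data, in minutes."""
    return data_bytes / speed_bytes_per_min


def overhead_minutes(data_bytes, total_minutes, speed_bytes_per_min):
    """Part of the total job time not explained by the transfer itself."""
    return total_minutes - work_time_minutes(data_bytes, speed_bytes_per_min)


for data, total, speed in records:
    work = work_time_minutes(data, speed)
    extra = overhead_minutes(data, total, speed)
    print(f"total={total:.2f} min  work={work:.2f} min  overhead={extra:.2f} min")
```

Running this on more rows is a quick way to see how much the gap between Work Time and Total Job Time moves from night to night.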
For some of the older records I am missing the Transfer Speed data and I would like to know what formula I need to apply to the other two data points (Data and Job Time) to make a reasonable guess as to what the Transfer Speed might have been.
Below is a representative sample of the data; I've converted everything to bytes and minutes for ease of calculation:
Data(bytes) Time(min) Speed(bytes/min)
383542111073 381.22 1273000000
383676323632 382.72 1267000000
383875888842 378.55 1283000000
384088122257 382.15 1268000000
384247013724 378.40 1282000000
384457413287 378.68 1285000000
384652849842 381.42 1272000000
384973213219 380.15 1278000000
385188544442 380.13 1280000000
385504302010 377.80 1291000000
385628091021 377.97 1289000000
386061561686 384.77 1264000000
386853481337 383.98 1270000000
387117610212 381.90 1278000000
387679368117 385.80 1262000000
388015187994 386.50 1261000000
388240874769 385.20 1265000000
391312996783 383.15 1282000000
392497055973 384.73 1280000000
392877252269 387.13 1269000000
392988498970 386.52 1274000000
393236837467 385.33 1279000000
392386489223 366.32 1363000000
392626640464 370.68 1341000000
392772670262 366.68 1363000000
391049505322 366.60 1360000000
391308127859 365.62 1362000000
391683916463 365.53 1367000000
391868818660 367.87 1355000000
392029291293 366.82 1356000000
392028073259 370.40 1341000000
392143518314 366.07 1365000000
For any given combination of Data and Time, I'd like to be able to guess Speed.
UPDATE for comment regarding graphing:
I have graphed the data, but probably because I forgot most of my math in order to focus on technology as a career, I'm not exactly sure how the graph type will point me towards a particular function. The graph of the entire data set is below. Note how the MB/Min and Work Time data is missing from mid-July and before.
Part of the problem with my (meager, unpracticed) thoughts on which formula is best is that a month into the data collection I changed the time at which the backup occurred. Moving it to a period when fewer things were running on the server lowered the resulting time by what appears to be 12 minutes. You can see that in the data set above, where the last 10 time values are clustered around the high 360s and the points above are closer to 380.
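One rough way to put a number on that shift for the sample above is to compare the average Total Job Time before and after the change. This is only a sketch: `CHANGE_INDEX` is a hypothetical name, and its value is an assumption read off the sample (the row where the times first drop to 366.32), not a recorded date.

```python
from statistics import mean

# Total Job Time (minutes) from the sample table, in the order listed.
times = [
    381.22, 382.72, 378.55, 382.15, 378.40, 378.68, 381.42, 380.15,
    380.13, 377.80, 377.97, 384.77, 383.98, 381.90, 385.80, 386.50,
    385.20, 383.15, 384.73, 387.13, 386.52, 385.33,
    366.32, 370.68, 366.68, 366.60, 365.62, 365.53, 367.87, 366.82,
    370.40, 366.07,
]

# Assumption: the schedule change falls at the row with 366.32 minutes,
# which is index 22 (0-based) in this sample.
CHANGE_INDEX = 22

before = times[:CHANGE_INDEX]
after = times[CHANGE_INDEX:]

# Difference of the two group averages, in minutes.
shift = mean(before) - mean(after)
print(f"mean before: {mean(before):.2f} min")
print(f"mean after:  {mean(after):.2f} min")
print(f"shift:       {shift:.2f} min")
```

Because the amount of data also grows over the same period, this difference mixes the schedule change with ordinary growth, so it is best read as an approximate figure.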
Best Answer
After copying the data from your post, I made a 3D plot; it suggests that the data points lie on a hyperplane, which in turn suggests a linear fitting model. I am using Mathematica:
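For readers without Mathematica, a comparable fit can be sketched in Python using only the standard library. This is not the code used for the answer. It assumes the plane has the form speed = a + b*data + c*time, and `fit_plane` and `predict_speed` are hypothetical helper names introduced for illustration.

```python
from statistics import mean

# (data_bytes, total_minutes, speed_bytes_per_min) rows from the question.
ROWS = [
    (383542111073, 381.22, 1273000000), (383676323632, 382.72, 1267000000),
    (383875888842, 378.55, 1283000000), (384088122257, 382.15, 1268000000),
    (384247013724, 378.40, 1282000000), (384457413287, 378.68, 1285000000),
    (384652849842, 381.42, 1272000000), (384973213219, 380.15, 1278000000),
    (385188544442, 380.13, 1280000000), (385504302010, 377.80, 1291000000),
    (385628091021, 377.97, 1289000000), (386061561686, 384.77, 1264000000),
    (386853481337, 383.98, 1270000000), (387117610212, 381.90, 1278000000),
    (387679368117, 385.80, 1262000000), (388015187994, 386.50, 1261000000),
    (388240874769, 385.20, 1265000000), (391312996783, 383.15, 1282000000),
    (392497055973, 384.73, 1280000000), (392877252269, 387.13, 1269000000),
    (392988498970, 386.52, 1274000000), (393236837467, 385.33, 1279000000),
    (392386489223, 366.32, 1363000000), (392626640464, 370.68, 1341000000),
    (392772670262, 366.68, 1363000000), (391049505322, 366.60, 1360000000),
    (391308127859, 365.62, 1362000000), (391683916463, 365.53, 1367000000),
    (391868818660, 367.87, 1355000000), (392029291293, 366.82, 1356000000),
    (392028073259, 370.40, 1341000000), (392143518314, 366.07, 1365000000),
]


def fit_plane(rows):
    """Least-squares fit of speed = a + b*data + c*time.

    Centering the variables removes the intercept from the normal
    equations, leaving a 2x2 system that is solved by Cramer's rule.
    """
    d_bar = mean(d for d, _, _ in rows)
    t_bar = mean(t for _, t, _ in rows)
    s_bar = mean(s for _, _, s in rows)
    sdd = sum((d - d_bar) ** 2 for d, _, _ in rows)
    stt = sum((t - t_bar) ** 2 for _, t, _ in rows)
    sdt = sum((d - d_bar) * (t - t_bar) for d, t, _ in rows)
    sds = sum((d - d_bar) * (s - s_bar) for d, _, s in rows)
    sts = sum((t - t_bar) * (s - s_bar) for _, t, s in rows)
    det = sdd * stt - sdt ** 2
    b = (sds * stt - sts * sdt) / det
    c = (sts * sdd - sds * sdt) / det
    a = s_bar - b * d_bar - c * t_bar
    return a, b, c


def predict_speed(coeffs, data_bytes, total_minutes):
    """Estimated transfer speed (bytes/min) from the fitted plane."""
    a, b, c = coeffs
    return a + b * data_bytes + c * total_minutes


coeffs = fit_plane(ROWS)
# Largest relative error of the fitted plane on the rows it was fitted to.
worst = max(abs(predict_speed(coeffs, d, t) - s) / s for d, t, s in ROWS)
print("a, b, c =", coeffs)
print(f"worst relative error on the sample: {worst:.4%}")
```

For an older record with no Transfer Speed, `predict_speed(coeffs, data_bytes, total_minutes)` gives the estimate. Since the fit is purely empirical, it should only be trusted for Data and Time values close to the range covered by the sample.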
Added: Per the OP's additional question, the model does change a little with the 7th record removed: