Okay, so this is a straight math question and I read up on meta that those need to be written to sound like programming questions. I'll do my best...

So I have a graph made in flot that shows the network usage (in bytes/sec) for the user. The data is 4 minutes apart when there is activity, and otherwise set at the start of the usage range (let's say day 1) and the end of the range (day 7). The data is coming from a CGI script I have no control over, so I'm fairly limited in what I can provide the user.

I never took trig or calculus, so I'm pretty much in over my head. What I want is for the user to have the option to click any point on the graph and see their bandwidth usage for that moment. Since the lines between real data points are drawn straight, this can be done by getting the points before and after where the user has clicked and finding the y-interval.

It took me weeks to finally get a helpful math person to explain this to me. Everyone else has insisted on trying to teach me Riemann sum techniques and all sorts of other heavy stuff that is not only confusing to me but also doesn't seem necessary for the problem.

But I also want the user to be able to highlight the graph between two arbitrary points on the x-axis (time) to get the total amount of network usage during that range. I know this would be inaccurate, but I need it to be the right inaccurate using a solid equation.

I thought this was the area under the line, but experiments with much simpler graphs make this seem just far too high. I figured out I could take the difference y2 - y1, multiply it by x2 - x1, and then divide by two to get the area of the graph below the line like a triangle, but again, the numbers seemed too high. (Maybe they are just big numbers and I don't get this math stuff at all.)

So what I need, if anyone would be really awesome enough to provide it before this question is closed down for being too pure-math, is either the name of the concept I should be researching or the equation itself. Or the bad news that I do need advanced math to get an accurate result.

I am not bad at math, just as a last note, I just am not familiar with math beyond 10th grade and so I need some place to start. All the math sites seem to keep it too simple or way over my paygrade.

+2  A: 

What I want is for the user to have the option to click any point on the graph and see their bandwidth usage for that moment. Since the lines between real data points are drawn straight, this can be done by getting the points before and after where the user has clicked and finding the y-interval.

Yes, that's a good way to find that instantaneous value. When you report that value back, it's in the same units as the y-axis, so that means bytes/sec, right?

I don't know how rapidly the rate changes between points, but it's even simpler if you simply pick the closest point and report its value. You simplify your problem without sacrificing too much accuracy.
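Both options can be sketched in a few lines. This is only an illustration, assuming the data arrives as a time-sorted list of `(time, value)` pairs; the function names and the sample points are made up, not something flot or the CGI script provides.

```python
from bisect import bisect_right

def interpolate(points, t):
    """Linearly interpolate the value at time t.

    points: list of (time, value) pairs sorted by time (assumed format).
    Times outside the data range are clamped to the first/last value.
    """
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t)  # index of the first point strictly after t
    (t0, y0), (t1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

def nearest(points, t):
    """Return the value of the data point closest in time to t."""
    return min(points, key=lambda p: abs(p[0] - t))[1]

# Made-up sample data: 0 bytes/sec at t=0, 1500 bytes/sec at t=600 seconds.
points = [(0, 0), (600, 1500)]
print(interpolate(points, 300))  # 750.0
print(nearest(points, 100))      # 0
```

With points only 4 minutes apart the two results will usually be close, which is why the nearest-point shortcut is often good enough.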

I thought this was the area under the line, but experiments with much simpler graphs make this seem just far too high. I figured out I could take the difference y2 - y1, multiply it by x2 - x1, and then divide by two to get the area of the graph below the line like a triangle, but again, the numbers seemed too high. (Maybe they are just big numbers and I don't get this math stuff at all.)

To calculate the total bytes over a given time interval, you should find the index closest to the starting and ending point and multiply the value of y by the spacing of your x-points and add them all together. That will give you the total # of bytes consumed during that time interval, but there's one more wrinkle you might have forgotten.

You said that the points come in "4 minutes apart", and your y-axis is in bytes/second. Remember that units matter. Your area is the sum of bytes/second times a spacing in minutes. To make the units come out right you have to multiply by 60 seconds/minute to get the final value of bytes that you want.
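As a rough sketch of that sum, assuming the samples really are rates in bytes/second and really are evenly spaced 4 minutes apart (the function name and sample values here are hypothetical):

```python
SECONDS_PER_MINUTE = 60

def total_bytes(rates, spacing_minutes=4):
    """Approximate total bytes from evenly spaced rate samples.

    rates: values in bytes/second (assumed), one per sample.
    Each sample is treated as holding for one full spacing interval.
    """
    return sum(rate * spacing_minutes * SECONDS_PER_MINUTE for rate in rates)

# Three made-up samples of 100, 200 and 150 bytes/second:
print(total_bytes([100, 200, 150]))  # 108000 bytes
```

Leaving out the factor of 60 would make the result 60 times too small, so a mismatch of about that size points to a units problem rather than a wrong formula.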

If that "too high" value is still off, consider units again. It's 1024 bytes per kbyte, and 1024*1024 bytes per MB. Check the units of the values you're checking the calculation against.

UPDATE:

No wonder you're having problems. Your original question CLEARLY stated bytes/sec. Even this question is imprecise and confusing. How did you arrive at "amount of data" at a given time stamp? Are those the total bits transferred since the last time stamp? If yes, simply add the values between the start and end of the interval you want and convert to the units convenient for you.

duffymo
A: 

This would be a lot easier for you if you would accept that there is well-established terminology for the concepts that you are having trouble expressing concisely or accurately, and that these mathematical terms have been around far longer than you. Since you've clearly gone through most of the trouble of understanding the concepts, you might as well break down and start calling them by their proper names.

That said:

There are two obvious ways to graph bandwidth, and two ways you might be getting the bandwidth data from the server. First, there's the cumulative usage function, which for any time is simply the total amount of data transferred since the start of the measurement. If you plot this function, you get a graph that never decreases (since you can't un-download something). The units of the values of this function will be bytes or kB or something like that.

What users are typically interested in is the instantaneous usage function, which is an indicator of how much bandwidth you are using right now. In mathematical terms, this is the derivative of the cumulative function. This derivative can take on any value from 0 (you aren't downloading) to the rated speed of your network link (indicating that you're pushing as much data as possible through your connection). The units of this function are bytes per second, or something related like Mbps (megabits per second).

You can approximate the instantaneous bandwidth with the average data usage over the past few seconds. This is computed as

 (number of bytes transferred) 
-----------------------------------------------------------------
 (number of seconds that elapsed while transferring those bytes)

Generally speaking, the smaller the time interval, the more accurate the approximation. For simplicity's sake, you usually want to compute this as "number of bytes transferred since last report" divided by "number of seconds since last report".

As an example, if the server is giving you a report every 4 minutes of "total number of bytes transferred today", then it is giving you the cumulative function and you need to approximate the derivative. The instantaneous bandwidth usage rate you can report to users is:

(total transferred as of now) - (total as of 4 minutes ago) bytes
-----------------------------------------------------------
  4*60 seconds
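In code, that difference quotient might look like the following. It is a sketch only, assuming the server reports cumulative byte totals at a fixed 4-minute interval; the function name and the numbers are made up.

```python
def rates_from_cumulative(totals, interval_seconds=4 * 60):
    """Convert cumulative byte totals into average bytes/second per interval.

    totals: cumulative byte counts, one per report, in time order (assumed).
    Returns one rate per pair of consecutive reports.
    """
    return [(later - earlier) / interval_seconds
            for earlier, later in zip(totals, totals[1:])]

# Made-up totals: 0, then 24000, then 72000 bytes.
print(rates_from_cumulative([0, 24000, 72000]))  # [100.0, 200.0]
```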

If the server is giving you reports of the form "number of bytes transferred since last report", then you can directly report this to users and plot that data relative to time. On the other hand, if the user (or you) is concerned about a quota on total bytes transferred per day, then you will need to transform the (approximately) instantaneous data you have into the cumulative data. This process, known as computing the integral, is the opposite of computing the derivative, and is in some ways conceptually simpler. If you've kept track of each of the reports from the server and the timestamp, then for each time, the value you plot is the total of all the reports that came in before that time. If you're doing this in realtime, then every time you get a new report, the graph jumps up by the amount in that report.
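Going the other way, from per-report amounts to a running total, is just a cumulative sum. A minimal sketch, assuming each report is the number of bytes transferred since the previous report (the function name and values are hypothetical):

```python
from itertools import accumulate

def running_total(reports):
    """Turn per-interval byte counts into a cumulative series.

    reports: bytes transferred since the previous report, in time order.
    """
    return list(accumulate(reports))

# Made-up reports of 500, 400 and 300 bytes:
print(running_total([500, 400, 300]))  # [500, 900, 1200]
```

Plotting the output against the report timestamps gives the never-decreasing cumulative graph described earlier.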

-1: I don't see why you slam the OP in the intro -- he's being humble enough and is clear that he just doesn't know the technical terms for what he's trying to do, which is 90% of the problem when you don't know the field, and specifically asks for "the name of the concept".
tom10
It's pretty clear that the OP has asked other mathematicians about this and then given up when they start using terms that he doesn't know. That's entirely his problem, and he'll have to get over it before he can communicate with other people about this subject.
I also attempted to edit the first paragraph to emphasize that the terminology exists not because the concepts are difficult, but because they aren't part of everyday discourse because we are used to dealing with calculus on an intuitive level. SO crashed on my edit. I'm sure that the OP can handle the subject as long as he doesn't reject it for superficial reasons such as seeming foreign.
@unknown: First off, I am really eager to get your take on the data I have in relation to your cumulative vs instantaneous usage explanation. So for anything defensive I may throw out before I get to that, I hope you can keep the same strong resolve you expected me to maintain when you gave me that massive burn up front....
Anthony
@uk: In regards to my need to use the proper concepts, I agree. It was this need that drove me to post a math question on SO when my better senses warned me not to. I have been trying to just find the names of the terms and concepts required for two (hopefully) simple equations for nearly 4 months now. I assure you that, in spite of what you have suggested twice now, I did not simply give up or bail out when the math got too hard. Rather I nagged and annoyed my mathematician colleagues, demanding they start over, start simpler, or just give me the terms that their terms were built on...
Anthony
And they tried their best for lack of time and motivation, but it wasn't until this morning that I finally discovered the concept/equation of a y-intercept. It was the rush of feeling very close to finishing a project I thought I had abandoned six weeks ago--resigned to face that some things I just can't figure out--that pushed me finally to ask this question that I resisted posting on this site since day one. I knew SO *could* solve everything, but also knew it wasn't the appropriate place for the question...
Anthony
To finish this already too long tear, I want you to know that I do math in the shower and in the corners of books. Not calculus, but simple polynomials that I think up and word problems that present themselves in my daily life. I can bore an entire party to sleep with long-winded excitement about Fibonacci numbers, the occurrence of the golden ratio in art and nature, and the idea that natural numbers carry the unique role in the inventions of humans of being both a reflection of nature and arbitrary symbols. I find math mindblowing and familiar. But I can't comprehend log(x) to save my life.
Anthony
Okay, having just gushed all over all you polite people, I'm going to get back to business. unknown, if I understand you correctly, you are saying that if the data set I get from the server is cumulative, it will always get steadily larger, correct? Because I do think there is some kind of averaging going on, but the numbers definitely fluctuate. I'll give you more details when I have some reassurance that someone is still reading.
Anthony
+2  A: 
ldigas
+1: nice and clear guidance. (My interpretation of the question though is that the Trapezoidal rule might be overkill, and he might just want the average, but it's hard to say for sure.)
tom10
@tom10, maybe you can help the last pin slide into place in my mind... When I said that the numbers seemed too big, I was using two made up data points: (0,0), (600,1500). Finding a value of y along the line is totally clear now, but finding the area just seems off. Based on those two points, the user went from no bandwidth to 1500b/sec 10 minutes later. The area of that line would be `(600 * 1500)/` which means total bw usage is 450000b/sec. That is just wrong, unless those zeros are just confusing me. So I don't know if I need an average, an area, or a different equation for finding area...
Anthony
`(600 * 1500) / 2 `
Anthony
@Anthony - It depends on what you're looking for, which I didn't quite understand from your problem statement. I thought you wanted the average value of y, which would be the area (600*1500)/2, in your example, divided by the length along the x-axis used to get this area; so here it would be ( (600*1500)/2 )/600 = 1500/2 = halfway between your starting values (i.e. the average). You could, of course, have gotten this more directly, but area/x, is the more general formula for the average value of y. Of course, if you're not looking for the average y, this is totally irrelevant.
tom10
@tom10 - It might help to know that in this situation the user is also able to see the total bw used for the same time period (also something I don't have any access or control over), so highlight the whole graph and using whatever method/equation ends up being best should show a number close enough to the real total that they can see just above the graph that it doesn't make the whole graph/system seem unreliable/wrong/etc. So I guess the equation should calculate total use between any two points (however approximated it may need to be).
Anthony
@Anthony - as Idigas says in the last sentence, the units are helpful (though I disagree with the details, I think bandwidth should be in bytes/second). When you say "total bw used", what are the units: bytes or bytes/sec? If it's bytes (and your y-axis is in bytes/sec and x in sec) then it's the area that would match; if it's bytes/second then it's the average. Of course, it's easy to verify: do the calculation and then see which matches the total you want to match.
tom10
Doing the calculation is a whole different issue that I was actually hoping 'unknown' could help on. Anyways, so all I have access to is a cgi file which returns (more or less) a list of time stamps and the bits up and bits down at those timestamps. I'm not sure if I assumed that the bits were "per second" or I had some other evidence. But if it's bits at that second, I'd call that bits per second, right? The weekly total shown elsewhere on the page is total bw used, not per second. The users get a bw allocation, it's showing them what they have left of it. So, average or area?
Anthony
@idigas - Wow. Good grief indeed. Thanks for getting precise, homie. flot is a jquery plugin which takes a simple array and plots it in a canvas element. It has a fairly extensive api for handling data types and user interaction, but actual data calculations it doesn't do native. google it, it's pretty sweet. I'm going to have to take a minute or 10 to try to digest your update. I think the problem has to do with units/intervals. I keep thinking it's all relative to itself so it should just be dividing and multiplying.
Anthony
@idigas - and the y axis isn't speed, it's throughput. The graph is showing how many bits they've uploaded and downloaded over time, as they can run out.
Anthony
@tom10 - Wow, I think the final pin just fell into place, maybe. But there is another issue (actually less math and more data interpretation) but I don't want to get murdered tonight. Any chance I can go over it with you offline (as in off SO) or is that totally weird to ask?
Anthony
"throughput" ? (have no idea how to translate that to my language :) So, it is showing how much traffic they wasted on a time basis. Is that it ? Bytes on y, and sec on x axis ?
ldigas
Is it accumulating or it just shows the number for the interval of 4 minutes in question ?
ldigas
It fluctuates, so I don't think it's accumulating. But I think they use some funky in-house trick to mod certain points so it works with their in house graphing applet that I'm trying to replace. Certain points I know are wrong but always wrong in the same way, etc. So, yeah, I'm starting to realize it's not bits/second, but simply bits on y, timeline on x. And I keep saying bits because the data returned is in bits, so you know.
Anthony
Ok. Then that's a different story. If they are bits wasted in a delta_time period (4min), then to calculate total wasted bits in some arbitrary time period one just needs to add those values. To calculate average speed, divide that sum by time period in question. So, in general, it would come out instead of a regular, to a curve integral.
ldigas
I have no idea what any of that meant, sorry. So do I still do the area? I'm not looking for average speed, I'm looking for the bits used between each 4 min interval. So if I used 500 at point 3 and 400 at point 4, I want to know how much was used total during that delta_time period. I think, anyway.
Anthony
The short answer is 400 bits were used in the time #3 to #4, and 500 were used in the previous interval. A time passes and they tell you how many bits were used in that time; you get your data in discrete amounts, and it's easiest to just keep it this way. The total bits used for the two intervals (2 to 3) and (3 to 4) then is 500+400=900, and you just add them up this way. It's easier to just think of this as a sum and not an area (though an area isn't exactly wrong). Aside from this, there are two things that are confusing everyone, I think, so I'll address these in the next comment block.
tom10
confusion 1: with time on the x-axis, you can easily convert all these things to rates, e.g. 400bits/4min=100bits/min, or (500+400 bits)/(4+4 minutes)=1.875bits/sec, but OP doesn't seem to need this, so just ignore it. confusion 2: The best you can do here is just the sum of the discrete data points, e.g. between pt 3 and pt 509, just add up all 506 data pts. In some sense this is the area, but since this isn't a continuous curve, that's just a confusion, and Trapezoidal or Riemann are not meaningful.
tom10
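A small sketch of the "just add them up" approach from the two comments above. It rests on an assumption that the thread has not yet confirmed: each data point holds the bits used between the previous timestamp and its own. The function names and sample values are made up.

```python
def bits_in_range(samples, start, end):
    """Sum the bits of every sample whose timestamp falls in (start, end].

    samples: (timestamp_seconds, bits_since_previous_sample) pairs (assumed).
    """
    return sum(bits for ts, bits in samples if start < ts <= end)

def average_rate(samples, start, end):
    """Average bits/second over the selected range."""
    return bits_in_range(samples, start, end) / (end - start)

# Made-up samples 4 minutes (240 s) apart; the last two carry 500 and 400 bits.
samples = [(240, 300), (480, 500), (720, 400)]
print(bits_in_range(samples, 240, 720))  # 900
print(average_rate(samples, 240, 720))   # 1.875
```

The sample at the start of the selection is excluded on purpose, since under this assumption its bits belong to the interval before the selection.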
I still think the main confusion is that we don't know for certain what the input data is, and what we're trying to get as a result. Give me that, and the rest is easy, lemon easy.
ldigas
So really, just add the bits in each data point for the interval in question. You can't do better than that. Does this seem like it's the right answer?
tom10
@Idigas - Sure, you're absolutely right. But, imho, this is the way of things 90% of the time... knowing the question is much harder than knowing the answer. (Btw, if you mind my posting here, I can just make my own answer -- somehow I got started with this and it sort-of got out of hand.)
tom10
@tom10 - I cannot agree. Yes, it's true that sometimes knowing the question solves half the problem, but this is a trivial technical problem with a singular solution. If we don't know what we've got, we cannot solve it, only guess. And that is a waste of time (except for educational purposes) because at some point in time, for the problem to be solved, we will need to find what that input data is. ... No, of course, I don't mind. Perish the thought. Sometimes it's even easier to follow the discussion this way.
ldigas
@Idigas - Hasn't the OP (eventually) told us that each data point is the total bits used in a 4 min interval? What else do we need to know? (I think this has been confusing because some mathematical people suggested approaches that were inappropriate to the data, e.g. resulting in the irrelevant title of this question amongst other confusions, so we've needed to unwind this confusion and get the OP to do what he probably would've done in the first place.)
tom10
Okay, there is some definite confusion. First off, you both rock. Drinks are on me. Second, I *NEVER* said that the data points were cumulative in any way. As far as I can tell, the bits tied to each time represents the data being used up at THAT specific point. One hint at this is that if I add up all the values like you both have suggested, I'm way off from what the right value should be. So, again, imagine it like this: Point 5 : 5k bits, Point 6 : 10k bits. So what I need is some number higher than the sum, to cover the 4 minutes in between plus the 15k that are accounted for.
Anthony
@Anthony - Can you post a picture of the graph in question, with axes labelled and the totals, etc?
tom10
@Anthony - To be clear, I'm suggesting that each point on the graph (say, #7) represents the accumulation of bits used between it (#7) and the previous point (#6). If the value is "bits" it has to be cumulative in at least this sense. A single point on a graph represents an instant in time, infinitely short, so no bits really have the time to move. It can represent the bits that have moved between it and a previous instant (my suggestion), or the rate of flow of bits (which would be bits/sec, etc), or bits in some other time interval.
tom10
That is, "data being used up at THAT specific point [measured in bits]" doesn't make sense. "[rate of] data being used up at THAT specific point [measured in, e.g., bits/sec]" would make sense, but that's not what you're saying. For bits there needs to be some unit of time implied or specifically stated. That time is probably either a second or minute (in which case you'd call it bits/second or bits/minute), or the interval between the data points (in which case, refer to my answer "The short answer is 400...").
tom10
@tom10 - Of course, you're right. But we had to get to that first. That's why I like to clear the question before drawing any conclusions about the answer. But never mind that now, if Anthony can give us a graph in question that would pretty much solve this thing for good. As far as drinks go, same goes, if you're ever in my part of the world.
ldigas
I could provide a graph, but it wouldn't really be correct if I'm mis-reading the data. Do you mean you want the data that the graph reflects? At this point I don't think the graphing is the important part, so much as the data interpretation and (unfortunately) me getting an understanding of what it means on face value.
Anthony
Would it help if I mentioned that one of the things that makes this data a real pain is that it doesn't show "zero points" at the end of a usage period? What I mean is, if a user spends an hour on the network, signs off, comes in the next day, uses the network for 20 minutes, the data reflects the usage for both periods, but no idea that it ever dropped down to zero. So if I don't account for that and add the zeros, I get a dramatic line drawn from their last usage point of day1 over to where they pick up on day2.
Anthony
You said "The data is 4 minutes apart when there is activity". If this is a reliable feature, then if there are no new data points for 5 minutes, your program could add a fake zero data point at that time.
Artelius
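One way to sketch that gap-filling idea, assuming timestamps in seconds and the 4-minute spacing mentioned in the question. The function and parameter names are made up, and the second zero placed just before activity resumes is an extra assumption beyond the comment above, added so the line does not ramp up across the idle period.

```python
def fill_gaps(points, step=240, max_gap=300):
    """Insert zero-valued points where consecutive samples are far apart.

    points: (timestamp_seconds, value) pairs sorted by time (assumed).
    step: normal spacing between samples (4 minutes).
    max_gap: any gap longer than this is treated as an idle period.
    """
    filled = []
    for i, (ts, value) in enumerate(points):
        if i > 0:
            prev_ts = points[i - 1][0]
            if ts - prev_ts > max_gap:
                drop = prev_ts + step  # usage falls to zero after last activity
                rise = ts - step       # and stays at zero until just before it resumes
                filled.append((drop, 0))
                if rise > drop:
                    filled.append((rise, 0))
        filled.append((ts, value))
    return filled

# Made-up data: activity at 0 s and 240 s, then nothing until 3600 s.
print(fill_gaps([(0, 100), (240, 120), (3600, 80)]))
# [(0, 100), (240, 120), (480, 0), (3360, 0), (3600, 80)]
```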
@Anthony - We need to know the basic meaning of a data point before worrying about the exceptions. In multiple paragraphs now I've suggested that the bits are calculated as the number of bits that were used between the time of the plotted pt and the time of the previous pt. There's been a lot of discussion, but it still seems to me like it's the case. Really think about whether this is consistent with your data, why I think it's true, and keep in mind why "THAT specific point" is probably the wrong way to think of it, etc, and answer this question directly.
tom10
@Anthony - of course it wouldn't. What I meant by the "graph" comment was whether you could post either a graph, or some kind of data, or anything ... from which we could draw some conclusions. Of course, you would still have to say what the numbers you're posting are, since we don't have the background script which provides the data. I'm sure, if we finally knew what it is we're working with, somebody would solve this problem, at least in pseudocode, which you can then translate to ... your particular flavour.
ldigas
A: 

I am not bad at math, ... I just am not familiar with math beyond 10th grade

This is like saying "I'm not bad at programming, I have no trouble with ifs and loops but I never got around to writing more than one function."

I would suggest you enrol in a maths class of some kind. An understanding of matrices and the basics of calculus gives you an appreciation of many things, and can be useful in all sorts of areas. You'll be able to understand more of Wikipedia articles and SO answers - and questions!

If you can't afford that, try to find some lecture videos or something.

Everyone else has insisted on trying to teach me Riemann sum techniques

I can't see why. You don't need them for this - though if you had learned them, I expect you would find it easier to come up with a solution. You see, Riemann sums attempt to give you a "familiar" notion of area. The sort of area you (hopefully) learned years ago.

Getting the area below your usage graph between two points will tell you (approximately) how much was used over that period.

How do you find the area of a floor plan? You break it up into rectangles and triangles, find the area of each, and add them together. You can do the same thing with your graph, basically. Someone has worked out a simple way of doing this called the trapezoidal rule. It's just a matter of choosing how to divide your graph into strips, and in your case this is easy: just use the data points themselves as dividers. (You'll also need to work out the value of the graph at the left and right ends of the region selected by the user, using linear interpolation.)
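Here is a minimal sketch of that recipe: interpolate the two ends of the selection, use the data points in between as strip dividers, and add up the trapezoids. It assumes `(time, value)` pairs with strictly increasing times and a selection that lies inside the data range; the function names and sample points are illustrative only.

```python
def interpolate(points, t):
    """Linear interpolation on sorted (time, value) points; t must be in range."""
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the data range")

def area_between(points, start, end):
    """Trapezoidal-rule area under the piecewise-linear graph from start to end."""
    inner = [(t, y) for t, y in points if start < t < end]
    edges = ([(start, interpolate(points, start))]
             + inner
             + [(end, interpolate(points, end))])
    # Each strip is a trapezoid: width times the average of its two heights.
    return sum((t1 - t0) * (y0 + y1) / 2
               for (t0, y0), (t1, y1) in zip(edges, edges[1:]))

# Made-up points: the rate rises from 0 to 1500 over 600 seconds.
points = [(0, 0), (600, 1500)]
print(area_between(points, 0, 600))  # 450000.0
print(area_between(points, 0, 300))  # 112500.0
```

If the y-axis is bytes/second and the x-axis is seconds, the result is in plain bytes, not bytes/second.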

If there's anything I've said that isn't clear to you (as there may well be), please leave a comment.

Artelius
I'm going to leave "unknown" a much more in depth comment, but I wanted to respond with two things: 1) Thank you for the advice on trapezoidal rules and for easing off of the condescension after you got it out of your system. It helps to be empathetic with someone struggling with an embarrassing drawback, which you were after the initial jab, so I'll take it as cheeky instead of rude. Secondly...
Anthony
I have to disagree with you on your initial analogy. The extent of my programming prior to a year ago was a class in turbo pascal in HS. I don't like to pat myself on the back, but it startles me that I've picked up as much as I have in 12 months mostly through intuition and documentation sites. I still tremble at objects and classes and go blank at the mention of bit code, but I expect I'll get it when I really need to. I didn't say "I'm great at math." I said I'm not bad. I figured out geometric equations in jr. HS before they were covered and matrices rock....
Anthony
I just hit a point where it wasn't intuitive anymore (sine and cosine, damn thee), and where I had other passions to distract me. I don't mean to make this about my life story. What I'm getting at is that there is a difference between someone who is "bad" at math (or writing, or with money) and honestly can't "get it" for whatever reason, and someone who simply lacks the right starting point (or the time to start from scratch) to learn an unfamiliar and fairly advanced concept.
Anthony
I think you misunderstood the intent in my first paragraph. Mathematics is an enormous subject. I personally consider the entirety of programming to be one branch of mathematics. So it's big. High school doesn't really give you enough exposure to mathematics for you to tell whether you're "good" or "bad" (terribly one-dimensional words that they be). And note that in my analogy I said "never got around to", rather than "couldn't comprehend".
Artelius
A: 

The network usage total is not in bytes (kilo-, mega-, whatever) per second. It would be in just straight bytes (or kilo-, or whatever).

For example, 2 megabytes per second over an interval of 10 seconds would be 20 megabytes total. It would not be 20 megabytes per second.

Or do you perhaps want average bytes per second over an interval?

Robert L
Somewhere in this mess, the idea arose that kilo, mega, etc. was part of my question, but it wasn't. In fact, after much discussion, the per second is not even right. I have a list of time stamps, each with an amount of data (in bits) sent and received by a user. The users need to know how much data they've used up in their data plan (bandwidth capping): not how fast they are going at any time, but how much they used. So I need the total usage based on the points given.
Anthony