Needs some Stats help - Standard Deviation
AsSiMiLaTeD
Posts: 11,725
This isn't a school exercise, I'm in the real world here, needing help with something for work. It can be argued that I shouldn't be working on what I'm doing given my limited understanding, but we'll just skip that part.
What I really want to know is if you can really do standard deviation off data that's basically 1s and 0s, or yes/no. I don't really see how it could be possible, because isn't the whole point of standard deviation to measure how far something is from center, or the mean? Well, if it's a 1 or 0 then I don't see how you can do that.
I know standard deviation is used a lot with counting data, or variable data, but I'm not seeing how it can be used with yes/no data.
If anyone can explain to me I'd appreciate it.
Comments
-
Would it not come down to how many times the 0s or 1s were supposed to be 0s and 1s but were not, or were?
Speakers: SDA-1C (most all the goodies)
Preamp: Joule Electra LA-150 MKII SE
Amp: Wright WPA 50-50 EAT KT88s
Analog: Marantz TT-15S1 MBS Glider SL| Wright WPP100C Amperex BB 6er5 and 7316 & WPM-100 SUT
Digital: Mac mini 2.3GHz dual-core i5 8g RAM 1.5 TB HDD Music Server Amarra (memory play) - USB - W4S DAC 2
Cables: Mits S3 IC and Spk cables| PS Audio PCs
-
Not really. Let's say I have the following dataset; we'll call this error rate, where 1 represents an error and 0 represents no error.
1
1
1
0
0
0
0
0
0
0
So if I average that column, I get .3, but if I use Excel to get the standard deviation, I get .48. I don't see how that number is either valid or meaningful. -
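For the ten-row example above, a quick check (a minimal sketch using only Python's standard `statistics` module) suggests where the .48 comes from: Excel's STDEV is the sample standard deviation, which divides by n - 1, while the population version divides by n and comes out slightly lower. The variable names are just for illustration.

```python
import statistics

# The ten rows from the post: 1 = error, 0 = no error.
errors = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

mean = statistics.mean(errors)              # the error rate
sample_sd = statistics.stdev(errors)        # divides by n - 1, like Excel's STDEV
population_sd = statistics.pstdev(errors)   # divides by n, like Excel's STDEVP

print(f"mean={mean:.2f} sample_sd={sample_sd:.2f} population_sd={population_sd:.2f}")
```

Both numbers are valid descriptions of the spread of the 0/1 values; whether they are useful depends on the question being asked of the data.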
Yep, even binomials have SDs: sqrt(np(1-p)) for the count of 1s in n rows. Basically you can use that to set up inference. With a small sample size you calculate exact probabilities; with a large one, you can just pretend that it's Gaussian.
Gallo Ref 3.1 : Bryston 4b SST : Musical fidelity CD Pre : VPI HW-19
Gallo Ref AV, Frankengallo Ref 3, LC60i : Bryston 9b SST : Meridian 565
Jordan JX92s : MF X-T100 : Xray v8
Backburner:Krell KAV-300i
-
...but what you describe is a sample, so you're probably interested in the SE. What is the exact question you want to answer with your data?
-
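Since SD and SE are easy to mix up for yes/no data, it may help to write the three related quantities side by side. This is standard binomial theory rather than anything specific to this thread; here p is the true proportion of 1s, n the number of rows, and \hat{p} the observed rate.

```latex
\begin{align*}
\text{SD of a single 0/1 row:} \quad & \sigma = \sqrt{p(1-p)} \\
\text{SD of the count of 1s in } n \text{ rows:} \quad & \sigma_{\text{count}} = \sqrt{n\,p(1-p)} \\
\text{SE of the observed rate } \hat{p} \text{ over } n \text{ rows:} \quad & \mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
\end{align*}
```

The first never shrinks as n grows; the third does, which is why it is the one normally used to judge whether an observed rate has moved.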
AsSiMiLaTeD wrote: »Not really. Let's say I have the following dataset; we'll call this error rate, where 1 represents an error and 0 represents no error.
1
1
1
0
0
0
0
0
0
0
So if I average that column, I get .3, but if I use Excel to get the standard deviation, I get .48. I don't see how that number is either valid or meaningful.
You need a larger sample data set; I think it has to be at least 25, depending on what level of Sigma you are trying to achieve. I could be totally off base: I have had the Green Belt training but no project yet.
-
Let's say I have 100,000 rows of data for the past 2 months: not sample data, but the entire population. I have a field, let's call it error, on each row of data. When I have an error, that field is stamped with a 1.
I want the error rate, which is the average of the field error, so AVG(error*1.000) in SQL. So if I have 30,000 rows where error = 1 then my average is .30. Now I want the standard deviation in that same dataset, but I don't think I can really do that with just 1s and 0s, right?
-
You can. With 30% error rate, you would have a standard deviation of 0.46. With 10% error rate it would be 0.30.
sqrt( (30000 * (1-0.3)^2 + 70000 * (0-0.3)^2) / 100000 ) = 0.46
sqrt( (10000 * (1-0.1)^2 + 90000 * (0-0.1)^2) / 100000 ) = 0.30 -
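The long-hand sums above collapse to sqrt(p(1-p)) for 0/1 data. The sketch below checks that for the two error rates mentioned; the function names are made up for this illustration and the 100,000-row total is the one from the thread.

```python
import math

def bernoulli_sd(p: float) -> float:
    """Population SD of 0/1 data in which a fraction p of the rows are 1."""
    return math.sqrt(p * (1 - p))

def long_form_sd(ones: int, total: int) -> float:
    """The same quantity, written out as the sum of squared deviations."""
    p = ones / total
    zeros = total - ones
    return math.sqrt((ones * (1 - p) ** 2 + zeros * (0 - p) ** 2) / total)

for ones in (30_000, 10_000):
    p = ones / 100_000
    print(f"p={p:.2f} shortcut={bernoulli_sd(p):.2f} long_form={long_form_sd(ones, 100_000):.2f}")
```

One consequence of the shortcut is that the SD of 0/1 data depends only on the rate itself and can never exceed 0.5, which it reaches at p = 0.5.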
I'm good with the calculation, I'm more curious if it's meaningful in measuring a simple yes/no. Would that .46 be 46%, or .46%?
-
Yep, Sami nailed it. I'll say it again, though. Statistics are only as good as the question that you're asking. The SD might not answer your question.
-
I think I know what I'm missing; I really haven't had much time to think this through until now. But let me explain more...
What I'm trying to do is build a control chart, sorta. So I have data for the last two months, in the 1s and 0s format I mentioned. For that time I need to calculate average and standard deviation.
Then I take the previous week's data and get the average for that as well. Then I compare that with the average over the last couple months to basically paint the cell that value is in - if it's within one SD of the 2 month average, it's green, within 2 SD it's yellow, more than 2 it's red.
I've got the technical stuff figured out, and have the standard deviation function working. The issue is that I'm seeing very high standard deviations that don't make sense. I'm wondering if I actually should just group down the data at the very lowest level, actually calculate averages at that level, and then build my SD off of that... -
What stats tool are you using? Do you have access to SAS, SPSS or STATA?
Main Gear
Panasonic 50" Plasma, Polk LSi15 (Front), LSiC, LSi7 (Rear), Sherwood Newcastle AVP-9080, AM-9080 bi-amp to LSi15, AM-9080 bi-amp to LSiC and LSi7.
-
I'm not using any stats package, just sql rs.
-
AsSiMiLaTeD wrote: »I'm good with the calculation, I'm more curious if it's meaningful in measuring a simple yes/no. Would that .46 be 46%, or .46%?
It's not a percentage; it's roughly the typical distance between the average and the actual values, in the same units as the data. Kind of a limit of what's within acceptable range (depending on how you use it). I'm not a stats expert, so I hope that's not way off. With 0s and 1s, yes, it doesn't intuitively make a lot of sense, since every 0 or 1 is always off from the mean no matter what the values are.
Do you have enough information to get the error rate (ER) % and get SD from monthly/weekly/daily/hourly ER? Just an idea. Your team that you work with/for most likely would have plenty of input for you. -
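The idea of grouping first and then taking the SD of the resulting error rates could look something like the sketch below. The daily counts are hypothetical numbers invented for illustration, not data from this thread, and only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical daily (errors, rows) counts; invented for illustration only.
daily_counts = [
    (310, 1000), (295, 1000), (330, 1100), (280, 950),
    (305, 1000), (290, 980), (320, 1050),
]

# One error rate per day, then the spread of those rates.
daily_rates = [errors / rows for errors, rows in daily_counts]
rate_mean = statistics.mean(daily_rates)
rate_sd = statistics.stdev(daily_rates)  # spread of the daily rates, not of the 0/1 rows

print(f"mean daily rate={rate_mean:.3f} sd of daily rates={rate_sd:.3f}")
```

This measures how much the rate moves from day to day, which is a much smaller number than the SD of the individual 0/1 rows.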
goddammit, my wife just closed my browser.... I had a decent explanation, but oh well. anyhow, you want to use:
Z = (phat - p) / sqrt[p(1-p)/n], where p is the 2 month value and n is the number of rows in that week's sample. We're gonna pretend that the 2 month value is a constant even though it isn't. phat is your weekly sample rate.
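Putting the Z formula together with the green/yellow/red scheme described earlier in the thread, a minimal sketch might look like this. The function names and the example week (3,850 errors in 12,500 rows) are hypothetical, and the 2 month rate is treated as a fixed constant, as the post suggests. It assumes the baseline rate is strictly between 0 and 1 and the week has at least one row.

```python
import math

def weekly_z(p_baseline: float, week_errors: int, week_rows: int) -> float:
    """Z = (phat - p) / sqrt(p(1-p)/n), treating the baseline rate p as a constant."""
    p_hat = week_errors / week_rows
    standard_error = math.sqrt(p_baseline * (1 - p_baseline) / week_rows)
    return (p_hat - p_baseline) / standard_error

def cell_colour(z: float) -> str:
    """Green within 1 standard error of the baseline, yellow within 2, red beyond."""
    distance = abs(z)
    if distance <= 1:
        return "green"
    if distance <= 2:
        return "yellow"
    return "red"

# Hypothetical week: 3,850 errors in 12,500 rows against a 0.30 baseline.
z = weekly_z(0.30, 3_850, 12_500)
print(round(z, 2), cell_colour(z))
```

The bands here are measured in standard errors of the weekly rate rather than in the row-level SD of about 0.46, which would be far too wide to ever flag a week.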