Need some stats help - Standard Deviation

AsSiMiLaTeD
AsSiMiLaTeD Posts: 11,725
edited January 2009 in The Clubhouse
This isn't a school exercise; I'm in the real world here, needing help with something for work. It can be argued that I shouldn't be working on this given my limited understanding, but we'll just skip that part.

What I really want to know is whether you can really do standard deviation on data that's basically 1s and 0s, or yes/no. I don't see how it could be possible, because isn't the whole point of standard deviation to measure how far something is from the center, or the mean? Well, if every value is a 1 or a 0, I don't see how you can do that.

I know standard deviation is used a lot with count data, or variable data, but I'm not seeing how it can be used with yes/no data.

If anyone can explain to me I'd appreciate it.

Comments

  • thsmith
    thsmith Posts: 6,082
    edited January 2009
    Would it not come down to how many times the 0s and 1s were supposed to be 0s and 1s but were not (or were)?
    Speakers: SDA-1C (most all the goodies)
    Preamp: Joule Electra LA-150 MKII SE
    Amp: Wright WPA 50-50 EAT KT88s
    Analog: Marantz TT-15S1 MBS Glider SL| Wright WPP100C Amperex BB 6er5 and 7316 & WPM-100 SUT
    Digital: Mac mini 2.3GHz dual-core i5 8g RAM 1.5 TB HDD Music Server Amarra (memory play) - USB - W4S DAC 2
    Cables: Mits S3 IC and Spk cables| PS Audio PCs
  • AsSiMiLaTeD
    AsSiMiLaTeD Posts: 11,725
    edited January 2009
    Not really. Let's say I have the following dataset; we'll call this the error rate, where 1 represents an error and 0 represents no error.

    1
    1
    1
    0
    0
    0
    0
    0
    0
    0

    So if I average that column, I get .3, but if I use Excel to get the standard deviation, I get .48. I don't see how that number is either valid or meaningful.
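A quick check of those two numbers, sketched as T-SQL against an inline copy of that column (assuming SQL Server 2008 or later; the column name error is made up). Excel's STDEV uses the sample formula (dividing by n-1), which is likely where the .48 comes from; the population version is sqrt(.3 * .7) ≈ .458:

    -- The 10-row example: 3 errors out of 10 rows, so p = 0.3.
    SELECT
        AVG(CAST(error AS float))    AS error_rate,  -- 0.30, the mean of the column
        STDEVP(CAST(error AS float)) AS sd_pop,      -- ~0.458 = sqrt(0.3 * 0.7), population SD
        STDEV(CAST(error AS float))  AS sd_sample    -- ~0.483, sample SD (n-1), matches Excel's STDEV
    FROM (VALUES (1),(1),(1),(0),(0),(0),(0),(0),(0),(0)) AS t(error);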
  • unc2701
    unc2701 Posts: 3,587
    edited January 2009
    Yep, even binomials have SDs: sqrt(np(1-p)). Basically you can use that to set up inference. With a small sample size you calculate exact probabilities; with a large one you can just pretend it's Gaussian. (Worked numbers just below this post.)
    Gallo Ref 3.1 : Bryston 4b SST : Musical fidelity CD Pre : VPI HW-19
    Gallo Ref AV, Frankengallo Ref 3, LC60i : Bryston 9b SST : Meridian 565
    Jordan JX92s : MF X-T100 : Xray v8
    Backburner:Krell KAV-300i
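To make that concrete with the 10-row example above (n = 10, p = 0.3): sqrt(np(1-p)) is the SD of the count of errors, and dividing by n gives the SD of the error rate itself.

    sqrt(np(1-p)) = sqrt(10 * 0.3 * 0.7) ≈ 1.45 (errors)
    sqrt(p(1-p)/n) = sqrt(0.3 * 0.7 / 10) ≈ 0.145 (as a rate)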
  • unc2701
    unc2701 Posts: 3,587
    edited January 2009
    ...but what you describe is a sample, so you're probably interested in the SE. What is the exact question you want to answer with your data?
    Gallo Ref 3.1 : Bryston 4b SST : Musical fidelity CD Pre : VPI HW-19
    Gallo Ref AV, Frankengallo Ref 3, LC60i : Bryston 9b SST : Meridian 565
    Jordan JX92s : MF X-T100 : Xray v8
    Backburner:Krell KAV-300i
  • thsmith
    thsmith Posts: 6,082
    edited January 2009
    Not really. Let's say I have the following dataset; we'll call this the error rate, where 1 represents an error and 0 represents no error.

    1
    1
    1
    0
    0
    0
    0
    0
    0
    0

    So if I average that column, I get .3, but if I use Excel to get the standard deviation, I get .48. I don't see how that number is either valid or meaningful.


    You need a larger sample data set; I think it has to be at least 25, depending on what level of sigma you are trying to achieve. I could be totally off base, though. I've had the Green Belt training but no project yet.
    Speakers: SDA-1C (most all the goodies)
    Preamp: Joule Electra LA-150 MKII SE
    Amp: Wright WPA 50-50 EAT KT88s
    Analog: Marantz TT-15S1 MBS Glider SL| Wright WPP100C Amperex BB 6er5 and 7316 & WPM-100 SUT
    Digital: Mac mini 2.3GHz dual-core i5 8g RAM 1.5 TB HDD Music Server Amarra (memory play) - USB - W4S DAC 2
    Cables: Mits S3 IC and Spk cables| PS Audio PCs
  • AsSiMiLaTeD
    AsSiMiLaTeD Posts: 11,725
    edited January 2009
    Let's say I have 100,000 rows of data for the past 2 months - not sample data, but the entire population. I have a field, let's call it error, on each row of data. When I have an error, that field is stamped with a 1.

    I want the error rate, which is the average of the error field - so AVG(error*1.000) in SQL. So if I have 30,000 rows where error = 1, then my average is .30. Now I want the standard deviation on that same dataset, but I don't think I can really do that with just 1s and 0s, right?
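A sketch of that in T-SQL (assuming SQL Server, since SQL comes up here and SQL RS later in the thread; error_log and error are made-up names). Because this is the whole population, STDEVP is the natural choice:

    SELECT
        AVG(CAST(error AS float))    AS error_rate,  -- e.g. 0.30 if 30,000 of 100,000 rows have error = 1
        STDEVP(CAST(error AS float)) AS sd           -- population SD = sqrt(p * (1 - p)), ~0.458 at p = 0.30
    FROM error_log;                                  -- made-up table name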
  • Sami
    Sami Posts: 4,634
    edited January 2009
    You can. With a 30% error rate, you would have a standard deviation of 0.46. With a 10% error rate it would be 0.30.

    sqrt( (30000 * (1-0.3)^2 + 70000 * (0-0.3)^2) / 100000 ) = 0.46

    sqrt( (10000 * (1-0.1)^2 + 90000 * (0-0.1)^2) / 100000 ) = 0.30
  • AsSiMiLaTeD
    AsSiMiLaTeD Posts: 11,725
    edited January 2009
    I'm good with the calculation; I'm more curious whether it's meaningful in measuring a simple yes/no. Would that .46 be 46%, or .46%?
  • unc2701
    unc2701 Posts: 3,587
    edited January 2009
    Yep, Sami nailed it. I'll say it again, though: statistics are only as good as the question you're asking. The SD might not answer your question.
    Gallo Ref 3.1 : Bryston 4b SST : Musical fidelity CD Pre : VPI HW-19
    Gallo Ref AV, Frankengallo Ref 3, LC60i : Bryston 9b SST : Meridian 565
    Jordan JX92s : MF X-T100 : Xray v8
    Backburner:Krell KAV-300i
  • AsSiMiLaTeD
    AsSiMiLaTeD Posts: 11,725
    edited January 2009
    I think I know what I'm missing; I really haven't had much time to think this through until now. But let me explain more...

    What I'm trying to do is build a control chart, sorta. So I have data for the last two months, in the 1s and 0s format I mentioned. For that time I need to calculate average and standard deviation.

    Then I take the previous week's data and get the average for that as well. Then I compare that with the average over the last couple of months to basically paint the cell that value is in - if it's within one SD of the 2-month average it's green, within 2 SDs it's yellow, and more than 2 it's red.

    I've got the technical stuff figured out and have the standard deviation function working. The issue is that I'm seeing very high standard deviations that don't make sense. I'm wondering if I should just group the data down at the very lowest level, calculate averages at that level, and then build my SD off of those...
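For concreteness, here is roughly what the chart as described looks like in T-SQL (made-up names throughout: error_log, log_date, error; assuming SQL Server). Note that with raw 0/1 rows the baseline SD comes out near sqrt(p(1-p)) - about 0.46 at a 30% error rate - so almost any weekly rate lands within one SD and the chart stays green, which would explain standard deviations that look too high to be useful:

    WITH baseline AS (   -- 2-month baseline computed straight off the raw 0/1 rows
        SELECT AVG(CAST(error AS float))    AS mean_rate,
               STDEVP(CAST(error AS float)) AS sd_rate   -- roughly sqrt(p * (1 - p)), large relative to the rate
        FROM error_log
        WHERE log_date >= DATEADD(month, -2, GETDATE())
    ),
    last_week AS (       -- previous week's error rate
        SELECT AVG(CAST(error AS float)) AS week_rate
        FROM error_log
        WHERE log_date >= DATEADD(day, -7, GETDATE())
    )
    SELECT w.week_rate, b.mean_rate, b.sd_rate,
           CASE WHEN ABS(w.week_rate - b.mean_rate) <= 1 * b.sd_rate THEN 'green'
                WHEN ABS(w.week_rate - b.mean_rate) <= 2 * b.sd_rate THEN 'yellow'
                ELSE 'red'
           END AS cell_color
    FROM last_week w CROSS JOIN baseline b;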
  • vlam
    vlam Posts: 282
    edited January 2009
    What stats tool are you using? Do you have access to SAS, SPSS or STATA?
    Main Gear
    Panasonic 50" Plasma, Polk LSi15 (Front), LSiC, LSi7 (Rear), Sherwood Newcastle AVP-9080, AM-9080 bi-amp to LSi15, AM-9080 bi-amp to LSiC and LSi7.
  • AsSiMiLaTeD
    AsSiMiLaTeD Posts: 11,725
    edited January 2009
    I'm not using any stats package, just SQL RS.
  • Sami
    Sami Posts: 4,634
    edited January 2009
    I'm good with the calculation; I'm more curious whether it's meaningful in measuring a simple yes/no. Would that .46 be 46%, or .46%?

    It's not a percentage; it's just the expected delta between the average and the actual value. Kind of a limit on what's within an acceptable range (depending on how you use it). I'm not a stats expert, so I hope that's not way off. With 0s and 1s, yes, it doesn't really make a lot of sense, since each individual value is always a 0 or a 1 and never anywhere near the average, no matter what the values are.

    Do you have enough information to get the error rate (ER) % and then get the SD from the monthly/weekly/daily/hourly ER? Just an idea. The team you work with/for would most likely have plenty of input for you.
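A sketch of that grouping idea, continuing with the same made-up T-SQL names: one error rate per day over the 2-month window, then the mean and SD taken across those daily rates, which typically gives a much tighter band than the SD of the raw 0/1 rows:

    WITH daily AS (
        SELECT CAST(log_date AS date)    AS d,          -- assumes SQL Server 2008+ for the date type
               AVG(CAST(error AS float)) AS daily_rate  -- one error rate per day
        FROM error_log
        WHERE log_date >= DATEADD(month, -2, GETDATE())
        GROUP BY CAST(log_date AS date)
    )
    SELECT AVG(daily_rate)    AS mean_daily_rate,   -- center line for the control chart
           STDEVP(daily_rate) AS sd_daily_rate      -- day-to-day spread of the error rate
    FROM daily;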
  • AsSiMiLaTeD
    AsSiMiLaTeD Posts: 11,725
    edited January 2009
    Do you have enough information to get the error rate (ER) % and then get the SD from the monthly/weekly/daily/hourly ER? Just an idea.
    That's what I was getting at in post 11 above; I think that's the direction I really need to go.
  • unc2701
    unc2701 Posts: 3,587
    edited January 2009
    Goddammit, my wife just closed my browser... I had a decent explanation, but oh well. Anyhow, you want to use:
    Z = (phat - p) / sqrt[p(1-p)/n], where p is the 2-month value and n is the count for that week's sample. We're gonna pretend that the 2-month value is a constant even though it isn't. phat is your weekly sample rate. (There's a sketch of this below.)
    Gallo Ref 3.1 : Bryston 4b SST : Musical fidelity CD Pre : VPI HW-19
    Gallo Ref AV, Frankengallo Ref 3, LC60i : Bryston 9b SST : Meridian 565
    Jordan JX92s : MF X-T100 : Xray v8
    Backburner:Krell KAV-300i
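A sketch of that z-score, again with the made-up T-SQL names: p is the 2-month rate treated as fixed, phat is the previous week's rate, and n is the previous week's row count. Roughly, |z| > 1.96 puts the week outside the usual 95% band:

    WITH two_month AS (
        SELECT AVG(CAST(error AS float)) AS p          -- 2-month rate, treated as a constant
        FROM error_log
        WHERE log_date >= DATEADD(month, -2, GETDATE())
    ),
    last_week AS (
        SELECT AVG(CAST(error AS float)) AS phat,      -- last week's rate
               COUNT(*)                  AS n          -- last week's row count
        FROM error_log
        WHERE log_date >= DATEADD(day, -7, GETDATE())
    )
    SELECT w.phat, m.p, w.n,
           (w.phat - m.p) / NULLIF(SQRT(m.p * (1 - m.p) / w.n), 0) AS z   -- Z = (phat - p) / sqrt(p(1-p)/n)
    FROM last_week w CROSS JOIN two_month m;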