I’m looking for a handy program or script that I can pipe data into via stdin and that prints some basic statistics about the input. For instance, given a set of values separated by newlines, I would like to get:
- average for all values
- average for data except 5% smallest and 5% largest values
- standard deviation
Yes, I know this can be done with bash or awk, but maybe you already know of something handy?
P.S.
I’m perfectly aware of the ‘big cannons’ like Octave, R and others, but I need something much simpler.
Thanks
Answer
This little AWK snippet will do part of what you’re looking for (leave off the file name and it reads from stdin instead):
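awk '{ sum += $0; count++; vals[$0]++ }   # vals[] counts occurrences of each value
END {
    if (count == 0) exit                  # no input: avoid division by zero
    mean = sum / count
    print "Total: ", sum
    print "Mean: ", mean
    for (i in vals) s += vals[i] * (i - mean) ^ 2   # ^ is the portable power operator
    print "Standard Dev: ", sqrt(s / count)         # population standard deviation
}' datafile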
Dropping the smallest and largest 5% would be a little more complicated, and it depends on exactly how you define it.
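One possible reading, as a rough sketch only: sort the values, discard the lowest and highest 5% of the lines (with the count rounded down), and average whatever is left. The function name trimmed_mean and the pct variable are made up for illustration, and the rounding rule is an assumption, since the question does not say how the 5% should be counted. It also assumes exactly one number per line with no blank lines.

```shell
# trimmed_mean: hypothetical helper, not part of the original answer.
# Reads one number per line from stdin and prints the mean of the values
# that remain after dropping the lowest and highest pct percent.
trimmed_mean() {
    sort -n | awk -v pct=5 '
        { v[NR] = $0 }                  # input is already sorted ascending
        END {
            if (NR == 0) exit 1         # no input: nothing to average
            k = int(NR * pct / 100)     # values dropped at EACH end (assumed: round down)
            for (i = k + 1; i <= NR - k; i++) sum += v[i]
            print sum / (NR - 2 * k)
        }'
}
```

For example, `seq 1 20 | trimmed_mean` drops 1 and 20 and prints 10.5. With fewer than 20 values nothing gets dropped under this rounding rule, so the result is the plain mean.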
I know you’re looking for something canned, but short of using R, Octave, SAS or SPSS, I don’t know of anything.
Edit: Corrected formula
Attribution
Source: Link, Question Author: pQd, Answer Author: Dennis Williamson