simple statistical analysis from the shell

i’m looking for a handy program or script that i can pipe data to via stdin and that prints some basic statistics for the input. for instance, given a set of values separated by newlines, i would like to get:

  • average for all values
  • average for data except 5% smallest and 5% largest values
  • standard deviation

yes, i know this can be done with bash or awk, but maybe you already know something handy?

ps.

i’m perfectly aware of the ‘big guns’ like octave, r and others, but i need something much simpler.

thanks

Answer

This little AWK snippet will do part of what you’re looking for:

awk '{sum += $0; count++; vals[$0]++} END {mean = sum / count; print "Total:", sum; print "Mean:", mean; for (i in vals){ s += vals[i] * ((i - mean) ^ 2) }; print "Standard Dev:", sqrt(s/count)}' datafile
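
Since the question asks for input on stdin: awk reads standard input when no file is named, so the same calculation can be wrapped in a shell function and fed from a pipe. This is only a sketch. The name basic_stats is made up for illustration, blank lines are skipped, and the deviation is the population form (dividing by the count), the same as in the one-liner above.

```shell
# basic_stats: hypothetical helper name, not an existing tool.
# Reads one number per line on stdin and prints total, mean and
# population standard deviation.
basic_stats() {
    awk '
        NF { sum += $1; vals[++count] = $1 }   # skip blank lines, keep each value
        END {
            if (count == 0) exit 1             # no data: fail instead of dividing by zero
            mean = sum / count
            for (i = 1; i <= count; i++)
                s += (vals[i] - mean) ^ 2      # sum of squared deviations
            print "Total:", sum
            print "Mean:", mean
            print "Standard Dev:", sqrt(s / count)
        }'
}
```

For example, `printf '%s\n' 2 4 4 4 5 5 7 9 | basic_stats` prints a total of 40, a mean of 5 and a standard deviation of 2.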

The drop 5% part would be a little more complicated and depend on exactly how you mean it.
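
Here is one possible reading of the 5% trim, as a rough sketch rather than the definitive answer: sort the values, drop 5% of the count (rounded down) from each end, and average what is left. The name trimmed_mean and the rounding rule are assumptions. With fewer than 20 values nothing gets dropped, and `sort -n` does not understand scientific notation.

```shell
# trimmed_mean: hypothetical helper name, not an existing tool.
# Reads one number per line on stdin, drops the smallest and largest 5%
# of the values (count rounded down) and prints the mean of the rest.
trimmed_mean() {
    sort -n | awk '
        NF { v[++n] = $1 }                 # non-blank values, already in sorted order
        END {
            if (n == 0) exit 1             # no data: fail instead of dividing by zero
            k = int(n * 5 / 100)           # number of values to drop at each end
            for (i = k + 1; i <= n - k; i++)
                sum += v[i]
            print sum / (n - 2 * k)
        }'
}
```

For example, `seq 1 20 | trimmed_mean` drops 1 and 20 and prints 10.5.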

I know you’re looking for something canned, but short of using R, Octave, SAS or SPSS, I don’t know of anything.

Edit: Corrected formula

Attribution
Source: Link, Question Author: pQd, Answer Author: Dennis Williamson
