Robust mean

Synopsis


use oakleaf

subroutine rmean(data,mean,stderr,stdsig,scale,reltol,robfun,rankinit,flag,verbose,report)

   real, dimension(:), intent(in) :: data
   real, intent(in out) :: mean
   real, intent(out), optional :: stderr, stdsig
   real, intent(in out), optional :: scale
   real, intent(in) :: reltol
   character(len=*), intent(in) :: robfun
   logical, intent(in), optional :: rankinit
   integer, intent(out), optional :: flag
   logical, intent(in), optional :: verbose
   character(len=*), intent(in), optional :: report

end subroutine rmean
  

The subroutine is generic, supporting REAL32, REAL64 and REAL128 (available on 64-bit platforms) datatypes. The reccomended, if the nature of a given problem allows it, is REAL64. REAL32 has low accuracy, whilst REAL128 is slow.

Description

The robust mean subroutine rmean() estimates the mean, and related statistical quantities. The data are supposed to be sample from the Normal distribution with possible contamination of a unknown origin.

The parameters for both the location and the dispersion of the Normal distribution N(mean,stdsig) are estimated. The true value X of the sample falls, with 68% probability, into the interval mean - stderr < X < mean + stderr.

Parameters

On input:

data
array of data,
mean
(optional) an initial estimate of the mean,
scale
(optional) an initial estimate of the scale (i.e. stdsig),
reltol
(optional) the relative accuracy of determination of the optimum (default: 2.4%),
robfun
(optional) select the robust function among: `hampel' (default), `tukey', `huber' or `square' (non-robust least square),
rankinit
(optional) estimate initial values by the order estimates (default: .true.)
verbose
(optional) print a detailed information about the calulations.
report
(optional) a filename of a report

data are the mandatory input parameter. The procedure initialisation is done internally if rankinit == .true. (default); otherwise, both mean,scale must be set by the calling procedure. The initial estimates are taken by the median and MAD described in qmean.

On output:

mean
the mean
stderr
(optional) the standard error
stdsig
(optional) the standard deviation
scale
(optional) the scale
flag
(optional) a flag, see the status codes.

The mandatory output parameter is the robust mean. The check of the status is recommended.

Example

Save the program to the file example.f08:

program example

   use oakleaf
   use iso_fortran_env

   real(REAL64), dimension(5) :: data = [ 1, 2, 3, 4, 5 ]
   real(REAL64) :: mean,stderr,stdsig

   call rmean(data,mean,stderr,stdsig)
   write(*,*) 'rmean:',mean,' stderr:',stderr,'stdsig:',stdsig

end program example
  

Than compile and run it (the paths of -I, -L are an example):

$ gfortran -I/usr/local/include example.f08 -L/usr/local/lib -loakleaf -lminpack
$ ./a.out
rmean:   3.0001622773505234       stderr:  0.68885212488300640      stdsig:   1.5403201776835767
  

Reports

The report file keeps the detailed track of the algorithm for visualisation or debugging purposes. It is not recommended on a common use as it slow-down a run.

The format is (see src/sqpfmean.f08) self-descibing and includes: initial estimates, the optimisation steps (current values, trust-region radius, the step, the gradient, the quasi-Newton hessian and the trust-step flag) and final number of iterations, function evaluations, restores, steps out of the trust-region, indefinite and hard steps and the final hessian computed analytically.

References

Hroch,F.: A contribution to the estimate of the robust mean value, in preparation