Oak Leaf in Fortran

Oak Leaf programming interface to modern Fortran since its 2008 revision.

Basic usage

Write and save to the file basic.f08 the source code:

program basic

   use oakleaf

   real, dimension(5) :: data = [ 1, 2, 3, 4, 5 ]
   real :: mean,stderr,stdsig
   logical :: reliable

   call rmean(data,mean,stderr,stdsig,reliable)
   write(*,*) 'rmean:',mean,stderr,stdsig,reliable

end program basic

The code is ready to be compiled

$ gfortran -I/usr/include -o basic basic.f08 -L/usr/lib -loakleaf

and subsequently launched:

$ ./basic
 rmean:   3.00000000      0.748169661       1.65011716     T

That's all!

Robust estimators

Robust mean

The robust mean estimator is a heart of OakLeaf. rmean() is designed as a replacement for the arithmetical mean: its execution is significanlty slower, the robust performance great.

There are two modes of rmean() calls: plain and weighed. The weighted mean is an obsolete term for use of a priory known errors of the data.


 subroutine rmean(data,errors,mean,stderr,stdsig,scale,reliable,flag,verbose)
 subroutine rmean(data,mean,stderr,stdsig,scale,reliable,flag,verbose)

       real(kind), dimension(:), intent(in) :: data
       real(kind), dimension(:), intent(in), optional :: errors
       real(kind), intent(out) :: mean
       real(kind), intent(out), optional :: stderr, stdsig, scale
       logical, intent(out), optional :: reliable
       integer, intent(out), optional :: flag
       logical, intent(in), optional :: verbose

    end subroutine rmean

On input:
     data - array of data values to be estimated
     errors (optional) - statistical errors of data (errors > 0), for weighted mean
     verbose (optional) - print some verbose information

On output are estimated:
      mean - robust mean
      stderr (optional) - standard error
      stdsig (optional) - standard deviation
      scale (optional) - scale of function normalisation
      reliable (optional) - realiable results?
      flag (optional) - indicates reliability of results:
             0 == success, results are both reliable and accurate
             1 == consider a rough estimate due convergence fail, try verbose
             2 == basic estimates, few datapoints available
             3 == data looks nearly identical, MAD is zero
             4 == improper input parameters (sizes of arrays errors, data
                  are unequal, any element of error <= 0), try verbose
             5 == failed to allocate memory, only very initial estimates
                  of mean, stderr, stdsig are just returned

    The results gives the robust estimate of the true value X
    of the sample x falling with 68% probability into the interval

            mean - stderr  <  X  <  mean + stderr

   The distribution can be approximated by Normal distribution

           N(mean,stdsig).


   WARNING

   This routine gives unexpected results when setup of input errors
   is incorrect. It is indicated by significant violation of scale /= 1.
   If you encountered it, run plain rmean() instead. If errors holds some
   additional property or condition, use it.

Quantil mean

This routine estimates both averadge and standard deviation of a data set on base of quantiles. The median is claimed as the average of the data (which its is direct equivalent of) Q(0.5). Standard deviation is estimated as [Q(0.75) - Q(0.25)]/2.

Quantiles are robust estimators by principle, however the results are more granulated ‒ with smaller accuracy. Two data points are the absolute minimum of data amount.


    subroutine qmean(data,mean,stdsig,flag)
       real(dbl), dimension(:), intent(in) :: data
       real(dbl), intent(out) :: mean
       real(dbl), intent(out), optional :: stdsig
       integer, intent(out), optional :: flag
    end subroutine qmean

 On input:
     data - array of data values to be estimated

 On output are estimated:
      mean - robust mean
      stdsig (optional) - standard deviation
      flag (optional) - indicates reliability of results:
             0 == success, results are both reliable and precise
             1 == fail due too few data (less than 3)

Median and MAD estimates

This routine estimates the averadge by median and the standard deviation by the median of absolute deviations (MAD).

    subroutine madmed(x,t,s)
    subroutine madwmed(x,dx,t,s,d)

       real(kind), dimension(:),intent(in) :: x, dx
       real(kind), intent(out) :: t,s
    end subroutine qmean

 On input:
      x - array of data values to be estimated
      dx (optional) - array of the data errors

 On output are estimated:
      t - median
      s - standard deviation as MAD/Q50

Flux mean

Robust estimation of the factor lambda of Poisson distribution, it is an average number of events per a time period.


  subroutine fmean(photons,photons_err,counts,counts_err, &
       mean,stderr,stdsig,scale,reliable,poisson,flag,verbose)

    real(kind), dimension(:), target, intent(in) :: &
         photons,photons_err,counts,counts_err
    real(kind), intent(out) :: mean
    real(kind), intent(out), optional :: stderr,stdsig,scale
    logical, intent(out), optional :: reliable, poisson
    integer, intent(out), optional :: flag
    logical, intent(in), optional :: verbose

   This module implements a robust estimator of a ratio of events

       t =  photon / counts

   (including estimation of its scatter) appearing as the factor
   lambda parameter of Poisson distribution

       Po(lambda) = lambda**k/k! * exp(-lambda),   lambda = t * C

    which provides zero mean for difference of N1 - N2 events
    in Skellam's distribution
       https://en.wikipedia.org/wiki/Skellam_distribution

    The estimation is valid only for large values of lambda >> 1
    in Gaussian regime when transformation

        (photon - t*counts) / sqrt(photon_err**2 + t**2*count_err**2)

    is valid. The approach usually requires proper estimates errors.

    Both photons, counts should be non-integers > 1 as a product of
    previous computations.


   fmean should be called as:

    call fmean(photons,photons_err,counts,counts_err,mean,stderr,stdsig,
               scale,reliable,poisson,verbose)

   On input:
     photons - array of reference photon rates
     photons_err - array of statistical errors of photons
     counts - array of measured rates
     counts_err - array of statistical errors of counts
     verbose - print additional info

   On output are estimated:
     mean - mean
     stderr (optional) - its standard error
     stdsig (optional) - estimate of standard deviation
     scale (optional) - estimate of scale
     reliable - indicates reliability of results (optional)
     poisson - indicates Poisson (.true.) or Normal (.false.)
               distribution used for determine of results (optional)

   The given results  means that a true value T of the sample
   photons / counts can be, with 70% probability, found in interval

           t - dt  <  T  <  t + dt

   The estimate is performed as maximum of entropy of Poisson density
   for values of photons smaller than 'plimit' (666) or by robust estimate
   of parameters of Normal distribution. Both the estimates are robust.
   Poisson mean is estimated as the minimum of free energy:
   https://en.wikipedia.org/wiki/Helmholtz_free_energy

Supporting routines

The routines, described above, works with help of supporting routines described below. They are should be usefull in general.

Scale by information

This routine provides estimation of scale by maximising of information, or minimising of variance (dispersion): Fisher information.

The scale of dispersion of data is crutial quantity for robust estimates on base of the maximum-likelihood principle.


   Iscale should be called as:

    call iscale(r,s,reliable,flag,verbose)

   On input:
     r - array of residuals: (data-mean), (data-mean)/errors, ...
     verbose (optional) - verbose prints

   On output are estimated:
     s - scale
     reliable (optional) - indicates reliability of result
                           if .true., scale has been correctly
                           localised. if .false., s is undefined
     flag (optional) - result code:
            0 == success, results are both reliable and precise
            1 == consider rough estimate due convergence fail, try verbose
            2 == basic estimates, few datapoints available
            3 == data looks nearly identical
            5 == failed to allocate memory, initial estimates are returned

   Note.
   Scale parameter shouldn't confused with standard deviation.
   Their values are identical only in case of Normal distribution.

Empirical CDF

Empirical cumulative distribution function (CDF) constructed from data.

    subroutine ecdf(data,x,p)
        real(dbl),dimension(:),intent(in) :: data
        real(dbl),dimension(:),intent(out) :: x,p
    end subroutine ecdf

 On input:
     data - array of data values to be estimated

 On output:
     x - coordinates of absicca of steps
     p - step values

Quantiles

Estimates Q-quantiles by a linear interpolation of empirical CDF between discrete CDF steps. This is the inverse function of the empirical CDF.

    function quantile(q,x,y) result(t)
       real(dbl), intent(in) :: q
       real(dbl), dimension(:), intent(in) :: x,y
    end function quantile

 On input:
       q - quantile as fraction (0.0 <= q <= 1.0)
       x,y - empirical CDF

 Result:
       Value of x abscicca for given quantile