What is ReGenesees

ReGenesees (R Evolved Generalized Software for Sampling Estimates and Errors in Surveys) is a full-fledged R software system for design-based and model-assisted analysis of complex sample surveys. It is the outcome of a long-term research and development project aimed at defining a new standard for calibration, estimation and sampling error assessment, to be adopted in all large-scale sample surveys routinely carried out by Istat (the Italian National Institute of Statistics).


System Architecture

ReGenesees has a clear-cut two-layer architecture. The application layer of the system is embedded in an R package named ReGenesees. A second R package, called ReGenesees.GUI, implements the presentation layer (namely a Tcl/Tk GUI). Both packages run under Windows, Mac, Linux and most Unix-like operating systems. While the ReGenesees.GUI package requires the ReGenesees package, the latter can also be used without the GUI on top. The statistical functions of the system are therefore always accessible to users who interact with R through the traditional command-line interface, whereas less experienced R users can benefit from the user-friendly point-and-click GUI.


Data Input/Output

The ReGenesees system can import data in several ways. First, it can load R workspace files (with .RData or .rda extensions) storing previously saved data. Second, it can import data from text files (with extensions .txt, .csv, .dat). Third, it can import data from MS Excel spreadsheets and/or MS Access database tables. Currently, ReGenesees can save output data into R workspace files (.RData, .rda) and/or export them to text files (.txt, .csv, .dat). Further extensions are possible.



Main Statistical Functions
  • Complex Sampling Designs
    • Multistage, stratified, clustered sampling designs
    • Sampling with equal or unequal probabilities, with or without replacement
    • “Mixed” sampling designs (i.e. with both self-representing and non-self-representing strata)
  • Calibration
    • Global and partitioned (for factorizable calibration models)
    • Unit-level and cluster-level weights adjustment
    • Homoscedastic and heteroscedastic models
    • Linear, raking and logit distance functions
    • Bounded and unbounded weights adjustment
    • Multi-step calibration
    • Consistent trimming of calibration weights
  • Basic Estimators
    • Horvitz-Thompson
    • Calibration Estimators
  • Variance Estimation
    • Multistage formulation
    • Ultimate Cluster approximation
    • Collapsed strata technique for handling lonely PSUs
    • Taylor linearization of nonlinear “smooth” estimators
    • Generalized Variance Functions method
  • Estimates and Sampling Errors (standard error, variance, coefficient of variation, confidence interval, design effect) for:
    • Totals
    • Means
    • Absolute and relative frequency distributions (marginal, conditional and joint)
    • Ratios between totals
    • Shares and ratios between shares
    • Multiple regression coefficients
    • Quantiles
  • Estimates and Sampling Errors for Complex Estimators
    • Handles arbitrary differentiable functions of Horvitz-Thompson or Calibration estimators
    • Complex Estimators can be freely defined by the user
    • Automated Taylor linearization
    • Design covariance and correlation between Complex Estimators
  • Estimates and Sampling Errors for Subpopulations (Domains)
    • All the analyses above can be carried out for arbitrary domains
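
To make a few of the items above concrete, here is a minimal, self-contained sketch of three of the listed techniques: a Horvitz-Thompson total, the Ultimate Cluster variance approximation (PSUs treated as sampled with replacement within strata), and Taylor linearization of a ratio between totals. It is written in plain Python purely for illustration and is not the ReGenesees API. The function names (ht_total, ultimate_cluster_variance, linearized_ratio) and the toy data are assumptions made for this example, and finite population corrections are ignored.

```python
from collections import defaultdict


def ht_total(values, weights):
    """Horvitz-Thompson total: sum of w_k * y_k, with w_k = 1 / inclusion probability."""
    return sum(w * y for y, w in zip(values, weights))


def ultimate_cluster_variance(values, weights, strata, psus):
    """Variance of a weighted total, treating PSUs as drawn with replacement in each stratum."""
    # Weighted PSU totals z_hi, grouped by stratum h.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for y, w, h, i in zip(values, weights, strata, psus):
        psu_totals[h][i] += w * y

    variance = 0.0
    for h, totals in psu_totals.items():
        z = list(totals.values())
        n_h = len(z)
        if n_h < 2:
            # A lonely PSU gives no within-stratum variability; this sketch
            # simply refuses instead of collapsing strata.
            raise ValueError(f"stratum {h!r} has a lonely PSU")
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((z_i - z_bar) ** 2 for z_i in z)
    return variance


def linearized_ratio(y, x, weights, strata, psus):
    """Ratio of two estimated totals and its Taylor-linearized variance."""
    y_hat = ht_total(y, weights)
    x_hat = ht_total(x, weights)
    ratio = y_hat / x_hat
    # Linearized variable of R = Y / X: u_k = (y_k - R * x_k) / X_hat.
    u = [(y_k - ratio * x_k) / x_hat for y_k, x_k in zip(y, x)]
    return ratio, ultimate_cluster_variance(u, weights, strata, psus)


# Toy data (hypothetical): two strata, three single-unit PSUs each.
y = [10, 12, 8, 20, 22, 18]
x = [5, 6, 4, 10, 10, 10]
w = [2, 2, 2, 3, 3, 3]
strata = ["A", "A", "A", "B", "B", "B"]
psus = [1, 2, 3, 4, 5, 6]

total = ht_total(y, w)
total_var = ultimate_cluster_variance(y, w, strata, psus)
ratio, ratio_var = linearized_ratio(y, x, w, strata, psus)
print(total, total_var, ratio, ratio_var)
```

The ratio example shows why linearization is convenient: once the linearized variable is built, the variance of a nonlinear estimator reduces to the variance of a total, so the same variance routine serves both cases.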

Future plans

New Statistical Functions
  • Replication-based Variance Estimation for non-analytic estimators, through the Delete‑A‑Group Jackknife (DAGJK) technique: this would integrate the EVER package with the ReGenesees system
  • Variance Estimation of poverty and inequality indicators through the Generalized Linearization technique
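
As a rough illustration of the idea behind the Delete-A-Group Jackknife, the sketch below assigns sample units to G groups, recomputes the estimate G times with one group deleted and the remaining weights inflated, and combines the squared deviations of the replicates. It is a simplified, single-stratum version written in plain Python, with every record treated as its own PSU and a systematic (rather than random) group assignment; these simplifications and the names dagjk_variance and weighted_mean are assumptions of this example. It does not reproduce the EVER or ReGenesees implementation.

```python
def weighted_mean(values, weights):
    """Weighted mean, used here as an example of a nonlinear estimator."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def dagjk_variance(values, weights, estimator, n_groups):
    """Delete-a-group jackknife variance for a single-stratum sample."""
    n = len(values)
    if not 2 <= n_groups <= n:
        raise ValueError("n_groups must be between 2 and the sample size")

    full_estimate = estimator(values, weights)
    squared_deviations = 0.0
    for g in range(n_groups):
        # Systematic assignment (an assumption): unit k belongs to group k mod G.
        n_g = sum(1 for k in range(n) if k % n_groups == g)
        inflate = n / (n - n_g)
        # Replicate weights: zero for the deleted group, inflated elsewhere.
        rep_weights = [
            0.0 if k % n_groups == g else w_k * inflate
            for k, w_k in enumerate(weights)
        ]
        squared_deviations += (estimator(values, rep_weights) - full_estimate) ** 2
    return (n_groups - 1) / n_groups * squared_deviations


# Toy data (hypothetical): six units, equal weights, three groups.
values = [1, 2, 3, 4, 5, 6]
weights = [1.0] * 6
print(dagjk_variance(values, weights, weighted_mean, n_groups=3))
```

Because the estimator is passed in as a function, the same routine applies to any statistic that can be recomputed from a set of replicate weights, which is what makes replication attractive for estimators that are hard to linearize.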
For an up-to-date list of possible future extensions of ReGenesees, please read the embedded text file 'DESIDERATA'.    

Get involved

For further information, please contact the ReGenesees project leader at Istat by email.

Public administration reference

  • ReGenesees plays a strategic role in the production processes of Istat (the Italian National Institute of Statistics). As of 2016, it had already been successfully integrated into the production workflow of more than 30 large-scale surveys, and all probabilistic sample surveys carried out by Istat are committed to eventually migrating to ReGenesees.
  • A comparative study (March 2015) carried out by the Methodology Advisory Service of the UK Office for National Statistics (ONS) concluded that it would be entirely feasible to adopt ReGenesees as the software used in production at ONS. The report recommends that organisations across the UK Government Statistical Service (GSS) explore the introduction of ReGenesees as a replacement for Statistics Canada's GES software in a production setting.
  • In 2014, ReGenesees was used to calibrate the last round of three important surveys of the Scottish Government (whose weighting procedures had, until then, been contracted to three separate external companies): (i) Scottish Household Survey, (ii) Scottish Health Survey, and (iii) Scottish Crime and Justice Survey.

