# F.A.Q

Here you can find a list of Frequently Asked Questions on GSA in general and SAFE in particular.

**WHAT IS GSA USEFUL FOR? WHY SHOULD I USE IT IN THE FIRST PLACE?**

We recently published a literature review that attempts at answering this question – what is GSA useful for in the context of earth systems modelling, and what we have learnt about the construction, testing and use of environmental models through the years, thanks to the application of GSA:

This paper could be a useful starting point if you want to get ideas on the type of questions GSA can address.

**HOW DO I SET-UP MY GSA (HOW TO CHOOSE THE GSA METHOD, SAMPLING STRATEGY, SAMPLE SIZE, ETC.)?**

A general introduction on the key choices in setting-up GSA with references for further reading is given in:

Pianosi, F., Beven, K., Freer, J.W. Hall, J. Rougier, J. Stephenson, D.B., Wagener, T. (2016), Sensitivity analysis of environmental models: A systematic review with practical workflow, Environmental Modelling & Software, 79, 214-232. (*Open Access*)

More discussion on the practical effects of some of the key set-up choices, and how to interpret them, is given in:

Noacco, V., Sarrazin, F., Pianosi, F. and Wagener, T. (2019), Matlab/R workflows to assess critical choices in Global Sensitivity Analysis using the SAFE toolbox, MethodX, 6, 2258-2280.

Specific discussion on the choice of sample size for EET, RSA and VBSA can be found in:

Sarrazin, F., Pianosi, F., Wagener, T. (2016), Sensitivity analysis of environmental models: Convergence and validation Environmental Modelling & Software, 79, 135-152. (*Open Access*)

**I DO NOT HAVE A MATLAB LICENCE, CAN I STILL USE SAFE?**

If you do not have Matlab you can use SAFE in Octave. Octave is freely available at:

www.gnu.org/software/octave/download.html

In order to use SAFE in Octave, you also need to download the “statistics” package:

http://octave.sourceforge.net/statistics/

Alternatively, you can download the R or Python version of SAFE.

**WHY DO I GET NEGATIVE VARIANCE-BASED SENSITIVITY INDICES? (OR INDICES LARGER THAN 1)**

In principle variance based indices take values in [0,1]. However, this might not happen in practice because we use an approximation procedure to estimate the indices (analytical computation being impossible). For instance, imagine that the ”true” (unknown) value of a sensitivity index is 0.05 and your estimation error is -0.06, you will get a sensitivity estimate of -0.01. So, obtaining indices below 0 (or above 1) is an evidence that the approximation errors are relatively large.

>> HOW DO I ESTIMATE APPROXIMATION ERRORS?

The extent of approximation errors can be assessed by using the bootstrapping option to derive confidence intervals.

>> HOW DO I REDUCE APPROXIMATION ERRORS?

If you want to reduce approximation errors (and hopefully obtain index values in [0,1]), you must increase the sample size. To do this efficiently, you can add new samples to an already existing dataset, rather than creating a new sample of bigger size from scratch (see point 4 in “workflow_vbsa_hymod”). Depending on the case study, the sample size needed to achieve good approximation may vary a lot, ranging from 1,000 up to 10,000 (or more) times the number of input factors (see for example Figure 5 in Pianosi et al. (2016))

A final remark: if the true unknown value of the sensitivity index is 0, a negative index can be obtained even when using a very large sample size, although very small in absolute value, because the sensitivity index coincides with the approximation error.

**MY VARIANCE-BASED SENSITIVITY INDICES STILL HAVE VERY LARGE CONFIDENCE BOUNDS, WHAT CAN I DO?**

If you cannot afford to run more model evaluations to reduce the confidence intervals of VBSA indices, you might:

1) Extract as much information as possible from the sensitivity estimates you possess. For example, if confidence intervals are large but they do not overlap, then you might still be able to derive reasonably robust conclusion about the ranking of the inputs, even if the sensitivity indices values are not exactly estimated. This is also discussed in Sarrazin et al. (2016).

2) Apply a different, less computationally demanding method, for example the Elementary Effects Test (EET). Notice that if you used Saltelli’s resampling strategy to generate the input/output samples for VBSA (as implemented in the *vbsa_resampling* function of SAFE) you can apply the EET to those samples without re-running the model (you only need to rearrange the samples in the right way before passing them to the *EET_indices *function). The workflow to do this (in Matlab) is available here.

**HOW DO I SET A THRESHOLD FOR IDENTIFYING UNINFLUENTIAL FACTORS?**

In theory, uninfluential input factors should have zero-valued sensitivity indices. However, since sensitivity indices are typically computed by numerical approximations rather than analytical solutions, an uninfluential factor may still be associated with a non-zero (although small) index value. One way to identify uninfluential factors is to define a threshold value for the sensitivity indices: if the index is below the threshold, then the input factor is deemed uninfluential. The problem then is how to sensibly define the threshold. A simple and effective way to set the threshold for variance-based and PAWN sensitivity indices is by using the estimated sensitivity to a ‘dummy parameter’. The approach is described and demonstrated in:

Farkhondeh KZ, Nossent J, Sarrazin F, Pianosi, F, van Griensven A, Wagener T, Bauwens, W (2017), Comparison of variance-based and moment-independent global sensitivity analysis approaches by application to the SWAT model, Environmental Modelling & Software, 91, 210–222.

**ARE THERE OTHER METHODS TO ASSESS THE ROBUSTNESS OF RANKING AND SCREENING RESULTS?**

Yes, for example the Model Variable Augmentation (MVA) method allows to assess the quality of SA results without performing any additional model runs or requiring bootstrapping. More information about the method:

Mai, J., & Tolson, B. A. ( 2019). Model Variable Augmentation (MVA) for diagnostic assessment of sensitivity analysis results. *Water Resources Research*, 55, 2631– 2651.

A Python implementation of this method is available here (code by Juliane Mai), and can be easily integrated in the Python version of SAFE.

**MODELS WITH CORRELATED INPUT FACTORS: CAN I STILL APPLY VARIANCE-BASED GSA?**

The SAFE functions currently implemented for VBSA require that input factors be independent. However the literature on extending variance-based sensitivity estimators to the case of dependent/correlated inputs is growing. A good starting point is:

S. Kucherenko, S. Tarantola, P. Annoni (2012), Estimation of global sensitivity indices for models with dependent variables, Computer Physics Communications, 183, 937–946.

A Python implementation of this method is available here (code by Alessio Ciullo), and can be easily integrated in the Python version of SAFE.

**SAMPLING FROM DISCRETE UNIFORM DISTRIBUTION: HOW TO MODIFY THE LOWER BOUND OF THE RANGE? (Matlab version)**

The sampling functions *OAT_sampling* and *AAT_sampling* in SAFE rely on the Matlab/Octave function *unidinv*, which assumes that the lower bound of a discrete uniform distribution be 1 – this is why *OAT_sampling* and *AAT_sampling* allow to specify only one parameter (the upper bound) for discrete uniform distributions. If one wants to sample from a range with lower bound different from 1, we suggest to still use the *OAT_sampling* and *AAT_sampling* functions as they are, and simply shift the results after sampling by adding/subtracting the due amount. A simple script with some examples is available here.

**WHY DO I GET AN ‘OUT OF MEMORY’ ERROR MESSAGE WHEN USING AAT_sampling FUNCTION? (Matlab version)**

You can get an ‘out of memory’ error when using the *AAT_sampling.m* function with a large number of model evaluations (N) or forcing inputs (M). The error is typically caused by another function, *lhcube.m*, which is called by *AAT_sampling.m*. The problem occurs when *lhcube.m* tries to calculate the minimum distance between two points in the Latin Hypercube Sample (lines 85-88 of *lhcube.m*). This step can take a lot of memory as it requires creating a matrix that includes the distances between all possible pairs of two points in the LHS. In the *lhcube.m* function, we have two options to perform this step. The first option (default) is given on line 86, and it uses the *pdist.m* function of the Matlab Statistical Toolbox:

*dk = min(pdist(Xk)) ; % Requires Statistical Toolbox*

The second option (which should not be active in the copy of SAFE you have received) is given on the (commented) line 88, and it uses the *ipdm.m* function (an open access function included in SAFE):

*dk=(ipdm(Xk,’metric’,2)); dk=min(dk(dk>0)); *

The reason why we use the first option by default is that it uses less memory, meaning that it should avoid ‘out of memory’ errors more frequently. However, we also give the second option for those who do not have the Statistical Toolbox – they will just need to comment line 86 and uncomment line 88.

>> HOW DO I FIX THE PROBLEM?

If you have the Statistical Toolbox and can thus use the more efficient *pdist.m* function, double check that this is the option being used by *lhcube.m*. If you are already using the first option (or if you do not have the Statistical Toolbox, so you cannot use this option) and you still get an ‘out of memory’ message, then you can try increase the memory allocated to Matlab. Look at the Mathworks documentation on how to do that – for example instructions for Matlab R2018b are given here).

As a last resort, you can force *lhcube.m* to avoid calculating the minimum distance between points at all by commenting all lines 85-95 in *lhcube.m*. What happens in those lines is that *lhcube.m* selects the ‘best’ LHS (to be delivered as output of the function) out of a certain number of ‘candidate’ LHSs (10 in our case), and the selection criterion is that the ‘best’ LHS is the one where the minimum distance is maximum (so called ‘maximin’ approach). So, if we comment these lines, the selection will not be made, and *lhcube.m* will simply return as output the last candidate LHS – so, still a LHS but possibly suboptimal with respect to the maximin LHS.

[Want to learn more about LHS optimisation? See for instance: Damblin, Couplet, Iooss (2013) Numerical studies of space filling designs: optimization of Latin Hypercube Samples and subprojection properties. Journal of Simulation]