An adaptive sampling method for global sensitivity analysis based on least-squares support vector regression

In the field of engineering, surrogate models are commonly used to approximate the behavior of a physical phenomenon in order to reduce computational costs. Generally, a surrogate model is created based on a set of training data, where a typical method for the statistical design is Latin hypercube sampling (LHS). Even though a space-filling distribution of the training data is reached, the sampling process takes no information on the underlying behavior of the physical phenomenon into account, and new data cannot be sampled in the same distribution if the approximation quality is not sufficient. Therefore, in this study we present a novel adaptive sampling method based on a specific surrogate model, the least-squares support vector regression. The adaptive sampling method generates training data based on the uncertainty in the local prognosis capabilities of the surrogate model: areas of higher uncertainty require more sample data. The approach offers a cost-efficient calculation due to the properties of the least-squares support vector regression. The benefits of the adaptive sampling method are demonstrated in comparison with LHS on different analytical examples. Furthermore, the adaptive sampling method is applied to the calculation of global sensitivity values according to Sobol, where it shows faster convergence than the LHS method. With the applications in this paper it is shown that the presented adaptive sampling method improves the estimation of global sensitivity values and hence visibly reduces the overall computational costs.


Introduction
In numerous fields of civil engineering, numerical models and physical experiments representing reality are applied for observing physical phenomena. For improving modeling properties, fields like model calibration and uncertainty analysis are topics of various investigations [1][2][3][4]. Uncertainty analysis involves the determination of uncertainties in the model responses derived from uncertainties in the model parameters, and of the relationships between the model parameters and the model responses, which is done by sensitivity analyses. There is a variety of approaches for uncertainty and sensitivity analysis in use, such as differential analysis [5][6][7], variance decomposition procedures [8][9][10][11], and sampling-based approaches [12][13][14][15]. In this study, the focus is on sampling-based approaches, which are attractive because of their easy implementation and because they do not require intermediate models. However, sampling-based approaches involve high computational cost because a large amount of data must be calculated with numerical models.
A commonly applied technique to reduce computational cost is to use data-based models, so-called surrogate models. With these, not the full amount of the data needs to be simulated with the original model, because the surrogate approximates parts of the data and the computation time can be effectively reduced. Surrogate models are frequently applied for sensitivity analysis [16][17][18], reliability analysis [19][20][21], and nonlinear optimization [22,23]. This paper focuses in particular on the application of surrogate models to global sensitivity analysis. Two of the most popular surrogate models used for global sensitivity analysis are polynomial chaos expansions [24][25][26][27] and Kriging approximations [16,[28][29][30].
For sufficient results in the analyses, the approximation with the surrogate models must become sufficiently accurate using a limited number of training data. The accuracy of the approximation is influenced on the one hand by the choice of the surrogate model and the corresponding model parameters, and on the other hand by the structure of the training data. This contribution deals with the second issue, and especially with the challenge of adding new points to an existing sampling set. Therefore, we present a novel adaptive sampling method with the aim to accelerate and improve surrogate-based sensitivity analysis.
The paper is organized as follows. After a brief review of the main surrogate modeling techniques, we address issues of adaptive sampling strategies; then, we present the least-squares support vector regression, which is the applied surrogate model, and introduce the novel adaptive sampling method. Finally, we analyze the functionality and applicability of this method and observe the impact on global sensitivity analysis.

Surrogate modeling
The concept of constructing a surrogate model requires a set of training data points x_1, ..., x_n within a domain D ⊂ R^k with known responses y = [y_1, ..., y_n]^T = [f(x_1), ..., f(x_n)]^T of the observed model, a black-box function f, which represents either a physical or a computer experiment. The behavior of the original function f, whose evaluation is usually time-intensive, can be approximated with a surrogate f̂ based on the training data set. There are several techniques for constructing the approximation function; some intensively discussed ones are:
• Polynomial regression [23,31]
• Moving least squares [32]
• Radial basis functions [33,34]
• Kriging regression [22,35]
• Support vector regression (SVR) [36,37]
• Artificial neural networks [38].
All of these methods possess different properties and require different computational costs. Therefore, it is often a challenging question to judge which model choice is optimal. In the approach presented in this paper, we focus on the least-squares support vector regression (LS-SVR) because it provides the basis for the investigated method through a favorable calculation of the leave-one-out cross-validation error.
Besides the model selection, the construction of the training data set is a frequently discussed question. On one hand, there are different sampling possibilities to distribute the training points, such as full factorial and stratified random sampling techniques [22]. A stratified random sampling method commonly used is the Latin hypercube sampling (LHS), which we also apply for our analyses. On the other hand, the number n of training points has to be defined optimally. There is a trade-off between increasing the approximation quality, which mostly improves with more sample points, and keeping the computation cost low, which is related to choosing n small. Because the number of required training points is not known, it makes sense to apply adaptive sampling strategies, which is the topic of the next section.
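To illustrate the stratification property of LHS mentioned above, a minimal sketch in Python follows; the function name and the simple per-dimension permutation scheme are our own choices, not taken from the paper:

```python
import numpy as np

def latin_hypercube(n, k, seed=None):
    """Sample n points in [0, 1]^k so that every dimension is split into
    n equal strata and each stratum contains exactly one point."""
    rng = np.random.default_rng(seed)
    # one uniform offset inside each stratum, strata shuffled per dimension
    offsets = rng.random((n, k))
    strata = np.column_stack([rng.permutation(n) for _ in range(k)])
    return (strata + offsets) / n

X = latin_hypercube(10, 2, seed=0)
```

Unlike purely random sampling, every one-dimensional projection of such a design is evenly covered, which is the space-filling property referred to in the text.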

Review of adaptive sampling methods
The LHS method provides a favorable, space-filling distribution of the training points. Nevertheless, if a sufficient approximation quality is not reached, new sample points have to be added to the existing training data set. This expansion cannot be done within the strategy of the LHS method. In the application fields of reliability analysis and nonlinear optimization, various adaptive sampling methods exist that sample new points according to their objectives (e.g. [20,[39][40][41]). However, we are interested here in the application to global sensitivity analysis, which requires a good approximation quality over the whole range of the parameters. Therefore, areas with low approximation quality have to be identified and improved by adding new training points. Some existing global adaptive sampling strategies are listed here:
• Kriging approaches, for example the entropy approach [42]
• Cross-validation approach [43]
• Maximin distance approach
The Expressions 1, 2, and 3 describe the corresponding selection criteria for a new training point x_new. The Kriging approaches use the benefits of the ordinary Kriging method, which provides an estimation of the prediction error. In the mentioned example, the entropy approach, the aim is to maximize the amount of information that can be obtained from new sample points. There, R_{n+1} describes the correlation matrix between the initial training data set and a new point, and 1_{n+1} is an (n + 1)-dimensional vector of 1's. Even though several sequential approaches based on the Kriging method exist, they are not applicable to other surrogate models.
In contrast, the cross-validation approach can be applied to all kinds of surrogate models. It calculates a mean squared error based on the deviation between the surrogate function f̂ for n training points and the sub-surrogates f̂_{−i}(x) constructed with the training data set except the i-th point, for i = 1, ..., n. However, it requires more computational effort because n additional approximation functions have to be constructed.
In the third mentioned method, the maximin distance approach, the information obtained from the existing surrogate is not taken into account; instead, the aim is to maximize the Euclidean distance d(x_i, x_j) between the sample point x_i and its nearest neighbor x_j. Therefore, new points are sampled where no information from the original function exists, and not necessarily where the approximation quality is lower.
In this contribution, we introduce a novel global adaptive sampling method, which offers a cost efficient calculation due to the properties of the underlying LS-SVR.

Proposed approach
In this section, we first explain the applied surrogate method and then introduce the novel global adaptive sampling method.

Least-squares support vector regression
One class of surrogate methods are the SVR methods, which were originally introduced in the context of binary classification [45,46] and later extended to regression [47]. A special form of the SVR methods is the LS-SVR [48,49], which is explained in this section. To give an overview of the functionality of the LS-SVR, we first explain the idea in the linear case and then extend it to a nonlinear approximation. Finally, the calculation of the model parameters is described.
Linear case: The approximation function in the linear case, evaluated at a point x, is formulated as

f̂(x) = w^T x + b,    (4)

where w = [w_1, ..., w_k]^T and b are the unknown parameters of the method. To identify the optimal parameters, the optimization problem

min_{w,b,ζ}  (1/2) w^T w + (C/2) Σ_{i=1}^n ζ_i²   subject to   y_i = w^T x_i + b + ζ_i,  i = 1, ..., n,    (5)

has to be solved. In the concept of the LS-SVR, we minimize the sum of squared errors ζ_i = y_i − f̂(x_i) between the response of the original function and the response of the approximation, instead of a sum of the absolute errors greater than ε as for the standard SVR based on the ε-insensitive loss function. The resulting advantage will be discussed later. The sum of squared errors is weighted by the parameter C > 0, which controls the smoothness of the approximation. The linear approximation problem is displayed in Figure 1.
The optimization problem defined in Equation 5 can be reformulated into the following dual problem by applying the Lagrangian function and the Karush-Kuhn-Tucker conditions for optimality:

[ 0     1_n^T          ] [ b ]     [ 0 ]
[ 1_n   K + (1/C) I_n  ] [ α ]  =  [ y ],    (6)

with (K)_{ij} = x_i^T x_j and the n Lagrange multipliers α = [α_1, ..., α_n]^T. Since this optimization problem consists of a linear set of equations, it can be solved easily to obtain the parameters of the approximation function, expressed as

f̂(x) = Σ_{i=1}^n α_i x_i^T x + b.    (7)

Thereby, the construction of the approximation function is faster than for the classical SVR, which requires solving a quadratic optimization problem. The only concern raised against the LS-SVR in comparison with the classical SVR is that the prediction is written in terms of all training points and thereby cannot be formulated as a sparse representation. This can lead to higher processing times if n gets large, which should be avoided anyway because of the computational costs of the original function f.
Nonlinear case: The concept of the linear LS-SVR can be directly adapted to the nonlinear case by using kernel functions. The main idea is to map the input data into a nonlinear feature space denoted by F, enabling a linear LS-SVR in F. It is unnecessary to define the mapping φ : D → F explicitly; it suffices to choose a kernel function k that corresponds to the inner product in the feature space, k(x_i, x_j) = φ(x_i)^T φ(x_j) [45,50]. Therefore, the nonlinear approximation function is expressed as

f̂(x) = Σ_{i=1}^n α_i k(x_i, x) + b.    (8)

For a linear mapping, Equation 8 becomes Equation 7. To identify the optimal model parameters α and b, the dual optimization problem from Equation 6 has to be solved, as in the linear case. However, the entries of the matrix K are now defined by the kernel function, with (K)_{ij} = k(x_i, x_j). Some possible choices for the kernel function are the linear, polynomial, and Gaussian kernels [37,51]. In this investigation we use the Gaussian kernel

k(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)),

which is the most widely used kernel in the SVR literature.
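With the Gaussian kernel, training an LS-SVR reduces to solving one linear system. The following Python sketch illustrates this; the helper names are our own, and hyperparameter tuning is omitted:

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)) for all pairs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvr_fit(X, y, C, sigma):
    """Solve the dual linear system for alpha and b (one n+1 sized solve)."""
    n = len(y)
    H = np.zeros((n + 1, n + 1))
    H[0, 1:] = H[1:, 0] = 1.0
    H[1:, 1:] = gaussian_kernel(X, X, sigma) + np.eye(n) / C
    sol = np.linalg.solve(H, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]                      # alpha, b

def lssvr_predict(Xq, X, alpha, b, sigma):
    """Evaluate the kernel expansion at the query points Xq."""
    return gaussian_kernel(Xq, X, sigma) @ alpha + b
```

A single `np.linalg.solve` replaces the quadratic program of the classical SVR, which is the computational advantage discussed above.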

Estimation of the model parameters:
To obtain the most suitable approximation function, we need to identify the optimal choice of the kernel parameter σ and the regularization parameter C. If another kernel function is applied, different kernel parameters have to be identified accordingly. Therefore, we define the hyperparameters θ = [C, σ, ...]^T including all utilized parameters.
As a criterion for the parameter selection, the K-fold cross-validation error [52] is frequently applied. For this, the training data set is split randomly into K sub data sets and sub-surrogates are built, for each of which one subset of the training points is left out. The left-out subset can be used as an untrained test data set to examine the approximation quality in unobserved areas. The smaller K is chosen, the faster the calculation of the cross-validation becomes; however, the training data set is then considerably reduced compared to the whole one, and the variance of the cross-validation error increases. We apply in this study the special case K = n, known as the leave-one-out error. The leave-one-out error is an almost unbiased estimate of the expected error [53] and therefore provides the best decision criterion. Because the calculation of the leave-one-out error is costly, commonly a 10-fold cross-validation error is used [54]. However, by using the LS-SVR, the calculation cost of the leave-one-out error can be reduced, as shown below.
For the leave-one-out error, the sub-surrogate f̂_{−i} is constructed each time with the training data set except the i-th training point. The result of the sub-surrogate at the left-out point x_i is then compared with the true function value y_i, so that the leave-one-out error Err_LOO is formulated as

Err_LOO = (1/n) Σ_{i=1}^n ( y_i − f̂_{−i}(x_i) )².    (9)

By minimizing Err_LOO, we obtain the best choices for the hyperparameters θ. For the calculation of this error value, it is in general necessary to construct the corresponding sub-surrogates f̂_{−i} for i = 1, ..., n, which requires additional computational time. However, the LS-SVR enables an analytical calculation of the deviation of the true response y_i from the result of the sub-surrogate,

y_i − f̂_{−i}(x_i) = α_i / (K^{−1})_{ii},    (10)

where (K^{−1})_{ii} is the diagonal element corresponding to α_i of the inverse of the full system matrix of Equation 6. With the analytical formulation of the leave-one-out error, the LS-SVR offers a very efficient parameter estimation. Therefore, and due to the linear problem setting, the LS-SVR is a practical method with low computational costs, especially useful for high-dimensional problems. Furthermore, the analytical formulation of the leave-one-out error provides the basis for the adaptive sampling method explained in the next section.
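The analytical shortcut of Equation 10 can be checked numerically against explicit refits. A sketch under our reading of the formula (the diagonal element is taken from the inverse of the full system matrix, at the position of α_i; function names are ours):

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def loo_residuals(X, y, C, sigma):
    """Analytic LOO residuals y_i - f_-i(x_i) without refitting n models."""
    n = len(y)
    H = np.zeros((n + 1, n + 1))
    H[0, 1:] = H[1:, 0] = 1.0
    H[1:, 1:] = gaussian_kernel(X, X, sigma) + np.eye(n) / C
    Hinv = np.linalg.inv(H)
    alpha = (Hinv @ np.concatenate(([0.0], y)))[1:]
    # one matrix inverse yields all n leave-one-out deviations at once
    return alpha / np.diag(Hinv)[1:]
```

The mean of the squared residuals gives Err_LOO, which can then be minimized over θ = [C, σ]^T with any standard optimizer.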

Distance-based LOO error sampling method
The global adaptive sampling method we introduce in this investigation is called the "distance-based LOO error sampling method". It uses two different criteria to choose suitable new sampling points.
First, we want to identify areas which have a higher uncertainty in the approximation quality than others. An indicator for the uncertainty of the whole approximation function is the leave-one-out error defined in Equation 9. This error consists of the sum of the squared deviations at the training points, so that each summand describes the uncertainty at one training point. In the case of the LS-SVR, the analytical formulation of each of these deviations is given in Equation 10.
The uncertainty of the approximation function at an unobserved point can be approximated by the uncertainty at the training points, assuming that the behavior of the model uncertainty is continuous. More precisely, we use the deviations y_i − f̂_{−i}(x_i) for i = 1, ..., n as responses of the training points to construct the leave-one-out error function f̂_LOO(x) by applying the LS-SVR. New sampling points can then be obtained by maximizing the function f̂_LOO over x.
Second, we want to avoid getting redundant information by constructing new training points. However, it is possible thatf LOO is maximal at or close to an existing training point. Therefore, we take additionally the maximin distance approach into account, which is also done in the cross-validation approach in [43]. The maximin distance approach prevents close and therefore redundant training points.
By taking both criteria into account, we obtain an optimization problem in which we have to maximize both the leave-one-out error function f̂_LOO and the minimal distance between the sample points. This optimization problem can be formulated as

x_new = arg max_{x ∈ D}  f̂_LOO(x) · min_{i=1,...,n} d(x, x_i).    (12)

As distance function d(x, x_i), the Euclidean metric is selected.

Selection of suitable points:
Still, there is the question of how to solve the optimization problem from Expression 12. We use the following two ways to select new sampling points related to this optimization problem:
• Selection out of a predefined large sample set X_new with e.g. N_new = 50 000
• Sampling related to the quasi-distribution function
In the first case, new points are taken from a large sample set. We choose the points with the highest value of f̂_LOO(x) min_i d(x, x_i). This calculation requires a computational effort depending on the size N_new of the sample set from which the new points are chosen. Therefore, it is recommended to use this selection method only for research analyses. For numerical and physical applications, where a small computation time is desired, the second selection method is recommended. There, the objective of the optimization problem is taken as a quasi-distribution function δ(x). New points are randomly sampled along δ(x) with the slice sampling routine [56]. Accordingly, sample points with higher values of the optimization function have a higher probability of being chosen. After identifying the new sample points with one of the two methods, the corresponding responses have to be calculated with the original function. With the expanded training data set, a new surrogate can be constructed.
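The first selection method (scoring a large candidate pool) can be sketched as follows. Here `loo_surrogate` stands for the fitted f̂_LOO and is an assumed callable; for simplicity the candidates are drawn uniformly in the unit hypercube:

```python
import numpy as np

def select_new_points(X, loo_surrogate, n_new, n_cand=50_000, seed=None):
    """Greedily pick n_new candidates maximising f_LOO(x) * min_i d(x, x_i)."""
    rng = np.random.default_rng(seed)
    cand = rng.random((n_cand, X.shape[1]))          # candidate pool in [0, 1]^k
    pts = X.copy()
    chosen = []
    for _ in range(n_new):
        # minimal Euclidean distance of every candidate to the current points
        dmin = np.sqrt(((cand[:, None, :] - pts[None, :, :]) ** 2)
                       .sum(axis=-1)).min(axis=1)
        best = np.argmax(loo_surrogate(cand) * dmin)
        chosen.append(cand[best])
        pts = np.vstack([pts, cand[best]])           # keep new points apart, too
    return np.array(chosen)
```

Appending each chosen point to the distance set before picking the next one prevents the n_new new points from clustering in a single high-uncertainty region.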
In the following, we explain in more detail how the adaptive sampling algorithm works and give a pseudo-code for a Matlab implementation. Implementation: In the main program, the LS-SVR is first constructed with the initial training data set X = [x_1, ..., x_n]^T and the corresponding responses y of the original function. During the approximation process, the parameters α and b are calculated by solving Equation 6, and θ by minimizing Equation 9.
We also obtain the results y_i − f̂_{−i}(x_i) from Equation 10 for i = 1, ..., n, which are used for the adaptive sampling method. In the second part, new points are added stepwise to X and y by applying the distance-based LOO error sampling method. The number m of calls of the adaptive sampling method and the number n_new of new points in each step are predefined. However, it is possible to include a termination criterion in the corresponding loop, such as continuing only while Err_LOO is greater than a constant threshold corresponding to a sufficient approximation quality. The limit for sufficient approximation quality is strongly dependent on the observed problem; influencing factors could be the dimension, the parameter range, assumed noise in the input parameters, or the application field. The pseudo-code of the main program is listed below. For the second way to choose new sampling points, the slice sampling routine [56] is used. There, points are sampled along a pseudo-distribution function p. As an initial point, an x-value with p(x) > 0 is chosen. Starting from the result x_{i−1} of the previous step, a random value u_i between 0 and p(x_{i−1}) is chosen. Then, x_i is sampled randomly until a value with p(x_i) > u_i is found. With this procedure, sampling points with higher values of p have a higher probability of being selected. The one-dimensional case of one step of the slice sampling routine is shown in Figure 2. In our application, p(x) is chosen as f̂_LOO(x) min_i d(x, x_i). The corresponding adaptive sampling routine is presented below. In this program we assume the sample points to lie in the interval [0, 1], which has to be adjusted according to the problem of interest.
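The Matlab listings themselves are not reproduced in this excerpt. As a stand-in, the slice sampling step described above can be sketched in Python; this deliberately simple variant resamples uniformly over [0, 1] instead of using a shrinking bracket:

```python
import numpy as np

def slice_sample(p, x0, n_samples, seed=None):
    """Draw n_samples distributed proportionally to the unnormalised
    quasi-distribution p on [0, 1], starting from x0 with p(x0) > 0."""
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for _ in range(n_samples):
        u = rng.uniform(0.0, p(x))        # vertical level below p(x_{i-1})
        while True:                        # resample until p(x_i) > u_i
            x_prop = rng.random()
            if p(x_prop) > u:
                x = x_prop
                break
        out.append(x)
    return np.array(out)
```

In the adaptive sampling routine, p(x) would be f̂_LOO(x) min_i d(x, x_i), so regions that combine high local uncertainty with large distance to the existing training points are visited most often.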

Numerical analysis
In the following numerical analysis, we observe the efficiency of the introduced adaptive sampling method by comparing it with the LHS method. Although the LHS method is not an adaptive sampling method, it provides an evenly distributed sample set better than the maximin distance approach; therefore, it serves as a good comparison for the adaptive sampling method, with which we improve the approximation quality in specific areas.
Two quality criteria are used to assess the approximation quality of the respective surrogate model. Both are based on the coefficient of determination (CoD) [57], which is formulated as

CoD = 1 − Σ_{i=1}^n ( y_i − f̂(x_i) )² / Σ_{i=1}^n ( y_i − ȳ )²,

where ȳ is the mean of the responses y_i. The CoD describes how much of the behavior of the original function can be described by the approximation; a value close to 1 indicates a good approximation. Although the CoD is a meaningful error criterion, it uses the training points also as test points and could therefore lead to misleading outcomes. Instead, we use on the one hand the cross-validation approach and on the other hand an additional unused data set. By using f̂_{−i} instead of f̂, we obtain the CoD based on cross-validation (CoD_CV). In the second case, the validated CoD (CoD_val), the test points x_i and their responses y_i with i = 1, ..., N come from a large data set (for example N = 100 000) which was not used before. The most meaningful criterion is the CoD_val, but it can be used only for selected examples where the amount of available data is large enough.
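Both quality criteria differ only in which predictions and test points enter the CoD formula; the base quantity itself is a short function (sketch, our own naming):

```python
import numpy as np

def cod(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

The CoD_CV uses the leave-one-out predictions f̂_{−i}(x_i) as `y_pred` on the training points, while the CoD_val uses surrogate predictions on the large held-out data set.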
In the following subsections, we observe different functions representing cases that can appear in real-world problems and investigate the application of the adaptive sampling method to the sensitivity analysis.
It is obvious that the new sampling points are selected in areas where the approximation quality has to be improved. Both optimization criteria affect the choice of new points. Often, f̂_LOO takes its maximum value close to already existing training points. Therefore, including the maximin distance approach is important for the sampling strategy; depending on the existing approximation, it has more or less influence.
The function f(x) = x sin x is a function with similar behavior over the whole domain. It varies over the whole range, though the variation of the function increases with increasing x. Therefore, it is expected that an evenly distributed sample set is advantageous and that the application of the adaptive sampling method cannot effectively improve the approximation quality. From the results of the CoD_CV and the CoD_val shown in Figure 4, it is visible that there are no great differences between the results of the adaptive sampling method and the LHS method. Nevertheless, the results already indicate some improvements from the use of the adaptive sampling method. By applying the distance-based LOO error sampling method, better results are obtained for n = 7 and n = 9. Also for higher n, the CoD_val of the adaptive sampling method is slightly larger than for the LHS method and the variance is smaller.

Three-dimensional model
As a second example we observe the three-dimensional function x_1 sin x_1 + x_2 sin x_2 + x_3 sin x_3 within the interval [0, 15]³. The variation of the function is similar to the previous one. However, in this case more input parameters are important, and they have the same influence on the response of the function.
In Figure 5, the CoD_CV and CoD_val values are depicted. The results of the CoD_CV indicate clearly that the LHS method gives the better results. However, the CoD_val shows that the adaptive sampling method improves the results from n = 175. An explanation is that first, sample points are chosen over the whole domain to capture the global behavior; later, the local approximation quality can be improved. Therefore, the adaptive sampling method shows its improvement only for higher values of n, and it would make sense to start later with this method.
[Figure 5: Boxplot of the CoD_CV (left) and the CoD_val (right) for the distance-based LOO error method (red) and the LHS method (blue) depending on n for the approximation of x_1 sin x_1 + x_2 sin x_2 + x_3 sin x_3]
It should be mentioned that the results of the CoD_CV do not show an improvement of the adaptive sampling method for any number n. However, the CoD_val uses an additional data set and is therefore more trustworthy.

Model with Singularity
We are interested here in functions with weak discontinuities in some regions. These functions have some areas with low and some with high variation. As an example, we study a typical function from engineering: the absolute value of the transfer function of the equation of motion. The problem is described as follows:
• The choice of the damping parameter c influences the damping effect and therefore the nonlinearity of the observed function.
In Figure 6, the left-hand side shows an approximation with training points sampled using the adaptive sampling method, and the right-hand side an approximation with LHS training points. It is evident that the first approximation is closer to the true function, because more sample points are placed where the curvature of the graph of f(x) changes more rapidly.
By comparing the results of the CoD_val for both the adaptive sampling method and the LHS method, it is also visible that the convergence to an acceptable solution is faster and more robust if the distance-based LOO error method is used, as illustrated in Figure 7. Again, the CoD_CV misinterprets the results for the smallest n and recommends the use of the LHS method; however, from n = 12 on, the CoD_CV also shows that the application of the adaptive sampling method provides better results.

Model with decaying influence of the variables
Another type of functions worth investigating are functions where the input parameters have different influences on the output. Therefore, we analyze the benchmark function f given in Equation 14. The results of the function f from Equation 14 for dimension k = 2 are displayed in Figure 8.
From the values of the CoD_CV and the CoD_val it is visible that the approximation is already satisfying with n = 100 training points, but slightly better for the adaptive sampling method. In the three-dimensional case of the function f from Equation 14, we have to use a higher number of training points to get a sufficient approximation quality (CoD_val ≥ 0.9). The results are shown in Figure 9. The values of the CoD_CV recommend using only LHS data because the results of the distance-based LOO error sampling method are lower and have a higher variance. However, the CoD_val shows that the approximation quality with the use of the adaptive sampling method is better for n larger than 300. Because at the beginning the adaptive sampling method does not necessarily provide better approximation quality, it is recommended to start sampling new points with this strategy later, depending on the dimension of the observed problem. For this example, it is recommended to start with n = 200 LHS training points and then increase the quality by adding new points according to the distance-based LOO error method.

Noisy function
If the observed data are obtained from experiments, it is expected that the responses are corrupted by noise. In those cases, it is desirable that the surrogate model approximates the behavior of the underlying engineering effect and does not get lost in the noise. Especially for the adaptive sampling method, the noise should not prevent sampling new points in interesting areas.
[Figure 9: Boxplot of the CoD_CV (left) and the CoD_val (right) for the distance-based LOO error method (red) and the LHS method (blue) depending on n]
To analyze noise-corrupted functions, we revisit the model from Subsection 3.3 and add to each sample point normally distributed noise with zero mean and a variance of 0.1 or 0.2, respectively. The corresponding approximations with and without the adaptive sampling method are displayed in Figures 10 and 11.
The higher the deviation of the training data from the original trend, the higher the variation visible in both approximations, especially in the area that was nearly linear before. Still, the use of the adaptive sampling method offers a better approximation, especially in the interesting area where the damping effect is visible. In the case where only the LHS method is applied, high deviations close to the edges of the domain can appear. Nevertheless, the displayed approximations are just possible surrogates, which change with the use of different training data. Therefore, we show in Figures 12 and 13 the results of 100 approximations with and without the use of the adaptive sampling method for the absolute transfer function with additional noise distributed as N(0, 0.1) and N(0, 0.2).
With the displayed results of the CoD_val, it is visible that with increasing noise the maximal possible approximation quality decreases, which is reasonable because the error introduced into the training data by the noise cannot be avoided. In the case with σ² = 0.1, the variance of the CoD_val is higher for n = 18, 19, and 20 than for n = 14 to n = 17 if an LHS training data set is used. This shows that using more training points does not always improve the approximation quality; rather, the locations of the new points are important. This effect is not visible for the higher noise level. From the results it can be concluded that the distance-based LOO error method provides better approximations even if the data are corrupted by noise. However, if the variance of the overlying noise gets higher, the variation of the noise and that of the function cannot be distinguished, and the results of both approximation approaches become more and more similar. The reason is that in this case no area is of special interest anymore because of its higher nonlinearity, and therefore new points are sampled over the whole domain. The results of the CoD_CV again show, for σ² = 0.1, a better convergence of the surrogates using LHS training data sets at the beginning (until n = 10). However, at the latest from n = 14, a significant decrease of the variance and an improvement of the results are visible for both noise levels.

Computation of sensitivity indices based on the adaptive LS-SVR
With the previous examples it is shown that the distance-based LOO error sampling method mostly improves the quality of the approximation, especially if more points are required in specific areas. However, not only the results of the CoD_CV and the CoD_val are important; it is also of interest how the adaptive sampling method affects the calculation of the sensitivity values.
The sensitivity analysis is important for determining the most decisive parameters for the response, so that further research can be focused on those parameters first. There are different possibilities to perform a sensitivity analysis, such as variance-based methods and derivative-based approaches. In the present work we focus on the application of surrogate models to variance-based methods because they have the advantage of being model-independent and they take the whole input space into account. One often used global variance-based approach are the Sobol indices proposed by Sobol [58], where we observe the first-order effects S_i and the total effects S_Ti of each input parameter X_i (i = 1, ..., n) on the output Y. The values S_i represent the single influence of each parameter, while the values S_Ti also contain the interaction effects between the parameters. They can be calculated using Equations 15 and 16.
In these equations, Var(Y) is the unconditional variance, and E(Y|X_i) and E(Y|X_∼i) describe the conditional expected values. Thereby, conditioning on X_∼i means conditioning on all X_j with j ≠ i. However, it is usually not possible to calculate these sensitivity values analytically, so the following estimators of the Sobol indices are used [9]: two data sets X and X̃ are generated, with the function evaluations y_j = f([x_j1, ..., x_jk]) and ỹ_j = f([x̃_j1, ..., x̃_jk]). Additionally, k data sets are constructed by exchanging the i-th row of X with values from X̃. Thus, we determine the function responses y_j^{X_i} of [x_j1, ..., x_j,i−1, x̃_ji, x_j,i+1, ..., x_jk]^T. That means (k + 2)N function evaluations are needed to calculate the estimators of the sensitivity values. To get good estimations, N has to be high, which causes high computational time and is the reason for applying surrogate models.
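The pick-freeze scheme above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses the Jansen form of the estimators (reference [9] may define them slightly differently), and the uniform unit-cube inputs and the test function in the usage example are arbitrary choices for demonstration.

```python
import numpy as np

def sobol_pick_freeze(f, k, N, rng):
    """Estimate first-order (S_i) and total (S_Ti) Sobol indices with
    (k + 2) * N model evaluations, following the pick-freeze scheme."""
    X = rng.random((N, k))          # first data set X
    X_t = rng.random((N, k))        # second, independent data set X~
    y = f(X)                        # evaluations y_j
    y_t = f(X_t)                    # evaluations y~_j
    var_y = np.var(np.concatenate([y, y_t]))
    S, S_T = np.empty(k), np.empty(k)
    for i in range(k):
        X_i = X.copy()
        X_i[:, i] = X_t[:, i]       # exchange the i-th parameter with X~
        y_i = f(X_i)                # the k additional evaluation sets
        # Jansen estimators (one common variant of the pick-freeze formulas):
        S[i] = (var_y - 0.5 * np.mean((y_t - y_i) ** 2)) / var_y
        S_T[i] = 0.5 * np.mean((y - y_i) ** 2) / var_y
    return S, S_T

# Usage on f(x) = x1 + 2*x2 with X ~ U[0,1]^2, where analytically
# S_1 = 1/5 and S_2 = 4/5 (no interactions, so S_Ti = S_i):
S, S_T = sobol_pick_freeze(lambda X: X[:, 0] + 2.0 * X[:, 1],
                           k=2, N=100_000, rng=np.random.default_rng(0))
```

Note that the loop performs exactly k extra batches of N evaluations on top of the two base sets, matching the (k + 2)N cost stated above.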
By applying the surrogate model, we get the approximations Ŝ_i^{n,N} and Ŝ_Ti^{n,N} of the estimators Ŝ_i^N and Ŝ_Ti^N and therefore estimators for the Sobol indices S_i and S_Ti. The approximation quality of Ŝ_i^{n,N} and Ŝ_Ti^{n,N} depends strongly not only on the numbers of sample points n and N, but also on the distribution of the sample points. The number N does not decisively influence the computational costs because it only concerns the number of evaluations of the relatively cheap surrogate model. In contrast, it is important to keep n low. The aim of applying the adaptive sampling strategy is to find a sampling strategy with which the number of required function calls can be reduced.
In the following subsections we observe how the approximation quality of the surrogate-based estimators of the Sobol sensitivity values can be improved by applying the adaptive sampling strategy. Each time, one hundred simulations with N = 100 000 are performed for different n and displayed in boxplots.
4.1 Computation of sensitivity indices of f(x) = x_1 sin x_1 + x_2^2 sin x_2 + x_3^3 sin x_3
As an example, we investigate the results of a function whose input parameters have very different influences on the output and therefore different sensitivity values. The observed function is f(x) = x_1 sin x_1 + x_2^2 sin x_2 + x_3^3 sin x_3 within the interval [0, 15]^3. The third input parameter x_3 has the highest influence on the output variance, with a sensitivity value S_3 = 0.9938, while the other two parameters have sensitivities close to zero (S_1 = 0.0002 and S_2 = 0.0061). The total effects have the same values as S_1, S_2, and S_3 because no interaction effects appear.
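A crude Monte Carlo check on this function reproduces the dominance of x_3. This is a sketch using the same pick-freeze idea with Jansen's first-order estimator; the sample size and seed are arbitrary, and the paper itself obtains these values differently.

```python
import numpy as np

def f(X):
    # f(x) = x1*sin(x1) + x2^2*sin(x2) + x3^3*sin(x3) on [0, 15]^3
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return x1 * np.sin(x1) + x2**2 * np.sin(x2) + x3**3 * np.sin(x3)

rng = np.random.default_rng(1)
k, N = 3, 200_000
A = 15.0 * rng.random((N, k))        # data set X, uniform on [0, 15]^3
B = 15.0 * rng.random((N, k))        # independent data set X~
y_A, y_B = f(A), f(B)
V = np.var(np.concatenate([y_A, y_B]))

S = np.empty(k)
for i in range(k):
    AB = A.copy()
    AB[:, i] = B[:, i]               # exchange the i-th parameter
    # Jansen first-order estimator:
    S[i] = (V - 0.5 * np.mean((y_B - f(AB)) ** 2)) / V
# S should lie near the reference values S_1 = 0.0002, S_2 = 0.0061,
# S_3 = 0.9938 quoted above, up to Monte Carlo error.
```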
From the results of the quality criteria depicted in Figure 14, a strong improvement from applying the adaptive sampling method is visible. The convergence of the CoD_CV and the CoD_val to one is significantly faster in the case where the adaptive sampling method is applied. Also, from n = 125 onward, the variance of the results is lower.
Now we compare the sensitivity values calculated from the different surrogate models with the true sensitivity values. The results of all first-order and total effects are shown in Figure 15. Here we come to the same conclusion: the adaptive sampling method visibly improves the results. For the two low sensitivity values, the differences between the results of the two methods are particularly visible for the total sensitivity values. It requires more training points to detect a low total sensitivity value if only an LHS data set is used. For the third input parameter, the results even show a clear improvement for the first-order sensitivity value.

Computation of the sensitivity indices of the model with decaying influence of the variables
In this subsection, we have a look at the example from Subsection 3.4. In the analysis of Expression 14 for k = 2 it was visible that the adaptive sampling method improves the approximation quality at the latest from n = 100. The results for the sensitivity analysis, shown in Figure 16, lead to the same conclusion: from n = 100, the surrogate approximations of the sensitivity values are all closer to the true sensitivity values (dashed line) and have a lower variance. However, the results do not differ extremely.
For the three-dimensional case, the CoD_val indicates a better approximation with the use of the adaptive sampling method from n = 400, while the results of the CoD_CV advise using an LHS data set. Besides, a sufficient approximation quality is not reached with n = 500 sample points. This is also visible for the approximation of the sensitivity values plotted in Figure 17, where the deviation from the true value for n = 500 is not always acceptable. For all sensitivity values except S_1, the approximation based on the adaptive sampling method is closer to the true values, though sometimes earlier and sometimes later than n = 400. An improvement for high n with the use of the adaptive sampling method is definitely visible; however, it is not clearly recognizable from the related quality criteria.
From the results of these two sensitivity examples it can be concluded that the distance-based LOO error method improves the approximation of the true sensitivity values. How large the improvement is and from which n it starts depends on the respective example. It is clear that if the CoD_CV advises the use of the distance-based LOO error method, the sensitivity values based on this method are also more accurate. The other way around is not necessarily true, as seen in the second example.

Closure
In this paper we presented a new global adaptive sampling method and investigated it with different analytical examples. The use of the distance-based LOO error method is advantageous because it identifies the areas where the uncertainty of the approximation is highest; new samples are taken correspondingly. Furthermore, the calculation time remains low because the advantages of the LS-SVR are exploited. On the other hand, the method can only be used for approximations with the LS-SVR, which can be regarded as a disadvantage.
During the numerical analysis it was visible that the distance-based LOO error method can be used to improve the approximation quality. The method is particularly suitable for functions with critical behavior in some specific regions. The more the variation of the function changes over the whole region, the more visibly helpful the global adaptive sampling method is. Often it makes sense to first capture the global trend of the observed function and only later start with the adaptive sampling method to improve the approximation in local regions. One critical point in our investigation is that the CoD_CV and the CoD_val do not always show the same results. In fact, the CoD_val is more trustworthy; however, it cannot be used for real-world applications because it requires too many sample points. For these problems, we have no other choice than to trust the results of the CoD_CV.
When observing the calculation of the sensitivity values, an improvement through the use of the adaptive method was also visible. This effect was more distinct when the results of the CoD_CV matched the results of the CoD_val. Therefore, the CoD_CV is a sufficient criterion for the usefulness of applying the distance-based LOO error method. In general, it is advised to apply the observed sampling strategy to global sensitivity investigations.
Finally, it should be mentioned that in this paper we compared the adaptive sampling method only with the LHS method because it provides the most suitable space-filling distribution. However, in an engineering application it does not make sense to sample new, larger LHS data sets if the previous number of sample points is not sufficient. In this situation an adaptive sampling method has to be considered anyway. Therefore, further research will compare the distance-based LOO error method with other global adaptive sampling methods and will apply the method to different engineering examples.