sequd.pybatdoe
sequd.pybatdoe.batch_grid
- class pybatdoe.batch_grid.GridSearch(para_space, max_runs=100, estimator=None, cv=None, scoring=None, refit=True, n_jobs=None, random_state=0, verbose=False)[source]
Implementation of grid search.
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
n_jobs (int or None, optional, default=None) – Number of jobs to run in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. See the joblib documentation for details.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import GridSearch
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=0, shuffle=True)
>>> clf = GridSearch(ParaSpace, max_runs=100, estimator=estimator, cv=cv, scoring=None,
...                  n_jobs=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
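The example above tunes only continuous parameters. The integer and categorical types described under para_space use the Mapping key instead; a minimal sketch (the RandomForestClassifier and the parameter names below are illustrative choices, not part of the original example, and cv is reused from above):
>>> from sklearn.ensemble import RandomForestClassifier
>>> ParaSpaceMixed = {'n_estimators': {'Type': 'integer', 'Mapping': [50, 100, 200, 400]},
...                   'max_depth': {'Type': 'integer', 'Mapping': [2, 4, 6, 8]},
...                   'criterion': {'Type': 'categorical', 'Mapping': ['gini', 'entropy']}}
>>> clf = GridSearch(ParaSpaceMixed, max_runs=50, estimator=RandomForestClassifier(random_state=0),
...                  cv=cv, refit=False, random_state=0)
>>> clf.fit(iris.data, iris.target)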
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
Note that grid search is not recommended for high-dimensional hyperparameter tuning: because the number of grid points is capped by the user-specified max_runs, the resulting grid may be badly distributed over the search space.
sequd.pybatdoe.batch_rand
- class pybatdoe.batch_rand.RandSearch(para_space, max_runs=100, estimator=None, cv=None, scoring=None, refit=True, n_jobs=None, random_state=0, verbose=False)[source]
Implementation of Random Search.
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
n_jobs (int or None, optional, default=None) – Number of jobs to run in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. See the joblib documentation for details.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import RandSearch
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=0, shuffle=True)
>>> clf = RandSearch(ParaSpace, max_runs=100, estimator=estimator, cv=cv, scoring=None,
...                  n_jobs=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
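Once fit has been called, the attributes listed above can be read directly from the search object; a short illustration continuing the example above (exact values depend on the data, the cv splits, and the random seed):
>>> clf.best_params_              # dict with the best hyperparameter values found
>>> clf.best_score_               # best average cv score among the evaluated trials
>>> clf.search_time_consumed_     # seconds used for the whole search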
sequd.pybatdoe.batch_lhs
- class pybatdoe.batch_lhs.LHSSearch(para_space, max_runs=100, estimator=None, cv=None, scoring=None, refit=True, n_jobs=None, random_state=0, verbose=False)[source]
Implementation of Latin Hypercube Sampling.
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
n_jobs (int or None, optional, default=None) – Number of jobs to run in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. See the joblib documentation for details.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import LHSSearch
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=0, shuffle=True)
>>> clf = LHSSearch(ParaSpace, max_runs=100, estimator=estimator, cv=cv, scoring=None,
...                 n_jobs=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
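When an estimator is supplied and refit=True, the best configuration is refitted on the whole dataset and exposed as best_estimator_. A brief sketch based on the example above, with refit=True swapped in for illustration:
>>> clf = LHSSearch(ParaSpace, max_runs=100, estimator=estimator, cv=cv,
...                 refit=True, random_state=0)
>>> clf.fit(iris.data, iris.target)
>>> preds = clf.best_estimator_.predict(iris.data)   # refitted sklearn estimator
>>> clf.refit_time_                                   # seconds used for the refit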
sequd.pybatdoe.batch_sobol
- class pybatdoe.batch_sobol.SobolSearch(para_space, max_runs=100, estimator=None, cv=None, scoring=None, refit=True, n_jobs=None, random_state=0, verbose=False)[source]
Implementation of Sobol sequence sampling.
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
n_jobs (int or None, optional, default=None) – Number of jobs to run in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. See the joblib documentation for details.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import SobolSearch
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=0, shuffle=True)
>>> clf = SobolSearch(ParaSpace, max_runs=100, estimator=estimator, cv=cv, scoring=None,
...                   n_jobs=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
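The scoring argument follows the sklearn convention, so a metric name string or a scorer built with make_scorer can be passed instead of None. A hedged sketch reusing the objects from the example above (it assumes the scorer is forwarded unchanged to the sklearn scoring machinery):
>>> from sklearn.metrics import make_scorer, f1_score
>>> scorer = make_scorer(f1_score, average='macro')
>>> clf = SobolSearch(ParaSpace, max_runs=100, estimator=estimator, cv=cv,
...                   scoring=scorer, refit=False, random_state=0)
>>> clf.fit(iris.data, iris.target)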
sequd.pybatdoe.batch_ud
- class pybatdoe.batch_ud.UDSearch(para_space, max_runs=100, max_search_iter=100, estimator=None, cv=None, scoring=None, refit=True, n_jobs=None, random_state=0, verbose=False)[source]
Implementation of Uniform Design.
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
max_search_iter (int, optional, default=100) – The maximum number of iterations used to generate the uniform design or the augmented uniform design.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
n_jobs (int or None, optional, default=None) – Number of jobs to run in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. See the joblib documentation for details.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import UDSearch
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=0, shuffle=True)
>>> clf = UDSearch(ParaSpace, max_runs=100, max_search_iter=100, estimator=estimator, cv=cv,
...                scoring=None, n_jobs=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
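As noted above, cv accepts any sklearn cross-validation object and n_jobs=-1 evaluates trials on all CPUs. A sketch with StratifiedKFold swapped in (an illustrative choice for classification, not part of the original example), reusing ParaSpace and estimator from above:
>>> from sklearn.model_selection import StratifiedKFold
>>> skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
>>> clf = UDSearch(ParaSpace, max_runs=100, max_search_iter=100, estimator=estimator,
...                cv=skf, scoring=None, n_jobs=-1, refit=False, random_state=0)
>>> clf.fit(iris.data, iris.target)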
sequd.pybayopt
sequd.pybayopt.bayopt_gpei
- class pybayopt.bayopt_gpei.GPEIOPT(para_space, max_runs=100, time_out=10, estimator=None, cv=None, scoring=None, refit=True, random_state=0, verbose=False)[source]
Interface to Gaussian Process Expected Improvement (GP-EI) Bayesian optimization.
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
time_out (float, optional, default=10) – The time-out threshold (in seconds) for generating the next run.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import GPEIOPT
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=0, shuffle=True)
>>> clf = GPEIOPT(ParaSpace, max_runs=100, time_out=10, estimator=estimator, cv=cv,
...               scoring=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
sequd.pybayopt.bayopt_smac
- class pybayopt.bayopt_smac.SMACOPT(para_space, max_runs=100, estimator=None, cv=None, scoring=None, refit=True, random_state=0, verbose=False)[source]
Interface to SMAC (Bayesian optimization).
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import SMACOPT
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=0, shuffle=True)
>>> clf = SMACOPT(ParaSpace, max_runs=100, estimator=estimator, cv=cv,
...               scoring=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
sequd.pybayopt.bayopt_tpe
- class pybayopt.bayopt_tpe.TPEOPT(para_space, max_runs=100, estimator=None, cv=None, scoring=None, refit=True, random_state=0, verbose=False)[source]
Interface to the Tree-structured Parzen Estimator (TPE) via Hyperopt (Bayesian optimization).
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import TPEOPT
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=0, shuffle=True)
>>> clf = TPEOPT(ParaSpace, max_runs=100, estimator=estimator, cv=cv,
...              scoring=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
sequd.pysequd
sequd.pysequd.seqrand
- class pysequd.seqrand.SeqRand(para_space, n_runs_per_stage=20, max_runs=100, n_jobs=None, estimator=None, cv=None, scoring=None, refit=True, random_state=0, verbose=False)[source]
Implementation of sequential random search.
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
n_runs_per_stage (int, optional, default=20) – A positive integer specifying the number of runs evaluated at each stage.
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
n_jobs (int or None, optional, default=None) – Number of jobs to run in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. See the joblib documentation for details.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import SeqRand
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=1, shuffle=True)
>>> clf = SeqRand(ParaSpace, n_runs_per_stage=20, max_runs=100, n_jobs=None, estimator=estimator,
...               cv=cv, scoring=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
- fit(x, y=None)[source]
Run fit with all sets of parameters.
- Parameters
x (array, shape = [n_samples, n_features]) – input variables.
y (array, shape = [n_samples] or [n_samples, n_output], optional) – target variable.
sequd.pysequd.snto
- class pysequd.snto.SNTO(para_space, n_runs_per_stage=20, max_runs=100, max_search_iter=100, n_jobs=None, estimator=None, cv=None, scoring=None, refit=True, random_state=0, verbose=False)[source]
Implementation of SNTO (sequential number-theoretic optimization).
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
n_runs_per_stage (int, optional, default=20) – A positive integer specifying the number of levels used in generating the uniform design at each stage.
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
max_search_iter (int, optional, default=100) – The maximum number of iterations used to generate the uniform design or the augmented uniform design.
n_jobs (int or None, optional, default=None) – Number of jobs to run in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. See the joblib documentation for details.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import SNTO
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=1, shuffle=True)
>>> clf = SNTO(ParaSpace, n_runs_per_stage=20, max_runs=100, max_search_iter=100, n_jobs=None,
...            estimator=estimator, cv=cv, scoring=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
- fit(x, y=None)[source]
Run fit with all sets of parameters.
- Parameters
x (array, shape = [n_samples, n_features]) – input variables.
y (array, shape = [n_samples] or [n_samples, n_output], optional) – target variable.
sequd.pysequd.sequd
- class pysequd.sequd.SeqUD(para_space, n_runs_per_stage=20, max_runs=100, max_search_iter=100, n_jobs=None, estimator=None, cv=None, scoring=None, refit=True, random_state=0, verbose=False)[source]
Implementation of sequential uniform design.
- Parameters
para_space (dict or list of dictionaries) –
It has three types:
- Continuous:
Specify Type as continuous, and include the keys Range (a two-element list giving the lower and upper bounds) and Wrapper (a callable used to transform the sampled values).
- Integer:
Specify Type as integer, and include the key Mapping (a list of all the sorted integer candidates).
- Categorical:
Specify Type as categorical, and include the key Mapping (a list of all the possible categories).
n_runs_per_stage (int, optional, default=20) – A positive integer specifying the number of levels used in generating the uniform design at each stage.
max_runs (int, optional, default=100) – The maximum number of trials to be evaluated. When this value is reached, the algorithm stops.
max_search_iter (int, optional, default=100) – The maximum number of iterations used to generate the uniform design or the augmented uniform design.
n_jobs (int or None, optional, default=None) – Number of jobs to run in parallel. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. See the joblib documentation for details.
estimator (estimator object) – This is assumed to implement the scikit-learn estimator interface.
cv (cross-validation method, an sklearn object) – e.g., StratifiedKFold or KFold.
scoring (string, callable, list/tuple, dict or None, optional, default=None) – An sklearn-style scoring function. If None, the estimator’s default scorer (if available) is used. See the sklearn documentation for details.
refit (boolean or string, optional, default=True) – Controls whether to refit the estimator on the whole dataset using the best parameters found.
random_state (int, optional, default=0) – The random seed for optimization.
verbose (boolean, optional, default=False) – Controls whether the search history is printed.
>>> import numpy as np
>>> from sklearn import svm
>>> from sklearn import datasets
>>> from sequd import SeqUD
>>> from sklearn.model_selection import KFold
>>> iris = datasets.load_iris()
>>> ParaSpace = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...              'gamma': {'Type': 'continuous', 'Range': [-16, 6], 'Wrapper': np.exp2}}
>>> estimator = svm.SVC()
>>> cv = KFold(n_splits=5, random_state=1, shuffle=True)
>>> clf = SeqUD(ParaSpace, n_runs_per_stage=20, max_runs=100, max_search_iter=100, n_jobs=None,
...             estimator=estimator, cv=cv, scoring=None, refit=False, random_state=0, verbose=False)
>>> clf.fit(iris.data, iris.target)
- Variables
best_score_ (float) – The best average cv score among the evaluated trials.
best_params_ (dict) – Parameter setting that achieved best_score_.
best_estimator_ (sklearn estimator) – The estimator refitted with best_params_. Not available if estimator=None or refit=False.
search_time_consumed_ (float) – Seconds used for the whole search procedure.
refit_time_ (float) – Seconds used for refitting the best model on the whole dataset. Not available if estimator=None or refit=False.
- fit(x, y=None)[source]
Run fit with all sets of parameters.
- Parameters
x (array, shape = [n_samples, n_features]) – input variables.
y (array, shape = [n_samples] or [n_samples, n_output], optional) – target variable.
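fit expects x of shape [n_samples, n_features] and, optionally, y of shape [n_samples] or [n_samples, n_output]. A hedged regression sketch (SVR and the diabetes dataset are illustrative choices, not taken from the original documentation):
>>> import numpy as np
>>> from sklearn.svm import SVR
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import KFold
>>> from sequd import SeqUD
>>> diabetes = load_diabetes()
>>> ParaSpaceReg = {'C': {'Type': 'continuous', 'Range': [-6, 16], 'Wrapper': np.exp2},
...                 'epsilon': {'Type': 'continuous', 'Range': [-8, 0], 'Wrapper': np.exp2}}
>>> reg = SeqUD(ParaSpaceReg, n_runs_per_stage=20, max_runs=100, estimator=SVR(),
...             cv=KFold(n_splits=5, shuffle=True, random_state=0), refit=True, random_state=0)
>>> reg.fit(diabetes.data, diabetes.target)
>>> reg.best_estimator_.predict(diabetes.data[:5])   # refitted model, first five predictions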