Category | Model | Parameters | predict | predict_proba | partial_fit | sample_weight | score | n_jobs | Evaluation | Methods/attributes | Official docs | |||
linear_model | LinearRegression | fit_intercept : boolean, optional, default True normalize : boolean, optional, default False |
Yes | Yes | Yes | coef_ intercept_ |
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html | |||||||
Ridge | alpha fit_intercept : boolean normalize : boolean, optional, default False |
Yes | Yes | Yes | coef_ intercept_ |
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html | ||||||||
Lasso | alpha fit_intercept : boolean normalize : boolean, optional, default False positive : bool, optional, whether to force the coefficients to be positive |
Yes | No | No | Yes | coef_ intercept_ |
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html | |||||||
ElasticNet | alpha : float, optional l1_ratio : float fit_intercept : bool normalize : boolean, optional, default False |
Yes | No | No | No | Yes | http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html | |||||||
LogisticRegression | penalty : str, ‘l1’ or ‘l2’, default: ‘l2’ dual : bool, default: False; if True, solve the dual problem C : float, default: 1.0, inverse of the regularization strength multi_class : str, {‘ovr’, ‘multinomial’} ‘ovr’: one-vs-rest; ‘multinomial’: use a direct multiclass (multinomial) formulation |
Yes | Yes | No | Yes | Yes | http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html | |||||||
tree | DecisionTreeClassifier | criterion : string, optional (default=”gini”) splitter : string, optional (default=”best”) max_depth : int or None, optional (default=None) min_samples_split : int, float, optional (default=2) min_samples_leaf : int, float, optional (default=1) min_weight_fraction_leaf : float, optional (default=0.) max_features : int, float, string or None, optional (default=None) max_leaf_nodes : int or None, optional (default=None) min_impurity_decrease : float, optional (default=0.) |
Yes | Yes | No | Yes | Yes | classes_ n_classes_ clf.feature_importances_ # feature importance: the normalized total reduction of the split criterion brought by each feature (Gini importance) # e.g. print(pd.DataFrame(list(zip(data.columns,clf.feature_importances_)))) max_features_ n_features_ |
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html | |||||
DecisionTreeRegressor | criterion : string, optional (default=”mse”) splitter : string, optional (default=”best”), or 'random' max_depth : int or None, optional (default=None) min_samples_split : int, float, optional (default=2) max_features : int, float, string or None, optional (default=None) min_weight_fraction_leaf : float, optional (default=0.) min_samples_leaf : int, float, optional (default=1) max_leaf_nodes : int or None, optional (default=None) min_impurity_decrease : float, optional (default=0.) |
Yes | No | No | Yes | Yes | feature_importances_ : also known as the Gini importance max_features_ n_features_ n_outputs_ |
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html | It seems y can be multi-dimensional? | |||||
naive_bayes | GaussianNB | priors : array-like, shape (n_classes,) | Yes | Yes | Yes | Yes | Yes | class_prior_ class_count_ theta_ # mean of each feature per class sigma_ # variance of each feature per class |
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html | |||||
MultinomialNB | alpha : float, optional (default=1.0) Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing) fit_prior : (default=True) # learn the class priors from the training data class_prior : (default=None) # prior probabilities of the classes |
Yes | Yes | Yes | Yes | Yes | class_log_prior_ feature_log_prob_ class_count_ feature_count_ |
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html | ||||||
BernoulliNB | alpha : float, optional (default=1.0) binarize : float or None, optional (default=0.0); if None, the features are assumed to be binary already; if a float, it is the threshold used to binarize the features fit_prior : boolean, optional (default=True) class_prior : (default=None) # prior probabilities of the classes |
Yes | Yes | Yes | Yes | Yes | intercept_ coef_ class_count_ feature_count_ |
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html | ||||||
neighbors | KNeighborsClassifier | n_neighbors : int, optional (default = 5) weights : str or callable, optional (default = ‘uniform’) ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally. ‘distance’ : closer neighbors of a query point will have a greater influence than neighbors which are further away. [callable] : accepts an array of distances, returns an array of the same shape containing the weights. algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional leaf_size : int, optional (default = 30) p : integer, optional (default = 2): power parameter of the Minkowski metric metric : string or callable, default ‘minkowski’ |
Yes | Yes | No | No | Yes | Yes | http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html | |||||
KNeighborsRegressor | n_neighbors : int, optional (default = 5) weights : str or callable, optional (default = ‘uniform’) ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally. ‘distance’ : closer neighbors of a query point will have a greater influence than neighbors which are further away. [callable] : accepts an array of distances, returns an array of the same shape containing the weights. algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional leaf_size : int, optional (default = 30) p : integer, optional (default = 2): power parameter of the Minkowski metric metric : string or callable, default ‘minkowski’ |
Yes | No | No | No | Yes | Yes | http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html | ||||||
svm | LinearSVC | penalty : string, ‘l1’ or ‘l2’ (default=’l2’) loss : ‘hinge’ or ‘squared_hinge’ C : float, optional (default=1.0) Penalty parameter dual (default=True): if True, solve the dual problem multi_class : string, ‘ovr’ or ‘crammer_singer’ (default=’ovr’) fit_intercept : boolean, optional (default=True), whether to fit the intercept, i.e. the constant term of the decision function; False is appropriate when the data is already centered verbose |
Yes | No | No | Yes | Yes | No | coef_: weight of each feature intercept_ |
http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html | ||||
LinearSVR | C : float, optional (default=1.0): Penalty parameter loss: ‘epsilon_insensitive’ or ‘squared_epsilon_insensitive’ epsilon : float, optional (default=0.1), parameter of the loss function; a suitable value depends on the scale of y; if unsure, set epsilon=0 dual : bool, (default=True), whether to solve the dual problem. Prefer dual=False when n_samples > n_features. fit_intercept : boolean, optional (default=True) |
Yes | No | No | Yes | Yes | No | coef_: weight of each feature intercept_ |
http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVR.html | |||||
SVC | C : float, optional (default=1.0): Penalty parameter kernel : ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable degree: used by ‘poly’ gamma: used by ‘rbf’, ‘poly’ and ‘sigmoid’; default 1/n_features (gamma='auto'; newer releases default to 'scale') coef0: used by ‘poly’ and ‘sigmoid’ probability: whether to enable probability estimates decision_function_shape : ‘ovo’ (one-vs-one) or ‘ovr’ (one-vs-rest) |
Yes | requires probability=True |
No | Yes | Yes | No | decision_function(X) # distance of the samples X to the separating hyperplane model.support_vectors_ # get support vectors model.support_ # get indices of support vectors model.n_support_ # get number of support vectors for each class model.decision_function_shape='ovo' dec = model.decision_function([[0.6,1]]) |
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html | |||||
SVR | C : float, optional (default=1.0) epsilon : float, optional (default=0.1), parameter of the loss function; if unsure, set it to 0 (verified in practice, this matters a lot!) kernel : ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable degree: used by ‘poly’ gamma: used by ‘rbf’, ‘poly’ and ‘sigmoid’; default 1/n_features (gamma='auto'; newer releases default to 'scale') coef0: used by ‘poly’ and ‘sigmoid’ shrinking : whether to use the shrinking heuristic |
Yes | No | No | Yes | Yes | No | support_ support_vectors_ |
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html | |||||
neural_network | MLPClassifier | hidden_layer_sizes: default (100,) activation : {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default ‘relu’ ‘identity’, f(x) = x ‘logistic’, f(x) = 1 / (1 + exp(-x)). ‘tanh’, f(x) = tanh(x). ‘relu’, f(x) = max(0, x) solver : {‘lbfgs’, ‘sgd’, ‘adam’}, default ‘adam’ ‘lbfgs’ a quasi-Newton method. ‘sgd’ stochastic gradient descent. ‘adam’ a stochastic gradient-based optimizer alpha: default 0.0001, L2 penalty batch_size : int, optional, default ‘auto’ learning_rate : {‘constant’, ‘invscaling’, ‘adaptive’}, only used when solver='sgd' momentum : float, default 0.9, only used when solver='sgd' learning_rate_init : double, optional, default 0.001 power_t : double, optional, default 0.5 shuffle : bool, optional, default True, only used when solver=’sgd’ or ‘adam’ warm_start : bool, optional, default False early_stopping : bool, default False; if True, 10% of the training data is set aside as a validation set; only used when solver=’sgd’ or ‘adam’ validation_fraction : float, optional, default 0.1, fraction of the training data held out for validation; only used when early_stopping=True beta_1, beta_2, epsilon: only used when solver=’adam’ |
Yes | softmax | Yes | No | Yes | No | loss_ | clf.classes_ # label of each class clf.n_iter_ clf.n_outputs_ clf.n_layers_ # [coef.shape for coef in clf.coefs_] clf.out_activation_ # activation function of the output layer clf.loss_ # the current loss computed with the loss function clf.coefs_ # weight matrices of the n_layers - 1 layers clf.intercepts_ # bias vectors of the n_layers - 1 layers |
http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html | |||
MLPRegressor | hidden_layer_sizes: default (100,) activation : {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default ‘relu’ ‘identity’, f(x) = x ‘logistic’, f(x) = 1 / (1 + exp(-x)). ‘tanh’, f(x) = tanh(x). ‘relu’, f(x) = max(0, x) solver : {‘lbfgs’, ‘sgd’, ‘adam’}, default ‘adam’ ‘lbfgs’ a quasi-Newton method. ‘sgd’ stochastic gradient descent. ‘adam’ a stochastic gradient-based optimizer alpha: default 0.0001, L2 penalty batch_size : int, optional, default ‘auto’ learning_rate : {‘constant’, ‘invscaling’, ‘adaptive’}, only used when solver='sgd' momentum : float, default 0.9, only used when solver='sgd' learning_rate_init : double, optional, default 0.001 power_t : double, optional, default 0.5 shuffle : bool, optional, default True, only used when solver=’sgd’ or ‘adam’ warm_start : bool, optional, default False early_stopping : bool, default False; if True, 10% of the training data is set aside as a validation set; only used when solver=’sgd’ or ‘adam’ validation_fraction : float, optional, default 0.1, fraction of the training data held out for validation; only used when early_stopping=True beta_1, beta_2, epsilon: only used when solver=’adam’ |
Yes | No | No | No | Yes | No | clf.loss_ clf.coefs_ # weight matrices of the n_layers - 1 layers clf.intercepts_ # bias vectors of the n_layers - 1 layers |
http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html | |||||
ensemble | AdaBoostClassifier | base_estimator : object, (default=DecisionTreeClassifier) n_estimators : integer, optional (default=50) learning_rate : float, optional (default=1.) algorithm : {‘SAMME’, ‘SAMME.R’}, (default=’SAMME.R’) ‘SAMME.R’ typically converges faster with lower error, but the base_estimator must support class probability estimates |
Yes | Yes | No | Yes | Yes | No | estimators_ # all fitted base classifiers estimator_weights_ # weight of each base classifier estimator_errors_ # error of each base classifier feature_importances_ # feature importances; available only if the base_estimator supports them decision_function(X) After each boosting iteration: staged_predict(X) Return staged predictions for X. staged_predict_proba(X) Predict class probabilities for X. staged_score(X, y[, sample_weight]) Return staged scores for X, y. |
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html | ||||
AdaBoostRegressor | base_estimator : object n_estimators : integer, optional (default=50) learning_rate : float, optional (default=1.) loss : {‘linear’, ‘square’, ‘exponential’}, optional (default=’linear’) |
Yes | No | No | Yes | Yes | No | staged_predict(X) Return staged predictions for X. staged_score(X, y[, sample_weight]) Return staged scores for X, y. |
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostRegressor.html | |||||
GradientBoostingClassifier (GBDT) |
loss : {‘deviance’, ‘exponential’}, optional (default=’deviance’) ‘deviance’: log loss, L(Y, P(Y|X)) = -log P(Y|X); ‘exponential’: exponential loss learning_rate : float, optional (default=0.1) n_estimators : int (default=100) max_depth : integer, optional (default=3), keep the trees shallow criterion : string, optional (default=”friedman_mse”) min_samples_split : int, float, optional (default=2) min_samples_leaf : int, float, optional (default=1) min_weight_fraction_leaf : float, optional (default=0.) subsample : float, optional (default=1.0) max_features : int, float, string or None, optional (default=None) If int, then consider max_features features at each split. If float, then max_features is a fraction of n_features. If “auto”, then max_features=sqrt(n_features). If “sqrt”, then max_features=sqrt(n_features). If “log2”, then max_features=log2(n_features). If None, then max_features=n_features. max_leaf_nodes : int or None, optional (default=None) warm_start : bool, default: False |
Yes | Yes | No | Yes | Yes | No | feature_importances_ oob_improvement_: improvement in loss on the out-of-bag samples relative to the previous iteration (only available when subsample < 1.0) train_score_: loss of the model on the in-bag (training) samples at each iteration |
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html | |||||
GradientBoostingRegressor (GBRT) |
loss : {‘ls’, ‘lad’, ‘huber’, ‘quantile’}, optional (default=’ls’) learning_rate : float, optional (default=0.1) n_estimators : int (default=100) max_depth : integer, optional (default=3) criterion : string, optional (default=”friedman_mse”) min_samples_split : int, float, optional (default=2) min_samples_leaf : int, float, optional (default=1) min_weight_fraction_leaf : float, optional (default=0.) max_features : same as above max_leaf_nodes : int or None, optional (default=None) alpha : float (default=0.9), only used when loss='huber' or loss='quantile' warm_start : bool, default: False |
Yes | No | No | Yes | Yes | feature_importances_ oob_improvement_: improvement in loss on the out-of-bag samples relative to the previous iteration (only available when subsample < 1.0) train_score_: loss of the model on the in-bag (training) samples at each iteration |
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html | ||||||
RandomForestClassifier | n_estimators criterion : string, optional (default=”gini”) max_features : same as above max_depth min_samples_split min_samples_leaf min_weight_fraction_leaf max_leaf_nodes bootstrap : (default=True), whether to draw bootstrap samples when building each tree oob_score : bool (default=False), whether to score the model on the out-of-bag samples, i.e. the samples left out of each tree's bootstrap sample, which act as a built-in test set warm_start : bool, optional (default=False) |
Yes | Yes | No | Yes | Yes | No | http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html | ||||||
RandomForestRegressor | n_estimators : integer, optional (default=10) criterion : string, optional (default=”mse”) max_features : same as above max_depth min_samples_split min_samples_leaf min_weight_fraction_leaf max_leaf_nodes bootstrap : (default=True), whether to draw bootstrap samples when building each tree oob_score : bool (default=False), whether to score the model on the out-of-bag samples, i.e. the samples left out of each tree's bootstrap sample, which act as a built-in test set warm_start : bool, optional (default=False) |
estimators_ feature_importances_ oob_score_ |
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html | |||||||||||
[Cross-validation] Introduction | http://www.guofei.site/2017/10/03/crossvalidation.html | |||||||||||||
[Model evaluation] Python implementation | http://www.guofei.site/2017/11/23/ModelEvaluation1.html | |||||||||||||
GMM | can output probability values | |||||||||||||
Relationship between ICA and GMM |
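
The capability columns of the table (predict_proba, partial_fit, sample_weight, n_jobs) can also be checked against whichever scikit-learn version is installed. The sketch below is only one way to do it: `capabilities` is a hypothetical helper name, the four estimators are an arbitrary selection, and the toy data for the GMM is made up for illustration. It inspects each estimator for the relevant methods and parameters, then fits a `GaussianMixture` to show the probability output mentioned in the GMM row.

```python
import inspect

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def capabilities(estimator):
    """Hypothetical helper: report which capability columns an estimator supports."""
    fit_params = inspect.signature(estimator.fit).parameters
    return {
        "predict_proba": hasattr(estimator, "predict_proba"),
        "partial_fit": hasattr(estimator, "partial_fit"),
        "sample_weight": "sample_weight" in fit_params,  # accepted by fit()?
        "n_jobs": "n_jobs" in estimator.get_params(),    # constructor parameter?
    }


# Print one capability summary per estimator (arbitrary selection).
for est in (LinearRegression(), GaussianNB(), DecisionTreeClassifier(), GaussianMixture()):
    print(type(est).__name__, capabilities(est))

# GMM: soft cluster assignments via predict_proba on made-up 1-D data.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
proba = gmm.predict_proba(X)  # shape (n_samples, n_components); each row sums to 1
```

Checking `hasattr` on an instance rather than on the class matters for estimators such as SVC, where predict_proba depends on a constructor argument (probability=True).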