Python machine learning library sklearn: logistic regression

Date: 2018-01-05 08:04:30

Full-Stack Engineer Development Handbook (author: 栾鹏)

Python data mining tutorial series

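For background on logistic regression itself, see
http://blog.csdn.net/luanpeng825485697/article/details/78957577

This post only covers how to use logistic regression in sklearn for classification. The example dataset has three classes.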

# -*- coding: UTF-8 -*-

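import numpy as np # fast operations on structured arrays
import pandas as pd # data analysis and processing


# Sample dataset: column 1 is x1, column 2 is x2, column 3 is the class label (three classes)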
data=[
        [-2.68420713,0.32660731,0],
        [-2.71539062,-0.16955685,0],
        [-2.88981954,-0.13734561,0],
        [-2.7464372,-0.31112432,0],
        [-2.72859298,0.33392456,0],
        [-2.27989736,0.74778271,0],
        [-2.82089068,-0.08210451,0],
        [-2.62648199,0.17040535,0],
        [-2.88795857,-0.57079803,0],
        [-2.67384469,-0.1066917,0],
        [0.28479459,0.68543919,1],
        [0.93241075,0.31919809,1],
        [0.46406132,0.50418983,1],
        [0.18096721,-0.82560394,1],
        [0.08713449,0.07539039,1],
        [0.64043675,-0.41732348,1],
        [0.09522371,0.28389121,1],
        [-0.75146714,-1.00110751,1],
        [0.04329778,0.22895691,1],
        [-0.01019007,-0.72057487,1],
        [2.53172698,-0.01184224,2],
        [2.41407223,-0.57492506,2],
        [2.61648461,0.34193529,2],
        [2.97081495,-0.18112569,2],
        [2.34975798,-0.04188255,2],
        [3.39687992,0.54716805,2],
        [2.51938325,-1.19135169,2],
        [2.9320051,0.35237701,2],
        [2.31967279,-0.24554817,2],
        [2.91813423,0.78038063,2]
]


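# Build the X and y arrays (plain ndarrays; recent sklearn versions reject np.matrix input)
dataArr = np.array(data)
X = dataArr[:, 0:2]            # feature matrix, shape (30, 2)
y = dataArr[:, 2].astype(int)  # class labels, 1-D array of shape (30,)


# ======== Logistic regression ========

from sklearn import metrics
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
print('Logistic regression model:\n', model)
# Predict with the fitted model
predicted = model.predict(X)     # predicted class labels
answer = model.predict_proba(X)  # predicted class probabilities, one column per class
print(answer)


import matplotlib.pyplot as plt


# Scatter plot: x1 on the horizontal axis, x2 on the vertical axis, color encodes the class.
# 'x' markers are the labeled samples, '.' markers are the model's predictions.
plt.scatter(X[:, 0], X[:, 1], c=y, marker='x')          # labeled samples
plt.scatter(X[:, 0], X[:, 1], c=predicted, marker='.')  # predictions

# Axis labels
plt.xlabel("x")
plt.ylabel("y")

# Show the figure
plt.show()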

The predicted class probabilities are printed as:

[[ 0.75006544 0.24686616 0.0030684 ]
[ 0.73597037 0.26097482 0.00305481]
[ 0.74081347 0.25684717 0.00233936]
[ 0.73246435 0.26458573 0.00294992]
[ 0.75125337 0.24587933 0.0028673 ]
[ 0.75186168 0.24266112 0.0054772 ]
[ 0.74100526 0.25641086 0.00258388]
[ 0.74402105 0.25258387 0.00339507]
[ 0.72803399 0.26953261 0.0024334 ]
[ 0.73682679 0.25993684 0.00323637]
[ 0.248196 0.43832458 0.31347942]
[ 0.07829857 0.40097958 0.52072185]
[ 0.18183426 0.44049617 0.37766957]
[ 0.17684251 0.51689268 0.30626481]
[ 0.26669421 0.47192764 0.26137815]
[ 0.09830059 0.4590965 0.44260291]
[ 0.28037082 0.45994995 0.25967923]
[ 0.45101417 0.46343922 0.08554661]
[ 0.29360361 0.46175548 0.24464091]
[ 0.23779162 0.51568553 0.24652285]
[ 0.00421 0.29300658 0.70278342]
[ 0.00410393 0.30984436 0.68605171]
[ 0.00420341 0.28229601 0.71350058]
[ 0.0019332 0.28990883 0.70815797]
[ 0.00562358 0.2987508 0.69562561]
[ 0.00129789 0.26857347 0.73012864]
[ 0.00271401 0.32187608 0.67540991]
[ 0.00252868 0.27696612 0.7205052 ]
[ 0.00546016 0.30471069 0.68982915]
[ 0.00305065 0.26630024 0.73064911]]
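
Each row above holds the probabilities for classes 0, 1 and 2, in that order, and the predicted class is the one with the largest probability. As a small illustration, assuming only NumPy, the sketch below applies `argmax` to four rows copied from the output above (samples 1, 12, 18 and 21). The variable names are illustrative and not part of the original script.

```python
import numpy as np

# Four rows copied from the probability output above
# (samples 1, 12, 18 and 21; their true classes are 0, 1, 1, 2).
proba = np.array([
    [0.75006544, 0.24686616, 0.0030684],
    [0.07829857, 0.40097958, 0.52072185],
    [0.45101417, 0.46343922, 0.08554661],
    [0.00421,    0.29300658, 0.70278342],
])
true_labels = np.array([0, 1, 1, 2])

# Column index of the largest probability in each row. Because the classes
# are 0, 1, 2, the column index is also the class label.
predicted_labels = proba.argmax(axis=1)

print(predicted_labels.tolist())                    # [0, 2, 1, 2]
print((predicted_labels == true_labels).tolist())   # [True, False, True, True]
```

Sample 12 is the one that ends up in the wrong class: its largest probability (about 0.52) falls on class 2 although its label is 1.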

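The resulting plot:
[Figure: scatter plot of the samples ('x' markers) and the model's predictions ('.' markers), colored by class]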

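The class probabilities and the plot show that, with so little data, some points of the middle class (class 1) are misclassified and the class probabilities are not sharply separated. For example, the sample [0.93241075, 0.31919809] has label 1 but receives a probability of about 0.40 for class 1 and 0.52 for class 2.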
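
One way to put numbers on this observation is to score the fitted model with `sklearn.metrics`. The sketch below is a minimal example, not the original author's code: it refits on the same 30 samples, reports training accuracy and the confusion matrix, and repeats the fit with a larger `C` (weaker regularization). The value `C=100.0`, the `max_iter=1000` setting and the `results` dictionary are assumptions chosen for illustration. Exact numbers depend on the installed sklearn version and its default solver, so they may not match the output printed above.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# The 30 samples from the post (x1, x2); labels are 10 each of class 0, 1, 2.
X = np.array([
    [-2.68420713, 0.32660731], [-2.71539062, -0.16955685],
    [-2.88981954, -0.13734561], [-2.7464372, -0.31112432],
    [-2.72859298, 0.33392456], [-2.27989736, 0.74778271],
    [-2.82089068, -0.08210451], [-2.62648199, 0.17040535],
    [-2.88795857, -0.57079803], [-2.67384469, -0.1066917],
    [0.28479459, 0.68543919], [0.93241075, 0.31919809],
    [0.46406132, 0.50418983], [0.18096721, -0.82560394],
    [0.08713449, 0.07539039], [0.64043675, -0.41732348],
    [0.09522371, 0.28389121], [-0.75146714, -1.00110751],
    [0.04329778, 0.22895691], [-0.01019007, -0.72057487],
    [2.53172698, -0.01184224], [2.41407223, -0.57492506],
    [2.61648461, 0.34193529], [2.97081495, -0.18112569],
    [2.34975798, -0.04188255], [3.39687992, 0.54716805],
    [2.51938325, -1.19135169], [2.9320051, 0.35237701],
    [2.31967279, -0.24554817], [2.91813423, 0.78038063],
])
y = np.repeat([0, 1, 2], 10)

results = {}
for C in (1.0, 100.0):  # 1.0 is sklearn's default; 100.0 is an illustrative assumption
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X, y)
    predicted = model.predict(X)
    proba = model.predict_proba(X)

    accuracy = metrics.accuracy_score(y, predicted)        # fraction of correct labels
    confusion = metrics.confusion_matrix(y, predicted)     # rows: true class, columns: predicted class
    mean_top_proba = proba.max(axis=1).mean()              # average confidence in the predicted class
    results[C] = (accuracy, confusion, mean_top_proba)

    print("C =", C)
    print("  training accuracy:", accuracy)
    print("  confusion matrix:\n", confusion)
    print("  mean top-class probability:", mean_top_proba)
```

Off-diagonal entries in the confusion matrix count the misclassified samples per class. These are training-set scores, so they say nothing about how the model generalizes; with more data, a held-out split would be the usual next step.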

Author: luanpeng825485697, posted 2018/1/5 10:04:30