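This article walks through an example of basic k-means, a clustering algorithm, implemented in Python. It is shared here for reference; the details are as follows: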
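Basic k-means: choose k initial centroids, where k is a user-specified parameter, namely the desired number of clusters. In each iteration, every point is assigned to its nearest centroid, and the set of points assigned to the same centroid forms a cluster. The centroid of each cluster is then updated from the points assigned to it. The assignment and update steps are repeated until the centroids no longer change appreciably.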
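The assign-and-update loop described above can be sketched as a small general-purpose function. This is only a minimal illustration, not the script from this article: the names `squared_distance`, `kmeans`, `max_iter` and `tol` are hypothetical, `tol` is taken to be a threshold on the squared distance a centroid moves, and leaving an empty cluster's centroid where it was is an assumption of the sketch.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, centers, max_iter=50, tol=1e-6):
    """Basic k-means on 2-D points; returns (centers, groups)."""
    centers = [list(c) for c in centers]
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its points.
        # Assumption: an empty cluster keeps its previous centroid.
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                new_centers.append([sum(p[0] for p in g) / float(len(g)),
                                    sum(p[1] for p in g) / float(len(g))])
            else:
                new_centers.append(c)
        # Stop once no centroid has moved more than tol (squared distance).
        shift = max(squared_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, groups


# Two well-separated groups of three points each, with two starting centroids.
points = [[1, 1], [1, 2], [2, 1], [9, 9], [9, 10], [10, 9]]
centers, groups = kmeans(points, [[0, 0], [10, 10]])
```

On this toy input the first pass already separates the two groups, and the second pass leaves the centroids unchanged, so the loop stops well before `max_iter`.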
# coding=utf-8
import matplotlib.pyplot as pl

# Read the points from the file "points", one "x#y" pair per line
with open("points", "r") as pointfile:
    points = [[int(eachpoint.split("#")[0]), int(eachpoint.split("#")[1])] for eachpoint in pointfile if eachpoint.strip()]

# Specify three initial centroids and draw them as black circles
currentcenter1 = [20,190]; currentcenter2 = [120,90]; currentcenter3 = [170,140]
pl.plot([currentcenter1[0]], [currentcenter1[1]], 'ok')
pl.plot([currentcenter2[0]], [currentcenter2[1]], 'ok')
pl.plot([currentcenter3[0]], [currentcenter3[1]], 'ok')

# Record the trajectory of each cluster's centroid across iterations
center1 = [currentcenter1]; center2 = [currentcenter2]; center3 = [currentcenter3]

# The three clusters
group1 = []; group2 = []; group3 = []

for runtime in range(50):
    group1 = []; group2 = []; group3 = []
    for eachpoint in points:
        # Compute the squared distance from the point to each of the three centroids
        distance1 = pow(eachpoint[0]-currentcenter1[0],2) + pow(eachpoint[1]-currentcenter1[1],2)
        distance2 = pow(eachpoint[0]-currentcenter2[0],2) + pow(eachpoint[1]-currentcenter2[1],2)
        distance3 = pow(eachpoint[0]-currentcenter3[0],2) + pow(eachpoint[1]-currentcenter3[1],2)
        # Assign the point to the cluster of its nearest centroid
        mindis = min(distance1,distance2,distance3)
        if mindis == distance1:
            group1.append(eachpoint)
        elif mindis == distance2:
            group2.append(eachpoint)
        else:
            group3.append(eachpoint)
    # After all points are assigned, update each cluster's centroid.
    # An empty cluster keeps its previous centroid, which avoids division by zero.
    if group1:
        currentcenter1 = [sum([eachpoint[0] for eachpoint in group1])/float(len(group1)), sum([eachpoint[1] for eachpoint in group1])/float(len(group1))]
    if group2:
        currentcenter2 = [sum([eachpoint[0] for eachpoint in group2])/float(len(group2)), sum([eachpoint[1] for eachpoint in group2])/float(len(group2))]
    if group3:
        currentcenter3 = [sum([eachpoint[0] for eachpoint in group3])/float(len(group3)), sum([eachpoint[1] for eachpoint in group3])/float(len(group3))]
    # Record this update of the centroids
    center1.append(currentcenter1)
    center2.append(currentcenter2)
    center3.append(currentcenter3)

# Plot all points, using color to mark the cluster each point belongs to
pl.plot([eachpoint[0] for eachpoint in group1], [eachpoint[1] for eachpoint in group1], 'or')
pl.plot([eachpoint[0] for eachpoint in group2], [eachpoint[1] for eachpoint in group2], 'oy')
pl.plot([eachpoint[0] for eachpoint in group3], [eachpoint[1] for eachpoint in group3], 'og')

# Draw the trajectory of each cluster's centroid as a black line
for center in [center1,center2,center3]:
    pl.plot([eachcenter[0] for eachcenter in center], [eachcenter[1] for eachcenter in center], 'k')
pl.show()
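Running the script displays a plot in which the points of the three clusters are drawn in red, yellow and green, and black lines trace the path each centroid followed across the iterations.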
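The script reads its input from a file named points, with one x#y pair of integers per line, but the article does not include that file. To try the script, something like the following sketch could generate test data in the same format. The function name `write_sample_points`, the three blob centers, the spread of 12 and the count per cluster are all made-up values for illustration, not data from the article.

```python
import os
import random
import tempfile


def write_sample_points(path="points", per_cluster=30, seed=0):
    """Write integer 2-D points, one "x#y" pair per line; returns the line count."""
    rng = random.Random(seed)
    # Hypothetical blob centers, chosen only to give three visible groups.
    blobs = [(40, 170), (120, 70), (180, 150)]
    lines = []
    for cx, cy in blobs:
        for _ in range(per_cluster):
            x = int(rng.gauss(cx, 12))
            y = int(rng.gauss(cy, 12))
            lines.append("%d#%d" % (x, y))
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)


# Demonstration in a temporary directory, so no file in the current one is overwritten.
path = os.path.join(tempfile.mkdtemp(), "points")
count = write_sample_points(path)

# Read the file back using the same "x#y" parsing as the script.
with open(path, "r") as f:
    parsed = [[int(line.split("#")[0]), int(line.split("#")[1])] for line in f]
```

Calling `write_sample_points("points")` from the directory where the script runs would create the input file it expects.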
Hopefully this article is of some help with your Python programming.