An example of a machine learning model for clustering
Clustering is an unsupervised learning method used in machine learning. Unsupervised means there is no target variable to predict; instead, we look for patterns in the data itself. In the following examples, I use K-means clustering. It places a chosen number of centers in the data and assigns every data point to its closest center. It then moves each center to the average of the points assigned to it and repeats these two steps until the assignments stop changing. Each data point is then labeled with the cluster of its center. I use 2, 3, and 4 cluster centers to see how well the data groups around each set of centers.
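To make the two repeated steps concrete, here is a minimal base-R sketch of Lloyd's algorithm, the textbook form of K-means. This is not exactly what R's kmeans() runs by default: kmeans() uses the Hartigan-Wong algorithm unless algorithm = "Lloyd" is given. simple_kmeans() is a hypothetical helper written only for illustration. The usage lines use the built-in iris measurements, standardized with scale(), as a stand-in for trainTransformed.

```r
# Minimal sketch of Lloyd's algorithm for k-means.
# simple_kmeans() is a hypothetical helper written for illustration only.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Initialize: pick k observations at random as the starting centers
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- integer(nrow(x))  # all zeros, so the first pass always updates
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every point to every center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_assignment <- max.col(-d, ties.method = "first")  # index of the closest center
    # Stop when no point changes cluster
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment
    # Update step: move each center to the mean of the points assigned to it
    for (j in seq_len(k)) {
      if (any(assignment == j)) {
        centers[j, ] <- colMeans(x[assignment == j, , drop = FALSE])
      }
    }
  }
  list(cluster = assignment, centers = centers, iterations = iter)
}

# Usage on the built-in iris measurements, standardized as a stand-in for trainTransformed
set.seed(1)
fit <- simple_kmeans(scale(iris[, 1:4]), k = 3)
table(iris$Species, fit$cluster)
```

Because the starting centers are random, different seeds can give different cluster numbers or, occasionally, a different final grouping.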
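The table below shows the transformed training data (trainTransformed) with the cluster each observation is assigned to when 2 centers are used. The first column is the observation's row number in the original iris data.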
TwoClusters<-kmeans(trainTransformed[,-5],centers=2)
Clusterdata<-trainTransformed
Clusterdata$Cluster<-as.factor(TwoClusters$cluster)
knitr::kable(Clusterdata)%>%
kableExtra::kable_styling("striped")%>%
kableExtra::scroll_box(width = "100%",height="300px")
Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | Cluster |
---|---|---|---|---|---|---|
1 | -0.9564672 | 0.9771265 | -1.3544131 | -1.3132658 | setosa | 1 |
4 | -1.6037007 | 0.0955086 | -1.2964772 | -1.3132658 | setosa | 1 |
5 | -1.0859139 | 1.1975310 | -1.3544131 | -1.3132658 | setosa | 1 |
6 | -0.5681271 | 1.8587444 | -1.1806053 | -1.0485537 | setosa | 1 |
8 | -1.0859139 | 0.7567220 | -1.2964772 | -1.3132658 | setosa | 1 |
9 | -1.8625941 | -0.3453004 | -1.3544131 | -1.3132658 | setosa | 1 |
10 | -1.2153606 | 0.0955086 | -1.2964772 | -1.4456218 | setosa | 1 |
11 | -0.5681271 | 1.4179355 | -1.2964772 | -1.3132658 | setosa | 1 |
12 | -1.3448073 | 0.7567220 | -1.2385412 | -1.3132658 | setosa | 1 |
15 | -0.0503404 | 2.0791489 | -1.4702849 | -1.3132658 | setosa | 1 |
17 | -0.5681271 | 1.8587444 | -1.4123490 | -1.0485537 | setosa | 1 |
18 | -0.9564672 | 0.9771265 | -1.3544131 | -1.1809098 | setosa | 1 |
19 | -0.1797871 | 1.6383400 | -1.1806053 | -1.1809098 | setosa | 1 |
20 | -0.9564672 | 1.6383400 | -1.2964772 | -1.1809098 | setosa | 1 |
21 | -0.5681271 | 0.7567220 | -1.1806053 | -1.3132658 | setosa | 1 |
22 | -0.9564672 | 1.4179355 | -1.2964772 | -1.0485537 | setosa | 1 |
26 | -1.0859139 | -0.1248959 | -1.2385412 | -1.3132658 | setosa | 1 |
28 | -0.8270205 | 0.9771265 | -1.2964772 | -1.3132658 | setosa | 1 |
31 | -1.3448073 | 0.0955086 | -1.2385412 | -1.3132658 | setosa | 1 |
32 | -0.5681271 | 0.7567220 | -1.2964772 | -1.0485537 | setosa | 1 |
33 | -0.8270205 | 2.2995534 | -1.2964772 | -1.4456218 | setosa | 1 |
35 | -1.2153606 | 0.0955086 | -1.2964772 | -1.3132658 | setosa | 1 |
36 | -1.0859139 | 0.3159131 | -1.4702849 | -1.3132658 | setosa | 1 |
37 | -0.4386805 | 0.9771265 | -1.4123490 | -1.3132658 | setosa | 1 |
40 | -0.9564672 | 0.7567220 | -1.2964772 | -1.3132658 | setosa | 1 |
42 | -1.7331474 | -1.6677272 | -1.4123490 | -1.1809098 | setosa | 1 |
44 | -1.0859139 | 0.9771265 | -1.2385412 | -0.7838417 | setosa | 1 |
45 | -0.9564672 | 1.6383400 | -1.0647335 | -1.0485537 | setosa | 1 |
47 | -0.9564672 | 1.6383400 | -1.2385412 | -1.3132658 | setosa | 1 |
50 | -1.0859139 | 0.5363176 | -1.3544131 | -1.3132658 | setosa | 1 |
54 | -0.4386805 | -1.6677272 | 0.1519209 | 0.1426504 | versicolor | 2 |
55 | 0.8557865 | -0.5657048 | 0.4995364 | 0.4073624 | versicolor | 2 |
58 | -1.2153606 | -1.4473227 | -0.2536306 | -0.2544177 | versicolor | 2 |
59 | 0.9852331 | -0.3453004 | 0.4995364 | 0.1426504 | versicolor | 2 |
60 | -0.8270205 | -0.7861093 | 0.0939849 | 0.2750064 | versicolor | 2 |
61 | -1.0859139 | -2.3289407 | -0.1377587 | -0.2544177 | versicolor | 2 |
62 | 0.0791063 | -0.1248959 | 0.2677927 | 0.4073624 | versicolor | 2 |
63 | 0.2085530 | -1.8881317 | 0.1519209 | -0.2544177 | versicolor | 2 |
64 | 0.3379997 | -0.3453004 | 0.5574723 | 0.2750064 | versicolor | 2 |
66 | 1.1146798 | 0.0955086 | 0.3836645 | 0.2750064 | versicolor | 2 |
67 | -0.3092338 | -0.1248959 | 0.4416005 | 0.4073624 | versicolor | 2 |
68 | -0.0503404 | -0.7861093 | 0.2098568 | -0.2544177 | versicolor | 2 |
69 | 0.4674464 | -1.8881317 | 0.4416005 | 0.4073624 | versicolor | 2 |
70 | -0.3092338 | -1.2269183 | 0.0939849 | -0.1220617 | versicolor | 2 |
72 | 0.3379997 | -0.5657048 | 0.1519209 | 0.1426504 | versicolor | 2 |
76 | 0.9852331 | -0.1248959 | 0.3836645 | 0.2750064 | versicolor | 2 |
77 | 1.2441265 | -0.5657048 | 0.6154082 | 0.2750064 | versicolor | 2 |
79 | 0.2085530 | -0.3453004 | 0.4416005 | 0.4073624 | versicolor | 2 |
80 | -0.1797871 | -1.0065138 | -0.1377587 | -0.2544177 | versicolor | 2 |
81 | -0.4386805 | -1.4473227 | 0.0360490 | -0.1220617 | versicolor | 2 |
84 | 0.2085530 | -0.7861093 | 0.7892160 | 0.5397184 | versicolor | 2 |
85 | -0.5681271 | -0.1248959 | 0.4416005 | 0.4073624 | versicolor | 2 |
86 | 0.2085530 | 0.7567220 | 0.4416005 | 0.5397184 | versicolor | 2 |
87 | 1.1146798 | 0.0955086 | 0.5574723 | 0.4073624 | versicolor | 2 |
89 | -0.3092338 | -0.1248959 | 0.2098568 | 0.1426504 | versicolor | 2 |
90 | -0.4386805 | -1.2269183 | 0.1519209 | 0.1426504 | versicolor | 2 |
92 | 0.3379997 | -0.1248959 | 0.4995364 | 0.2750064 | versicolor | 2 |
93 | -0.0503404 | -1.0065138 | 0.1519209 | 0.0102944 | versicolor | 2 |
96 | -0.1797871 | -0.1248959 | 0.2677927 | 0.0102944 | versicolor | 2 |
97 | -0.1797871 | -0.3453004 | 0.2677927 | 0.1426504 | versicolor | 2 |
102 | -0.0503404 | -0.7861093 | 0.7892160 | 0.9367864 | virginica | 2 |
106 | 2.2797001 | -0.1248959 | 1.6582548 | 1.2014985 | virginica | 2 |
107 | -1.2153606 | -1.2269183 | 0.4416005 | 0.6720744 | virginica | 2 |
109 | 1.1146798 | -1.2269183 | 1.1947674 | 0.8044304 | virginica | 2 |
110 | 1.7619133 | 1.1975310 | 1.3685752 | 1.7309225 | virginica | 2 |
111 | 0.8557865 | 0.3159131 | 0.7892160 | 1.0691425 | virginica | 2 |
112 | 0.7263398 | -0.7861093 | 0.9050878 | 0.9367864 | virginica | 2 |
115 | -0.0503404 | -0.5657048 | 0.7892160 | 1.5985665 | virginica | 2 |
116 | 0.7263398 | 0.3159131 | 0.9050878 | 1.4662105 | virginica | 2 |
119 | 2.4091467 | -1.0065138 | 1.8320626 | 1.4662105 | virginica | 2 |
121 | 1.3735732 | 0.3159131 | 1.1368315 | 1.4662105 | virginica | 2 |
124 | 0.5968931 | -0.7861093 | 0.6733441 | 0.8044304 | virginica | 2 |
126 | 1.7619133 | 0.3159131 | 1.3106393 | 0.8044304 | virginica | 2 |
127 | 0.4674464 | -0.5657048 | 0.6154082 | 0.8044304 | virginica | 2 |
128 | 0.3379997 | -0.1248959 | 0.6733441 | 0.8044304 | virginica | 2 |
130 | 1.7619133 | -0.1248959 | 1.1947674 | 0.5397184 | virginica | 2 |
131 | 2.0208067 | -0.5657048 | 1.3685752 | 0.9367864 | virginica | 2 |
132 | 2.6680401 | 1.6383400 | 1.5423830 | 1.0691425 | virginica | 2 |
133 | 0.7263398 | -0.5657048 | 1.0788956 | 1.3338545 | virginica | 2 |
134 | 0.5968931 | -0.5657048 | 0.7892160 | 0.4073624 | virginica | 2 |
135 | 0.3379997 | -1.0065138 | 1.0788956 | 0.2750064 | virginica | 2 |
137 | 0.5968931 | 0.7567220 | 1.0788956 | 1.5985665 | virginica | 2 |
139 | 0.2085530 | -0.1248959 | 0.6154082 | 0.8044304 | virginica | 2 |
143 | -0.0503404 | -0.7861093 | 0.7892160 | 0.9367864 | virginica | 2 |
144 | 1.2441265 | 0.3159131 | 1.2527034 | 1.4662105 | virginica | 2 |
145 | 1.1146798 | 0.5363176 | 1.1368315 | 1.7309225 | virginica | 2 |
146 | 1.1146798 | -0.1248959 | 0.8471519 | 1.4662105 | virginica | 2 |
147 | 0.5968931 | -1.2269183 | 0.7312801 | 0.9367864 | virginica | 2 |
149 | 0.4674464 | 0.7567220 | 0.9630237 | 1.4662105 | virginica | 2 |
150 | 0.0791063 | -0.1248959 | 0.7892160 | 0.8044304 | virginica | 2 |
The graphs below show how the data is split when 2 clusters are used. Cluster 1 contains exactly the 30 setosa observations, so that species is separated perfectly. However, all 60 versicolor and virginica observations are merged into cluster 2, so two clusters are not enough to tell those species apart.
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Cluster))+geom_point(alpha=0.5)
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Cluster))+geom_point(alpha=0.5)+facet_wrap(~Species)
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Species))+geom_point(alpha=0.5) +
geom_point(data=as.data.frame(TwoClusters$centers), aes(color="Cluster center"), size=5) + theme(legend.title = element_blank())+ggtitle("Iris Cluster Demonstration")
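kmeans() starts from randomly chosen centers. Unless a seed is set, the cluster numbers can change from run to run, and occasionally so can the clusters themselves. A cross-tabulation of species against cluster also checks the match more precisely than scanning the full table. The sketch below is an assumption-laden illustration. It uses the built-in iris data, standardized with scale(), in place of trainTransformed. The seed and nstart = 25 are illustrative choices, not values from the original analysis.

```r
# Sketch: reproducible k-means with 2 centers, then compare clusters with species.
# The built-in iris data (all 150 rows, standardized) stands in for trainTransformed.
set.seed(42)                    # arbitrary seed, used only so reruns give the same labels
x <- scale(iris[, 1:4])         # Species is left out, as in the original code
fit2 <- kmeans(x, centers = 2, nstart = 25)  # keep the best of 25 random starts

# Rows are species, columns are cluster numbers; each cell counts observations
tab2 <- table(Species = iris$Species, Cluster = fit2$cluster)
print(tab2)
```

With nstart greater than 1, kmeans() keeps the run with the lowest total within-cluster sum of squares. This makes a poor random start less likely to decide the result.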
This table shows the same data when 3 clusters are used.
ThreeClusters<-kmeans(trainTransformed[,-5],centers=3)
Clusterdata<-trainTransformed
Clusterdata$Cluster<-as.factor(ThreeClusters$cluster)
knitr::kable(Clusterdata)%>%
kableExtra::kable_styling("striped")%>%
kableExtra::scroll_box(width = "100%",height="300px")
Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | Cluster |
---|---|---|---|---|---|---|
1 | -0.9564672 | 0.9771265 | -1.3544131 | -1.3132658 | setosa | 3 |
4 | -1.6037007 | 0.0955086 | -1.2964772 | -1.3132658 | setosa | 3 |
5 | -1.0859139 | 1.1975310 | -1.3544131 | -1.3132658 | setosa | 3 |
6 | -0.5681271 | 1.8587444 | -1.1806053 | -1.0485537 | setosa | 3 |
8 | -1.0859139 | 0.7567220 | -1.2964772 | -1.3132658 | setosa | 3 |
9 | -1.8625941 | -0.3453004 | -1.3544131 | -1.3132658 | setosa | 3 |
10 | -1.2153606 | 0.0955086 | -1.2964772 | -1.4456218 | setosa | 3 |
11 | -0.5681271 | 1.4179355 | -1.2964772 | -1.3132658 | setosa | 3 |
12 | -1.3448073 | 0.7567220 | -1.2385412 | -1.3132658 | setosa | 3 |
15 | -0.0503404 | 2.0791489 | -1.4702849 | -1.3132658 | setosa | 3 |
17 | -0.5681271 | 1.8587444 | -1.4123490 | -1.0485537 | setosa | 3 |
18 | -0.9564672 | 0.9771265 | -1.3544131 | -1.1809098 | setosa | 3 |
19 | -0.1797871 | 1.6383400 | -1.1806053 | -1.1809098 | setosa | 3 |
20 | -0.9564672 | 1.6383400 | -1.2964772 | -1.1809098 | setosa | 3 |
21 | -0.5681271 | 0.7567220 | -1.1806053 | -1.3132658 | setosa | 3 |
22 | -0.9564672 | 1.4179355 | -1.2964772 | -1.0485537 | setosa | 3 |
26 | -1.0859139 | -0.1248959 | -1.2385412 | -1.3132658 | setosa | 3 |
28 | -0.8270205 | 0.9771265 | -1.2964772 | -1.3132658 | setosa | 3 |
31 | -1.3448073 | 0.0955086 | -1.2385412 | -1.3132658 | setosa | 3 |
32 | -0.5681271 | 0.7567220 | -1.2964772 | -1.0485537 | setosa | 3 |
33 | -0.8270205 | 2.2995534 | -1.2964772 | -1.4456218 | setosa | 3 |
35 | -1.2153606 | 0.0955086 | -1.2964772 | -1.3132658 | setosa | 3 |
36 | -1.0859139 | 0.3159131 | -1.4702849 | -1.3132658 | setosa | 3 |
37 | -0.4386805 | 0.9771265 | -1.4123490 | -1.3132658 | setosa | 3 |
40 | -0.9564672 | 0.7567220 | -1.2964772 | -1.3132658 | setosa | 3 |
42 | -1.7331474 | -1.6677272 | -1.4123490 | -1.1809098 | setosa | 3 |
44 | -1.0859139 | 0.9771265 | -1.2385412 | -0.7838417 | setosa | 3 |
45 | -0.9564672 | 1.6383400 | -1.0647335 | -1.0485537 | setosa | 3 |
47 | -0.9564672 | 1.6383400 | -1.2385412 | -1.3132658 | setosa | 3 |
50 | -1.0859139 | 0.5363176 | -1.3544131 | -1.3132658 | setosa | 3 |
54 | -0.4386805 | -1.6677272 | 0.1519209 | 0.1426504 | versicolor | 2 |
55 | 0.8557865 | -0.5657048 | 0.4995364 | 0.4073624 | versicolor | 1 |
58 | -1.2153606 | -1.4473227 | -0.2536306 | -0.2544177 | versicolor | 2 |
59 | 0.9852331 | -0.3453004 | 0.4995364 | 0.1426504 | versicolor | 1 |
60 | -0.8270205 | -0.7861093 | 0.0939849 | 0.2750064 | versicolor | 2 |
61 | -1.0859139 | -2.3289407 | -0.1377587 | -0.2544177 | versicolor | 2 |
62 | 0.0791063 | -0.1248959 | 0.2677927 | 0.4073624 | versicolor | 2 |
63 | 0.2085530 | -1.8881317 | 0.1519209 | -0.2544177 | versicolor | 2 |
64 | 0.3379997 | -0.3453004 | 0.5574723 | 0.2750064 | versicolor | 2 |
66 | 1.1146798 | 0.0955086 | 0.3836645 | 0.2750064 | versicolor | 1 |
67 | -0.3092338 | -0.1248959 | 0.4416005 | 0.4073624 | versicolor | 2 |
68 | -0.0503404 | -0.7861093 | 0.2098568 | -0.2544177 | versicolor | 2 |
69 | 0.4674464 | -1.8881317 | 0.4416005 | 0.4073624 | versicolor | 2 |
70 | -0.3092338 | -1.2269183 | 0.0939849 | -0.1220617 | versicolor | 2 |
72 | 0.3379997 | -0.5657048 | 0.1519209 | 0.1426504 | versicolor | 2 |
76 | 0.9852331 | -0.1248959 | 0.3836645 | 0.2750064 | versicolor | 1 |
77 | 1.2441265 | -0.5657048 | 0.6154082 | 0.2750064 | versicolor | 1 |
79 | 0.2085530 | -0.3453004 | 0.4416005 | 0.4073624 | versicolor | 2 |
80 | -0.1797871 | -1.0065138 | -0.1377587 | -0.2544177 | versicolor | 2 |
81 | -0.4386805 | -1.4473227 | 0.0360490 | -0.1220617 | versicolor | 2 |
84 | 0.2085530 | -0.7861093 | 0.7892160 | 0.5397184 | versicolor | 2 |
85 | -0.5681271 | -0.1248959 | 0.4416005 | 0.4073624 | versicolor | 2 |
86 | 0.2085530 | 0.7567220 | 0.4416005 | 0.5397184 | versicolor | 1 |
87 | 1.1146798 | 0.0955086 | 0.5574723 | 0.4073624 | versicolor | 1 |
89 | -0.3092338 | -0.1248959 | 0.2098568 | 0.1426504 | versicolor | 2 |
90 | -0.4386805 | -1.2269183 | 0.1519209 | 0.1426504 | versicolor | 2 |
92 | 0.3379997 | -0.1248959 | 0.4995364 | 0.2750064 | versicolor | 2 |
93 | -0.0503404 | -1.0065138 | 0.1519209 | 0.0102944 | versicolor | 2 |
96 | -0.1797871 | -0.1248959 | 0.2677927 | 0.0102944 | versicolor | 2 |
97 | -0.1797871 | -0.3453004 | 0.2677927 | 0.1426504 | versicolor | 2 |
102 | -0.0503404 | -0.7861093 | 0.7892160 | 0.9367864 | virginica | 2 |
106 | 2.2797001 | -0.1248959 | 1.6582548 | 1.2014985 | virginica | 1 |
107 | -1.2153606 | -1.2269183 | 0.4416005 | 0.6720744 | virginica | 2 |
109 | 1.1146798 | -1.2269183 | 1.1947674 | 0.8044304 | virginica | 1 |
110 | 1.7619133 | 1.1975310 | 1.3685752 | 1.7309225 | virginica | 1 |
111 | 0.8557865 | 0.3159131 | 0.7892160 | 1.0691425 | virginica | 1 |
112 | 0.7263398 | -0.7861093 | 0.9050878 | 0.9367864 | virginica | 1 |
115 | -0.0503404 | -0.5657048 | 0.7892160 | 1.5985665 | virginica | 1 |
116 | 0.7263398 | 0.3159131 | 0.9050878 | 1.4662105 | virginica | 1 |
119 | 2.4091467 | -1.0065138 | 1.8320626 | 1.4662105 | virginica | 1 |
121 | 1.3735732 | 0.3159131 | 1.1368315 | 1.4662105 | virginica | 1 |
124 | 0.5968931 | -0.7861093 | 0.6733441 | 0.8044304 | virginica | 1 |
126 | 1.7619133 | 0.3159131 | 1.3106393 | 0.8044304 | virginica | 1 |
127 | 0.4674464 | -0.5657048 | 0.6154082 | 0.8044304 | virginica | 1 |
128 | 0.3379997 | -0.1248959 | 0.6733441 | 0.8044304 | virginica | 1 |
130 | 1.7619133 | -0.1248959 | 1.1947674 | 0.5397184 | virginica | 1 |
131 | 2.0208067 | -0.5657048 | 1.3685752 | 0.9367864 | virginica | 1 |
132 | 2.6680401 | 1.6383400 | 1.5423830 | 1.0691425 | virginica | 1 |
133 | 0.7263398 | -0.5657048 | 1.0788956 | 1.3338545 | virginica | 1 |
134 | 0.5968931 | -0.5657048 | 0.7892160 | 0.4073624 | virginica | 1 |
135 | 0.3379997 | -1.0065138 | 1.0788956 | 0.2750064 | virginica | 2 |
137 | 0.5968931 | 0.7567220 | 1.0788956 | 1.5985665 | virginica | 1 |
139 | 0.2085530 | -0.1248959 | 0.6154082 | 0.8044304 | virginica | 1 |
143 | -0.0503404 | -0.7861093 | 0.7892160 | 0.9367864 | virginica | 2 |
144 | 1.2441265 | 0.3159131 | 1.2527034 | 1.4662105 | virginica | 1 |
145 | 1.1146798 | 0.5363176 | 1.1368315 | 1.7309225 | virginica | 1 |
146 | 1.1146798 | -0.1248959 | 0.8471519 | 1.4662105 | virginica | 1 |
147 | 0.5968931 | -1.2269183 | 0.7312801 | 0.9367864 | virginica | 2 |
149 | 0.4674464 | 0.7567220 | 0.9630237 | 1.4662105 | virginica | 1 |
150 | 0.0791063 | -0.1248959 | 0.7892160 | 0.8044304 | virginica | 1 |
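These graphs show how the data is split when 3 clusters are used rather than 2. Using 3 centers gives a more accurate split:

- Cluster 3 again contains all 30 setosa.
- Cluster 2 is mostly versicolor (23 of its 28 observations).
- Cluster 1 is mostly virginica (25 of its 32).

Only 12 of the 90 observations land in a cluster dominated by the other species.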
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Cluster))+geom_point(alpha=0.5)
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Cluster))+geom_point(alpha=0.5)+facet_wrap(~Species)
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Species))+geom_point(alpha=0.5) +
geom_point(data=as.data.frame(ThreeClusters$centers), aes(color="Cluster center"), size=5) + theme(legend.title = element_blank())+ggtitle("Iris Cluster Demonstration")
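One way to put a number on "more accurate" is cluster purity. Each cluster is credited with its most common species, and purity is the fraction of all observations that match their cluster's majority species. cluster_purity() is a hypothetical helper written for this sketch, and the standardized iris data again stands in for trainTransformed. Purity tends to rise as k grows: with one cluster per observation it reaches 1. It is therefore useful for comparing clusters against known labels, but not for choosing k on its own.

```r
# cluster_purity() is a hypothetical helper, not from a package.
# Each cluster is credited with its most common label; the score is the share
# of all observations whose label matches their cluster's majority label.
cluster_purity <- function(cluster, truth) {
  tab <- table(cluster, truth)
  sum(apply(tab, 1, max)) / length(cluster)
}

# Compare purity for 2, 3, and 4 centers on the standardized iris measurements
set.seed(2024)
x <- scale(iris[, 1:4])
purity <- sapply(2:4, function(k) {
  fit <- kmeans(x, centers = k, nstart = 25)
  cluster_purity(fit$cluster, iris$Species)
})
names(purity) <- paste0("k=", 2:4)
print(round(purity, 3))
```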
Finally, this table shows the data when 4 clusters are used.
FourClusters<-kmeans(trainTransformed[,-5],centers=4)
Clusterdata<-trainTransformed
Clusterdata$Cluster<-as.factor(FourClusters$cluster)
knitr::kable(Clusterdata)%>%
kableExtra::kable_styling("striped")%>%
kableExtra::scroll_box(width = "100%",height="300px")
Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | Cluster |
---|---|---|---|---|---|---|
1 | -0.9564672 | 0.9771265 | -1.3544131 | -1.3132658 | setosa | 4 |
4 | -1.6037007 | 0.0955086 | -1.2964772 | -1.3132658 | setosa | 4 |
5 | -1.0859139 | 1.1975310 | -1.3544131 | -1.3132658 | setosa | 4 |
6 | -0.5681271 | 1.8587444 | -1.1806053 | -1.0485537 | setosa | 4 |
8 | -1.0859139 | 0.7567220 | -1.2964772 | -1.3132658 | setosa | 4 |
9 | -1.8625941 | -0.3453004 | -1.3544131 | -1.3132658 | setosa | 4 |
10 | -1.2153606 | 0.0955086 | -1.2964772 | -1.4456218 | setosa | 4 |
11 | -0.5681271 | 1.4179355 | -1.2964772 | -1.3132658 | setosa | 4 |
12 | -1.3448073 | 0.7567220 | -1.2385412 | -1.3132658 | setosa | 4 |
15 | -0.0503404 | 2.0791489 | -1.4702849 | -1.3132658 | setosa | 4 |
17 | -0.5681271 | 1.8587444 | -1.4123490 | -1.0485537 | setosa | 4 |
18 | -0.9564672 | 0.9771265 | -1.3544131 | -1.1809098 | setosa | 4 |
19 | -0.1797871 | 1.6383400 | -1.1806053 | -1.1809098 | setosa | 4 |
20 | -0.9564672 | 1.6383400 | -1.2964772 | -1.1809098 | setosa | 4 |
21 | -0.5681271 | 0.7567220 | -1.1806053 | -1.3132658 | setosa | 4 |
22 | -0.9564672 | 1.4179355 | -1.2964772 | -1.0485537 | setosa | 4 |
26 | -1.0859139 | -0.1248959 | -1.2385412 | -1.3132658 | setosa | 4 |
28 | -0.8270205 | 0.9771265 | -1.2964772 | -1.3132658 | setosa | 4 |
31 | -1.3448073 | 0.0955086 | -1.2385412 | -1.3132658 | setosa | 4 |
32 | -0.5681271 | 0.7567220 | -1.2964772 | -1.0485537 | setosa | 4 |
33 | -0.8270205 | 2.2995534 | -1.2964772 | -1.4456218 | setosa | 4 |
35 | -1.2153606 | 0.0955086 | -1.2964772 | -1.3132658 | setosa | 4 |
36 | -1.0859139 | 0.3159131 | -1.4702849 | -1.3132658 | setosa | 4 |
37 | -0.4386805 | 0.9771265 | -1.4123490 | -1.3132658 | setosa | 4 |
40 | -0.9564672 | 0.7567220 | -1.2964772 | -1.3132658 | setosa | 4 |
42 | -1.7331474 | -1.6677272 | -1.4123490 | -1.1809098 | setosa | 2 |
44 | -1.0859139 | 0.9771265 | -1.2385412 | -0.7838417 | setosa | 4 |
45 | -0.9564672 | 1.6383400 | -1.0647335 | -1.0485537 | setosa | 4 |
47 | -0.9564672 | 1.6383400 | -1.2385412 | -1.3132658 | setosa | 4 |
50 | -1.0859139 | 0.5363176 | -1.3544131 | -1.3132658 | setosa | 4 |
54 | -0.4386805 | -1.6677272 | 0.1519209 | 0.1426504 | versicolor | 2 |
55 | 0.8557865 | -0.5657048 | 0.4995364 | 0.4073624 | versicolor | 1 |
58 | -1.2153606 | -1.4473227 | -0.2536306 | -0.2544177 | versicolor | 2 |
59 | 0.9852331 | -0.3453004 | 0.4995364 | 0.1426504 | versicolor | 1 |
60 | -0.8270205 | -0.7861093 | 0.0939849 | 0.2750064 | versicolor | 2 |
61 | -1.0859139 | -2.3289407 | -0.1377587 | -0.2544177 | versicolor | 2 |
62 | 0.0791063 | -0.1248959 | 0.2677927 | 0.4073624 | versicolor | 1 |
63 | 0.2085530 | -1.8881317 | 0.1519209 | -0.2544177 | versicolor | 2 |
64 | 0.3379997 | -0.3453004 | 0.5574723 | 0.2750064 | versicolor | 1 |
66 | 1.1146798 | 0.0955086 | 0.3836645 | 0.2750064 | versicolor | 1 |
67 | -0.3092338 | -0.1248959 | 0.4416005 | 0.4073624 | versicolor | 1 |
68 | -0.0503404 | -0.7861093 | 0.2098568 | -0.2544177 | versicolor | 2 |
69 | 0.4674464 | -1.8881317 | 0.4416005 | 0.4073624 | versicolor | 2 |
70 | -0.3092338 | -1.2269183 | 0.0939849 | -0.1220617 | versicolor | 2 |
72 | 0.3379997 | -0.5657048 | 0.1519209 | 0.1426504 | versicolor | 1 |
76 | 0.9852331 | -0.1248959 | 0.3836645 | 0.2750064 | versicolor | 1 |
77 | 1.2441265 | -0.5657048 | 0.6154082 | 0.2750064 | versicolor | 1 |
79 | 0.2085530 | -0.3453004 | 0.4416005 | 0.4073624 | versicolor | 1 |
80 | -0.1797871 | -1.0065138 | -0.1377587 | -0.2544177 | versicolor | 2 |
81 | -0.4386805 | -1.4473227 | 0.0360490 | -0.1220617 | versicolor | 2 |
84 | 0.2085530 | -0.7861093 | 0.7892160 | 0.5397184 | versicolor | 1 |
85 | -0.5681271 | -0.1248959 | 0.4416005 | 0.4073624 | versicolor | 1 |
86 | 0.2085530 | 0.7567220 | 0.4416005 | 0.5397184 | versicolor | 1 |
87 | 1.1146798 | 0.0955086 | 0.5574723 | 0.4073624 | versicolor | 1 |
89 | -0.3092338 | -0.1248959 | 0.2098568 | 0.1426504 | versicolor | 1 |
90 | -0.4386805 | -1.2269183 | 0.1519209 | 0.1426504 | versicolor | 2 |
92 | 0.3379997 | -0.1248959 | 0.4995364 | 0.2750064 | versicolor | 1 |
93 | -0.0503404 | -1.0065138 | 0.1519209 | 0.0102944 | versicolor | 2 |
96 | -0.1797871 | -0.1248959 | 0.2677927 | 0.0102944 | versicolor | 1 |
97 | -0.1797871 | -0.3453004 | 0.2677927 | 0.1426504 | versicolor | 1 |
102 | -0.0503404 | -0.7861093 | 0.7892160 | 0.9367864 | virginica | 1 |
106 | 2.2797001 | -0.1248959 | 1.6582548 | 1.2014985 | virginica | 3 |
107 | -1.2153606 | -1.2269183 | 0.4416005 | 0.6720744 | virginica | 2 |
109 | 1.1146798 | -1.2269183 | 1.1947674 | 0.8044304 | virginica | 1 |
110 | 1.7619133 | 1.1975310 | 1.3685752 | 1.7309225 | virginica | 3 |
111 | 0.8557865 | 0.3159131 | 0.7892160 | 1.0691425 | virginica | 3 |
112 | 0.7263398 | -0.7861093 | 0.9050878 | 0.9367864 | virginica | 1 |
115 | -0.0503404 | -0.5657048 | 0.7892160 | 1.5985665 | virginica | 1 |
116 | 0.7263398 | 0.3159131 | 0.9050878 | 1.4662105 | virginica | 3 |
119 | 2.4091467 | -1.0065138 | 1.8320626 | 1.4662105 | virginica | 3 |
121 | 1.3735732 | 0.3159131 | 1.1368315 | 1.4662105 | virginica | 3 |
124 | 0.5968931 | -0.7861093 | 0.6733441 | 0.8044304 | virginica | 1 |
126 | 1.7619133 | 0.3159131 | 1.3106393 | 0.8044304 | virginica | 3 |
127 | 0.4674464 | -0.5657048 | 0.6154082 | 0.8044304 | virginica | 1 |
128 | 0.3379997 | -0.1248959 | 0.6733441 | 0.8044304 | virginica | 1 |
130 | 1.7619133 | -0.1248959 | 1.1947674 | 0.5397184 | virginica | 3 |
131 | 2.0208067 | -0.5657048 | 1.3685752 | 0.9367864 | virginica | 3 |
132 | 2.6680401 | 1.6383400 | 1.5423830 | 1.0691425 | virginica | 3 |
133 | 0.7263398 | -0.5657048 | 1.0788956 | 1.3338545 | virginica | 1 |
134 | 0.5968931 | -0.5657048 | 0.7892160 | 0.4073624 | virginica | 1 |
135 | 0.3379997 | -1.0065138 | 1.0788956 | 0.2750064 | virginica | 1 |
137 | 0.5968931 | 0.7567220 | 1.0788956 | 1.5985665 | virginica | 3 |
139 | 0.2085530 | -0.1248959 | 0.6154082 | 0.8044304 | virginica | 1 |
143 | -0.0503404 | -0.7861093 | 0.7892160 | 0.9367864 | virginica | 1 |
144 | 1.2441265 | 0.3159131 | 1.2527034 | 1.4662105 | virginica | 3 |
145 | 1.1146798 | 0.5363176 | 1.1368315 | 1.7309225 | virginica | 3 |
146 | 1.1146798 | -0.1248959 | 0.8471519 | 1.4662105 | virginica | 3 |
147 | 0.5968931 | -1.2269183 | 0.7312801 | 0.9367864 | virginica | 1 |
149 | 0.4674464 | 0.7567220 | 0.9630237 | 1.4662105 | virginica | 3 |
150 | 0.0791063 | -0.1248959 | 0.7892160 | 0.8044304 | virginica | 1 |
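These graphs show how the data is split among 4 cluster centers. Because the data contains only 3 species, a fourth center does not add a meaningful group:

- Setosa stays almost entirely in cluster 4 (29 of 30).
- Versicolor is split between clusters 1 and 2.
- Virginica is split mainly between clusters 1 and 3.
- Cluster 1 mixes 18 versicolor with 14 virginica observations.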
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Cluster))+geom_point(alpha=0.5)
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Cluster))+geom_point(alpha=0.5)+facet_wrap(~Species)
ggplot(data=Clusterdata,mapping = aes(x=Sepal.Width,y=Petal.Width,color=Species))+geom_point(alpha=0.5) +
geom_point(data=as.data.frame(FourClusters$centers), aes(color="Cluster center"), size=5) + theme(legend.title = element_blank())+ggtitle("Iris Cluster Demonstration")
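The number of centers can also be examined without the species labels. A common heuristic is the elbow method:

1. Run k-means for a range of k.
2. Plot the total within-cluster sum of squares, reported as tot.withinss in the kmeans() result.
3. Look for the point where adding another center stops reducing it much.

This is a sketch on the standardized built-in iris data rather than trainTransformed. The range 1 to 8 and nstart = 25 are arbitrary choices.

```r
# Elbow method sketch: total within-cluster sum of squares for k = 1..8.
# The standardized built-in iris data stands in for trainTransformed.
set.seed(2024)
x <- scale(iris[, 1:4])
k_values <- 1:8
wss <- sapply(k_values, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Look for the "elbow" where the curve flattens out
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

The elbow is a judgment call rather than a strict rule. On data like iris, where two species overlap, the curve may bend at 2 or 3 rather than showing one sharp corner.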
I have demonstrated how clustering works on a small data set, but how can it be used in the real world? There are many scenarios where clustering is useful. In marketing, it is often used to group households or individuals who are similar to each other, so a business can better understand how to market certain products to them. Streaming services can do the same: a service such as Netflix or Hulu can group subscribers who tend to watch similar shows or movies and use those groups to make better recommendations. Clustering can also be used in the insurance industry. Insurance providers can group households that use their insurance in similar ways throughout the year, and then set monthly premiums for the whole group based on that shared behavior. Clustering is also useful in accounting, specifically in auditing, where it can help identify outliers. Values that do not fit well into any cluster prompt auditors to examine those numbers more closely and find out why they are so high or low.
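The auditing idea can be sketched as follows. Measure how far each observation sits from the center of its own cluster, then flag the most distant ones for review. This is an illustration under assumptions. The iris measurements stand in for real financial data. Both k = 3 and the 95th-percentile cutoff are hypothetical choices that an auditor would need to set for their own data.

```r
# Sketch: flag observations that sit unusually far from their own cluster center.
# iris stands in for real audit data; k = 3 and the cutoff are hypothetical choices.
set.seed(2024)
x <- scale(iris[, 1:4])
fit <- kmeans(x, centers = 3, nstart = 25)

# Euclidean distance from each row to the center of the cluster it was assigned to
assigned_centers <- fit$centers[fit$cluster, , drop = FALSE]
dist_to_center <- sqrt(rowSums((x - assigned_centers)^2))

# Hypothetical rule: treat the top 5% of distances as candidates for closer review
cutoff <- quantile(dist_to_center, 0.95)
flagged <- which(dist_to_center > cutoff)
print(flagged)
```

Flagged rows are only candidates for review, not confirmed errors. A large distance says the value is unusual relative to its group, not that it is wrong.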