
L.Point: Marketing by Customer Segment and a Recommender System

fiftyline 2023. 3. 13. 08:30

This is a write-up of the Lotte Members Big Data Competition held in July 2022.

 


Table of Contents

1. The importance of segmentation and personalization
      1-1. EDA
2. Variables used for customer clustering
      2-1. Variable creation: affiliate usage tendency
      2-2. Variable creation: weekly usage pattern
      2-3. Variable creation: product taste (RC1~6)
3. Customer clustering
      3-1. Clustering results and marketing strategy
4. Recommender system within clusters
      4-1. Why the ALS recommender was chosen
      4-2. Applying the recommender
      4-3. Model performance evaluation and expected effects

 


 

1. The importance of segmentation and personalization


In the digital-first era, most consumers consider the customer experience to be as important as the product itself.

More and more customers also expect a personalized experience. From the company's side, grouping customers into segments gives better marketing results for the cost than treating customers either one by one or as a single mass, and it helps win highly loyal customers.
To make this kind of efficient customer management possible, we segmented the customers into segments with distinct characteristics.

 

 

  • Data selection and preprocessing


Using SQL, we built the pdde_merge dataset by joining customer information (demo) and product information (pd_clac) onto the product purchase table (pdde).

 

select pdde.*, demo.ma_fem_dv, demo.ages, demo.zon_hlv as demo_zon_hlv, pd_clac.pd_nm, pd_clac.clac_hlv_nm, pd_clac.clac_mcls_nm
from pdde
left outer join demo
on pdde.cust = demo.cust
left outer join pd_clac
on pdde.pd_c = pd_clac.pd_c

 

 

In the same way, we built the cop_u_merge dataset by joining customer information (demo) and store information (br) onto the affiliate usage table (cop_u).

 

select cop_u.*, demo.ma_fem_dv, demo.ages, demo.zon_hlv as demo_zon_hlv, br.cop_c, br.zon_hlv as br_zon_hlv, br.zon_mcls 
from cop_u 
left outer join demo
on cop_u.cust = demo.cust
left outer join br
on cop_u.br_c = br.br_c

 

 

  • EDA of the product purchase data

About 70% of users are women, and users in their 40s make up the largest age group.

 

 

 

Purchases are concentrated on Fridays, Saturdays and Sundays, and purchases at affiliates A01 and A02 account for roughly 70% of all purchases.

 

 

 

Graphs of total usage count and total amount spent. Each variable was created with the R code below.

 

library(dplyr)

# rct_pdde: pdde receipts (affiliate, amount spent, customer, date)
rct_pdde <- unique(pdde %>%
                     group_by(rct_no) %>%
                     summarise(cop_c = cop_c, rct_am=sum(buy_am), cust = cust, de_dt = de_dt))

# pdde: receipt count and amount spent per customer
tmp <- rct_pdde %>%
  group_by(cust)%>%
  summarise(pdde_rct_cnt = n(), pdde_sum = sum(rct_am))
demo <- merge(x = demo,y = tmp,by="cust", all.x = TRUE)
# copu: receipt count and amount spent per customer
tmp <- copu %>%
  group_by(cust)%>%
  summarise(copu_rct_cnt = n(), copu_sum = sum(buy_am))
demo <- merge(x = demo, y = tmp,by="cust", all.x = TRUE)
# Add up the receipt counts and the amounts from both sources.
# rowSums(na.rm = TRUE) counts a customer who is missing from one table as 0 there;
# apply(..., 1, sum) would return NA for such customers.
demo$rct <- rowSums(demo[,c("pdde_rct_cnt","copu_rct_cnt")], na.rm = TRUE)
demo$am <- rowSums(demo[,c("pdde_sum","copu_sum")], na.rm = TRUE)

 

 

Customers differ across several of these variables, so customer segmentation looks feasible.

 

 

 


2. Variables used for customer clustering

 

We selected variables that are meaningful for classifying customers.

Gender, age, total usage count and total amount paid are either provided or already created above,

while affiliate usage tendency, weekly usage pattern and product taste (retail) have to be newly created.

 

 

1) Affiliate usage tendency


We ran k-means clustering on each customer's usage ratio per affiliate (usage count at each affiliate / total usage count).

library(reshape2)    # dcast (assumed here; data.table also provides dcast)
library(factoextra)  # fviz_nbclust

# Usage ratio per affiliate
dat <- data.frame(dcast(rct_pdde,cust~cop_c))
demo <- merge(x = demo,y = dat,by="cust", all.x = TRUE)
dat2 <- data.frame(dcast(copu,cust~cop_c))
demo <- merge(x = demo,y = dat2,by="cust", all.x = TRUE)
demo[is.na(demo)] <- 0
sum(is.na(demo))
demo[,7:18] <- demo[,7:18]/demo$rct # divide per-affiliate receipt counts by the total receipt count

# Create the affiliate usage tendency variable (clus_cop)
# elbow method
fviz_nbclust(demo[7:18], kmeans, method = "wss", k.max = 12) + 
  theme_minimal() + 
  ggtitle("Elbow Method")
# silhouette
# avg_sil(k, data) is a helper that is not shown in this post; it is expected to
# return the average silhouette width of a k-means fit with k clusters.
kClusters <-  2:9
resultForEachK2 <- data.frame(k = kClusters, silAvg = rep(NA, length(kClusters))) 
for(i in 1:length(kClusters)){
  resultForEachK2$silAvg[i] <- avg_sil(kClusters[i], demo[7:18])
}
plot(resultForEachK2$k, resultForEachK2$silAvg,
     type = "b", pch = 19, frame = FALSE, 
     xlab = "Number of clusters K",
     ylab = "Average Silhouettes")

# create 6 clusters
clus_cop <- kmeans(demo[7:18],6)
table(clus_cop$cluster)
demo$clus_cop <- as.factor(clus_cop$cluster)

With 6 clusters the silhouette coefficient is high, and in the elbow-method graph the slope flattens out when going from 6 to 7 clusters,

so we created a variable made up of 6 clusters.
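
The `avg_sil` helper used in the R code above is not shown in the post. As a rough illustration of what an average silhouette coefficient measures, here is a minimal Python sketch. The function name `average_silhouette` and the toy points are assumptions made for this example, not the author's code, and the function expects at least two clusters.

```python
import numpy as np

def average_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Hypothetical toy data: two tight pairs of points that are far apart.
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = average_silhouette(toy, [0, 0, 1, 1])   # natural grouping
bad = average_silhouette(toy, [0, 1, 0, 1])    # grouping that mixes the pairs
print(good, bad)
```

The natural grouping scores close to 1 and the mixed grouping scores below 0, which is why a higher average silhouette was read as support for a given number of clusters.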

Group 1 uses affiliate D01 heavily.

Group 2 uses affiliate A03 heavily.

Group 4 uses affiliate A01 heavily.

Group 5 uses affiliate A02 heavily.

Group 6 uses affiliate A04 heavily.

  • Result of creating the affiliate usage tendency variable (clus_cop)
Group 1 : mainly uses D01.
Group 2 : mainly uses A03; some customers also use A01 and D01.
Group 3 : no strong tendency, but some customers use A01, A02, A06 and C01.
Group 4 : mainly uses A01.
Group 5 : mainly uses A02.
Group 6 : mainly uses A04.

 

 

 

2) Usage pattern

 

We ran k-means clustering on the usage ratio per day of the week (usage count on each weekday / total usage count).

library(lubridate)  # wday, ymd

# Usage ratio per day of the week
rct_pdde$wday <- as.factor(wday(ymd(rct_pdde$de_dt)))
str(rct_pdde)
dat <- data.frame(dcast(rct_pdde,cust~wday))
str(dat)
copu$wday <- as.factor(wday(ymd(copu$de_dt)))
dat2 <- data.frame(dcast(copu,cust~wday))
str(dat2)
names(dat) <- c("cust","sun","mon","tue","wed","thr","fri","sat")
demo <- merge(x = demo,y = dat,by="cust", all.x = TRUE)
demo <- merge(x = demo,y = dat2,by="cust", all.x = TRUE)
str(demo)
demo[is.na(demo)] <- 0
head(demo)
demo$sun <- apply(demo[,c("sun","X1")],1,sum) # add pdde and copu counts
demo$mon <- apply(demo[,c("mon","X2")],1,sum)
demo$tue <- apply(demo[,c("tue","X3")],1,sum)
demo$wed <- apply(demo[,c("wed","X4")],1,sum)
demo$thr <- apply(demo[,c("thr","X5")],1,sum)
demo$fri <- apply(demo[,c("fri","X6")],1,sum)
demo$sat <- apply(demo[,c("sat","X7")],1,sum)
demo <- demo[,c(1:27)]
# Divide the receipt count per weekday by the total receipt count
demo$sun <- demo$sun/demo$rct
demo$mon <- demo$mon/demo$rct
demo$tue <- demo$tue/demo$rct
demo$wed <- demo$wed/demo$rct
demo$thr <- demo$thr/demo$rct
demo$fri <- demo$fri/demo$rct
demo$sat <- demo$sat/demo$rct

# Create the weekday pattern variable (pat2)
training.data <- demo[demo$rct!=1,c(19:25)]
str(training.data)
# elbow method
# install.packages("factoextra")  # run once if the package is not installed
library(factoextra)
fviz_nbclust(training.data, kmeans, method = "wss", k.max = 9) + 
  theme_minimal() + 
  ggtitle("Elbow Method")
# silhouette
kClusters <-  2:9
resultForEachK2 <- data.frame(k = kClusters, silAvg = rep(NA, length(kClusters))) 
# compute the average silhouette coefficient for each k
for(i in 1:length(kClusters)){
  resultForEachK2$silAvg[i] <- avg_sil(kClusters[i], training.data)
}
# plot of the silhouette coefficient
plot(resultForEachK2$k, resultForEachK2$silAvg,
     type = "b", pch = 19, frame = FALSE, 
     xlab = "Number of clusters K",
     ylab = "Average Silhouettes")

wday_km <- kmeans(training.data,5)
demo$pat2[demo$rct!=1] <- as.factor(wday_km$cluster)

 

 

The silhouette coefficient is highest with 2 clusters, but to get a finer segmentation we created a variable made up of 5 clusters.

On Mondays, Tuesdays, Thursdays and Fridays, group 3 tends to have the highest usage ratio.

 

Group 1 has a high usage ratio on Wednesdays, group 2 on Saturdays and group 5 on Sundays.

 

  • Result of creating the weekly usage pattern variable (pat2)
Group 1 : mainly Wednesdays.
Group 2 : Sat > Sun. Saturday is the busiest day; weekend use is higher than weekday use.
Group 3 : mainly weekdays; among weekdays, Wednesday is relatively low.
Group 4 : mainly Fridays, Saturdays and Sundays; Saturday is the highest.
Group 5 : Sun > Sat. Mainly Sundays; weekend use is higher than weekday use.

 

 

3) Product taste

 

We inferred product taste from the top-level category of the products each customer bought.

There are 60 top-level categories in total. Categories that do not help identify taste were excluded.


First we built a data frame with the number of purchases per customer and top-level category, then checked the variance of each category and the correlations between them.

 

# Create the product taste derived variables (RC1~6) ---------------------------------

# build the customer x top-level-category data frame
hlv <- dcast(pdde,cust~clac_hlv_nm,sum,value.var="buy_ct")
new_hlv <- hlv # working copy
new_hlv <- new_hlv[-c(1567,23258),] # remove outliers
new_hlv <- new_hlv[,-1]

# select the category variables
library(caret)
names(new_hlv[,nearZeroVar(new_hlv)]) # near-zero variance
new_hlv <- new_hlv[, -nearZeroVar(new_hlv)] # remove them

findCorrelation(cor(new_hlv),cutoff = 0.4) # correlation of 0.4 or higher
cor(new_hlv)[c(10,11,33,30,5,6,35,3,15,27),c(10,11,33,30,5,6,35,3,15,27)] # 0.4 or higher
new_hlv <- new_hlv[,-c(11,33,30,5,6,35,3,15,27)] # keep column 10, remove the rest

We excluded categories whose variance is close to 0 and, among highly correlated categories, kept only one.

 

2023.3.13 self-review

- Had we run FA without removing the highly correlated variables, the result would probably have been easier to interpret.
- Before extracting factors, the variables should be scaled to unit variance.
- It is better to check with KMO and Bartlett's test that FA is appropriate before running it.
- KMO (Kaiser-Meyer-Olkin factor adequacy) : measures how strongly the variables are correlated. The closer to 1, the more suitable the variables are for factor analysis; a value of 0.5 or below is usually considered unsuitable.
- Bartlett : tests the null hypothesis that all correlations are 0, judged by the chi-square statistic and the p-value. A significant p-value (< 0.05) satisfies the condition. Note that the chi-square statistic is sensitive to sample size.
library(psych)
new_hlv <- scale(new_hlv)
KMO(new_hlv)
cortest.bartlett(new_hlv)
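
To make the two checks a little more concrete, here is a minimal NumPy sketch of the quantities that `KMO` and `cortest.bartlett` report, following the textbook definitions. The function names `bartlett_sphericity` and `kmo` and the simulated data are assumptions made for this illustration; this is not the psych package's implementation, and the p-value lookup is left out.

```python
import numpy as np

def bartlett_sphericity(data):
    """Chi-square statistic and degrees of freedom of Bartlett's test of sphericity."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)          # log of det(R), numerically stable
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)              # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)     # off-diagonal mask
    r2 = (corr[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)

# Hypothetical data: four variables driven by one common factor plus noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
toy = common + 0.5 * rng.normal(size=(200, 4))

chi_square, dof = bartlett_sphericity(toy)
kmo_value = kmo(toy)
print(chi_square, dof, kmo_value)
```

Strongly correlated variables give a large chi-square statistic and a KMO well above 0.5, which is the situation in which factor analysis is considered reasonable.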


 

library(psych)
library(GPArotation)
hlv.factor <- principal(new_hlv, rotate='none')
names(hlv.factor)
hlv.factor$values
plot(hlv.factor$values, type="b")

Eigenvalue : the amount of variance explained by each factor, i.e. the sum of the squared loadings of all variables on that factor. In factor analysis the total variance of each variable is taken to be 1. An eigenvalue below 1 means the factor explains less than a single variable does, so the eigenvalue is sometimes used as the criterion for choosing the number of factors.

 

Six eigenvalues are greater than 1, so we selected six factors.
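
As a small illustration of this eigenvalue-greater-than-1 rule (often called the Kaiser criterion), the sketch below counts the eigenvalues of a correlation matrix that exceed 1. The helper name `kaiser_factor_count` and the simulated data are assumptions made for this example; the data are built from two independent latent factors so that two eigenvalues stand out.

```python
import numpy as np

def kaiser_factor_count(data):
    """Eigenvalues of the correlation matrix (descending) and how many exceed 1."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    return eigenvalues, int((eigenvalues > 1.0).sum())

# Hypothetical data: six variables, three driven by each of two latent factors.
rng = np.random.default_rng(0)
f1 = rng.normal(size=(500, 1))
f2 = rng.normal(size=(500, 1))
toy = np.hstack([
    f1 + 0.5 * rng.normal(size=(500, 3)),
    f2 + 0.5 * rng.normal(size=(500, 3)),
])

eigenvalues, n_factors = kaiser_factor_count(toy)
print(eigenvalues.round(2), n_factors)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue of 1 corresponds to the variance of one standardized variable, which is the reasoning behind the threshold.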

 

2023.3.13 self-review

- For a simpler split of tastes, one could stop at the third factor, where the slope of the scree plot drops off sharply.

 

For ease of interpretation we used an orthogonal rotation.

hlv.Varimax = principal(new_hlv, nfactors = 6, rotate="varimax")
hlv.Varimax
loadings(hlv.Varimax) 

# RC1: homemaking ability  RC2: fashion interest  RC3: contribution of hobbies to happiness
# RC5: desire for an affluent life  RC4: spending for children  RC6: family dining out

 

2023.3.13 self-review

What we actually ran with the principal function was PCA, not FA. (The two procedures look almost the same, so FA and PCA are reportedly often mixed up and used incorrectly.)
After studying the difference between FA and PCA further, FA is the better choice in this case.
The intent here is not to reduce the top-level category variables into even broader categories,
but to create a new variable, purchase taste.
For that purpose we can use factor analysis (FA), which allows the factors to be correlated, accounts for the variance of the unique factors, and does not rank the selected factors by importance.
Because correlation between factors is allowed, the oblimin rotation can be used.

# Decide the number of factors ------------------
library(psych)
library(GPArotation)  # provides the oblimin rotation used below
ev <- eigen(cor(new_hlv))$values # eigenvalues above 1: up to 8 factors
ev
 # fac <- principal(new_hlv, rotate='none') # initial solution. method: principal
 # fac$values # same result
# d <- fa(new_hlv, rotate='none') # method: minimum residual. MLE needs the number of factors..
# d$values # up to 8
plot(ev, type="b") # up to 3 (the original plotted fac$values, but fac is commented out above)
scree(cor(new_hlv, use = "pairwise.complete.obs")) # up to 3

# 3 factors -------------------
fac3<- factanal(new_hlv,rotation= 'varimax', # rotation method
                scores= 'regression', # factor score method; needed for fac3$scores
                factors = 3) # method: mle
#fa(new_hlv, nfactors= 3, rotate = "varimax")

fac3

fac3$scores # factor scores == coordinates of the original data on the factors
sort(fac3$uniquenesses)
print(fac3$loadings, cut=.4, sort=T, digits = 3) 

# oblimin
fac3o<- factanal(new_hlv,rotation= 'oblimin', factors = 3)
print(fac3o$loadings, cut=.4, sort=T, digits = 3) 

# 8 factors ------------------------
fac8<- factanal(new_hlv,rotation= 'varimax',
                scores= 'regression', # needed for fac8$scores
                factors = 8)
fac8
fac8$scores
sort(fac8$uniquenesses)
print(fac8$loadings, cut=.4, sort=T, digits = 3) 

# oblimin
fac8o<- factanal(new_hlv,rotation= 'oblimin', factors =8)
print(fac8o$loadings, cut=.4, sort=T, digits = 8)


 Notes

sort(hlv.Varimax$uniquenesses) # variance of the unique factor = 1 - variance of the common factors
print(hlv.Varimax$loadings, cut=.5, sort=T, digits = 3) # sorted

 

- loadings : the size of the covariance between a factor and a variable, i.e. the correlation between the factor and the variable. (About ±0.5 is taken as meaningful.)

- SS loadings (eigenvalue) : the sum of the squared factor loadings. The higher it is, the more similar the variables that load on the factor.

- Proportion Var : the share of the total variance accounted for by each component.
- Cumulative Var : the sum of the variance of the extracted components. In this data they explain 36.3%.
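
The relationship between these quantities can be checked by hand. The sketch below uses a small made-up loading matrix (four variables, two factors; the numbers are hypothetical and not taken from the competition data) and assumes standardized variables, so that the total variance equals the number of variables.

```python
import numpy as np

# Hypothetical loading matrix: rows are variables, columns are factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
])

ss_loadings = (loadings ** 2).sum(axis=0)          # per factor: sum of squared loadings
proportion_var = ss_loadings / loadings.shape[0]   # share of the total variance
cumulative_var = np.cumsum(proportion_var)

communality = (loadings ** 2).sum(axis=1)          # per variable: variance explained by the factors
uniqueness = 1.0 - communality                     # per variable: variance left unexplained

print(ss_loadings, proportion_var, cumulative_var)
print(communality, uniqueness)
```

Here the two factors together explain 60% of the total variance, in the same sense in which the six components in the post explain 36.3%.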

 

RC1 through RC6 are interpreted as follows.

RC 1 : homemaking ability
RC 2 : fashion interest
RC 3 : contribution of hobbies to happiness
RC 4 : spending for children
RC 5 : desire for an affluent life
RC 6 : dining out, rental

Based on this interpretation, we used the scores to create six new customer taste variables.

 

 

2023.3.13 self-review

When the FA was rerun with the oblimin rotation and without removing the highly correlated variables, the factors came out as follows.
[3 factors]
Factor 1 : food buyers
Factor 2 : cooking-ingredient buyers
Factor 3 : fashion and tableware/cookware buyers
[8 factors]
Factor 1 : cooking-ingredient buyers
Factor 2 : fashion buyers
Factor 3 : convenience food for people living alone (snacks, meal replacements)
Factor 4 : preference for cleanliness (detergent/hygiene, personal care)
Factor 5 : goods for office workers (office supplies, tools)
Factor 6 : cleaning-supply buyers
Factor 7 : preference for beverages
Factor 8 : preference for dining out

With this change, the factors no longer look uncorrelated with each other.

 


 

3. Customer clustering

 

We clustered the customers using 12 variables.

Because the data mixes continuous and categorical variables,

we used K-Prototypes.

The customers were segmented into 6 clusters.
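
The post does not show the K-Prototypes call itself. As a hedged sketch of the idea, the dissimilarity that K-Prototypes is usually described with combines squared Euclidean distance on the numeric variables with a weighted count of mismatches on the categorical variables. The function name, the example values and the weight `gamma` below are assumptions made for illustration.

```python
def kprototypes_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Mixed dissimilarity between a customer and a cluster prototype.

    num_*: numeric features (e.g. scaled usage count, taste scores)
    cat_*: categorical features (e.g. gender, usage-pattern group)
    gamma: weight of the categorical part relative to the numeric part
    """
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part

# Hypothetical customer and prototype: one numeric feature differs by 1.0
# and one of the two categorical features differs.
d = kprototypes_distance([1.0, 2.0], ["F", "40s"],
                         [0.0, 2.0], ["M", "40s"], gamma=0.5)
print(d)  # 1.0 from the numeric part + 0.5 * 1 mismatch
```

The weight `gamma` controls how much a categorical mismatch counts relative to numeric distance, so scaling the numeric variables first matters for the resulting clusters.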

 

 

 

 

Clustering results

 

  • Group 1

18 customers

Women 39%, men 61%

40s 67%, 30s 17%, 50s 11%

Usage pattern 4: 67%, 5: 22%, 2: 11%

Affiliate tendency 4: 100%

Average usage count 188

Average amount spent KRW 320 million

Average fashion interest 4.18   Average desire for an affluent life 0.06

Dining out, rental 1.7

Customers around their 40s with a high usage count and a very high amount spent.

Most use affiliate A01, on Fridays, Saturdays and Sundays, with Saturday the busiest day.

Fashion interest is very high, and they dine out fairly often.

⇒ Group 1 is a small group of big spenders.

Personal marketing, such as inviting them on a Saturday to a promotion at affiliate A01 for brands that customers in their 40s like, could induce purchases and raise sales further.

 

  • Group 2

6663 customers

Women 83%, men 18%

40s 50%, 30s 20%, 50s 15%

Usage pattern 4: 71%, 3: 10%

Affiliate tendency 5: 46%, 2: 12%, 4: 12%, 3: 11%

Average usage count 57

Average amount spent KRW 2.2 million

Average homemaking ability 0.2

Average contribution of hobbies to happiness 0.19

Women around their 40s. They use A02 fairly often. Homemaking ability and the contribution of hobbies to happiness are slightly above average. They mainly shop on Fridays, Saturdays and Sundays, most of all on Saturdays.

⇒ The largest group. Households that do their grocery shopping at the weekend.

Marketing homemaking-related products at affiliate A02 on Saturdays can be expected to be highly effective.

 

  • Group 3

4904 customers

Women 30%, men 70%

30s 53%, 20s 15%, 40s 14%

Usage pattern 3: 49%, 4: 21%

Affiliate tendency 3: 39%, 6: 14%, 1: 14%

Average usage count 43

Average amount spent KRW 1.39 million

Men around their 30s. Weekday usage is relatively high. No affiliate is used especially often.

⇒ The lowest usage count. They appear to shop only when they feel they need something.

The approach here is to encourage usage by recommending products they need but did not know about.

 

  • Group 4

107 customers

Women 69%, men 31%

40s 40%, 30s 27%, 50s 25%

Usage pattern 4: 61%, 5: 14%

Affiliate tendency 4: 96%

Average usage count 180

Average amount spent KRW 100 million

Average fashion interest 2.4

Average spending for children 0.5

Average desire for an affluent life 2.5

Average dining out, rental 1.1

Customers around their 40s with a high amount spent.

Most use affiliate A01, on Fridays, Saturdays and Sundays, with Saturday the busiest day.

Fashion interest and the desire for an affluent life are high.

They dine out and may have children.

⇒ Because fashion interest and the desire for a better quality of life are high, recommending new products that appeal to these interests is likely to be effective.

 

  • Group 5

647 customers

Women 72%, men 28%

40s 36%, 30s 28%, 50s 21%

Usage pattern 4: 59%, 3: 19%

Affiliate tendency 4: 86%

Average usage count 143

Average amount spent KRW 31.15 million

Homemaking ability 0.16

Average fashion interest 1.4

Average spending for children 0.4

Average desire for an affluent life 1.5

Dining out, rental 0.6

Female customers around their 40s.

They use affiliate A01 heavily, on Fridays, Saturdays and Sundays, with Saturday the busiest day.

They have some homemaking ability, and fashion interest and the desire for an affluent life are high.

They dine out fairly often and may have children.

⇒ For group 5, run marketing that targets families of three or more with children.

(For example, discount coupons with a minimum-spend condition for restaurants inside affiliate A01.)

 

  • Group 6

5547 customers

Women 87%, men 13%

40s 47%, 30s 19%, 50s 17%

Usage pattern 3: 52%, 4: 18%

Affiliate tendency 4: 66%

Average usage count 71

Average amount spent KRW 4.41 million

Average fashion interest 0.2

Average spending for children 0.08

Average desire for an affluent life 0.1

Dining out, rental 0.3

Female customers around their 40s.

They use a variety of affiliates, A01 most of all.

They show a fair amount of weekday usage.

Fashion interest and the desire for an affluent life are above average.

They occasionally dine out as a family and may have children.

⇒ The amount spent is low relative to the usage count. They may not have found products they are satisfied with, so issue discount coupons together with product recommendations.

 

 


 

4. Recommender system within clusters

It is hard to expect every customer in a large cluster to buy the same products.

More customers now shop carefully and tastes differ from person to person, so even within a group with similar characteristics the way people look at products can differ. Product recommendations therefore need to target individuals rather than clusters.

From the customer's point of view, recommendations also make it possible to find satisfying products without the hassle of comparing them, and to discover tastes they did not know they had.

 

ALS recommender system

ALS is a model based on latent-factor collaborative filtering.

Collaborative filtering infers the empty cells of a matrix from the observed interactions between users and items.

It assumes that some latent factors lie between users and items, and finds those factors through matrix factorization.

(It works by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices.)

The purchase matrix is computed as the inner product of the users' latent factors and the items' latent factors.

ALS fixes one of the two matrices (user or item), optimizes the other, and repeats this while alternating between them.
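
Written out, the objective that this alternating scheme minimizes and the closed-form updates it leads to look roughly as follows. The notation is an assumption chosen to mirror the Python code later in this section ($c_{ui}$ for the confidence `C`, $p_{ui}$ for the binary preference `P`, $x_u$ and $y_i$ for the rows of `X` and `Y`, $\lambda$ for `r_lambda`, $C^{u}$ for the diagonal matrix of user $u$'s confidences); the post itself does not state the formula.

```latex
\min_{X,\,Y}\;
  \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^{2}
  + \lambda \Bigl( \sum_{u} \lVert x_u \rVert^{2} + \sum_{i} \lVert y_i \rVert^{2} \Bigr)

x_u = \bigl(Y^{\top} C^{u} Y + \lambda I\bigr)^{-1} Y^{\top} C^{u}\, p(u),
\qquad
y_i = \bigl(X^{\top} C^{i} X + \lambda I\bigr)^{-1} X^{\top} C^{i}\, p(i)
```

With $Y$ held fixed, the objective is an ordinary regularized least-squares problem in each $x_u$, which is why every step has a closed-form solution; the same holds for each $y_i$ with $X$ held fixed.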

 

    • Why ALS was chosen
      1. It can find matrices that converge.
        Optimizing the user matrix and the item matrix at the same time is a non-convex problem, so it suffers from local minima. ALS instead fixes one matrix and optimizes the other, alternating between the two. Each of these sub-problems is convex, so the procedure can find matrices that converge.
      2. It can reflect the uncertainty in purchase data.
        Implicit feedback such as a purchase is not a direct indicator of preference, so the data is not fully reliable. A user may have bought an item and been dissatisfied with it, and items that were never bought cannot all be treated as disliked. ALS can reflect this uncertainty through its confidence term.
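
The matrices `C`, `P`, `X` and `Y` used by the code below are not constructed anywhere in the post. A common way to build them for implicit-feedback ALS is sketched here: preference is 1 wherever a purchase exists, and confidence grows linearly with the purchase count. The helper name `build_als_inputs`, the scaling factor `alpha=40.0`, the number of factors and the toy purchase counts are all assumptions made for illustration, not the values used in the competition.

```python
import numpy as np

def build_als_inputs(R, alpha=40.0, nf=20, seed=0):
    """Build preference, confidence and initial factor matrices from purchase counts R."""
    R = np.asarray(R, dtype=float)
    nu, ni = R.shape
    P = (R > 0).astype(float)            # preference: 1 if the item was bought at least once
    C = 1.0 + alpha * R                  # confidence: higher for items bought more often
    rng = np.random.default_rng(seed)
    X = rng.random((nu, nf)) * 0.01      # user latent factors, small random start
    Y = rng.random((ni, nf)) * 0.01      # item latent factors, small random start
    return P, C, X, Y

# Hypothetical purchase counts: 2 customers x 3 products.
R = np.array([[3, 0, 1],
              [0, 2, 0]])
P, C, X, Y = build_als_inputs(R, alpha=40.0, nf=2)
print(P)
print(C)
```

Unobserved cells keep a confidence of 1 rather than 0, so a missing purchase still counts as weak evidence of no preference instead of being ignored, which matches the uncertainty argument above.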

 

 

  • Applying the recommender to each cluster

The appropriate parameters differ from dataset to dataset and can change with the size of the data frame. An optimization that takes very long is also inefficient. We therefore tried different parameters for each cluster and looked for the per-cluster parameters that gave the best performance within a reasonable running time.

For each cluster we then picked one customer at random and checked the list of products recommended to that customer.

 

Loss Function

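import numpy as np

# Confidence-weighted squared error plus L2 regularization.
# C: confidence matrix, P: binary preference matrix, xTy: predicted matrix X @ Y.T,
# X: user factors (nu x nf), Y: item factors (ni x nf), r_lambda: regularization weight.
# X, Y, C, P, nu, ni, nf and r_lambda must be defined before the training loop below.
def loss_function(C, P, xTy, X, Y, r_lambda):
  predict_error = np.square(P - xTy)
  confidence_error = np.sum(C * predict_error)
  regularization = r_lambda * (np.sum(np.square(X))+ np.sum(np.square(Y)))
  total_loss = confidence_error + regularization
  return np.sum(predict_error),confidence_error, regularization, total_loss

# Update every user vector in place with the item matrix Y held fixed.
# (Renamed from optimizer_user so that it matches the call in the loop below.)
def optimize_user(X, Y, C, P, nu, nf, r_lambda):
  yT = np.transpose(Y)
  for u in range(nu):
    Cu = np.diag(C[u])
    yT_Cu_y = np.matmul(np.matmul(yT,Cu),Y)
    l = np.dot(r_lambda,np.identity(nf))
    yT_Cu_pu = np.matmul(np.matmul(yT,Cu),P[u])
    X[u] = np.linalg.solve(yT_Cu_y + l , yT_Cu_pu)

# Update every item vector in place with the user matrix X held fixed.
# (Renamed from optimizer_item so that it matches the call in the loop below.)
def optimize_item(X, Y, C, P, ni, nf, r_lambda):
  xT = np.transpose(X)
  for i in range(ni):
    Ci = np.diag(C[:,i])
    xT_Ci_x = np.matmul(np.matmul(xT,Ci),X)
    l = np.dot(r_lambda, np.identity(nf))
    xT_Ci_pi = np.matmul(np.matmul(xT,Ci),P[:,i])
    Y[i] = np.linalg.solve(xT_Ci_x + l,xT_Ci_pi)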
    
predict_errors = []
confidence_errors = []
regularization_list = []
total_losses = []

for i in range(15):
    if i!=0:   
        optimize_user(X, Y, C, P, nu, nf, r_lambda)
        optimize_item(X, Y, C, P, ni, nf, r_lambda)
    predict = np.matmul(X, np.transpose(Y))
    predict_error, confidence_error, regularization, total_loss = loss_function(C, P, predict, X, Y, r_lambda)
    
    predict_errors.append(predict_error)
    confidence_errors.append(confidence_error)
    regularization_list.append(regularization)
    total_losses.append(total_loss)
    
    print('----------------step %d----------------' % i)
    print("predict error: %f" % predict_error)
    print("confidence error: %f" % confidence_error)
    print("regularization: %f" % regularization)
    print("total loss: %f" % total_loss)
    
predict = np.matmul(X, np.transpose(Y))
print('final predict')
print([predict])

from matplotlib import pyplot as plt
%matplotlib inline

# Create the figure first, then set the spacing between its four panels.
fig = plt.figure()
fig.set_figheight(10)
fig.set_figwidth(10)
fig.subplots_adjust(wspace=0.3, hspace=0.3)
predict_error_line = fig.add_subplot(2, 2, 1)
confidence_error_line = fig.add_subplot(2, 2, 2)
regularization_error_line = fig.add_subplot(2, 2, 3)
total_loss_line = fig.add_subplot(2, 2, 4)

predict_error_line.set_title("Predict Error") 
predict_error_line.plot(predict_errors)

confidence_error_line.set_title("Confidence Error")
confidence_error_line.plot(confidence_errors)

regularization_error_line.set_title("Regularization")
regularization_error_line.plot(regularization_list)

total_loss_line.set_title("Total Loss")
total_loss_line.plot(total_losses)
plt.show()

# Items the customer bought
k = test_df.iloc[rcust]
k_index = np.array(k[k != 0].index)
k_index


# cust_id: customer ID, user_item: predicted matrix (DataFrame), real: actual purchase matrix, N: number of items to recommend
def recommend(cust_id, user_item, real, N):
  pred_cust = user_item.loc[cust_id]  # use the matrix passed in rather than a global
  real_cust = real.loc[cust_id]
  real_index = np.array(real_cust[real_cust != 0].index)
  # Drop the items the customer already bought, then keep the N highest predicted scores.
  pred_cust_dp = pred_cust.drop(labels=real_index,axis=0)
  print(pred_cust_dp.nlargest(N))

 

1) First cluster : a small group of big spenders

 

- Items a randomly chosen buyer purchased over one year

 

- 10 recommended products among those the customer has not bought

⇒ For a customer who buys many personal-grooming products and a lot of women's clothing, the system mainly recommends related product groups such as outerwear, tote bags, perfume and facial cleansers.

 

 

2) Second cluster : households that shop for groceries at the weekend

 

- Items a randomly chosen buyer purchased over one year

- 10 recommended products among those the customer has not bought

⇒ For a customer with many cooking-ingredient purchases, the system recommends other ingredients. Because this customer also occasionally bought women's clothing, items related to women's clothing are recommended as well.

 

 

3) Third cluster : occasional, need-driven purchases

 

- Items a randomly chosen buyer purchased over one year

- 10 recommended products among those the customer has not bought

⇒ For a customer who bought only gift certificates, frozen pizza and fast food, the system recommends not only related products but also a variety of other product groups the customer might like. Recommending needed products the customer did not know about can encourage further usage.

 

 

4) Fourth cluster : high fashion interest and desire for a better quality of life

 

- Items a randomly chosen buyer purchased over one year

- 10 recommended products among those the customer has not bought

⇒ The customer bought from many product groups (golf, food, alcohol, snacks, home appliances and so on), so the system recommends a wide variety of products.

Because fashion interest and the desire for a better quality of life are high, recommendations are likely to be especially effective here, so this group should get a more active recommendation service than the other groups.

 

 

5) Fifth cluster : spending habits of families of three or more with children

 

- Items a randomly chosen buyer purchased over one year

- 10 recommended products among those the customer has not bought

⇒ For a customer who often bought convenience meals, meat and cooking ingredients, the system recommends similar products.

 

 

6) Sixth cluster : low amount spent per purchase

 

- Items a randomly chosen buyer purchased over one year

- 10 recommended products among those the customer has not bought

⇒ The system recommends a variety of products from the product groups related to what the customer bought. These customers are likely to choose products carefully, so a recommendation service could turn them into highly loyal customers.

 

 

  • Model performance evaluation

We picked a random customer from the actual purchase matrix and assumed that 20% of the items that customer had bought were not bought.

Among the 10 items recommended from this hypothetical purchase matrix, we checked the share of items that had actually been bought but were assumed not to have been.

 

We repeated this 10-item recommendation for a random customer 6 times.

Of the 60 recommended items, 34 (57%) turned out to have actually been bought.
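
The evaluation code is not included in the post. The sketch below shows one way the procedure described above could be implemented: hide a share of a customer's real purchases, recommend from the masked data, and measure how many of the top-N recommendations are hidden items. The function names, the toy purchase row and the hand-made scores are assumptions made for illustration; in practice the scores would come from a model trained on the masked matrix.

```python
import numpy as np

def hide_purchases(purchased_row, frac=0.2, seed=0):
    """Zero out a random `frac` of the items this customer actually bought."""
    rng = np.random.default_rng(seed)
    bought = np.flatnonzero(purchased_row)
    n_hide = max(1, int(round(frac * len(bought))))
    hidden = rng.choice(bought, size=n_hide, replace=False)
    masked = purchased_row.copy()
    masked[hidden] = 0
    return masked, {int(i) for i in hidden}

def hit_ratio(scores, masked_row, hidden, n=10):
    """Share of the top-n recommendations that are hidden (really bought) items."""
    candidates = np.flatnonzero(masked_row == 0)       # items not bought in the masked data
    order = np.argsort(scores[candidates])[::-1][:n]   # highest predicted score first
    top = candidates[order]
    hits = sum(1 for item in top if int(item) in hidden)
    return hits / len(top)

# Hypothetical customer: bought items 0..4 out of 10 products.
purchased = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
masked, hidden = hide_purchases(purchased, frac=0.2)

# Stand-in for model output: pretend the model ranks the hidden item highest.
scores = np.full(10, 0.1)
scores[list(hidden)] = 0.9
ratio = hit_ratio(scores, masked, hidden, n=2)
print(ratio)  # 0.5: one of the two recommended items was a hidden purchase
```

Averaging this ratio over several randomly chosen customers gives a figure comparable to the 34 out of 60 reported above.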

The list of products with high predicted scores can contain many products that

1. the customer would actually like but does not know about, or

2. the customer would actually like but has not bought yet.

(A recommender system exists precisely for cases like these.)

So when the recommendation list includes such products and the share of actually purchased products is still above 50%, the model is performing reasonably well.

 

 

  • Marketing effect of the recommender system

Based on purchase history, the recommender system recommends to each individual both products whose preference is already confirmed and products the individual is likely to prefer.

For customers with a long usage history it can predict taste more accurately and show a satisfying recommendation list,

and for customers with little usage history it can recommend products they are likely to enjoy and draw out additional usage.

As a result, customers can feel more satisfied with the service the company provides, and the company can expect a lock-in effect along with higher sales.

 

 

 

 


 
