Implementing a Softmax Classifier in Code (TensorFlow)

In the previous post we covered the concepts behind the Softmax Classifier.
This time we implement it in actual code and check each concept one by one.

Full code

import tensorflow as tf
import numpy as np

x_data = [[1, 2, 1, 1],
          [2, 1, 3, 2],
          [3, 1, 3, 4],
          [4, 1, 5, 5],
          [1, 7, 5, 5],
          [1, 2, 5, 6],
          [1, 6, 6, 6],
          [1, 7, 7, 7]]

y_data = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1],
          [0, 1, 0],
          [0, 1, 0],
          [0, 1, 0],
          [1, 0, 0],
          [1, 0, 0]]

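nb_classes = 3  # number of classes to classify
# Convert the Python lists to float32 arrays so they match the dtype of W and b
x_data = np.asarray(x_data, dtype=np.float32)
y_data = np.asarray(y_data, dtype=np.float32)

# Pair each row of x_data with its label and serve all 8 rows as a single batch
dataset = tf.data.Dataset.from_tensor_slices((x_data, y_data)).batch(len(x_data))

# 4 input features -> nb_classes scores, plus one bias per class
W = tf.Variable(tf.random.normal([4, nb_classes]), name='weight')
b = tf.Variable(tf.random.normal([3]), name='bias')

variable = [W, b]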

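def softmax_fn(features):
    # Linear scores followed by softmax: one probability per class, each row sums to 1
    hypothesis = tf.nn.softmax(tf.matmul(features, W) + b)
    return hypothesis

def loss_fn(features, labels):
    hypothesis = tf.nn.softmax(tf.matmul(features, W) + b)
    # Cross-entropy against the labels of this batch (not the global y_data)
    cost = tf.reduce_mean(-tf.reduce_sum(labels * tf.math.log(hypothesis), axis=1))
    return cost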

def grad(hypothesis, features, labels):
    # hypothesis is not used here; loss_fn recomputes the forward pass inside the tape
    with tf.GradientTape() as tape:
        loss_value = loss_fn(features, labels)
    return tape.gradient(loss_value, [W, b])

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

n_epochs = 3000
for step in range(n_epochs + 1):
    # The dataset holds a single full batch, so this inner loop runs once per step
    for features, labels in iter(dataset):
        hypothesis = softmax_fn(features)
        grads = grad(hypothesis, features, labels)
        optimizer.apply_gradients(grads_and_vars=zip(grads, [W, b]))
    if step % 300 == 0:
        # Log the loss every 300 steps
        print("Iter: {}, Loss: {:.4f}".format(step, loss_fn(features, labels)))

a = softmax_fn(x_data)
print(tf.argmax(a, 1))       # class predicted by the hypothesis
print(tf.argmax(y_data, 1))  # actual class

1. Preparing the data: One-Hot Encoding

y_data = [[0, 0, 1],  # class 2
          [0, 0, 1],
          [0, 0, 1],
          [0, 1, 0],  # class 1
          [0, 1, 0],
          [0, 1, 0],
          [1, 0, 0],  # class 0
          [1, 0, 0]]

This is the same One-Hot Encoding covered in the previous post.
To classify 3 classes, each class is represented as follows.

Class	One-Hot

0	[1, 0, 0]
1	[0, 1, 0]
2	[0, 0, 1]

Only the correct class is set to 1; all the others are 0.
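As a small illustration of the encoding above, the sketch below builds the same one-hot rows from integer class ids with NumPy. The `class_ids` array is a hypothetical helper that is not part of the original code; it simply lists the class of each of the 8 rows.

```python
import numpy as np

# Hypothetical integer labels matching the y_data rows: 2, 2, 2, 1, 1, 1, 0, 0
class_ids = np.array([2, 2, 2, 1, 1, 1, 0, 0])
nb_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
y_one_hot = np.eye(nb_classes, dtype=np.float32)[class_ids]

print(y_one_hot[0])  # [0. 0. 1.]  (class 2)
print(y_one_hot[3])  # [0. 1. 0.]  (class 1)
print(y_one_hot[6])  # [1. 0. 0.]  (class 0)
```

Writing the labels by hand, as the post does, is fine for 8 rows; indexing an identity matrix is one possible shortcut when the labels arrive as integers.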

2. Dataset pipeline

dataset = tf.data.Dataset.from_tensor_slices((x_data, y_data)).batch(len(x_data))

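from_tensor_slices : pairs each row of x_data with the matching row of y_data to build a dataset
.batch(len(x_data)) : groups all 8 examples into a single batch
The training loop pulls them out with for features, labels in iter(dataset). Because there is only one batch, each pass over the dataset is exactly one full-batch gradient step.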
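To see what the batch size does, the sketch below compares the full-batch setting used in this post with a batch size of 4. The toy arrays `x` and `y` are stand-ins with the same shapes as x_data and y_data, and the batch size of 4 is only an assumption for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins with the same shapes as x_data / y_data (8 rows, 4 features, 3 classes)
x = np.arange(32, dtype=np.float32).reshape(8, 4)
y = np.eye(3, dtype=np.float32)[[2, 2, 2, 1, 1, 1, 0, 0]]

full_batch = tf.data.Dataset.from_tensor_slices((x, y)).batch(len(x))
mini_batch = tf.data.Dataset.from_tensor_slices((x, y)).batch(4)

# Collect the (features, labels) shapes of every batch each dataset yields
full_shapes = [(tuple(f.shape.as_list()), tuple(l.shape.as_list())) for f, l in full_batch]
mini_shapes = [(tuple(f.shape.as_list()), tuple(l.shape.as_list())) for f, l in mini_batch]

print(full_shapes)  # one batch:   [((8, 4), (8, 3))]
print(mini_shapes)  # two batches: [((4, 4), (4, 3)), ((4, 4), (4, 3))]
```

With a smaller batch size the inner loop of the training code would run several times per epoch, once per batch.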

3. Initializing W and b

nb_classes = 3  # number of classes to classify

W = tf.Variable(tf.random.normal([4, nb_classes]), name='weight')
b = tf.Variable(tf.random.normal([3]), name='bias')

variable = [W, b]

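Shape of W: [4, 3]

There are 4 input features and 3 output classes, so:

[n, 4]  *  [4, 3]  =  [n, 3]
  X     *    W    =   score for each class

b has shape [3], one bias per class, and is broadcast across the n rows when it is added.

variable = [W, b] groups the variables that are updated during training into one list.
Such a list can be passed to tape.gradient and zipped with the gradients for optimizer.apply_gradients; the code in this post writes [W, b] out directly in both places, which is equivalent.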
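The shape arithmetic above can be checked quickly with NumPy. This is only a sketch: the random values and the seed are arbitrary assumptions, and only the shapes matter.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n, n_features, nb_classes = 8, 4, 3

X = rng.normal(size=(n, n_features)).astype(np.float32)           # [n, 4]
W = rng.normal(size=(n_features, nb_classes)).astype(np.float32)  # [4, 3]
b = rng.normal(size=(nb_classes,)).astype(np.float32)             # [3]

scores = X @ W + b  # b is broadcast across the n rows
print(scores.shape)  # (8, 3)
```

Each row of `scores` holds the 3 class scores for one example, which is what tf.matmul(features, W) + b produces in the TensorFlow code.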

4. Softmax function

def softmax_fn(features):
    hypothesis = tf.nn.softmax(tf.matmul(features, W) + b)
    return hypothesis

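This is the same flow covered in the previous post.

x → tf.matmul(X, W) + b  →  linear operation
  → tf.nn.softmax(...)    →  converted to a probability per class (sum = 1)

For example, if the linear result is [2.0, 1.0, 0.1], passing it through Softmax gives approximately:

[2.0, 1.0, 0.1]  →  [0.7, 0.2, 0.1]  (rounded to one decimal, sum = 1.0)

In this way each score becomes the probability of its class.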
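The numbers in the example can be reproduced with a few lines of NumPy. The `softmax` helper below is a sketch written for this illustration (it is not tf.nn.softmax itself), though it computes the same formula, exp(z_i) / Σ exp(z_j).

```python
import numpy as np

def softmax(logits):
    # Subtracting the max does not change the result but avoids overflow in exp
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(np.round(probs, 3))  # [0.659 0.242 0.099]
```

Rounded to one decimal these are the [0.7, 0.2, 0.1] shown above, and the three values sum to 1.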

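5. Cost function: Cross-Entropy

def loss_fn(features, labels):
    hypothesis = tf.nn.softmax(tf.matmul(features, W) + b)
    # Use the labels of the current batch rather than the global y_data
    cost = tf.reduce_mean(-tf.reduce_sum(labels * tf.math.log(hypothesis), axis=1))
    return cost

This is the same Cross-Entropy cost function covered in the previous post.

cost = -∑ L * log(S)

tf.math.log(hypothesis) : applies log to the predicted probabilities
labels * tf.math.log(hypothesis) : element-wise product with the true labels (L)
-tf.reduce_sum(..., axis=1) : sums over the classes for each example
tf.reduce_mean(...) : averages over all examples

As the predicted probability of the correct class approaches 1, the cost approaches 0; as it approaches 0, the cost grows toward ∞. Wrong predictions therefore receive a large penalty.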
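A small numeric check makes the penalty concrete. The sketch below applies cost = -∑ L * log(S) to one confident-and-right prediction and one confident-and-wrong prediction; the two probability vectors are made-up values for illustration.

```python
import numpy as np

def cross_entropy(labels, probs):
    # Per-example cost: -sum over classes of L * log(S)
    return -np.sum(labels * np.log(probs), axis=-1)

label = np.array([1.0, 0.0, 0.0])  # true class is 0
good = np.array([0.7, 0.2, 0.1])   # most probability on the correct class
bad = np.array([0.1, 0.2, 0.7])    # most probability on a wrong class

cost_good = cross_entropy(label, good)
cost_bad = cross_entropy(label, bad)
print(round(float(cost_good), 4))  # 0.3567
print(round(float(cost_bad), 4))   # 2.3026
```

Only the probability assigned to the true class contributes, so the cost is -log(0.7) in the first case and -log(0.1) in the second.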

6. Updating W and b with gradient descent

def grad(hypothesis, features, labels):
    with tf.GradientTape() as tape:
        loss_value = loss_fn(features, labels)
    return tape.gradient(loss_value, [W, b])

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for step in range(n_epochs + 1):
    for features, labels in iter(dataset):
        hypothesis = softmax_fn(features)
        grads = grad(hypothesis, features, labels)
        optimizer.apply_gradients(grads_and_vars=zip(grads, [W, b]))

GradientTape : records the loss computation and differentiates the cost with respect to W and b
optimizer.apply_gradients : applies the gradients to update W and b
zip(grads, [W, b]) : pairs each gradient with its variable before passing them in
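To show what a single update does, here is a sketch of one gradient-descent step written by hand in NumPy instead of GradientTape and the SGD optimizer. It relies on the standard result that, for softmax followed by cross-entropy, the gradient of the mean loss with respect to the scores is (S - L) / n. Starting W and b at zero (rather than random values, as the post does) is an assumption made only so the run is reproducible.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(X, Y, W, b):
    # Mean cross-entropy over all examples
    return float(np.mean(-np.sum(Y * np.log(softmax(X @ W + b)), axis=1)))

X = np.array([[1, 2, 1, 1], [2, 1, 3, 2], [3, 1, 3, 4], [4, 1, 5, 5],
              [1, 7, 5, 5], [1, 2, 5, 6], [1, 6, 6, 6], [1, 7, 7, 7]], dtype=np.float64)
Y = np.eye(3)[[2, 2, 2, 1, 1, 1, 0, 0]]

W = np.zeros((4, 3))  # zeros instead of random values, for reproducibility
b = np.zeros(3)
lr = 0.01

loss_before = loss(X, Y, W, b)

# Gradient of the mean loss with respect to the scores: (S - L) / n
diff = (softmax(X @ W + b) - Y) / len(X)
grad_W = X.T @ diff
grad_b = diff.sum(axis=0)

# W := W - alpha * gradient
W = W - lr * grad_W
b = b - lr * grad_b

loss_after = loss(X, Y, W, b)
print(loss_before, loss_after)  # the loss after the step is lower
```

In the TensorFlow code, tape.gradient computes grad_W and grad_b automatically and optimizer.apply_gradients performs the subtraction.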

7. Checking the predictions: argmax

a = softmax_fn(x_data)
print(tf.argmax(a, 1))       # class predicted by the hypothesis
print(tf.argmax(y_data, 1))  # actual class

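tf.argmax returns the index of the largest value. The second argument, 1, is the axis, so the maximum is taken across the classes of each row.

Softmax output  :  [0.7, 0.2, 0.1]
tf.argmax       :   0              ← position of the largest value

This is the argmax step of the argmax → One-Hot flow from the previous post.

If the argmax of the prediction matches the argmax of the true label → correct prediction!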
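Comparing the two argmax results also gives an accuracy figure. The sketch below uses NumPy and made-up softmax outputs for 4 examples; `probs` and `labels` are hypothetical values, not the output of the trained model.

```python
import numpy as np

# Hypothetical softmax outputs for 4 examples (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.4, 0.5, 0.1]])
# One-hot true labels for the same 4 examples
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]])

predicted = np.argmax(probs, axis=1)   # [0 2 1 1]
actual = np.argmax(labels, axis=1)     # [0 2 1 0]
accuracy = np.mean(predicted == actual)
print(accuracy)  # 0.75
```

Three of the four predictions match the true class, so the accuracy is 0.75.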

Overall flow

x_data input (8 examples, 4 features)
    ↓
tf.matmul(X, W) + b          → linear operation [n, 3]
    ↓
tf.nn.softmax(...)            → converted to class probabilities (sum = 1)
    ↓
Cross-Entropy Cost            → near 0 when right, grows toward ∞ when wrong
    ↓
GradientTape → gradient computation
    ↓
optimizer.apply_gradients     → W, b updated
    ↓
repeat (3000 epochs)
    ↓
tf.argmax(prediction, 1)      → class with the highest probability
    ↓
final classification result

Summary

Concept	Formula	Code

One-Hot Encoding	[0, 1, 0]	y_data = [[0,0,1], [0,1,0], ...]
Softmax	conversion to probabilities (sum = 1)	tf.nn.softmax(tf.matmul(X, W) + b)
Cross-Entropy	-∑ L * log(S)	tf.reduce_mean(-tf.reduce_sum(labels * tf.math.log(hypothesis), axis=1))
argmax	index of the largest value	tf.argmax(a, 1)
Gradient descent	W := W - α * gradient	optimizer.apply_gradients(...)

Once you understand the Softmax Classifier, the output-layer structure of deep learning models that classify multiple classes becomes much easier to follow. 🚀
