Pytorch를 사용하여 CIFAR10 이미지 분류기 만들기 (w. Vgg16)

728x90

CIFAR-10 이미지 분류기 만들기

합성곱 신경망(Convolution Neural Network)을 사용하여 성능이 향상된 이미지 분류기 생성 (w.Vgg16)

개요 및 결론 요약

개요

CNN을 활용하여 직접 이미지 분류기를 만들어 성능을 확인하고, Pre-trainded된 모델을 Fine Tunning하여 성능을 비교하여 얼마나 차이나는지 확인함.
- Simple Convolution Neural Network를 생성하여 CIFAR-10 이미지 데이터를 구별하는 분류기를 생성하여 성능을 확인함.
- Pre Trained된 VGG16를 CIFAR-10 데이터로 Fine Tunning 후 성능을 직접 구축한 Simple CNN 대비 얼마나 성능이 좋아졌는지 확인함.
- 두 모델은 모두 같은 하이퍼파라미터와 손실 함수를 사용하였음

결론

Simple CNN 모델의 성능 58%, VGG16 모델의 성능은 80.15%으로 Simple CNN 대비 VGG16이 22.15%p 높은 성능을 보임
직접 CNN 모델을 구축하는것도 좋으나, 실제 업무에서는 Pre-trained된 모델을 가져와 Fine Tunning하여 사용하는게 성능도 좋고 시간도 절약될 것으로 파악됨

# Package import
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import transforms

import matplotlib.pyplot as plt
import numpy as np
import random

CIFAR-10 데이터 불러와서 이미지 확인

CIFAR-10 데이터란?

CIFAR-10 데이터는 컴퓨터 비전 분야의 이미지 분류 작업에 널리 사용되는 벤치마크 데이터 세트로 60,000개의 32x32x3 컬러 이미지로 구성되어 있음.
총 10개의 비행기, 자동차, 새, 고양이, 사슴, 개, 개구리, 말, 배, 트럭 클래스로 이루어져있음
데이터 세트의 크기가 작고 단순하기 때문에 새로운 모델이나 기술을 신속하게 테스트하는 데 적합함
해당 과제에서는 학습 40000장, 검증 10000장, 테스트 10000장으로 진행함

# Set the random seeds for reproducibility
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

transform = transforms.Compose(
    # numpy array -> Torch Tensor로 변환
    [transforms.ToTensor(),
     # RGB 채널에 0.5를 뺀 후 표준편차를 0.5로 나누어 이미지 정규화
     transforms.Normalize((0.5, 0.5, 0.5),(0.5, 0.5, 0.5))]
)

# batch size 정의
batch_size = 128

# 학습 데이터 다운
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
# 테스트 데이터 다운
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

# 학습, 검증 데이터로더 정의
trainset_size = int(0.8 * len(trainset))
valset_size = len(trainset) - trainset_size
trainset, valset = torch.utils.data.random_split(trainset, [trainset_size, valset_size])

trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)
valloader = torch.utils.data.DataLoader(valset, batch_size=batch_size, shuffle=False)

# 테스트 데이터로더 정의
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified

train_samples = len(trainset)
val_samples = len(valset)
test_samples = len(testset)
image_shape = testset.data.shape[1:]

print(f'Number of Train samples: {train_samples}')
print(f'Number of Val samples: {val_samples}')
print(f'Number of Test samples: {test_samples}')
print(f'Image shape: {image_shape}')

Number of Train samples: 40000
Number of Val samples: 10000
Number of Test samples: 10000
Image shape: (32, 32, 3)

# 이미지 확인을 위한 함수 정의
def imshow(img):
    # 정규화 해제
    img = img / 2 + 0.5 
    npimg = img.numpy()
    # npimg 축 순서 변경
    plt.figure(figsize=(12,18))
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# 이미지 가져오기
dataiter = iter(trainloader)
images, labels = next(dataiter)

# 이미지 보여주기
# images 변수에 저장된 이미지들을 그리드 형태로 만들어 시각화하는 코드
imshow(torchvision.utils.make_grid(images))

output_8_0

간단한 합성곱(CNN) 신경망 분류기 생성

Conv, Maxpool, fc를 사용하여 간단한 합성곱 신경망 분류기를 정의함

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

HyperParameter 및 Loss Function 정의

Simple CNN, VGG16 모두 같은 HyperParameter를 사용함
- Loss : CrossEntropy
- Optimizer : Adam
- Learing_rate : 0.001
- Epochs : 10
- Batch size : 128

# hyper Parameter
learning_rate = 0.001
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Loss Function
mymodel = MyModel().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(mymodel.parameters(), lr=learning_rate)

Simple CNN Training 및 성능 확인

Train Accuracy : 58.56%
Test Accuracy : 58%
개, 사슴, 고양이는 다른 Class에 비해 잘 맞추지 못했음

num_epochs = 10
for epoch in range(num_epochs): 
a    running_loss = 0.0

    # Simple CNN 학습
    mymodel.train()
    for i, data in enumerate(trainloader):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()

        outputs = mymodel(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(trainloader)}], Loss: {running_loss/100:.4f}')
            running_loss = 0.0

    # Simple CNN 검증
    mymodel.eval()
    with torch.no_grad():
        correct = 0
        total = 0
        for inputs, labels in valloader:
            inputs, labels = inputs.to(device), labels.to(device)

            outputs = mymodel(inputs)
            _, predicted = torch.max(outputs.data, 1)

            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        print(f'Epoch [{epoch+1}/{num_epochs}], Accuracy of the network on the Val images: {100 * correct / total}%')

print('Finished Training')

Epoch [1/10], Step [100/313], Loss: 2.0294
Epoch [1/10], Step [200/313], Loss: 1.7495
Epoch [1/10], Step [300/313], Loss: 1.6427
Epoch [1/10], Accuracy of the network on the Val images: 42.42%
Epoch [2/10], Step [100/313], Loss: 1.5582
Epoch [2/10], Step [200/313], Loss: 1.5240
Epoch [2/10], Step [300/313], Loss: 1.4954
Epoch [2/10], Accuracy of the network on the Val images: 47.71%
Epoch [3/10], Step [100/313], Loss: 1.4331
Epoch [3/10], Step [200/313], Loss: 1.4126
Epoch [3/10], Step [300/313], Loss: 1.4159
Epoch [3/10], Accuracy of the network on the Val images: 51.1%
Epoch [4/10], Step [100/313], Loss: 1.3540
Epoch [4/10], Step [200/313], Loss: 1.3555
Epoch [4/10], Step [300/313], Loss: 1.3372
Epoch [4/10], Accuracy of the network on the Val images: 52.52%
Epoch [5/10], Step [100/313], Loss: 1.3030
Epoch [5/10], Step [200/313], Loss: 1.2751
Epoch [5/10], Step [300/313], Loss: 1.2644
Epoch [5/10], Accuracy of the network on the Val images: 54.08%
Epoch [6/10], Step [100/313], Loss: 1.2288
Epoch [6/10], Step [200/313], Loss: 1.2306
Epoch [6/10], Step [300/313], Loss: 1.2140
Epoch [6/10], Accuracy of the network on the Val images: 56.11%
Epoch [7/10], Step [100/313], Loss: 1.1530
Epoch [7/10], Step [200/313], Loss: 1.1663
Epoch [7/10], Step [300/313], Loss: 1.1702
Epoch [7/10], Accuracy of the network on the Val images: 56.47%
Epoch [8/10], Step [100/313], Loss: 1.1332
Epoch [8/10], Step [200/313], Loss: 1.1087
Epoch [8/10], Step [300/313], Loss: 1.1053
Epoch [8/10], Accuracy of the network on the Val images: 58.27%
Epoch [9/10], Step [100/313], Loss: 1.0781
Epoch [9/10], Step [200/313], Loss: 1.0766
Epoch [9/10], Step [300/313], Loss: 1.0693
Epoch [9/10], Accuracy of the network on the Val images: 58.45%
Epoch [10/10], Step [100/313], Loss: 1.0229
Epoch [10/10], Step [200/313], Loss: 1.0472
Epoch [10/10], Step [300/313], Loss: 1.0406
Epoch [10/10], Accuracy of the network on the Val images: 58.56%
Finished Training

# 정확도 확인
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = mymodel(images)

        # 가장 높은 값를 갖는 분류(class)를 정답으로
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

Accuracy of the network on the 10000 test images: 58 %

# 각 분류(class)에 대한 예측값 계산을 위해 class명 정의
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = mymodel(images)
        _, predictions = torch.max(outputs, 1)
        # 각 분류별로 올바른 예측 수 저장
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# 각 분류별 정확도(accuracy)를 출력
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')

Accuracy for class: plane is 60.8 %
Accuracy for class: car   is 60.4 %
Accuracy for class: bird  is 57.8 %
Accuracy for class: cat   is 36.4 %
Accuracy for class: deer  is 46.7 %
Accuracy for class: dog   is 43.1 %
Accuracy for class: frog  is 78.3 %
Accuracy for class: horse is 60.2 %
Accuracy for class: ship  is 73.5 %
Accuracy for class: truck is 68.4 %

VGG16 Fine Tunning 및 성능 확인

VGG16 성능

Train Accuracy : 80.79%
Test Accuracy : 80.15%
Class별 성능의 차이는 나지만, 전체적으로 Simple CNN보단 성능이 좋음.

VGG16이란?

이미지 인식 작업에 일반적으로 사용되는 딥러닝 신경망 아키텍처로 옥스포드 대학에서 개발됨.
16은 네트워크의 레이어 수를 나타내며, VGG16은 13개의 컨볼루션 레이어와 3개의 완전 연결 레이어로 구성됨.
VGG16은 이미지 인식, 객체 감지, 심지어 아트 스타일 트랜스퍼와 같은 다양한 컴퓨터 비전 애플리케이션에 널리 사용되고 있음.
Pytorch에서 모델을 다운받아 손쉽게 Fine Tunning하여 사용 가능함

# Load the pre-trained VGG16 model
vgg16 = torchvision.models.vgg16(pretrained=True)

# 마지막 Layer를 CIFAR-10 데이터에 맞게 재정의
num_classes = 10
vgg16.classifier[6] = torch.nn.Linear(vgg16.classifier[6].in_features, num_classes)

/usr/local/lib/python3.9/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/usr/local/lib/python3.9/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)

# HypterParameter 정의
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(vgg16.parameters(), lr=learning_rate)

# VGG16 Fine Tunning
vgg16.to(device)
for epoch in range(num_epochs):
    running_loss = 0.0
    vgg16.train()
    for i, (inputs, labels) in enumerate(trainloader):
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = vgg16(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(trainloader)}], Loss: {running_loss/100:.4f}')
            running_loss = 0.0

    # VGG16 검증
    vgg16.eval()
    with torch.no_grad():
        correct = 0
        total = 0
        for inputs, labels in valloader:
            inputs, labels = inputs.to(device), labels.to(device)

            outputs = vgg16(inputs)
            _, predicted = torch.max(outputs.data, 1)

            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        print(f'Accuracy of the network on the Val images: {100 * correct / total}%')

print('Finished Training')

Epoch [1/10], Step [100/313], Loss: 2.3092
Epoch [1/10], Step [200/313], Loss: 1.8725
Epoch [1/10], Step [300/313], Loss: 1.7153
Accuracy of the network on the Val images: 34.89%
Epoch [2/10], Step [100/313], Loss: 1.4934
Epoch [2/10], Step [200/313], Loss: 1.3086
Epoch [2/10], Step [300/313], Loss: 1.1836
Accuracy of the network on the Val images: 59.82%
Epoch [3/10], Step [100/313], Loss: 1.0705
Epoch [3/10], Step [200/313], Loss: 0.9924
Epoch [3/10], Step [300/313], Loss: 0.9499
Accuracy of the network on the Val images: 67.57%
Epoch [4/10], Step [100/313], Loss: 0.8214
Epoch [4/10], Step [200/313], Loss: 0.7617
Epoch [4/10], Step [300/313], Loss: 0.7767
Accuracy of the network on the Val images: 75.64%
Epoch [5/10], Step [100/313], Loss: 0.6590
Epoch [5/10], Step [200/313], Loss: 0.6808
Epoch [5/10], Step [300/313], Loss: 0.6602
Accuracy of the network on the Val images: 77.98%
Epoch [6/10], Step [100/313], Loss: 0.5766
Epoch [6/10], Step [200/313], Loss: 0.5743
Epoch [6/10], Step [300/313], Loss: 0.6239
Accuracy of the network on the Val images: 78.29%
Epoch [7/10], Step [100/313], Loss: 0.5245
Epoch [7/10], Step [200/313], Loss: 0.5322
Epoch [7/10], Step [300/313], Loss: 0.5020
Accuracy of the network on the Val images: 80.36%
Epoch [8/10], Step [100/313], Loss: 0.3884
Epoch [8/10], Step [200/313], Loss: 0.4778
Epoch [8/10], Step [300/313], Loss: 0.4630
Accuracy of the network on the Val images: 77.58%
Epoch [9/10], Step [100/313], Loss: 0.3769
Epoch [9/10], Step [200/313], Loss: 0.4439
Epoch [9/10], Step [300/313], Loss: 0.4386
Accuracy of the network on the Val images: 78.98%
Epoch [10/10], Step [100/313], Loss: 0.3738
Epoch [10/10], Step [200/313], Loss: 0.3877
Epoch [10/10], Step [300/313], Loss: 0.4045
Accuracy of the network on the Val images: 80.79%
Finished Training

# Evaluate the model
vgg16.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for inputs, labels in testloader:
        inputs, labels = inputs.to(device), labels.to(device)

        outputs = vgg16(inputs)
        _, predicted = torch.max(outputs.data, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Accuracy of the network on the test images: {100 * correct / total}%')

Accuracy of the network on the test images: 80.15%

# 각 분류(class)에 대한 예측값 계산을 위해 class명 정의
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = vgg16(images)
        _, predictions = torch.max(outputs, 1)
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1


# 각 분류별 정확도(accuracy)를 출력합니다
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')

Accuracy for class: plane is 84.7 %
Accuracy for class: car   is 83.4 %
Accuracy for class: bird  is 68.9 %
Accuracy for class: cat   is 66.3 %
Accuracy for class: deer  is 88.8 %
Accuracy for class: dog   is 68.7 %
Accuracy for class: frog  is 85.0 %
Accuracy for class: horse is 83.6 %
Accuracy for class: ship  is 89.1 %
Accuracy for class: truck is 83.0 %

Reference

https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html

728x90

'AI > Computer Vision' 카테고리의 다른 글

MNIST 데이터로 해보는 CNN (Convolution Neural Network) (1)	2024.11.20
Penn-Fudan으로 알아보는 객체 탐지(Object Detection), 분할(Segmentation) with FasterRCNN (2)	2024.11.16

CIFAR-10 이미지 분류기 만들기

개요 및 결론 요약

개요

결론

CIFAR-10 데이터 불러와서 이미지 확인

CIFAR-10 데이터란?

간단한 합성곱(CNN) 신경망 분류기 생성

HyperParameter 및 Loss Function 정의

Simple CNN Training 및 성능 확인

VGG16 Fine Tunning 및 성능 확인

VGG16 성능

VGG16이란?

'AI > Computer Vision' 카테고리의 다른 글

티스토리툴바