
EfficientNet Transfer Learning & Fine tuning

๋น„๋น„์ด์ž‰ 2021. 9. 8. 11:20
๋ฐ˜์‘ํ˜•

💪🏻Training a model from scratch

: Accuracy improves very slowly, and the model is likely to overfit.

Printing the trainable and non-trainable parameter counts (e.g. via model.summary()) shows that the number of trainable parameters is far larger.

The loss stays somewhere between 1 and 4, training accuracy plateaus around 0.6, and validation accuracy stays around 0.2.

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import EfficientNetB0

# strategy, img_augmentation, IMG_SIZE, NUM_CLASSES, ds_train, and ds_test
# are defined in earlier steps of the referenced Keras tutorial.
with strategy.scope():
    inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = img_augmentation(inputs)
    outputs = EfficientNetB0(include_top=True, weights=None, classes=NUM_CLASSES)(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]
    )

model.summary()

epochs = 40  
hist = model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=2)
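To see the parameter counts mentioned above without scrolling through model.summary(), here is a minimal sketch (assuming the model built above) that sums the weight shapes directly:

import numpy as np

# Sum the element counts of all trainable and non-trainable weights
trainable = int(sum(np.prod(w.shape.as_list()) for w in model.trainable_weights))
non_trainable = int(sum(np.prod(w.shape.as_list()) for w in model.non_trainable_weights))
print(f"Trainable params: {trainable:,} / non-trainable params: {non_trainable:,}")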

💪🏻Transfer Learning from pre-trained weights

1️⃣ Freeze all layers and train only the top layers.

- A relatively large learning rate (1e-2) is used, and the validation accuracy and loss end up better than the training accuracy and loss. This is because the regularization is strong and only suppresses the training-time metrics.

- The model converges after about 50 epochs; if the augmentation layers are not added, validation accuracy only reaches about 60%.

 

def build_model(num_classes):
    inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = img_augmentation(inputs)
    model = EfficientNetB0(include_top=False, input_tensor=x, weights="imagenet")

    # Freeze the pretrained weights
    model.trainable = False

    # Rebuild top
    x = layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
    x = layers.BatchNormalization()(x)

    top_dropout_rate = 0.2
    x = layers.Dropout(top_dropout_rate, name="top_dropout")(x)
    outputs = layers.Dense(num_classes, activation="softmax", name="pred")(x)

    # Compile
    model = tf.keras.Model(inputs, outputs, name="EfficientNet")
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
    model.compile(
        optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"]
    )
    return model
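A usage sketch for this frozen-backbone stage (the epoch count here is illustrative; strategy, ds_train, and ds_test are assumed from the earlier setup):

with strategy.scope():
    model = build_model(num_classes=NUM_CLASSES)

epochs = 25  # illustrative; train until the new top layers converge
hist = model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=2)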

 

2️⃣ Unfreeze a number of layers and fit the model using a smaller learning rate.

- If the pre-trained model already extracts features well, there is a limit to how much this step can raise validation accuracy.

- However, when the pre-trained weights are used on a dataset that differs somewhat from ImageNet, this fine-tuning step becomes important for feature extraction (see the sketch below).

 

Setting layer.trainable = False on a BatchNormalization layer freezes the layer, so its internal state does not change during training. Note that the "frozen state" and "inference mode" are two clearly different concepts.
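A minimal sketch of this unfreezing step in the spirit of the cited tutorial: the top 20 layers are made trainable, BatchNormalization layers stay frozen, and the model is recompiled with a much smaller learning rate (1e-4). The layer count and epoch count are illustrative.

def unfreeze_model(model):
    # Unfreeze the top 20 layers while leaving the BatchNormalization layers frozen
    for layer in model.layers[-20:]:
        if not isinstance(layer, layers.BatchNormalization):
            layer.trainable = True

    # Recompile with a much smaller learning rate so the pretrained weights
    # are only nudged, not overwritten
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
    model.compile(
        optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"]
    )


unfreeze_model(model)

epochs = 10  # illustrative; a short run is usually enough for fine tuning
hist = model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=2)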

 

💪🏻Tips for fine tuning EfficientNet

On unfreezing layers:

  • The BatchNormalization layers need to be kept frozen. If they are also made trainable, the first epoch after unfreezing will significantly reduce accuracy.

 

  • In some cases it may be beneficial to open up only a portion of layers instead of unfreezing all. This will make fine tuning much faster when going to larger models like B7.
  • Each block needs to be all turned on or off. This is because the architecture includes a shortcut from the first layer to the last layer for each block. Not respecting blocks also significantly harms the final performance.

Some other tips for utilizing EfficientNet:

  • Larger variants of EfficientNet do not guarantee improved performance, especially for tasks with less data or fewer classes. In such a case, the larger the EfficientNet variant chosen, the harder it is to tune hyperparameters.
  • EMA (Exponential Moving Average) is very helpful in training EfficientNet from scratch, but not so much for transfer learning.
  • Do not use the RMSprop setup as in the original paper for transfer learning. The momentum and learning rate are too high for transfer learning. It will easily corrupt the pretrained weights and blow up the loss. A quick check is to see whether loss (as categorical cross entropy) is getting significantly larger than log(NUM_CLASSES) after the same epoch; if so, the initial learning rate/momentum is too high (see the quick-check sketch after this list).
  • Smaller batch sizes benefit validation accuracy, possibly because they effectively provide regularization.
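A quick sketch of the log(NUM_CLASSES) sanity check mentioned in the RMSprop tip above (NUM_CLASSES and hist as in the earlier snippets):

import math

# Cross entropy of a uniform random guess over NUM_CLASSES classes
baseline = math.log(NUM_CLASSES)

if hist.history["loss"][-1] > baseline:
    print("Loss is above the random-guess baseline; "
          "the initial learning rate/momentum is likely too high.")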

Source: https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/


๋ฐ˜์‘ํ˜•