💪🏻 Training a model from scratch
: Accuracy climbs very slowly and the model is likely to overfit.
Printing the trainable and non-trainable params shows that the number of trainable parameters is far larger (a quick check is sketched after the code below).
The loss hovers between 1 and 4, while training accuracy sits around 0.6 and validation accuracy stays around 0.2.
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import EfficientNetB0

# strategy, img_augmentation, IMG_SIZE, NUM_CLASSES, ds_train and ds_test
# are defined earlier in the keras.io tutorial this post follows.
with strategy.scope():
    inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = img_augmentation(inputs)
    # Random weights (weights=None) and the full classification top
    outputs = EfficientNetB0(include_top=True, weights=None, classes=NUM_CLASSES)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]
    )

model.summary()

epochs = 40
hist = model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=2)
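To see the imbalance mentioned above, the trainable and non-trainable parameter counts can be checked directly; a minimal sketch using the model built above:

# Count parameters in the trainable vs. non-trainable weights of `model`.
trainable_count = sum(
    tf.keras.backend.count_params(w) for w in model.trainable_weights
)
non_trainable_count = sum(
    tf.keras.backend.count_params(w) for w in model.non_trainable_weights
)
print(f"Trainable params:     {trainable_count:,}")
print(f"Non-trainable params: {non_trainable_count:,}")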
💪🏻 Transfer Learning from pre-trained weights
1️⃣ Freeze all layers and train only the top layers.
- A relatively large learning rate (1e-2) is used, and the validation accuracy and loss come out better than the training accuracy and loss. This is because the regularization is strong, which only suppresses training-time metrics.
- Convergence takes around 50 epochs, and without the augmentation layers accuracy only reaches about 60%.
def build_model(num_classes):
    inputs = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3))
    x = img_augmentation(inputs)
    model = EfficientNetB0(include_top=False, input_tensor=x, weights="imagenet")

    # Freeze the pretrained weights
    model.trainable = False

    # Rebuild top
    x = layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
    x = layers.BatchNormalization()(x)

    top_dropout_rate = 0.2
    x = layers.Dropout(top_dropout_rate, name="top_dropout")(x)
    outputs = layers.Dense(num_classes, activation="softmax", name="pred")(x)

    # Compile
    model = tf.keras.Model(inputs, outputs, name="EfficientNet")
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
    model.compile(
        optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"]
    )
    return model
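A minimal sketch of how this frozen-backbone model might be trained, assuming the same ds_train/ds_test as before (the 25 epochs here are just an illustrative choice):

# Build the frozen-backbone model and train only the new top layers.
with strategy.scope():
    model = build_model(num_classes=NUM_CLASSES)

epochs = 25  # illustrative; as noted above, convergence can take ~50 epochs
hist = model.fit(ds_train, epochs=epochs, validation_data=ds_test, verbose=2)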
2️⃣ Unfreeze a number of layers and fit the model using a smaller learning rate.
- If feature extraction with the pre-trained model already worked well, this step can only raise validation accuracy by a small margin.
- However, if the pre-trained weights are applied to a dataset that differs somewhat from ImageNet, this fine-tuning step matters because the feature extraction itself also needs adjusting.
Setting layer.trainable = False on a BatchNormalization layer freezes the layer, so its internal state is not updated during training. Note that the "frozen state" and "inference mode" are two clearly different concepts.
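A minimal sketch of this step, roughly following the linked keras.io example: unfreeze the top 20 layers, keep BatchNormalization layers frozen, and recompile with a much smaller learning rate (the epoch count is illustrative):

def unfreeze_model(model):
    # Unfreeze the top 20 layers, but keep BatchNormalization layers frozen
    for layer in model.layers[-20:]:
        if not isinstance(layer, layers.BatchNormalization):
            layer.trainable = True

    # Recompile with a much smaller learning rate for fine-tuning
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
    model.compile(
        optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"]
    )


unfreeze_model(model)
hist = model.fit(ds_train, epochs=10, validation_data=ds_test, verbose=2)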
💪🏻 Tips for fine-tuning EfficientNet
On unfreezing layers:
- The BatchNormalization layers need to be kept frozen (more details). If they are also turned to trainable, the first epoch after unfreezing will significantly reduce accuracy.
- In some cases it may be beneficial to open up only a portion of layers instead of unfreezing all. This will make fine tuning much faster when going to larger models like B7.
- Each block needs to be all turned on or off. This is because the architecture includes a shortcut from the first layer to the last layer for each block. Not respecting blocks also significantly harms the final performance.
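For example, a block-level unfreeze could select layers by name prefix. This is only a sketch: it assumes the keras.applications naming convention in which EfficientNet layer names start with block2a_, block6a_, block7a_ and so on, and unfreeze_blocks/block_prefixes are hypothetical names introduced here for illustration.

def unfreeze_blocks(model, block_prefixes=("block6", "block7")):
    # Unfreeze whole blocks by layer-name prefix so each block's shortcut
    # path is either fully trainable or fully frozen.
    for layer in model.layers:
        if layer.name.startswith(block_prefixes) and not isinstance(
            layer, layers.BatchNormalization
        ):
            layer.trainable = True

As with the earlier sketch, the model has to be recompiled with a small learning rate after the trainable flags change.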
Some other tips for utilizing EfficientNet:
- Larger variants of EfficientNet do not guarantee improved performance, especially for tasks with less data or fewer classes. In such a case, the larger the EfficientNet variant chosen, the harder it is to tune hyperparameters.
- EMA (Exponential Moving Average) is very helpful in training EfficientNet from scratch, but not so much for transfer learning.
- Do not use the RMSprop setup as in the original paper for transfer learning. The momentum and learning rate are too high for transfer learning. It will easily corrupt the pretrained weights and blow up the loss. A quick check is to see if loss (as categorical cross entropy) is getting significantly larger than log(NUM_CLASSES) after the same epoch (see the sketch after this list). If so, the initial learning rate/momentum is too high.
- A smaller batch size benefits validation accuracy, possibly because it effectively provides regularization.
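A minimal sketch of the sanity check mentioned in the RMSprop tip above (hist is the History object returned by model.fit; the 1.5 slack factor is an arbitrary choice):

import math

# Random predictions give a categorical cross entropy of about log(NUM_CLASSES).
# A training loss far above that after an epoch suggests the learning rate or
# momentum has blown up the pretrained weights.
chance_loss = math.log(NUM_CLASSES)
first_epoch_loss = hist.history["loss"][0]
if first_epoch_loss > 1.5 * chance_loss:  # 1.5 is an arbitrary slack factor
    print(
        f"loss {first_epoch_loss:.2f} >> log(NUM_CLASSES) = {chance_loss:.2f}: "
        "lower the initial learning rate / momentum"
    )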
Source: https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/