class paddle.optimizer.lr.LinearWarmup(learning_rate: float | paddle.optimizer.lr.LRScheduler, warmup_steps: int, start_lr: float, end_lr: float, last_epoch: int = -1, verbose: bool = False) [source]

Linear learning rate warm-up strategy. The learning rate is increased linearly for a number of steps before the target learning rate (or scheduler) takes over. For more information, please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks.

When epoch < warmup_steps, learning rate is updated as:

\[lr = start\_lr + (end\_lr - start\_lr) * \frac{epoch}{warmup\_steps}\]

where start_lr is the initial learning rate, and end_lr is the final learning rate;

When epoch >= warmup_steps, learning rate is updated as:

\[lr = learning_rate\]

where learning_rate is a Python float or an instance of any LRScheduler subclass.
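
As a quick numerical check, here is a minimal sketch (the parameter values are arbitrary, and the commented values assume the scheduler starts counting at epoch 0 right after construction) that prints the warm-up schedule via get_lr():

>>> import paddle
>>> sched = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=0.5, warmup_steps=4, start_lr=0.0, end_lr=0.5)
>>> for epoch in range(6):
...     print(epoch, sched.get_lr())   # 0.0, 0.125, 0.25, 0.375, 0.5, 0.5
...     sched.step()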

Parameters
  • learning_rate (float|LRScheduler) – The learning rate to use after warm-up. It can be a Python float or an instance of any LRScheduler subclass.

  • warmup_steps (int) – Total number of warm-up steps. It must be a positive integer.

  • start_lr (float) – Initial learning rate of the warm-up phase.

  • end_lr (float) – Final learning rate of the warm-up phase.

  • last_epoch (int, optional) – The index of the last epoch. It can be set to resume training. Default: -1, which means starting from the initial learning rate.

  • verbose (bool, optional) – If True, prints a message to stdout for each update. Default: False.

Returns

A LinearWarmup instance that schedules the learning rate.

Examples

>>> # Example1: train on default dynamic graph mode
>>> import paddle
>>> linear = paddle.nn.Linear(10, 10)
>>> scheduler = paddle.optimizer.lr.LinearWarmup(
...         learning_rate=0.5, warmup_steps=20, start_lr=0, end_lr=0.5, verbose=True)
>>> sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
>>> for epoch in range(20):
...     for batch_id in range(5):
...         x = paddle.uniform([10, 10])
...         out = linear(x)
...         loss = paddle.mean(out)
...         loss.backward()
...         sgd.step()
...         sgd.clear_gradients()
...         scheduler.step()    # If you update learning rate each step
...     # scheduler.step()        # If you update learning rate each epoch
>>> # Example2: train on static graph mode
>>> import paddle
>>> import numpy as np
>>> paddle.enable_static()
>>> main_prog = paddle.static.Program()
>>> start_prog = paddle.static.Program()
>>> with paddle.static.program_guard(main_prog, start_prog):
...     x = paddle.static.data(name='x', shape=[None, 4, 5])
...     y = paddle.static.data(name='y', shape=[None, 4, 5])
...     z = paddle.static.nn.fc(x, 100)
...     loss = paddle.mean(z)
...     scheduler = paddle.optimizer.lr.LinearWarmup(
...         learning_rate=0.5, warmup_steps=20, start_lr=0, end_lr=0.5, verbose=True)
...     sgd = paddle.optimizer.SGD(learning_rate=scheduler)
...     sgd.minimize(loss)
...
>>> exe = paddle.static.Executor()
>>> exe.run(start_prog)
>>> for epoch in range(20):
...     for batch_id in range(5):
...         out = exe.run(
...             main_prog,
...             feed={
...                 'x': np.random.randn(3, 4, 5).astype('float32'),
...                 'y': np.random.randn(3, 4, 5).astype('float32')
...             },
...             fetch_list=loss.name)
...         scheduler.step()    # If you update learning rate each step
...     # scheduler.step()        # If you update learning rate each epoch
state_dict() → _LRStateDict

Returns the state of the LinearWarmup scheduler as a dict.

It is a subset of self.__dict__.

set_state_dict(state_dict: _LRStateDict) → None

Loads the state_dict for the LinearWarmup scheduler.
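
A minimal sketch of the assumed checkpointing workflow: snapshot the scheduler state with state_dict() and restore it into a freshly constructed scheduler with set_state_dict(), e.g. when resuming training:

>>> import paddle
>>> scheduler = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=0.5, warmup_steps=20, start_lr=0.0, end_lr=0.5)
>>> for _ in range(7):
...     scheduler.step()
>>> state = scheduler.state_dict()     # contains e.g. 'last_epoch' and 'last_lr'
>>> resumed = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=0.5, warmup_steps=20, start_lr=0.0, end_lr=0.5)
>>> resumed.set_state_dict(state)      # continue from the saved position
>>> assert resumed.last_epoch == scheduler.last_epoch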

get_lr() → float

Subclasses that extend LRScheduler (the base class) should provide a custom implementation of get_lr().

Otherwise, a NotImplementedError exception will be raised.
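
As an illustration of the required override, here is a hypothetical custom scheduler (the class name, constructor layout, and halving-every-10-epochs rule are illustrative, not part of the library; it assumes the base constructor accepts (learning_rate, last_epoch, verbose)):

>>> import paddle
>>> class StepHalving(paddle.optimizer.lr.LRScheduler):
...     def __init__(self, base_lr=0.1, last_epoch=-1, verbose=False):
...         # stored before super().__init__, in case the base class evaluates get_lr() during construction
...         self._base = float(base_lr)
...         super().__init__(base_lr, last_epoch, verbose)
...     def get_lr(self):
...         # halve the learning rate every 10 epochs (purely illustrative rule)
...         return self._base * (0.5 ** (self.last_epoch // 10))
>>> sched = StepHalving(base_lr=0.1)
>>> for _ in range(25):
...     sched.step()
>>> assert abs(sched.get_lr() - 0.025) < 1e-12   # halved twice after 25 steps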

set_dict(state_dict: _LRStateDict) → None

Loads the scheduler's state.

state_keys() → None

For subclasses that extend LRScheduler (the base class). By default, last_epoch and last_lr are saved through self.keys = ['last_epoch', 'last_lr'].

last_epoch is the current epoch number, and last_lr is the current learning rate.

If you want to change the default behavior, you should provide a custom implementation of _state_keys() to redefine self.keys.
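
A hypothetical sketch of redefining the saved keys (the class name and the my_counter attribute are illustrative, not part of the library):

>>> import paddle
>>> class MyScheduler(paddle.optimizer.lr.LRScheduler):
...     def __init__(self, learning_rate=0.1, last_epoch=-1, verbose=False):
...         self.my_counter = 0   # hypothetical extra state to include in checkpoints
...         super().__init__(learning_rate, last_epoch, verbose)
...     def get_lr(self):
...         return 0.1            # constant rate, only to satisfy the required get_lr() override
...     def _state_keys(self):
...         # keep the defaults and add the extra counter
...         self.keys = ['last_epoch', 'last_lr', 'my_counter']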

step(epoch: Optional[int] = None) → None

step should be called after optimizer.step(). It updates the learning rate in the optimizer according to the current epoch. The new learning rate takes effect on the next call to optimizer.step().

Parameters

epoch (int, optional) – Specify the current epoch explicitly. Default: None, in which case the epoch auto-increments from last_epoch=-1.

Returns

None

Examples

>>> import paddle
>>> value = paddle.arange(26, dtype='float32')
>>> a = paddle.reshape(value, [2, 13])
>>> linear = paddle.nn.Linear(13, 5)
>>> scheduler = paddle.optimizer.lr.LinearWarmup(
...     learning_rate=0.5, warmup_steps=20, start_lr=0, end_lr=0.5)
>>> adadelta = paddle.optimizer.Adadelta(learning_rate=scheduler, epsilon=1e-06, rho=0.95,
...                             parameters=linear.parameters())
>>> out = linear(a)
>>> out.backward()
>>> adadelta.step()
>>> adadelta.clear_grad()
>>> scheduler.step()    # update the learning rate after the optimizer update