-
#19. Building your Recurrent Neural Network - Step by Step연구실 2019. 10. 23. 23:56
- RNN: memory를 가지기 때문에 NLP를 비롯한 다른 task들에 매우 효과적으로 작동한다.
- hidden layer activation에서 정보나 문맥을 기억하는데, uni-directional RNN에서 과거의 정보를 이후의 layer에 넣을 수 있게 만들어준다. bidirectional RNN은 과거와 미래의 문맥 둘 다 고려가 가능하다.
* Foward propagation for the basic Recurrent Neural Network
- 본 예시에서는 Tx = Ty
- 1. 한 과정에 필요한 계산을 implement / 2. Tx time-step 동안 loop를 돌린다.
(1) RNN cell
# GRADED FUNCTION: rnn_cell_forward def rnn_cell_forward(xt, a_prev, parameters): """ Implements a single forward step of the RNN-cell as described in Figure (2) Arguments: xt -- your input data at timestep "t", numpy array of shape (n_x, m). a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m) parameters -- python dictionary containing: Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x) Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a) Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a) ba -- Bias, numpy array of shape (n_a, 1) by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1) Returns: a_next -- next hidden state, of shape (n_a, m) yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m) cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters) """ # Retrieve parameters from "parameters" Wax = parameters["Wax"] Waa = parameters["Waa"] Wya = parameters["Wya"] ba = parameters["ba"] by = parameters["by"] ### START CODE HERE ### (≈2 lines) # compute next activation state using the formula given above a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba) # compute output of the current cell using the formula given above yt_pred = softmax(np.dot(Wya, a_next) + by) ### END CODE HERE ### # store values you need for backward propagation in cache cache = (a_next, a_prev, xt, parameters) return a_next, yt_pred, cache
(2) RNN forward pass
# GRADED FUNCTION: rnn_forward def rnn_forward(x, a0, parameters): """ Implement the forward propagation of the recurrent neural network described in Figure (3). Arguments: x -- Input data for every time-step, of shape (n_x, m, T_x). a0 -- Initial hidden state, of shape (n_a, m) parameters -- python dictionary containing: Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a) Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x) Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a) ba -- Bias numpy array of shape (n_a, 1) by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1) Returns: a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x) y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x) caches -- tuple of values needed for the backward pass, contains (list of caches, x) """ # Initialize "caches" which will contain the list of all caches caches = [] # Retrieve dimensions from shapes of x and parameters["Wya"] n_x, m, T_x = x.shape n_y, n_a = parameters["Wya"].shape # initialize "a" and "y" with zeros (≈2 lines) a = np.zeros((n_a, m, T_x)) y_pred = np.zeros((n_y, m, T_x)) # Initialize a_next (≈1 line) a_next = a0 # loop over all time-steps for t in range(T_x): # Update next hidden state, compute the prediction, get the cache (≈1 line) a_next, yt_pred, cache = rnn_cell_forward(x[:,:,t], a_next, parameters) # Save the value of the new "next" hidden state in a (≈1 line) a[:,:,t] = a_next # Save the value of the prediction in y (≈1 line) y_pred[:,:,t] = yt_pred # Append "cache" to "caches" (≈1 line) caches.append(cache) # store values needed for backward propagation in cache caches = (caches, x) return a, y_pred, caches
* Long Short-Tern Memory(LSTM) network
- Gates:
1. Forget gate: 기억해두었던 내용을 삭제하고 싶은 경우
- Γf<t>는 0과 1의 값을 가지게 된다. 만약 값이 0이면 LSTM은 cell state c<t-1>로부터 그 정보를 제거하게 된다.
2. Update gate: 새로운 정보를 업데이트 해야하는 경 우
3. Update the cell: 새로운 subject를 업데이트 하기 위해서는 기존의 cell state에 새로운 벡터를 추가해야 한다.
- 새로운 cell state 식은:
4. Output gate
(1) LSTM cell
# GRADED FUNCTION: lstm_cell_forward def lstm_cell_forward(xt, a_prev, c_prev, parameters): """ Implement a single forward step of the LSTM-cell as described in Figure (4) Arguments: xt -- your input data at timestep "t", numpy array of shape (n_x, m). a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m) c_prev -- Memory state at timestep "t-1", numpy array of shape (n_a, m) parameters -- python dictionary containing: Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x) bf -- Bias of the forget gate, numpy array of shape (n_a, 1) Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x) bi -- Bias of the update gate, numpy array of shape (n_a, 1) Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x) bc -- Bias of the first "tanh", numpy array of shape (n_a, 1) Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x) bo -- Bias of the output gate, numpy array of shape (n_a, 1) Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a) by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1) Returns: a_next -- next hidden state, of shape (n_a, m) c_next -- next memory state, of shape (n_a, m) yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m) cache -- tuple of values needed for the backward pass, contains (a_next, c_next, a_prev, c_prev, xt, parameters) Note: ft/it/ot stand for the forget/update/output gates, cct stands for the candidate value (c tilde), c stands for the memory value """ # Retrieve parameters from "parameters" Wf = parameters["Wf"] bf = parameters["bf"] Wi = parameters["Wi"] bi = parameters["bi"] Wc = parameters["Wc"] bc = parameters["bc"] Wo = parameters["Wo"] bo = parameters["bo"] Wy = parameters["Wy"] by = parameters["by"] # Retrieve dimensions from shapes of xt and Wy n_x, m = xt.shape n_y, n_a = Wy.shape # Concatenate a_prev and xt (≈3 lines) concat = np.zeros((n_a+n_x, m)) concat[: n_a, :] = a_prev concat[n_a :, :] = xt # Compute values for ft, it, cct, c_next, ot, a_next using the formulas given figure (4) (≈6 lines) ft = sigmoid(np.dot(Wf, concat) + bf) it = sigmoid(np.dot(Wi, concat) + bi) cct = np.tanh(np.dot(Wc, concat) + bc) c_next = ft * c_prev + it * cct ot = sigmoid(np.dot(Wo, concat) + bo) a_next = ot * np.tanh(c_next) # Compute prediction of the LSTM cell (≈1 line) yt_pred = softmax(np.dot(Wy, a_next) + by) # store values needed for backward propagation in cache cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters) return a_next, c_next, yt_pred, cache
(2) Forward pass for LSTM
# GRADED FUNCTION: lstm_forward def lstm_forward(x, a0, parameters): """ Implement the forward propagation of the recurrent neural network using an LSTM-cell described in Figure (4). Arguments: x -- Input data for every time-step, of shape (n_x, m, T_x). a0 -- Initial hidden state, of shape (n_a, m) parameters -- python dictionary containing: Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x) bf -- Bias of the forget gate, numpy array of shape (n_a, 1) Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x) bi -- Bias of the update gate, numpy array of shape (n_a, 1) Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x) bc -- Bias of the first "tanh", numpy array of shape (n_a, 1) Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x) bo -- Bias of the output gate, numpy array of shape (n_a, 1) Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a) by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1) Returns: a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x) y -- Predictions for every time-step, numpy array of shape (n_y, m, T_x) caches -- tuple of values needed for the backward pass, contains (list of all the caches, x) """ # Initialize "caches", which will track the list of all the caches caches = [] # Retrieve dimensions from shapes of x and parameters['Wy'] (≈2 lines) n_x, m, T_x = x.shape n_y, n_a = parameters["Wy"].shape # initialize "a", "c" and "y" with zeros (≈3 lines) a = np.zeros((n_a, m, T_x)) c = np.zeros((n_a, m, T_x)) y = np.zeros((n_y, m, T_x)) # Initialize a_next and c_next (≈2 lines) a_next = a0 c_next = np.zeros(a_next.shape) # loop over all time-steps for t in range(T_x): # Update next hidden state, next memory state, compute the prediction, get the cache (≈1 line) a_next, c_next, yt, cache = lstm_cell_forward(x[:,:,t], a_next, c_next, parameters) # Save the value of the new "next" hidden state in a (≈1 line) a[:,:,t] = a_next # Save the value of the prediction in y (≈1 line) y[:,:,t] = yt # Save the value of the next cell state (≈1 line) c[:,:,t] = c_next # Append the cache into caches (≈1 line) caches.append(cache) # store values needed for backward propagation in cache caches = (caches, x) return a, y, c, caches
* Backpropagation in recurrent neural networks
- 현대 딥러닝 프레임워크에서는 forward pass만 수행해주면 프레임워크가 알아서 backpropagation을 수행시켜주기 때문에 따로 신경쓰지 않아도 된다.
- 간단한 NN에서는 parameter를 업데이트하기 위해 cost의 derivative를 계산했다.
- 비슷하게 RNN에서도 cost의 derivative를 계산한다.
(1) Basic RNN backward pass
-
'연구실' 카테고리의 다른 글
#21. Improvise a Jazz Solo with an LSTM Network (0) 2019.10.24 #20. Dinosaur Island - Character-Level Language Modeling (0) 2019.10.24 #18. Face Recognition for the Happy House (0) 2019.10.18 #17. Deep Learning & Art: Neural Style Transfer (0) 2019.10.18 #16. Autonomous driving - Car detection (0) 2019.10.15