    #24. Neural Machine Translation
    Lab · 2019. 10. 28. 16:02

    * Translating human-readable dates into machine-readable dates

    - Given an input in a format such as "the 29th of August 1958", "03/30/1968", or "24 JUNE 1987", let's build a model that converts it into the format 1958-08-29.

     

    (1) Dataset

    - After generating a dataset of (human-readable date, machine-readable date) pairs, preprocess it by mapping the characters to index values, and fix the maximum input length and the output length (a sketch of this step follows below).
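
    - A minimal sketch of this step, assuming the helper functions load_dataset() and preprocess_data() from the course's nmt_utils module (they are not shown in this post, so treat the exact names and signatures as assumptions):

    from nmt_utils import load_dataset, preprocess_data

    Tx = 30   # maximum length of the human-readable input
    Ty = 10   # length of the output "YYYY-MM-DD" (10 characters)

    # dataset: list of (human-readable date, machine-readable date) pairs
    # human_vocab / machine_vocab: character -> index dictionaries, inv_machine_vocab: index -> character
    dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(10000)

    # X, Y: index-encoded sequences of length Tx / Ty; Xoh, Yoh: their one-hot encodings
    X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)

    print(dataset[:2])           # e.g. [('9 may 1998', '1998-05-09'), ...]
    print(Xoh.shape, Yoh.shape)  # (10000, Tx, len(human_vocab)), (10000, Ty, len(machine_vocab))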

     

     

    * Neural machine translation with attention

    (1) Attention mechanism

    Attention model

     

    What one attention step does to calculate the attention variables α<t, t'>

    - Two LSTMs are used. The one at the bottom of the figure is a bi-directional LSTM placed before the attention mechanism (the pre-attention Bi-LSTM); it runs for Tx time steps. The LSTM at the top sits after the attention mechanism (the post-attention LSTM); it runs for Ty time steps.

    - The post-attention LSTM passes its hidden state s<t> and cell state c<t> from one step to the next. It takes s<t> and c<t> as inputs, but not the previous prediction y<t-1>, because in this model a previous output character (of YYYY-MM-DD) does not strongly influence the next one.

    - a<t> = [→a<t>; ←a<t>]: used to denote the concatenation of the forward and backward activations of the pre-attention Bi-LSTM (the attention weights themselves are computed as shown below).
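
    - For reference, the attention weights α<t, t'> are a softmax over per-input-step "energies" e<t, t'>, and the context is the corresponding weighted sum of the Bi-LSTM activations (standard attention, matching the assignment's figure):

    \alpha^{\langle t, t' \rangle} = \frac{\exp\left(e^{\langle t, t' \rangle}\right)}{\sum_{t''=1}^{T_x} \exp\left(e^{\langle t, t'' \rangle}\right)},
    \qquad
    context^{\langle t \rangle} = \sum_{t'=1}^{T_x} \alpha^{\langle t, t' \rangle}\, a^{\langle t' \rangle}

    where each energy e<t, t'> is produced from s<t-1> and a<t'> by the small dense network inside one_step_attention() below.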

     

    - Let's implement the functions one by one; the imports and helpers they rely on are sketched right below.
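
    - The code in this post assumes the Keras imports and notebook helpers loaded at the top of the assignment; roughly (nmt_utils is the course helper module that provides string_to_int and the custom softmax, so treat these lines as a sketch):

    from keras.layers import Bidirectional, Concatenate, Dot, Input, LSTM
    from keras.layers import RepeatVector, Dense, Activation
    from keras.optimizers import Adam
    from keras.utils import to_categorical
    from keras.models import Model
    import numpy as np

    from nmt_utils import *   # load_dataset, preprocess_data, string_to_int, softmax, ...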

     

    1. one_step_attention

    - Computes the context vector for one output time step.

    # Defined shared layers as global variables
    repeator = RepeatVector(Tx)
    concatenator = Concatenate(axis=-1)
    densor1 = Dense(10, activation = "tanh")
    densor2 = Dense(1, activation = "relu")
    activator = Activation(softmax, name='attention_weights') # We are using a custom softmax(axis = 1) loaded in this notebook
    dotor = Dot(axes = 1)
    # GRADED FUNCTION: one_step_attention
    
    def one_step_attention(a, s_prev):
        """
        Performs one step of attention: Outputs a context vector computed as a dot product of the attention weights
        "alphas" and the hidden states "a" of the Bi-LSTM.
        
        Arguments:
        a -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_a)
        s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)
        
        Returns:
        context -- context vector, input of the next (post-attention) LSTM cell
        """
        
        ### START CODE HERE ###
        # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
        s_prev = repeator(s_prev)
        # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
        concat = concatenator([a, s_prev])
        # Use densor1 to propagate concat through a small fully-connected neural network to compute the "intermediate energies" variable e. (≈1 lines)
        e = densor1(concat)
        # Use densor2 to propagate e through a small fully-connected neural network to compute the "energies" variable energies. (≈1 lines)
        energies = densor2(e)
        # Use "activator" on "energies" to compute the attention weights "alphas" (≈ 1 line)
        alphas = activator(energies)
        # Use dotor together with "alphas" and "a" to compute the context vector to be given to the next (post-attention) LSTM-cell (≈ 1 line)
        context = dotor([alphas, a])
        ### END CODE HERE ###
        
        return context
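
    - Shape check: alphas has shape (m, Tx, 1) and a has shape (m, Tx, 2*n_a), so the Dot over axis 1 returns a context of shape (m, 1, 2*n_a), i.e. one weighted summary of the whole input for the current output step.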

     

    2. model()

    - Implements the full model: pass the input through the Bi-LSTM to get [a<1>, a<2>, ..., a<Tx>], call one_step_attention() Ty times, feed each context vector to the post-attention LSTM, and pass its output through a Dense layer with softmax activation to produce each prediction. The shared layers it relies on are sketched below.
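
    - Besides one_step_attention(), model() uses two more shared layers and the sizes n_a / n_s, which the notebook defines globally. A sketch of those definitions (the values n_a = 32 and n_s = 64 are the ones the assignment uses, and softmax here is the custom softmax from nmt_utils):

    n_a = 32   # hidden state size of the pre-attention Bi-LSTM
    n_s = 64   # hidden state size of the post-attention LSTM

    # shared layers, reused across all Ty decoding steps
    post_activation_LSTM_cell = LSTM(n_s, return_state = True)
    output_layer = Dense(len(machine_vocab), activation=softmax)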

    # GRADED FUNCTION: model
    
    def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
        """
        Arguments:
        Tx -- length of the input sequence
        Ty -- length of the output sequence
        n_a -- hidden state size of the Bi-LSTM
        n_s -- hidden state size of the post-attention LSTM
        human_vocab_size -- size of the python dictionary "human_vocab"
        machine_vocab_size -- size of the python dictionary "machine_vocab"
    
        Returns:
        model -- Keras model instance
        """
        
        # Define the input of your model with shape (Tx, human_vocab_size)
        # Define s0 and c0, the initial hidden state and cell state of the post-attention (decoder) LSTM, each of shape (n_s,)
        X = Input(shape=(Tx, human_vocab_size))
        s0 = Input(shape=(n_s,), name='s0')
        c0 = Input(shape=(n_s,), name='c0')
        s = s0
        c = c0
        
        # Initialize empty list of outputs
        outputs = []
        
        ### START CODE HERE ###
        
        # Step 1: Define your pre-attention Bi-LSTM. Remember to use return_sequences=True. (≈ 1 line)
        a = Bidirectional(LSTM(n_a, return_sequences=True))(X)
        
        # Step 2: Iterate for Ty steps
        for t in range(Ty):
        
            # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
            context = one_step_attention(a, s)
            
            # Step 2.B: Apply the post-attention LSTM cell to the "context" vector.
            # Don't forget to pass: initial_state = [hidden state, cell state] (≈ 1 line)
            s, _, c = post_activation_LSTM_cell(context, initial_state = [s, c])
            
            # Step 2.C: Apply Dense layer to the hidden state output of the post-attention LSTM (≈ 1 line)
            out = output_layer(s)
            
            # Step 2.D: Append "out" to the "outputs" list (≈ 1 line)
            outputs.append(out)
        
        # Step 3: Create model instance taking three inputs and returning the list of outputs. (≈ 1 line)
        model = Model(inputs=[X, s0, c0], outputs=outputs)
        
        ### END CODE HERE ###
        
        return model

     

    3. Compile

    opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999, decay = 0.01)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
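
    - To train, the notebook first builds the model instance (before compiling) and then fits it on the one-hot encoded data with zero initial states. A rough sketch, with Xoh / Yoh coming from the preprocessing step above:

    # build the instance first, so that model.compile above refers to it
    model = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))

    m = Xoh.shape[0]                      # number of training examples
    s0 = np.zeros((m, n_s))               # initial hidden state of the post-attention LSTM
    c0 = np.zeros((m, n_s))               # initial cell state
    outputs = list(Yoh.swapaxes(0, 1))    # list of Ty arrays of shape (m, len(machine_vocab))

    model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)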

     

    EXAMPLES = ['3 May 1979', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007', 'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', '1 March 2001']
    s00 = np.zeros((1, n_s))  # zero initial states for a single (m = 1) example
    c00 = np.zeros((1, n_s))
    for example in EXAMPLES:
        
        # encode the string as indices, one-hot it, and add a batch dimension: (1, Tx, len(human_vocab))
        source = string_to_int(example, Tx, human_vocab)
        source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source)))
        source = np.expand_dims(source, axis=0)
        prediction = model.predict([source, s00, c00])
        prediction = np.argmax(prediction, axis = -1)  # (Ty, 1) array of character indices
        output = [inv_machine_vocab[int(i)] for i in prediction]
        
        print("source:", example)
        print("output:", ''.join(output))

     

    - Results: each source date above is printed together with its predicted machine-readable date in YYYY-MM-DD format.

     

