Week 8 - Reading

Chapter 9

  • This hidden layer is, in turn, used to calculate a corresponding output, yt. Sequences are processed by presenting one element at a time to the network. The key difference from a feedforward network lies in the recurrent link shown in the figure with the dashed line. This link augments the input to the hidden layer with the activation value of the hidden layer from the preceding point in time.
  • In the commonly encountered case of soft classification, finding yt consists of a softmax computation that provides a normalized probability distribution over the possible output classes. The sequential nature of simple recurrent networks can be illustrated by unrolling the network in time (a sketch of one such step appears after this list).
  • For applications that involve much longer input sequences, such as speech recognition, character-by-character sentence processing, or streaming of continuous inputs, unrolling an entire input sequence may not be feasible. In these cases, we can unroll the input into manageable fixed-length segments and treat each segment as a distinct training item. This approach is called Truncated Backpropagation Through Time (TBTT); see the segment-based training sketch after this list.
  • A closely related, and extremely useful, application of sequence labeling is to find and classify spans of text corresponding to items of interest in some task domain. An example of such a task is named entity recognition: the problem of finding all the spans in a text that correspond to names of people, places, or organizations.
  • The loss is backpropagated through the weights of the feedforward classifier to its input, and then on through the three sets of weights in the RNN, as described earlier in Section 9.1.2. This combination of a simple recurrent network with a feedforward classifier is our first example of a deep neural network.
  • Long short-term memory (LSTM) networks divide the context management problem into two sub-problems: removing information that is no longer needed from the context, and adding information likely to be needed for later decision making. The key to the approach is to learn how to manage this context rather than hard-coding a strategy into the architecture (a gate-by-gate sketch appears after this list).
  • The inputs to this network consist of ordinary word embeddings enriched with character information. Specifically, each input consists of the concatenation of the normal word embedding with embeddings derived from a bidirectional RNN that accepts the character sequences for each word as input (a sketch of this concatenation appears after this list).
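
To make the first two bullets concrete, here is a minimal NumPy sketch of a single step of a simple recurrent network and of processing a sequence one element at a time. The weight names (W for input-to-hidden, U for the recurrent link, V for hidden-to-output) and shapes are illustrative assumptions rather than code from the chapter.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, h_prev, W, U, V, b, c):
    """One step of a simple (Elman) recurrent network.

    The hidden state h_t depends on the current input x_t and, via the
    recurrent link, on the previous hidden state h_prev; y_t is a softmax
    distribution over the output classes.
    """
    h_t = np.tanh(W @ x_t + U @ h_prev + b)
    y_t = softmax(V @ h_t + c)
    return h_t, y_t

def rnn_forward(xs, W, U, V, b, c):
    """Process a sequence one element at a time, carrying h forward."""
    h = np.zeros(U.shape[0])
    ys = []
    for x_t in xs:
        h, y_t = rnn_step(x_t, h, W, U, V, b, c)
        ys.append(y_t)
    return ys
```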
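
For the truncated backpropagation bullet, a sketch of segment-by-segment training, written with PyTorch purely for illustration; the model, dimensions, optimizer settings, and the tbtt_train helper are my own assumptions. The hidden state is carried across segments but detached, so gradients do not flow across segment boundaries.

```python
import torch
import torch.nn as nn

# Assumed toy dimensions: 50-dim inputs, 100-dim hidden layer, 10 output classes.
rnn = nn.RNN(input_size=50, hidden_size=100, batch_first=True)
head = nn.Linear(100, 10)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def tbtt_train(xs, ys, k=35):
    """xs: (1, T, 50) float inputs; ys: (1, T) integer labels; k: segment length."""
    h = None
    for start in range(0, xs.size(1), k):
        x_seg = xs[:, start:start + k]
        y_seg = ys[:, start:start + k]
        out, h = rnn(x_seg, h)
        h = h.detach()                   # stop gradients at the segment boundary
        loss = loss_fn(head(out).flatten(0, 1), y_seg.flatten())
        opt.zero_grad()
        loss.backward()
        opt.step()
```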
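
The LSTM bullet's two sub-problems map onto the forget and add gates. A bare NumPy sketch of one LSTM step follows; the parameter dictionary p and its key names are my own shorthand, not notation from the chapter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p is a dict of weight matrices and bias vectors."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(p["Wf"] @ z + p["bf"])   # forget gate: what to remove from the context
    i = sigmoid(p["Wi"] @ z + p["bi"])   # add (input) gate: what new information to write
    o = sigmoid(p["Wo"] @ z + p["bo"])   # output gate: what part of the context to expose
    g = np.tanh(p["Wg"] @ z + p["bg"])   # candidate information
    c_t = f * c_prev + i * g             # remove what is no longer needed, add what is
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```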
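
Finally, for the character-enriched inputs described in the last bullet, a hypothetical PyTorch module that concatenates a word embedding with the output of a character-level bidirectional RNN; the class name and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CharEnrichedEmbedding(nn.Module):
    """Concatenate a word embedding with the final forward and backward
    states of a character-level bidirectional RNN run over the word."""

    def __init__(self, vocab_size, n_chars, word_dim=100, char_dim=25, char_hidden=25):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_rnn = nn.LSTM(char_dim, char_hidden,
                                bidirectional=True, batch_first=True)

    def forward(self, word_id, char_ids):
        # word_id: (1,) index of the word; char_ids: (1, n) its character indices
        w = self.word_emb(word_id)                   # (1, word_dim)
        _, (h, _) = self.char_rnn(self.char_emb(char_ids))
        chars = torch.cat([h[0], h[1]], dim=-1)      # forward and backward final states
        return torch.cat([w, chars], dim=-1)         # enriched input embedding
```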
