How does an LSTM process sequences longer than its memory?
Terminology:

- Cell: the LSTM unit containing the input, forget, and output gates, the hidden state $h_T$, and the cell state $c_T$.
- Hidden units / memory: how far back in time the LSTM is "unrolled". A hidden unit is an instance of the cell at a particular time. A hidden unit is parameterized by $[w_T, c_T, h_{T-1}]$: the gate weights for the current hidden unit, the current cell state, and the previous hidden unit's output, where $w_T$ stands for the input, output, and forget gate weights.

An LSTM maintains separate gate weights $w_T$ for each hidden unit. This way it can treat different points in time of a sequence differently.

Say an LSTM has 3 hidden units, so it has gate weights $w_1, w_2, w_3$, one set for each. Then a sequence $x_1, x_2, \dots, x_N$ comes through. Illustrating the cell as it transitions over time:

@ t=1
xN ... x3 x2 x1
[w1, c1, h0]
(c2, h1)

@ t=2
xN ... x4 x3 x2 x1
[w2, c2, h1]
(c3, h2)

@ t=3
xN ... x5 x4 x3 x2 x1
[w3, c3, h2]
(c4, h3)

But what happens at $t=4$? The LSTM only has memory, and therefore gate weights, for 3 steps:

@ t=4
xN ... x6 x5 x4 x3 x2 x1
[w?, c3, h2]
(c4, h3)

What weights are used for $x_4$ and all the following inputs? In essence, how are sequences that are longer than an LSTM cell's memory treated? Do the gate weights reset back to $w_1$, or do they remain static at their latest value, $w_3$?
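For concreteness, a minimal Keras sketch of this setup (the layer size of 3 and the sequence length of 10 are arbitrary, chosen only to mirror the example above):

import numpy as np
from tensorflow.keras.layers import LSTM
from tensorflow.keras.models import Sequential

# An LSTM layer with 3 hidden units, fed one sequence of 10 time steps
# with a single feature per step -- longer than the 3 units.
model = Sequential([LSTM(3, input_shape=(10, 1))])

x = np.random.rand(1, 10, 1)   # (batch, time steps, features)
h = model.predict(x)           # final hidden state, shape (1, 3)
print(h.shape)

Keras accepts the longer sequence without complaint, which is what prompts the question of which weights handle $x_4$ onward.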
Edit: My question is not a duplicate of the LSTM inference question. That question asks about multi-step prediction from inputs, whereas I am asking which weights are used over time for sequences that are longer than the internal hidden cell states. The question of weights is not addressed in that answer.
neural-networks lstm rnn
asked Apr 3 at 16:30 by hazrmard (new contributor)
@Sycorax not a duplicate. That question is asking about multi-step prediction and how inputs relate to it. My question is about the internal mechanics, i.e. what weights are used for sequences longer than memory. – hazrmard, Apr 3 at 16:44

My question is not about prediction in the first place. It is about the use of weights. This is not addressed in the other question. – hazrmard, Apr 3 at 16:47

Ah, I see. Withdrawn. – Sycorax, Apr 3 at 16:53
2 Answers
The gates are a function of the weights, the cell state, and the hidden state. The weights are fixed.

Consider the equation for the forget gate $f_t$:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

The forget gate uses the new data $x_t$ and the previous hidden state $h_{t-1}$, but $W_f$ and $b_f$ are fixed. This is why the LSTM only needs to keep the previous $h$ and the previous $c$.

More information: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

– Sycorax, answered Apr 3 at 16:53, edited Apr 3 at 17:02
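To make the fixed-weights point concrete, here is a minimal NumPy sketch (not taken from any library; the weight values are random placeholders) of a single LSTM cell stepped over a sequence. One weight matrix $W$ and one bias $b$ are created once and reused at every time step, however long the sequence is:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W maps the concatenation [h_prev, x_t] to the 4 gate pre-activations.
    z = W @ np.concatenate([h_prev, x_t]) + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
    c_t = f * c_prev + i * np.tanh(g)              # new cell state
    h_t = o * np.tanh(c_t)                         # new hidden state
    return h_t, c_t

hidden, n_features = 3, 1
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, hidden + n_features))  # one set of weights ...
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(10, n_features)):           # ... reused at all 10 steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h)   # final hidden state, shape (3,)

Nothing in the cell depends on the sequence length; only $h$ and $c$ are carried forward between steps.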
So my premise that separate weights are maintained for each hidden state is incorrect? Only a single set of weights is learned no matter how far back in time the network is unrolled? – hazrmard, Apr 3 at 17:07

Yes, that was why I was initially confused. There's a single set of weights and biases. The weights, the previous $h$ and previous $c$, and the new input $x$ are used to update the gates. The gates are used to (1) produce the prediction for the next step and (2) update $h$ and $c$. – Sycorax, Apr 3 at 17:10
Expanding a little on what Sycorax said, the basic recurrent cell is something like

$$ h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t) $$

It is a function of the previous hidden state $h_{t-1}$ and the current input $x_t$, and it returns the current hidden state $h_t$. The same applies to the LSTM cell, which is a special kind of RNN.

So the cell does not "look" directly at any input in the past; information from the past is passed only through the hidden states $h_t$. It follows that, in theory, all of the history contributes to $h_t$ to some extent. If you have multiple such cells, they do not look directly at different points in time; they just use different weights. Of course, some hidden states may carry more information from points further back in time than others do, but what information is carried is learned from the data rather than forced.

– Tim, answered Apr 3 at 17:24
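A minimal NumPy sketch of that recurrence (the weight matrices here are random placeholders, purely for illustration); the only thing carried from one step to the next is $h$, no matter how long the loop runs:

import numpy as np

def rnn_step(x_t, h_prev, W_hh, W_xh):
    # The past enters only through h_prev; no earlier input is read directly.
    return np.tanh(W_hh @ h_prev + W_xh @ x_t)

hidden, n_features = 3, 1
rng = np.random.default_rng(1)
W_hh = rng.normal(size=(hidden, hidden))
W_xh = rng.normal(size=(hidden, n_features))

h = np.zeros(hidden)
for x_t in rng.normal(size=(50, n_features)):   # any sequence length works
    h = rnn_step(x_t, h, W_hh, W_xh)
print(h)   # current hidden state, shape (3,)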