You might find the answer to your question here: https://moodlearchive.epfl.ch/2018-2019/mod/forum/discuss.php?d=17853
The suggested implementation does not use a sliding window. Instead, the model predicts the next word token at each timestep, so the target sequence T is the input sequence X shifted by one position:
X: word[0], ..., word[n], ..., word[maxlen-1]
T: word[1], ..., word[n+1], ..., word[maxlen]
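
In case it helps, here is a minimal sketch of how such an (X, T) pair could be built from one tokenized sentence. The function name make_input_target and the toy sentence are only illustrative assumptions, not part of the suggested implementation.

```python
def make_input_target(tokens, maxlen):
    """Return (X, T), where T is X shifted one position to the left.

    Hypothetical helper for illustration: it needs maxlen + 1 tokens,
    since the last target is word[maxlen].
    """
    if len(tokens) < maxlen + 1:
        raise ValueError("need at least maxlen + 1 tokens")
    x = tokens[:maxlen]        # word[0], ..., word[maxlen-1]
    t = tokens[1:maxlen + 1]   # word[1], ..., word[maxlen]
    return x, t


# Toy example (made-up sentence).
tokens = ["the", "cat", "sat", "on", "the", "mat"]
X, T = make_input_target(tokens, maxlen=5)
print(X)
print(T)
```

At timestep n the model reads X[n] and is trained to output T[n], i.e. word[n+1], so every position in the sequence contributes one prediction.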
Best,
Florian