The update rule for TD(0) Q-learning:
Q(t-1) = (1 - alpha) * Q(t-1) + alpha * (Reward(t-1) + gamma * Max(Q(t)))
Then take either the current best action (to exploit) or a random action (to explore).
Where Max(Q(t)), i.e. MaxNextQ, is the maximum Q value that can be obtained in the next state...
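For reference, the TD(0) rule and the explore/exploit choice can be sketched in Python as below. This is only a minimal illustration: the tabular Q stored in a dict, the function names q_update and choose_action, the state/action labels, and the default alpha, gamma and epsilon values are all assumptions made for the example, not something fixed by the rule itself.

```python
import random
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one TD(0) Q-learning update for (state, action) -> next_state."""
    # Max(Q(t)): the best Q value available from the next state.
    max_next_q = max(Q[(next_state, a)] for a in actions)
    # Q(t-1) = (1 - alpha) * Q(t-1) + alpha * (Reward(t-1) + gamma * Max(Q(t)))
    Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * (
        reward + gamma * max_next_q
    )
    return Q[(state, action)]


def choose_action(Q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the current best."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


# Hypothetical two-action problem; unseen (state, action) pairs start at 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
new_value = q_update(Q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
greedy = choose_action(Q, "s0", actions, epsilon=0.0)
```

With epsilon set to 0 the choice is purely greedy; any value above 0 mixes in random exploration.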
But in TD(1), I think the update rule will be:
Q(t-2) = (1-alpha) * Q(t...
Background
Users can pick dates as shown in the following screenshot:
Any starting month/day and ending month/day combinations are valid, such as:
Mar 22 to Jun 22
Dec 1 to Feb 28
The second combination is difficult (I call it the "tricky date scenario") because the year for the ending month/day should come after the year for th...
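One way to handle the year rollover, sketched in Python: compare the (month, day) pairs and push the end date into the following year when it sorts before the start. The function name resolve_range and the idea that the caller supplies start_year are assumptions for illustration; how the year is actually chosen is not stated above.

```python
from datetime import date


def resolve_range(start_month, start_day, end_month, end_day, start_year):
    """Return (start, end) dates, rolling the end into the next year when needed."""
    start = date(start_year, start_month, start_day)
    end_year = start_year
    # Tuple comparison: an end such as (2, 28) sorts before a start such as (12, 1),
    # so the end must belong to the following year.
    if (end_month, end_day) < (start_month, start_day):
        end_year += 1
    # Note: date() raises ValueError for Feb 29 in a non-leap year;
    # this sketch does not handle that case.
    end = date(end_year, end_month, end_day)
    return start, end


same_year = resolve_range(3, 22, 6, 22, 2023)   # Mar 22 to Jun 22
wrapped = resolve_range(12, 1, 2, 28, 2023)     # Dec 1 to Feb 28
```

When the start and end month/day are identical, this sketch treats the range as a single day in the same year; a full-year range would need a separate rule.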
I'm trying to wrap my head around this task and wondering if there is a standard way of doing this or some libraries that would be useful.
Certain events are tracked and timed at several data sources S1 ... SN. The recorded information is the event type and timestamp. There may be several events of the same type sequentially or they may...
I have a table with a CreatedDate column. I want to order the rows by how long they have been in the database, longest first, and to do that I need the number of days each row has existed, which I get by subtracting CreatedDate from DateTime.Now.
I realise this is doable simply by ordering by the CreatedDate, but I need to do more than that.
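To make the day count concrete, here is a rough sketch in Python (not the .NET code the question implies). The rows, their ids, the fixed "now" value and the helper name days_in_database are all made up for illustration; the point is only that the age in days becomes a numeric value that can be used in a sort key or combined with other factors.

```python
from datetime import datetime


def days_in_database(created, now):
    """Whole days elapsed between the creation time and now."""
    return (now - created).days


# Hypothetical rows standing in for the table.
rows = [
    {"id": 1, "created": datetime(2024, 1, 10)},
    {"id": 2, "created": datetime(2023, 12, 1)},
    {"id": 3, "created": datetime(2024, 2, 5)},
]
now = datetime(2024, 3, 1)  # fixed stand-in for DateTime.Now

# Longest-lived rows first.
oldest_first = sorted(
    rows, key=lambda r: days_in_database(r["created"], now), reverse=True
)
ordered_ids = [r["id"] for r in oldest_first]
```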
I need to order the items by the number of times...