Methods

Algorithms and Methods

General Methods

  • back-propagation (Werbos, 1974)
  • backward modeling
  • batch training (contrasted with incremental training in the sketch after this list)
      1. batch back-propagation
      2. training by epoch
  • boosting (Schapire, 2001)
  • cascade algorithm
  • constructive training (of neural network)
  • convex optimization
  • covariance training
  • direct error minimization
  • discriminative training methods
  • dynamic algorithms
  • ensemble learning (Krogh and Vedelsby, 1995; Dietterich, 2000)
  • explorative algorithms
  • exploitive algorithms
  • forward modeling
  • genetic algorithms (Goldberg, 1989)
  • global optimization techniques
  • gradient methods
  • greedy learning
  • incremental training (of neural network)
      1. incremental back-propagation
      2. online training
      3. training by pattern
  • kernel methods
  • learning
      1. learning-by-example
  • mini-batch training (Wilson and Martinez, 2003)
  • off-line learning
  • off-policy learning
  • on-policy learning
  • online learning
  • policy iteration algorithm (Kaelbling et al., 1996)
  • reinforcement learning
      1. episodic reinforcement learning
      2. model-based reinforcement learning
      3. model-free reinforcement learning
      4. non-episodic reinforcement learning
      5. off-policy learning
      6. on-policy learning
      7. tabular reinforcement learning
  • supervised training
  • temporal difference learning algorithms
  • training
  • unsupervised training
  • value iteration algorithm
  • wake-sleep algorithm
      1. contrastive wake-sleep
  • weight update algorithm
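
The batch/incremental distinction above comes down to when the accumulated error gradient is applied to the weights: once per epoch (batch training) or after every pattern (incremental/online training). The following is a minimal sketch in Python, assuming a single linear unit trained with squared error; the data, learning rate, and function names are illustrative and not part of this dictionary.

    import numpy as np

    def batch_epoch(w, X, y, lr=0.1):
        """Batch training (training by epoch): accumulate the gradient
        over every pattern, then apply one weight update per epoch."""
        grad = np.zeros_like(w)
        for x, t in zip(X, y):
            err = x @ w - t          # prediction error for one pattern
            grad += err * x          # gradient of 0.5*err^2 w.r.t. w
        return w - lr * grad / len(X)

    def incremental_epoch(w, X, y, lr=0.1):
        """Incremental/online training (training by pattern): update the
        weights immediately after each pattern is presented."""
        for x, t in zip(X, y):
            err = x @ w - t
            w = w - lr * err * x
        return w

    # toy usage: recover y = 2*x1 - x2 from four patterns
    X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
    y = np.array([2.0, -1.0, 1.0, 3.0])
    w_batch = w_incr = np.zeros(2)
    for _ in range(200):
        w_batch = batch_epoch(w_batch, X, y)
        w_incr = incremental_epoch(w_incr, X, y)
    print(w_batch, w_incr)   # both approach [2, -1]

Mini-batch training (Wilson and Martinez, 2003) sits between the two: the gradient is accumulated over a small subset of patterns before each update.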

Common Parameters of Algorithms

  • fixed step-size
  • dynamic step-size
  • patience parameter
  • momentum
  • steepness parameter
  • step size (how step size and momentum enter a weight update is sketched after this list)
      1. fixed step-size
      2. dynamic step-size
  • variational bound
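
Step size and momentum are the two parameters that appear directly in the basic gradient-descent weight-update rule. The following is a minimal sketch, assuming plain gradient descent with a momentum term; the function name and the toy quadratic loss are illustrative only.

    import numpy as np

    def update_weights(w, grad, velocity, step_size=0.01, momentum=0.9):
        """One gradient-descent weight update: step_size scales the raw
        gradient step (a fixed step-size here; a dynamic step-size scheme
        would adapt it during training), while momentum carries over a
        fraction of the previous update direction."""
        velocity = momentum * velocity - step_size * grad
        return w + velocity, velocity

    # usage on a toy quadratic loss 0.5 * ||w||^2, whose gradient is w itself
    w, v = np.array([1.0, -2.0]), np.zeros(2)
    for _ in range(100):
        w, v = update_weights(w, grad=w, velocity=v)
    print(w)   # driven towards the minimum at [0, 0]

RPROP, listed among the named algorithms below, replaces the fixed step-size with a per-weight dynamic step-size.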

Named Algorithms

  • Backprop
  • Bayesian techniques (Neal, 1996)
  • Cascade 2
      1. Cascade 2 with caching
  • Cascade Correlation (Prechelt, 1997)
  • Casper algorithm (Treadgold and Gedeon, 1997)
  • Cerebellar Model Articulation Controller (CMAC) (Albus, 1975; Glanz et al., 1991; Sutton and Barto, 1998)
  • Contrastive Divergence Learning
  • Dyna-Q (Sutton and Barto, 1998)
  • Explicit Explore or Exploit (Kearns and Singh, 1998)
  • Gibbs Sampling
      1. Alternating Gibbs Sampling
  • K-Nearest Neighbor
  • Learning Vector Quantization (LVQ)
  • Levenberg-Marquardt (Moré, 1977)
  • Locality-Sensitive Hashing (LSH)
  • Maximum Likelihood (ML) Learning
  • Model-Based Interval Estimation (MBIE) (Strehl and Littman, 2004)
  • Model-Based Policy Gradient methods (MBPG) (Wang and Dietterich, 2003)
  • Monte Carlo Algorithms
  • N-Step Return Algorithm
  • Neural Fitted Q Iteration (Riedmiller, 2005)
      1. NFQ-SARSA(λ)
  • Optimal Brain Damage (LeCun et al., 1990)
  • Orthogonal Least Squares (OLS)
  • Particle Swarm (Kennedy and Eberhart, 1995)
  • Prioritized Sweeping (Sutton and Barto, 1998)
  • Q-Learning (the one-step update is sketched after this list)
      1. Delayed Q-learning (Strehl et al., 2006)
      2. Generalized Policy Iteration (GPI) (Sutton and Barto, 1998)
      3. Naive Q(λ) (Sutton and Barto, 1998)
      4. One-Step Q-Learning
      5. Peng’s Q(λ) (Peng and Williams, 1994)
      6. Q(λ)
      7. Watkins’s Q(λ) (Watkins, 1989)
  • Q-SARSA(λ)
      1. One-Step Q-SARSA Algorithm
  • Quickprop (Fahlman, 1988)
  • R-Max (Brafman and Tennenholtz, 2002)
  • RPROP (Riedmiller and Braun, 1993)
      1. iRPROP- (Igel and Hüsken, 2000)
  • SARSA(λ)
      1. On-Policy SARSA-Learning
      2. SARSA with Linear Function Approximators (Gordon, 2000)
  • Simulated Annealing (Kirkpatrick et al., 1987)
  • Sliding Window Cache
  • SONN (Tenorio and Lee, 1989)
  • Temporal Difference (TD) Learning
      1. Monte Carlo Prediction
      2. TD(0) Algorithm (Sutton, 1988)
      3. TD(λ) Algorithm
      4. TD(λ) Algorithm With Linear Function Approximators (Tsitsiklis and Van Roy, 1996)
      5. Temporal Difference (TD) Prediction Algorithm
  • λ-Return Approach
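
Several of the entries above (TD(0) prediction, one-step Q-learning, one-step SARSA) are built on the same tabular temporal-difference update. The following is a minimal sketch of the TD(0) and one-step Q-learning update rules, assuming a dictionary-backed tabular value function; the toy two-state chain and all names are invented for the example.

    from collections import defaultdict

    def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        """One-step Q-learning (off-policy TD control):
        Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        return Q

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
        """TD(0) prediction (Sutton, 1988):
        V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]."""
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        return V

    # toy usage on a two-state chain: state 0 -> state 1 -> terminal,
    # with reward 1 on the final transition
    actions = ['left', 'right']
    Q, V = defaultdict(float), defaultdict(float)
    for _ in range(200):
        Q = q_learning_update(Q, 0, 'right', 0.0, 1, actions)
        Q = q_learning_update(Q, 1, 'right', 1.0, 'terminal', actions)
        V = td0_update(V, 0, 0.0, 1)
        V = td0_update(V, 1, 1.0, 'terminal')
    print(Q[(0, 'right')], V[0])   # both approach gamma * 1 = 0.9

SARSA differs from Q-learning only in the target: it uses Q(s', a') for the action actually selected by the current policy (on-policy learning) rather than the maximum over actions (off-policy learning).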