Jack Yansong Li's Website

Why use regret in online learning?

Regret analysis is broadly adopted in learning with online interactions. But why minimize regret instead of cost, as in other learning tasks such as batch learning? In this short note, we give an example in which minimizing the regret leads the learner to a better policy.

1. Worst-case cost and worst-case regret cost

Consider the function g defined as g(x) := max_y f(x, y), where f is an arbitrary function. The optimization problem

min_x g(x) = min_x max_y f(x, y)  (1)

can be viewed as a worst-case optimization problem, and g(x) is called the worst-case cost.
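For a finite set of policies and environments, the worst-case cost and the minimax policy can be computed directly. The sketch below uses a small hypothetical cost table (all numbers made up for illustration):

```python
# Hypothetical cost table: f[x][y] = cost of policy x under environment y.
f = {
    0: {0: 3.0, 1: 7.0},
    1: {0: 5.0, 1: 6.0},
}

def worst_case_cost(f, x):
    """g(x) = max_y f(x, y): the cost of policy x under its worst environment."""
    return max(f[x].values())

# The minimax policy solves min_x max_y f(x, y), i.e. equation (1).
minimax_x = min(f, key=lambda x: worst_case_cost(f, x))

print(worst_case_cost(f, 0), worst_case_cost(f, 1), minimax_x)
```

Here policy 1 is the minimax choice: its worst-case cost (6.0) is lower than that of policy 0 (7.0).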

Let's consider a scenario with a function f(x, y) that represents the time required to drive a car from one location to another given policy x and environment y. Here, x represents the driver's driving pattern, which encompasses decisions such as adjusting speed, selecting lanes (left or right), and other driving strategies, while y is a binary condition indicating whether it is currently raining (y = 1) or not (y = 0). Together, the function captures how the driver's decisions and the weather jointly determine the journey time.

It is widely acknowledged that driving takes more time on rainy days, so max_y f(x, y) = f(x, 1). By definition, g(x) then represents the time required for driving on a rainy day.

Now, define g'(x) := max_y ( f(x, y) - min_{x'} f(x', y) ). We call g'(x) the worst-case regret cost. Unlike the worst-case cost, the regret cost measures the largest improvement that could have been achieved in hindsight. For example, suppose a driver cannot drive on rainy days due to rheumatoid arthritis. Then there is no difference between f(x, rain) and f(x', rain) for any x and x', which makes g'(x) = 0. No regret at all, because nothing can be improved in the worst case.
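The worst-case regret cost can be computed the same way as the worst-case cost: for each environment y, compare policy x against the best policy in hindsight for that y. The sketch below reuses the same hypothetical cost table as before (all numbers made up):

```python
# Hypothetical cost table: f[x][y] = cost of policy x under environment y.
f = {
    0: {0: 3.0, 1: 7.0},
    1: {0: 5.0, 1: 6.0},
}

def worst_case_regret(f, x):
    """g'(x) = max_y ( f(x, y) - min_{x'} f(x', y) )."""
    policies = list(f)
    return max(
        f[x][y] - min(f[xp][y] for xp in policies)  # regret against hindsight-best
        for y in f[x]
    )

print(worst_case_regret(f, 0), worst_case_regret(f, 1))
```

For this table, min_{x'} f(x', 0) = 3.0 and min_{x'} f(x', 1) = 6.0, giving g'(0) = 1.0 and g'(1) = 2.0.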

2. An example where minimizing the regret cost yields a better policy

Consider the function f(x, y) given by the following table:

f(x, y)   y = 0   y = 1
x = 0      100      0
x = 1      100     99

The worst-case cost g(x) in this example is max_y f(x, y) = f(x, 0) = 100 for any x. Thus,

argmin_x g(x) = {0, 1}.  (2)

But for the worst-case regret cost, g'(0) = 0 and g'(1) = 99. Thus,

argmin_x g'(x) = 0.  (3)
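The example can be checked numerically. The sketch below encodes the table above and evaluates both the worst-case cost and the worst-case regret cost for each policy:

```python
# The 2x2 cost table from the example:
# f(0,0)=100, f(0,1)=0, f(1,0)=100, f(1,1)=99.
f = {0: {0: 100, 1: 0}, 1: {0: 100, 1: 99}}

def g(x):
    """Worst-case cost: g(x) = max_y f(x, y)."""
    return max(f[x].values())

def g_regret(x):
    """Worst-case regret cost: g'(x) = max_y ( f(x, y) - min_{x'} f(x', y) )."""
    return max(f[x][y] - min(f[xp][y] for xp in f) for y in f[x])

print(g(0), g(1))                # 100 100: worst-case cost cannot distinguish x = 0 from x = 1
print(g_regret(0), g_regret(1))  # 0 99: the regret cost singles out x = 0
```

The worst-case cost ties both policies at 100, while the regret cost identifies x = 0 as the strictly better choice, matching (2) and (3).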

3. Data correlation

The fundamental assumption of standard offline machine learning is that the collected data are independent and identically distributed samples from an unknown distribution [2]. In online learning, however, this assumption is easily violated [1]. Reconsider the driver who refrains from driving on rainy days. In this case, data points (travel times between places) within a one-hour time frame might have a collection probability as low as 0, while data points beyond one hour might have a positive collection probability. Data collected on different dates therefore follow distinct distributions, which marks a crucial difference between traditional offline learning and online learning.
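A toy simulation can illustrate this censoring effect. All numbers below are hypothetical (clear-day travel times drawn from a Gaussian with mean 45 minutes); the point is only that whether a sample is collected at all depends on the day's weather, so the collected data are not i.i.d. across dates:

```python
import random

random.seed(1)

def record_day(rain):
    """Return the recorded travel time for one day, or None if nothing is recorded."""
    if rain:
        return None  # the driver refrains from driving on rainy days: no data point
    return random.gauss(45, 5)  # hypothetical clear-day travel time in minutes

clear = [record_day(rain=False) for _ in range(500)]
rainy = [t for t in (record_day(rain=True) for _ in range(500)) if t is not None]

print(len(clear), len(rainy))  # 500 0: rainy days contribute no samples at all
```

The empirical dataset contains only clear-day samples, so any distribution estimated from it is conditioned on the weather rather than drawn from a single fixed distribution.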

Bibliography

[1] Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2011.

[2] V. N. Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988–999, September 1999.