Regret analysis is widely adopted in learning with online interactions. But why minimize regret instead of cost, as in other learning tasks such as batch learning? In this short note, we give an example showing that by minimizing regret, the learner obtains a better outcome.
Consider the function defined as $F(x) = \max_{y \in Y} f(x, y)$, where $f : X \times Y \to \mathbb{R}$ is an arbitrary function. The optimization problem defined as

$$\min_{x \in X} F(x) = \min_{x \in X} \max_{y \in Y} f(x, y) \tag{1}$$

can be viewed as a worst-case optimization problem, and $F(x)$ is called the worst-case cost.
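For finite $X$ and $Y$, both the worst-case cost and problem (1) can be evaluated by brute force. The following minimal Python sketch merely illustrates the definitions; the helper names and the toy sets are illustrative choices, not part of the note:

```python
def worst_case_cost(f, x, Y):
    """F(x) = max_{y in Y} f(x, y): the cost under the worst environment."""
    return max(f(x, y) for y in Y)

def minimax_policy(f, X, Y):
    """Solve (1): pick the policy whose worst-case cost is smallest."""
    return min(X, key=lambda x: worst_case_cost(f, x, Y))

# Toy instance (assumed for illustration): two driving patterns,
# and rain (y = 1) adds 30 minutes to either pattern.
X, Y = ["slow", "fast"], [0, 1]
f = lambda x, y: {"slow": 50, "fast": 40}[x] + 30 * y
print(minimax_policy(f, X, Y))  # -> "fast"
```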
Let's consider a scenario where $f(x, y)$ represents the time required to drive a car from one location to another given policy $x$ and environment $y$. In this context, $x$ represents the driver's driving pattern, which encompasses decisions like adjusting speed, selecting lanes (left or right), and other driving strategies. On the other hand, $y \in \{0, 1\}$ is a binary condition indicating whether it is currently raining ($y = 1$) or not ($y = 0$). Together, $f$ captures how the driver's decisions and the weather conditions collectively impact the time the journey takes.
It's a widely acknowledged fact that driving takes more time on days when it's raining. Thus, $f(x, 1) \ge f(x, 0)$ for every $x$. By definition, the function $F(x) = \max_{y} f(x, y) = f(x, 1)$ then represents the time required for driving on rainy days.
Now, define $R(x) = \max_{y \in Y} \bigl( f(x, y) - \min_{x' \in X} f(x', y) \bigr)$. We call $R(x)$ the worst-case regret cost. Unlike the worst-case cost, the regret cost measures the largest improvement that could have been achieved in hindsight. For example, suppose a driver cannot drive on rainy days due to rheumatoid arthritis. Then there is no difference between $f(x, 1)$ and $f(x', 1)$ for any $x$ and $x'$, which makes $f(x, 1) - \min_{x' \in X} f(x', 1) = 0$ for every $x$. No regret at all, because nothing can be improved in the worst case!
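Continuing the illustrative sketch above (same toy helper names, not from the note), the regret cost for finite $X$ and $Y$ can be computed the same way:

```python
def worst_case_regret(f, x, X, Y):
    """R(x) = max_{y in Y} ( f(x, y) - min_{x' in X} f(x', y) )."""
    return max(f(x, y) - min(f(xp, y) for xp in X) for y in Y)
```

Note that if $f(\cdot, 1)$ is constant across policies, as for the arthritic driver, the $y = 1$ term inside the outer max is zero, so the worst-case environment contributes no regret.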
Consider, for instance, the function $f$ given by the following table, with two policies $x_1, x_2$ and the two weather conditions:

|              | $y = 0$ (no rain) | $y = 1$ (rain) |
|--------------|-------------------|----------------|
| $f(x_1, y)$  | $1$               | $2$            |
| $f(x_2, y)$  | $2$               | $2$            |

The worst-case cost in this example is $F(x) = f(x, 1) = 2$ for any $x$. Thus,

$$\min_{x \in X} \max_{y \in Y} f(x, y) = 2, \qquad \arg\min_{x \in X} F(x) = \{x_1, x_2\}. \tag{2}$$

But for the worst-case regret cost, $R(x_1) = \max\{1 - 1,\, 2 - 2\} = 0$ and $R(x_2) = \max\{2 - 1,\, 2 - 2\} = 1$. Thus,

$$\arg\min_{x \in X} R(x) = \{x_1\}. \tag{3}$$

Both policies look equally good through the worst-case cost, but regret singles out $x_1$, which is strictly faster on clear days and no slower on rainy days.
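A standalone Python check of the table above reproduces (2) and (3); the dictionary encoding of $f$ is just one illustrative choice:

```python
# Costs from the table: keys are (policy, environment) pairs.
f = {("x1", 0): 1, ("x1", 1): 2, ("x2", 0): 2, ("x2", 1): 2}
X, Y = ["x1", "x2"], [0, 1]

F = {x: max(f[x, y] for y in Y) for x in X}                  # worst-case cost
R = {x: max(f[x, y] - min(f[xp, y] for xp in X) for y in Y)  # worst-case regret
     for x in X}
print(F)  # {'x1': 2, 'x2': 2}: a tie, matching (2)
print(R)  # {'x1': 0, 'x2': 1}: only x1 is regret-optimal, matching (3)
```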
The fundamental assumption of standard offline machine learning is that the collected data are independently and identically distributed, stemming from an unknown distribution [2]. However, this assumption is easily violated in online learning [1]. For instance, reconsider the driver who refrains from driving on rainy days. Depending on the weather, a data point (the time taken to travel between two places) within a one-hour time frame may be collected with very low probability on one date, while data points beyond one hour may be collected with much higher probability on another. In other words, data collected on different dates follow distinct distributions. This distinction underscores a crucial difference between traditional offline learning and online learning.
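To see the non-i.i.d. effect concretely, here is a small simulation sketch; the Gaussian trip-time model and the driver's 10% chance of driving in the rain are assumptions made purely for illustration:

```python
import random

def trip_time(rain: bool) -> float:
    # Assumed model: rainy trips tend to exceed one hour, clear ones do not.
    return random.gauss(75.0 if rain else 45.0, 10.0)

def one_day_of_data(rain: bool, planned_trips: int = 50) -> list[float]:
    # The driver mostly refrains from driving in the rain, so rainy dates
    # yield few samples, and only long ones.
    drive_prob = 0.1 if rain else 1.0
    return [trip_time(rain) for _ in range(planned_trips)
            if random.random() < drive_prob]

clear, rainy = one_day_of_data(rain=False), one_day_of_data(rain=True)
# The fraction of sub-one-hour observations differs sharply across the two
# dates, so the pooled dataset is not i.i.d. from a single distribution.
```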