Knowing the Unknown: Experimentation under Delayed Success
Speakers:
Shraman Banerjee, Shiv Nadar University
Abstract:
We consider a continuous-time dynamic principal-agent model with experimentation in a mixed-news setting. Our model is an exponential two-armed bandit framework featuring a known risky arm and an unknown risky arm, where the unknown arm can be either good or bad. Both the known arm and a good unknown arm can yield successes and failures, whereas a bad unknown arm yields only failures. We assume that success is conclusive, i.e., a single success fully reveals the unknown arm’s type, whereas failures are inconclusive. Although success arrives at a lower rate on the unknown risky arm than on the known arm, the unknown arm generates higher revenue for the experimenting agent. To motivate the agent to experiment with the unknown arm despite the delayed success, the principal proposes a mechanism that rewards failures and punishes successes.