Accelerating Exploration with Unlabeled Prior Data
Paper • 2311.05067
Note A subset of the parameters is learnable during inference with an SSL target. Great idea.
Note Still post-training.
Note Basically wrong. A Markov Decision Process requires decision making to be invariant to history, so considering a temporally dependent goal that is not encoded in the current state itself is a fallacy.
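
The Markov point in the note above can be illustrated with a small sketch. It is not taken from any paper; the goal `visit_a_then_b`, the helper `reward_is_function_of_state`, and the toy state names are all hypothetical. It checks whether a reward can be written as a function of the current state alone. It also shows that a temporally dependent goal only fits an MDP once the relevant history is encoded in the state.

```python
from typing import Callable, Hashable, Sequence, Tuple

History = Tuple[str, ...]


def reward_is_function_of_state(
    reward: Callable[[History], float],
    encode: Callable[[History], Hashable],
    histories: Sequence[History],
) -> bool:
    """Return True if, over the given histories, the reward is fully
    determined by the encoded current state (i.e. it is Markov)."""
    seen = {}
    for h in histories:
        state = encode(h)
        r = reward(h)
        # Same state with a different reward means the reward depends on history.
        if seen.setdefault(state, r) != r:
            return False
    return True


# Hypothetical temporally dependent goal: reach "B" after having visited "A".
def visit_a_then_b(history: History) -> float:
    return 1.0 if history[-1] == "B" and "A" in history[:-1] else 0.0


# Raw state: only the most recent observation.
def raw_state(history: History) -> Hashable:
    return history[-1]


# Augmented state: the most recent observation plus a "visited A" flag.
def augmented_state(history: History) -> Hashable:
    return (history[-1], "A" in history[:-1])


histories = [("S", "A", "B"), ("S", "C", "B")]

raw_markov = reward_is_function_of_state(visit_a_then_b, raw_state, histories)
aug_markov = reward_is_function_of_state(visit_a_then_b, augmented_state, histories)
print(raw_markov, aug_markov)
```

With the raw state, both histories end in "B" but receive different rewards, so the goal is not expressible in that MDP. Adding the flag to the state removes the conflict.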