What’s missing for AGI in today’s tech trajectories — and what we should work on next
TL;DR: Current progress — large-scale connectionist models, transformers, and clever self-supervised recipes — has moved the needle a lot, but several structural gaps remain on the path toward systems that learn, reason and act more like humans. Key missing pieces: online continual learning, grounded multimodal perception, reliable long-term memory, motivated (hot) control, dynamic attention, metacognition/conscious-like world models, and fluid reasoning & planning. Below I summarize the deficits and give concrete research directions and practical experiments the community can start today.
1) Current trajectory in one paragraph
Most of today’s stack builds on connectionist ideas (neural nets) scaled with transformer-like architectures and huge self-supervised pretraining corpora. Improvements focus on scaling, architectural tweaks, in-context learning, and adding external vector stores as a kind of auxiliary memory. That progress is enormous — but the resulting systems are still largely static, associative, and brittle when asked to learn or act outside their training distribution.
2) Core deficits and what to work on
Learning: from static weights → continuous, curiosity-driven learning
Deficit: Models are trained in one-off batches; weights are effectively static at deployment. Humans learn continuously — from sensory streams, from internal rehearsal during rest/sleep, from surprise, from imitation, and from insight. Directions:
- Build continual learning pipelines that update models online without forgetting (efficient rehearsal, sparse updates, synaptic consolidation).
- Develop neuromorphic / Hebbian-style modules that complement gradient learning for fast adaptation and passive learning from sensory streams.
- Integrate intrinsic motivation (curiosity and surprise) objectives to guide what to store and when to update; active inference offers a unifying framework for joint perception and learning policies. A minimal rehearsal-plus-surprise sketch follows this list.
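To make the rehearsal idea concrete, here is a minimal sketch (in PyTorch) of an online learner that combines reservoir-sampling replay with a surprise gate: parameters update only when an incoming example's loss is high, and stored examples are rehearsed alongside it to resist forgetting. The `ReplayBuffer` class and `SURPRISE_THRESHOLD` value are illustrative assumptions, not an established recipe.

```python
import random

import torch
import torch.nn as nn

class ReplayBuffer:
    """Reservoir sampling keeps a bounded, uniform sample of the stream."""
    def __init__(self, capacity):
        self.capacity, self.seen, self.data = capacity, 0, []

    def add(self, x, y):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = (x, y)

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

model = nn.Linear(32, 10)                     # stand-in for a real network
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
buffer = ReplayBuffer(1000)
SURPRISE_THRESHOLD = 2.0                      # assumed value; would need tuning

def online_step(x, y):
    """One stream step: x is (1, 32), y is (1,). Update only on surprising
    examples, rehearsing stored ones alongside to resist forgetting."""
    loss = loss_fn(model(x), y)
    if loss.item() >= SURPRISE_THRESHOLD:     # surprising: worth an update
        replay = buffer.sample(8)
        if replay:
            xs = torch.cat([x] + [xr for xr, _ in replay])
            ys = torch.cat([y] + [yr for _, yr in replay])
            loss = loss_fn(model(xs), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
    buffer.add(x, y)                          # store the raw example either way
```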
Perception: ground priors through real interaction
Deficit: Language models lack embodied, multimodal grounding and do not perform the bottom-up/top-down constraint satisfaction needed for robust, reality-tuned priors. Directions:
- Prioritize perceptual learning benchmarks (multimodal continual streams; video, tactile, and proprioceptive simulators and robots).
- Architectures that fuse bottom-up signals and top-down expectations early (multi-constraint satisfaction, hierarchical predictive coding; see the sketch after this list).
- Research dynamics that produce metastability (edge-of-chaos transient dynamics) to support flexible binding and context-sensitive awareness.
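As a toy illustration of bottom-up/top-down fusion, here is a two-level predictive coding loop in the spirit of Rao & Ballard: latent states are adjusted to balance bottom-up prediction errors against top-down expectations. The generative weights V and W are fixed random matrices here, a simplification; a full system would learn them too.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(16, 8)) * 0.1   # level-1 latents -> input prediction
W = rng.normal(size=(8, 4)) * 0.1    # level-2 latents -> level-1 prediction

def infer(x, steps=50, lr=0.1):
    """Gradient descent on the summed squared prediction errors."""
    r1, r2 = np.zeros(8), np.zeros(4)
    for _ in range(steps):
        e0 = x - V @ r1               # bottom-up error at the input
        e1 = r1 - W @ r2              # top-down error between levels
        r1 += lr * (V.T @ e0 - e1)    # balance evidence against expectation
        r2 += lr * (W.T @ e1)
    return r1, r2

x = rng.normal(size=16)
r1, r2 = infer(x)
print("residual input error:", np.linalg.norm(x - V @ r1))
```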
Memory: beyond fixed context windows & brittle vector DBs
Deficit: Large context windows help short-term work but don’t replace human-like, reliable long-term memory with contextualized retrieval and forgetting. Vector DBs are useful but brittle and often lack structure for causal, episodic recall. Directions:
- Invest in integrated memory systems that blur the short-/long-term divide: retrieval mechanisms that increase precision (latent reasoning), context-sensitive indexing, and hierarchical embeddings.
- Research constructive forgetting and memory weakening as part of optimization (prune stale weights/embeddings automatically).
- Explore depth-of-processing encoding strategies (let the agent deliberately optimize the retrieval breadcrumbs it lays down at encoding time). A toy episodic store illustrating decay-based forgetting follows.
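Below is a toy episodic store that sketches two of these ideas: context-sensitive retrieval (cosine similarity weighted by trace strength) and constructive forgetting (use-based decay with pruning). The scoring weights and decay constants are illustrative assumptions.

```python
import numpy as np

class EpisodicMemory:
    def __init__(self, decay=0.99, prune_below=0.05):
        self.keys, self.values, self.strength = [], [], []
        self.decay, self.prune_below = decay, prune_below

    def write(self, key, value):
        self.keys.append(key / np.linalg.norm(key))
        self.values.append(value)
        self.strength.append(1.0)

    def read(self, query, k=3):
        query = query / np.linalg.norm(query)
        scores = [s * float(key @ query)             # similarity x strength
                  for key, s in zip(self.keys, self.strength)]
        top = np.argsort(scores)[-k:][::-1]
        for i in top:                                # retrieval reinforces traces
            self.strength[i] = min(1.0, self.strength[i] + 0.1)
        return [self.values[i] for i in top]

    def tick(self):
        """Decay all traces; prune the ones that fell below threshold."""
        self.strength = [s * self.decay for s in self.strength]
        keep = [i for i, s in enumerate(self.strength) if s >= self.prune_below]
        self.keys = [self.keys[i] for i in keep]
        self.values = [self.values[i] for i in keep]
        self.strength = [self.strength[i] for i in keep]

mem = EpisodicMemory()
rng = np.random.default_rng(0)
key = rng.normal(size=8)
mem.write(key, "saw a red door in hallway B")
print(mem.read(key + 0.05 * rng.normal(size=8), k=1))
```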
Hot vs Cold executive functions: add motivation & valence
Deficit: Pretraining uses “cold” objectives with no built-in motivation or intrinsic valence; models lack internal drives that bias which computations run. Directions:
- Implement dual loops: one that selects what to think (content selection) and one that scores how desirable candidate thoughts are (value/valence); see the sketch after this list.
- Add intrinsic value learning during pretraining (curiosity, competence progress, surprise minimization) so the model builds preferences for informative computations.
- Investigate model-predictive control and neuromorphic dynamics for steering attention and thought rather than relying solely on prompts.
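Here is a deliberately tiny sketch of the dual-loop idea: a proposer enumerates candidate computations ("what to think") and a separate valence function ranks them by competence progress plus a novelty bonus ("how desirable"). All operation names and numbers are made up for illustration.

```python
# Per-operation error histories; a real agent would log these during learning.
error_history = {
    "retrieve_memory":  [0.90, 0.70, 0.55],   # steady progress
    "simulate_outcome": [0.80, 0.79],         # little progress
    "ask_for_data":     [0.60],               # barely tried: still novel
}

def valence(candidate):
    hist = error_history[candidate]
    # Competence progress: how much did the last attempt reduce error?
    progress = hist[-2] - hist[-1] if len(hist) >= 2 else 0.0
    novelty = 1.0 / len(hist)                 # fewer tries -> bigger bonus
    return progress + 0.3 * novelty           # mixing weight is an assumption

best = max(error_history, key=valence)
print("next computation:", best)              # -> ask_for_data (novelty wins)
```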
Attention: from static predictive attention → proactive, precision-setting attention
Deficit: Transformer attention is reactive, computed directly from the current input; human attention is proactive, sets precision, and is shaped by arousal and goals. LLM attention tracks expected text patterns rather than strategically steering cognitive resources. Directions:
- Design attention that adjusts precision/uncertainty dynamically (Bayesian precision control) and can be gated by goals or motivational signals, as in the sketch after this list.
- Couple attentional control with metacognitive signals (e.g., uncertainty estimates trigger deeper reasoning).
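A minimal sketch of precision-setting attention: an external uncertainty signal scales the softmax gain, so high confidence sharpens attention while high uncertainty flattens it, inviting broader search. The gain schedule here is an assumption, not a published mechanism.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def precision_weighted_attention(q, K, V, uncertainty):
    logits = K @ q / np.sqrt(q.shape[0])
    precision = 1.0 / (1.0 + uncertainty)   # gain shrinks as uncertainty grows
    weights = softmax(precision * logits)
    return weights @ V, weights

rng = np.random.default_rng(1)
q, K, V = rng.normal(size=8), rng.normal(size=(5, 8)), rng.normal(size=(5, 4))
for u in (0.0, 10.0):
    _, w = precision_weighted_attention(q, K, V, uncertainty=u)
    print(f"uncertainty={u}: max attention weight {w.max():.2f}")
```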
Consciousness & coherent world models: building bounded generative models
Deficit: LLMs don’t maintain coherent world models constrained by space, time, and causality; they lack belief maintenance and the metastable dynamics that mimic conscious transitions. Directions:
- Pursue architectures that learn compact, causal world models (latent dynamics with explicit causal variables).
- Combine probabilistic belief maintenance with metastable dynamics so the system can “bind” surprising inputs into new concepts; a minimal belief-maintenance sketch follows.
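As a stand-in for richer world models, the sketch below maintains a Gaussian belief over a linear latent state with a Kalman filter and uses the squared innovation as a surprise score: observations the model cannot "bind" are flagged as candidates for new concepts. The dynamics matrices are toy values.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])    # latent dynamics: position, velocity
H = np.array([[1.0, 0.0]])                # we observe position only
Q, R = 0.01 * np.eye(2), np.array([[0.1]])

mu, P = np.zeros(2), np.eye(2)            # belief: mean and covariance
for z in [0.0, 0.12, 0.21, 3.0]:          # the last observation is anomalous
    mu, P = A @ mu, A @ P @ A.T + Q       # predict forward in time
    innovation = z - H @ mu               # how wrong was the prediction?
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    mu = mu + (K @ innovation).ravel()    # update the belief
    P = (np.eye(2) - K @ H) @ P
    surprise = float(innovation @ np.linalg.inv(S) @ innovation)
    flag = "<-- candidate new concept" if surprise > 9 else ""
    print(f"z={z:+.2f}  surprise={surprise:.1f} {flag}")
```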
Understanding & representation: associative → hierarchical, compositional models
Deficit: LLM “understanding” is associative and predictive; human understanding is hierarchical, compositional, causal, and refined through iterative re-representation. Directions:
- Push neurosymbolic hybrids: let neural learners discover structure, then export it to explicit symbolic/graph representations for compositional manipulation (sketched after this list).
- Build explicit convergence zones (cross-modal buffers) that bring different information streams together for re-presentation and compositional hypothesis generation.
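A small sketch of the neural-to-symbolic handoff: a stand-in extractor emits scored relation triples, high-confidence triples are exported to an explicit graph, and a compositional query becomes a path search. The extractor output and confidence threshold are illustrative, not a real model.

```python
import networkx as nx

def neural_triples(text):
    # Placeholder for a learned extractor; triples and scores are made up.
    return [("gears", "part_of", "clock", 0.95),
            ("clock", "measures", "time", 0.90),
            ("gears", "made_of", "cheese", 0.20)]   # low confidence: dropped

G = nx.DiGraph()
for head, rel, tail, conf in neural_triples("..."):
    if conf >= 0.8:                                  # export only what we trust
        G.add_edge(head, tail, relation=rel, confidence=conf)

# Compositional query: is there a relational chain linking "gears" to "time"?
if nx.has_path(G, "gears", "time"):
    path = nx.shortest_path(G, "gears", "time")
    rels = [G.edges[u, v]["relation"] for u, v in zip(path, path[1:])]
    print(" -> ".join(path), "via", rels)
```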
Metacognition: reflect, recognize ignorance, allocate resources
Deficit: LLMs have poor built-in reflection: they do not reliably know what they don’t know, nor do they strategically allocate inference resources. Directions:
- Add internal monitoring layers: confidence estimation tied to control policies that decide whether to compute further, retrieve memory, or ask for more data (see the sketch after this list).
- Train models to estimate cost/benefit of additional reasoning steps and to schedule computations under time/resource constraints.
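For instance, a confidence-gated control policy can be as simple as thresholding predictive entropy against a compute budget, as in this sketch. The thresholds, cost model, and action names are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def decide(probs, steps_used, max_steps=5, step_cost=0.1):
    h = entropy(probs)
    expected_gain = 0.5 * h            # crude model: more entropy, more to gain
    if h < 0.3:
        return "answer"                # confident enough to commit
    if steps_used < max_steps and expected_gain > step_cost:
        return "reason_further"        # benefit outweighs compute cost
    if h < 1.0:
        return "retrieve_memory"
    return "ask_for_help"              # it knows that it doesn't know

print(decide(np.array([0.95, 0.03, 0.02]), steps_used=0))  # -> answer
print(decide(np.array([0.40, 0.30, 0.30]), steps_used=0))  # -> reason_further
print(decide(np.array([0.34, 0.33, 0.33]), steps_used=5))  # -> ask_for_help
```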
Reasoning & problem solving: pattern match → active inference & causal induction
Deficit: LLMs pattern-match within distribution and struggle with novel, fluid inference and causal reasoning. Directions:
- Develop active inference and belief-plausibility engines that produce, test, and revise causal hypotheses (possibly leveraging NARS or other incremental reasoning frameworks); a toy expected-free-energy sketch follows this list.
- Run hybrid pipelines that combine fast associative models with symbolic reasoners and causal discovery modules, and ensure results are written back into memory for reuse.
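To ground the active inference direction, here is a toy discrete sketch that scores candidate actions by an expected-free-energy-style objective: risk (divergence of predicted outcomes from preferences) minus an epistemic bonus, with predicted outcome entropy used as a crude stand-in for expected information gain. The generative model and weighting are assumptions; see the active inference literature for the full treatment.

```python
import numpy as np

# P(outcome | action) under the agent's world model (3 actions, 2 outcomes).
likelihood = np.array([[0.9, 0.1],     # action 0: almost surely outcome 0
                       [0.5, 0.5],     # action 1: maximally uncertain
                       [0.2, 0.8]])    # action 2: likely outcome 1
log_preferences = np.log(np.array([0.7, 0.3]))   # the agent prefers outcome 0

def expected_free_energy(p_outcome):
    # Risk: KL divergence between predicted and preferred outcomes.
    risk = float(p_outcome @ (np.log(p_outcome + 1e-12) - log_preferences))
    # Epistemic bonus: outcome entropy as a crude proxy for information gain.
    epistemic = float(-(p_outcome @ np.log(p_outcome + 1e-12)))
    return risk - epistemic

efe = [expected_free_energy(likelihood[a]) for a in range(3)]
print("chosen action:", int(np.argmin(efe)), "| EFE per action:", np.round(efe, 3))
```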
Decision making, planning & volition: from echoing data → goal formation & planning
Deficit: No intrinsic needs or dynamically formed goals; planning is shallow and brittle. Directions:
- Equip agents with intrinsic goal formation systems (curiosity plus need states) and world models that support inverse planning and contingency generation (backup plans, counterfactual simulation); see the planning sketch after this list.
- Create planning benchmarks that require retrieving episodic analogs, simulating causal consequences, and composing multi-step plans with failure recovery.
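A toy illustration of contingency generation: enumerate short plans in a trivially simple world model, simulate them under nominal dynamics and under a "first action fails silently" perturbation, and prefer plans that survive both. The world, action set, and failure model are all illustrative.

```python
from itertools import product

GOAL = 3                                   # reach position 3 from position 0

def simulate(plan, slip_first=False):
    """Counterfactual rollout in a toy world; optionally drop the first step."""
    pos = 0
    for i, step in enumerate(plan):
        if slip_first and i == 0:
            continue                       # first action fails silently
        pos += {"fwd": 1, "jump": 2, "stay": 0}[step]
    return pos == GOAL

plans = [p for p in product(["fwd", "jump", "stay"], repeat=3) if simulate(p)]
robust = [p for p in plans if simulate(p, slip_first=True)]
primary = robust[0] if robust else plans[0]
print("primary plan:", primary, "| robust alternatives:", len(robust))
```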
3) Practical experiments & benchmarks to prioritize now
- Lifelong learning benchmarks: multimodal streams with evaluation on retrospective recall, forward transfer, and controlled forgetting (standard metrics are sketched after this list).
- Curiosity-driven micro-tasks: tasks where intrinsic rewards (novelty, compressibility, competence progress) are the primary training signal.
- Memory reliability tests: episodic retrieval tasks where the agent must autonomously create, index, and later retrieve contextually-appropriate memories.
- Metacognition challenge suite: tasks requiring the agent to decide if it should ask for help, compute further, or decline confidently.
- Planning under resource constraints: time/budgeted planning tasks that reward robust backup plans and graceful degradation.
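For the lifelong learning benchmarks above, the standard metrics can be computed from an accuracy matrix; the sketch below follows the ACC/BWT/FWT conventions of Lopez-Paz & Ranzato's GEM paper, with illustrative numbers.

```python
import numpy as np

# R[i][j] = accuracy on task j after training on task i (illustrative values).
R = np.array([[0.80, 0.10, 0.05],
              [0.70, 0.85, 0.20],
              [0.55, 0.75, 0.90]])
b = np.array([0.10, 0.12, 0.08])    # accuracy of an untrained model per task
T = R.shape[0]

avg_acc = R[-1].mean()                                        # final accuracy
bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])     # <0 = forgetting
fwt = np.mean([R[j - 1, j] - b[j] for j in range(1, T)])      # >0 = transfer

print(f"ACC={avg_acc:.3f}  BWT={bwt:.3f}  FWT={fwt:.3f}")
```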
4) Architectural themes to explore
- Hybrid learning stacks: gradients + local Hebbian/neuromorphic updates + symbolic modules.
- Active inference controllers: agents that minimize expected free energy to select sensing, memory, and action.
- Memory architectures that are learned not engineered: indices, compression, and retrieval policies trained end-to-end with the agent’s objectives.
- Cross-modal convergence zones: explicit buffers where different modalities are fused, re-represented, and made available to reasoning modules.
- Meta-controllers: learnable controllers that set precision, attention, and compute budgets dynamically.
5) Societal, safety and evaluation notes
As agents gain continual learning, intrinsic motivation, and the ability to form goals, careful evaluation and alignment become essential. Benchmarks should include safety-stress tests, mechanisms to audit memory and goals, and ways to freeze or constrain goal formation during deployment. Open, reproducible evaluation suites are critical.
6) Call to action for the community
- Build and share lifelong learning datasets and simulators for embodied multimodal streams.
- Open-source memory modules and benchmarks that measure retrieval fidelity and forgetting.
- Publish reproducible experiments combining curiosity, metacognition, and active inference objectives.
- Integrate neuromorphic or Hebbian submodules into modern stacks and report tradeoffs in compute, robustness, and sample efficiency.
- Create shared evaluation frameworks that measure goal formation, planning reliability, and ability to learn from internal rehearsal.
Closing
Transformers and self-supervised learning reshaped what’s possible, but they’re one piece of a larger puzzle. Getting closer to AGI will mean complementing current strengths (scale, pattern learning) with mechanisms for continual, goal-driven learning; grounded perception; reliable, contextual memory; dynamic attention and metacognitive control; and causal, compositional reasoning. The practical path is hybrid: trainable neurosymbolic stacks, better memory systems, and intrinsic motivation mechanisms, all evaluated with rigorous, open benchmarks that stress lifelong, embodied competence.