Jun 18, 2026
Centralized content-addressed version control using Merkle trees and chunked binary storage
Lore implements a service-backed architecture with content hashes, immutable revision chains, chunk deduplication, and lazy workspace hydration, so large binary repositories stay fast without full checkouts. Because the design refines established VCS techniques for game-scale assets rather than introducing a new paradigm, its appeal is limited mainly to studios that already manage mixed code and media assets.
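The mechanisms named above (content hashes, chunk deduplication, a Merkle root per revision, and an immutable revision chain) can be illustrated with a minimal Python sketch. It assumes fixed-size chunks, SHA-256, and a binary Merkle tree that carries an unpaired node up unchanged. `ChunkStore`, `merkle_root`, and `commit_id` are hypothetical names, not Lore's API, and many real systems use content-defined chunking instead, so that inserting bytes does not shift every later chunk boundary.

```python
import hashlib
from typing import Dict, List, Optional

CHUNK_SIZE = 4  # hypothetical, tiny so the example is easy to follow; real systems use far larger chunks


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ChunkStore:
    """Hypothetical content-addressed store: each distinct chunk is kept once, keyed by its hash."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put_file(self, data: bytes) -> List[str]:
        """Split data into fixed-size chunks, store unseen ones, and return the ordered chunk hashes."""
        hashes = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = sha256(chunk)
            self.chunks.setdefault(h, chunk)  # deduplication: a known hash is not stored again
            hashes.append(h)
        return hashes

    def get_file(self, hashes: List[str]) -> bytes:
        """Reassemble a file from its ordered chunk hashes."""
        return b"".join(self.chunks[h] for h in hashes)


def merkle_root(leaf_hashes: List[str]) -> str:
    """Hash leaves pairwise, level by level; an unpaired last node is carried up unchanged."""
    if not leaf_hashes:
        return sha256(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(sha256((level[i] + level[i + 1]).encode()))
            else:
                next_level.append(level[i])
        level = next_level
    return level[0]


def commit_id(parent: Optional[str], root: str) -> str:
    """A revision ID that depends on its parent, so rewriting history changes every later ID."""
    return sha256(f"{parent or ''}:{root}".encode())


store = ChunkStore()
v1 = store.put_file(b"AAAABBBBCCCC")
v2 = store.put_file(b"AAAABBBBDDDD")  # only the last chunk differs
c1 = commit_id(None, merkle_root(v1))
c2 = commit_id(c1, merkle_root(v2))
print(len(store.chunks))  # 4 distinct chunks stored, not 6
```

Storing the second revision adds one new chunk rather than a second full copy, while the differing Merkle roots still give it a distinct commit ID. Lazy hydration fits the same model: a workspace could hold only the chunk hashes and fetch chunk bytes from the service when a file is first opened.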
Run Inspect AI evaluations at scale on AWS with Kubernetes and a results warehouse
Hawk provisions EKS pods, Fargate-hosted APIs, Aurora PostgreSQL, and an LLM proxy through Pulumi, so YAML-defined eval sets run in isolated sandboxes with automatic log ingestion. The design extends the open-source Inspect framework with enterprise access controls and scanning, but it targets AI safety and evaluation teams rather than general ML practitioners.
CEO-Bench evaluates long-horizon LLM agents via 500-day startup simulations with business databases and market dynamics
Agents use a programmable interface to access databases, management tools, and social media within a partially observable, noisy market where decisions have delayed and coupled consequences. The benchmark brings sustained, multi-month decision-making under uncertainty to agent evaluation, which makes it useful for researchers who need more realistic tests than short-horizon tasks provide.
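To make "delayed and coupled consequences" in a partially observable market concrete, here is a hypothetical toy environment in Python; it is not CEO-Bench's simulator or interface. Every rule in it is an invented assumption: marketing spend raises a hidden demand level only after `lag` days, spending draws down the same cash that sales replenish, a higher price trades units for margin, and the agent observes cash plus a noisy unit count but never the demand level itself.

```python
import random
from collections import deque


class ToyMarket:
    """Hypothetical toy market for illustration only; not CEO-Bench's simulator."""

    def __init__(self, lag: int = 7, noise: float = 5.0, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.noise = noise
        self.cash = 1000.0
        self.base_demand = 50.0  # hidden state: the agent never observes it directly
        # Queue of demand boosts that land `lag` days after the spend that caused them.
        self.pending = deque([0.0] * lag)

    def step(self, marketing_spend: float, price: float) -> dict:
        # Coupling 1: marketing draws down the same cash that sales replenish.
        spend = min(max(marketing_spend, 0.0), self.cash)
        self.cash -= spend
        # Delay: today's spend is queued; the boost arriving today was paid for `lag` days ago.
        self.pending.append(spend / 10.0)
        self.base_demand += self.pending.popleft()
        # Coupling 2: a higher price sells fewer units but earns more per unit.
        units = max(0.0, self.base_demand - 2.0 * price)
        self.cash += units * price
        # Partial observability: the agent sees cash and a noisy unit count only.
        observed = max(0.0, units + self.rng.gauss(0.0, self.noise))
        return {"cash": self.cash, "observed_units": observed}


market = ToyMarket(lag=7, seed=42)
for day in range(500):  # a fixed policy over a 500-day horizon
    obs = market.step(marketing_spend=20.0, price=12.0)
```

Because a spend decision pays off only days later and each observation is noisy, an agent cannot judge a choice from the next day's reading alone; that credit-assignment problem is what a long horizon stresses.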
Flexible Python codebase for point cloud perception with official implementations of recent CVPR papers
It supplies modular training pipelines, dataset loaders, and backbones including Point Transformer variants and sparse CNNs, along with self-supervised pretraining methods such as Sonata and Masked Scene Contrast. The collection consolidates established 3D representation techniques for researchers already working on indoor and outdoor scene understanding rather than introducing a broadly reusable primitive.