Benjamin Biggs

I am a Research Scientist at Luma AI, where I work on unified understanding and generation models.

Previously, I was a Senior Applied Scientist at Amazon AGI, where I worked on the Amazon Nova family of generative models and Amazon Titan Image Generator.

I received my PhD in Computer Vision & Deep Learning from the University of Cambridge, where I focused on 3D reconstruction of human and animal categories.

Outside the lab, I am a pianist, singer, theatre-goer and somewhat reluctant runner.

Position

Research Scientist, Luma AI

Location

San Francisco, California, United States

Email

benjbiggs@outlook.com

Research

My full publication list is available on Google Scholar.

Uni-1: A Unified Understanding and Generation Model

Luma AI

A unified understanding and generation model that combines reasoning with visual imagination in a single autoregressive transformer. Enables structured reasoning during image synthesis, fine-grained visual understanding, and temporally consistent scene generation.

Amazon Nova 2 Omni

Amazon AGI

A unified multimodal model that processes text, image, video, and audio inputs and generates text and images. Supports up to 1M token contexts, enabling analysis of extensive codebases, long documents, and hours of video in a single prompt.

The Amazon Nova Family of Generative Models

Amazon AGI

A family of foundation models spanning video generation (Nova Reel), image generation (Nova Canvas), and multimodal understanding (Nova Pro, Lite, Micro) across text, image, video, and document processing.

Diffusion Soup: Model Merging for Text-to-Image Diffusion Models (ECCV 2024)

Benjamin Biggs*, Arjun Seshadri*, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto

A method for merging text-to-image diffusion models trained on sharded data. Enables training-free continual learning and unlearning with no extra memory or inference cost, achieving up to 30% improvement over a paragon model.

Amazon Titan Image Generator

AWS Bedrock

A generative model for creating and editing high-quality images from natural language prompts, with built-in safeguards and invisible watermarking.

3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data (NeurIPS 2020, Spotlight - Top 3%)

Benjamin Biggs, Sébastien Ehrhardt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi and David Novotny

Recovering sets of plausible 3D human reconstructions from single and partially occluded views, using a best-of-M loss with a normalizing flow-based quantization scheme.

Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop (ECCV 2020)

Benjamin Biggs, Oliver Boyne, James Charles, Andrew Fitzgibbon and Roberto Cipolla

A fully automatic system for 3D dog reconstruction from weak 2D supervision, using SMBLD -- a deformable template model with a shape prior refined via expectation maximization. Also introduces the StanfordExtra dataset.

Creatures Great and SMAL: Recovering the Shape and Motion of Animals from Video (ACCV 2018, Oral - Top 5%)

Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon and Roberto Cipolla

Recovering 3D shape and motion of quadruped animals from monocular video. Trained on synthetic silhouettes to overcome limited animal motion capture data and generalize to real-world sequences.

Shape of You: Precise 3D Shape Estimations for Diverse Body Types (CVPR-W 2023, Oral)

Rohan Sarkar, Achal Dave, Gerard Medioni and Benjamin Biggs

Improving 3D body shape estimation for diverse body types via new loss functions and a test-time optimization routine for parametric human reconstruction pipelines.

On the Road to Large-Scale 3D Monocular Scene Reconstruction Using Deep Implicit Functions (ICCV-W 2021)

Thomas Roddick, Benjamin Biggs, Daniel Olmeda Reino and Roberto Cipolla

Using deep implicit functions to reconstruct large-scale driving scenes, with LiDAR-approximated occupancy labels to avoid requiring watertight training meshes.

Other Research

PhD Thesis - Benjamin Biggs

Supervisors: Roberto Cipolla & Andrew Fitzgibbon

Methods for 3D animal reconstruction using morphable models, covering synthetic silhouette training, in-the-loop shape prior refinement, and handling ambiguous inputs.

Virtual Try-On with Outfit Layer Mask

Benjamin Biggs, Philip Pinette, Charu Kothari, Caitlin Cagampan, Achal Dave, Scott Sun and Gerard Medioni

A virtual try-on method that generates a realistic digital model from an image and applies clothing using a layer mask. Built as part of Amazon Style, an ML-powered physical fashion store.

StanfordExtra

Benjamin Biggs, Oliver Boyne, Andrew Fitzgibbon and Roberto Cipolla

12k labelled instances of in-the-wild dogs with 2D keypoints and segmentation masks.

RodentNet

Benjamin Biggs, Andrew Fitzgibbon and Roberto Cipolla

Behaviour and keypoint predictions at ~15 fps from a deep learning architecture we refer to as RodentNet. Results are shown on validation sequences from the SCORHE dataset.

Kinect Gowning Application

Benjamin Biggs, Patrick Hyett and Abhir Bhalerao

Computer vision application for verifying regulatory gowning procedures, built in collaboration with GlaxoSmithKline. Won the departmental award for best third-year dissertation at the University of Warwick.