> Three 30 ns MD runs of human β2AR, in parallel, from a browser.
> Apo in water, carazolol-bound, and embedded in a POPC bilayer.
> Finished in an afternoon on cloud GPUs.
> No CHARMM-GUI session, no terminal, no topology debugging, no queue.
> Life is good!
Focus on the science; we take care of the rest. Links in the comments.
We ran β2AR three ways: bare, with carazolol bound, and in a lipid bilayer. Carazolol sits in the extracellular pocket, but the RMSF drop shows up at the cytoplasmic end of TM6. Inverse agonism as distal dampening, visible in 30 ns of MD. Read more in the blog post. Links in comments.
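For anyone who wants to see what an "RMSF drop" means in practice, here is a minimal sketch in plain Python. It assumes the frames are already aligned to a common reference and that each residue is represented by one coordinate (for example its Cα). The function names and the toy data are illustrative, not the pipeline used for the runs above.

```python
import math

def rmsf(trajectory):
    """Per-residue root-mean-square fluctuation.

    trajectory: list of frames, each frame a list of (x, y, z) tuples,
    one per residue. Assumes frames are already aligned (hypothetical input).
    """
    n_frames = len(trajectory)
    n_residues = len(trajectory[0])
    result = []
    for i in range(n_residues):
        # Mean position of residue i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

def rmsf_delta(apo, bound):
    """Bound minus apo RMSF per residue; negative values mean dampening."""
    return [b - a for a, b in zip(rmsf(apo), rmsf(bound))]
```

A negative entry in `rmsf_delta` at residues far from the binding pocket is the kind of distal dampening the post describes; with real trajectories a library such as MDAnalysis or MDTraj would handle alignment and atom selection first.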
We're at almost 50K downloads in 3 days, nice! Would love to see what the community is building. Meanwhile, stay tuned for more surprises. Currently cooking!
We have released the biggest protein data collection on Hugging Face, guys!
We have been working on this for more than 3 weeks now, starting from curating the raw data, doing a lot of filtering, splitting the datasets, sharding them, and doing a lot of analysis. Everything is summarized in our recent blog post.
The open-source protein ML space just got a massive upgrade. Phenomenal work by @anindyadeeps and @try_litefold on dropping the biggest protein data collection on Hugging Face
today was a massive day for protein engineering.
esmfold2 dropped—next gen of the esm series, fully open on @huggingscience. 1.1 billion predicted structures, 6.8 billion sequences. 800m more entries than the alphafold db, and reportedly edging out alphafold3 on protein complexes, including antibody–antigen binding.
alongside it: the new esm atlas. a huge expansion of known protein space, heavy on metagenomic sequences from soil, ocean, and the parts of biology that have been least characterised (until now!!)
and if that weren't enough, litefold dropped the fineweb of proteins, so every major protein database (pdb included) aggregated, cleaned, and made plug-and-play in one place.
these are the releases that push the whole field forward, and the pace of open science right now is almost motion-sickness inducing
all of it on huggingscience.co (and ofc @huggingface)
LLMs got FineWeb, The Pile, RedPajama, Dolma. Protein ML got per-paper supplementary tables and FTP mirrors scattered across a dozen institutions.
Today we're releasing AminoWeb on @huggingface : 29 cleaned, ML-ready protein datasets, ~7.5 TB total. Sequence, structure, function, MSA, variant-effect, stability, binding. UniProt, PDB, AlphaFoldDB, ESMAtlas, ProteinGym, MegaScale, Protenix, and more.
Typed Parquet. Homology-aware splits. Preserved score conventions. Full provenance per record.
Protein ML scaled architectures for years while the data layer stayed fragmented. We've also shared the full curation pipeline, case studies, and observations in the companion blog post.
Access the data: huggingface.co/LiteFold
Read the release blogpost: litefold.ai/blog/aminoweb
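A note on "homology-aware splits" for readers new to the idea: related sequences should never straddle train and test. The sketch below shows one common way to get that property, assuming each sequence already has a cluster ID from a sequence-clustering tool. It is not necessarily how AminoWeb builds its splits (the post does not say); `split_for_cluster`, `homology_aware_split`, and the fractions are hypothetical.

```python
import hashlib

def split_for_cluster(cluster_id, val_frac=0.1, test_frac=0.1):
    """Deterministically map a cluster ID to a split name.

    Hashing the cluster ID (not the sequence ID) keeps every member
    of a homology cluster in the same split.
    """
    digest = hashlib.sha256(cluster_id.encode("utf-8")).digest()
    # Turn the first 8 bytes into a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "validation"
    return "train"

def homology_aware_split(records):
    """records: iterable of (sequence_id, cluster_id) pairs (assumed input)."""
    splits = {"train": [], "validation": [], "test": []}
    for seq_id, cluster_id in records:
        splits[split_for_cluster(cluster_id)].append(seq_id)
    return splits
```

Because the assignment depends only on the cluster ID, it stays stable when new sequences are added to existing clusters, at the cost of split sizes that only approximate the requested fractions.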