Common Crawl Foundation @CommonCrawl
Common Crawl is a non-profit foundation dedicated to the Open Web. commoncrawl.org | San Francisco, CA | Joined February 2010
Tweets: 1K | Followers: 7K | Following: 2K | Likes: 525
We are excited to announce the release of an @MLCommons AI Safety benchmark POC. Built through an inclusive decision-making and engineering process, the POC validates our approach to a v1.0 AI Safety benchmark suite. Learn more: mlcommons.org/2024/04/mlc-ai… #AI, #benchmarks
Are you a fan of both Common Crawl and Discord? If so, join the Common Crawl Foundation's new Discord server! discord.com/invite/njaVFh7…
Reupping this for California folk because I posted this before you all woke up this morning. My analysis of the California Journalism Preservation Act: drive.google.com/drive/folders/…
Today, Gen AI Commons launches the LF AI & Data Outreach Survey which is designed to help us better understand the community’s insights and perspectives on #opensource and #generativeAI.✔️ 🔗 Learn more and take the survey: hubs.la/Q02pgQG10
Common Crawl Foundation is happy to join NIST's new U.S. AI Safety Institute Consortium in support of efforts to create safe and trustworthy AI. Learn more: nist.gov/artificial-int… x.com/nist/status/17…
Common Crawl is hiring! linkedin.com/jobs/view/3813…
Had a great time chatting with @jasonhowell and @jeffjarvis on their new podcast @AIInsideShow about @CommonCrawl, AI, and the Right to Learn - even for machines aiinside.show/episode/ai-pos… youtube.com/watch?v=VTEdIk…
Tomorrow is the PREMIERE episode of @AIInsideShow! Live stream kicks off Wednesday, January 24 at 11 am PT/2 pm ET. @jeffjarvis and I welcome @skrenta from Common Crawl to the show! Subscribe: aiinside.show Support: patreon.com/aiinsideshow Live: youtube.com/watch?v=VTEdIk…
Great interview about open data with Common Crawl's @pjox13 and Thom at the recent Paris AI-Pulse conference #Scaleway #aiPulse youtube.com/watch?v=hmlWY2…
@jeffjarvis @NJNewsCommons Watch Jeff's full opening remarks here: youtu.be/tX26ijBQs2k
Here are the remarks without a wall on Buzzmachine: buzzmachine.com/2024/01/09/jou…
Common Crawl Foundation is proud to benefit from generous support from DuckDuckGo. From all of us at CCF, and on behalf of all of the users of our dataset, thank you!
After the US holiday weekend, starting with the European morning, our aggressive downloaders have returned. Please see our status page for the details of maximizing download speeds despite 503 Slow Down errors. status.commoncrawl.org
This is actually pretty funny. Now I understand why @tweetbaack told me that @CommonCrawl manually intervenes in cases like this.
No! We will be in the LLMs. Immortal, in a sense.
Hundreds, thousands of years after they died we have some people's letters and stories. We digital people won’t leave a trace, except in the ether
Today I'm releasing a major paper on California's Journalism Preservation Act (& its federal cousin, JCPA): its weaknesses; the history of news & copyright; newspapers' long history of fighting new technologies & competitors--and alternative solutions. 1/ drive.google.com/file/d/1HHcDuU…
Someone asked me recently why the Paris AI ecosystem is so 🔥 these days. On the surface, it looks like Paris became a major #AI center overnight, but it didn't. It took time and didn't appear by accident. Here is a story that started more than 10 years ago. 👇
Ran the Paris Marathon yesterday. It was an amazing experience. Getting into running was probably the best decision I’ve made recently. It has helped massively with both physical and mental health. I highly recommend any type of physical activity, especially for researchers 🏃🏻♂️
I've got an update... I'm launching a new podcast. Want early access? DM me or reply below with "episode #1".
🚀 Exciting Announcement! Introducing the #HPLT language resources – a massive multilingual dataset from @CommonCrawl & @internetarchive, featuring monolingual & bilingual corpora. Our collection spans 75 languages with ≈5.6 trillion word tokens! 🌐 #LLMs #NLP
We will be presenting the HPLT datasets how-to and insights at @LrecColing in Torino. The paper is already on arXiv: arxiv.org/pdf/2403.14009…
Following the release of the main crawl, the Web Graphs for September/October, November/December 2023 and February/March 2024 are out! 🥳🚀🕸️ Once again, doing this with my colleague @thomvaughan was an amazing learning experience. 🤓 Let us know if you have any feedback! 📝
We all share responsibility for building AI that improves lives & unlocks a better future for humanity. @svangel makes this pledge, and we are proud to initiate this Open Letter: Build AI for a Better Future. Please join @OpenAI, @Meta, @Google, @ycombinator, @huggingface,…
Donating $20k to e/acc causes Who's doing interesting work? Please tag / link in comments (must be 501c3) cc @BasedBeffJezos
Don't miss episode 1! Here, @skrenta talks about the potential consequences of publishers' decisions to remove their content from data collections like that of @CommonCrawl. Subscribe: aiinside.show/episode/ai-pos… Watch: youtube.com/watch?v=VTEdIk… Support: patreon.com/aiinsideshow
I am using Common Crawl data to analyze tech stacks of websites for my new SaaS, and it's been a game changer. Huge thanks to the amazing @CommonCrawl team for their incredible resources! 🙌 #buildinpublic
As ever, I crosspost on my blog: buzzmachine.com/2024/01/11/in-…
My report after testifying in the Senate about AI and journalism, with discussion of fair use, JCPA, Section 230, deep fakes, and bad poetry: medium.com/whither-news/i…
Common Crawl. It’s all you need.