apachehudi @apachehudi
Official Twitter handle of Apache Hudi. We marry stream processing to petabytes of data. https://t.co/Ka1NABVHlw
hudi.apache.org · Joined January 2019
Tweets 412 · Followers 3K · Following 134 · Likes 246
Building a near Real-time Lakehouse with Apache Hudi using AWS Stack. Real-time data analytics on operational data is increasingly becoming a standard requirement. A 🧵
Bin Packing Algorithm for "Small File" Issue in Lakehouses. The small file problem is one of the critical problems in a data lake: it hurts query performance when compute engines read the files. The problem occurs when data is written in smaller chunks 🧵
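The thread names bin packing as the remedy. Here is a minimal sketch of the general idea. It assumes the goal is to group small files so that each group's total size stays within a target file size. It uses the textbook first-fit decreasing heuristic, which is not necessarily Hudi's exact logic. The function name, file names, sizes and the 120 MB target are all hypothetical.

```python
from typing import Dict, List


def bin_pack_files(file_sizes_mb: Dict[str, float], target_mb: float) -> List[List[str]]:
    """Group files into bins whose total size stays within target_mb
    (first-fit decreasing heuristic; a file larger than target_mb gets its own bin)."""
    # Place the largest files first; this tends to leave fewer half-empty bins.
    ordered = sorted(file_sizes_mb.items(), key=lambda kv: kv[1], reverse=True)
    bins: List[List[str]] = []
    free: List[float] = []  # remaining capacity of each bin, parallel to `bins`
    for name, size in ordered:
        for i, capacity in enumerate(free):
            if size <= capacity:  # first bin with enough room wins
                bins[i].append(name)
                free[i] -= size
                break
        else:
            # No existing bin has room: open a new one.
            bins.append([name])
            free.append(target_mb - size)
    return bins


files = {"f1": 60, "f2": 50, "f3": 40, "f4": 30, "f5": 20}  # hypothetical sizes in MB
print(bin_pack_files(files, target_mb=120))  # [['f1', 'f2'], ['f3', 'f4', 'f5']]
```

Five small files become two groups of 110 MB and 90 MB. A rewrite could then merge each group into a single larger file.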
Query Optimization with 'Clustering' in Apache Hudi. Today I presented how the clustering service in Hudi makes a huge impact on overall query performance. To highlight the difference, I ran the same query with Presto on a 1 TB TPC-DS dataset, once before clustering & once after.
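One common reason clustering speeds up queries is data skipping. A query engine can compare a predicate against per-file min/max statistics and skip files that cannot contain matching rows. The sketch below assumes that clustering simply rewrites the data sorted by the filter column. Hudi's clustering service has more layout options than this. The ingest order, file sizes and function names are hypothetical.

```python
from typing import List, Tuple


def write_files(keys: List[int], rows_per_file: int) -> List[Tuple[int, int]]:
    """Split keys into fixed-size files in the given order; return each file's (min, max) key."""
    chunks = (keys[i:i + rows_per_file] for i in range(0, len(keys), rows_per_file))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def files_to_scan(ranges: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count files whose [min, max] range overlaps the predicate lo <= key <= hi.
    Every other file can be skipped using only these statistics."""
    return sum(1 for fmin, fmax in ranges if fmax >= lo and fmin <= hi)


# Hypothetical ingest order: keys arrive interleaved, so every file spans almost the whole key range.
arrival = [k for offset in range(10) for k in range(offset, 10_000, 10)]

before = write_files(arrival, rows_per_file=1_000)
after = write_files(sorted(arrival), rows_per_file=1_000)  # "clustered": rewritten sorted by key

print(files_to_scan(before, 2_000, 2_499))  # 10: every file overlaps the predicate
print(files_to_scan(after, 2_000, 2_499))   # 1: only the file holding keys 2000..2999
```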
A few months back, I started this 10-post blog series: @apachehudi from Zero to One, with a goal to give a comprehensive deep-dive of Hudi designs. Happy to share the last post today: (10/10) Becoming "One" - the upcoming 1.0 highlights #apachehudi open.substack.com/pub/datumagic/…
Super excited to bring the Monthly Hudi newsletter to the community! 🎉 There's just so much momentum happening with Apache Hudi & the overall lakehouse space that we needed to bring this to one consolidated place! Link: hudinewsletter.substack.com/p/hudi-newslet…
Have you checked out our YouTube yet? It has some of the amazing videos from our Community sync and Hudi Live sessions! ✅ Notion’s journey through different stages of data scale ✅ Shaping a Database Experience within Data Lakes with Apache Hudi Link: youtube.com/@apachehudi
Have general Hudi questions? Wondering about Hudi's best practices or tips for troubleshooting? We are happy to start hosting additional 1-1 office hours every week! Book it now at calendly.com/apache-hudi/of…
Join us tomorrow to learn more about @doris_apache & Apache Hudi's integration! 🗓️ 13th March 2024 | 8 AM PT | 11 AM ET
Catch the recap of the #apachehudi Community Sync with Daniel Ford on YouTube! You can follow his journey with Amazon EMR and Apache Hudi! youtu.be/P29dfaxUdTU #dataengineering #community
The Magic of Hudi + Flink, Stream Processing on the Data Lakehouse x.com/i/broadcasts/1…
Building an Open Data Lakehouse on S3 with @apachehudi & @prestodb. Super excited about this new workshop that I am running with the Presto team on building an open lakehouse architecture & doing ad hoc analytics on top of it.
.@BrennaBuuck has previously explored how MinIO & Hudi can work together to build a modern #datalake. This blog post aims to build on that knowledge & offer an alternative implementation of @apachehudi & MinIO that leverages Hive Metastore Service (HMS). hubs.li/Q02pVqBX0
@apachehudi: From Zero To One (9/10) introducing HoodieStreamer - a Swiss Army knife for ingestion! #apachehudi #apachespark #apachekafka #distributedsystems #dataprocessing #cdc #dataengineering #databases #datalake #lakehouse blog.datumagic.com/p/apache-hudi-…
#apachehudi's #lakehouse offers a utility, "Hudi Streamer", for data ingestion: ✅ ingests data from sources like @apachekafka, #apachepulsar, etc. ✅ supports auto checkpoint management & integration with schema registries (@confluentinc) ✅ supports backfills, one-off runs & more
More good stuff from @grabengineering - this time writing about how they are building a realtime datalake with tools including @ApacheFlink, @apachehudi, @ApacheSpark and @trinodb engineering.grab.com/enabling-near-…
During COVID, Zoom adoption soared and eng rapidly redesigned their log analytics with MSK, EMR, #apachehudi + Athena. This led to 82% compute cost savings and 90% on storage, while perf went from 5h -> 5min. The Blog 👉 aws.amazon.com/blogs/big-data… #datalakehouse #apachespark #apacheflink
Addressing the small file issue is critical for optimizing query performance on data lakes. The problem occurs when data is written in smaller chunks, e.g. when stream processing engines like @ApacheFlink ingest continuous data streams into table formats like Apache Hudi. 🧵
Hudi's indexing vs. Multi-modal indexing - Catch the details around #apachehudi indexing here: hudi.apache.org/docs/indexing #dataengineering #datalakehouse #databases
Some common examples are Apache Iceberg (iceberg.apache.org), Apache Hudi (hudi.apache.org) and Azure Data Lake 1 (learn.microsoft.com/en-us/azure/da…)
Our latest blog explores how integrating #YugabyteDB with #ApacheHudi enhances data lakehouse capabilities through: ➡️real-time data processing ➡️efficient upserts and deletes ➡️improved consistency and scalability Find out more! ⬇️ hubs.la/Q02k0Tl00
🤿And for a deep dive into the value of the data lakehouse, download our free ebook, Building a Universal Data Lakehouse. onehouse.ai/whitepaper/one…
🛫The Big Data Show in Bengaluru this past weekend was a huge success. We ❤️ this session: Journey into the Data Lakehouse: Unveiling the Third Generation of Data Design. 💡Data engineering leaders from Onehouse, Visa, and Walmart discussed key questions - including the…
Concurrency Control in Apache Hudi. Table formats in a #datalake support concurrent access to data by multiple transactions. This is one of the most important problems tackled by a lakehouse architecture, as opposed to plain data lakes. A 🧵
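One approach that table formats, including Hudi, use for multiple writers is optimistic concurrency control (OCC). Writers proceed without coordinating and check for conflicts only at commit time. The sketch below assumes conflicts are detected at the granularity of file groups, and that the table keeps a timeline of completed commits. The classes and method names are hypothetical. A real multi-writer setup also needs a lock around the check-and-commit step, which this single-process sketch omits.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Commit:
    instant: int           # position of the completed commit on the timeline
    file_groups: Set[str]  # file groups this commit wrote to


@dataclass
class Table:
    timeline: List[Commit] = field(default_factory=list)
    clock: int = 0  # latest completed instant

    def begin(self) -> int:
        """Start a transaction; remember the latest instant it has seen."""
        return self.clock

    def try_commit(self, started_at: int, file_groups: Set[str]) -> bool:
        """Reject the commit if any commit completed after `started_at` touched one of
        the same file groups; otherwise append it to the timeline.
        (In a real system this check-and-append must run under a lock.)"""
        for other in self.timeline:
            if other.instant > started_at and other.file_groups & file_groups:
                return False  # conflict: the writer must retry against the newer state
        self.clock += 1
        self.timeline.append(Commit(self.clock, set(file_groups)))
        return True


table = Table()
a, b, c = table.begin(), table.begin(), table.begin()  # three concurrent writers
print(table.try_commit(a, {"fg1", "fg2"}))  # True
print(table.try_commit(b, {"fg3"}))         # True: no overlapping file groups
print(table.try_commit(c, {"fg2"}))         # False: fg2 was written by a's commit
```

The design choice behind OCC is that conflicts are assumed to be rare. Writers that touch disjoint file groups never block each other, and only genuinely overlapping writes pay the cost of a retry.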
Apache Hudi 1.0 will bring database-like capabilities to the open source lakehouse architecture. Here’s an accessible write-up of the intro session from Open Source Data Summit, with a punchy pair of videos covering key points. Read, watch, and learn! 😏 onehouse.ai/blog/apache-hu…
Looking forward to this one @Dipankartnt 👏 datadaytexas.com/2024/sessions#… #OneTable #apacheiceberg #apachehudi #deltalake
Join @Dipankartnt at @DataDayTexas happening on 27th Jan 2024. He will be talking about OneTable & how it solves the interoperability challenges among #lakehouse table formats. Join him and some of the other amazing folks in the data space in Austin, TX!