Do you miss the meetups of the Big Data era, when you could learn about the Raft protocol or map-side joins over pizza and beer? Us too! My friend @cwensel has launched the SF Distributed Systems Meetup.
First meetup is next week, with talks by me and @conor_power23. lu.ma/t6r4mi4v
If your company is still running Cascading+HadoopMR applications and is itching to drop MR for Spark, DM me.
Some companies have 100s of @Cascading apps in production and moving them off MR in a couple lines of code would be a huge cost savings.
Also true for @scalding.
I always suspected that Parquet and ORC weren't very good, but TIL from @ViktorLeis that one can do much, much better with BtrBlocks. cs.cit.tum.de/fileadmin/w00c…
I'm hanging my shingle out again.
I have some new insights that I want time to work out and turn into a developer tool for cloud data engineers, but I want to bootstrap it with contract/consulting revenue.
This is how I started @Cascading in 2007.
chris.wensel.net
Please boost!
Scala folks: if there are open source GitHub Twitter libraries you like, you might want to make a clone/fork now. Seems like we are in a situation where anything can happen.
Really interested in adding a mergeable summary as a datatype to @cascading. Think daily/hourly aggregate tables, then a daily/hourly job performing a last-N day/hour aggregation against the partials. A cost-efficient way to have rolling percentiles/aggregates.
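A minimal sketch of the idea, not Cascading code: each day's data is reduced to a small summary that can be merged with other summaries, and the rolling percentile is computed from the merged partials without rescanning raw rows. The fixed-width histogram, and the names `HistogramSummary`, `summarize`, and `rolling_percentile`, are assumptions made for illustration; a production version would more likely use a sketch such as t-digest or KLL.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HistogramSummary:
    """Hypothetical mergeable summary: value counts per fixed-width bucket."""
    bucket_width: float = 10.0
    counts: Counter = field(default_factory=Counter)

    def add(self, value: float) -> None:
        # Bucket index is the floor of value / bucket_width.
        self.counts[int(value // self.bucket_width)] += 1

    def merge(self, other: "HistogramSummary") -> "HistogramSummary":
        # Merging is just adding counts, so it is associative and commutative.
        if other.bucket_width != self.bucket_width:
            raise ValueError("bucket widths must match to merge")
        return HistogramSummary(self.bucket_width, self.counts + other.counts)

    def percentile(self, p: float) -> float:
        """Approximate p-th percentile: upper edge of the bucket that holds it."""
        total = sum(self.counts.values())
        if total == 0:
            raise ValueError("empty summary")
        target = p / 100.0 * total
        seen = 0
        for bucket in sorted(self.counts):
            seen += self.counts[bucket]
            if seen >= target:
                return (bucket + 1) * self.bucket_width
        return (max(self.counts) + 1) * self.bucket_width


def summarize(values, bucket_width: float = 10.0) -> HistogramSummary:
    """Build one partial (for example, one day's aggregate row)."""
    summary = HistogramSummary(bucket_width)
    for value in values:
        summary.add(value)
    return summary


def rolling_percentile(partials, window: int, p: float) -> float:
    """Merge the last `window` partials (oldest first) and query the result."""
    merged = HistogramSummary(partials[0].bucket_width)
    for partial in partials[-window:]:
        merged = merged.merge(partial)
    return merged.percentile(p)


# Three daily partials; the last-2-day median only touches two small summaries.
days = [summarize([1, 2, 3]), summarize([15, 25]), summarize([95])]
median_last_two_days = rolling_percentile(days, window=2, p=50)
```

The cost argument is that each partial is built once when its day/hour closes, and every later rolling query merges N small summaries instead of re-reading N days of raw data. The answer is approximate, with error bounded by the bucket width in this sketch.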
cross your fingers, we might have a @cascading 4.1 release tonight, and a quick 4.5 release a day later w/ Hadoop 3 support. Both will include native @ApacheParquet support to replace the support dropped by Parquet. Plus some well worn enhanced Parquet support in 4.6
For fun, I really want to fork the @cascading local mode planner to support RocksDB as the backend for joins instead of the memory-only hash maps. Local mode was never intended to scale in joins.
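A rough sketch of what swapping the in-memory hash map for an on-disk store means for a hash join. This is not the Cascading planner, and it does not use RocksDB: Python's standard-library `sqlite3` stands in here for an embedded on-disk key-value store, and `DiskBackedMultiMap` and `disk_backed_join` are hypothetical names. Keys are stringified for simplicity, so `1` and `"1"` would collide.

```python
import os
import pickle
import sqlite3
import tempfile


class DiskBackedMultiMap:
    """Hypothetical key -> rows map stored on disk (sqlite3 as a stand-in)."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT, v BLOB)")
        self.conn.execute("CREATE INDEX IF NOT EXISTS kv_k ON kv (k)")

    def put(self, key: str, row: dict) -> None:
        self.conn.execute("INSERT INTO kv VALUES (?, ?)", (key, pickle.dumps(row)))

    def get(self, key: str) -> list:
        cursor = self.conn.execute(
            "SELECT v FROM kv WHERE k = ? ORDER BY rowid", (key,)
        )
        return [pickle.loads(blob) for (blob,) in cursor]

    def close(self) -> None:
        self.conn.commit()
        self.conn.close()


def disk_backed_join(build_rows, probe_rows, key: str) -> list:
    """Inner hash join whose build side lives on disk rather than in a dict."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        store = DiskBackedMultiMap(os.path.join(tmp, "build.db"))
        try:
            # Build phase: index one side of the join by key, on disk.
            for row in build_rows:
                store.put(str(row[key]), row)
            # Probe phase: stream the other side and look up matches.
            for row in probe_rows:
                for match in store.get(str(row[key])):
                    results.append({**match, **row})
        finally:
            store.close()
    return results


users = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
events = [{"id": 1, "e": "x"}, {"id": 1, "e": "y"}, {"id": 3, "e": "z"}]
joined = disk_backed_join(users, events, key="id")
```

The build side is then bounded by disk rather than heap, at the price of a lookup per probe row that may hit disk.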
@cwensel @PinterestEng Google gs connector does do aligned block reads... both that and this one work well for backwards seeks (footers, stripe footers -> stripes)
9K Followers, 2K Following: engineer for recommendations at @Netflix. Former employee of @stripe, @twitter. Former academic. Programmer. Maui resident. Just some guy. @[email protected]
5K Followers, 417 Following: On a mission to tame data. Author of Apache Calcite, Mondrian OLAP engine, and the Morel language. Formerly Looker/Google, now working on the next thing.
91 Followers, 3K Following: Knowledge Representation, Logic, and Semantics to reason Complex Systems and dense text of Biology and History. Writes Culture, History, and Political Thoughts.
835 Followers, 3K Following: engineering leader passionate about AI, distributed systems, and open source; currently @MSL/Meta Superintelligence Labs, previously @Netflix, @Intel, et al.
281 Followers, 3K Following: Working as Data Engineer II @infinxinc. Worked as Data Scientist @kratinmobile. Completed Big Data Analytics Course @cdacindia ACTS, Pune.
HackerRank @rohitsohlot
74 Followers, 118 Following: CEO of Analytics Inside specializing in big data, machine learning, and graph theory. Co-author of Learning Cascading from Packt Publishing.
5K Followers, 150 Following: Senior Writer for http://t.co/QF0Tfbi0gv, IDG Enterprise covering IT Security, Big Data, Open Source and Microsoft Tools and Servers
17K Followers, 136 Following: Founder of @redplanetlabs. On a mission to dramatically improve programmer productivity and software quality by reducing the complexity of software development.