Big Data – Four Performance Strategies
In today's blog post, I preview the latest performance benchmark that Talend's R&D team and Talend Labs have run, based on the TPC-H benchmark tests.
As ever, it is Talend’s mission to provide easy-to-use big data integration tools with the industry’s highest-performing, most scalable integration code running natively on Hadoop.
As part of this mission, we put every product release through a rigorous set of performance and scalability tests, including TPC-H, a performance benchmark developed by the Transaction Processing Performance Council.
In the latest release of Talend Big Data, we have implemented some key performance strategies and optimisations in Talend Studio which ensure that the Java code generated for MapReduce is already optimised. In previous versions these optimisations were possible, but it was incumbent upon the Talend developer to implement them, or even to know that the patterns and good-practice approaches existed.
Talend has taken the time to embed the following optimisations into the Studio at design time. The resulting generated code, deployed natively onto the Hadoop nodes, delivers a performance uplift of 67 percent compared to version 5.4.1.
- Move less data
  - Improve performance, reduce errors, remove latency and ensure consistent execution
- Execute natively
  - Generate code that executes natively within Hadoop, removing any redundant network, parsing, unpacking, wait-time, CPU or disc cycles
  - Remove any need to traverse logical environments, store data, execute logic or use the network to execute a query
- Optimise at design time
  - Build ‘know-how’, developer hints, tips and tricks into the tool
  - Reduce serialization and deserialization
    - In specific scenarios, use a RawComparator to compare keys byte by byte, as opposed to deserializing the intermediate keys to perform the comparison
- Focus on overall productivity, not just ‘raw throughput’
  - Performance is perceived differently from a myriad of perspectives and roles
  - Everything from training developers to troubleshooting production environments has an impact on end-to-end project delivery and the performance therein
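To make the RawComparator point above more concrete, here is a minimal, self-contained Java sketch. It is not the code Talend Studio generates, and it does not depend on Hadoop. It only mirrors the parameter shape of Hadoop's `RawComparator.compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2)`. The class name `RawKeyComparison`, its method names and the key format (a signed 64-bit key serialized as 8 big-endian bytes) are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/** Illustrative stand-in for byte-level key comparison; hypothetical names throughout. */
final class RawKeyComparison {

    /** Serializes a key as 8 big-endian bytes (assumed key format for this sketch). */
    static byte[] serialize(long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    /** Baseline: deserialize both keys back into longs, then compare the values. */
    static int compareDeserialized(byte[] b1, int s1, byte[] b2, int s2) {
        long k1 = ByteBuffer.wrap(b1, s1, Long.BYTES).getLong();
        long k2 = ByteBuffer.wrap(b2, s2, Long.BYTES).getLong();
        return Long.compare(k1, k2);
    }

    /**
     * Raw comparison: walk the serialized bytes directly, with no deserialization.
     * s1/s2 are start offsets and l1/l2 are lengths, as in Hadoop's RawComparator.
     */
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (i == 0) {
                // Flip the sign bit of the leading byte so negative keys sort first.
                a ^= 0x80;
                b ^= 0x80;
            }
            if (a != b) {
                return a < b ? -1 : 1;
            }
        }
        return Integer.compare(l1, l2);
    }

    public static void main(String[] args) {
        long[] keys = {-7L, 0L, 3L, 1024L};
        for (long x : keys) {
            for (long y : keys) {
                byte[] bx = serialize(x);
                byte[] by = serialize(y);
                int raw = Integer.signum(compareRaw(bx, 0, bx.length, by, 0, by.length));
                int slow = Integer.signum(compareDeserialized(bx, 0, by, 0));
                System.out.println(x + " vs " + y + ": raw=" + raw + ", deserialized=" + slow);
            }
        }
    }
}
```

Both methods produce the same ordering, but the raw version never rebuilds key objects. During the sort and shuffle phase of a MapReduce job, where keys are compared many times, that can save a meaningful amount of CPU and allocation work.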
The details of this TPC-H benchmark will be published as part of the 5.5.1 release.
In the meantime, ask your other integration vendors how they implement the performance strategies above in a graphical tool, without the need for expert knowledge and man-years of experience... and see what answers they can give :-)