Why Is the Dev & Data Tech Stack So Darn Complicated? (I/IV)
The Status Quo of the Dev & Data Stack
Welcome to the first article on my Substack. Well, not exactly the first one, as I recently re-shared a piece by my colleague Andre (Data-driven VC) and me on “Developing for Developers: the Potential behind B2D Companies” that was initially published via Earlybird’s Medium, and which I want to take as a starting point for what comes now. I’ve grown to love Business-to-Developer, or B2D, companies, but at the same time a very plain and simple question kept forming in my head: “Why is the dev & data tech stack so darn complicated?” It feels like almost every day I come across a new great founding team with a new great idea on how to add functionality, in yet another tool, to the dev or the data stack. Yet the landscape is already super crowded, which hinders adoption and makes it hard to set up and maintain the right tooling. In a four-part series I thus want to add some color to the following topics:
The status quo of the dev & data stack: what does today’s tooling landscape look like?
Who or what was it that made the dev & data stack so complicated: drivers and macro trends that triggered this development
What factors to consider when choosing one’s stack in all this mess: a recipe to find the ingredients you need to bake a suitable tech stack
What is next - will the dev & data stack continue to develop into a Rube Goldberg machine: some predictions (VCs love that) about how I see the future of the dev & data stack
If you are a software developer, you are probably familiar with the concept of a ‘technology stack’. This refers to the set of tools, languages, frameworks, and libraries that developers use to create and deploy applications. However, if you have ever looked at a broad technology stack overview, chances are you felt overwhelmed to some degree, both by the sheer number of components involved and by the range of choices within each bucket. That holds whether you are a newbie, someone looking at it from the outside like myself, or a distinguished developer with fancy degrees and years of experience.
The dev and data stack is like a never-ending game of Tetris, with new blocks being added all the time
Initially, I thought I only felt lost because I was newly venturing into the space without a CompSci degree. Over time I figured out that it really is just damn complicated, even for experienced professionals who have built software for years. In this blog post, we will explore what this developer stack looks like, why it is so complex, and why this difficulty is both necessary and inevitable. Or is it really?
What does the basic developer setup look like?
To start, let’s zoom out and have a quick look at what bits and pieces make up our beautifully complex developer stack:
🧑🏻💻 Basic set-up - developer stack:
💻 Hardware: obviously you’ll need a computer to get started (jokes aside - we’ll focus on the software components)
💬 A text editor or an IDE (such as VSCode or Eclipse) to write and edit code
🌐 Web browser (such as Google Chrome, Firefox, Safari): to test and view your web pages and to access integrated developer tools that can help debug and inspect your code
🕹 Version control system (such as Git): used to manage and track changes to your code over time
🏃♂️ Task Runners (such as Grunt and Gulp) and Package managers (popular ones include npm and Yarn): these help automate repetitive tasks and manage dependencies
🏗 Build Tools (such as make, Gradle, or Maven) to automate tasks such as compiling and deploying your code
☑️ Testing Tools: to ensure that your code works correctly and meets the requirements of the project
🐛 Debugging Tools: to identify and fix errors in your code
🛣 CI/CD Tools (such as Jenkins, CircleCI, or Travis CI): to automate the process of building, testing, and deploying your code
🤼 Collaboration Tools (such as Jira, Monday, Notion, Confluence, Slack): in case you work in teams and collaborate with other developers, this may include project management software, communication tools, and issue trackers
🈲 Programming languages
Front-end development:
HTML (Hypertext Markup Language) - used to create the structure and content of web pages
CSS (Cascading Style Sheets) - used to add styles, layout, and visual design to web pages
JavaScript - used to add interactivity and dynamic behavior to web pages
TypeScript - a superset of JavaScript that adds static type checking and other features, often used for building large-scale applications
Back-end development:
Java - a popular language used for developing enterprise-scale applications, known for its reliability, security, and portability
Python - a versatile language used for building web applications, data processing, scientific computing, and more, known for its simplicity and readability
Ruby - a language often used in web development, known for its expressiveness and productivity
PHP - a language often used for building dynamic web applications, known for its speed and ease of use
C# - a language developed by Microsoft and often used for building Windows applications, web applications, and games
Go - a language developed by Google and often used for building scalable and efficient applications
Rust - a language originally sponsored by Mozilla, known for its speed, safety, and reliability, often used for systems programming and web applications
🖼️ Programming frameworks
Front-end development:
React - a JavaScript library developed by Facebook for building user interfaces, well suited to creating reusable UI components that can be composed into complex applications
Angular - a TypeScript-based framework developed by Google for building dynamic web applications, which provides a comprehensive set of features for building complex applications
Vue.js - a progressive JavaScript framework for building user interfaces, known for its simplicity, flexibility, and ease of use
Back-end development:
Ruby on Rails - a popular framework for building web applications using Ruby, known for its convention-over-configuration approach, which simplifies development
Django - a Python-based web framework that is often used for building complex web applications and provides a comprehensive set of features, including an ORM, admin panel, and authentication system
Node.js - a server-side JavaScript runtime (so more an environment than a language or framework) that allows developers to build scalable, high-performance applications, particularly popular for building real-time applications
Express - a popular Node.js framework that takes a minimalist approach to building APIs, web applications, and single-page applications
Flask - a lightweight Python-based web framework for building web applications and APIs, offering a simple and flexible approach to web development
Laravel - a PHP-based web framework that is known for its elegant syntax and ease of use and provides a range of features, including an ORM, template engine, and authentication system
As an early-stage investor in developer tools, I am lucky to have my ear to the ground and be exposed to upcoming trends and tools early on. Some exciting additions to the traditional developer stack that I observe are, for example:
🚤 Productivity measuring tools: monitor individual developers’ productivity on certain key metrics (often around the DORA metrics) to gain visibility and remove blockers
🏎 Deployment automation platforms (think Heroku 2.0): cloud-based platforms for deploying, managing, and scaling containerized applications and microservices without the hassle
🤖 LLM-centered developer tooling: helps developers interact with LLMs, include them in the products they are building, and get the most out of foundation models through effective prompting, data integration, labeling, fine-tuning, monitoring, and orchestration
🧠 Generative AI-powered assistants such as Copilot: AI-powered code assistants that provide suggestions and autocompletion when writing code
…and obviously, many more that complement the dev stack in any way, often by boosting productivity, increasing DevX, lowering cost, and allowing for more functionality.
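To make the productivity-metrics bullet a bit more tangible, here is a minimal sketch of how three of the four commonly cited DORA metrics (deployment frequency, lead time for changes, change failure rate) could be computed. The deployment log, its field layout, and the function name `dora_summary` are assumptions made up for illustration; real tools derive this data from version control and CI/CD systems, and the fourth metric (time to restore service) is left out for brevity.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment log: (commit time, deploy time, caused a failure?)
DEPLOYMENTS = [
    (datetime(2023, 5, 1, 9), datetime(2023, 5, 1, 15), False),
    (datetime(2023, 5, 2, 10), datetime(2023, 5, 3, 10), True),
    (datetime(2023, 5, 4, 8), datetime(2023, 5, 4, 12), False),
    (datetime(2023, 5, 5, 9), datetime(2023, 5, 5, 11), False),
]


def dora_summary(deployments, period_days):
    """Summarize three DORA-style metrics over a reporting period."""
    # Lead time for changes: time from commit to running in production.
    lead_times_hours = [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed, _ in deployments
    ]
    # Change failure rate: share of deployments that caused a failure.
    failures = sum(1 for _, _, failed in deployments if failed)
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(lead_times_hours),
        "change_failure_rate": failures / len(deployments),
    }


summary = dora_summary(DEPLOYMENTS, period_days=5)
```

With the made-up log above, the summary describes a team rather than a single developer, which is how these metrics are usually read.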
There are often many parallels between the world of dev and the world of data. I see them here as well: in the current state of the stacks, in why they grew ever more fragmented and complex, and in how this might develop in the future. So let’s also take a look at what is broadly considered the ‘modern data stack’ for a second:
🧑🏻💻 Basic set-up - data stack:
♨️ Data sources - where you’ll be getting your data from, this can be both internal and external sources
🎢 Data ingestion (such as Fivetran or Stitch) - tools to extract data from various sources, load it into a central repository, and apply transformations (ETL if the data is transformed before it is loaded, ELT if it is transformed after loading)
🔧 Data modeling (such as dbt) - gives dev/data workers a structured approach to transforming and organizing raw data into a format that is optimized for querying and analysis, while allowing teams to collaborate more effectively on data modeling projects and ensuring that data transformations are consistent and reproducible
🚦 Workflow orchestration (such as Airflow) - enables developers to define, schedule, and monitor complex data processing pipelines. These tools provide a way to manage dependencies between tasks, retry failed tasks, and monitor the progress of the workflow in real time.
💽 Data storage
Databases: organized collections of electronic data that can be managed using specialized software; some widely used database types are:
Relational databases: These store data in tables with predefined schemas and enforce data consistency rules using SQL. They are commonly used for transactional workloads and structured data
NoSQL databases: These store data in flexible formats such as documents or key-value pairs and are optimized for horizontal scaling and high availability. They are commonly used for unstructured and semi-structured data
Object-oriented databases: These store data as objects, which are instances of classes defined in an object-oriented programming language
Hierarchical databases: These store data in a tree-like structure, with each node representing a record and each branch representing a set of fields
Network databases: These are similar to hierarchical databases, but with more flexibility in the relationships between records, allowing for more complex data structures.
Data Warehouse: specialized databases designed for querying and analyzing large volumes of structured and semi-structured data, optimized for read-heavy workloads and often used for business intelligence and analytics
Data Lake: large-scale, centralized repositories that store raw data from a variety of sources, whether structured, semi-structured, or unstructured, often used for exploratory analytics and data science
Data Lakehouse (funky stuff): as the name suggests, a hybrid of data warehouses and data lakes that aims to provide the best of both worlds in a single solution, enabling organizations to query and analyze both raw and processed data in one place
📊 Business Intelligence applications (such as Tableau and Looker): provide data visualization and analysis tools that enable organizations to gain insights from their data and make informed decisions (through more or less fancy dashboards)
🪄 Reverse ETL (such as Hightouch and Census): enables businesses to sync data from their data stores back to other applications, allowing them to operationalize insights and drive actions based on their data (isn’t it nice how the circle closes?)
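To see how a few of these layers hang together, here is a toy sketch of an ELT-style pipeline: raw rows are loaded unchanged, transformed inside the store with SQL, and read back out as a reverse ETL step would. A tiny orchestrator runs the tasks in dependency order and retries failures. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for the warehouse, the table names, sample rows, and `run_pipeline` function are invented, and none of this reflects how Fivetran, dbt, Airflow, or Hightouch work internally.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical source records; an ingestion tool would normally pull these.
SOURCE_ROWS = [("alice", 120.0), ("bob", 80.0), ("alice", 30.0)]


def ingest(conn):
    """Extract and load: copy raw rows into the central store unchanged."""
    conn.execute("DROP TABLE IF EXISTS raw_orders")  # keeps retries idempotent
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", SOURCE_ROWS)


def model(conn):
    """Transform inside the store, after loading (the ELT ordering)."""
    conn.execute("DROP TABLE IF EXISTS customer_totals")
    conn.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount) AS total "
        "FROM raw_orders GROUP BY customer"
    )


def reverse_etl(conn):
    """Read modeled data back out, as if syncing it to another application."""
    rows = conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    return dict(rows)


TASKS = {"ingest": ingest, "model": model, "reverse_etl": reverse_etl}
# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {"ingest": set(), "model": {"ingest"}, "reverse_etl": {"model"}}


def run_pipeline(max_retries=2):
    """Run all tasks in dependency order, retrying each on database errors."""
    conn = sqlite3.connect(":memory:")
    results = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                results[name] = TASKS[name](conn)
                break
            except sqlite3.Error:
                if attempt == max_retries:
                    raise  # give up after the final retry
    conn.close()
    return order, results


order, results = run_pipeline()
```

Even in this toy version, each concern (loading, modeling, scheduling, syncing) is its own piece, which hints at why each one ended up as a separate tool category.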
Ok. Wow. The current best practices of both the developer and the data stack are extremely fragmented and complicated. New cool things are constantly being added. At the same time, most categories are dominated by just a handful of best-of-breed solutions each. Such a stack is not only hard to set up and maintain, it is also a real hurdle for less experienced people who want to build stuff with it.
With this overview in hand, the stage is set for the next episode, where I will delve deeper into how this fragmentation came to be, and who or what is responsible for this mess.