While exploring career opportunities in freelancing, e-commerce, marketplace analytics, on-chain analytics, and crypto consulting, an intriguing prospect came my way. It was an exceptional challenge and a rare opportunity: a chance to be the first data team member of a promising fintech product.
Having spent the past five years as a freelancer, where I had always been part of larger data teams as the 10th, 15th, or 25th member, this new venture fascinated me. It meant departing from my freelancing journey and committing myself to a single company full-time, a decision that carried significant weight.
Initially, I worked part-time for three weeks, which allowed me to ease into the role, gather essential context, and set up the foundations. However, I was well aware that a colossal workload awaited me. That meant switching to full-time quickly and implementing a modern data stack so that we could move faster.
During my early conversations with the founders, I made it a priority to deeply understand the business, the product, and the current state of affairs. I delved into their vision of the ideal next states and desired outcomes, absorbing the journey they were embarking on.
The company had successfully traversed the crucial 0-1 stage and discovered product-market fit. Now, we stood at the threshold of scaling from 1 to 10, preparing to expand teams and elevate the organization to new heights. As employee number 33, I joined a team of around 40-50 individuals, and by December ’21, our ranks had swelled to over 100, a testament to our rapid growth and collective ambition.
In this blog series, I will share the captivating tale of building a scalable data stack for startups. From the challenges faced to the strategies employed, this is a comprehensive account of how we transformed from zero to becoming a data powerhouse, poised to propel our company’s success.
Stay tuned for insights, lessons learned, and a roadmap to building an agile and effective data infrastructure that can empower startups on their own path from 1 to 10.
Gaining Company Context
When embarking on the journey of building a scalable data stack for startups, gaining a deep understanding of the company context is crucial. This context can be broadly categorized into two areas: the various teams as stakeholders and the technical infrastructure that supports the company’s growth.
a) Teams as stakeholders: To effectively serve the needs of different teams such as product, marketing, business, compliance, and legal, regular catch-up meetings are essential. However, going beyond surface-level interactions, it’s important to dive deep into their challenges, day-to-day operations, and the evolution of their functions. This comprehensive understanding enables you to address immediate short-term needs and plan strategically for long-term requirements.
We implemented a solution that automated tracking of the obvious objectives and key results (OKRs) through KPI monitoring. By taking an architectural approach, first assessing the overall landscape and then delving into specific areas, we ensured that both immediate and long-term needs were met.
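As a rough illustration of what automating OKR tracking through KPI monitoring can look like, here is a minimal Python sketch. The metric names, targets, and the 70% threshold are made up for the example and are not the figures we actually tracked; it also assumes every target is greater than zero.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    """One key result backed by a single KPI (all names here are hypothetical)."""
    name: str
    target: float
    current: float
    higher_is_better: bool = True

    def progress(self) -> float:
        """Fraction of the target reached, capped at 1.0. Assumes target > 0."""
        if self.higher_is_better:
            ratio = self.current / self.target
        else:
            # For "lower is better" KPIs, meeting or beating the target counts as 100%.
            ratio = self.target / self.current if self.current else 1.0
        return min(ratio, 1.0)


def off_track(key_results, threshold=0.7):
    """Return the names of key results whose progress is below the threshold."""
    return [kr.name for kr in key_results if kr.progress() < threshold]


# Illustrative values only.
key_results = [
    KeyResult("weekly_active_users", target=10_000, current=8_200),
    KeyResult("onboarding_completion_rate", target=0.60, current=0.35),
    KeyResult("support_tickets_per_1k_users", target=20, current=25, higher_is_better=False),
]

print(off_track(key_results))
```

In practice the `current` values would come from scheduled warehouse queries rather than hard-coded numbers, and the off-track list would feed an alert or a dashboard.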
b) Technical infrastructure understanding: Understanding the technical infrastructure of the company is crucial when establishing a data function. Three core teams—backend, frontend, and mobile—alongside the CTO played a pivotal role in helping us comprehend and plan the construction of our data stack. It’s worth noting that the emphasis on these teams may vary based on the product focus and the company’s specific needs. However, the backend team is absolutely crucial in this context.
Additionally, considering collaboration with a DevOps team can greatly benefit the data engineering function within the team. By combining the knowledge and expertise of these technical teams, you can make informed decisions on designing the data infrastructure and selecting appropriate tools that will facilitate scalability as the company grows.
By gaining a comprehensive understanding of both the stakeholder teams and the technical infrastructure, we laid a solid foundation for building an effective and tailored modern data stack and started to envision the data architecture.
Implementing the Modern Data Stack
When it comes to setting up a modern data stack, one of the most significant decisions is choosing between the buy vs. build approach. In our case, we opted for the buy approach to ensure a speedy implementation. With our marketing efforts ramping up rapidly, we needed quick results and efficient implementation.
Our goal was to establish a modern ELT (Extract, Load, Transform) data workflow layer that optimizes the data flow from various raw data sources (such as apps, websites, and other data sources) to a data warehouse that we own. This data warehouse would then feed into a business intelligence (BI) tool and, in the future, any ML/AI tools, enabling all stakeholders to analyze data and answer business and product-related questions.
In the modern data stack, one key aspect is the use of ELT (Extract, Load, Transform) instead of traditional ETL (Extract, Transform, Load). We load all raw data from our sources, such as websites and apps, directly into our data warehouse and perform transformations on it at a later stage, inside the warehouse.
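To make the load-first, transform-later idea concrete, here is a small self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for the warehouse, and the event records, table names, and the `json_get` helper are invented for the illustration. In our actual stack the loading was handled by managed tools and the transformations by DBT, so treat this as a toy model of the pattern rather than our implementation.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from an app or website.
raw_events = [
    {"user_id": "u1", "event": "signup", "ts": "2021-09-01T10:00:00"},
    {"user_id": "u1", "event": "deposit", "ts": "2021-09-02T12:30:00"},
    {"user_id": "u2", "event": "signup", "ts": "2021-09-02T09:15:00"},
]

conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse

# Extract + Load: store every record untouched, as a JSON string.
conn.execute("CREATE TABLE raw_events (payload TEXT)")
conn.executemany(
    "INSERT INTO raw_events (payload) VALUES (?)",
    [(json.dumps(event),) for event in raw_events],
)

# Small helper so SQL can read fields out of the raw JSON payload.
conn.create_function(
    "json_get", 2, lambda payload, key: json.loads(payload).get(key)
)

# Transform: runs later, inside the warehouse, on top of the raw table.
conn.execute(
    """
    CREATE TABLE daily_signups AS
    SELECT substr(json_get(payload, 'ts'), 1, 10) AS day,
           COUNT(*) AS signups
    FROM raw_events
    WHERE json_get(payload, 'event') = 'signup'
    GROUP BY day
    """
)

rows = conn.execute(
    "SELECT day, signups FROM daily_signups ORDER BY day"
).fetchall()
print(rows)
```

Because the raw table is kept as-is, a new question (say, daily deposits) only needs a new transformation, not a new extraction job.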
During the implementation phase, we utilized several key tools to build our modern data stack:
- Segment: We leveraged Segment to store all user events and behavioral data, providing valuable insights into user interactions.
- MongoDB: Our core backend data and transactional data were stored in MongoDB, serving as a critical data source for analysis.
- Snowflake: We set up a Snowflake account to serve as our primary warehouse for raw production data and as our ad-hoc analysis tool. We then added warehouses and implemented measures to remove personally identifiable information (PII) from the data.
- Stitch & Fivetran: Stitch was integrated as our extraction and loading tool, facilitating seamless data transfer into the data warehouse.
- DBT Cloud: To transform raw data into usable production data, we employed DBT Cloud, a modern data tool that streamlined data management and analysis.
- Business Intelligence Tool: We evaluated various options, including Metabase, Mode, Looker, Redash, and Google Data Studio, to find the most suitable tool for our use cases. I wrote a blog post on the best dashboard tools a while back, and it is still a relevant guide to choosing your own tools based on how you plan to use them.
Additionally, we explored Hightouch for reverse ETL, enriching our operational analytics and gaining deeper insights.
Each tool was chosen carefully so that:
- It fit into the current tech stack used by the development and tech teams
- It could be used plug-and-play, with fast onboarding and seamless integration
- It would evolve well with the data and tech stack at the scale we were planning to reach in our customer base
By implementing these tools as part of our modern data stack, we successfully streamlined data workflows, enabled data-driven decision-making, and provided actionable insights to drive our business forward.
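One of the steps mentioned above, removing PII before data is used for analysis, can be sketched in a few lines of Python. The field names, the salt, and the choice to hash the user identifier (so records can still be joined) are assumptions made for this example; they are not a description of the measures we actually configured in Snowflake.

```python
import hashlib

# Hypothetical field lists; a real schema would define its own.
PII_FIELDS_TO_DROP = {"name", "email", "phone"}
FIELDS_TO_HASH = {"user_id"}


def scrub(record, salt):
    """Return a copy of the record with direct PII dropped and IDs pseudonymized."""
    clean = {}
    for key, value in record.items():
        if key in PII_FIELDS_TO_DROP:
            continue  # drop direct identifiers entirely
        if key in FIELDS_TO_HASH:
            # A salted SHA-256 keeps the value joinable without exposing the raw ID.
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            clean[key] = digest
        else:
            clean[key] = value
    return clean


record = {
    "user_id": 42,
    "name": "Jane Doe",
    "email": "jane@example.com",
    "amount": 150.0,
}
scrubbed = scrub(record, salt="example-salt")
print(sorted(scrubbed))
```

The same salt always yields the same hashed ID, so joins across tables keep working. The salt itself should be stored as a secret, not in code as it is here.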
Data Maturity Model
Understanding the maturity model of data analytics is essential to assess the current state of an organization and determine its desired future state. It provides valuable insights into the organization’s data capabilities and helps chart a path for growth and improvement.
In our journey towards building a scalable data stack for our startup, we recognized the significance of analytics that not only creates business and product impact but also enhances the efficiency of the data team.
To achieve these goals, we categorized our long-term vision for the data team into four primary buckets, as defined by our leadership:
- Actively supporting org teams: The data team actively supports other teams (product, business, marketing, compliance, customer support) within the organization by providing valuable insights and analysis. This involves proactively understanding user behavior, identifying what works well and what does not, and addressing day-to-day challenges while driving the continuous evolution of the product, the user experience, and more.
- Data Analytics (DA): The data analytics team focuses on leveraging data to analyze what happened and why it happened, to anticipate issues, and to mitigate risks. For example, we developed dashboards and analyses on onboarding, acquisition, and retention of the user base. We supported the marketing team with multiple performance dashboards covering all potential external and internal channels, to understand which offered the best ROI and maximum impact.
- Data Science (DS): The data science team explores predictive capabilities by utilizing advanced algorithms and machine learning techniques. This work revolved around answering critical questions such as predicting user behavior, optimizing business processes, and identifying opportunities for growth.
- Taking leadership roles in organizational growth projects: The data team actively participates in projects that contribute to the growth and development of the organization as a whole. They take ownership of internal and external data initiatives, planning to expand the organization’s data assets and capabilities.
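To give a feel for the kind of calculation that sits behind a retention dashboard like the ones mentioned under Data Analytics, here is a minimal Python sketch of week-N retention. The activity log and the definition used (a user counts as retained in week N if they were active between 7N and 7N+6 days after their first activity) are assumptions for illustration; real dashboards often use different cohort definitions.

```python
from datetime import date

# Hypothetical activity log: (user_id, day the user was active).
activity = [
    ("u1", date(2021, 9, 1)),
    ("u1", date(2021, 9, 8)),
    ("u2", date(2021, 9, 1)),
    ("u3", date(2021, 9, 2)),
    ("u3", date(2021, 9, 10)),
]


def week_n_retention(activity, n):
    """Share of users active during week n after their first recorded activity."""
    first_seen = {}
    for user, day in sorted(activity, key=lambda row: row[1]):
        first_seen.setdefault(user, day)  # earliest day wins because rows are sorted

    retained = set()
    for user, day in activity:
        days_since_first = (day - first_seen[user]).days
        if n * 7 <= days_since_first < (n + 1) * 7:
            retained.add(user)

    return len(retained) / len(first_seen) if first_seen else 0.0


print(week_n_retention(activity, 1))
```

In a warehouse setting the same logic would normally live in a SQL transformation, with the BI tool only reading the resulting table.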
Throughout our journey, these initiatives enabled us to support various teams within the organization, empowering them to make informed decisions and overcome bottlenecks. The data team’s work played a pivotal role in driving the growth and success of the company on a weekly basis.
By constantly striving to improve our data maturity levels and aligning our efforts with business impact, we were able to unlock the true potential of data analytics within our organization.
Are you ready to unlock the power of a modern data stack for your startup? Book a free consultation call with me today and discover how we can help you build a scalable data infrastructure that drives business growth and empowers data-driven decision-making.
During our call, we will dive deeper into your specific needs and challenges, exploring how a modern data stack can revolutionize your data strategy. Whether you’re an early-stage founder, a hiring manager, a C-suite executive, or a member of an expanding data team, our expertise and experience can guide you on the path from zero to a data powerhouse.
Don’t miss out on this opportunity to gain valuable insights and take your startup to the next level. Click the link to schedule your free consultation call and let’s embark on this data-driven journey together.