Machine Learning Data Engineer



Software Engineering, Data Science
Amsterdam, Netherlands
Posted on Friday, July 21, 2023

Who are we?

On a mission to make video easy for anyone …

It is an exciting time to join Synthesia: we have reached a milestone by becoming a unicorn, having raised $90 million in Series C funding at a valuation of $1 billion! ✨ 🦄

Synthesia is the world’s #1 AI video generation platform. Well, it’s actually a video production studio — in a browser. As in, no cameras or film crews at all. You simply choose an avatar, enter your script in one of 60 languages, and your video is ready in minutes. In Synthesia, you can build personalised on-the-fly videos, give your chatbot a human face or run 24/7 weather channels in different languages, to name just a few of the possibilities. 🎬

We believe the future of media is synthetic, and we are on a mission to turn cameras into code and make everyone a creator. To learn more, check out our brand video that explains what we’re doing at Synthesia.

About the role

We are looking for an experienced Machine Learning Data Engineer who loves dealing with large quantities of text and audio data. The successful candidate will be proficient in using machine learning techniques to build data processing pipelines that prepare ready-to-train datasets for large models.

If you are excited about the intersection of AI, machine learning, and large-scale data, this role provides a unique opportunity to make a high-impact contribution. 💪🏻

Our aim is to make video content creation accessible to everyone, not only to production studios!

🧑🏼‍🔬 You will be someone who loves to code and build working systems. You are used to working in a fast-paced start-up environment. You will have experience with the software development life cycle, from ideation through implementation, to testing and release.

👩‍💼 You will join a group of more than 40 Engineers in the R&D department and will have the opportunity to collaborate with multiple research teams across diverse areas. Our research is guided by our co-founders, Prof. Lourdes Agapito and Prof. Matthias Niessner.

If you know and love Voicebox, Whisper, VALL-E, SPEAR-TTS and more, and you love machine learning and large data, then we would love to talk to you. We also want to talk to you if working on systems like these is what you dream of doing. 🤩

What will you be doing?

🚀 In this position, you'll join the team to help develop our LLM-based TTS system that will provide our customers with voice clones that are indistinguishable from real voices. You will also help us create high-quality, production-ready code and take ownership of production pipelines. This will include:

  • Designing, developing, and maintaining data processing pipelines, utilising machine learning techniques to handle vast amounts of text and audio data, while ensuring data quality and accessibility.
  • Leveraging your understanding of machine learning algorithms and workflows to prepare data as effectively as possible for use in large-scale models.
  • Using Big Data tools and frameworks to process, analyse, and derive insights from structured and unstructured data.
  • Collaborating with other ML Engineers and Researchers to understand their data requirements and provide them with ready-to-train datasets.
  • Monitoring the performance of data pipelines and machine learning models, troubleshooting data-related issues, and performing root cause analysis to implement strategic solutions.
  • Staying up-to-date with emerging technologies and tools in machine learning and data engineering to continually improve our data infrastructure.
  • Documenting data pipeline architecture and workflows, presenting findings to relevant stakeholders, and providing training as needed.

Who are you?

  • You have a background in Computer Science, Engineering, or a related field with 3+ years of experience. Advanced degrees with a focus on Machine Learning are preferred.
  • You have proven experience as a Data Engineer, or in a similar role, with a demonstrated history of designing and building scalable data pipelines using Machine Learning techniques.
  • Familiarity with audio data processing and voice technologies is highly desirable.
  • You have excellent coding skills in Python and you are very passionate about the software development side of things.
  • You have solid proficiency in Unix-like command line operations, including the creation and execution of both quick one-liners and complex bash scripts.
  • You put emphasis on documenting your work in a clear and concise manner.
  • You can work effectively in a fast-paced, agile environment.
  • And finally, you have excellent verbal and written communication skills and you are passionate about what you do!

Nice to have…

  • Transformers, Hugging Face, Whisper ASR.
  • Multi-threaded Python.
  • AWS framework.

The good stuff...

💸 You will be compensated well (salary + stock options + bonus)

📍 You will work in a hybrid setting with an office in Amsterdam

🏝 You get 25 days of annual leave + public holidays

🥳 You will join an established company culture with regular socials and company retreats

🤩 You get a 4-week paid sabbatical after 4 years at the company + $10,000!

👉 You can participate in a generous referral scheme

🚀 You will have huge opportunities for your career growth

You can see more about Who we are and How we work here: