About this blog

My blog covers two main areas:

- MLOps, which focuses on the challenges of deploying data science projects in industry, and the best practices for successful productionization.
- My insights and thoughts on deep learning and its application to IoT data, including how it can be used to model complex real-world networks.

When I first set foot in the industry as a data scientist, I was struck by the abundance of resources available for model development. Yet, as I delved further into the intricacies of the field, I realized that model development was only the tip of the iceberg. What university and boot camp programs alike fail to address is that the most significant challenges arise when deploying these projects at scale in a production environment. I soon learned that many companies still struggle with this. I hope that by sharing my expertise in this area, I can offer valuable insights to those who are just starting out in the field.

So whether you're interested in the latest MLOps trends or the cutting-edge developments in deep learning, you'll find something to engage with on my blog.

About the author

As a data scientist working for the UK subsidiary of a global waste energy management company, I quickly developed a specialization in MLOps after recognizing the team's critical issues with the scalability, visibility, deployment efficiency, and portability of data science production projects. To address these issues, I identified the value of transitioning from DevOps to MLOps, as well as the need for industry-standard open-source tools that could integrate with our data platform. After conducting extensive research in my own time, I proposed a comprehensive data science platform and architecture plan to the head of data analytics. The solution combined industry-standard open-source tools with proprietary in-house tools, and it was implemented to great success, revolutionizing the team's approach to production projects. In addition, I designed several standards that were later adopted throughout the team.

The new data science platform boasted several exciting capabilities, including:

- Dual architectures for both batch and online inference, a major upgrade from our previous batch-only capabilities.
- The ability to rapidly and easily deploy projects as RESTful APIs using FastAPI, Docker, Kubernetes, and Azure API Management Services, improving portability and scalability (a minimal sketch of this serving pattern appears after this list).
- A shift towards end-to-end pipelines in Azure Data Factory, streamlining our workflow and simplifying project management.
- Comprehensive telemetry logging via the Python logging library and OpenCensus, surfaced through Azure Application Insights and Power BI (see the logging sketch after this list).
  - This enables a consolidated report that encompasses all pipeline run information, with the ability to drill through by run_id or filter by project name to examine telemetry dashboards.
  - With this, we can gain deeper insights into production projects and answer important questions such as: which pipelines failed today? Are run times increasing over time? How long does each feature engineering step take to run? Is there a particular function slowing down the overall program, and could we improve its performance?
- Robust model performance metric monitoring and performance-drift-based alerting, ensuring that our models continue to perform optimally over time. This was especially critical given that we had over 500 models running in production.
- A streamlined notification system with Slack channel alerts, replacing our outdated mailing lists and allowing faster, more efficient communication.
- MLflow integration to log project metadata in experiments and version models in the Model Registry, ensuring complete reproducibility of results and easier collaboration among team members (see the MLflow sketch after this list).
- Data quality checks to ensure the accuracy and completeness of the data used in models, and model quality checks to ensure that models meet desired performance thresholds.
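
To give a flavour of the FastAPI deployment pattern above, here is a minimal sketch of a scoring service. The input schema, feature names, and model file are illustrative placeholders rather than our actual project code:

```python
# Minimal FastAPI scoring service (sketch only: the schema, feature
# names, and "model.pkl" below are illustrative placeholders).
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="example-scoring-api")


class Features(BaseModel):
    # Hypothetical input schema for a single prediction request.
    temperature: float
    pressure: float
    flow_rate: float


# In production the model might come from the MLflow Model Registry or be
# baked into the Docker image; here we simply unpickle a local file.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)


@app.post("/predict")
def predict(features: Features) -> dict:
    # Order the features exactly as the model was trained on them.
    row = [[features.temperature, features.pressure, features.flow_rate]]
    return {"prediction": float(model.predict(row)[0])}
```

Packaged in a Docker image and run on Kubernetes behind Azure API Management, a service like this gets a stable, governed endpoint while the pods behind it scale independently.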
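
The telemetry logging is easier to picture in code. Below is a hedged sketch of the pattern: the standard logging library plus the OpenCensus Azure exporter, with a run_id attached to every record through custom_dimensions so that dashboards can filter and drill through on it. The connection string, step names, and dimension names are placeholders:

```python
# Sketch of telemetry logging to Application Insights via OpenCensus.
# The connection string and dimension names below are placeholders.
import logging
import time
import uuid

from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger("pipeline")
logger.setLevel(logging.INFO)
logger.addHandler(
    AzureLogHandler(connection_string="InstrumentationKey=<your-key>")
)

run_id = str(uuid.uuid4())  # one id per pipeline run, attached to every record


def log_step(step_name: str, seconds: float) -> None:
    # custom_dimensions is how the Azure exporter passes queryable fields.
    logger.info(
        "step finished",
        extra={
            "custom_dimensions": {
                "run_id": run_id,
                "step": step_name,
                "duration_s": round(seconds, 3),
            }
        },
    )


start = time.perf_counter()
# ... a feature engineering step would run here ...
log_step("feature_engineering", time.perf_counter() - start)
```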
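
And for the MLflow integration, a minimal sketch of logging a run and registering a model version; the experiment and model names are invented for the example, and registration assumes a tracking server with the Model Registry enabled:

```python
# Sketch of MLflow experiment tracking plus Model Registry versioning.
# Experiment and model names are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

mlflow.set_experiment("customer-satisfaction")

with mlflow.start_run():
    model = LogisticRegression(max_iter=1000).fit(X, y)
    mlflow.log_param("max_iter", 1000)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # registered_model_name creates (or increments) a registry version,
    # assuming the tracking server has the Model Registry enabled.
    mlflow.sklearn.log_model(
        model, "model", registered_model_name="customer-satisfaction-clf"
    )
```

Registering under a fixed name means every redeployment simply adds a new registry version, which is what makes results reproducible and rollbacks cheap.
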
In addition to my extensive work in MLOps, I have spearheaded a variety of end-to-end projects spanning multiple business domains. These have ranged from predicting customer satisfaction and forecasting energy usage to developing predictive maintenance and anomaly detection solutions for large industrial facilities, among others.

Beyond my professional pursuits, I am deeply invested in academic research, particularly graph neural networks and the application of deep learning to modelling IoT networks. The cutting-edge work in this field never fails to inspire me, and I am constantly looking for ways to translate new findings into practical, real-world solutions.

Education:
University of Bath MSc Computer Science: Distinction (Degree Average: 79%)
University of Bath BSc Natural Sciences (Physics/Maths/Chemistry): 2:1

Skill Highlights:

- Programming languages: Python, SQL, C, Java, Haskell, R, Scala, MATLAB.
- Experience with the standard data science libraries/frameworks, such as scikit-learn, pandas, matplotlib, statsmodels, TensorFlow, Keras, NumPy, MLflow, requests, Apache Spark, FastAPI, unittest, pytest, and OpenCensus.
- Experience with the following algorithms: linear and logistic regression, deep neural networks, convolutional neural networks, LSTMs, various clustering algorithms, tree- and forest-based models, Naive Bayes, and various optimization algorithms.
- Experience with the following tools:
  - Azure services: Databricks, Data Factory, SSMS, Power BI, Application Insights, Monitor, Kubernetes, Container Registry, API Management Service, Application Gateway, DevOps.
  - Open source: FastAPI, Docker.
  - Project management: Jira.
- Expertise in data warehousing best practices with a focus on Azure Synapse Analytics, including designing optimized database schemas and executing efficient ETL processes using custom stored procedures.
- Extensive experience parallelising workloads using pandas UDFs in PySpark (a minimal sketch appears after this list).
- Experienced in working with both operational engineers and non-technical business members, from project conception through to business testing.
- Committed to producing high-quality code with a focus on testing and maintainability: high-performance, self-documenting, modular code that meets rigorous testing standards, including both unit and integration tests, written with a software engineering mindset to deliver robust, scalable, performant solutions that maximize value for end users.
- Proficient in designing, managing, and monitoring numerous project release pipelines and end-to-end production pipelines.
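
To make the pandas UDF point concrete, here is a minimal sketch of a scalar pandas UDF; the column names and the transform are invented for the example:

```python
# Sketch of a scalar pandas UDF in PySpark: Spark ships column batches to
# the workers as pandas Series, so vectorised pandas code runs in parallel
# across the cluster. Column names and the transform are illustrative.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("site_a", 21.5), ("site_b", 19.0)], ["site", "temp_c"]
)


@pandas_udf("double")
def celsius_to_kelvin(temp_c: pd.Series) -> pd.Series:
    # Plain vectorised pandas; Spark applies it batch by batch in parallel.
    return temp_c + 273.15


df.withColumn("temp_k", celsius_to_kelvin("temp_c")).show()
```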