Tsz Yu Timothy Tang

Machine Learning Engineer/Data Scientist

Machine Learning Engineer and Data Scientist with a focus on the scientific computing Python stack and associated low-level language access (C via Cython, CUDA via PyTorch).

Also experienced in distributed computing for data science processes (PySpark, Dask).

Used cloud tools for machine learning (Azure ML, Databricks, Palantir Foundry).

Experienced in web frameworks (Flask, FastAPI, TorchServe) for model and algorithm deployment and serving.

Experienced in Spark via PySpark, and various data engineering tools (Kafka, Airflow, DuckDB) for batch processes and distributed training. AWS and Azure experience (S3, Redshift, Glue, Azure Actions, Blob Storage).

Experienced in managing tooling for the Python Data Science Stack (uv, Poetry, Conda)

Well versed in up-to-date Python modeling packages (XGBoost, LightGBM, statsmodels) and deep learning (PyTorch). Experienced in open-source LLM deployment (Ollama) and LLM evaluations (DeepEval, Inspect AI).

Work Experience

Data Scientist (Contractor) with Machine Learning Engineering Focus

BP | March 2022 - December 2024

As a data science contractor, provided Data Science product expertise and analysis to various functions across BP. This included Production, Refining, Renewables, Strategy and Safety.

Deeply involved as a product data scientist in product development for a renewables modelling platform. Core work involved researching and designing countermeasures against risks to model IP in the context of an API marketplace.

Investigated the effects and risks of model theft and replication if a model is made available in a models marketplace. With a family of internal test models, tested the limits of reverse engineering under a couple of scenarios defined through customer research.
Model IP protection: With the above results, designed a framework to measure the relative risk of theft and replication for a given model. Using the resulting metrics, developed a further system to capture the ongoing risks of model calls on the API marketplace. This was turned into a monitoring and alerting service in production with the architecture and software engineering teams.
Investigated various GenAI applications in the context of a models marketplace. Generally focused on creating POCs and examples for RAG and GenAI based retrieval for various model documentation.
For the above designs, worked with internal counsel to submit for patent application.
Model marketplace development: As part of business development of the platform, aided in consulting for internal teams and startups on the strategy of productionizing and monetizing their internal models.

Working in conjunction with compressor engineers, introduced and applied explainable machine learning to compressor fouling analysis:

Worked with compressor engineers to identify factors for analysis, then carried out further feature engineering to analyze potential fouling issues in the compressors.
Created a model with explainable machine learning to explore counterfactuals. Designed a dashboard based on the model to allow compressor engineers to explore a range of scenarios and options.

Analysis and modeling of equipment failures. Produced a survival analysis model to provide a clearer view of the pattern of failures.

Introduced statistical analysis to understand the patterns revealed by the survival analysis, and aided in setting better health and safety targets.

Consulting work with strategy, aiding the team in designing the roadmap for modernizing data stacks and tooling.

Investigated Microsoft and OpenAI machine learning API offerings as part of a team-wide AI drive. Focused especially on producing internal benchmarks using various evaluation packages.

As a member of technical staff, participated in facilitating knowledge sharing and mentorship. Organized and led a team data science reading group.

Machine Learning Engineer

Aioi Nissay Dowa | August 2018 - March 2022

With a focus on vehicle telematics data and insurance customers, worked on providing classification, categorisation and scoring models to enable paradigm changes and automation in insurance operations. Was involved in the full machine learning model development and deployment cycle, which includes exploratory data analysis, data collection, feature engineering, model building and optimisation, acceptance testing, API building, deployment/serving and continuous monitoring/training.

Project lead for driving behaviour algorithms. Led a team of 2-3 data scientists and engineers to develop the product.

Involved in creating new generations of driving behaviour predictive models in association with team data scientists and insurance analysts.
Using mainly telematics data and incorporating other data sources, such as mapping and weather information, the team identified 70% more customers at high risk of causing an insurance event or impact incident compared to the previous telematics method. This translates to an estimated gain in loss prevention ratio of 30%.
The team was able to create a machine learning scoring pipeline in production to handle more than 100k drivers, which translates to over 200k trips a day.
Supported the transition to a next-generation platform in conjunction with the engineering teams, with a focus on a Data-as-a-Service model.

Contributed to crash identification algorithms via telematics dynamics.

Created a filtering algorithm to identify noisy crashes, to enable either cleaning or removal from the sample. Achieved a capture rate of 95% of all real crashes, reducing the number of crashes that require customer support intervention by half.
Involved in designing further downstream categorisation and automation systems to enable automatic customer services and incident triage via a customer portal.

Delivered MLOps practices in conjunction with Operations teams

Created the CI/CD workflow for Python projects, and specifically machine learning projects. Created a bespoke deployment framework in conjunction with operations to address team needs. Furthermore, defined and built the required data pipelines to deliver continuous monitoring and analysis of the predictive and classification models.

Delivered an end-to-end framework for model explainability, in order to enable insurance product enhancements and better customer service on the platform.

Involved in generating the visualisations and dashboards required for management to understand the data. Also supported ad-hoc analysis and reporting for clients.

Involved in product, architecture and resource planning for projects. Managed a team of 2 junior machine learning engineers to deliver projects, providing coaching and guidance.

Operational Data Analyst

Amazon | October 2016 - September 2017

Maintained and managed performance metrics and communications for logistics operations using a data-driven approach. Drove innovation in reporting and metrics management processes.

Liaised with key stakeholders to ensure compliance and improve performance of linehaul and warehouse operations.
Automated reporting and communication processes with VBA, SQL and the Microsoft stack of business analytics tools.
Further developed warehouse operational metrics in tandem with operations management to identify areas of improvement.

Analyst

KPMG | October 2014 - March 2016

Technology risk consulting
Projects included: IT audit, IT process review, due diligence reporting, ISAE 3402 reporting, data quality assurance, information governance review.
Participated in designing audit programmes for bespoke projects.
Participated in dashboard building and analysis for clients.

Education

MSc in Data Science

King's College London

2017 - 2018

MSci in Maths and Physics

University College London

2010 - 2014

Skills

Technical

Languages: Python/SQL/MATLAB
Lower Level: C via Cython/JAX
MLOps: Flask/FastAPI/TorchServe/MLflow/Dask
Data Engineering: Kafka/Airflow/DuckDB/Spark/PostgreSQL
Modelling: XGBoost/LightGBM/statsmodels/PyTorch
LLMOps: Ollama/llama.cpp/ComfyUI/evals
AWS (S3, Redshift, Glue, EC2)
Azure (Azure Actions, Blob Storage, AzureML)
Databricks

Industries

Logistics
Renewables
Insurance
Oil and Gas
Consulting

Other

Interested in using: Rust/Julia/Dagster/CUDA
Topics of Interest: Knowledge Graphs in LLMs/Event-based Time Series Modelling/Alternative Market Making

Languages

Interests

Sports Analytics
Scuba
Photography
Tennis
TCG
