Bioprocess Simulation Dataset Generator

Company Project, Artificial Intelligence

An intelligent synthetic dataset generator that produces realistic bioprocess testing data for developing calculation scripts within bioprocess workflows. Built for scientific simulation, regulatory testing, and model robustness under edge-case scenarios.

Technologies

Python, JavaScript, API Development, Prompt Engineering, Bioprocess Simulation, Synthetic Data Generation, CSV Formatting, Sensor Logic Modeling, Statistical Simulation

Overview

Designed for Qubicon’s simulation engine, the AI Dataset Generator produces high-fidelity datasets based on real-world bioprocess parameters, setpoints, and expected sensor fluctuations. The tool uses Meta's Llama 3 to generate domain-specific data with embedded logic, enabling simulation of rare events, dynamic shifts, and environmental variations at scale within a bioprocess environment.
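As a rough illustration of how process parameters and setpoints could be turned into an LLM request, the sketch below assembles a plain-text prompt. It is a minimal example under stated assumptions, not the project's actual prompt: `ParameterSpec`, `build_prompt`, the column names, and the wording of the instructions are all hypothetical, and the code only builds the text without calling any model.

```python
from dataclasses import dataclass


@dataclass
class ParameterSpec:
    """Hypothetical description of one simulated sensor channel."""
    name: str         # column name in the generated CSV
    unit: str         # engineering unit shown to the model
    setpoint: float   # target value the signal fluctuates around
    noise_std: float  # expected standard deviation of the fluctuation


def build_prompt(params, duration_h, interval_min, events):
    """Assemble a text prompt describing the dataset we want generated."""
    header = ",".join(["time_h"] + [p.name for p in params])
    lines = [
        "Generate a bioprocess dataset as CSV.",
        f"Header row: {header}",
        f"Duration: {duration_h} h, one row every {interval_min} min.",
        "Parameters:",
    ]
    for p in params:
        lines.append(
            f"- {p.name}: setpoint {p.setpoint} {p.unit}, "
            f"noise std {p.noise_std} {p.unit}"
        )
    if events:
        # Rare events or faults the test data should contain.
        lines.append("Events to reflect in the data:")
        lines.extend(f"- {e}" for e in events)
    lines.append("Return only CSV rows, no commentary.")
    return "\n".join(lines)


if __name__ == "__main__":
    specs = [
        ParameterSpec("pH", "pH", 7.0, 0.02),
        ParameterSpec("DO", "%", 40.0, 1.5),
    ]
    print(build_prompt(specs, 24, 5, ["pH probe dropout from hour 16 to hour 18"]))
```

Keeping the parameter definitions in a small data structure and rendering the prompt from it means the same specification could also drive validation of whatever the model returns.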

Challenge

Creating bioprocess data for a specific scenario is time-consuming and often infeasible because of the cost and time involved. Teams at Qubicon used to build datasets manually for their simulation engine in order to test the scripts they write for bioprocess workflows. Those scripts often contain complex logic, edge cases, and dynamic behaviors that must be tested against bioprocess simulation data. Building the datasets by hand was tedious and costly for the company.

Solution

I conceived and engineered an AI-driven generation pipeline that combines prompt-engineered LLMs with custom simulation logic to create structured, realistic simulation datasets reflecting the specific edge cases needed for testing. It integrates with CSV templates compatible with Qubicon’s platform and models dynamic behaviors, including phase transitions, setpoint adjustments, and failure simulations.
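The dynamic behaviors named above can be sketched with a small rule-based simulator. This is an illustrative example only, assuming a single run with one phase transition (batch to fed-batch), one temperature setpoint change, and one pH sensor dropout. The function names, column names, and numeric values are hypothetical and do not represent Qubicon's CSV template or the project's actual simulation logic.

```python
import csv
import io
import random


def simulate_run(duration_h=24.0, step_h=0.5, feed_start_h=8.0,
                 temp_shift_h=12.0, fault_window=(16.0, 18.0), seed=0):
    """Return a list of row dicts for one simulated run."""
    rng = random.Random(seed)  # seeded so test datasets are reproducible
    rows = []
    n_steps = int(duration_h / step_h) + 1
    for i in range(n_steps):
        t = i * step_h
        # Phase transition: batch until feeding starts, then fed-batch.
        phase = "batch" if t < feed_start_h else "fed_batch"
        # Setpoint adjustment: temperature target drops at temp_shift_h.
        temp_setpoint = 37.0 if t < temp_shift_h else 30.0
        temp = temp_setpoint + rng.gauss(0.0, 0.1)
        ph = 7.0 + rng.gauss(0.0, 0.02)
        # Failure simulation: the pH probe reports nothing inside the window.
        in_fault = fault_window[0] <= t < fault_window[1]
        rows.append({
            "time_h": f"{t:.1f}",
            "phase": phase,
            "temp_setpoint_c": f"{temp_setpoint:.1f}",
            "temp_c": f"{temp:.2f}",
            "ph": "" if in_fault else f"{ph:.3f}",
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(simulate_run())[:300])
```

A script under test could then be run against the resulting CSV to check how it handles the phase change, the new setpoint, and the missing pH readings.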

Results

Reduced dataset development time by over 90%, which improved the company's efficiency and lowered costs. Enabled teams to test feeding algorithms and control models against highly realistic edge-case scenarios. Supported rapid prototyping and model validation across multiple process types.

Project Gallery

Generator Interface

The generator interface showing parameter controls and output visualization

Generator Variable Example

Example of inputting a variable (a script we want to test with the simulation and therefore need data for) in the generator

Generator Error Example

Example of inputting an error in the generator that we want reflected in the dataset for testing
