The conventional approach to concrete mix design for dams is methodical, proven, and slow. A senior engineer specifies target properties (strength, workability, heat of hydration, durability), selects a starting formulation based on experience and ACI 211 or IS 10262 guidelines, and then iterates through 15 to 25 physical trial mixes. Each iteration requires batching, casting, curing, and testing at multiple ages. For mass concrete with 90-day or 365-day strength requirements, the trial mix programme can span months before a qualified design is confirmed.
Machine learning offers a fundamentally different workflow. Instead of sequential physical iterations, ML models screen hundreds or thousands of candidate formulations computationally, predicting performance across multiple objectives simultaneously. The engineer then selects the 5 to 10 most promising candidates for physical verification. The trial mix programme still happens, but it starts from a much stronger position.
The question for dam engineers is not whether ML can optimize a concrete mix. For conventional mixes, the published research shows that it can. The question is whether the available models work for the specific formulations, aggregate types, and performance requirements of mass concrete and RCC for dams.
The Current ML Landscape for Concrete Mix Design
BOxCrete: Open-Source Bayesian Optimization
Meta’s BOxCrete, released in March 2026, represents the most significant recent development. Built on Meta’s Adaptive Experimentation (Ax) platform, BOxCrete uses Gaussian Process regression to predict strength development and enable multi-objective optimization balancing mechanical performance against embodied carbon.
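To make the Bayesian optimization idea concrete, here is a minimal sketch of how a Gaussian Process prediction (a mean and an uncertainty) can be turned into a "which mix should we test next" score. It uses expected improvement, a textbook single-objective acquisition function. This is an assumption for illustration: the acquisition function BOxCrete actually uses is not described here, and the candidate names and numbers are made up.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best_so_far: float) -> float:
    """Expected improvement over the best measured value (maximization)."""
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical GP posterior (mean, std) of 28-day strength in MPa
# for three mixes that have not yet been tested.
candidates = {
    "mix_A": (41.0, 0.5),  # confident, slightly above the best so far
    "mix_B": (39.0, 4.0),  # lower mean, but highly uncertain
    "mix_C": (36.0, 0.5),  # confidently worse
}
best_measured = 40.0  # best strength measured so far (hypothetical)

scores = {
    name: expected_improvement(mean, std, best_measured)
    for name, (mean, std) in candidates.items()
}
next_mix = max(scores, key=scores.get)
print(next_mix, round(scores[next_mix], 3))
```

In this toy case the uncertain mix_B scores highest even though its predicted mean is below the current best, which is the exploration behaviour that lets Bayesian optimization learn from few physical trials.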
The model was trained on over 500 strength measurements from 123 mixtures (69 mortar, 54 concrete) tested at five curing ages (1, 3, 5, 14, and 28 days), achieving an average R-squared of 0.94 and RMSE of 0.69 ksi (about 4.8 MPa). At Meta’s Rosemount, Minnesota data center, the BOxCrete-optimized mix reached full structural strength 43% faster than the original formula while reducing cracking risk by nearly 10%.
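For readers who want to check such figures against their own trial data, the two metrics are straightforward to compute. The sketch below uses made-up measured and predicted strengths (not BOxCrete data) and also converts the reported 0.69 ksi RMSE to MPa so it can be compared with the MPa figures quoted elsewhere in this article.

```python
import math

KSI_TO_MPA = 6.894757  # 1 ksi = 1000 psi = 6.894757 MPa


def rmse(measured, predicted):
    """Root-mean-square error, in the same units as the inputs."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Illustrative strengths in ksi (hypothetical values).
measured = [2.0, 3.5, 4.8, 6.1, 7.0]
predicted = [2.4, 3.2, 5.0, 5.8, 7.3]

fit_rmse_ksi = rmse(measured, predicted)
fit_r2 = r_squared(measured, predicted)
reported_rmse_mpa = 0.69 * KSI_TO_MPA  # the figure quoted above, in MPa

print(f"RMSE = {fit_rmse_ksi:.3f} ksi, R-squared = {fit_r2:.3f}")
print(f"0.69 ksi = {reported_rmse_mpa:.2f} MPa")
```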
The research team behind BOxCrete is led by Nishant Garg, associate professor at the University of Illinois Urbana-Champaign, in collaboration with cement producer Amrize. Their broader research programme has demonstrated that AI-generated concrete formulations can reduce carbon footprint by up to 40% while maintaining strength, with some formulations replacing upwards of 70% of cement with fly ash and slag combinations.
The critical distinction: BOxCrete is released under the MIT license on GitHub, meaning any organization can download, use, and modify it without licensing fees. This open-source approach is deliberate. As Garg’s team has stated, the goal is broad market penetration, particularly helping concrete producers who lack the budget for proprietary mix design software.
Supervised Learning Models: Strength Prediction
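Beyond Bayesian optimization, a broad ecosystem of supervised ML models targets compressive strength prediction. The performance benchmarks from recent research (2024 to 2026) are consistently strong, although each figure comes from a different study and dataset, so the rows are not directly comparable, and the XGBoost figure is reported for workability rather than strength: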
| Model | Application | R-squared | RMSE |
|---|---|---|---|
| CatBoost | General concrete | 0.94 | 2.7 MPa |
| XGBoost | General concrete | 0.98 (workability) | Varies |
| XGBoost-WOA | Fly ash + silica fume mixes | 0.995 | Not reported |
| CatBoost | RCC compressive strength | 0.983 | Not reported |
| Deep Neural Network + MOPSO | Blended concrete | >0.93 | Varies |
| Gradient Boosting-DE | SCM concrete | 0.995 | Not reported |
SHAP (SHapley Additive exPlanations) analysis across multiple studies confirms that cement content, water content, and water reducer dosage are the dominant predictive features for compressive strength. For mixes with SCMs, the type and dosage of fly ash, slag, and silica fume become significant variables, but their effects are highly non-linear, which is precisely why ML outperforms empirical formulas for these formulations.
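The idea behind SHAP can be shown on a toy example. The sketch below computes exact Shapley values by averaging each feature's marginal contribution over every ordering, relative to a single reference mix. It does not use the SHAP library (which approximates this for real models), and the strength formula, its coefficients, and both mixes are hypothetical.

```python
from itertools import permutations


def toy_strength(mix):
    """Hypothetical strength model in MPa; coefficients are illustrative only."""
    return (
        0.10 * mix["cement"]
        - 0.12 * mix["water"]
        + 3.0 * mix["water_reducer"]
        + 0.004 * mix["cement"] * mix["water_reducer"]  # interaction term
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`."""
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # switch this feature on
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Contents in kg/m3; both mixes are made up.
baseline = {"cement": 300.0, "water": 170.0, "water_reducer": 1.0}
instance = {"cement": 350.0, "water": 160.0, "water_reducer": 2.0}

phi = shapley_values(toy_strength, instance, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f} MPa")
```

The contributions always sum to the difference between the two predictions, and the cement and water reducer values each absorb half of the interaction effect. That sharing of non-linear effects is what makes the method useful for SCM blends.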
Multi-Objective Optimization: Beyond Strength
The most impactful development for dam concrete is multi-objective optimization (MOO), which simultaneously balances competing design goals. For building construction, the typical objectives are strength, cost, and embodied carbon. For dam concrete, the objective space is broader and more consequential:
- Compressive strength at specified ages (28, 90, or 365 days)
- Heat of hydration (directly determines thermal cracking risk)
- Durability (AAR resistance, sulphate resistance, freeze-thaw)
- Workability (must suit the placement method: pumped, conveyed, or RCC roller-placed)
- Embodied carbon (increasingly a requirement for multilateral-funded projects)
- Cost (cement, SCMs, admixtures, transport)
Research using Non-dominated Sorting Genetic Algorithms (NSGA-II) combined with ML surrogate models has achieved approximately 30% reduction in CO2 emissions and 10 to 15% cost savings while maintaining target strength. Deep learning combined with Multi-Objective Particle Swarm Optimization (MOPSO) has produced optimized designs exceeding 50 MPa with cement reductions of up to 25%.
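The core operation in these algorithms is non-dominated filtering: discarding any candidate that another candidate beats or matches on every objective. A minimal sketch follows, with hypothetical mixes and predicted values. NSGA-II itself goes further, ranking successive fronts and using crowding distance to keep the population diverse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    strength_mpa: float    # maximize
    heat_kj_per_kg: float  # minimize
    cost_per_m3: float     # minimize


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    no_worse = (
        a.strength_mpa >= b.strength_mpa
        and a.heat_kj_per_kg <= b.heat_kj_per_kg
        and a.cost_per_m3 <= b.cost_per_m3
    )
    strictly_better = (
        a.strength_mpa > b.strength_mpa
        or a.heat_kj_per_kg < b.heat_kj_per_kg
        or a.cost_per_m3 < b.cost_per_m3
    )
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical surrogate-model predictions for five mixes.
mixes = [
    Candidate("A", 30.0, 250.0, 80.0),
    Candidate("B", 28.0, 220.0, 75.0),
    Candidate("C", 27.0, 240.0, 82.0),  # worse than B on all three
    Candidate("D", 32.0, 280.0, 90.0),
    Candidate("E", 30.0, 260.0, 85.0),  # matched or beaten by A
]

front_names = [c.name for c in pareto_front(mixes)]
print(front_names)
```

The surviving mixes each represent a different trade-off. Choosing among them remains an engineering decision.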
Why multi-objective optimization matters for dams
On a mass concrete dam, reducing cement by 20 kg per cubic metre has cascading effects: lower heat generation means reduced cooling requirements, which means fewer embedded cooling pipes, shorter post-cooling durations, and faster lift placement cycles. ML-driven MOO can quantify these trade-offs computationally, helping engineers make informed decisions before a single trial batch is mixed.
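The first link in that chain can be estimated with a simple heat balance: the change in adiabatic temperature rise equals the change in heat released per cubic metre divided by the volumetric heat capacity of the concrete. The heat of hydration, density, and specific heat below are assumed round numbers for illustration, not values from any project. Real figures depend on the cement, SCM blend, and aggregate.

```python
def adiabatic_rise_change(
    delta_cement_kg_m3: float,
    heat_kj_per_kg: float,
    density_kg_m3: float,
    specific_heat_kj_per_kg_k: float,
) -> float:
    """Change in adiabatic temperature rise (K) for a change in cement content."""
    heat_per_m3 = delta_cement_kg_m3 * heat_kj_per_kg  # kJ per cubic metre
    heat_capacity = density_kg_m3 * specific_heat_kj_per_kg_k  # kJ/(m3.K)
    return heat_per_m3 / heat_capacity


# Assumed inputs (illustrative only).
delta_t = adiabatic_rise_change(
    delta_cement_kg_m3=20.0,        # the reduction discussed above
    heat_kj_per_kg=300.0,           # assumed heat released per kg of cement
    density_kg_m3=2400.0,           # assumed concrete density
    specific_heat_kj_per_kg_k=1.0,  # assumed specific heat
)
print(f"Adiabatic temperature rise drops by about {delta_t:.1f} K")
```

Under these assumptions the reduction is about 2.5 K, which an ML-driven MOO can weigh against the strength and cost consequences of the same change.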
The Dam-Specific Gap
Despite the impressive performance metrics, a fundamental gap exists between the current ML model ecosystem and the requirements of mass concrete for dams. This gap must be understood honestly before adopting any ML tool on a hydroelectric project.
Training Data Mismatch
The vast majority of ML models for concrete strength prediction are trained on datasets dominated by:
- Ordinary Portland cement at 300 to 450 kg/m3
- Maximum aggregate size of 20 to 25 mm
- Evaluation at 7 and 28 days
- Standard curing conditions
Dam concrete operates in a different parameter space:
- Total cementitious content of 150 to 250 kg/m3 with 30 to 60% SCM replacement
- Maximum aggregate size of 75 to 150 mm
- Evaluation at 90 or 365 days for design acceptance
- Thermal control requirements that influence curing conditions
A model trained on the first dataset will extrapolate (unreliably) rather than interpolate (reliably) when asked to predict the performance of the second. BOxCrete’s training set of 123 mixtures does not include mass concrete formulations. Until dam-specific training data is incorporated, predictions must be treated as preliminary.
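A simple guard against silent extrapolation is to compare each candidate mix with the range of the training data before trusting a prediction. The sketch below uses the typical ranges listed above as the training envelope. The feature names and the two example mixes are hypothetical. A per-feature range check is necessary but not sufficient, because a mix can sit inside every individual range and still be an unseen combination.

```python
# Training envelope taken from the typical dataset ranges listed above.
TRAINING_RANGES = {
    "cementitious_kg_m3": (300.0, 450.0),
    "max_aggregate_mm": (20.0, 25.0),
    "test_age_days": (7.0, 28.0),
}


def out_of_range_features(mix, ranges):
    """Return the features for which the mix lies outside the training range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= mix[name] <= high
    ]


# Hypothetical example mixes.
dam_mix = {
    "cementitious_kg_m3": 200.0,
    "max_aggregate_mm": 150.0,
    "test_age_days": 90.0,
}
building_mix = {
    "cementitious_kg_m3": 380.0,
    "max_aggregate_mm": 20.0,
    "test_age_days": 28.0,
}

dam_flags = out_of_range_features(dam_mix, TRAINING_RANGES)
building_flags = out_of_range_features(building_mix, TRAINING_RANGES)
print("Dam mix extrapolates on:", dam_flags)
print("Building mix extrapolates on:", building_flags)
```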
Aggregate Variability
Aggregate properties, specifically mineralogy, shape, surface texture, gradation, and alkali-silica reactivity, critically influence both fresh and hardened concrete performance. Himalayan river-run aggregates differ substantially from the crushed limestone or granite in most training datasets. A model that does not account for aggregate-specific effects will underperform on dam projects where aggregate selection drives key design decisions.
Long-Term Property Prediction
Dam concrete is designed for 100-year service lives. Predicting 365-day strength, long-term creep, drying shrinkage, and durability performance over decades requires either long-term test data or physics-informed models that ML alone cannot provide. Hybrid approaches that combine ML with established cement hydration models (such as those in the NIST VCCTL framework) are a promising research direction but are not yet production-ready.
A Practical Framework for Dam Engineers
Given both the capabilities and limitations, here is how ML tools can be productively integrated into the mix design process for hydroelectric projects today.
Phase 1: Computational Screening (ML-Assisted)
Before any physical testing, use ML models to explore the design space. Define the target properties (strength class, maximum heat of hydration, workability range, SCM constraints based on local availability) and let the optimization algorithm generate candidate formulations.
For this phase:
- BOxCrete or similar Bayesian optimization tools can screen hundreds of SCM blends computationally
- XGBoost/CatBoost models can rank candidates by predicted strength at multiple ages
- MOO algorithms (NSGA-II, MOPSO) can identify Pareto-optimal solutions balancing strength, heat, and cost
The output is a shortlist of 5 to 10 candidate formulations for physical testing, rather than a single “optimal” design. The model narrows the search; it does not replace engineering judgment about which candidates make physical sense given the project’s aggregate sources, SCM supply, and placement conditions.
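The first step of the screening phase, generating the candidate pool, can be sketched in a few lines. The grid below spans the dam concrete ranges given earlier (150 to 250 kg/m3 total cementitious, 30 to 60% SCM replacement). The step sizes, the fly ash to slag split, and the slag supply cap are hypothetical stand-ins for a project's real constraints. A surrogate model and a multi-objective ranking would then be applied to the feasible pool.

```python
from itertools import product

TOTAL_CEMENTITIOUS = range(150, 251, 25)  # kg/m3
SCM_REPLACEMENT_PCT = range(30, 61, 10)   # percent of total cementitious
FLY_ASH_SHARE_PCT = (0, 50, 100)          # percent of the SCM that is fly ash
MAX_SLAG_KG_M3 = 65.0                     # hypothetical local supply cap


def build_candidate(total, scm_pct, fly_ash_pct):
    """Split total cementitious content into cement, fly ash, and slag."""
    scm = total * scm_pct / 100
    fly_ash = scm * fly_ash_pct / 100
    return {
        "cement_kg_m3": total - scm,
        "fly_ash_kg_m3": fly_ash,
        "slag_kg_m3": scm - fly_ash,
    }


all_candidates = [
    build_candidate(total, scm_pct, fly_ash_pct)
    for total, scm_pct, fly_ash_pct in product(
        TOTAL_CEMENTITIOUS, SCM_REPLACEMENT_PCT, FLY_ASH_SHARE_PCT
    )
]

# Apply the (hypothetical) availability constraint before any prediction.
feasible = [c for c in all_candidates if c["slag_kg_m3"] <= MAX_SLAG_KG_M3]

print(len(all_candidates), "generated,", len(feasible), "feasible")
```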
Phase 2: Physical Validation (Standard Testing)
Every ML-recommended formulation must undergo full trial mixing and testing per IS 457, ACI 207, or project-specific specifications. This includes:
- Fresh property testing (slump/Vebe, air content, unit weight, initial set)
- Compressive strength at 7, 28, 90, and (where specified) 365 days
- Heat of hydration measurement (semi-adiabatic or isothermal calorimetry)
- Durability testing (ASTM C1567 for AAR, ASTM C666 for freeze-thaw, sulphate expansion)
The physical results then feed back into the ML model as project-specific training data, improving its predictions for subsequent mix iterations. This feedback loop is where the real value accumulates: each project builds a richer dataset that makes the model more reliable for the next project.
Phase 3: Production Monitoring (ML-Augmented)
Once a qualified mix is in production, ML tools shift from design to monitoring. Platforms like Giatec SmartMix track actual batch data against expected performance, flagging deviations in real time. If a cement shipment’s Blaine fineness changes, or if the fly ash loss-on-ignition varies between deliveries, the ML system can predict the effect on strength development before the concrete is placed.
This production monitoring capability is particularly valuable on large dam projects where concrete placement extends over years and material properties inevitably vary. The ML system provides early warning of mix performance shifts that might otherwise be detected only when 28-day cylinder results arrive weeks after placement.
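The early-warning logic can be as simple as a control chart applied to model predictions rather than to cylinder breaks. The sketch below flags any batch whose predicted 28-day strength falls outside three standard deviations of a reference set from the qualified mix. It is a generic illustration, not a description of how any commercial platform works, and the batch IDs and strengths are made up.

```python
from statistics import mean, stdev


def flag_deviations(reference, new_predictions, k=3.0):
    """Return (batch_id, value) pairs outside mean +/- k standard deviations."""
    center = mean(reference)
    spread = stdev(reference)  # sample standard deviation
    lower, upper = center - k * spread, center + k * spread
    return [
        (batch_id, value)
        for batch_id, value in new_predictions
        if not lower <= value <= upper
    ]


# Predicted 28-day strengths (MPa) for batches of the qualified mix (hypothetical).
reference_predictions = [30.0, 31.0, 29.0, 30.5, 29.5]

# Predictions for new batches, e.g. after a change in cement delivery.
new_batches = [("B101", 30.2), ("B102", 27.0), ("B103", 31.9)]

flagged = flag_deviations(reference_predictions, new_batches)
print(flagged)
```

Here only batch B102 is flagged, on the day of batching rather than weeks later when the cylinders are tested.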
What Dam Owners Should Do Now
The technology is not hypothetical. It is available, with open-source options requiring zero licensing cost. Three actions make sense immediately:
- Digitize historical mix data. If your organization has trial mix records from previous dam projects in paper format, convert them to structured databases. This data becomes training material for dam-specific ML models and grows in value over time.
- Run BOxCrete on your next trial mix programme. Download the open-source model, input your target properties and material constraints, and compare its recommended formulations against your engineer’s initial designs. Use it as a screening tool alongside conventional practice, not as a replacement.
- Specify digital data collection in new project QC plans. Ensure that all batch plant data, test results, and material certificates are captured in structured digital formats from day one. Every data point collected today strengthens the ML models available for tomorrow’s projects.
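What "structured" means in practice is a fixed schema with one row per test result. The sketch below shows one possible record layout and writes it to CSV using only the Python standard library. The field names, units, and example values are assumptions for illustration. A real schema would add aggregate gradation, admixture dosages, placement temperature, and the other properties discussed above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class TrialMixRecord:
    """One strength test result from a trial mix (hypothetical schema)."""
    project: str
    mix_id: str
    cement_kg_m3: float
    fly_ash_kg_m3: float
    slag_kg_m3: float
    water_kg_m3: float
    max_aggregate_mm: float
    aggregate_source: str
    test_age_days: int
    compressive_strength_mpa: float


def records_to_csv(records) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(TrialMixRecord)]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up example rows: the same mix tested at two ages.
records = [
    TrialMixRecord("Example Dam", "TM-01", 120.0, 60.0, 20.0, 110.0,
                   150.0, "river-run", 28, 14.5),
    TrialMixRecord("Example Dam", "TM-01", 120.0, 60.0, 20.0, 110.0,
                   150.0, "river-run", 90, 21.0),
]

csv_text = records_to_csv(records)
print(csv_text)
```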
PCCI’s mix design practice is built on leadership experience spanning 4,000+ MW of hydroelectric capacity. ML tools do not replace that experience. They amplify it by expanding the design space an engineer can explore and by making the optimization process faster and more rigorous. The firms that start building dam-specific ML datasets now will have a significant advantage when these tools mature, and they are maturing fast.
To discuss how ML-augmented mix design can be integrated into your next hydroelectric project, contact PCCI’s consulting team.