Technical Brief

Machine Learning for Concrete Mix Design: From BOxCrete to Dam-Specific Optimization

In March 2026, Meta released BOxCrete, an open-source Bayesian Optimization model for concrete mix design, under an MIT license. The model was developed with the University of Illinois and cement producer Amrize, whose broader research programme has produced formulations that cut the carbon footprint of concrete by up to 40% while maintaining strength, some replacing upwards of 70% of cement with fly ash and slag combinations. For dam engineers, this raises an immediate question. Mass concrete for hydroelectric projects already uses high SCM dosages, low cement contents, and extended curing ages that fall outside the training data of most ML models. Can these tools actually help with dam-specific mix design, or are they solving a different industry’s problem? This technical brief examines the current state of ML-driven mix design optimization, assesses its relevance to mass concrete and RCC for dams, and outlines a practical framework for integrating ML tools into the trial mix process without abandoning the engineering judgment that keeps dams standing.


Kushal Sthapak

Co-Founder, PCCI


The conventional approach to concrete mix design for dams is methodical, proven, and slow. A senior engineer specifies target properties (strength, workability, heat of hydration, durability), selects a starting formulation based on experience and ACI 211 or IS 10262 guidelines, and then iterates through 15 to 25 physical trial mixes. Each iteration requires batching, casting, curing, and testing at multiple ages. For mass concrete with 90-day or 365-day strength requirements, the trial mix programme can span months before a qualified design is confirmed.

Machine learning offers a fundamentally different workflow. Instead of sequential physical iterations, ML models screen hundreds or thousands of candidate formulations computationally, predicting performance across multiple objectives simultaneously. The engineer then selects the 5 to 10 most promising candidates for physical verification. The trial mix programme still happens, but it starts from a much stronger position.

The question for dam engineers is not whether ML can optimize a concrete mix; published research shows that it can, at least for conventional mixes. The question is whether the available models work for the specific formulations, aggregate types, and performance requirements of mass concrete and RCC for dams.

The Current ML Landscape for Concrete Mix Design

BOxCrete: Open-Source Bayesian Optimization

Meta’s BOxCrete, released in March 2026, represents the most significant recent development. Built on Meta’s Adaptive Experimentation (Ax) platform, BOxCrete uses Gaussian Process regression to predict strength development and enable multi-objective optimization balancing mechanical performance against embodied carbon.

The model was trained on over 500 strength measurements from 123 mixtures (69 mortar, 54 concrete) tested at five curing ages (1, 3, 5, 14, and 28 days), achieving an average R-squared of 0.94 and RMSE of 0.69 ksi (about 4.8 MPa). At Meta’s Rosemount, Minnesota data center, the BOxCrete-optimized mix reached full structural strength 43% faster than the original formula while reducing cracking risk by nearly 10%.

The research team behind BOxCrete is led by Nishant Garg, associate professor at the University of Illinois Urbana-Champaign, in collaboration with cement producer Amrize. Their broader research programme has demonstrated that AI-generated concrete formulations can reduce carbon footprint by up to 40% while maintaining strength, with some formulations replacing upwards of 70% of cement with fly ash and slag combinations.

The critical distinction: BOxCrete is released under the MIT license on GitHub, meaning any organization can download, use, and modify it without licensing fees. This open-source approach is deliberate. As Garg’s team has stated, the goal is broad market penetration, particularly helping concrete producers who lack the budget for proprietary mix design software.
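The Gaussian Process plus optimization loop described above can be sketched in a few dozen lines. The example below is not BOxCrete's code and does not use the Ax API; it is a minimal numpy illustration of the general Bayesian optimization idea, assuming a single input (SCM replacement fraction), fixed kernel hyperparameters (production tools fit these to the data), and a hypothetical normalised score standing in for a strength-versus-carbon trade-off. The GP gives a mean prediction and an uncertainty at every untested fraction, and the expected-improvement rule picks the next mix to batch, favouring candidates that are either predicted to be good or still poorly explored.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.15, variance=1.0):
    """Squared-exponential (RBF) kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """GP regression with fixed hyperparameters: posterior mean and std at x_query."""
    y_offset = y_train.mean()  # centre the targets so a zero-mean prior is reasonable
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_offset))
    mean = K_s.T @ alpha + y_offset
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_query, x_query)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

_erf = np.vectorize(math.erf)

def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best observed score (maximisation)."""
    gain = mean - best - xi
    z = gain / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return gain * cdf + std * pdf

# Hypothetical observations: SCM replacement fraction vs. a normalised score
# that trades strength against embodied carbon (higher is better).
x_obs = np.array([0.0, 0.3, 0.6])
y_obs = np.array([0.50, 0.80, 0.65])
grid = np.linspace(0.0, 0.7, 71)
mu, sd = gp_posterior(x_obs, y_obs, grid)
ei = expected_improvement(mu, sd, y_obs.max())
next_x = grid[np.argmax(ei)]  # replacement fraction to batch and test next
```

In a real programme the selected mix would be batched and tested, the result appended to `x_obs` and `y_obs`, and the loop repeated. That sequential update is what lets Bayesian optimization reach a good formulation with fewer physical trials.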

Supervised Learning Models: Strength Prediction

Beyond Bayesian optimization, a broad ecosystem of supervised ML models targets compressive strength prediction. The performance benchmarks from recent research (2024 to 2026) are consistently strong:

| Model | Application | R-squared | RMSE |
|---|---|---|---|
| CatBoost | General concrete | 0.94 | 2.7 MPa |
| XGBoost | General concrete | 0.98 (workability) | Varies |
| XGBoost-WOA | Fly ash + silica fume mixes | 0.995 | Not reported |
| CatBoost | RCC compressive strength | 0.983 | Not reported |
| Deep Neural Network + MOPSO | Blended concrete | >0.93 | Varies |
| Gradient Boosting-DE | SCM concrete | 0.995 | Not reported |
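When reading the table, note that R-squared is unitless and depends on how widely strengths vary in the test set, while RMSE is in strength units. A model tested on a wide strength range can report a high R-squared alongside an RMSE that is large relative to the margin a dam specification allows. The sketch below computes both from paired measured and predicted values, plus the ksi-to-MPa conversion needed to compare BOxCrete's reported RMSE with the MPa figures above; the sample strengths are hypothetical.

```python
import math

KSI_TO_MPA = 6.894757  # 1 ksi = 6.894757 MPa

def rmse(measured, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. MPa)."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured vs. predicted 28-day strengths, MPa
measured = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 41.0, 44.0]
error = rmse(measured, predicted)            # 1.0 MPa
fit = r_squared(measured, predicted)         # about 0.968
boxcrete_rmse_mpa = 0.69 * KSI_TO_MPA        # about 4.76 MPa
```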

SHAP (SHapley Additive exPlanations) analysis across multiple studies confirms that cement content, water content, and water reducer dosage are the dominant predictive features for compressive strength. For mixes with SCMs, the type and dosage of fly ash, slag, and silica fume become significant variables, but their effects are highly non-linear, which is precisely why ML outperforms empirical formulas for these formulations.
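SHAP values are Shapley values from cooperative game theory applied to model predictions. The `shap` library uses fast approximations; the sketch below instead enumerates every feature coalition exactly, which is only practical for a handful of features. Features left out of a coalition are set to a baseline mix (one common convention among several), and the model is a hypothetical toy with a cement-SCM interaction term, not a fitted strength model.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. The number of
    model evaluations grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if j in subset or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

def toy_strength(features):
    """Hypothetical toy model: cement and water in kg/m3, SCM as a binder fraction."""
    cement, water, scm = features
    return 0.15 * cement - 0.2 * water + 30 * scm - 40 * scm ** 2 + 0.05 * cement * scm

mix = [300.0, 160.0, 0.4]
reference = [350.0, 180.0, 0.2]
contributions = shapley_values(toy_strength, mix, reference)
```

The contributions always sum to the difference between the mix's prediction and the baseline prediction. Interaction effects, such as the cement-SCM term here, are split between the interacting features, which is how SHAP can attribute non-linear SCM behaviour without a separate interaction model.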

Multi-Objective Optimization: Beyond Strength

The most impactful development for dam concrete is multi-objective optimization (MOO), which simultaneously balances competing design goals. For building construction, the typical objectives are strength, cost, and embodied carbon. For dam concrete, the objective space is broader and more consequential:

  1. Compressive strength at specified ages (28, 90, or 365 days)
  2. Heat of hydration (directly determines thermal cracking risk)
  3. Durability (AAR resistance, sulphate resistance, freeze-thaw)
  4. Workability (must suit the placement method: pumped, conveyed, or RCC roller-placed)
  5. Embodied carbon (increasingly a requirement for multilateral-funded projects)
  6. Cost (cement, SCMs, admixtures, transport)

Research using Non-dominated Sorting Genetic Algorithms (NSGA-II) combined with ML surrogate models has achieved approximately 30% reduction in CO2 emissions and 10 to 15% cost savings while maintaining target strength. Deep learning combined with Multi-Objective Particle Swarm Optimization (MOPSO) has produced optimized designs exceeding 50 MPa with cement reductions of up to 25%.
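NSGA-II ranks a population by non-dominated sorting, then uses crowding distance, selection, crossover, and mutation to evolve it over generations. The sketch below shows only the core Pareto filter, the step that turns surrogate-model predictions into a set of defensible trade-offs. It assumes every objective is minimised (strength is negated) and uses hypothetical predicted values.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are treated as 'lower is better'.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points (the first front in NSGA-II's sorting)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical surrogate predictions: (-strength MPa, CO2 kg/m3, cost per m3)
candidates = {
    "A": (-40, 300, 100),
    "B": (-35, 220, 90),
    "C": (-34, 230, 95),
    "D": (-45, 350, 120),
}
front_points = pareto_front(list(candidates.values()))
front = [name for name, objs in candidates.items() if objs in front_points]
```

Candidate C is dropped because B is predicted to be stronger, lower-carbon, and cheaper; A, B, and D remain because each wins on at least one objective. Choosing among them is an engineering decision, not a model output.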

Why multi-objective optimization matters for dams

On a mass concrete dam, reducing cement by 20 kg per cubic metre has cascading effects: lower heat generation means reduced cooling requirements, which means fewer embedded cooling pipes, shorter post-cooling durations, and faster lift placement cycles. ML-driven MOO can quantify these trade-offs computationally, helping engineers make informed decisions before a single trial batch is mixed.
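The heat side of this trade-off can be estimated to first order with the adiabatic temperature rise relation ΔT = mQ / (ρc), where m is the cement removed per cubic metre, Q its heat of hydration, ρ the concrete density, and c its specific heat. The default values below (400 kJ/kg, 2,400 kg/m³, 0.96 kJ/kg·K) are assumptions for illustration; project values should come from calorimetry on the actual cement. The result is an upper bound if the removed cement is replaced by fly ash or slag, because those materials also release some heat.

```python
def adiabatic_rise_reduction(cement_removed_kg_m3, heat_kj_per_kg=400.0,
                             density_kg_m3=2400.0, specific_heat_kj_per_kg_k=0.96):
    """Reduction in adiabatic temperature rise (deg C) from removing cement.

    Delta T = m * Q / (rho * c). Defaults are illustrative assumptions, not
    measured values; replace them with calorimetry and project data.
    """
    heat_removed_kj_m3 = cement_removed_kg_m3 * heat_kj_per_kg
    heat_capacity_kj_m3_k = density_kg_m3 * specific_heat_kj_per_kg_k
    return heat_removed_kj_m3 / heat_capacity_kj_m3_k

drop = adiabatic_rise_reduction(20.0)  # about 3.47 deg C with the assumed defaults
```

With these assumptions, removing 20 kg of cement per cubic metre lowers the adiabatic rise by roughly 3.5 °C, the kind of figure that feeds into cooling-pipe and lift-schedule decisions.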

The Dam-Specific Gap

Despite the impressive performance metrics, a fundamental gap exists between the current ML model ecosystem and the requirements of mass concrete for dams. This gap must be understood honestly before adopting any ML tool on a hydroelectric project.

Training Data Mismatch

The vast majority of ML models for concrete strength prediction are trained on datasets dominated by:

  • Ordinary Portland cement at 300 to 450 kg/m³
  • Maximum aggregate size of 20 to 25 mm
  • Evaluation at 7 and 28 days
  • Standard curing conditions

Dam concrete operates in a different parameter space:

  • Total cementitious content of 150 to 250 kg/m³ with 30 to 60% SCM replacement
  • Maximum aggregate size of 75 to 150 mm
  • Evaluation at 90 or 365 days for design acceptance
  • Thermal control requirements that influence curing conditions

A model trained on the first dataset has to extrapolate, rather than interpolate, to predict the performance of the second, and ML predictions outside the range of the training data are unreliable. BOxCrete’s training set of 123 mixtures does not include mass concrete formulations, and its curing ages stop at 28 days. Until dam-specific training data is incorporated, predictions must be treated as preliminary.
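A simple first defence against silent extrapolation is to check each candidate against the range of the model's training data before trusting its prediction. The sketch below flags any feature that falls outside the per-feature minimum and maximum of the training rows; the feature names and training values are hypothetical, chosen to mimic the building-concrete profile listed above.

```python
def out_of_domain(candidate, training_rows):
    """Return names of features where candidate falls outside the training min-max range."""
    flags = []
    for name, value in candidate.items():
        observed = [row[name] for row in training_rows]
        if value < min(observed) or value > max(observed):
            flags.append(name)
    return flags

# Hypothetical training rows resembling a building-concrete dataset
training = [
    {"binder_kg_m3": 300, "scm_fraction": 0.0, "max_agg_mm": 20, "age_days": 7},
    {"binder_kg_m3": 380, "scm_fraction": 0.2, "max_agg_mm": 20, "age_days": 28},
    {"binder_kg_m3": 450, "scm_fraction": 0.3, "max_agg_mm": 25, "age_days": 28},
]
dam_mix = {"binder_kg_m3": 200, "scm_fraction": 0.5, "max_agg_mm": 150, "age_days": 365}
flags = out_of_domain(dam_mix, training)  # every feature is flagged
```

A per-feature range check misses candidates whose individual values are in range but whose combination was never tested; distance-based or convex-hull checks catch more of these, at higher cost.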

Aggregate Variability

Aggregate properties, specifically mineralogy, shape, surface texture, gradation, and alkali-silica reactivity, critically influence both fresh and hardened concrete performance. Himalayan river-run aggregates differ substantially from the crushed limestone or granite in most training datasets. A model that does not account for aggregate-specific effects will underperform on dam projects where aggregate selection drives key design decisions.
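One way to measure how much aggregate source matters is leave-one-source-out cross-validation: train on mixes from every aggregate source except one, test on the held-out source, and repeat. If accuracy on held-out sources is much worse than ordinary random-split accuracy, the model is not generalising across aggregates. The sketch below only builds the splits; the `aggregate_source` field name and sample rows are hypothetical.

```python
from collections import defaultdict

def leave_one_source_out(rows, group_key="aggregate_source"):
    """Yield (held_out_source, train_rows, test_rows), holding out each source once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    for source in sorted(groups):
        test_rows = groups[source]
        train_rows = [r for other, members in groups.items() if other != source for r in members]
        yield source, train_rows, test_rows

# Hypothetical trial-mix rows from three aggregate sources
rows = [
    {"aggregate_source": "quarry-A", "strength_mpa": 31.0},
    {"aggregate_source": "quarry-A", "strength_mpa": 29.5},
    {"aggregate_source": "river-B", "strength_mpa": 27.0},
    {"aggregate_source": "river-C", "strength_mpa": 33.0},
]
folds = list(leave_one_source_out(rows))
```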

Long-Term Property Prediction

Dam concrete is designed for 100-year service lives. Predicting 365-day strength, long-term creep, drying shrinkage, and durability performance over decades requires either long-term test data or physics-informed models that ML alone cannot provide. Hybrid approaches that combine ML with established cement hydration models (such as those in the NIST VCCTL framework) are a promising research direction but are not yet production-ready.

A Practical Framework for Dam Engineers

Given both the capabilities and limitations, here is how ML tools can be productively integrated into the mix design process for hydroelectric projects today.

Phase 1: Computational Screening (ML-Assisted)

Before any physical testing, use ML models to explore the design space. Define the target properties (strength class, maximum heat of hydration, workability range, SCM constraints based on local availability) and let the optimization algorithm generate candidate formulations.

For this phase:

  • BOxCrete or similar Bayesian optimization tools can screen hundreds of SCM blends computationally
  • XGBoost/CatBoost models can rank candidates by predicted strength at multiple ages
  • MOO algorithms (NSGA-II, MOPSO) can identify Pareto-optimal solutions balancing strength, heat, and cost

The output is a shortlist of 5 to 10 candidate formulations for physical testing, rather than a single “optimal” design. The model narrows the search; it does not replace engineering judgment about which candidates make physical sense given the project’s aggregate sources, SCM supply, and placement conditions.
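Phase 1 can be sketched as a constrained screen over a grid of binder designs. In the example below, a simple grid enumeration stands in for the Bayesian or evolutionary search a real tool would use, `toy_predict` is a hypothetical placeholder for a trained surrogate model, and candidates are ranked on predicted CO2 alone for brevity. The point is the shape of the output: a ranked shortlist of feasible mixes, not a single answer.

```python
import itertools

def toy_predict(mix):
    """Hypothetical stand-in for a trained surrogate model (toy coefficients)."""
    cement = mix["binder"] * (1 - mix["scm"])
    return {
        "strength": 90 - 100 * mix["wb"] + 0.05 * mix["binder"] - 10 * mix["scm"],  # MPa
        "heat_rise": 0.145 * cement,                                               # deg C
        "co2": 0.9 * cement + 0.05 * mix["binder"] * mix["scm"],                  # kg/m3
    }

def screen(predict, binders, scm_fractions, wb_ratios, min_strength, max_heat_rise, top_n=10):
    """Enumerate a grid of mixes, keep those predicted to meet constraints, rank by CO2."""
    feasible = []
    for binder, scm, wb in itertools.product(binders, scm_fractions, wb_ratios):
        mix = {"binder": binder, "scm": scm, "wb": wb}
        pred = predict(mix)
        if pred["strength"] >= min_strength and pred["heat_rise"] <= max_heat_rise:
            feasible.append((mix, pred))
    feasible.sort(key=lambda item: item[1]["co2"])
    return feasible[:top_n]

shortlist = screen(toy_predict,
                   binders=[150, 175, 200, 225, 250],
                   scm_fractions=[0.3, 0.4, 0.5, 0.6],
                   wb_ratios=[0.45, 0.50, 0.55, 0.60],
                   min_strength=30.0, max_heat_rise=25.0)
```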

Phase 2: Physical Validation (Standard Testing)

Every ML-recommended formulation must undergo full trial mixing and testing per IS 457, ACI 207, or project-specific specifications. This includes:

  • Fresh property testing (slump/Vebe, air content, unit weight, initial set)
  • Compressive strength at 7, 28, 90, and (where specified) 365 days
  • Heat of hydration measurement (semi-adiabatic calorimetry or isothermal)
  • Durability testing (ASTM C1567 for AAR, ASTM C666 for freeze-thaw, ASTM C1012 for sulphate expansion)

The physical results then feed back into the ML model as project-specific training data, improving its predictions for subsequent mix iterations. This feedback loop is where the real value accumulates: each project builds a richer dataset that makes the model more reliable for the next project.
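The simplest form of this feedback is a project-specific correction: compare the model's predictions with the first round of measured strengths and scale future predictions accordingly. The sketch below fits a single least-squares scale factor; the numbers are hypothetical. A fuller loop would append the new results to the training set and refit the model (for a Gaussian Process, by adding the new observations), which also narrows its uncertainty near the tested mixes.

```python
def calibration_factor(predicted, measured):
    """Least-squares scale k minimising sum((measured - k * predicted) ** 2)."""
    numerator = sum(p * m for p, m in zip(predicted, measured))
    denominator = sum(p * p for p in predicted)
    return numerator / denominator

# Hypothetical first-round results: model predictions vs. measured 90-day strength, MPa
predicted_90d = [40.0, 45.0, 50.0]
measured_90d = [36.0, 40.5, 45.0]
k = calibration_factor(predicted_90d, measured_90d)  # 0.9: the model overpredicts by 10%
next_round = [k * p for p in [42.0, 48.0]]           # corrected predictions for new candidates
```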

Phase 3: Production Monitoring (ML-Augmented)

Once a qualified mix is in production, ML tools shift from design to monitoring. Platforms like Giatec SmartMix track actual batch data against expected performance, flagging deviations in real time. If a cement shipment’s Blaine fineness changes, or if the fly ash loss-on-ignition varies between deliveries, the ML system can predict the effect on strength development before the concrete is placed.

This production monitoring capability is particularly valuable on large dam projects where concrete placement extends over years and material properties inevitably vary. The ML system provides early warning of mix performance shifts that might otherwise be detected only when 28-day cylinder results arrive weeks after placement.
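The early-warning idea can be illustrated with a basic statistical process control check on an incoming material property. The sketch below is a generic technique, not how Giatec SmartMix or any other product works: it sets limits at the mean ± 3 standard deviations of a baseline period of cement Blaine fineness results and flags later deliveries outside them. The values are hypothetical, and a formal individuals chart would usually estimate sigma from the moving range rather than the sample standard deviation.

```python
import statistics

def three_sigma_limits(baseline):
    """Lower and upper limits at mean +/- 3 sample standard deviations."""
    centre = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return centre - 3 * sigma, centre + 3 * sigma

def flag_out_of_limits(values, limits):
    """Return indices of values outside the (lower, upper) limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical cement Blaine fineness results, m2/kg
baseline_blaine = [320, 325, 318, 322, 330, 327, 321, 324]
limits = three_sigma_limits(baseline_blaine)          # roughly (311.6, 335.2)
new_deliveries = [323, 340, 319, 305]
flagged = flag_out_of_limits(new_deliveries, limits)  # deliveries 1 and 3 are flagged
```

In an ML-augmented workflow the same check can run on the model's predicted strength for each batch, so a drift in any input shows up as a flagged prediction well before test results arrive.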

What Dam Owners Should Do Now

The technology is not hypothetical. It is available, with open-source options requiring zero licensing cost. Three actions make sense immediately:

  1. Digitize historical mix data. If your organization has trial mix records from previous dam projects in paper format, convert them to structured databases. This data becomes training material for dam-specific ML models and grows in value over time.

  2. Run BOxCrete on your next trial mix programme. Download the open-source model, input your target properties and material constraints, and compare its recommended formulations against your engineer’s initial designs. Use it as a screening tool alongside conventional practice, not as a replacement.

  3. Specify digital data collection in new project QC plans. Ensure that all batch plant data, test results, and material certificates are captured in structured digital formats from day one. Every data point collected today strengthens the ML models available for tomorrow’s projects.
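Digitized records are most useful to ML tools when every trial mix follows one schema. The sketch below shows a minimal record with one row per strength test (so a mix tested at 7, 28, and 90 days becomes three rows) and a CSV export; the field names are illustrative assumptions rather than a standard, and a real schema would also carry admixture dosages, fresh properties, and material certificate references.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class TrialMixRecord:
    """One strength result for one trial mix (illustrative field names)."""
    project: str
    mix_id: str
    cement_kg_m3: float
    fly_ash_kg_m3: float
    slag_kg_m3: float
    water_kg_m3: float
    max_aggregate_mm: float
    aggregate_source: str
    test_age_days: int
    compressive_strength_mpa: float

def to_csv(records):
    """Serialise records to CSV text with a header row taken from the dataclass fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(TrialMixRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()

# Hypothetical records: the same mix tested at two ages
records = [
    TrialMixRecord("Project-A", "M-01", 120.0, 60.0, 0.0, 110.0, 150.0, "Source-1", 90, 28.5),
    TrialMixRecord("Project-A", "M-01", 120.0, 60.0, 0.0, 110.0, 150.0, "Source-1", 365, 34.0),
]
csv_text = to_csv(records)
```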

PCCI’s mix design practice is built on leadership experience spanning 4,000+ MW of hydroelectric capacity. ML tools do not replace that experience. They amplify it by expanding the design space an engineer can explore and by making the optimization process faster and more rigorous. The firms that start building dam-specific ML datasets now will have a significant advantage when these tools mature, and they are maturing fast.


To discuss how ML-augmented mix design can be integrated into your next hydroelectric project, contact PCCI’s consulting team.


Frequently Asked Questions

Key Questions Answered

What is BOxCrete and how does it work for concrete mix design?
BOxCrete (Bayesian Optimization for Concrete) is an open-source AI model released by Meta in March 2026, developed in collaboration with the University of Illinois Urbana-Champaign and cement producer Amrize. It uses Gaussian Process regression to predict concrete strength development across multiple curing ages (1, 3, 5, 14, and 28 days) and enables multi-objective optimization that balances compressive strength against embodied carbon. The model was trained on a dataset of over 500 strength measurements from 123 mixtures (69 mortar and 54 concrete) and achieves an average R-squared of 0.94 with RMSE of 0.69 ksi. BOxCrete is built on Meta's Adaptive Experimentation (Ax) platform and is released under the MIT license on GitHub, meaning any organization can download, use, and modify it without licensing fees. In a real-world deployment at a data center in Rosemount, Minnesota, the BOxCrete-optimized mix reached full structural strength 43% faster than the original formula while reducing cracking risk by nearly 10%.
Can machine learning predict compressive strength of dam concrete accurately?
ML models achieve high accuracy for general concrete datasets, but their reliability for dam-specific concrete is unproven. CatBoost models achieve R-squared of 0.94 and RMSE of 2.7 MPa on standard datasets. For RCC specifically, CatBoost has reached R-squared of 0.983 in predicting compressive strength. However, these models are trained predominantly on concrete with ordinary Portland cement at typical dosages, tested at 28 days, with maximum aggregate sizes of 20 to 25 mm. Dam concrete operates in a fundamentally different parameter space: high SCM replacement (30 to 50% fly ash, 7 to 10% silica fume), low total cementitious content (150 to 250 kg per cubic metre), large maximum aggregate sizes (75 to 150 mm), and strength evaluation at 90 or 365 days. Until ML models are trained and validated on datasets that include these specific formulation ranges, their predictions for dam concrete should be treated as preliminary screening tools rather than design-grade outputs.
How does multi-objective optimization improve concrete mix design?
Traditional mix design optimizes for a single primary objective, usually compressive strength at a specified age, with other properties treated as constraints. Multi-objective optimization simultaneously balances competing objectives: strength, cost, embodied carbon, durability, and workability. Research published in 2024 and 2025 shows that Multi-Objective Genetic Algorithm approaches achieve savings of approximately 30% on CO2 emissions and 10 to 15% on material costs while maintaining target strength. For dam concrete, the relevant objectives include not just 28-day strength but also heat of hydration (directly related to thermal cracking risk), long-term strength development (90 and 365-day targets), resistance to alkali-aggregate reaction, and abrasion or cavitation resistance for hydraulic surfaces. ML models combined with optimization algorithms like NSGA-II (Non-dominated Sorting Genetic Algorithm) or particle swarm optimization can explore this multi-dimensional design space far more efficiently than manual trial-and-error, potentially identifying mix combinations that a human designer might not consider.
What ML tools are available for concrete mix design today?
Several tools are available at different levels of accessibility. BOxCrete (Meta/UIUC) is fully open-source under MIT license, available on GitHub, and uses Bayesian Optimization with Gaussian Process regression. It is the most accessible option for organizations willing to work with Python code. Giatec SmartMix is a commercial SaaS platform with an AI algorithm (Roxi) trained on data from over 7,500 construction projects. It provides mix optimization recommendations and has reduced cement usage by an average of 10 kg per mix in its first year. Quadrel is a SaaS platform for concrete mix management that partners with Meta's research. Beyond these purpose-built tools, general ML frameworks (scikit-learn, XGBoost, TensorFlow) are used extensively in research to build custom models. For dam engineers specifically, the practical reality is that no off-the-shelf tool is trained on mass concrete or RCC datasets. The most productive approach is to use open-source frameworks like BOxCrete as a starting point and supplement the training data with project-specific trial mix results.
How can ML help reduce cement content in dam concrete without sacrificing strength?
ML models excel at identifying the minimum cement content needed to achieve a target strength when SCMs are used as partial replacements. The UIUC and Meta collaboration demonstrated that AI-generated formulations can replace upwards of 70% of cement with fly ash and slag combinations while maintaining required strength. This works because ML models capture the non-linear interactions between cement content, SCM type and dosage, water-to-cementitious ratio, aggregate properties, and curing conditions that are difficult to predict with empirical formulas alone. For dam concrete, where cement content directly drives heat of hydration and therefore thermal cracking risk, finding the minimum viable cement content is a core design objective. PCCI's leadership has consistently advocated for cement optimization on hydroelectric projects, achieving significant reductions in cement content while exceeding target strengths. ML tools can accelerate this process by screening candidate formulations computationally before committing to physical trial mixes, but the final validation through standard testing per IS 457 and ACI 207 remains essential.
What are the barriers to adopting ML for mix design on Indian dam projects?
Four primary barriers exist. First, data availability: Indian dam projects have historically stored mix design and test data in paper-based formats rather than digital databases. ML models require structured datasets with consistent variables, and converting decades of paper records into usable training data is a significant undertaking. Second, aggregate variability: Himalayan and peninsular Indian aggregates vary dramatically in mineralogy, shape, gradation, and reactivity. A model trained on data from one project may not generalize to another using different aggregate sources. Third, regulatory acceptance: IS 456, IS 457, and project-specific specifications prescribe mix design methodologies (including minimum cement content, maximum water-to-cementitious ratio, and mandatory testing requirements) that do not currently accommodate ML-generated recommendations as a primary basis for design. ML outputs must be validated through conventional procedures. Fourth, workforce readiness: most dam site QC teams are experienced in conventional testing and mix adjustment but do not have training in data science or ML tools. Adoption requires either upskilling field engineers or integrating ML tools behind user-friendly interfaces that do not require programming expertise.

About the Author

Kushal Sthapak

Co-Founder, PCCI

Kushal Sthapak co-founded PCCI, combining four decades of inherited domain expertise in concrete technology with a focus on how emerging analytical and digital tools can improve project delivery for dam owners. He leads growth strategy, digital initiatives, and client engagement across South Asia.

