Data-based Methods for the Screening and Design of Jet Fuels
VT-Forschungsbericht 2024-02
Clemens Alexander Hall, M.Sc.
Deutsches Zentrum für Luft- und Raumfahrt
Institut für Verbrennungstechnik
Stuttgart

Publisher:
Deutsches Zentrum für Luft- und Raumfahrt
Institut für Verbrennungstechnik
Pfaffenwaldring 38-40
70569 Stuttgart
Phone: (0 7 11) 68 62 - 3 08
Fax: (0 7 11) 68 62 - 5 78

Printed as a manuscript. Reprinting or other use is permitted only by arrangement with the institute.
D93, Stuttgart

A thesis accepted by the Faculty of Aerospace Engineering and Geodesy of the University of Stuttgart in partial fulfillment of the requirements for the degree of Doctor of Engineering Sciences (Dr.-Ing.)
by Clemens Alexander Hall, M.Sc.
born in Villingen-Schwenningen

Main referee: Prof. Dr.-Ing. Manfred Aigner
Co-referee: Prof. Alexander Mitsos, PhD
Date of defense: 23.05.2024

Institute of Combustion Technology for Aerospace Engineering
University of Stuttgart
2024

Acknowledgments

This thesis was written during my time as a research associate at the Institute of Combustion Technology of the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt e.V., DLR) in Stuttgart. My first thanks therefore go to my former institute director and the main referee of this thesis, Prof. Dr.-Ing. Manfred Aigner, for the opportunity to carry out this work and for the excellent working conditions at the institute. I thank Prof. Alexander Mitsos, PhD, for acting as co-referee and for his great interest in my work. Special thanks are due to my supervisors, Dr.-Ing. Bastian Rauch and Dr.-Ing. Uwe Bauder, for their technical support and the intensive, always productive discussions, which contributed substantially to the results presented here. I likewise thank Dr. Patrick Le Clercq, as head of department, for his support and the freedom he granted me.

I would like to explicitly highlight the excellent cooperation between the departments Multiphase Flows and Alternative Fuels (Mehrphasenströmungen und Alternative Treibstoffe, MAT) and Chemical Kinetics and Analytics (Chemische Kinetik und Analytik, CKA) during my doctoral project. My thanks go in particular to the heads of department, Markus Köhler and, once again, Dr. Patrick Le Clercq. I thank my colleagues in the MAT department for the great working atmosphere and the many shared experiences that went beyond day-to-day work. I am very grateful to all staff of the institute for their intensive technical support and collegial cooperation; Stephan Ruoff, Florian Pütz and Georg Eckel deserve mention as examples. Finally, I thank my family and my girlfriend with all my heart for their backing and constant support, especially in the final phase of this work.

Stuttgart, June 2024

Contents

Contents
List of Figures
List of Tables
Nomenclature
Abstract
Kurzfassung (German abstract)
1 Introduction
1.1 Motivation
1.2 Sustainable Aviation Fuel development and approval
1.2.1 Jet fuel specifications and synthetic aviation fuel approval process
1.2.2 Jet fuel prescreening
1.2.3 Challenges for the prescreening concept implementation
1.3 Objectives and research questions
1.4 Chapter outline
2 Fuel Property Modeling
2.1 Principles of data-based modeling methods
2.2 Fuel property modeling methods
2.2.1 QSPR with sampling method
2.2.2 Direct Correlation method
2.2.3 Mean Quantitative Structure-Property Relationship method
2.3 Probabilistic Machine Learning correlation models
2.3.1 Working principles of Artificial Neural Networks
2.3.2 Working principle of Monte-Carlo Dropout Neural Networks
2.4 Model development and validation
2.4.1 Training and validation
2.4.2 Hyperparameter optimization
2.5 Predictive capability assessment methods for models
2.5.1 Predictive capability metrics
2.5.2 Example for the predictive capability assessment
3 Composition and Property Database
3.1 Data collection
3.2 Data preprocessing and outlier detection
3.3 Data characterization
3.3.1 Data characterization of fuels
3.3.2 Data characterization of pure compounds
4 Predictive Capability Assessment of Models and Adequacy Assessment for Fuel Screening
4.1 Part 1: Predictive capability assessment of models
4.1.1 Density
4.1.2 Surface tension
4.1.3 Net heat of combustion
4.1.4 Kinematic viscosity
4.1.5 Flash point
4.1.6 Freezing point
4.1.7 Cetane number
4.1.8 Distillation line
4.2 Part 2: Adequacy assessment of models for fuel screening
4.3 Summary and conclusion
5 Development of Fuel Design Tools
5.1 Molecular descriptors
5.2 Property correlation metrics
5.3 Investigation of fuel component structure-property relations
5.3.1 Density
5.3.2 Surface tension
5.3.3 Kinematic viscosity
5.3.4 Net heat of combustion
5.3.5 Flash point
5.3.6 Freezing point
5.3.7 Cetane number
5.3.8 Boiling point
5.3.9 Yield sooting index
5.4 Summary table of the structure-property correlations
5.4.1 Usage example of the structure-property correlations
5.5 Summary and conclusion
6 Fuel Design and Blending Analysis
6.1 Prescreening of the untreated jet fuel candidate
6.2 Fuel design and prescreening of fuel variants
6.3 Blending study of fuel variant
6.3.1 Case 1: blending study without uncertainty consideration
6.3.2 Case 2: blending study with uncertainty consideration
6.4 Summary and conclusion
7 Summary and Outlook
7.1 Summary
7.2 Outlook
References
A. Descriptions of Jet Fuel Screening Properties
B. Approved and Pending Jet Fuel Production Routes
C. Utilized Structural Molecular Features
D. Fuel Database Schema
E. Cross-validation Results of Models for Training and Testing
F. Reference Models
G. Pure Compound Descriptor Plots

List of Figures

Figure 1.1: Contribution of measures for reducing projected international aviation net CO2 emissions [6].
Figure 1.2: Restriction of possible fuel composition according to ASTM D1655 for conventional crude oil-based jet fuels [29].
Figure 1.3: Flow diagram of the approval process of a new aviation turbine fuel according to ASTM D4054 [24].
Figure 1.4: Schematic illustration of the screening plots for Fuel A and Fuel B as part of the jet fuel prescreening.
Figure 1.5: GCxGC measurement plots of conventional Jet A-1 fuel (upper left), FT-SPK (upper right), HEFA-SPK (middle left), ATJ-SPK (middle right) and IH2 fuel (lower left).
Figure 1.6: GCxGC measurement signal of Jet-A fuel [42].
Figure 2.1: Family tree of approaches for the modeling of fuels.
Figure 2.2: Spectrum of modeling approaches from physical models and empirical models to data-based models.
Figure 2.3: Quantified molecular features of 2,3-dihydro-2-methyl-1H-indene; the number behind each SMARTS key shows the count of the molecular feature [76].
Figure 2.4: Schematic illustration of the QSPR sampling modeling method, from quantification of the molecular structure of components to the property estimation of the fuel.
Figure 2.5: Schematic illustration of the direct correlation modeling method.
Figure 2.6: Schematic illustration of the pseudo mean quantitative structure estimation of a jet fuel using a GCxGC measurement and a mean occurrence matrix calculated by averaging the structural features of possible isomers.
Figure 2.7: Schematic figures of an artificial neural network neuron (left) and a connected artificial neural network (right).
Figure 2.8: Schematic representation of an MCNN with dropout functionality during prediction. Network neurons are deactivated randomly (gray) to generate a distribution of prediction values.
Figure 2.9: Schematic illustration of a cross-validation.
Figure 2.10: Schematic workflow of the utilized hyperparameter optimization with cross-validation.
Figure 2.11: Schematic representation of the application and validation domain of a model, adapted from [57].
Figure 2.12: Schematic illustration of the predictive capability aspects accuracy, validity and precision for probabilistic models.
Figure 2.13: Schematic predicted distribution of a probabilistic model with mean prediction y and lower and upper prediction intervals yl, yu for a confidence level of 95 % with the associated risk and certainty.
Figure 2.14: Schematic illustration of a unity plot for the validation of a probabilistic model.
Figure 2.15: Schematic illustration of the screening plots for Fuel A and Fuel B as part of the jet fuel prescreening.
Figure 3.1: Number of measurements vs. temperature for density (upper left), kinematic viscosity (upper right) and surface tension (lower left).
Figure 3.2: Evaluated GCxGC measurement of a conventional jet fuel with representative molecules for each family.
Figure 3.3: Overview of current jet fuel production processes, extended from Blakey et al. 2011 [101].
Figure 3.4: Scatter plot of GCxGC measurements with summed hydrocarbon families. Blue: conventional fuels. Green: synthetic fuels and blends. Blue and green shaded areas indicate the observed range.
Figure 3.5: Scatter plot visualizing the compositional similarity of fuels and fuel components based on the dimensionally reduced representation of their respective mean quantitative structure and quantitative structure representation.
Figure 4.1: Validation results of the density prediction. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.2: Validation results of the surface tension in air prediction. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.3: Validation results of the net heat of combustion prediction. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.4: Validation results of the kinematic viscosity prediction. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.5: Validation results of the flash point prediction. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.6: Validation results of the freezing point prediction. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.7: Validation results of the cetane number prediction. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.8: Validation results of the distillation prediction at 10 vol% evaporated volume. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.9: Validation results of the distillation prediction at 50 vol% evaporated volume. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.10: Validation results of the distillation prediction at 90 vol% evaporated volume. Results for conventional fuels are displayed in blue, results for synthetic fuels in green.
Figure 4.11: Composition plots of jet fuels used for Tier α prescreening: conventional oil-based fuel (upper left), SAF produced by the Fischer-Tropsch process (upper right) and SAF produced by the Alcohol-to-Jet process (lower left).
Figure 4.12: Results of Tier α prescreening: Jet A-1 fuel (left), FT SPK (middle) and ATJ SPK (right). Predictions of the DC model are displayed in blue, those of the M-QSPR model in green and those of the QSPR sampling model in purple.
Figure 4.13: Predicted mean values for iso-alkane isomers with a carbon number of 12 and 16 for the flash point (left), freezing point (middle) and cetane number (right).
Figure 4.14: Results of Tier α prescreening for ATJ SPK with constrained isomer selection, predictions of the QSPR sampling model.
Figure 5.1: Molecular structure of isopentane (left) and its calculated adjacency matrix (right).
Figure 5.2: Density values at 15 °C of the hydrocarbon families over the carbon number nC.
Figure 5.3: Density values at 15 °C of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b) and partial positive surface area PPSA (c).
Figure 5.4: Density values at 15 °C of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b) and number of ring atoms nR (c).
Figure 5.5: Density values at 15 °C of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b) and partial positive surface area PPSA (c).
Figure 5.6: Surface tension values at 22 °C of the hydrocarbon families over the carbon number nC.
Figure 5.7: Surface tension values of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.8: Surface tension values of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.9: Surface tension values of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.10: Kinematic viscosity values at 0 °C of the hydrocarbon families over the carbon number nC.
Figure 5.11: Kinematic viscosity values of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b) and partial positive surface area PPSA (c).
Figure 5.12: Kinematic viscosity values of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b) and partial positive surface area PPSA (c).
Figure 5.13: Kinematic viscosity values of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b) and partial positive surface area PPSA (c).
Figure 5.14: Net heat of combustion values of the hydrocarbon families over the carbon number nC.
Figure 5.15: Net heat of combustion values of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.16: Net heat of combustion values of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b) and number of ring atoms nR (c).
Figure 5.17: Net heat of combustion values of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.18: Flash point values of the hydrocarbon families over the carbon number nC.
Figure 5.19: Flash point values of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.20: Flash point values of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.21: Flash point values of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.22: Freezing point values of the hydrocarbon families over the carbon number nC.
Figure 5.23: Freezing point values of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.24: Freezing point values of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.25: Freezing point values of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.26: Cetane number values of the hydrocarbon families over the carbon number nC.
Figure 5.27: Cetane number values of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.28: Cetane number values of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.29: Cetane number values of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.30: Boiling point values of the hydrocarbon families over the carbon number nC.
Figure 5.31: Boiling point values of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.32: Boiling point values of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b) and number of ring atoms nR (c).
Figure 5.33: Boiling point values of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.34: Yield sooting index values of the hydrocarbon families over the carbon number nC.
Figure 5.35: Yield sooting index values of n-alkanes and iso-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.36: Yield sooting index values of mono-cyclo-alkanes over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 5.37: Yield sooting index values of mono-aromatics over molecular descriptors: carbon number nC (a), branching index ηB (b).
Figure 6.1: Composition plot of the untreated jet fuel candidate.
Figure 6.2: Results of the Tier α jet fuel prescreening for the untreated jet fuel candidate.
Figure 6.3: Conversion curves for hydrocracking of n-tetradecane, n-pentadecane, n-heptadecane and n-octadecane according to Weitkamp [143] and Coonradt and Garwood [144].
Figure 6.4: Composition plots of the untreated fuel candidate (upper left), hydroisomerized variant (upper right), hydrocracked fuel variant (lower left) and hydrocracked and distilled variant (lower right).
Figure 6.5: Results of the Tier α jet fuel prescreening for the three fuel variants: hydroisomerized (left), hydrocracked (middle), hydrocracked and distilled (right).
Figure 6.6: Results of the deterministic blending analysis. Left: property plot with property ranges of the fuel variant and blends. Right: maximum blending fraction of the fuel variant (top), reduction of the yield sooting index of the blends (middle), CO2 reduction of the blends (bottom).
Figure 6.7: Results of the blending analysis under consideration of uncertainties. Left: parallel line property plot with property ranges of the fuel variant and blends. Right: maximum blending fraction of the fuel variant (top), reduction of the yield sooting index of the blends (middle), CO2 reduction of the blends (bottom).
Figure 6.8: Number of properties preventing blends for the blending study under consideration of uncertainties.

List of Tables

Table 1: Critical jet fuel properties for jet fuel prescreening.
Table 2: Utilized parameters for hyperparameter optimization of Monte-Carlo Neural Networks.
Table 3: Number of unique fuels and pure compounds #FL and corresponding datapoints #DP used for the training and validation of the models.
Table 4: Table of considered chemical families with the corresponding formula, structural criteria and an illustrated representative.
Table 5: Comparison of the number of representative molecules available in the database (DB) and the number of theoretically possible molecules, calculated by MOLGEN (MG).
Table 6: Reproducibilities of ASTM property measurement methods [106].
Table 7: Predictive capabilities of density models.
Table 8: Predictive capabilities of surface tension models.
Table 9: Predictive capabilities of net heat of combustion models.
Table 10: Predictive capabilities of kinematic viscosity models.
Table 11: Predictive capabilities of flash point models.
Table 12: Predictive capabilities of freezing point models.
Table 13: Predictive capabilities of cetane number models.
Table 14: Predictive capabilities of distillation models at 10 vol%.
Table 15: Predictive capabilities of distillation models at 50 vol%.
Table 16: Predictive capabilities of distillation models at 90 vol%.
Table 17: Summary influence of different hydrocarbon families and molecular descriptors on the density at 15 °C, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 18: Summary influence of different hydrocarbon families and molecular descriptors on the surface tension at 22 °C, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 19: Summary influence of different hydrocarbon families and molecular descriptors on the kinematic viscosity at 0 °C, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 20: Summary influence of different hydrocarbon families and molecular descriptors on the net heat of combustion, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 21: Summary influence of different hydrocarbon families and molecular descriptors on the flash point, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 22: Summary influence of different hydrocarbon families and molecular descriptors on the freezing point, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 23: Summary influence of different hydrocarbon families and molecular descriptors on the cetane number, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 24: Summary influence of different hydrocarbon families and molecular descriptors on the boiling point, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 25: Summary influence of different hydrocarbon families and molecular descriptors on the yield sooting index, * indicates an annotation for the correlation, ∅ indicates insufficient data.
Table 26: Summary table for the relationship and influence of structural descriptors of fuel components on their physicochemical properties, average differences Δ are given in %.
Table 27: Summary table of fuel design process operations.
Table 28: Specification properties for jet fuel blends after ASTM D7566.
Nomenclature
Acronyms
Abbreviation | Description
ANN | Artificial Neural Network
ASTM | American Society for Testing and Materials
ATJ-SKA | Alcohol-to-Jet Synthetic Kerosene with Aromatics
ATJ-SPK | Alcohol-to-Jet Synthetic Paraffinic Kerosene
CAAFI | Commercial Aviation Alternative Fuels Initiative
CHJ | Catalytic Hydrothermolysis Synthesized Kerosene
CTM | Continuous Thermodynamics Model
CRC | Coordinating Research Council
CV | Cross-validation
DC | Direct Correlation Method
DCM | Discrete Component Model
DLR | German Aerospace Center
ECLIF | Emission and Climate Impact of Alternative Fuels
EU | European Union
FAA | Federal Aviation Administration
FT | Fischer-Tropsch
FT-SPK | Fischer-Tropsch Hydroprocessed Synthesized Paraffinic Kerosene
GC | Group Contribution
GCxGC | Two-dimensional Gas Chromatography
HDO-SAK | Hydro-Deoxygenation Synthetic Kerosene
HEFA-SPK | Synthesized Paraffinic Kerosene from Hydroprocessed Esters and Fatty Acids
HFP HEFA-SK | High Freeze Point Hydroprocessed Esters and Fatty Acids Synthetic Kerosene
HHC-SPK | Hydroprocessed Hydrocarbons, Esters, and Fatty Acids Synthetic Paraffinic Kerosene
IATA | International Air Transport Association
IFPEN | French Institute of Petroleum
IH2 | Integrated Hydropyrolysis and Hydroconversion
ICAO | International Civil Aviation Organization
ILUC | Indirect Land Use Change
JETSCREEN | Project for Jet Fuel Screening and Optimization
JSON | JavaScript Object Notation
MCNN | Monte-Carlo Dropout Neural Network
M-QSPR | Mean Quantitative Structure-Property Relationship
NJFCP | National Jet Fuels Combustion Program
NOx | Nitrogen Oxides
OEM | Original Equipment Manufacturer
PI | Prediction Interval
QSPR | Quantitative Structure-Property Relationship
RMSE | Root Mean Squared Error
ReFGen | Representative Fuel Generator
SAF | Sustainable Aviation Fuel
SIP | Synthesized Isoparaffins from Hydroprocessed Fermented Sugars
SPK/A | Synthesized Paraffinic Kerosene with Aromatics
US | United States
UNIFAC | UNIQUAC Functional-group Activity Coefficients
VEM | Valence Electron Mobile Environment
XTL | Electricity- or Solar-Radiation-to-Liquid

List of Symbols
Greek symbols
Symbol | Unit | Description
𝛼 | - | Atom core count
𝛽 | - | VEM environment count
𝛾 | - | VEM vertex count
∆!" | % | Average change of property value by the change of the number of contained carbon atoms
∆#! | % | Average change of property value by the change of the branching index
∆$" | % | Average difference of property value of a hydrocarbon family to reference fuels
𝜂% | - | Topochemical atom index for branching
𝜂&'( | - | Local topochemical atom index
𝜂)&'( | - | Local topochemical atom index for an unbranched molecule
𝜂*&'( | - | Local topochemical atom index of a reference molecule containing only 𝜎-bonds
𝜈 | mm2/s | Kinematic viscosity

Latin symbols
Symbol | Unit | Description
𝑎+, | - | Molecular adjacency matrix
𝑐𝑋𝐻0 | - | Number of aromatic carbon atoms connected only to other carbon atoms
𝑓- | - | Atomic vertex for non-hydrogen 𝜎-bonds
𝑓. | - | Atomic vertex for non-hydrogen 𝜋-bonds
MAE | Depends on physical property | Mean Absolute Error
MAOE | Depends on physical property | Mean Absolute Error of Outliers
𝑛𝐶 | - | Number of contained carbon atoms
𝑛𝑅 | - | Number of contained ring atoms
NMPIW | % | Normalized Mean Prediction Interval Width
PICP | % | Prediction Interval Coverage Probability
𝑟 | - | Pearson correlation coefficient
𝑃𝑃𝑆𝐴 | - | Partial Positive Surface Area
𝑆𝐴/0 | - | Sum of positive surface area
𝑤 | kg/kg, mol/mol, vol/vol | Mass, molar, or volume fraction

Abstract
To achieve climate neutrality in the aviation sector, research on new sustainable aviation fuels (SAF) is needed, as the growing demand will exceed the production potential of established sustainable pathways. The focus is not only on the exploration of sustainable feedstocks and the development of new production processes but also on the facilitation and acceleration of the whole fuel development process, from its conceptualization to its approval.
The critical evaluation of a new production pathway guarantees the safe application and performance of a new fuel. The approval poses a major challenge for fuel producers, requiring a tremendous commitment of time, fuel volume and cost. Concepts that allow a fast, iterative, low-cost screening and design of new candidate fuels to assess and optimize their chances for approval are therefore seen as key enablers. Established fuel screening concepts rely on model-based prediction, which, together with state-of-the-art compositional analytics, allows the fast assessment of SAF candidates from volumes as low as 5 mL. The design of new fuels, on the other hand, requires a comprehensive understanding of the composition of a jet fuel and of the properties considered critical for the fuel approval. This work describes the research and development of tools for the screening and design of jet fuels. Focusing on data-based methods, the tools are built from a database composed of both jet fuels and fuel components. It is investigated whether and how data-based tools are able to support the screening and design of new SAF candidates and what their limitations are. For the jet fuel screening, three different modeling methods to predict physicochemical properties from compositional measurements are adapted and investigated: Direct Correlation (DC), Mean Quantitative Structure-Property Relationship Modeling (M-QSPR) and Quantitative Structure-Property Relationship Modeling (QSPR) with sampling. All developed models are probabilistic, since the safety-relevant use case of jet fuel screening makes the consideration of uncertainties necessary. Rather than estimating one deterministic property value, probabilistic models estimate a distribution of values and with it the associated uncertainty. The predictive capabilities of the developed models are assessed using specially developed metrics and compared on the prediction of conventional and synthetic jet fuels.
To put the developed models into context, they are compared to established deterministic models from the literature. After the strengths and limitations of the different approaches have been identified, the models are applied to jet fuel screening to test their adequacy for the assessment of new SAF candidates. To support the design of new SAF candidates, the relationships between the fuel composition and critical physicochemical properties are investigated. The relationships are investigated on the basis of fuel components, considering the influence of their chemical families as well as the structural aspects of size and branching. Trends and relations are characterized with graphs and quantitative metrics that illustrate the correlation and state the average change in property value for a change in composition. Both the developed models and design tools are applied to the use case of screening and then optimizing a real SAF candidate to maximize its chances for successful fuel approval. The SAF candidate and three optimized fuel variants with reformulated compositions are screened to assess the most suitable production route. Afterwards, a blending analysis of the SAF candidate and the variants is conducted to estimate their maximum volume fraction in the mixture with representative conventional jet fuels, considering both the safety requirements and the potential reduction of CO2 and soot emissions. As potential next steps, this work identifies the need for advancements in the analytics of the fuel composition as well as the extension of the existing fuel property databases. The former would reduce the uncertainty in the property modeling, while the latter would increase both the predictive capability of the models and the understanding of the fuel property relations.
Kurzfassung
The growing civil aviation sector and the limited scalability of established production pathways for sustainable synthetic fuels (SAF) call for intensive research in order to reach the set goal of climate neutrality by 2050. Besides research into novel feedstocks and production processes, the focus lies on a general acceleration of the entire development process, from the initial fuel formulation to the final approval. The approval poses a particular challenge for fuel producers, since it requires enormous financial and time resources as well as the provision of large fuel volumes. Innovative, low-cost concepts that enable an early assessment and optimization of fuel candidates from small volumes have the potential to significantly accelerate the development process and the approval. These new assessment concepts are based on a combination of modern fuel analytics and model-based prediction of critical fuel properties and thus enable the assessment of a candidate from a volume as low as 5 mL. In the subsequent optimization, the fuel properties of the candidate can be improved through targeted modification of the composition to increase the chances of the actual approval. The demands on the property models and design tools are high, since they must also be applicable to novel fuel compositions outside the existing range of experience. This work investigates the potential and the limitations of data-based methods as tools for the described fuel assessment and fuel design. Using the latest machine learning algorithms and databases, it aims to clarify whether and how data-based methods can support the early phase of fuel development and approval.
For the fuel assessment, three different methods for modeling eight critical fuel properties on the basis of the composition are developed and investigated: Direct Correlation (DC), Mean Quantitative Structure-Property Relationship Modeling (M-QSPR) and Quantitative Structure-Property Relationship Modeling (QSPR) with sampling. All three methods rely on probabilistic models, which do not predict only a single deterministic value per fuel property but estimate a possible range of values and thereby capture inherent uncertainties. The predictive capabilities of the developed models are assessed with specially developed metrics for both conventional and synthetic fuels and compared with each other as well as with established deterministic models from the literature. The suitability of the models for the actual assessment of new fuel candidates is then determined in a simulated fuel assessment of three candidates. For the fuel design, dedicated tools are created from systematic investigations of the relationships between the fuel composition and the critical properties. The influence of the respective chemical family, the size and the topology of the fuel components on the properties is investigated with graphs and quantitative metrics and captured in correlations. The developed models and design tools were then combined to assess and optimize a fuel candidate and thus maximize its chances of approval. In a first step, the fuel candidate and the three optimized variants were assessed to identify the variant with the highest chances of approval.
Subsequently, a blending analysis of the most promising variant is carried out to determine the maximum volume fraction in blends with conventional fuels and the expected CO2 and soot emissions. Within this work, data-based methods were successfully developed, investigated and applied for both the assessment and the design of fuels. Limitations were found mainly due to uncertainties in the compositional measurements and the limited availability of data for training the models and developing the design tools. Possible next steps are therefore further research into and improvement of fuel analytics, as well as the extension of the available databases through targeted measurement campaigns. The former would significantly reduce the uncertainty in the property modeling, while the latter would improve the predictive capability of the models and the usability and informative value of the design tools.

1 Introduction
1.1 Motivation
The consequences of man-made climate change make an adaptation and realignment of the aviation industry inescapable. Politically set strategies like the “European Green Deal” of the European Union (EU) and the “Sustainable Aviation Fuel Grand Challenge” of the Government of the United States of America (US) foresee a need for an emission reduction in aviation of 90 % [1] and 100 % [2], respectively, to achieve the goal of climate neutrality by 2050. As a globally growing industry, the aviation sector is expected to grow by approximately 4 % p.a. until 2050, depending on the region [3]. Hence, a rapid adoption of alternative technology is necessary to establish a sustainable aviation industry.
Recent reports by the Intergovernmental Panel on Climate Change (IPCC) clearly state that prompt action is required to achieve the set emission reduction goals with technology that has high technical readiness and high chances of application at large scale [4]. This has been recognized by aviation associations like the International Civil Aviation Organization (ICAO), which ranks the use of sustainable aviation fuel (SAF) as the technology with the highest technological readiness and the highest potential emission reduction for the aviation industry [5]. Figure 1.1 shows the potential contributions of measures for net CO2 reduction as part of the long-term aspirational goal of ICAO, projected from 2022 to 2050 [6]. According to ICAO's projections, SAF is expected to play a crucial role in reducing CO2 emissions from international aviation in the future. Despite the anticipated growth of the aviation industry, particularly in developing and emerging countries, the widespread adoption of SAF has the potential to decrease CO2 emissions below the levels seen during the 2021 COVID-19 pandemic low.
Figure 1.1: Contribution of measures for reducing projected international aviation net CO2 emissions [6]. Legend: sustainable aviation fuel use, operational improvements, technological improvements.
Apart from the reduction of the greenhouse effect of CO2, the use of SAF also has the potential to reduce parts of the so-called non-CO2 climate effects, which result from contrails formed on emitted soot particles. According to recent studies by Lee et al. [7] as well as Voigt et al. [8] and Faber et al. [9], the contribution of the non-CO2 effects of aviation to climate change is larger than that of its CO2 emissions [8,9]. The emission of soot particles is strongly influenced by the jet fuel composition, with low-aromatic SAF showing significantly lower emissions with current technologies [8].
It is expected that the reductions achieved by the use of SAF and by market-based measures like an emission trading system will significantly exceed the potential reductions from improvements of the burner technology, especially in later years [10]. The use of SAF on a large scale in the civil aviation industry is therefore a necessity to reach the emission reduction targets. The need for the large-scale application of SAF has been recognized by major political institutions, which have released legislative proposals for SAF use and emission reduction. The “ReFuelEU” aviation proposal of the European Parliament sets the minimum share of SAF to 2 % by 2025, 5 % by 2030, 32 % by 2040 and 63 % by 2050 in the “Fit for 55” concept for climate neutrality [11]. The US Government announced a significant increase of SAF production from 136 000 tons in 2020 [12] to 9.08 million tons by 2030 and 106 million tons by 2050 in its “Sustainable Aviation Fuel Grand Challenge” [2]. Currently, approved SAF technologies provide only a fraction of the needed sustainable fuel: their production of 200 000 tons corresponds to less than 0.1 % of the worldwide jet fuel demand in 2019 [13]. Drastic increases in production are planned by companies like Neste, with a planned production of 1.5 million tons by 2023 [14], Shell with 2 million tons by 2025 [13] and World Energy with 5 million tons in 2024 [15]. However, a sufficient supply of the SAF volumes required to achieve the set milestones for climate neutrality is highly uncertain, with 6.4 million tons required in 2025, 18.3 million tons in 2030 and 359.2 million tons by 2050, as estimated by the International Air Transport Association (IATA) in 2022 [16]. This becomes especially apparent considering that current SAF production and the production planned until 2030 consist predominantly of bio-based SAF from feedstocks like rapeseed, soy, palm oil etc. [17].
These feedstocks are, however, not available in sufficient quantities without interfering with other industries, e.g. the food industry [18], or negatively impacting existing natural high carbon stocks through indirect land use change (ILUC) [19]. ILUC summarizes the potential net release of CO2 from vegetation and soil when lands with high carbon stocks like forests and grasslands are converted to agricultural lands to compensate for the diversion of existing croplands to biofuel production. These biogenic production routes are therefore not expected to meet the rising demand of the growing aviation industry in the long term. With the recast of the Renewable Energy Directive, the EU therefore increasingly supports the transition away from food-based biofuels and fuels with high potential ILUC. Alternative feedstocks like lignocellulose, byproducts and wastes, as well as alternative non-biogenic production routes like Power-to-Liquid pathways [18], are especially promoted. These production routes, however, have negligible market shares compared to biogenic routes or an overall low market readiness [5]. Further research and development of new SAF production routes is therefore needed, alongside the strongly growing SAF market. The focus is not only on the identification of adequate feedstocks and the development of new production processes, but also on the facilitation and acceleration of the whole development process. Starting from a laboratory concept, the process must be developed to industrial scale while the final product complies with the required approval protocol of ASTM D4054 [20], which guarantees the safe application of the produced fuel in the aviation industry. Historically, the approval of a new SAF production pathway alone can last up to several years and require multiple millions of dollars as well as hundreds of tons of fuel for the extensive testing [21].
Early production capabilities of a fuel candidate are, however, often on a laboratory scale, and uncertain chances of success prevent additional investments for upscaling as well as the willingness to fund the required test program. Considering the given timeframe for the envisaged emission reduction in aviation, the time needed to design a new jet fuel and optimize it to pass the approval process has to be reduced to a minimum in order to meet the set goals of climate neutrality. Extensive research projects like the National Jet Fuels Combustion Program (NJFCP) funded by the US Government [22] and the project for Jet Fuel Screening and Optimization (JETSCREEN) [23] of the EU were initiated to facilitate and streamline the jet fuel approval process. Based on the findings of these research projects, Heyne and Rauch developed the concept of prescreening in 2020, which allows the assessment of new jet fuel candidates at an early stage of development with minimal cost and required fuel volume [24]. The prescreening assesses the chances of a jet fuel candidate to pass the approval process and gives fast feedback to the producer to redesign the composition and optimize the fuel accordingly. The concept focuses on a few jet fuel properties that are regarded as particularly critical for the jet fuel approval by both the NJFCP and the JETSCREEN project. To reduce time, cost and required fuel volume for the measurements of these critical properties, the prescreening procedure utilizes predictive models, combined with modern analytical measurement methods. Together, these methods allow prediction of the critical properties from fuel volumes below 5 mL. The requirements for the models are high, since the predictions are expected to be comparable to property measurements and to substitute for them where measurements are not yet available. To meet the requirements, the predictions have to be accurate, highly reliable and reflect potential uncertainties for their risk-informed usage.
Furthermore, the models need to adequately predict desired properties not only for the known range of jet fuel compositions but also for the compositions of new SAF that might significantly deviate from the known compositional range. Apart from the models, extensive knowledge about the relationship between the fuel composition and the desired critical properties is required to design a fuel and optimize its chances of passing the approval process. It is against this background that the scope of this doctoral thesis is set, with the goal of developing highly accurate and reliable tools for the described use cases of screening and designing new SAF candidates and supporting their development at an early stage. In the future, these tools could be the basis for the screening and design of jet fuels under consideration of ecological aspects like the described non-CO2 effects, saving cost, time and fuel volume in the fuel approval process.

1.2 Sustainable Aviation Fuel development and approval
The tension between developing a new ecologically sustainable SAF production route under optimal economic conditions and ensuring its safe use is a serious challenge for fuel producers, the aviation industry and certification associations. Since 2008, seven unique production paths and with them seven SAF types have been developed by the fuel industry and certified by the American Society for Testing and Materials (ASTM) [25]. The approval by the ASTM is necessary for every newly developed production path and the corresponding fuel type. It guarantees the safe application of the fuel in the existing infrastructure of the aviation industry, from production, transport, storage, and handling to the operability in the aircraft.
1.2.1 Jet fuel specifications and synthetic aviation fuel approval process
The ASTM (American Society for Testing and Materials) oversees three crucial specifications to which sustainable fuels or their conventional blending counterparts must adhere: D1655 [26], D7566 [25] and D4054. The standard practice ASTM D4054 and the specification for jet fuels containing synthesized hydrocarbons ASTM D7566 are relevant for SAF. ASTM D4054 describes the process for the approval of a new aviation turbine fuel, while ASTM D7566 holds the standards for aviation turbine fuel blends containing synthetic hydrocarbons. Each approved production path has an annex in ASTM D7566, which states the specifications for the respective fuel type, its production path and feedstock, its maximum blending fraction and specifications for the fuel blend itself. Blends that comply with the set specifications in the annexes of ASTM D7566 and the requirements for jet fuel blends are considered “drop-in fuels” that can directly be utilized in existing infrastructure and aircraft. At the time of writing, ASTM D7566 states a maximum fraction of between 10 vol% and 50 vol% for SAF blends, depending on the SAF type. ASTM D1655 holds the two major specifications for conventional jet fuel types in civil aviation: Jet A, defined by the ASTM itself, and Jet A-1, defined by the Defence Standard 91-91 of the British Ministry of Defence [27]. Besides Jet A and Jet A-1, specifications exist for further civil fuel types that are country-specific and play a minor role in the commercial aviation sector: TS-1 for Russia and the Commonwealth of Independent States and RP fuels for the People's Republic of China [28].

ASTM D1655 and ASTM D7566
Both ASTM D1655 and D7566 are performance specifications and do not explicitly define an allowed jet fuel composition or compositional range.
They rather specify a combination of minimum and maximum requirements for physicochemical and performance properties and allowed fractions of certain chemical families as well as trace compounds, e.g., antioxidants. In combination with the approved production routes, these specification requirements implicitly constrain the range of possible jet fuel compositions. To illustrate the compositional restriction that results from the property and composition requirements, Figure 1.2 shows a simplified schematic ternary diagram after de Klerk for the compositional range of conventional crude oil-based Jet A-1 fuel according to ASTM D1655 [29]. The possible compositional range of Jet A-1, indicated in gray, is graphically restricted by the minimum and maximum requirements of the specification, e.g. the minimum aromatic content and the maximum freezing point.
Figure 1.2: Restriction of possible fuel composition after ASTM D1655 for conventional crude oil-based jet fuels [29].
At the time of writing, ASTM D1655 and D7566 hold specification requirements that are classified into the following categories: composition, volatility, fluidity, combustion, corrosion, thermal stability, contaminants and additives. All properties and compositions of a fuel have to be measured with approved analytical methods that are also stated in the respective specification.

ASTM D4054
ASTM D4054 describes the process for approving a new SAF production path and creating a respective specification that can be included as an annex in ASTM D7566. ASTM D4054 is based on the experiences from the approval process by the British Ministry of Defence for the first synthetic jet fuel by the company Sasol in 2009 [30].
It was developed as a guide by original equipment manufacturers (OEM) of the aviation industry with the support of ASTM members and includes property and composition targets that are known to impact the performance of the turbine engines and fuel system [31]. The approval process consists of three parts: 1) Initial screening, 2) Follow-on testing and 3) Balloting and approval. Each part has to be completed successfully before the next one can begin. Figure 1.3 shows a schematic flow diagram of the approval process, with the two testing phases followed by the balloting and approval.
Figure 1.3: Flow diagram of the approval process of a new aviation turbine fuel after ASTM D4054 [24].
The test programs of phases 1 and 2 comprise four tiers that have to be completed successfully, also in sequential order. A fuel is tested for its specification properties in Tier 1, followed by fit-for-purpose properties in Tier 2, component and rig tests in Tier 3 and finally engine tests in Tier 4. If a later test tier fails, there is a risk that the entire sequential testing process will have to start all over again. All required tests and compositional analyses require a substantial volume of the fuel candidate. Tier 1 and Tier 2, in which predominantly physicochemical properties and the chemical composition are measured, demand 200 liters of fuel and around 50 000 US dollars in testing costs. Tier 3 and Tier 4 can demand between 100 000 and 450 000 liters and around 4 million US dollars in testing costs. Tier 3 and Tier 4 investigate, among other things, the spray characteristics, the ignition behavior, the cold start and lean blow-out, as well as the operability and performance of the fuel candidate [24,31]. The extent of the tests in Tier 3 is determined by engine OEMs based on the results of Tier 1 and Tier 2.
Similarities in the chemical composition or the measured properties are considered and influence the extent of testing in Tier 3 and Tier 4 and therefore the required fuel amounts [31]. After the extensive tests of phases 1 and 2 have been passed successfully, research reports stating the test results are prepared and passed to the OEMs for their internal review. The report of phase 2 furthermore has to give a detailed description of how the production process will be controlled to ensure that the fuel produced in commercial quantities has the same quality as the tested fuel. In the review, OEMs, the Federal Aviation Administration (FAA) of the US Government and the ASTM decide if the new fuel candidate fits an existing annex in ASTM D7566 or if a new one has to be created [31]. The specification changes in phase 3 are the final gate of the approval process, in which the research report is balloted for comment and approval and for the creation of a new annex in ASTM D7566. The balloting process allows diverse groups of stakeholders from other areas of the fuel and petroleum community to review the report and note concerns that might require additional measurements to be added in the new specification annex or stop the approval process entirely [31]. The complete ASTM D4054 approval process can take up to several years and requires a sustained commitment, millions of US dollars and up to hundreds of thousands of liters of fuel for testing [24,31]. Since fuel producers that seek approval often cannot provide the necessary fuel amounts for testing, a “fast-track” approval process was added as annex 4 in ASTM D4054 in September of 2020. It reduces the approval to Tier 1 testing and selected tests from higher tiers, a fast-track research report reviewed by OEMs and FAA, and the balloting and specification change [21,32]. New production paths approved via the fast-track process are, however, limited to a maximum blending fraction of 10 vol% [32].
1.2.2 Jet fuel prescreening

The need for an even faster, less fuel- and cost-intensive process for the assessment and approval of a new SAF candidate was stated by OEMs and ASTM in CAAFI 2014 [33]. Based on findings from the subsequent research projects NJFCP [22] and JETSCREEN [23], Heyne and Rauch developed a concept for an accelerated assessment process called jet fuel prescreening [24]. This concept makes it possible to assess the chances of a fuel passing the actual approval process at an early stage of development, with minimal costs and fuel volume, using model-based property predictions [24]. Based on the results of the screening, a fuel producer can redesign the fuel composition to optimize it accordingly. The concept focuses on the assessment of the fuel composition and the evaluation of critical fuel properties, especially properties that influence operability and safety issues, which may not be directly exhibited until Tier 3 and Tier 4 of the approval process. The research projects identified a short list of eight properties that have a critical impact on aircraft, engine and ground handling [24]. The eight properties are summarized in Table 1.

| Property | Unit | Dependency | Min | Max |
|---|---|---|---|---|
| Density | kg/m³ | 15 °C | 775 | 840 |
| Kinematic viscosity | mm²/s | -20 °C | | 8 |
| | | -40 °C | | 12 |
| Surface tension | mN/m | 20 °C | | |
| Net heat of combustion | MJ/kg | | 42.8 | |
| Flash point | °C | | 38 | 68 |
| Freezing point | °C | | | -40 |
| Derived cetane number | - | | 30 | |
| Distillation line | °C | 10 vol% | 150 | 205 |
| | | 50 vol% | 165 | 229 |
| | | 90 vol% | 190 | 262 |
| | | 100 vol% | | 300 |
| | | T50 − T10 | 10 | |
| | | T90 − T10 | 40 | |

Table 1: Critical jet fuel properties for jet fuel prescreening.

With the exception of the cetane number, tests for all listed critical properties are part of the Tier 1 and Tier 2 test programs of part 1 of the ASTM D4054 approval process.
Detailed descriptions of the properties and their importance for aircraft, engine and ground handling are given in individual paragraphs in the Supplementary Material A.

To test the composition and the outlined critical properties, the prescreening process provides two test tiers, Tier α and Tier β. Tier α is a screening based on the analyzed fuel composition with model-based property prediction, and Tier β consists of experimental property measurements that verify predictions with particularly high uncertainties [24]. For Tier α, a fuel sample of just 5 mL is required. From this sample, the fuel composition is characterized using the analytical GCxGC method [24]. The predictive models subsequently predict the outlined critical properties based on this compositional measurement. Tier β requires 150-500 mL of fuel, depending on the conducted tests. Based on the findings of Tier α and Tier β, figures of merit for the performance in spray and engine operations, relevant for Tier 3 and Tier 4, can be estimated [24].

For a screened jet fuel candidate to have high chances of passing the actual ASTM D4054 test program, the estimated and measured properties of Tier α and Tier β should lie inside the specification limits set by ASTM D4054 and ASTM D7566. If properties lie outside the specification limits, the jet fuel composition should be redesigned to meet the specification in the next iterations. Figure 1.4 shows an exemplary case for the screening of a property for the two fuels A and B. While the property value of Fuel A lies inside the allowed value range, the property value of Fuel B lies below the lower specification limit, as indicated in red. The composition of Fuel B therefore has to be adjusted and screened again in another iteration.
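The comparison of a predicted property with its specification limits, as illustrated for Fuel A and Fuel B, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the limit values are taken from Table 1, while the dictionary keys, the function name `screen` and the three-way classification ("pass", "at risk", "fail") of a predicted interval are hypothetical and not part of the prescreening concept as published.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    """Lower and upper specification limit; None means unbounded."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None


# Limit values copied from Table 1; the key names are hypothetical.
LIMITS = {
    "density_15C_kg_m3": Limit(minimum=775.0, maximum=840.0),
    "viscosity_minus20C_mm2_s": Limit(maximum=8.0),
    "flash_point_C": Limit(minimum=38.0, maximum=68.0),
    "freezing_point_C": Limit(maximum=-40.0),
}


def screen(prop: str, low: float, high: float) -> str:
    """Classify a predicted interval [low, high] against the limits.

    Assumed (illustrative) classification:
      "pass"    : the whole interval lies inside the limits
      "at risk" : the interval overlaps a limit
      "fail"    : the whole interval lies outside the limits
    """
    if low > high:
        raise ValueError("low must not exceed high")
    limit = LIMITS[prop]
    lo_bound = float("-inf") if limit.minimum is None else limit.minimum
    hi_bound = float("inf") if limit.maximum is None else limit.maximum
    if high < lo_bound or low > hi_bound:
        return "fail"
    if low < lo_bound or high > hi_bound:
        return "at risk"
    return "pass"


# Fuel A lies inside the density range, Fuel B below the lower limit.
print(screen("density_15C_kg_m3", 790.0, 800.0))
print(screen("density_15C_kg_m3", 760.0, 770.0))
```

Passing an interval rather than a single value leaves room for the prediction uncertainty: a candidate whose interval overlaps a limit would be a natural case for Tier β measurements.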
Ideally, the compositional redesign of Fuel B can be conducted virtually using simulative tools to further save time and fuel volume, removing the need for cost- and time-intensive iterations of the process parameters. This requires comprehensive knowledge about the relationships between fuel composition and all relevant properties, as well as appropriate simulative tools of the production process.

Figure 1.4: Schematic illustration of the screening plots for Fuel A and Fuel B as part of the jet fuel prescreening.

1.2.3 Challenges for the prescreening concept implementation

Parts of the prescreening concept were implemented and tested in the scope of the JETSCREEN project to assess the availability and adequacy of predictive property models for the Tier α testing [23]. The assessment identified limitations and challenges in both the availability and adequacy of the models, which consequently limited the application of the prescreening concept [34]. The different limitations and challenges are explored in more detail in the following.

Large variety of possible jet fuel compositions

As outlined in Section 1.2.1, jet fuel specifications do not directly specify an allowed composition range but rather the limits for the possible value range of jet fuel properties. As a result, the compositions of jet fuel candidates that enter the screening process can differ drastically from the compositions of fuels from known and approved production routes. To illustrate the variation of the possible composition range, Figure 1.5 shows plots of the composition of four representative synthetic fuels from the DLR Jet Fuel Database (FT-SPK, HEFA-SPK, ATJ-SPK and IH2) and of one conventional fuel, Jet A-1, as a reference. With the exception of the IH2 fuel, all fuels are already approved by ASTM. Detailed descriptions of the different approved and pending fuel types and their corresponding production paths are given in Supplementary Material B.
Figure 1.5: GCxGC measurement plots of conventional Jet A-1 fuel (upper left), FT-SPK (upper right), HEFA-SPK (middle left), ATJ-SPK (middle right) and IH2 fuel (lower left).

The comparison of the plots visualizes the drastic compositional differences between the synthetic fuel types and the known and established conventional Jet A-1 fuel on the upper left. While the Jet A-1 fuel shows a broad Gaussian-like distribution for all considered families over the number of contained carbon atoms, the compositions of the synthetic fuels are dominated by one or two families with distinct distributions. The compositions of FT-SPK and HEFA-SPK are dominated by n-alkanes and iso-alkanes, and that of the IH2 fuel by mono- and bi-cyclo-alkanes. The composition of the ATJ-SPK is made up almost entirely of two iso-alkanes with 12 and 16 carbon atoms. The wide variety of fuel compositions and the constant formulation of new candidates can pose a challenge to predictive models, as they may be confronted with fuels for which they have not been developed and validated.

Limitations in compositional analytics of the fuel composition

Both the compositional comparison of a new jet fuel candidate with already approved fuels and the property modeling of fuels require a compositional characterization of the candidate. Various analytical measurement methods can be applied for the characterization of a jet fuel composition. ASTM D4054 lists both mass spectrometry after ASTM D2425 [35] and high-performance liquid chromatography after ASTM D6379 [36]. However, these methods only yield information about the cumulative fraction of compounds from the different hydrocarbon families. The accurate modeling of fuels requires more detailed information beyond the hydrocarbon family. Most modern laboratories use two-dimensional gas chromatography (GCxGC) for the compositional analysis of jet fuels [24,37,38].
This measurement method uses two sequential gas chromatography columns for the separation of the fuel constituents, followed by a mass spectrometer or a flame ionization detector [39]. The two gas chromatography columns allow for a more precise identification of the fuel components, both by their chemical family and by the number of carbon atoms they contain. However, the identification of the exact chemical component or isomer is currently not always possible for jet fuels due to the overlay of measurement signals [40,41]. Figure 1.6 illustrates this for an exemplary GCxGC measurement of conventional Jet A fuel. The colors in Figure 1.6 indicate the strength of the signal and the detected fraction of a fuel component, going from no signal (blue) to medium signal (green) to high signal (red). Figure 1.6 shows that signals lie in part very close to each other and can overlap. A clear classification of every signal and the identification of every individual component are therefore not possible.

Figure 1.6: GCxGC measurement signal of Jet A fuel [42].

The unidentified isomers can have drastically different property values, which consequently affects the uncertainty in the property value of the fuel. For example, the freezing points recorded in the created database for iso-alkanes containing 10 carbon atoms range from -110.15 °C to 12.6 °C. These differences become increasingly significant if a fuel composition is dominated by one or two families or even distinct components, as for the ATJ-SPK. For the fuel prescreening, the models are required to reflect this uncertainty as part of their prediction. Furthermore, the predicted uncertainties have to be put into context with the limits of the fuel specification to illustrate the potential risk of accepting a prediction.
Availability and adequacy of state-of-the-art fuel property models

To be applied for the Tier α prescreening, predictive models need to be 1) available, 2) able to model the large possible composition space and reflect existing uncertainty, and 3) able to predict adequate results. Models able to predict fuel properties on the basis of composition measurements have been investigated and developed since the 1950s [43]. In a recent publication, Vozka and Kilaz reviewed published fuel property models and compared them based on accuracy metrics provided by the respective authors [43]. The review lists possible models for six of the eight required properties that are able to predict on the basis of GCxGC measurements. All recommended models, e.g. by Shi et al. [40] and Vozka et al. [43], are deterministic data-based correlation models. These modeling methods directly correlate the GCxGC measurement, or averaged values of representative species, with the property and return a single value. This means that uncertainties, e.g. due to unidentified isomers or other sources, cannot be reflected by the models and that the outlined prescreening requirements can therefore not be fulfilled. Furthermore, the provided accuracy metrics, in most cases averaged prediction errors, do not allow an estimation of the adequacy of the models for the application of prescreening. This is because the composition range of the fuels used for the publication and the calculation of the accuracy metric might not cover the composition range relevant for the screening. The ability of the models to predict adequate results is therefore highly uncertain. To assess the adequacy of the models, they must actually be tested on a representative selection of fuels relevant to prescreening, such as those shown in Figure 1.5.
To actually assess the adequacy of available models, three different state-of-the-art property models were investigated in the scope of the JETSCREEN project for their ability to adequately predict properties for a selection of conventional and synthetic fuels [34]. The project compared the Representative Fuel Generator (ReFGen) model and the Quantitative Structure Property Relationship (QSPR) model of the French Institute of Petroleum (IFPEN) as well as the Continuous Thermodynamics Model (CTM) of the German Aerospace Center (DLR) by Le Clercq [44]. Models for three of the eight properties were available for the comparison. Like the models of Vozka and Shi, these models are deterministic and approximate the fuel either by representative species, as in the ReFGen and QSPR models, or by fitted distributions for the hydrocarbon families, as in the CTM model. Inherent uncertainties of the GCxGC measurements could therefore not be reflected, and the outlined requirements were also not met. The assessment of the models showed in part significant deviations of up to 47 % from measurement data for the CTM model, especially for synthetic fuels [34]. The models were therefore rated inadequate for prescreening purposes. The deviations were explained by the simplified fuel representation of the models, which approximate the composition by only a few isomers and/or distributions for the families, as well as by their original fields of application, with the CTM model being mainly optimized for conventional fuels.

This review illustrates the need for new property models, developed and tested specifically for the use case of jet fuel screening and design. Existing models do not fulfill the identified prescreening requirements and were found to be inadequate or were tested with metrics that do not guarantee their predictive capability for the intended application.
Following the increased development of data-based models, new modeling methods should be explored and created using newly developed Machine Learning algorithms. The new models should be tested using predictive capability metrics that allow the assessment of their adequacy for the intended application of jet fuel prescreening.

1.3 Objectives and research questions

As outlined in the preceding sections, the aviation industry and fuel producers are in need of linking concepts that accelerate and streamline the development and approval of new sustainable aviation fuels. Recent scientific work has developed such concepts, like the prescreening concept introduced by Heyne and Rauch [24]. The prescreening concept itself, however, relies on predictive models and a comprehensive understanding of the relations between fuel composition and properties to optimize the fuel composition. This thesis investigates whether and how new data-based models are able to provide the tools for the outlined prescreening process. The research aims to develop both property models for predicting critical fuel properties and design tools to optimize fuel candidates for approval.
The main objectives of this study are:

• Development of models for the prediction of the critical jet fuel properties from GCxGC composition measurements under consideration of uncertainties
• Development of an adequate database for the development and testing of the models
• Development of predictive capability metrics to assess the adequacy of the models for the application of jet fuel prescreening
• Development of tools for the jet fuel design based on the investigation of the relationships between fuel composition and the critical properties
• Finally, the application of the developed tools for the screening and design of new jet fuel candidates to optimize their chances for approval

To fulfill these objectives, this work focuses on the use of data-based methods both for the development of the models and for the investigation of fuel composition and property relations. For the development of the necessary database, already existing data from different sources and databases is utilized. This reduces the need for extensive measurement campaigns of our own and allows the focus to be placed on different data-based modeling methods and correlation algorithms from the field of Machine Learning. Critical gaps in the utilized data as well as limitations that hinder the development of accurate and reliable tools are furthermore outlined, along with recommendations for future research.

1.4 Chapter outline

To answer the research question and address the individual objectives, this work is structured into the following dedicated chapters. In Chapter 2, the theory and inner workings of the developed modeling methods are presented. The chapter furthermore contains sections on the model training and validation, as well as on the developed predictive capability metrics and assessment process. The database for the development of the predictive models and the fuel design is described in Chapter 3.
This chapter describes and characterizes the data and illustrates the utilized preprocessing and outlier detection. Chapter 4 presents the results of the validation and adequacy assessment of the predictive models. The models are compared with each other as well as with established models from the literature to relate their predictive capability to known modeling approaches. The adequacy of the models for the jet fuel screening is subsequently assessed based on a simulated prescreening of three fuels, which were excluded from the training and validation. In Chapter 5, the influence of structural aspects of fuel components on the considered properties is investigated. The influence of the chemical family, size and branching of the fuel components on the different properties is summarized in tools that serve as the basis for the subsequent fuel design. Chapter 6 applies all developed tools in a combined workflow of fuel screening and design for a real jet fuel candidate. The original jet fuel candidate and reformulated fuel variants, created as part of the fuel design, are screened to assess their chances as potential applicants for the approval process. In a subsequent blending study, the variant with the highest chances for approval is blended with a representative selection of conventional fuels to estimate its maximum blending fraction and its potential as a synthetic blending component. Chapter 7 relates the results of this work to the set research question and objectives. Based on this discussion, possible next steps and recommendations for further research are suggested.

2 Fuel Property Modeling

The modeling of physicochemical properties has always been of great interest both for the scientific fuel community and the fuel industry.
The ability to predict properties, e.g. the net heat of combustion of a fuel, solely based on its composition reduces the need for respective measurements, thereby saving time and cost and allowing the assessment of fuels [24] or subsequent simulations of processes like evaporation [45,46]. Fundamentally, there exist two approaches to modeling a fuel: 1) modeling a fuel as a mixture of constituents and 2) modeling a fuel as an entity. The first approach describes a fuel as a mixture of more fundamental, underlying constituents that either exist in the fuel as components or are sufficiently representative of the fuel composition. The bulk property of the fuel is calculated from the property values of the individual constituents using an adequate mixing rule. The second approach directly correlates a compositional measurement, e.g. a GCxGC measurement, or a chemometric measurement signal of the fuel with the physicochemical properties using a regression algorithm. In contrast to the first approach, it does not rely on mixing rules.

Over the years, research has produced several modeling methods for both approaches. The development was strongly coupled to the available compositional analytics, the availability of data for the development of the models and their desired applications. Figure 2.1 shows a schematic illustration of the family tree of the two approaches and their respective modeling methods, which are explained in the following.

Figure 2.1: Family tree of approaches for the modeling of fuels.

For the mixture of constituents, two major modeling methods emerged over the years: modeling a fuel as a mixture of discrete components and modeling a fuel as a mixture of continuous distributions. The first method describes a fuel as a mixture of pure components that have either been identified as fuel components or are assumed to exist in the fuel and are sufficiently representative.
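The idea of computing a bulk property from discrete constituents can be sketched as follows. This is a minimal example under stated assumptions: a simple linear, fraction-weighted mixing rule is used purely for illustration (the text only requires "an adequate mixing rule", and many properties call for nonlinear rules), and the component names and property values are hypothetical placeholders rather than measured data.

```python
def bulk_property(fractions: dict, values: dict) -> float:
    """Linear mixing rule: weighted mean of the component property values.

    fractions: component name -> mass (or volume) fraction
    values:    component name -> property value of the pure component
    The fractions are normalized, so they need not sum exactly to one.
    """
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("fractions must sum to a positive value")
    missing = set(fractions) - set(values)
    if missing:
        raise KeyError(f"no property value for: {sorted(missing)}")
    return sum(fractions[name] / total * values[name] for name in fractions)


# Hypothetical three-component fuel with placeholder values
# (illustrative numbers only, not taken from any database).
composition = {"component_a": 0.5, "component_b": 0.3, "component_c": 0.2}
net_heat_mj_per_kg = {
    "component_a": 44.0,
    "component_b": 43.5,
    "component_c": 43.0,
}

print(bulk_property(composition, net_heat_mj_per_kg))
```

For a discrete component model, the dictionary keys would correspond to the identified or representative species; the choice of mixing rule depends on the property and is not fixed by this sketch.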
The second method does not require the identification or assumption of individual fuel components. Instead, the fuel is modeled as a mixture of continuous family distributions, whose parameters are calculated from the compositional measurement, e.g. the GCxGC measurement. Both methods estimate the bulk property of the fuel from the property values of the individual constituents. For models using the method of discrete components, the property values of the fuel components can be provided from predictions, as in the Discrete Component Model (DCM) of Le Clercq of the German Aerospace Center (DLR) [45,47], or from measurements, as in the model of Yang et al. [48]. Mixture of continuous distribution models, on the other hand, like the DLR Continuous Thermodynamics Model (CTM) of Le Clercq [44], predict the property for each family by an underlying correlation that relates the family distribution to the property [30].

The utilization of models from the two methods is often strongly restricted to the use case the models were designed for (e.g. fluid dynamic simulation) as well as by the available compositional analytics and validation data. For the use case of simulating complex physical phenomena, e.g. evaporation, computational limitations often constrain the number of possible fuel constituents, since each constituent requires its own set of equations for the mass balance. For discrete component models, this limits the number of fuel components to one representative compound per family and carbon number, as for the DCM model of Le Clercq, or even fewer if the fuel is approximated using a surrogate, e.g. the model of Bell [49]. CTM models are especially suitable for the study of complex physical phenomena like evaporation [46].

The approach of modeling a fuel as an entity stands in strong contrast to the presented methods of modeling a fuel as a mixture of constituents.
Both the chemometric and the direct correlation methods directly correlate the compositional measurement of a fuel as a whole, or the chemometric measurement signal, with the physicochemical properties. Information and assumptions about potential fuel components are not necessary. The first models of the entity approach were developed by Cookson et al. [50–54] in the 1980s. They followed the direct correlation method and correlated the mass or volume fraction of the identified hydrocarbon families with the properties using a multilinear regression algorithm. The fractions of the hydrocarbon families were determined using GC, nuclear magnetic resonance spectroscopy and high-pressure liquid chromatography. Chemometric models for jet fuels were first developed by Morris et al. for the US Navy to predict critical fuel properties from near-infrared absorption spectra [55,56].

Direct correlation methods have the distinct advantage of using evaluated measurements in a standardized format. The standardized format allows the utilization of measurements from multiple different laboratories for the training of the direct correlation models. For chemometric models, the standardization of measurement signals is very challenging, which often limits the usable data to one reference laboratory or one particular measurement apparatus. With the increasing use of GCxGC for the compositional analysis of fuels, direct correlation methods with GCxGC measurements as input were developed, inter alia by Shi et al. [40] and Vozka et al. [43].

The modeling methods described up to this point are all deterministic, meaning they predict one property value for a given fuel composition. This is inherent to the described modeling approaches, where the DCM model of Le Clercq uses just one species to represent a family with a certain carbon number and the outlined direct correlation methods solely use deterministic correlation algorithms.
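The direct correlation method described above, in which hydrocarbon family fractions are related to a property through multilinear regression, can be illustrated with a short sketch. It is an assumption-laden toy example, not the model of Cookson et al.: the family fractions and "true" coefficients are synthetic placeholders, the fit is an ordinary least-squares solution of the normal equations in pure Python, and no intercept is used because the fractions sum to one (an intercept would be collinear with them).

```python
def fit_multilinear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    return coef


def predict(coef, fractions):
    """Property estimate as a linear combination of family fractions."""
    return sum(c * x for c, x in zip(coef, fractions))


# Synthetic training fuels: fractions of four hypothetical families
# (n-alkanes, iso-alkanes, cyclo-alkanes, aromatics), each row sums to 1.
X = [
    [0.60, 0.20, 0.10, 0.10],
    [0.15, 0.60, 0.15, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.10, 0.20, 0.60],
]
true_coef = [750.0, 760.0, 800.0, 870.0]  # placeholder values
y = [predict(true_coef, row) for row in X]

coef = fit_multilinear(X, y)
print([round(c, 3) for c in coef])
```

A model of this kind returns exactly one value per composition, which is the deterministic behavior discussed in the text.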
Research conducted on jet fuel modeling and screening by the University of Dayton [48] and the JETSCREEN project [47] revealed that deterministic modeling methods are not sufficient to adequately predict the desired properties. Returning just one property value and neglecting uncertainties that inherently exist, e.g. due to unidentified isomers, proved to be insufficient. This is especially problematic for synthetic jet fuels, where differences in the properties are significant for isomers of a family at a certain carbon number and need to be reflected in the modeling. The consideration of uncertainties induced by unidentified isomers in the GCxGC measurement was therefore directly adopted and implemented in the prescreening process by Heyne and Rauch [24].

To create a modeling method tailored for fuel screening, Yang et al. [48] developed a probabilistic discrete component model. This modeling method considers multiple possible isomers by sampling property values of isomers assumed to be present in the fuels from a measurement database using Monte-Carlo sampling [48]. This improved the predictive capability of the model by allowing the estimation of a possible value range of the fuel property, reflecting the inherent uncertainty of the modeling problem. However, the model of Yang et al. proved to rely strongly on the availability of property measurements of multiple isomers, which are often not available in current property databases. If the number of available measurements is too low or the set of available isomers is not representative, deviations and invalid uncertainty estimations can occur [48].

This thesis extends selected modeling methods of the previously outlined work, with the explicit goal of tailoring these models to the unique needs of jet fuel screening and design processes. The developed models should be able to accurately and reliably predict properties and uncertainties for the prescreening process.
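The probabilistic idea behind Monte-Carlo sampling of isomer properties can be sketched as follows. This is a hedged illustration and not the implementation of Yang et al.: it assumes that each GCxGC group (family and carbon number) comes with a list of candidate isomer property values, that one isomer per group is drawn uniformly at random in every sample, and that a linear mixing rule applies. The group names, the values and the function names are hypothetical.

```python
import random
import statistics


def sample_bulk_property(composition, isomer_values, n_samples=5000, seed=0):
    """Draw one candidate isomer value per group and mix linearly.

    composition:   group name -> fraction of the fuel
    isomer_values: group name -> list of candidate property values
    Returns a list of n_samples bulk property values.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    total = sum(composition.values())
    samples = []
    for _ in range(n_samples):
        value = 0.0
        for group, fraction in composition.items():
            value += fraction / total * rng.choice(isomer_values[group])
        samples.append(value)
    return samples


def summarize(samples, lower=0.025, upper=0.975):
    """Mean and an empirical interval taken from the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    low = ordered[int(lower * (n - 1))]
    high = ordered[int(upper * (n - 1))]
    return statistics.mean(ordered), low, high


# Hypothetical two-group fuel; the numbers are placeholders only.
composition = {"iso-alkanes C10": 0.7, "n-alkanes C12": 0.3}
isomer_values = {
    "iso-alkanes C10": [1.0, 2.0, 3.0],  # three candidate isomers
    "n-alkanes C12": [10.0],             # a single, identified species
}

samples = sample_bulk_property(composition, isomer_values)
mean, low, high = summarize(samples)
print(mean, low, high)
```

The width of the resulting interval depends directly on which isomer values are available for each group, which mirrors the limitation noted above: with too few or unrepresentative isomers, the estimated range can be misleading.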
The identified limitations of the outlined deterministic modeling methods and the dependence of the probabilistic method of Yang et al. on measurement data should thereby be overcome. For this, probabilistic models from both modeling approaches are developed to model jet fuels both as entities and as mixtures of constituents. From the two approaches, three different modeling methods are derived: 1) Monte-Carlo sampling of predicted fuel component properties, 2) direct probabilistic correlation and 3) Mean Quantitative Structure-Property Relationship (M-QSPR) modeling. The M-QSPR is a specifically developed hybrid method that has characteristics of both the mixture-of-constituents approach and the entity approach. It models a fuel as an entity, but it requires a selection of representative components. The method therefore sits between both approaches in the family tree of Figure 2.1.

The three methods differ fundamentally from each other. This allows the comparison of their individual advantages, disadvantages and limitations. Furthermore, potential benefits of using multiple modeling methods simultaneously for the use cases of jet fuel screening and design can be investigated. As part of this work, the developed models are also compared with existing models from the literature to outline the benefits of probabilistic modeling, both with respect to accuracy and the additional value of the estimated uncertainty.

2.1 Principles of data-based modeling methods

Data-based models have long been used in the field of fuel property modeling, and most of the methods presented in the introduction of this chapter are data-based. This section outlines the principles of data-based modeling in comparison to physical modeling and elaborates the rationale for the use of data-based methods in the field of fuel property modeling.
The differentiation of the physical and data-based modeling approaches requires a comparison of the underlying modeling philosophies and procedures. On a fundamental level, all modeling approaches for physical applications have the same intention: the replication of an objective reality in order to simulate possible events as a basis for present decisions, for which experience or measurements are missing [57]. However, the physical and data-based modeling approaches differ significantly in the way they replicate objective reality.

Physical modeling approaches rely on a combination of known physical theory and observations derived from measurements. The theory must either be derived from the measurements themselves or be already available from previous evaluations. Based on both theory and measurement, a conceptual model is prepared by human analysis, often in the form of a mathematical formula. This mathematical formula can then be implemented as a computational model, validated and, if the validation is successful, utilized to simulate the desired events. A schematic flow diagram of the physical modeling process is given on the left of Figure 2.2 [57].

Figure 2.2: Spectrum of modeling approaches from physical models and empirical models to data-based models.

Many scientists consider this modeling philosophy and procedure the true way of scientific modeling. However, physical approaches rely on three necessary preconditions that make the modeling procedure possible in the first place: 1) a problem of humanly comprehensible complexity and a physical theory that either already exists or can be derived from the available measurement data or knowledge; 2) measurements that allow the derivation of all important and influential features; and 3) a modeling problem that can be simulated with existing computational resources [57]. The three preconditions of course influence each other and are themselves interdependent, e.g.
measurement methods often depend on previous knowledge about the features of interest, which can only be derived if the problem itself is comprehensible for current human understanding. It can also happen that the necessary theory does not fully exist yet, that the available measurement methods are not able to identify all influential features, or that the problem exceeds current human understanding. In these cases, reality can be approximated using data-based approaches.

Data-based approaches approximate: they do not exactly simulate the underlying mechanisms of reality but imitate them based on previously made observations using correlation algorithms [58]. They therefore do not rely on a fully existing theory and on measurements with all influential features, but rather try to approximate the problem with available formulas, data and computational resources. The conceptual data-based model is not primarily derived from human understanding and identified physical formulas but directly from the observed data [57]. The approach assumes that the available measurement methods are able to capture data that intrinsically provides enough variance and influential features for a sufficient conceptual and mathematical model to be derived from it. Data is therefore generally needed in greater amounts. Data-based models can be differentiated into empirical and Machine Learning models [57]. For empirical models, the conceptual model is derived from human analysis such as investigating the data using statistical analysis