The fate of the second vast group of molecules in the bio- and geosphere is governed by the rather fundamental restraints of thermodynamics and kinetics (Figs. 1 and 2). In these intricate materials, the “classical” signatures of the (geogenic or ultimately biogenic) precursor molecules, like lipids, glycans and proteins, have been attenuated [26, 27], often beyond recognition, during a succession of biotic and abiotic (e.g. photo- and redox chemistry) reactions. Because of this loss of a biochemical signature, these materials can be designated nonrepetitive complex systems. The quantity of molecules in the Earth’s crust that can be attributed to these nonrepetitive complex materials, in the form of kerogens and natural organic matter (NOM) alone, exceeds the quantity of functional biomolecules by several orders of magnitude [28, 29]. Examples include freshwater, marine, and soil organic matter, kerogens and aerosols, among others. These materials typically exhibit an extremely complex array of chemical structures and interactions across a large range of size- and timescales, resulting in molecular signatures that reflect the fundamentals of chemical binding rather than those of their precursors. These novel signatures may in fact cover a sizable proportion of the theoretically feasible molecular composition space (Fig. 9). This extraordinary heterogeneity of molecularly diverse species renders these materials refractory and also implies a limited probability of detecting identical molecules [18]. This contrasts sharply with even the most complex mixtures of biomolecules extracted from any living organism, from which molecularly pure fractions can be readily obtained.
Given these unique features, nonrepetitive complex systems epitomize supermixtures. The purification of a supermixture would, in the ultimate sense, approach a molecule-by-molecule separation—a feat beyond our reach, both conceptually and practically. Therefore, these complex nonrepetitive systems are operationally defined according to their properties rather than according to their chemical structures, and their purification (in the conventional sense of the word) remains elusive [18].
While the analysis of complex biomolecules has advanced to the degree that it is possible to obtain well-resolved three-dimensional molecular structures and even meaningful descriptions of dynamics and interactions [30–35], the molecular-level precision analysis of complex nonrepetitive materials remains rather rudimentary in comparison [20, 36–40]. First of all, theoretically well-founded approaches to numerically describe the complex, polydisperse and nonstoichiometric characteristics of nonrepetitive unknowns are missing at present, limiting our understanding of molecular structures and any application of quantitative structure–activity relationships (QSAR) when modelling their properties. Novel approaches suitable for a quantitative description of various hierarchical levels of molecular organisation (e.g. elements, fragments, molecules) must be developed. Secondly, a meaningful molecular-level analysis of nonrepetitive systems—such as aerosols, natural organic matter and native cell extracts—obviously cannot rely on target analysis, as most of the chemical environments and linkages present are simply not known (Fig. 3).
Consequently, any comparative analysis of nonrepetitive unknowns with reference materials is very unlikely to provide satisfactory molecular resolution, because rather tiny variations in chemical binding may strongly and often unpredictably affect the properties commonly used for detection, such as retention times and spectral signatures. These fundamental restrictions that are intrinsic to comparative and target analysis are not easily circumvented and they necessitate an independent, spectroscopic “bottom-up” approach to the molecular-resolution characterisation of these complex unknowns.
) is sufficient to elaborate meaningful detail at molecular resolution from the most complex biological and biogeochemical mixtures
Bulk data of complex systems, like physical parameters, total acidity and elemental analyses, seem to be more precisely defined [42], but exhibit limited resolution. However, any sound structural model of these materials must conform to the constraints defined by these “hard” bulk data. High-energy methods of organic structural spectroscopy, like XANES, UV/VIS and infrared spectroscopy, exhibit intermediate structural resolution, which is sufficient to characterise specific chemical environments [43], for instance by functional group analysis (carbonyl derivatives, aromatics, heterocycles) in intricate materials.
In general, the degree of significant detail generated by a given analytical technique depends on both the intrinsic resolution of the method and the characteristics of the analysed material. Any mismatch between the resolving power of the technique and the intrinsic properties of the analyte is wasteful. Investigating near-featureless materials with methods of supreme resolution can incur unnecessary effort and expenditure. Conversely, insufficient resolution of an analytical method with respect to the properties of the analyte inevitably leads to intrinsic averaging, which typically produces poorly resolved properties (which affect the separation) and/or poorly resolved chemical environments (which affect the spectra). Intrinsic averaging is visualized in Fig. 4 in the form of images of ever-degrading resolution. Similarly, insufficient resolution deteriorates detail in spectra and chromatograms of complex nonrepetitive materials, producing low-resolution signatures and limited bandwidths of variance in bulk and spectral properties.
Hence, any organic structural spectroscopy with a limited peak capacity (Fig. 5) will inevitably lead to a summary bulk-type description of complex materials and considerable averaging, rather than to a meaningful molecular-level resolution analysis. In the case of NOM, this inevitable relationship has been observed in many spectroscopic, separation and chemical experiments, resulting in data with a remarkably limited bandwidth of variance, even when advanced techniques (e.g. at the level of one-dimensional solid-state 13C NMR spectroscopy) are used [44, 45].
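Intrinsic averaging can also be illustrated numerically. The following minimal sketch uses a hypothetical 16-point trace (not measured data) and two helper functions, `rebin` and `count_peaks`, that are introduced here purely for illustration. Averaging neighbouring points into coarser bins merges two resolved features into one.

```python
def rebin(signal, factor):
    """Average consecutive groups of `factor` points (coarser resolution)."""
    usable = len(signal) - len(signal) % factor
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


def count_peaks(signal):
    """Count strict local maxima, a crude proxy for resolved features."""
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    )


# Hypothetical trace with two narrow features separated by a dip.
trace = [0, 0, 0, 1, 5, 1, 0, 1, 5, 1, 0, 0, 0, 0, 0, 0]

fine = count_peaks(trace)              # two features at the original sampling
coarse = count_peaks(rebin(trace, 4))  # one feature after four-fold averaging
```

With a bin width larger than the spacing of the two features, only a single broad maximum survives, which is the one-dimensional analogue of the degrading images in Fig. 4.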
Analogously, the widespread use of the term HULIS (humic-like substances) in fields such as aerosol and remediation research [46–53] reflects both the operational definition of humic materials and our current inability to perform a meaningful molecular-level analysis of complex unknowns, as the materials currently denoted HULIS undoubtedly encompass a wide range of very different species.
Due to the huge peak capacity of FTICR mass spectrometry, FT mass spectra provide the most convincing direct experimental evidence for the extraordinary molecular diversity of complex materials at present. In these, the molecular-level intricacy of the complex unknowns is most adequately converted into very highly resolved and, consequently, extremely information-rich signatures.
Analogous considerations to those given here for spectroscopic characterisation also apply to the separation of complex materials [54, 55, 56].
The 13C NMR spectrum of Suwannee River fulvic acid (SuwFA) shown in Fig. 7 was acquired at the GSF with a Bruker (Bremen, Germany) AC 400 NMR spectrometer, operating at 100 MHz for 13C. FTICR mass spectra were acquired at Bruker’s facilities with a 9.4-T APEXq FT mass spectrometer (data in Fig. 9) and a 12-T APEXq FT mass spectrometer at the GSF (Fig. 8). Here, FTMS spectra were acquired with a time domain size of 1 MWord (Fig. 8a; Fig. 9, typical resolution 3 × 10⁵) or 4 MWord (Fig. 8b, typical resolution 7 × 10⁵). For Figs. 8a and 8b, elemental compositions were computed with the DataAnalysis software, version 3.4 (Bruker), using the following restrictions: C, H, N, O, unlimited; S, P, 0–5; H/C ratio < 3, mass error ≤ 0.5 ppm; observance of the nitrogen rule. Exactly one elemental formula was obtained for each peak. The elemental formulae of Fig. 9 were batch-calculated using a software tool written in-house, as described elsewhere [36].
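The formula restrictions listed above can be expressed as a simple filter. The sketch below is not the algorithm of the DataAnalysis software or of the in-house tool; it is a simplified illustration that assumes neutral molecules, omits S and P, and uses standard monoisotopic masses for N and O that are not quoted in the text. The function names and the measured mass are hypothetical.

```python
# Monoisotopic masses in Da; H is quoted in the text, the others are standard values.
MONO = {"C": 12.0, "H": 1.007825032, "N": 14.003074004, "O": 15.994914620}


def exact_mass(c, h, n, o):
    """Monoisotopic mass of the neutral molecule CcHhNnOo."""
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]


def ppm_error(measured, theoretical):
    """Relative mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def is_acceptable(c, h, n, o, measured_mass, tol_ppm=0.5):
    """Apply the H/C < 3, nitrogen-rule and mass-error restrictions."""
    if c < 1 or h / c >= 3:
        return False
    # Nitrogen rule for neutral C,H,N,O molecules: H and N counts share parity.
    if h % 2 != n % 2:
        return False
    return abs(ppm_error(measured_mass, exact_mass(c, h, n, o))) <= tol_ppm


# Hypothetical neutral mass, close to that of C10H10O3.
measured = 178.06299
candidates = [(10, 10, 0, 3), (11, 14, 0, 2), (9, 10, 2, 2)]
accepted = [f for f in candidates if is_acceptable(*f, measured)]
```

Of the three candidates at nominal mass 178, only C10H10O3 lies within 0.5 ppm of the assumed measured mass; the other two deviate by tens to hundreds of ppm.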
The two most influential organic structural spectroscopic methods for the investigation of complex materials, which depend upon high-precision frequency measurements, are NMR spectroscopy and FTICR mass spectrometry (Table 1, Figs. 6, 7). In NMR, the precession frequencies of individual atomic nuclei in an external magnetic field B₀ are influenced by their respective chemical surroundings; in FTICR mass spectrometry, the orbital frequencies of ions in an ion trap cell depend on the mass and charge of the molecule of interest [59]. Both methods are isotope-specific, and the combination of NMR and FTICR mass spectral data provides more useful spectral information on complex unknowns at the molecular level than any other spectroscopic method at present.

| Molecular-level resolution technique | Advantages | Current weaknesses and future developments |
|---|---|---|
| NMR spectroscopy | NMR spectroscopy provides isotope-specific information, in unsurpassed detail, on short-range molecular order (the arrangement of chemical bonds, including connectivities, stereochemistry and spatial proximity), dynamics [33, 74] and reactivity [35] | Relative insensitivity compared with other analytical techniques |
| | Nondestructive and isotope-specific [32] analysis across almost the entire periodic table combines with the most accurately defined near-quantitative relationship between the spin number and the area of the NMR signal. This key feature of NMR when applied to the analysis of complex systems implies the use of NMR spectrometry as a quantitative reference for other, complementary analytical methods [75–81] | Intricate physics and chemistry of intra- and intermolecular interactions in complex mixtures may interfere with the direct relationship between chemical shift and molecular structure and, because of relaxation-induced variable line widths, quantification |
| | The unique ability to generate and analyse data from multiple 1,2,3-D NMR experiments performed on a single sample enables the significance and authenticity of individual spectra to be assessed [82–84] | Near-identical chemical shifts do not necessarily imply similar chemical structures [85] |
| | Extensive and far-reaching information can be obtained, even from ill-resolved NMR spectra, for “small” nuclei (e.g. 1H, 13C, 15N, 31P) because of the plausible correspondence between chemical shift and extended substructures | Sensitivity and resolution increased by high-field magnets [86, 87], cryogenic [88] and micro- [89–92] probes, and by changing (to nanoliter) sample size [93, 94] |
| | | Throughput increased by using fast higher dimensional spectroscopy with superior sensitivity and through the parallel acquisition of NMR spectra [95–99] |
| FTICR mass spectrometry | Best combination of spectral resolution and sensitivity, which allows miniaturisation [100] and hyphenation of mass spectrometry with high-performance separation techniques like capillary electrophoresis and UPLC (CE/UPLC with mass-selective detection) | Molecular-level structural information is mainly restricted to ionizable compounds |
| | Due to its supreme mass accuracy and resolution, molecular formulae from thousands of compounds can be obtained in a single experiment directly from mixtures [63, 69, 101–104] | Isomer differentiation is a nontrivial task [71] |
| | Fragmentation provides further molecular-level structural information beyond molecular composition | Quantification is difficult, even for identical molecules in mixtures, because of the variable ionization efficiencies of individual compounds, which strongly depend on the experimental conditions and mixture composition |
| | | Column adsorption and fractionation as well as electrochemical and redox reactions associated with the spray conditions may interfere with authentic sample representation |
| | A wide range of ionization techniques (electrospray, ESI; chemical ionization, CI; photoionization, PI; desorption ionization, DESI; field ionization, FI, among others, all performed in either positive or/and negative modes) for mixtures is available under specifically adapted conditions [105–110] | |
| | Mass-selective imaging is feasible with high spatial and mass resolution; qTOF mass spectrometry allows for very fast scan rates, and is perfectly suited for hyphenation with high-performance separation techniques (CE and UPLC) as well as mass-selective imaging | Further miniaturisation of separation and detection devices in conjunction with ultrahigh-resolution FTICR mass spectrometry will permit highly resolved and information-rich data to be obtained from tiny amounts of sample (chip-MS) [100] |
| High-performance separation techniques (UPLC/HPLC and capillary electrophoresis) | Large separation capacity and extensive miniaturisation; is cost-effective; can be highly automated | Gives only limited structure-specific information about the short-range molecular order [111, 112] |
| | Electrophoretic mobility and chromatographic retention time carry structure-specific information, which can be adapted to a wide range of experimental conditions in order to probe size, shape, charge characteristics and reactivity | |
| | Sensitive and versatile suite of separation methods and of structure-specific (and nondestructive) detection systems, such as (laser-induced) fluorescence, UV/VIS, radioisotope or mass-selective detection | |
| | CE complements NMR information about primary chemical structures (covalent bonds) by providing data on the corresponding secondary and tertiary structure | |
| | Feasibility of up-scaling from capillary zone electrophoresis (CZE) to a preparative level by means of free flow electrophoresis (FFE) and from UPLC to any preparative LC method | Further miniaturisation offers hyphenation options down to single-cell analysis and compartments within |

Molecular-level resolution spectroscopic data represent projections of the vast total structural space of molecules, for which count estimates range from 10⁶⁰ to 10²⁰⁰ [72]. The complementarity of NMR and mass spectrometry for the spectral characterisation of intricate materials is caused by the entirely different atomic and molecular processes these methods rely upon (Fig. 6).
Mass spectra reflect the isomer-filtered complement of the entire space of molecular structures. The compositional space of molecules can be probed with ultrahigh-resolution FTICR mass spectrometry, resulting in single peaks for molecules (in the absence of fragmentation). Two-dimensional projections of the structural space, like van Krevelen diagrams and Kendrick mass defect analyses, are indispensable tools for the evaluation of mass spectra of complex materials (Figs. 8, 9, 10, 11) [73, 113, 114]. NMR spectra represent site- and isotope-specific projections of the molecular environments. Therefore, typical organic molecules exhibit single mass peaks (molecular ions) in mass spectra and more elaborate NMR signatures (Figs. 6 and 7). Because these atomic and molecular signatures are not entirely orthogonal, the data provided by NMR and MS exhibit correlations that can be used to reconstruct chemical structures by empirical and mathematical back-projection.
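The two projections named above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the monoisotopic mass of oxygen is a standard value not quoted in the text, and the Kendrick mass defect is computed with one common convention (CH2 as base unit, defect taken as nominal minus exact Kendrick mass), which may differ from the convention used for the figures.

```python
C, H, O = 12.0, 1.007825032, 15.994914620  # monoisotopic masses in Da
CH2 = C + 2 * H                            # exact mass of the CH2 repeat unit


def exact_mass(c, h, o):
    """Monoisotopic mass of the neutral molecule CcHhOo."""
    return c * C + h * H + o * O


def van_krevelen(c, h, o):
    """Return the (O/C, H/C) coordinates of a formula in a van Krevelen diagram."""
    return (o / c, h / c)


def kendrick_mass_defect(mass):
    """Rescale so that CH2 weighs exactly 14, then take nominal minus exact mass."""
    kendrick_mass = mass * 14.0 / CH2
    return round(kendrick_mass) - kendrick_mass


# Different formulae with identical elemental ratios share one diagram point.
point_a = van_krevelen(10, 10, 3)   # C10H10O3
point_b = van_krevelen(20, 20, 6)   # C20H20O6

# Members of a CH2 homologous series share one Kendrick mass defect.
kmd_a = kendrick_mass_defect(exact_mass(10, 10, 3))   # C10H10O3
kmd_b = kendrick_mass_defect(exact_mass(11, 12, 3))   # C11H12O3
```

Both projections discard information by design: the van Krevelen coordinates ignore molecular mass, and neither projection distinguishes isomers.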
FTICR mass spectra show supreme resolution, as indicated by the 12-T negative ionization ESI FT mass spectra of a barley extract (Fig. 8a) and IHSS Suwannee River Natural Organic Matter (International Humic Substances Society NOM; Fig. 8b). Here, CnHmOq molecules contribute most to the total ion count. These molecules can be arranged into series, which are related by the formal exchange of CH4 against oxygen. Figure 8c denotes the mass peaks corresponding to the 37 theoretically possible and chemically reasonable C,H,O-compositions depicted in Fig. 8d that have a nominal mass of 301 Da. Note that negative M−H⁺ ions (i.e. [M−H+e]⁻) are observed in the FTICR mass spectra (Fig. 8b), and the C,H,O-compositions of molecules M are denoted in Fig. 8c. M and M−H⁺ differ in mass by one hydrogen (1.007825032 Da) minus an electron (0.000548625 Da); in Fig. 8c this difference is decomposed into a mass shift of one (see the shift between the mass axes) and an additional small mass spacing Δm = 0.000233878 Da. The molecules in the barley extract exhibit mass peaks outside of the range accessible for any C,H,O-composition (dotted purple box in Fig. 8a), indicating the presence of additional heteroatoms (e.g. N, P, S) in these ions.
Figure 8d denotes a van Krevelen diagram of the 37 chemically reasonable CnHmOq molecules, in which the 16 C,H,O-ions observed in Fig. 8b are highlighted. The number of peaks identified corresponds to a coverage of 43% of the entire C,H,O-compositional space. These ions occupy an area for which the largest number of feasible C,H,O-isomers is expected (see Fig. 11).
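The formal exchange of CH4 against oxygen leaves the nominal mass unchanged but shifts the exact mass by a small, constant amount, which is why the members of such a series appear as separate peaks at one nominal mass. The sketch below illustrates this with C13H22 and C12H18O from the nominal-mass-178 series discussed later in the text; the helper names are hypothetical and the oxygen mass is a standard monoisotopic value not quoted in the text.

```python
MONO = {"C": 12.0, "H": 1.007825032, "O": 15.994914620}  # Da


def exact_mass(c, h, o):
    """Monoisotopic mass of the neutral molecule CcHhOo."""
    return c * MONO["C"] + h * MONO["H"] + o * MONO["O"]


def exchange_ch4_for_o(c, h, o):
    """One step along a series: remove CH4, add one oxygen atom."""
    return (c - 1, h - 4, o + 1)


start = (13, 22, 0)                     # C13H22
neighbour = exchange_ch4_for_o(*start)  # C12H18O

# Exact-mass spacing between adjacent members of the series, in Da.
spacing = exact_mass(*start) - exact_mass(*neighbour)
```

Under these mass values the spacing is about 0.0364 Da, independent of the starting formula, because it equals the difference between the exact masses of CH4 and O.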
Molecularly intricate materials, like natural organic matter (NOM), exhibit molecular signatures approaching the theoretical limits defined by the laws of chemical binding. In Fig. 9, a van Krevelen diagram of Suwannee River fulvic acid (SuwFA) depicts the elemental ratios of CnHmOq ions (the ions shown represent a consolidation of the ions obtained by ESI, APCI and APPI positive and negative ionization from 9.4-T FTICR mass spectra; unpublished data). The peaks observed in the negative/positive ionization mode only are coloured green/orange; peaks observed in both positive and negative modes are depicted in black. The lack of signatures from biochemical precursor molecules [123] indicates the considerable level of processing typical of NOM. Within a mass range of 200–700 Da and the given limits of the H/C and O/C ratios, the minimum consolidated number of individual C,H,O-molecular compositions (4270) represents a sizable fraction (23%) of the entire feasible compositional space of CnHmOq molecules (18414 in total; small grey dots). To further appreciate the remarkable intricacy of natural organic matter, it should be noted that any dot in the van Krevelen diagrams of these complex materials represents a projection of the elemental ratios derived from assigned molecular formulae, irrespective of molecular mass. Hence, the dots in the van Krevelen diagrams can represent multiple molecular formulae (Figs. 10 and 11), while any identified molecular composition reflects an intrinsic superposition of all feasible isomers (Fig. 11). Considering typical molecular weights of several hundreds of Daltons in the mass spectra of NOM (Fig. 6, bottom panels), it is readily anticipated that the mass spectra of such systems represent simplified (e.g. isomer-filtered) projections of a still hugely more expansive structural space (Fig. 7).
For any exceedingly complex material, it is logical to postulate that many isomers will contribute to any given molecular formula. Analogously, the intensities of the mass spectral peaks, which superimpose all of the isomers present, will be a function of the abundances of these isomers in these materials and the ionization efficiency of each isomer under the given experimental conditions.
For molecules of a given mass composed of carbon, hydrogen, and oxygen, two major and independent trends are expected to define the number of feasible isomers. First, decreasing the H/C ratio from fully saturated molecules (CnH2n+2) means removing hydrogen atoms, which is equivalent to introducing double bonds or (ali)cyclic structures (double bond equivalents, DBEs). Molecules with large H/C ratios are structurally fairly uniform, consisting mainly of various branched chains of single bonds. Introducing large numbers of DBEs will lead to many new structures with double bonds and/or (ali)cyclic structures in various positions. For an H/C ratio of close to one, on average two carbons carry one DBE, and the introduction of further DBEs will lead to a lack of single bonds. Hence, the maximum number of feasible C,H,O-isomers is expected to occur for intermediate numbers of DBEs in a molecule, because the occurrence of a DBE (which solely depends on the H/C ratio) enables double-bond displacement and the formation of (ali)cyclic structures, both of which greatly enlarge the number of feasible isomers. In contrast, only highly condensed structures can be assembled at very low H/C ratios [37], and this constraint severely diminishes the number of feasible C,H,O-isomers (if mathematically possible but chemically unlikely isomers are excluded; see Fig. 11).
Second, the insertion of oxygen into potentially any carbon–carbon (creating C–O–C units) or carbon–hydrogen bond (creating C–OH functionalities) will result in many more feasible isomers at low O/C ratios; in the presence of DBEs, “terminal” carbonyl derivatives (C=O) can also be constructed. At higher O/C ratios, however, further insertion of oxygen decreases the number of feasible isomers for two reasons: oxygen provides fewer (two) options for forming (single) bonds with other partners than carbon (four); in addition, the higher mass of oxygen (16 Da) compared with that of carbon (12 Da) decreases the total number of “heavy” atoms available for the construction of CnHmOq molecules of a given mass.
These considerations imply that the number of feasible C,H,O-isomers for a given mass will reach maximum values at intermediate H/C and O/C ratios, and that these numbers will (sharply) decline at extreme (high and low) H/C and O/C ratios, respectively.
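The DBE count referred to above follows directly from the molecular formula. As a minimal sketch (the function name is hypothetical), the standard relation for CnHmOq molecules is DBE = n − m/2 + 1; divalent oxygen does not enter the count.

```python
def dbe(c, h, o=0):
    """Double bond equivalents of CcHhOo; oxygen does not change the count."""
    return (2 * c + 2 - h) // 2  # h is even for neutral C,H,O molecules


# Formulae taken from the nominal-mass-178 example in the text.
saturated = dbe(13, 28)                                        # C13H28
series_2 = [dbe(13, 22), dbe(12, 18, 1), dbe(11, 14, 2), dbe(10, 10, 3)]
condensed = dbe(14, 10)                                        # C14H10
```

Each formal exchange of CH4 for oxygen removes one carbon and four hydrogens and therefore raises the DBE count by one, which reproduces the values of three to six DBEs quoted for 2a to 2d.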
These dependencies are displayed in a van Krevelen diagram (Fig. 11), in which the numbers of chemically relevant isomers for any given molecular composition CnHmOq of a single nominal IUPAC mass are provided. For any given nominal mass, the mathematically possible and chemically relevant structures composed solely of carbon, hydrogen and oxygen atoms can be constructively enumerated for each composition (molecular formula) [58]. By “chemically relevant isomers”, we mean all mathematically possible isomers (not counting stereoisomers) except for those containing O–O bonds, C≡C bonds, three- or four-membered rings, or =C= fragments (cumulated double bonds), which are not assumed to occur in the materials of interest (natural organic matter here). These data are displayed in the right panel of Fig. 11, where they are arranged according to actual mass.
For practical reasons, we have selected CnHmOq compositions with a nominal IUPAC mass of 178, for which the number of isomers can be computed within a reasonable time on a desktop computer; within the given limits of H/C and O/C elemental ratios, eleven feasible C,H,O-molecules are found, which are grouped into three series of isobaric molecules, related by a formal exchange of CH4 for oxygen (Fig. 11).
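The compositions themselves (though not their isomer counts, which require a structure generator) can be enumerated with a short loop. The limits of the H/C and O/C ratios are not stated in this section; the values H/C ≤ 2 and O/C ≤ 1 used below are assumptions chosen for this sketch, under which the loop returns eleven compositions in three series. The function names are hypothetical.

```python
def compositions(nominal_mass, max_h_to_c=2.0, max_o_to_c=1.0):
    """List (c, h, o) with 12c + h + 16o equal to the nominal mass."""
    found = []
    for c in range(1, nominal_mass // 12 + 1):
        for o in range((nominal_mass - 12 * c) // 16 + 1):
            h = nominal_mass - 12 * c - 16 * o
            # Valence constraints: even hydrogen count, at most 2c + 2 hydrogens.
            if h % 2 or h > 2 * c + 2:
                continue
            # Assumed elemental-ratio limits (not taken from the source).
            if h / c > max_h_to_c or o / c > max_o_to_c:
                continue
            found.append((c, h, o))
    return found


def by_series(formulas):
    """Group formulae related by CH4/O exchange; c + o is constant within a series."""
    series = {}
    for c, h, o in formulas:
        series.setdefault(c + o, []).append((c, h, o))
    return series


formulas_178 = compositions(178)
series_178 = by_series(formulas_178)
```

Grouping by the sum of carbon and oxygen atoms works because each CH4/O exchange removes one carbon and adds one oxygen; the three groups contain three, six and two members, respectively.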
Series 1 represents highly unsaturated molecules in which the number of isomers declines sharply with decreasing H/C ratio. Series 2 presents the maximum number of isomers at intermediate H/C (and O/C) ratios and the decline in the number of isomers at both high and low H/C (and O/C) ratios, as anticipated (see above).
The maximum H/C ratio found for a series 2 molecule amounts to almost 1.7, and the corresponding molecule C13H22 (2a) features three DBEs, thereby allowing for a much larger array of unsaturation-related isomers than obtained for a fully saturated parent molecule. This is demonstrated by the variance in the isomer count when the fully saturated analogue C13H28 (184 Da, 802 isomers) is compared with the series 2 “endmember” C13H22 (2a; 178 Da, 1.7 × 10⁵ isomers); analogous relationships are found for the series 1 “endmember” C14H10 (1a; 178 Da, 5.3 × 10⁶ isomers) in comparison with its fully saturated parent molecule C14H30 (198 Da, 1858 isomers).
A considerable fraction of the 16.6-fold increase in the isomer count observed when comparing C13H22 (2a; three DBEs) and C12H18O (2b; four DBEs) results from the ability to produce novel isomers with singly bonded oxygen and those with a C=O bond (carbonyl derivative). The maximum number of isomers (~1.1 × 10⁷ each) is attained for the molecules C11H14O2 (2c; five DBEs) and C10H10O3 (2d; six DBEs), respectively. Further exchange of CH4 against oxygen again sharply decreases the number of feasible isomers [by a factor of 5.5 when proceeding from C10H10O3 (2d) to C9H6O4 (2e), and by a factor of 78 when changing from C9H6O4 (2e) to C8H2O5 (2f)].
The series 3 molecules C7H14O5 (3a) and C6H10O6 (3b) feature rather limited numbers of isomers because of their large O/C ratios (see above). A comparison of C8H2O5 (2f) and C7H14O5 (3a) indicates that extreme hydrogen deficiency restricts the feasible number of isomers more severely than almost full saturation. Molecules C14H10 (1a; ten DBEs, 5.3 × 10⁶ isomers; series 1), C10H10O3 (2d; six DBEs, 1.1 × 10⁷ isomers; series 2) and C6H10O6 (3b; two DBEs, 6 × 10⁴ isomers; series 3) are all related by a formal exchange of four carbons for three oxygen atoms. The introduction of oxygen initially outweighs the decrease in the number of carbon atoms and DBEs available because of (i) the reduced severity of unsaturation and (ii) the availability of oxygen to construct isomers (see above). Upon the transition from C10H10O3 (2d) to C6H10O6 (3b), however, both the lesser ability of oxygen to participate in chemical bonding (two bonds for any oxygen instead of four for any carbon) and the decline in available DBEs lead to a drastic decrease in the number of accessible isomers.
In a highly processed and supposedly exceedingly complex material such as deep sea marine organic matter, most of the molecules of formula CnHmOq will contain an intermediate amount of unsaturation and numerous oxygen atoms [18, 20]. This flexibility to generate a potentially huge number of isomers implies that (in the absence of severe ion suppression) mass spectral intensities should correlate roughly with the number of feasible isomers for any given molecular composition. Recently, carboxyl-rich alicyclic molecules (CRAM) have been identified as prominent constituents of marine (and possibly freshwater and terrestrial) organic matter [18]. CRAM likely represent highly processed products of ultimately terpenoid origin and are expected to represent an extremely complex mixture of molecules. Based on the molecules of formulae CnHmOq and a recognition of FT mass spectral intensities, the CRAM that occur in deep ocean marine ultrafiltered organic matter conform mainly to the region inside the dotted ellipsoid in the van Krevelen diagram of Fig. 11, which appears to coincide with the maximum number of feasible C,H,O-isomers.
The availability of aromatic structures in terrestrially and freshwater-derived NOM, such as that in Suwannee River fulvic acid (SuwFA; Fig. 9), opens up the compositional space of chemically relevant NOM molecules (see above) to significantly lower H/C ratios than are accessible solely on the basis of open-chain unsaturation (e.g. olefinic and carbonyl) and alicyclic double-bond equivalents (DBE). These dependencies are nicely illustrated by comparing the van Krevelen diagrams of marine ultrafiltered dissolved organic matter (UDOM) [18], a blackwater NOM [18, 124] and SuwFA (Figs. 9 and 11). While mass spectra of marine UDOM are dominated by carboxyl-rich alicyclic molecules (CRAM), composed mainly of carboxylic groups and alicyclic rings with only negligible aromatic and olefinic unsaturation [18], the significant terrestrial, aromatic-rich signature present in both blackwater NOM and SuwFA populates the compositional space with notably lower elemental H/C ratios than are feasible in marine UDOM [18].
It should be noted that oxygen-depleted molecules of formula CnHm are less likely to be ionized in standard ESI FTICR mass spectra in comparison with oxygenated molecules of formula CnHmOq. Carbohydrates, which are oxygen-rich, also are less efficiently ionized under standard ESI-FTICR mass spectral conditions than carboxyl-rich molecules like CRAM. CRAM therefore represent the most likely constituents of NOM to produce strong signals in ESI-FTICR mass spectra.
The compositional space of Suwannee River fulvic acid (SuwFA) given in Figs. 9 and 11 is derived from consolidated positive and negative ion FTICR mass spectra, obtained via APCI+APPI+ESI ionization modes, thereby facilitating the observation of oxygen-depleted molecules (Fig. 9).
The current capacity to describe complex materials at molecular resolution can be visualized in the form of an analytical space comprising individual volumetric pixels (voxels). The range of this discrete and quantized space is 10⁸–10¹⁴ voxels, as defined by the significant resolution of the complementary techniques of nuclear magnetic resonance (10²–10⁵ buckets, depicting the short-range order of molecules), ultrahigh-resolution FTICR mass spectrometry (10⁴–10⁵ buckets, depicting molecular masses and formulae of gas-phase ions) and high-performance separation (10²–10⁴ buckets; it can investigate both ions and molecules, and so provides a way to validate NMR against MS data).
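The quoted voxel range is consistent with multiplying the bucket counts of the three techniques, that is, with adding their exponents. The short check below assumes that the three axes are simply multiplied; the dictionary name is hypothetical.

```python
# Powers of ten of the bucket counts (lower, upper) quoted for each axis.
bucket_exponents = {
    "NMR": (2, 5),
    "FTICR-MS": (4, 5),
    "separation": (2, 4),
}

# Multiplying bucket counts corresponds to adding their exponents.
low = sum(lower for lower, _ in bucket_exponents.values())
high = sum(upper for _, upper in bucket_exponents.values())
voxel_range = (10 ** low, 10 ** high)
```

The lower and upper exponents sum to 8 and 14, matching the stated range of the analytical voxel space.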
An investigation of these correlated data is feasible at the level of the direct hyphenation of separation and spectroscopy (e.g. LC/NMR and LC/MS or CE/MS, corresponding to the rear faces of the voxel space [119, 125–128]) and by means of statistical heterospectroscopy (SHY, corresponding to the top face, or any two faces, of the voxel space [129, 130]). Any joint mathematical analysis of these correlated data will enhance the effective resolution of the data and the significance of the molecular-level analysis of complex unknowns [119, 121, 129].
This voxel space can be readily expanded to higher dimensions by including complementary data, like those derived from genomic and proteomic analyses [84, 131–134] or by recognising selective chemical reaction products [135–140]. Degradative approaches to the characterisation of complex systems produce limited amounts of unambiguously identifiable small molecules but lose crucial linkage information. Soft and selective biochemical and chemical reactions like mild hydrolysis, reduction, oxidation and derivatisation [141, 142] of complex systems will often result in larger fragments with valuable positional and stereochemical information for the assessment of synthesis and degradation pathways.
The chemical transformation of functional groups with NMR- and MS-recognisable labels enables isotope-specific functional group analysis based on structural rather than behavioural characteristics [143, 144]. Information concerning stereochemistry and stable isotope composition will become more important when assessing the origins and diagenesis of complex natural materials. Any progress in the determination of position-specific stable isotope composition (e.g. by NMR and MS methods) will be useful for advancing this field. Physical and chemical fractionation will greatly assist in these studies; further miniaturisation will enhance separation capacity and thereby improve the resolution of the analytical voxel space (Figs. 5 and 12).
Integrated biomarker profiling approaches [145–147] with higher resolutions, significances and accuracies will substantially improve the quality and relevance of current systems biology approaches in the health and environmental sciences. The great progress made in the molecular-level characterisation of complex systems over the last few years and foreseeable improvements in nascent technology and concepts will lead to strong synergetic effects that will further advance our understanding of any complex natural and living system whose properties and functioning depend on both strong (covalent) and weak (noncovalent) interactions.