


EstimAge (Estimate of methylation Age) is a web service for computing methylage, an epigenetic marker of age (epigenetic clock) based on specific CpGs on the genome whose methylation state is globally (as a linear combination) correlated with chronological age in any tissue or cell type. Since the pioneering work of Horvath (Horvath, 2013), the use of epigenetic clocks, and in particular of the difference (DeltaAge) between methylage and chronological age, has been under active research in a variety of clinical areas as a marker of overall health condition during aging, with the potential to become a relevant risk indicator for the major age-related diseases, syndromes and conditions, including cancer. To date, methylage has been offered, exploiting retrospective data, either as a tissue-independent or a tissue-specific marker. It is now established that tissue specificity enables better precision; numerous efforts have therefore been directed at building sufficiently large collections of data (mostly Illumina bead arrays) extracted from specific tissues under specific conditions, in order to identify robust CpGs and enable reproducible results. Despite widespread interest in the matter, little standardization has been promoted so far, which hinders the coordination of efforts. To the best of our knowledge, this is the first hub collecting epigenetic clocks.

Currently available clocks in EstimAge are listed here.

We encourage authors to submit novel, peer-reviewed, software-equipped clocks to our web service via the contact page.


Data submission



A worked example using a subset of real data (GEO accession number GSE72776) from whole blood (tissue Blood; D001769) is provided on the main page.


The initial submission page is shown in Figure 1 and Figure 2.

Figure 1. Data submission steps
  1. Select the "Choose file" button next to the "Methylation data" label to upload a beta value file from your local folder. File format description here.

  2. Select the "Choose file" button next to the "Metadata" label to upload a metadata annotation file from your local folder. File format description here.

  3. Select the sample age unit (cell counts, kilobases, weeks, years). The selection should be consistent with the age values provided in the metadata file. Detailed description here.

  4. If missing data are present in the beta values file, they can be imputed using one of three different imputation methods. Detailed description here.

  5. Select a tissue from the “Tissue (MeSH)” menu. The tissue selection opens further options, as shown in Figure 2. Detailed description here.

    Figure 2. Data submission steps after Tissue selection
  6. Select a cell counting method. Only tissue-compatible methods are shown. Detailed description here.

  7. Select clocks. Only tissue-compatible clocks are shown. List of available clocks here.

  8. Submit the job. This uploads the data and redirects to the wait page.
The job is scheduled to run as soon as the waiting queue is empty. Upon completion, a link to the result page is made available.

Detailed input description



  1. Methylation data file (beta values) – mandatory file. A comma-separated file with beta values, containing the following columns (Figure 3):
    1. ID_REF: contains the probe IDs as indicated in the Illumina DNA methylation platforms (Infinium HumanMethylation450 BeadChip, GEO accession number GPL13534; Infinium HumanMethylation27, GEO accession number GPL8490; Infinium MethylationEPIC BeadChip, GEO accession number GPL21145), or the clock-specific probe IDs indicated in Table 1. If the CpGs required by a selected clock are not provided and cannot be imputed, NA is returned in the output tables.
    2. All other columns are named after the same sample IDs used in the Metadata file and contain the corresponding beta values.

    Figure 3. Example of beta values file. The figure shows the first 5 rows (4 CpGs) and the first 9 columns (8 samples) of the example input file.
    If no data is uploaded, the example GSE72776_restricted.data.csv.gz file (available on the main page) is used by default.

  2. Metadata file – mandatory file. A comma-separated file with sample annotations, containing the following columns (Figure 4):
    1. SampleID: specifies the sample ID, which should match the column names of the Methylation data file.
    2. Age: age of the subjects (samples), expressed in years, weeks, cell cycles or kilobases.
    3. Train.Test: optional column, specifying whether the sample belongs to the training or the test set for the EPM clock (see clock list). This clock is trained on the training set ("Train") and then used to predict the methylage of both the training ("Train") and the test ("Test") samples. If missing, all samples are included in the training set.
    4. Ctrl.Case: optional column, specifying whether the sample is in the control group (“Ctrl”) or not (“Case”). This column is used by EstimAge to generate the outputs as detailed in the Results. If missing, all samples are considered to be controls.

    Figure 4. Example of Sample annotation file. The figure shows the first 11 rows of the Sample annotation file, with 10 sample IDs mapped to the Methylation data file.
    If no data is uploaded, the example GSE72776_restricted.metadata.csv file (available on the main page) is used by default.

  3. Samples age unit - mandatory parameter. We have four distinct age units:
    1. cell counts (output of EpiTOC and MiAge clocks)
    2. kilobases (output of DNamTL clock)
    3. years (output of Multi-Tissue, Tissue-specific and Forensic clocks)
    4. weeks (output of Gestational clocks)

    The selected age unit should be consistent with the age values provided in the Metadata file. This parameter determines the type of plots/tables that will be shown in the result page, according to the selected clocks.

  4. Imputation - optional. We provide three distinct imputation methods:
    1. KNN (Troyanskaya et al., 2001) - Fast KNN (K-Nearest Neighbors) imputation method.
    2. mean - Simple and fast imputation method that replaces each missing value of a given variable with the average of all the known values for that variable.
    3. methyLImp (Di Lena et al., 2019) - Linear regression imputation method, specifically designed for the imputation of methylation data.

    The imputation, if selected, is performed separately on case and control samples. Some clocks (e.g. Horvath13, epiTOC, DNAmGA, MEAT) provide their own imputation algorithm or alternative solutions, which are used if missing data imputation is not explicitly requested or if missing data are still present after imputation.

  5. Tissue - mandatory. Tissue selection should be consistent with the Methylation data provided in input. Tissue hierarchy and IDs are taken from MeSH (https://meshb-prev.nlm.nih.gov/search), the National Library of Medicine's controlled vocabulary thesaurus, which consists of sets of terms naming descriptors in a hierarchical structure that can be queried at various levels of specificity. This enables better standardization when working with users’ datasets. Currently, the tissues available in EstimAge are:
    1. Blood; D001769
    2. Endothelial cells; D042783
    3. Fetal Blood; D005312
    4. Mouth Mucosa; D009061
    5. Muscle, Skeletal; D018482
    6. Neoplasms; D009369
    7. Placenta; D010920
    8. Prefrontal Cortex; D017397
    9. Saliva; D012463
    10. Skin; D012867
    11. Other

    If any tissue among options 1)-10) is selected, EstimAge presents only the appropriate subset of tissue-compatible clocks. The user can then select all or some of them. If none is selected, the EstimAge preselection is computed by default, the rationale being that appropriate clocks exist and should be preferred. Similarly, only tissue-compatible cell counting methods are shown (if any).

    If tissue option 11) is selected, all EstimAge clocks are shown (no pre-selection) to allow the user to make a custom and broad selection. If none is selected, only multi-tissue clocks are computed, the rationale being that no tissue-specific clock exists and only multi-tissue clocks are applicable. Similarly, all cell counting methods become available for selection.

    The “Other” tissue should be selected if the user wants to escape EstimAge pre-selection and visualize all available clocks, as well as all available cell counting methods.

  6. Cell counts - optional. We provide several cell counting methods for different tissues, following the datasets used in the standard Bioconductor packages minfi and meffil:
    1. andrews and bakulski cord blood (Fetal Blood; D005312)
    2. blood gse35069 (Blood; D001769)
    3. blood gse35069 chen (Blood; D001769)
    4. blood gse35069 complete (Blood; D001769)
    5. combined cord blood (Fetal Blood; D005312)
    6. cord blood gse68456 (Fetal Blood; D005312)
    7. gervin and lyle cord blood (Fetal Blood; D005312)
    8. guintivano dlpfc (Cerebral Cortex; D002540)
    9. saliva gse48472 (Saliva; D012463)

    The cell counting method selection becomes available only after Tissue selection:
    1. If "Other" tissue is selected, all cell counting methods are available for selection
    2. Otherwise, only the tissue-compatible cell counting methods are shown (if any)
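
Before uploading, it can help to confirm that the two input files agree with each other. The following Python sketch is not part of EstimAge: the function name `check_inputs` and the toy probe IDs and values are hypothetical. It only checks the conventions described above, namely an ID_REF first column, sample IDs that match between the two files, and the documented labels in the optional Train.Test and Ctrl.Case columns.

```python
import csv
import io


def check_inputs(beta_csv, meta_csv):
    """Return a list of problems found in the two CSV texts (empty if none)."""
    problems = []

    beta_rows = list(csv.reader(io.StringIO(beta_csv)))
    header = beta_rows[0]
    if header[0] != "ID_REF":
        problems.append("first column of the beta values file must be ID_REF")
    beta_samples = header[1:]

    meta_rows = list(csv.DictReader(io.StringIO(meta_csv)))
    meta_samples = [row["SampleID"] for row in meta_rows]

    # Sample IDs must match between the two files.
    for sample in beta_samples:
        if sample not in meta_samples:
            problems.append(f"{sample}: missing from the metadata file")
    for sample in meta_samples:
        if sample not in beta_samples:
            problems.append(f"{sample}: missing from the beta values file")

    # Optional columns may only contain the documented labels.
    allowed = {"Train.Test": {"Train", "Test"}, "Ctrl.Case": {"Ctrl", "Case"}}
    for row in meta_rows:
        for column, labels in allowed.items():
            if column in row and row[column] not in labels:
                problems.append(
                    f"{row['SampleID']}: invalid {column} value {row[column]!r}"
                )
    return problems


# Toy data: probe IDs and beta values are made up for illustration.
beta = "ID_REF,S1,S2\ncg00000001,0.12,0.80\ncg00000002,NA,0.45\n"
meta = "SampleID,Age,Ctrl.Case\nS1,34,Ctrl\nS2,61,Case\n"
print(check_inputs(beta, meta))
```

An empty list means that no mismatch was detected by these particular checks; it does not guarantee that the service will accept the files.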

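The "mean" imputation option, applied separately to case and control samples, can be illustrated with the short Python sketch below. It is only an approximation of the idea under stated assumptions: "variable" is taken to mean a single probe (one row of the beta values file), missing values are represented as None, and the function name `impute_mean_by_group` is hypothetical rather than taken from EstimAge.

```python
def impute_mean_by_group(betas, groups):
    """Fill None entries of one probe with the mean of the sample's own group.

    betas: beta values of a single probe, one per sample (None if missing).
    groups: "Ctrl" or "Case" label of each sample, in the same order.
    """
    filled = list(betas)
    for label in set(groups):
        known = [b for b, g in zip(betas, groups) if g == label and b is not None]
        if not known:
            continue  # nothing to average: the value stays missing
        mean = sum(known) / len(known)
        for i, (b, g) in enumerate(zip(betas, groups)):
            if g == label and b is None:
                filled[i] = mean
    return filled


# Toy probe with one missing value in each group.
values = [0.2, None, 0.4, 0.8, None]
labels = ["Ctrl", "Ctrl", "Ctrl", "Case", "Case"]
print(impute_mean_by_group(values, labels))
```

Keeping the two groups apart prevents the values of the case samples from leaking into the controls, and vice versa.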

Output



The results page contains a summary of the estimated methylage, as computed by the selected clocks, together with several statistical analyses in both tabular and graphical format. In detail, the results page is organized into the following sections (which can be quickly browsed through a global index at the beginning of the page):
  1. Input parameters. Summary of the input parameters
  2. Methylation Age. Tabular results of the epigenetic clock calculations and associated regression plots
  3. Acceleration (Difference). Tabular and graphical statistics of Age Acceleration (by difference)
  4. Acceleration (EEAA). Tabular and graphical statistics of Age Acceleration (by EEAA)
  5. Cell Counts. Tabular and graphical statistics of cell counts


Input parameters

This section shows the input parameters selected for the analysis (Sample age unit, Tissue, Imputation method and Cell count method), as well as a summary table of the selected clocks (see Figure 5).

Figure 5. Input parameters summary



Methylation Age

This section contains the Methylage Table (see Figure 6), showing the information provided in the metadata file, followed by the estimated methylage, one column per clock. If the CpGs required for the computation of a clock are not available and could not be imputed, the corresponding column contains NAs (see the column related to the Zpiekarska15 clock in Figure 6).

Figure 6. Methylage Table

In the same section, EstimAge provides the R2 regression coefficients in tabular format and the regression lines in graphical format for each epigenetic clock (see Figure 7).

Figure 7. R2 regression statistics in tabular and graphical format.
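
As a rough illustration of the R2 statistic reported in this section, the Python sketch below fits an ordinary least squares line of methylage against chronological age and computes R2. It assumes a simple linear regression with an intercept, which the text does not spell out; the function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(ages, methylages):
    """R2 of the least squares line methylage = intercept + slope * age."""
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(methylages) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, methylages))
    syy = sum((y - mean_y) ** 2 for y in methylages)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, methylages))
    return 1 - ss_res / syy


# Made-up ages and methylage estimates for five samples.
print(r_squared([25, 40, 52, 61, 70], [28.0, 38.5, 55.0, 59.0, 73.5]))
```

A value close to 1 indicates that the clock output follows chronological age almost linearly in the submitted samples.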



Acceleration (Difference)

This section provides age acceleration statistics. Accelerations or decelerations (positive and negative divergences, respectively) from physiological methylation values measure how far a sample departs from a physiological state. Importantly, such variations are measurable long before clinical signs of disease become overt.

The age acceleration is computed as the difference between methylage and age. These statistics are meaningful only when the sample age unit matches the clock’s unit (years, weeks, cell counts, kilobases). EstimAge therefore reports only the clocks for which this condition holds.
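
The computation described above can be sketched in a few lines of Python. This is only an illustration: the function name `acceleration_by_difference` and the example values are hypothetical, and the unit check is a plain string comparison standing in for whatever compatibility rule the service applies.

```python
def acceleration_by_difference(ages, methylages, age_unit, clock_unit):
    """Return methylage - age for each sample, or None when the units differ."""
    if age_unit != clock_unit:
        return None  # the difference is not meaningful across different units
    return [m - a for a, m in zip(ages, methylages)]


# Hypothetical example: a clock that outputs years, applied to ages in years.
print(acceleration_by_difference([30, 50], [33.5, 47.0], "years", "years"))
# The same clock with ages given in weeks is skipped.
print(acceleration_by_difference([30, 50], [33.5, 47.0], "weeks", "years"))
```

Positive values correspond to acceleration and negative values to deceleration.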

The age acceleration (by difference) statistics are reported in both tabular (see Figure 8) and graphical (see Figure 9) format.

Figure 8. Age acceleration statistics (table).

Figure 9. Age acceleration statistics (plot).



Acceleration (EEAA)

This section provides a second measure of age acceleration, the Extrinsic Epigenetic Age Acceleration (EEAA) statistics (Horvath and Ritz, 2015; Smith et al., 2019), computed as the residual from a regression of methylage on the sample Age. Unlike the acceleration statistics by difference, the EEAA statistics can be computed for all sample age units. However, these statistics are not computed if the input data contains fewer than three samples.
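
The residual-based definition given above can be illustrated with the following Python sketch. It assumes an ordinary least squares fit with an intercept over all submitted samples, which is an assumption of this example rather than a documented detail of EstimAge; the function name `regression_residuals` and the sample values are hypothetical.

```python
def regression_residuals(ages, methylages):
    """Residuals of the least squares fit methylage = intercept + slope * age.

    Returns None for fewer than three samples, mirroring the rule above.
    """
    n = len(ages)
    if n < 3:
        return None
    mean_x = sum(ages) / n
    mean_y = sum(methylages) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, methylages))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(ages, methylages)]


# Made-up ages and methylage estimates for three samples.
print(regression_residuals([20, 40, 60], [25.0, 38.0, 66.0]))
```

Because the residuals are taken relative to the fitted line rather than to the identity line, they remain interpretable even when the clock output and the sample ages are expressed in different units.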

As above, the EEAA statistics are reported in both tabular (see Figure 10) and graphical (see Figure 11) format. The plots are given separately for each age unit.

Figure 10. Extrinsic Epigenetic Age Acceleration (EEAA) statistics (table).

Figure 11. Extrinsic Epigenetic Age Acceleration (EEAA) statistics (plots).



Cell Counts

This section is devoted to cell count calculations and statistics. The Cell Counts section appears only if a cell counting method is selected in the input interface.

The table of the estimated cell counts (see Figure 12) is provided only for cell types whose maximum relative abundance is higher than 10.

Figure 12. Cell Count table

A third acceleration statistic (see Figure 13), the Intrinsic Epigenetic Age Acceleration (IEAA), is computed as the residual from a regression of methylage on the sample Age, adjusted for the estimated cell counts (Horvath and Ritz, 2015; Smith et al., 2019). If the number of samples is smaller than the number of cell types, the IEAA statistics cannot be computed and are therefore not shown. If computable, the IEAA statistics are also shown in separate plots for each age unit (see Figure 14).

Figure 13. Intrinsic Epigenetic Age Acceleration (IEAA) statistics (table)

Figure 14. Intrinsic Epigenetic Age Acceleration (IEAA) statistics (plots)
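
The IEAA computation described in this section, a regression of methylage on age with the estimated cell counts as additional covariates, can be sketched in plain Python as follows. This is an illustration under assumptions, not the service's implementation: it uses ordinary least squares with an intercept solved through the normal equations, it requires the covariate columns to be linearly independent, and the names `solve` and `ieaa` as well as all numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ieaa(ages, methylages, cell_counts):
    """Residuals of methylage regressed on age and the cell count columns.

    cell_counts holds one row per sample. Returns None when there are fewer
    samples than cell types, following the rule stated above.
    """
    n = len(ages)
    n_types = len(cell_counts[0])
    if n < n_types:
        return None
    # Design matrix: intercept, age, then one column per cell type.
    design = [[1.0, age] + list(cells) for age, cells in zip(ages, cell_counts)]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, methylages)) for i in range(p)]
    beta = solve(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, row)) for row, y in zip(design, methylages)]


# Made-up data: five samples, one estimated cell type.
ages = [20, 30, 40, 50, 60]
cells = [[0.1], [0.3], [0.2], [0.5], [0.4]]
print(ieaa(ages, [25.0, 33.0, 47.0, 55.0, 68.0], cells))
```

Note that cell proportions which sum to one across all included cell types are collinear with the intercept; in that case one cell type would have to be dropped before fitting a model of this form.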

Finally, this section includes a table showing the R2 of each regression model (see Figure 15), adjusted for the estimated cell counts.

Figure 15. R2 regression statistics adjusted for the estimated cell counts




References

Di Lena, P., Sala, C., Prodi, A. and Nardini, C. (2019) Missing value estimation methods for DNA methylation data. Bioinformatics, 35, 3786–3793. https://doi.org/10.1093/bioinformatics/btz134

Horvath, S. (2013) DNA methylation age of human tissues and cell types. Genome Biol, 14, R115. https://doi.org/10.1186/gb-2013-14-10-r115

Horvath, S. and Ritz, B.R. (2015) Increased epigenetic age and granulocyte counts in the blood of Parkinson’s disease patients. Aging (Albany NY), 7, 1130–1142. https://doi.org/10.18632/aging.100859

Smith, J.A., Raisky, J., Ratliff, S.M., Liu, J., Kardia, S.L.R., Turner, S.T., Mosley, T.H. and Zhao, W. (2019) Intrinsic and extrinsic epigenetic age acceleration are associated with hypertensive target organ damage in older African Americans. BMC Med Genomics, 12, 141. https://doi.org/10.1186/s12920-019-0585-5

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R.B. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 520–525. https://doi.org/10.1093/bioinformatics/17.6.520