Title: | Access Open Target Genetics |
---|---|
Description: | Interact seamlessly with Open Targets Genetics' GraphQL endpoint to query and retrieve tidy data tables, facilitating the analysis of genetic data. For more information, see the Open Targets Genetics API (<https://genetics.opentargets.org/api>). |
Authors: | Amir Feizi [aut, cre], Kamalika Ray [aut] |
Maintainer: | Amir Feizi |
License: | MIT + file LICENSE |
Version: | 1.1.5 |
Built: | 2025-02-12 05:12:09 UTC |
Source: | https://github.com/amirfeizi/otargen |
This function queries the Open Targets Genetics GraphQL API to retrieve ChEMBL data for a specified gene and disease, including evidence from the ChEMBL datasource.
chemblQuery(ensemblId, efoId, size = 10, cursor = NULL)
Argument | Description |
---|---|
ensemblId | Character: ENSEMBL ID of the target gene (e.g., ENSG00000169174). |
efoId | Character: EFO ID of the disease (e.g., EFO_0004911). |
size | Integer: Number of records to retrieve (default: 10). |
cursor | Character: Cursor for pagination (default: NULL). |
Returns a data frame containing ChEMBL data for the specified gene and disease.
## Not run:
result <- chemblQuery(ensemblId = "ENSG00000169174", efoId = "EFO_0004911", size = 10)
result <- chemblQuery(ensemblId = "ENSG00000169174", efoId = "EFO_0004911", size = 10, cursor = NULL)
## End(Not run)
This function queries the Open Targets Genetics GraphQL API to retrieve ClinVar data for a specified gene and disease, including evidence from the NCBI datasource.
clinvarQuery(ensemblId, efoId, size = 10, cursor = NULL)
Argument | Description |
---|---|
ensemblId | Character: ENSEMBL ID of the target gene (e.g., ENSG00000130164). |
efoId | Character: EFO ID of the disease (e.g., EFO_0004911). |
size | Integer: Number of records to retrieve (default: 10). |
cursor | Character: Cursor for pagination (default: NULL). |
Returns a data frame containing ClinVar data for the specified gene and disease.
## Not run:
result <- clinvarQuery(ensemblId = "ENSG00000130164", efoId = "EFO_0004911", size = 10)
result <- clinvarQuery(ensemblId = "ENSG00000130164", efoId = "EFO_0004911", size = 10, cursor = NULL)
## End(Not run)
Retrieves colocalisation statistics for one or more genes, given as ENSEMBL gene IDs or gene symbols. Colocalisation analysis is performed between all studies in the Portal with at least one overlapping associated locus; it tests whether two independent associations at the same locus are consistent with a shared causal variant. Multiple genes can be supplied as a character vector. The returned data frame (tibble format) includes studies that show evidence of colocalisation with molecular QTLs for the selected gene(s).
colocalisationsForGene(genes)
Argument | Description |
---|---|
genes | Character: Gene ENSEMBL ID (e.g. ENSG00000169174) or gene symbol (e.g. PCSK9). Multiple genes are supported as a character vector. |
A data frame (tibble format) including the colocalisation data for the query gene(s).
The output tibble contains the following columns:
Study
: Character vector. Study identifier.
Trait_reported
: Character vector. Reported trait associated with the colocalisation.
Lead_variant
: Character vector. Lead variant associated with the colocalisation.
Molecular_trait
: Character vector. Molecular trait associated with the colocalisation.
Gene_symbol
: Character vector. Gene symbol associated with the colocalisation.
Tissue
: Character vector. Tissue where the colocalisation occurs.
Source
: Character vector. Source of the colocalisation data.
H3
: Numeric vector. H3 value associated with the colocalisation.
log2_H4_H3
: Numeric vector. Log2 ratio of H4 to H3 values.
Title
: Character vector. Title of the study.
Author
: Character vector. Author(s) of the study.
Has_sumstats
: Logical vector. Indicates if the study has summary statistics.
numAssocLoci
: Numeric vector. Number of associated loci in the study.
nInitial_cohort
: Numeric vector. Number of samples in the initial cohort.
study_nReplication
: Numeric vector. Number of samples in the replication cohort.
study_nCases
: Numeric vector. Number of cases in the study.
Publication_date
: Character vector. Publication date of the study.
Journal
: Character vector. Journal where the study was published.
Pubmed_id
: Character vector. PubMed identifier of the study.
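As a hedged illustration of downstream use, the base-R sketch below filters output shaped like that of colocalisationsForGene(). The data frame is a hand-made mock using a subset of the documented columns (all values invented), and the cutoff log2(H4/H3) >= 1, i.e. H4 at least twice H3, is an arbitrary choice for illustration rather than a package default.

```r
# Mock rows shaped like colocalisationsForGene() output (values are invented)
coloc <- data.frame(
  Study = c("STUDY_A", "STUDY_B", "STUDY_C"),
  Trait_reported = c("Trait 1", "Trait 2", "Trait 3"),
  Molecular_trait = c("eqtl", "pqtl", "eqtl"),
  Tissue = c("Liver", "Plasma", "Whole blood"),
  H3 = c(0.02, 0.40, 0.10),
  log2_H4_H3 = c(5.5, -0.3, 2.8),
  stringsAsFactors = FALSE
)

# Recover H4 from the two reported quantities: H4 = H3 * 2^log2(H4/H3)
coloc$H4 <- coloc$H3 * 2^coloc$log2_H4_H3

# Keep rows where H4 is at least twice H3 (hypothetical cutoff: log2 ratio >= 1),
# strongest evidence first
strong <- coloc[coloc$log2_H4_H3 >= 1, ]
strong <- strong[order(-strong$log2_H4_H3), ]
print(strong[, c("Study", "Tissue", "H3", "H4", "log2_H4_H3")])
```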
## Not run:
result1 <- colocalisationsForGene(genes = c("ENSG00000163946", "ENSG00000169174", "ENSG00000143001"))
result2 <- colocalisationsForGene(genes = "ENSG00000169174")
result3 <- colocalisationsForGene(genes = c("TP53", "TASOR"))
result4 <- colocalisationsForGene(genes = "TP53")
## End(Not run)
This function queries the Open Targets Genetics GraphQL API to retrieve comparative genomics data for a specified gene.
compGenomicsQuery(ensemblId)
Argument | Description |
---|---|
ensemblId | Character: ENSEMBL ID of the target gene (e.g., ENSG00000169174). |
Returns a data frame containing comparative genomics data for the specified gene.
## Not run:
result <- compGenomicsQuery(ensemblId = "ENSG00000169174")
## End(Not run)
This function takes a gene ENSEMBL ID (e.g. ENSG00000169174) or a gene name (e.g. PCSK9) and returns a table with details about the input gene, such as its genomic location, gene structure and biotype.
geneInfo(gene)
Argument | Description |
---|---|
gene | Character: an ENSEMBL gene identifier (e.g. ENSG00000169174) or gene name (e.g. PCSK9). |
Returns a data frame (tibble format) with the following data structure:
id
symbol
bioType
description
chromosome
tss
start
end
fwdStrand
exons
## Not run:
result <- geneInfo(gene = "ENSG00000169174")
result <- geneInfo(gene = "PCSK9")
## End(Not run)
This function retrieves all calculated prioritisation scores for the genes surrounding a specific variant, based on the Open Targets Genetics variant-to-gene (V2G) scoring pipeline. It provides detailed insights into the relationship between a genetic variant and nearby genes, allowing users to explore the distance of the variant to gene transcription start sites, its effect on gene expression (QTLs), chromatin interactions and predicted functional effects. The results are returned as a structured list of data frames that is easy to analyse and interpret.
genesForVariant(variant_id)
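Argument | Description |
---|---|
variant_id | Character: ID of the variant for which to fetch gene information, either an Open Targets Genetics variant ID (e.g. 1_154453788_C_T) or an rsID (e.g. rs4129267). |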
A list with the following components:
v2g
: A data frame with all variant-to-gene information with the following data structure:
- gene.symbol
: character
- variant
: character
- overallScore
: numeric
- gene.id
: character
tssd
: A data frame with the distance between the variant and the transcription start site (TSS) of each gene, with the following data structure:
- gene.symbol
: character
- variant
: character
- typeId
: character
- sourceId
: character
- aggregatedScore
: numeric
- tissues_distance
: integer
- tissues_score
: numeric
- tissues_quantile
: numeric
- tissues_id
: character
- tissues_name
: character
qtls
: List of QTL associations between genes and variants across analyzed tissues with the following data structure:
- gene.symbol
: character
- variant
: character
- typeId
: character
- aggregatedScore
: numeric
- tissues_quantile
: numeric
- tissues_beta
: numeric
- tissues_pval
: numeric
- tissues_id
: character
- tissues_name
: character
chromatin
: A data frame of chromatin interaction evidence linking the variant to genes across tissues, with the following data structure:
- gene.symbol
: character
- variant
: character
- typeId
: character
- sourceId
: character
- aggregatedScore
: numeric
- tissues_quantile
: numeric
- tissues_score
: numeric
- tissues_id
: character
- tissues_name
: character
functionalpred
: A data frame including predicted functional effects of variants on genes across tissues with the following data structure:
- gene.symbol
: character
- variant
: character
- typeId
: character
- sourceId
: character
- aggregatedScore
: numeric
- tissues_maxEffectLabel
: character
- tissues_maxEffectScore
: numeric
- tissues_id
: character
- tissues_name
: character
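The list returned by genesForVariant() can be unpacked with ordinary list indexing. The sketch below builds a mock object containing only a v2g component with the documented columns (gene symbols, IDs and scores are invented) and ranks genes by overallScore; treating the highest score as the top candidate gene is an interpretive assumption.

```r
# Mock object shaped like the list returned by genesForVariant() (values are invented)
result <- list(
  v2g = data.frame(
    gene.symbol = c("GENE_A", "GENE_B", "GENE_C"),
    variant = "1_154453788_C_T",
    overallScore = c(0.12, 0.61, 0.27),
    gene.id = c("ENSG_A", "ENSG_B", "ENSG_C"),
    stringsAsFactors = FALSE
  )
)

# Rank candidate genes for the variant by their overall score (highest first)
v2g <- result$v2g[order(result$v2g$overallScore, decreasing = TRUE), ]
top_gene <- v2g$gene.symbol[1]
cat("Top-scoring gene:", top_gene, "\n")
```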
## Not run:
result <- genesForVariant(variant_id = "1_154453788_C_T")
result <- genesForVariant(variant_id = "rs4129267")
print(result)
## End(Not run)
This function queries the Open Targets Genetics GraphQL API to retrieve genetic constraint data for a specified gene.
geneticConstraintQuery(ensemblId)
Argument | Description |
---|---|
ensemblId | Character: ENSEMBL ID of the target gene (e.g., ENSG00000169174). |
Returns a data frame containing genetic constraint data for the specified gene.
## Not run:
result <- geneticConstraintQuery(ensemblId = "ENSG00000169174")
## End(Not run)
This function retrieves the genes overlapping a specified region on a chromosome, together with their details: ID, symbol, biotype, description, transcription start site (TSS), start position, end position, strand direction and exon structure.
getLociGenes(chromosome, start, end)
Argument | Description |
---|---|
chromosome | Character: Chromosome number as a string. |
start | Integer: Start position of the specified region on the chromosome. |
end | Integer: End position of the specified region on the chromosome. |
Returns a tibble data frame of all the overlapping genes in the specified region with the following data structure:
id
: Character. ID of the gene.
symbol
: Character. Symbol of the gene.
bioType
: Character. Biotype of the gene.
description
: Character. Description of the gene.
chromosome
: Character. Chromosome of the gene.
tss
: Integer. Transcription start site of the gene.
start
: Integer. Start position of the gene.
end
: Integer. End position of the gene.
fwdStrand
: Logical. Strand direction of the gene.
exons
: List. List of exons of the gene.
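A common follow-up to getLociGenes() is to find the gene whose transcription start site lies closest to a position of interest. This is a minimal base-R sketch on a mock data frame with a subset of the documented columns (coordinates and gene names are invented); the query position is hypothetical.

```r
# Mock rows shaped like getLociGenes() output (values are invented).
# Mock TSS equals start on the forward strand and end on the reverse strand.
genes <- data.frame(
  id = c("ENSG_X", "ENSG_Y", "ENSG_Z"),
  symbol = c("GENE_X", "GENE_Y", "GENE_Z"),
  chromosome = "2",
  tss = c(239700000L, 240100000L, 241500000L),
  start = c(239690000L, 240100000L, 241420000L),
  end = c(239700000L, 240180000L, 241500000L),
  fwdStrand = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical position of interest inside the queried region
position <- 240150000L

# Absolute distance from each gene's TSS to the position; pick the closest gene
genes$tss_distance <- abs(genes$tss - position)
nearest <- genes$symbol[which.min(genes$tss_distance)]
print(nearest)
```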
## Not run:
result <- getLociGenes(chromosome = "2", start = 239634984, end = 241634984)
## End(Not run)
This function retrieves colocalisation data between a specific variant in a given study and other GWAS studies. It returns a data frame of the studies that colocalise with the input variant and study, including details on each study and its reported trait, the index variant, and the outputs of the coloc method (see reference below).
gwasColocalisation(study_id, variant_id)
Argument | Description |
---|---|
study_id | Character: Open Targets Genetics generated ID for the GWAS study. |
variant_id | Character: Open Targets Genetics generated ID for the variant (CHR_POSITION_REFALLELE_ALTALLELE) or rsID. |
Returns a data frame of the studies that colocalise with the input variant and study. The table consists of the following data structure:
study.studyId
: Character vector. Study identifier.
study.traitReported
: Character vector. Reported trait associated with the colocalisation.
study.traitCategory
: Character vector. Trait category.
indexVariant.id
: Character vector. Index variant identifier.
indexVariant.position
: Integer vector. Index variant position.
indexVariant.chromosome
: Character vector. Index variant chromosome.
indexVariant.rsId
: Character vector. Index variant rsID.
beta
: Numeric vector. Beta value associated with the colocalisation.
h3
: Numeric vector. H3 value associated with the colocalisation.
h4
: Numeric vector. H4 value associated with the colocalisation.
log2h4h3
: Numeric vector. Log2 ratio of H4 to H3 values.
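The h3, h4 and log2h4h3 columns are linked by log2h4h3 = log2(h4 / h3). The base-R sketch below recomputes that ratio on a mock data frame (values invented) and keeps studies with h4 > 0.8; that cutoff is an illustrative assumption, not something the package prescribes.

```r
# Mock rows shaped like gwasColocalisation() output (values are invented)
coloc <- data.frame(
  study.studyId = c("STUDY_1", "STUDY_2", "STUDY_3"),
  study.traitReported = c("Trait 1", "Trait 2", "Trait 3"),
  h3 = c(0.05, 0.30, 0.01),
  h4 = c(0.93, 0.55, 0.98),
  stringsAsFactors = FALSE
)

# log2h4h3 is the log2 ratio of the H4 and H3 posterior probabilities
coloc$log2h4h3 <- log2(coloc$h4 / coloc$h3)

# Keep studies with strong support for a shared causal variant
# (h4 > 0.8 is an illustrative cutoff, not a package default)
shared <- coloc[coloc$h4 > 0.8, c("study.studyId", "h4", "log2h4h3")]
print(shared)
```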
Giambartolomei, Claudia et al. “Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.” PLoS genetics vol. 10,5 e1004383. 15 May. 2014, doi:10.1371/journal.pgen.1004383
## Not run:
colocalisation_data <- gwasColocalisation(study_id = "GCST90002357", variant_id = "1_154119580_C_A")
colocalisation_data <- gwasColocalisation(study_id = "GCST90002357", variant_id = "rs2494663")
## End(Not run)
By providing a genomic region (chromosome name with start and end position), this function returns information about colocalisation between GWAS studies and associated loci within a specified genomic region. It provides details on the studies that have at least one overlapping associated locus within the region, allowing for the assessment of potential shared causal variants. The query output includes data such as the study identifiers, traits, loci information, and other relevant attributes.
gwasColocalisationForRegion(chromosome, start, end)
Argument | Description |
---|---|
chromosome | Character: Chromosome number as a string. |
start | Integer: Start position of the region on the chromosome. |
end | Integer: End position of the region on the chromosome. |
Returns a data frame with the following columns:
leftVariant.id
: Character vector. ID of the left variant.
leftVariant.position
: Integer vector. Position of the left variant.
leftVariant.chromosome
: Character vector. Chromosome of the left variant.
leftVariant.rsId
: Character vector. rsID of the left variant.
leftStudy.studyId
: Character vector. Study identifier for the left study.
leftStudy.traitReported
: Character vector. Reported trait associated with the colocalisation in the left study.
leftStudy.traitCategory
: Character vector. Category of the reported trait in the left study.
rightVariant.id
: Character vector. ID of the right variant.
rightVariant.position
: Integer vector. Position of the right variant.
rightVariant.chromosome
: Character vector. Chromosome of the right variant.
rightVariant.rsId
: Character vector. rsID of the right variant.
rightStudy.studyId
: Character vector. Study identifier for the right study.
rightStudy.traitReported
: Character vector. Reported trait associated with the colocalisation in the right study.
rightStudy.traitCategory
: Character vector. Category of the reported trait in the right study.
h3
: Numeric vector. H3 value associated with the colocalisation.
h4
: Numeric vector. H4 value associated with the colocalisation.
log2h4h3
: Numeric vector. Log2 ratio of H4 to H3 values associated with the colocalisation.
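To see how many partner studies each left study colocalises with, the region-level table can be tabulated directly. A minimal sketch on mock data (study IDs and posteriors invented), treating log2(H4/H3) > 0, i.e. H4 larger than H3, as support for a shared causal variant; that criterion is an assumption chosen for illustration.

```r
# Mock rows shaped like gwasColocalisationForRegion() output (values are invented)
pairs <- data.frame(
  leftStudy.studyId  = c("STUDY_1", "STUDY_1", "STUDY_2", "STUDY_2"),
  rightStudy.studyId = c("STUDY_2", "STUDY_3", "STUDY_3", "STUDY_4"),
  h3 = c(0.02, 0.60, 0.04, 0.10),
  h4 = c(0.95, 0.35, 0.90, 0.85),
  stringsAsFactors = FALSE
)

# Keep pairs where H4 exceeds H3 (positive log2 ratio)
pairs$log2h4h3 <- log2(pairs$h4 / pairs$h3)
supported <- pairs[pairs$log2h4h3 > 0, ]

# Number of supported colocalising partners per left study
partners <- table(supported$leftStudy.studyId)
print(partners)
```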
Giambartolomei, Claudia et al. “Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.” PLoS genetics vol. 10,5 e1004383. 15 May. 2014, doi:10.1371/journal.pgen.1004383
## Not run:
result <- gwasColocalisationForRegion(chromosome = "1", start = 153992685, end = 154155116)
## End(Not run)
Provided with a study ID and a lead variant ID, this function returns a data frame consisting of all the associated credible set tag variants with the corresponding statistical data.
gwasCredibleSet(study_id, variant_id)
Argument | Description |
---|---|
study_id | Character: Study ID(s) generated by Open Targets Genetics (e.g. GCST90002357). |
variant_id | Character: Variant ID generated by Open Targets Genetics (e.g. 1_154119580_C_A) or rsID (e.g. rs2494663). |
Returns a data frame of results from the credible set of variants for a specific lead variant with the following columns:
tagVariant.id
: Data frame. A table of IDs of the tag variant.
tagVariant.rsId
: Character vector. rsID of the tag variant.
beta
: Numeric. Beta value.
postProb
: Numeric. Posterior probability.
pval
: Numeric. P-value.
se
: Numeric. Standard error.
MultisignalMethod
: Character vector. Multisignal method.
logABF
: Numeric. Logarithm of approximate Bayes factor.
is95
: Logical. Indicates whether the variant is in the 95% credible set.
is99
: Logical. Indicates whether the variant is in the 99% credible set.
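The is95 and is99 flags correspond to credible sets, which are conventionally built by ranking variants by posterior probability and keeping the smallest set whose cumulative posterior reaches the target coverage. The base-R sketch below applies that construction to mock postProb values (invented); whether it reproduces the pipeline's is95 flags exactly is an assumption.

```r
# Mock tag variants shaped like gwasCredibleSet() output (values are invented)
cs <- data.frame(
  tagVariant.rsId = c("rsA", "rsB", "rsC", "rsD", "rsE"),
  postProb = c(0.05, 0.62, 0.02, 0.30, 0.01),
  stringsAsFactors = FALSE
)

# Sort by posterior probability (descending) and keep the smallest set of
# variants whose cumulative posterior probability reaches 0.95
cs <- cs[order(cs$postProb, decreasing = TRUE), ]
cum_pp <- cumsum(cs$postProb)
n_keep <- which(cum_pp >= 0.95)[1]
if (is.na(n_keep)) n_keep <- nrow(cs)  # posteriors never reach 0.95: keep all
cs$in_cs95 <- seq_len(nrow(cs)) <= n_keep
print(cs)
```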
## Not run:
result <- gwasCredibleSet(study_id = "GCST90002357", variant_id = "1_154119580_C_A")
result <- gwasCredibleSet(study_id = "GCST90002357", variant_id = "rs2494663")
## End(Not run)
For a given study ID and chromosomal region, this function returns a data frame (tibble format) with all variants in the region and their GWAS summary statistics.
gwasRegional(study_id, chromosome, start, end)
Argument | Description |
---|---|
study_id | Character: Open Targets Genetics generated ID for the GWAS study. |
chromosome | Character: Chromosome number as a string. |
start | Integer: Start position of the region on the chromosome. |
end | Integer: End position of the region on the chromosome. |
Returns a data frame of variant information and p-values with the following columns:
variant.id
: Character vector. Variant identifier.
variant.chromosome
: Character vector. Chromosome of the variant.
variant.position
: Integer vector. Position of the variant.
pval
: Numeric vector. P-value.
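For a regional (LocusZoom-style) view, p-values are usually transformed to -log10(p) and the variant with the smallest p-value is taken as the lead. A minimal base-R sketch on a mock data frame with the documented columns (variant IDs, positions and p-values invented):

```r
# Mock rows shaped like gwasRegional() output (values are invented)
reg <- data.frame(
  variant.id = c("1_154000000_A_G", "1_154050000_C_T", "1_154100000_G_A"),
  variant.chromosome = "1",
  variant.position = c(154000000L, 154050000L, 154100000L),
  pval = c(3.2e-4, 1.5e-12, 7.0e-6),
  stringsAsFactors = FALSE
)

# -log10 transform for plotting; larger values mean stronger association
reg$neg_log10_p <- -log10(reg$pval)

# Lead variant = smallest p-value in the region
lead <- reg[which.min(reg$pval), ]
print(lead$variant.id)
```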
## Not run:
result <- gwasRegional(study_id = "GCST90002357", chromosome = "1", start = 153992685, end = 154155116)
## End(Not run)
For an input tag variant ID, this function returns a data frame (tibble format) with population-level summary statistics across various GWAS studies.
indexVariantsAndStudiesForTagVariant(variant_id, pageindex = 0, pagesize = 20)
Argument | Description |
---|---|
variant_id | Character: Variant ID generated by Open Targets Genetics (e.g. 1_154119580_C_A) or rsID (e.g. rs2494663). |
pageindex | Integer: Index of the current page, pagination index >= 0. |
pagesize | Integer: Number of records in a page, pagination size > 0. |
Returns a data frame containing the index variants and studies associated with the input tag variant. The table consists of the following columns:
index_variant
: Data frame. Data frame of index variants with the following columns:
id
: Character vector. Variant ID.
rsId
: Character vector. rsID of the variant.
study
: Data frame. Data frame of studies with the following columns:
studyId
: Character vector. Study identifier.
traitReported
: Character vector. Reported trait of the study.
traitCategory
: Character vector. Trait category.
pval
: Numeric vector. P-value.
pval_mantissa
: Numeric vector. Mantissa of the p-value.
pval_exponent
: Integer vector. Exponent of the p-value.
n_total
: Integer vector. Total number of samples.
n_cases
: Integer vector. Number of cases.
overall_r2
: Numeric vector. Overall R-squared value.
afr1000g_prop
: Numeric vector. Proportion in African population in 1000 Genomes.
amr1000g_prop
: Numeric vector. Proportion in Admixed American population in 1000 Genomes.
eas1000g_prop
: Numeric vector. Proportion in East Asian population in 1000 Genomes.
eur1000g_prop
: Numeric vector. Proportion in European population in 1000 Genomes.
sas1000g_prop
: Numeric vector. Proportion in South Asian population in 1000 Genomes.
log10abf
: Numeric vector. Log10 ABF (Approximate Bayes Factor).
posterior_probability
: Numeric vector. Posterior probability.
odds_ratio
: Numeric vector. Odds ratio.
odds_ratio_ci_lower
: Numeric vector. Lower confidence interval of the odds ratio.
odds_ratio_ci_upper
: Numeric vector. Upper confidence interval of the odds ratio.
beta
: Numeric vector. Beta value.
beta_ci_lower
: Numeric vector. Lower confidence interval of the beta value.
beta_ci_upper
: Numeric vector. Upper confidence interval of the beta value.
direction
: Character vector. Direction of the effect.
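Because p-values are also reported as pval_mantissa and pval_exponent, extremely small values can be handled without floating-point underflow by working in log space: -log10(p) = -(log10(mantissa) + exponent). A minimal base-R sketch on mock values (invented), assuming p = mantissa × 10^exponent:

```r
# Mock p-value components shaped like indexVariantsAndStudiesForTagVariant() output
assoc <- data.frame(
  pval_mantissa = c(2.5, 1.0, 9.9),
  pval_exponent = c(-8L, -350L, -3L)
)

# Naive reconstruction: values far below ~1e-308 underflow to 0 as doubles
assoc$pval_naive <- assoc$pval_mantissa * 10^assoc$pval_exponent

# Log-space computation keeps the information for every row
assoc$neg_log10_p <- -(log10(assoc$pval_mantissa) + assoc$pval_exponent)
print(assoc)
```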
## Not run:
result <- indexVariantsAndStudiesForTagVariant(variant_id = "1_109274968_G_T")
result <- indexVariantsAndStudiesForTagVariant(variant_id = "rs12740374", pageindex = 1, pagesize = 50)
## End(Not run)
This function queries the Open Targets Genetics GraphQL API to retrieve known drugs data for a specified gene.
knownDrugsQuery(ensgId, cursor = NULL, freeTextQuery = NULL, size = 10)
Argument | Description |
---|---|
ensgId | Character: ENSEMBL ID of the target gene (e.g., ENSG00000169174). |
cursor | Character: Cursor for pagination (default: NULL). |
freeTextQuery | Character: Free text query to filter results (default: NULL). |
size | Integer: Number of records to retrieve (default: 10). |
Returns a data frame containing known drugs data for the specified gene.
## Not run:
result <- knownDrugsQuery(ensgId = "ENSG00000169174", size = 10)
result <- knownDrugsQuery(ensgId = "ENSG00000169174", cursor = NULL, freeTextQuery = NULL, size = 10)
## End(Not run)
This function retrieves summary statistics for a given GWAS study ID, which are used to generate a Manhattan plot.
A Manhattan plot is a graphical summary of genetic association results, typically from genome-wide association studies (GWAS).
It displays the results of statistical associations between genetic variants and a trait or disease of interest across the genome.
This function returns a data frame of the underlying data, which can be used to recreate the Manhattan plot with the plot_manhattan() function, or for custom plots and downstream analysis.
manhattan(study_id, pageindex = 0, pagesize = 100)
Argument | Description |
---|---|
study_id | Character: Open Targets Genetics generated ID for the GWAS study. |
pageindex | Integer: Index of the current page (pagination index >= 0). |
pagesize | Integer: Number of records in a page (pagination size > 0). |
Returns a data frame containing the Manhattan associations for the input study ID. The table consists of the following columns:
pval_mantissa
: Numeric vector. Mantissa of the p-value.
pval_exponent
: Integer vector. Exponent of the p-value.
credible_set_size
: Integer vector. Size of the credible set.
ld_set_size
: Integer vector. Size of the LD set.
total_set_size
: Integer vector. Total size of the set.
pval
: Numeric vector. P-value.
odds_ratio
: Logical vector. Odds ratio.
odds_ratio_ci_lower
: Logical vector. Lower confidence interval of the odds ratio.
odds_ratio_ci_upper
: Logical vector. Upper confidence interval of the odds ratio.
beta
: Numeric vector. Beta value.
beta_ci_lower
: Numeric vector. Lower confidence interval of the beta value.
beta_ci_upper
: Numeric vector. Upper confidence interval of the beta value.
direction
: Character vector. Direction of the effect.
best_genes_score
: Numeric vector. Score of the best genes.
best_genes_gene_id
: Character vector. Gene ID of the best genes.
best_genes_gene_symbol
: Character vector. Gene symbol of the best genes.
best_coloc_genes_score
: Numeric vector. Score of the best colocalised genes.
best_coloc_genes_gene_id
: Character vector. Gene ID of the best colocalised genes.
best_coloc_genes_gene_symbol
: Character vector. Gene symbol of the best colocalised genes.
best_locus2genes_score
: Numeric vector. Score of the best locus-to-gene (L2G) genes.
best_locus2genes_gene_id
: Character vector. Gene ID of the best locus-to-gene (L2G) genes.
best_locus2genes_gene_symbol
: Character vector. Gene symbol of the best locus-to-gene (L2G) genes.
variant_id
: Character vector. Variant ID.
variant_position
: Integer vector. Variant position.
variant_chromosome
: Character vector. Variant chromosome.
variant_rs_id
: Character vector. Variant rsID.
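When building a custom Manhattan plot from manhattan() output, variants are usually placed on a single genome-wide x-axis by offsetting each chromosome by the lengths of the chromosomes before it. The sketch below does this in base R on mock rows (invented); because true chromosome lengths are not part of the documented output, the largest observed position per chromosome is used as a stand-in, and the chromosome naming (1 to 22, then X) is an assumption.

```r
# Mock rows shaped like manhattan() output (values are invented)
mh <- data.frame(
  variant_chromosome = c("1", "1", "2", "10", "X"),
  variant_position = c(1.5e6, 2.0e8, 5.0e7, 1.0e8, 3.0e7),
  pval_mantissa = c(1.2, 3.0, 5.5, 2.0, 8.1),
  pval_exponent = c(-9L, -15L, -8L, -12L, -10L),
  stringsAsFactors = FALSE
)

# Order chromosomes numerically, with X after the autosomes
chr_levels <- c(as.character(1:22), "X")
mh$chr <- factor(mh$variant_chromosome, levels = chr_levels)

# Largest observed position per chromosome (NA for absent chromosomes -> 0)
chr_max <- tapply(mh$variant_position, mh$chr, max)
chr_max[is.na(chr_max)] <- 0

# Offset of each chromosome = sum of the maxima of all preceding chromosomes
offsets <- c(0, cumsum(chr_max)[-length(chr_max)])
names(offsets) <- names(chr_max)
mh$cum_position <- mh$variant_position + offsets[as.character(mh$chr)]

# y-axis value computed in log space from mantissa and exponent
mh$neg_log10_p <- -(log10(mh$pval_mantissa) + mh$pval_exponent)
print(mh[order(mh$cum_position), c("variant_chromosome", "cum_position", "neg_log10_p")])
```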
## Not run:
result <- manhattan(study_id = "GCST90002357")
result <- manhattan(study_id = "GCST90002357", pageindex = 2, pagesize = 50)
## End(Not run)
This function queries the Open Targets Genetics GraphQL API to retrieve mouse phenotypes data for a specified gene.
mousePhenotypesQuery(ensemblId)
Argument | Description |
---|---|
ensemblId | Character: ENSEMBL ID of the target gene (e.g., ENSG00000169174). |
Returns a data frame containing mouse phenotypes data for the specified gene.
## Not run:
result <- mousePhenotypesQuery(ensemblId = "ENSG00000169174")
## End(Not run)
For an input study ID and a list of other study IDs, this function returns a list with two elements: a table of overlap information, and the variant intersection set, i.e. the set of variants shared between the input study and the other studies.
overlapInfoForStudy(study_id, study_ids = list())
Argument | Description |
---|---|
study_id | Character: Study ID generated by Open Targets Genetics (e.g. GCST90002357). |
study_ids | List: Study IDs generated by Open Targets Genetics to compare with study_id (e.g. list("GCST90025975", "GCST90025962")). |
A list containing a data frame of overlap information and the variant intersection set. The overlap information table (overlap_info) consists of the following columns:
studyId
: Character vector. Study ID.
traitReported
: Character vector. Reported trait.
traitCategory
: Character vector. Trait category.
variantIdA
: Character vector. Variant ID from study A.
variantIdB
: Character vector. Variant ID from study B.
overlapAB
: Integer vector. Number of overlaps between variants A and B.
distinctA
: Integer vector. Number of distinct variants in study A.
distinctB
: Integer vector. Number of distinct variants in study B.
study.studyId
: Character vector. Study ID from study list.
study.traitReported
: Character vector. Reported trait from study list.
study.traitCategory
: Character vector. Trait category from study list.
The variant intersection set (variant_intersection_set) is a character vector representing the intersection of variants.
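The overlap counts can be condensed into a single similarity score per study pair. This sketch computes a Jaccard index on mock counts (invented), under the assumption that distinctA and distinctB count variants found only in study A and only in study B respectively; if they instead count all variants in each study, the denominator would be distinctA + distinctB - overlapAB.

```r
# Mock counts shaped like the overlap_info table (values are invented).
# Assumption: distinctA / distinctB = variants found only in study A / only in study B.
ov <- data.frame(
  study.studyId = c("STUDY_2", "STUDY_3"),
  overlapAB = c(12L, 3L),
  distinctA = c(8L, 17L),
  distinctB = c(20L, 5L),
  stringsAsFactors = FALSE
)

# Jaccard index: shared variants / all variants seen in either study
ov$jaccard <- ov$overlapAB / (ov$overlapAB + ov$distinctA + ov$distinctB)
print(ov[order(-ov$jaccard), ])
```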
## Not run:
result <- overlapInfoForStudy(study_id = "GCST90002357", study_ids = list("GCST90025975", "GCST90025962"))
## End(Not run)
PheWAS (phenome-wide association study) is a method that investigates the relationships between a genetic variant and many traits or phenotypes, helping to assess its potential influence on multiple traits or diseases at once. This function retrieves the traits associated with a given variant in the UK Biobank, FinnGen and/or GWAS Catalog summary statistics repositories (only traits with a p-value below 0.005 are returned).
pheWAS(variant_id)
Argument | Description |
---|---|
variant_id | Character: Variant ID generated by Open Targets Genetics (e.g. 1_154119580_C_A) or rsID (e.g. rs2494663). |
A data frame with PheWAS associations.
The output data frame contains the following columns:
totalGWASStudies
: An integer indicating the total number of GWAS studies where the variant is associated.
pval
: A numeric value representing the p-value of the association between the variant and the trait.
beta
: A numeric value representing the beta, i.e. the effect size of the variant on the trait.
oddsRatio
: A numeric value representing the odds ratio, measuring the association between the variant and the trait.
nTotal
: An integer indicating the total number of participants in the study.
study.studyId
: A character vector representing the study ID.
study.source
: A character vector representing the source of the study.
study.pmid
: A character vector representing the PubMed ID (PMID) of the study.
study.pubDate
: A character vector representing the publication date of the study.
study.traitReported
: A character vector representing the reported trait associated with the variant.
study.traitCategory
: A character vector representing the category of the trait.
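A typical first look at pheWAS() output is to rank associations by p-value and tabulate them by trait category. A minimal base-R sketch on a mock data frame with some of the documented study.* columns (values invented; the source labels are borrowed from the plot_phewas() source options):

```r
# Mock rows shaped like pheWAS() output (values are invented; all p < 0.005)
pw <- data.frame(
  study.studyId = c("STUDY_1", "STUDY_2", "STUDY_3", "STUDY_4"),
  study.source = c("GCST", "FINNGEN", "NEALE", "GCST"),
  study.traitReported = c("Trait 1", "Trait 2", "Trait 3", "Trait 4"),
  study.traitCategory = c("Category A", "Category B", "Category A", "Category C"),
  pval = c(1e-10, 4e-3, 2e-6, 3e-4),
  stringsAsFactors = FALSE
)

# Rank associations from strongest to weakest and add -log10(p)
ranked <- pw[order(pw$pval), ]
ranked$neg_log10_p <- -log10(ranked$pval)
print(ranked[, c("study.traitReported", "study.source", "neg_log10_p")])

# Number of associated traits per trait category
print(table(ranked$study.traitCategory))
```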
Pendergrass, S A et al. “The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery.” Genetic epidemiology vol. 35,5 (2011): 410-22. doi:10.1002/gepi.20589
## Not run:
result <- pheWAS(variant_id = "1_154549918_C_A")
result <- pheWAS(variant_id = "rs72698179")
## End(Not run)
Generates a scatter plot from the output of the colocalisationsForGene() function. The reported trait of each study is shown on the x-axis and plotted against the corresponding -log2(H4/H3) value on the y-axis, indicating the evidence of colocalisation between the molecular QTLs reported in each study and the explored gene. The molecular QTLs are mapped to the colors of the points. If the output of colocalisationsForGene() includes data for multiple genes, they are plotted in separate panels.
plot_coloc(data, biobank = FALSE)
Argument | Description |
---|---|
data | Data Frame: result of the colocalisationsForGene() function in data frame format, containing the colocalisation information for the query gene(s). |
biobank | Logical: |
A horizontal bar plot of the colocalisation information.
## Not run:
plot_out <- colocalisationsForGene(genes = "ENSG00000169174") %>% plot_coloc(biobank = FALSE)
plot_out <- colocalisationsForGene(genes = "PCSK9") %>% plot_coloc(biobank = TRUE)
## End(Not run)
This function returns a radar plot comparing the partial scores that are important for prioritising causal genes, as obtained from the studiesAndLeadVariantsForGeneByL2G() function. The user can restrict the plot to a specific disease by passing its EFO ID in the disease_efo argument; otherwise, the returned plot is faceted by the traits/diseases present in the output of studiesAndLeadVariantsForGeneByL2G().
plot_l2g(data, disease_efo = NULL, l2g_cutoff = 0.5, top_n_disease = 1)
Argument | Description |
---|---|
data | Data frame: result of the studiesAndLeadVariantsForGeneByL2G() function. |
disease_efo | Character: Input EFO ID to filter the L2G data for a particular disease. |
l2g_cutoff | Numeric: Minimum L2G score for a disease to be considered in the plot. |
top_n_disease | Numeric: Number of top diseases to plot for each gene, ranked by L2G score. |
A radar plot for the input disease and the genes associated with that disease. The variables shown include L2G score, chromatin interaction, variant pathogenicity and distance.
## Not run:
library(magrittr)  # provides %>%
p <- studiesAndLeadVariantsForGeneByL2G(list("ENSG00000167207", "ENSG00000096968", "ENSG00000138821", "ENSG00000125255")) %>%
  plot_l2g(disease_efo = "EFO_0003767")
p
## End(Not run)
manhattan()
This function generates a Manhattan plot from the summary statistics returned by the manhattan() function. The top 3 genes (by p-value) are annotated on each chromosome.
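The annotation rule (top 3 genes per chromosome by p-value) amounts to a grouped "smallest n" selection. This base R sketch assumes columns named chromosome, gene and pval; these names and the helper top_hits_per_chromosome() are hypothetical and may not match the manhattan() output.

```r
# Hypothetical helper: for each chromosome, return the rows with the n
# smallest p-values.
top_hits_per_chromosome <- function(df, n = 3) {
  df <- df[order(df$chromosome, df$pval), , drop = FALSE]
  # Position within each chromosome after sorting by p-value
  rank_in_chr <- ave(df$pval, df$chromosome, FUN = seq_along)
  df[rank_in_chr <= n, , drop = FALSE]
}

# Toy summary statistics (made-up values, not real GWAS results)
sumstats <- data.frame(
  chromosome = c("1", "1", "1", "1", "2", "2"),
  gene = c("A", "B", "C", "D", "E", "F"),
  pval = c(1e-10, 5e-8, 1e-3, 2e-12, 3e-9, 0.04),
  stringsAsFactors = FALSE
)

hits <- top_hits_per_chromosome(sumstats)
print(hits)
```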
plot_manhattan(data)
data |
Data frame containing the necessary columns from manhattan(). |
A Manhattan plot visualizing the GWAS results.
## Not run:
library(magrittr)  # provides %>%
p <- manhattan(study_id = "GCST003044") %>% plot_manhattan()
p
## End(Not run)
PheWAS() results.
This plot visualizes which traits are associated with the user's selected variant in the UK Biobank, FinnGen and/or GWAS Catalog summary statistics repositories, based on PheWAS analysis. The associated traits are mapped onto the x-axis and their -log10(p-value) values are plotted on the y-axis. A horizontal line marks the p-value cutoff of 0.005 used to indicate significant associations; associations above this line are labeled with the trait name, and the points are color-coded by the source of the association.
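On the -log10 scale the 0.005 cutoff sits at about 2.3, and "above the line" means p < 0.005. The sketch below shows that arithmetic on a toy table; the column names traitReported and pval are assumptions, and the traits and p-values are made up.

```r
# The horizontal cutoff line on the -log10(p-value) axis
p_cutoff <- 0.005
y_cutoff <- -log10(p_cutoff)   # about 2.30

# Toy PheWAS results (made-up traits and p-values)
phewas <- data.frame(
  traitReported = c("Trait A", "Trait B", "Trait C"),
  pval = c(1e-6, 0.01, 0.004),
  stringsAsFactors = FALSE
)
phewas$neg_log10_p <- -log10(phewas$pval)

# Points above the line (p < 0.005) would receive a trait label
phewas$labelled <- phewas$neg_log10_p > y_cutoff
print(phewas)
```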
plot_phewas(data, disease = TRUE, source = c("GCST", "FINNGEN", "NEALE", "SAIGE"))
data |
Data Frame: The result of the pheWAS() function. |
disease |
Logical: A logical variable indicating whether to filter the PheWAS data for disease (default: TRUE). |
source |
Character vector: Choices for the data sources of PheWAS analysis, including FINNGEN, GCST, NEALE (UKBiobank), and SAIGE. |
A plot to prioritize variants based on their -log10(p-value).
## Not run:
library(magrittr)  # provides %>%
p <- pheWAS(variant_id = "14_87978408_G_A") %>% plot_phewas(disease = TRUE)
p
## End(Not run)
The colocalisation analysis in Open Targets Genetics is performed using the coloc method (Giambartolomei et al., 2014). Coloc is a Bayesian method which, for two traits, integrates evidence over all variants at a locus to evaluate the following hypotheses:
- H0: No association with either trait
- H1: Association with trait 1, not with trait 2
- H2: Association with trait 2, not with trait 1
- H3: Association with trait 1 and trait 2, two independent SNPs
- H4: Association with trait 1 and trait 2, one shared SNP
This analysis tests whether two associations at the same locus are consistent with having a shared causal variant. Colocalisation of two independent associations from two studies may suggest a shared causal mechanism.
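To make the five hypotheses concrete, the sketch below combines per-SNP natural-log approximate Bayes factors (ABFs) for two traits into posterior probabilities of H0 to H4, following the combination step of the coloc method. It uses coloc's default priors (p1 = p2 = 1e-4, p12 = 1e-5); the priors used by Open Targets Genetics may differ, and the toy ABF values are made up. It also shows how a ratio like the log2h4h3 output column relates to the H3 and H4 posteriors.

```r
# Numerically stable log(sum(exp(x)))
logsum <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Numerically stable log(exp(x) - exp(y)), assuming x >= y
logdiff <- function(x, y) {
  m <- max(x, y)
  m + log(exp(x - m) - exp(y - m))
}

# Posterior probabilities of H0..H4 from per-SNP log ABFs l1, l2
coloc_posteriors <- function(l1, l2, p1 = 1e-4, p2 = 1e-4, p12 = 1e-5) {
  stopifnot(length(l1) == length(l2))
  lH <- c(
    H0 = 0,
    H1 = log(p1) + logsum(l1),
    H2 = log(p2) + logsum(l2),
    # Both traits associated, but at different SNPs (pairs i != j)
    H3 = log(p1) + log(p2) + logdiff(logsum(l1) + logsum(l2), logsum(l1 + l2)),
    # Both traits associated through the same SNP
    H4 = log(p12) + logsum(l1 + l2)
  )
  exp(lH - logsum(lH))  # normalise so the five posteriors sum to 1
}

# Toy log ABFs at three SNPs: both traits peak at the same SNP
shared <- coloc_posteriors(l1 = c(10, 0, 0), l2 = c(10, 0, 0))
# Toy log ABFs: the traits peak at different SNPs
distinct <- coloc_posteriors(l1 = c(10, 0, 0), l2 = c(0, 10, 0))

round(shared, 3)
round(distinct, 3)
log2(shared[["H4"]] / shared[["H3"]])  # positive: favours a shared variant
```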
qtlColocalisationVariantQuery(study_id, variant_id)
study_id |
Character: Study ID(s) generated by Open Targets Genetics (e.g. GCST90002357). |
variant_id |
Character: Open Targets Genetics variant ID (e.g. 1_154119580_C_A) or rsID (e.g. rs2494663). |
Returns a data frame of the colocalisation information for a lead variant in a specific study. The output is a tidy data frame with the following data structure:
qtlStudyName
: Character vector. QTL study name.
phenotypeId
: Character vector. Phenotype ID.
gene.id
: Character vector. Gene ID.
gene.symbol
: Character vector. Gene symbol.
name
: Character vector. Tissue name.
indexVariant.id
: Character vector. Index variant ID.
indexVariant.rsId
: Character vector. Index variant rsID.
beta
: Numeric. Beta value.
h4
: Numeric. h4 value.
h3
: Numeric. h3 value.
log2h4h3
: Numeric. Log2(h4/h3) value.
## Not run:
result <- qtlColocalisationVariantQuery(study_id = "GCST90002357", variant_id = "1_154119580_C_A")
result <- qtlColocalisationVariantQuery(study_id = "GCST90002357", variant_id = "rs2494663")
## End(Not run)
In Open Targets Genetics, lead variants are expanded into a more comprehensive set of candidate causal variants, referred to as tag variants. This function retrieves the calculated summary statistics for the tag variants included in a lead variant's colocalisation analysis for a given study (which links a top locus to a trait). The user can filter the results by the desired biofeature (e.g. tissue or cell type) for which the function obtains tag variant information.
qtlCredibleSet(study_id, variant_id, gene, biofeature)
study_id |
Character: Study ID(s) generated by Open Targets Genetics (e.g. GCST90002357). |
variant_id |
Character: Open Targets Genetics variant ID (e.g. 1_154119580_C_A) or rsID (e.g. rs2494663). |
gene |
Character: Gene ENSEMBL ID (e.g. ENSG00000169174) or gene symbol (e.g. PCSK9). |
biofeature |
Character: Represents either a tissue, cell type, aggregation type, protein type, etc. |
Returns a data frame of results from the QTL credible set of variants consisting of the following columns:
tagVariant.id
: Character vector. Tag variant ID.
tagVariant.rsId
: Character vector. Tag variant rsID.
pval
: Numeric. P-value.
se
: Numeric. Standard error.
beta
: Numeric. Beta value.
postProb
: Numeric. Posterior probability.
MultisignalMethod
: Character vector. Multisignal method.
logABF
: Numeric. Logarithm of approximate Bayes factor.
is95
: Logical. Indicates whether the variant is in the 95% credible set.
is99
: Logical. Indicates whether the variant is in the 99% credible set.
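Credible-set flags such as is95 and is99 are conventionally obtained by ranking variants by posterior probability and taking the smallest top set whose probabilities sum to the target coverage. The sketch below shows that standard construction with a hypothetical helper, in_credible_set(); that Open Targets Genetics computes its flags exactly this way is an assumption, and the probabilities are made up.

```r
# Hypothetical helper: flag the smallest set of variants (ranked by posterior
# probability) whose probabilities sum to at least `level`.
in_credible_set <- function(post_prob, level = 0.95) {
  ord <- order(post_prob, decreasing = TRUE)
  cum <- cumsum(post_prob[ord])
  n_needed <- which(cum >= level)[1]
  # If the target coverage is never reached, include every variant
  if (is.na(n_needed)) n_needed <- length(post_prob)
  flag <- logical(length(post_prob))
  flag[ord[seq_len(n_needed)]] <- TRUE
  flag
}

# Toy posterior probabilities for five tag variants (made-up values)
pp <- c(0.02, 0.55, 0.30, 0.11, 0.02)
is95_sketch <- in_credible_set(pp, 0.95)
is99_sketch <- in_credible_set(pp, 0.99)
data.frame(postProb = pp, is95 = is95_sketch, is99 = is99_sketch)
```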
## Not run:
result <- qtlCredibleSet(study_id = "Braineac2", variant_id = "1_55053079_C_T", gene = "ENSG00000169174", biofeature = "SUBSTANTIA_NIGRA")
result <- qtlCredibleSet(study_id = "Braineac2", variant_id = "rs7552841", gene = "PCSK9", biofeature = "SUBSTANTIA_NIGRA")
## End(Not run)
Running custom GraphQL queries
run_custom_query(variableList, query, query_name)
variableList |
List: key-value pairs of the variables (e.g. gene, variant or study IDs) used in the query. |
query |
Character: the GraphQL query body to run. |
query_name |
Character: the name of the query. |
The query result as flattened JSON.
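A custom query pairs a GraphQL body that declares $variables with a variableList that supplies their values. The sketch below builds such a pair and checks, offline, that every declared variable has a value. The studyInfo field and its arguments are assumed from the Open Targets Genetics schema, using "studyInfoQuery" as query_name is an assumption, and missing_variables() is a hypothetical helper; the network call itself is left commented out.

```r
# Hypothetical query: field names are assumed from the Open Targets
# Genetics schema and should be checked against the live API.
query <- "query studyInfoQuery($studyId: String!) {
  studyInfo(studyId: $studyId) {
    studyId
    traitReported
    pmid
  }
}"
variableList <- list(studyId = "GCST90002357")

# Hypothetical helper: names of $variables used in the query that have
# no value in variableList
missing_variables <- function(query, variableList) {
  found <- regmatches(query, gregexpr("\\$[A-Za-z_][A-Za-z0-9_]*", query))[[1]]
  setdiff(unique(sub("^\\$", "", found)), names(variableList))
}

stopifnot(length(missing_variables(query, variableList)) == 0)

## Not run:
# result <- otargen::run_custom_query(variableList, query, "studyInfoQuery")
## End(Not run)
```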
## Not run:
otargen::run_custom_query(variableList, query, query_name)
## End(Not run)
The "locus-to-gene" (L2G) model derives features to prioritize likely causal genes at each GWAS locus based on genetic and functional genomics features. The main categories of predictive features are:
Distance: Distance from credible set variants to the gene.
Molecular QTL colocalization: Colocalization with molecular QTLs.
Chromatin interaction: Interactions, such as promoter-capture Hi-C.
Variant pathogenicity: Pathogenicity scores from VEP (Variant Effect Predictor).
studiesAndLeadVariantsForGeneByL2G(gene, l2g = NA, pvalue = NA, vtype = NULL)
gene |
Character: Gene ENSEMBL ID (e.g. ENSG00000169174) or gene symbol (e.g. PCSK9). This argument can take a list of genes too. |
l2g |
Numeric: Locus-to-gene (L2G) cutoff score. (Default: NA) |
pvalue |
Character: P-value cutoff. (Default: NA) |
vtype |
Character: Most severe consequence to filter the variant types, including "intergenic_variant", "upstream_gene_variant", "intron_variant", "missense_variant", "5_prime_UTR_variant", "non_coding_transcript_exon_variant", "splice_region_variant". (Default: NULL) |
The l2g, pvalue and vtype arguments provide additional filters to narrow down the results.
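The three filters can be read as row conditions on the output columns yProbaModel, pval and variant.mostSevereConsequence. This sketch applies them to a toy table with a hypothetical helper, filter_l2g(); whether the package uses exactly these comparisons (for example >= versus >) is an assumption. Because pvalue is documented as character, the sketch coerces it to numeric before comparing.

```r
# Hypothetical helper mirroring the documented filters (not otargen code)
filter_l2g <- function(df, l2g = NA, pvalue = NA, vtype = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.na(l2g)) keep <- keep & df$yProbaModel >= l2g
  # pvalue may arrive as character, so compare numerically
  if (!is.na(pvalue)) keep <- keep & df$pval <= as.numeric(pvalue)
  if (!is.null(vtype)) keep <- keep & df$variant.mostSevereConsequence %in% vtype
  df[keep, , drop = FALSE]
}

# Toy rows (made-up values)
res <- data.frame(
  yProbaModel = c(0.85, 0.65, 0.40),
  pval = c(1e-12, 5e-9, 1e-20),
  variant.mostSevereConsequence = c("intron_variant", "missense_variant",
                                    "intergenic_variant"),
  stringsAsFactors = FALSE
)

filtered <- filter_l2g(res, l2g = 0.6, pvalue = 1e-8,
                       vtype = c("intergenic_variant", "intron_variant"))
print(filtered)
```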
Returns a data frame containing the input gene ID and its data for the L2G model. The table consists of the following columns:
yProbaModel
: Numeric. L2G score.
yProbaDistance
: Numeric. Distance.
yProbaInteraction
: Numeric. Chromatin interaction.
yProbaMolecularQTL
: Numeric. Molecular QTL.
yProbaPathogenicity
: Numeric. Pathogenicity.
pval
: Numeric. P-value.
beta.direction
: Character. Beta direction.
beta.betaCI
: Numeric. Beta confidence interval.
beta.betaCILower
: Numeric. Lower bound of the beta confidence interval.
beta.betaCIUpper
: Numeric. Upper bound of the beta confidence interval.
odds.oddsCI
: Numeric. Odds ratio confidence interval.
odds.oddsCILower
: Numeric. Lower bound of the odds ratio confidence interval.
odds.oddsCIUpper
: Numeric. Upper bound of the odds ratio confidence interval.
study.studyId
: Character. Study ID.
study.traitReported
: Character. Reported trait.
study.traitCategory
: Character. Trait category.
study.pubDate
: Character. Publication date.
study.pubTitle
: Character. Publication title.
study.pubAuthor
: Character. Publication author.
study.pubJournal
: Character. Publication journal.
study.pmid
: Character. PubMed ID.
study.hasSumstats
: Logical. Indicates if the study has summary statistics.
study.nCases
: Integer. Number of cases in the study.
study.numAssocLoci
: Integer. Number of associated loci.
study.nTotal
: Integer. Total number of samples in the study.
study.traitEfos
: Character. Trait EFOs.
variant.id
: Character. Variant ID.
variant.rsId
: Character. Variant rsID.
variant.chromosome
: Character. Variant chromosome.
variant.position
: Integer. Variant position.
variant.refAllele
: Character. Variant reference allele.
variant.altAllele
: Character. Variant alternate allele.
variant.nearestCodingGeneDistance
: Integer. Distance to the nearest coding gene.
variant.nearestGeneDistance
: Integer. Distance to the nearest gene.
variant.mostSevereConsequence
: Character. Most severe consequence.
variant.nearestGene.id
: Character. Nearest gene ID.
variant.nearestCodingGene.id
: Character. Nearest coding gene ID.
ensembl_id
: Character. Ensembl ID.
gene_symbol
: Character. Gene symbol.
## Not run:
result <- studiesAndLeadVariantsForGeneByL2G(gene = c("ENSG00000163946", "ENSG00000169174", "ENSG00000143001"), l2g = 0.7)
result <- studiesAndLeadVariantsForGeneByL2G(gene = "ENSG00000169174", l2g = 0.6, pvalue = 1e-8, vtype = c("intergenic_variant", "intron_variant"))
result <- studiesAndLeadVariantsForGeneByL2G(gene = "TMEM61")
## End(Not run)
For a given study ID, this function returns a data frame of relevant information about the GWAS study, such as the PubMed ID, the EFO ID of the studied trait, case/control sizes, etc.
studyInfo(study_id)
study_id |
Character: Study ID(s) generated by Open Targets Genetics (e.g. GCST90002357). |
Returns a data frame (in tibble format) containing summary information about a GWAS study. The data frame has the following structure:
studyId
: Character. Study ID.
traitReported
: Character. Reported trait.
source
: Character. Source.
traitEfos
: Character. Trait EFO ID.
pmid
: Character. PubMed ID.
pubDate
: Character. Publication date.
pubJournal
: Character. Publication journal.
pubTitle
: Character. Publication title.
pubAuthor
: Character. Publication author.
hasSumstats
: Character. Indicates if the study has summary statistics.
ancestryInitial
: Character. Initial ancestry.
nInitial
: Character. Initial sample size.
nReplication
: Character. Replication sample size.
traitCategory
: Character. Trait category.
numAssocLoci
: Character. Number of associated loci.
nTotal
: Character. Total sample size.
## Not run:
result <- studyInfo(study_id = "GCST90002357")
## End(Not run)
This function fetches the locus-to-gene (L2G) pipeline summary data table for the neighboring genes of a variant in a GWAS study.
studyLocus2GeneTable(study_id, variant_id)
study_id |
Character: Study ID(s) generated by Open Targets Genetics (e.g. GCST90002357). |
variant_id |
Character: Open Targets Genetics variant ID (e.g. 1_154119580_C_A) or rsID (e.g. rs2494663). |
Returns a data frame with the summary statistics of the study and a data table containing various calculated scores and features for any lead variant. The output table has the following data structure:
studyId
: Character. Study ID.
variant.id
: Character. Variant ID.
variant.rsId
: Character. Variant rsID.
yProbaDistance
: Numeric. Distance score.
yProbaModel
: Numeric. Model score.
yProbaMolecularQTL
: Numeric. Molecular QTL score.
yProbaPathogenicity
: Numeric. Pathogenicity score.
yProbaInteraction
: Numeric. Interaction score.
hasColoc
: Logical. Indicates if colocalization data is available.
distanceToLocus
: Numeric. Distance to the locus.
gene.id
: Character. Gene ID.
gene.symbol
: Character. Gene symbol.
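A common next step with this table is to rank the candidate genes around the lead variant by the overall model score, yProbaModel. The sketch below does that on a toy table shaped like the output above; the gene symbols and scores are placeholders, not real results.

```r
# Toy L2G table for one study/lead variant (placeholder genes, made-up scores)
l2g_table <- data.frame(
  variant.id = "1_154119580_C_A",
  gene.symbol = c("GENE1", "GENE2", "GENE3"),
  yProbaModel = c(0.12, 0.81, 0.35),
  yProbaDistance = c(0.30, 0.60, 0.20),
  stringsAsFactors = FALSE
)

# Rank candidate genes by the overall L2G model score
ranked <- l2g_table[order(l2g_table$yProbaModel, decreasing = TRUE), ]
top_gene <- ranked$gene.symbol[1]
print(ranked[, c("gene.symbol", "yProbaModel")])
```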
## Not run:
result <- studyLocus2GeneTable(study_id = "GCST90002357", variant_id = "1_154119580_C_A")
result <- studyLocus2GeneTable(study_id = "GCST90002357", variant_id = "rs2494663")
## End(Not run)
For an input study ID, this function returns information about all variants across the associated loci. The output also includes information about the associated genes within each locus.
studyVariants(study_id)
study_id |
Character: Study ID(s) generated by Open Targets Genetics (e.g. GCST90002357). |
Returns a list of two data frames.
The first data frame (tibble format) contains the loci, with the following structure:
variant.id
: Character. Variant ID.
pval
: Numeric. P-value.
variant.nearestCodingGene.symbol
: Character. Symbol of the nearest coding gene to the variant.
variant.rsId
: Character. Variant rsID.
variant.chromosome
: Character. Chromosome of the variant.
variant.position
: Integer. Position of the variant.
variant.nearestCodingGeneDistance
: Integer. Distance to the nearest coding gene.
credibleSetSize
: Integer. Size of the credible set.
ldSetSize
: Integer. Size of the LD set.
oddsRatio
: Numeric. Odds ratio.
beta
: Numeric. Beta value.
The second data frame contains gene information, with the following structure:
score
: Numeric. Gene score.
gene.id
: Character. Gene ID.
gene.symbol
: Character. Gene symbol.
## Not run:
result <- studyVariants(study_id = "GCST003155")
## End(Not run)
For an input index variant ID, this function fetches information about the tag variants and associated studies, including scores.
tagVariantsAndStudiesForIndexVariant(variant_id, pageindex = 0, pagesize = 20)
variant_id |
Character: Open Targets Genetics variant ID (CHR_POSITION_REFALLELE_ALTALLELE, e.g. 1_109274968_G_T) or rsID. |
pageindex |
Integer: Index of the current page for pagination (>= 0). |
pagesize |
Integer: Number of records in a page for pagination (> 0). |
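Variant IDs in the CHR_POSITION_REFALLELE_ALTALLELE form can be split into their parts locally, which is handy for checking input before querying. parse_variant_id() and is_rsid() below are hypothetical helpers, not otargen functions; rsIDs have no positional information and are rejected by the parser.

```r
# Hypothetical helper: split CHR_POSITION_REF_ALT variant IDs into columns
parse_variant_id <- function(variant_id) {
  parts <- strsplit(variant_id, "_", fixed = TRUE)
  ok <- vapply(parts, length, integer(1)) == 4
  if (!all(ok)) {
    stop("Not in CHR_POSITION_REF_ALT format: ",
         paste(variant_id[!ok], collapse = ", "))
  }
  data.frame(
    chromosome = vapply(parts, `[`, character(1), 1),
    position = as.integer(vapply(parts, `[`, character(1), 2)),
    refAllele = vapply(parts, `[`, character(1), 3),
    altAllele = vapply(parts, `[`, character(1), 4),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: does the input look like an rsID?
is_rsid <- function(x) grepl("^rs[0-9]+$", x)

v <- parse_variant_id(c("1_109274968_G_T", "14_87978408_G_A"))
print(v)
```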
Returns a data frame containing the variant associations connected to the input index variant. The columns in the data frame are as follows:
tagVariant.id
: Character. Tag variant ID.
tagVariant.chromosome
: Character. Chromosome of the tag variant.
tagVariant.rsId
: Character. rsID of the tag variant.
tagVariant.position
: Integer. Position of the tag variant.
study.studyId
: Character. Study ID.
study.traitReported
: Character. Reported trait of the study.
study.traitCategory
: Character. Category of the trait in the study.
pval
: Numeric. P-value.
pvalMantissa
: Numeric. Mantissa of the p-value.
pvalExponent
: Integer. Exponent of the p-value.
nTotal
: Integer. Total number of samples.
nCases
: Integer. Number of cases in the study.
overallR2
: Numeric. Overall R-squared value.
afr1000GProp
: Numeric. Proportion in African 1000 Genomes population.
amr1000GProp
: Numeric. Proportion in Admixed American 1000 Genomes population.
eas1000GProp
: Numeric. Proportion in East Asian 1000 Genomes population.
eur1000GProp
: Numeric. Proportion in European 1000 Genomes population.
sas1000GProp
: Numeric. Proportion in South Asian 1000 Genomes population.
oddsRatio
: Numeric. Odds ratio.
oddsRatioCILower
: Numeric. Lower bound of the odds ratio confidence interval.
oddsRatioCIUpper
: Numeric. Upper bound of the odds ratio confidence interval.
posteriorProbability
: Numeric. Posterior probability.
beta
: Numeric. Beta value.
betaCILower
: Numeric. Lower bound of the beta value confidence interval.
betaCIUpper
: Numeric. Upper bound of the beta value confidence interval.
direction
: Character. Direction of the effect.
log10Abf
: Numeric. Log base 10 of the approximate Bayes factor.
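The pvalMantissa and pvalExponent columns encode p = mantissa × 10^exponent, presumably so that extremely small p-values survive. Computing -log10(p) directly from the two parts avoids building p as a double, which underflows to 0 below roughly 1e-308. The values below are toy numbers.

```r
# Toy values: p = 3.2e-250 stored as mantissa and exponent
pvalMantissa <- 3.2
pvalExponent <- -250

# Direct reconstruction works while p is within double range
pval <- pvalMantissa * 10^pvalExponent

# -log10(p) from the parts, without ever forming p itself
neg_log10_p <- function(mantissa, exponent) -(log10(mantissa) + exponent)

neg_log10_p(3.2, -250)     # about 249.49
neg_log10_p(1.5, -400)     # about 399.82
-log10(1.5 * 10^-400)      # Inf: 1.5e-400 underflows to 0 as a double
```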
## Not run:
result <- tagVariantsAndStudiesForIndexVariant(variant_id = "1_109274968_G_T")
result <- tagVariantsAndStudiesForIndexVariant(variant_id = "1_109274968_G_T", pageindex = 1, pagesize = 50)
## End(Not run)
For a given study ID, this function retrieves the top studies whose identified loci overlap with those of the queried study.
topOverlappedStudies(study_id, pageindex = 0, pagesize = 20)
study_id |
Character: Open Targets Genetics generated ID for a GWAS study. |
pageindex |
Integer: Index of the current page for pagination (>= 0). |
pagesize |
Integer: Number of records in a page for pagination (> 0). |
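To collect every record, a caller can request pageindex 0, 1, 2, ... until a page returns fewer than pagesize rows. The sketch assumes that pageindex is zero-based and that each page holds the next pagesize records; fetch_page() is a hypothetical offline stand-in for a paged call such as topOverlappedStudies(), so the loop runs without network access.

```r
# Hypothetical stand-in for a paged API call, serving slices of a local table
all_rows <- data.frame(studyId = sprintf("STUDY_%02d", 1:45),
                       stringsAsFactors = FALSE)
fetch_page <- function(pageindex, pagesize) {
  start <- pageindex * pagesize + 1
  if (start > nrow(all_rows)) return(all_rows[0, , drop = FALSE])
  end <- min(start + pagesize - 1, nrow(all_rows))
  all_rows[start:end, , drop = FALSE]
}

# Request pages 0, 1, 2, ... until a page comes back short
fetch_all <- function(fetch, pagesize = 20) {
  pages <- list()
  pageindex <- 0
  repeat {
    page <- fetch(pageindex, pagesize)
    pages[[length(pages) + 1]] <- page
    if (nrow(page) < pagesize) break
    pageindex <- pageindex + 1
  }
  do.call(rbind, pages)
}

combined <- fetch_all(fetch_page, pagesize = 20)
nrow(combined)
```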
Returns a data frame with the top studies containing the following columns:
study.studyId
: Character. Study ID of the input study.
study.traitReported
: Character. Reported trait of the input study.
study.traitCategory
: Character. Category of the trait in the input study.
topStudiesByLociOverlap.studyId
: Character. Study ID of the top associated studies.
topStudiesByLociOverlap.study.studyId
: Character. Study ID of the top associated studies.
topStudiesByLociOverlap.study.traitReported
: Character. Reported trait of the top associated studies.
topStudiesByLociOverlap.study.traitCategory
: Character. Category of the trait in the top associated studies.
topStudiesByLociOverlap.numOverlapLoci
: Integer. Number of loci overlapped with the input study.
## Not run:
result <- topOverlappedStudies(study_id = "GCST006614_3")
result <- topOverlappedStudies(study_id = "NEALE2_6177_1", pageindex = 1, pagesize = 50)
## End(Not run)
For a given variant ID, this function retrieves information about the variant, including its chromosome, position, reference allele, alternative allele, rsID, nearest gene, most severe consequence, and allele frequencies in different populations from the gnomAD database. The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators to aggregate and harmonize exome and genome sequencing data from a wide variety of large-scale sequencing projects and to make summary data available to the wider scientific community (see the reference).
variantInfo(variant_id)
variant_id |
Character: Open Targets Genetics variant ID (e.g. 1_154119580_C_A) or rsID (e.g. rs2494663). |
Returns a data frame (in tibble format) containing information about the variant. The data frame has the following structure:
chromosome
: Character. Chromosome of the variant.
position
: Integer. Position of the variant.
refAllele
: Character. Reference allele.
altAllele
: Character. Alternative allele.
rsId
: Character. Variant rsID.
chromosomeB37
: Character. Chromosome of the variant in build 37 coordinates.
positionB37
: Integer. Position of the variant in build 37 coordinates.
id
: Character. Variant ID.
nearestGene.id
: Character. ID of the nearest gene to the variant.
nearestGene.symbol
: Character. Symbol of the nearest gene to the variant.
nearestGeneDistance
: Integer. Distance to the nearest gene.
nearestCodingGene.id
: Character. ID of the nearest coding gene to the variant.
nearestCodingGene.symbol
: Character. Symbol of the nearest coding gene to the variant.
nearestCodingGeneDistance
: Integer. Distance to the nearest coding gene.
mostSevereConsequence
: Character. Most severe consequence of the variant.
caddRaw
: Numeric. CADD raw score.
caddPhred
: Numeric. CADD phred score.
gnomadAFR
: Numeric. Allele frequency in the African/African-American population in gnomAD.
gnomadAMR
: Numeric. Allele frequency in the Latino/Admixed American population in gnomAD.
gnomadASJ
: Numeric. Allele frequency in the Ashkenazi Jewish population in gnomAD.
gnomadEAS
: Numeric. Allele frequency in the East Asian population in gnomAD.
gnomadFIN
: Numeric. Allele frequency in the Finnish population in gnomAD.
gnomadNFE
: Numeric. Allele frequency in the Non-Finnish European population in gnomAD.
gnomadNFEEST
: Numeric. Allele frequency in the Estonian population in gnomAD.
gnomadNFENWE
: Numeric. Allele frequency in the Northwest European population in gnomAD.
gnomadNFESEU
: Numeric. Allele frequency in the Southern European population in gnomAD.
gnomadNFEONF
: Numeric. Allele frequency in the Other Non-Finnish European population in gnomAD.
gnomadOTH
: Numeric. Allele frequency in other populations in gnomAD.
https://gnomad.broadinstitute.org/
## Not run:
result <- variantInfo(variant_id = "rs2494663")
## End(Not run)