Processing mapping files for the omics FixID tool

This repository contains the source code for processing data to create identifier (ID) mapping files for secondary IDs (outdated, deprecated, split, or merged). The following databases are included in this project:

Datasource	License	Citation
ChEBI (config)	CC BY 4.0	Hastings J, Owen G, Dekker A, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research. 2016 Jan;44(D1):D1214-9. DOI: 10.1093/nar/gkv1031. PMID: 26467479; PMCID: PMC4702775.
HMDB (config)	CC0	Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, Dizon R, Sayeeda Z, Tian S, Lee BL, Berjanskii M, Mah R, Yamamoto M, Jovel J, Torres-Calzada C, Hiebert-Giesbrecht M, Lui VW, Varshavi D, Varshavi D, Allen D, Arndt D, Khetarpal N, Sivakumaran A, Harford K, Sanford S, Yee K, Cao X, Budinski Z, Liigand J, Zhang L, Zheng J, Mandal R, Karu N, Dambrova M, Schiöth HB, Greiner R, Gautam V. HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res. 2022 Jan 7;50(D1):D622-D631. doi: 10.1093/nar/gkab1062. PMID: 34986597; PMCID: PMC8728138.
HGNC (config)	link	Seal RL, Braschi B, Gray K, Jones TEM, Tweedie S, Haim-Vilmovsky L, Bruford EA. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D1003-D1009. doi: 10.1093/nar/gkac888. PMID: 36243972; PMCID: PMC9825485.
NCBI (config)	link	Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, Marchler-Bauer A, Lanczycki C, Lathrop S, Lu Z, Thibaud-Nissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, Sherry ST. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022 Jan 7;50(D1):D20-D26. doi: 10.1093/nar/gkab1112. PMID: 34850941; PMCID: PMC8728269.
UniProt (config)	CC BY 4.0	UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D480-D489. doi: 10.1093/nar/gkaa1100. PMID: 33237286; PMCID: PMC7778908.
Wikidata (config)	CC0	Vrandecic, D., Krotzsch, M. Wikidata: a free collaborative knowledgebase. Communications of the ACM. 2014. doi: 10.1145/2629489.

You can access the executable libraries to create mapping files here.

Contributing

If you wish to develop the code further, install the source code. It requires Java 8 (or 11) as the JRE, depending on the version used in BridgeDb.
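Since the build expects Java 8 or 11, it may help to check the installed version first. The following is a minimal sketch; `java_major` and `is_supported_java` are hypothetical helper names and are not part of this repository.

```shell
# Sketch: extract the major Java version from a version string.
# java_major is a hypothetical helper, not part of this repository.
java_major() {
  version="$1"
  case "$version" in
    1.*) # legacy scheme, e.g. 1.8.0_392 -> 8
      echo "$version" | cut -d. -f2 ;;
    *)   # modern scheme, e.g. 11.0.21 -> 11
      echo "$version" | cut -d. -f1 ;;
  esac
}

# Succeeds only for the versions named above (8 or 11).
is_supported_java() {
  major="$(java_major "$1")"
  [ "$major" = "8" ] || [ "$major" = "11" ]
}
```

The version string itself can be read from the quoted part of the `java -version` output, which is printed to stderr, and then passed to `is_supported_java`.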

Clone the code from this repository.
Import the project into Eclipse and build it with Maven using 'clean install', or run the build from your command line:

Build from Command Line

sudo apt update
sudo apt install gh
gh repo clone sec2pri/mapping_preprocessing
sudo apt install openjdk-8-jre-headless #or: sudo apt install openjdk-11-jre-headless
sudo apt install maven #to build the code
mvn clean install #run in the directory that contains the pom.xml

This will create an executable JAR file called 'mapping_preprocessing-0.0.1.jar'.

Create ID mapping files

Run the commands below from the directory that contains the 'target' folder, where the executable JAR file is located.

1) ChEBI SDF
#sudo apt-get install gzip #if not available
RELEASE_NUMBER="247"
wget "http://ftp.ebi.ac.uk/pub/databases/chebi/archive/rel${RELEASE_NUMBER}/SDF/chebi_3_stars.sdf.gz"
gunzip chebi_3_stars.sdf.gz
inputFile="chebi_3_stars.sdf"
outputDir="$(pwd)"
java -cp target/mapping_preprocessing-0.0.1.jar org.sec2pri.chebi_sdf "$inputFile" "$outputDir"
2) HMDB XML
java -cp target/mapping_preprocessing-0.0.1.jar org.sec2pri.hmdb_xml "$inputFile" "$outputDir"
3) NCBI txt
java -cp target/mapping_preprocessing-0.0.1.jar org.sec2pri.ncbi_txt "$inputFile" "$outputDir"
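The download and run steps above can be wrapped in small helpers so that a different ChEBI release or a mistyped path is caught before the JVM starts. This is a sketch only; `chebi_sdf_url` and `check_mapping_args` are hypothetical names, and the URL pattern is simply the one used in the wget line above.

```shell
# Build the archive URL for a given ChEBI release number,
# following the same pattern as the wget command above.
chebi_sdf_url() {
  release="$1"
  echo "http://ftp.ebi.ac.uk/pub/databases/chebi/archive/rel${release}/SDF/chebi_3_stars.sdf.gz"
}

# Check that the input path exists and that the output directory exists.
# Prints a message to stderr and returns 1 if either check fails.
check_mapping_args() {
  input_path="$1"
  output_dir="$2"
  if [ ! -e "$input_path" ]; then
    echo "input not found: $input_path" >&2
    return 1
  fi
  if [ ! -d "$output_dir" ]; then
    echo "output directory not found: $output_dir" >&2
    return 1
  fi
}
```

A possible use is `check_mapping_args "$inputFile" "$outputDir" && java -cp target/mapping_preprocessing-0.0.1.jar org.sec2pri.chebi_sdf "$inputFile" "$outputDir"`, so the Java class only runs when both paths are valid.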

inputFile: the path (directory and file name) of the input file. Preparation differs per source. ChEBI: download and unzip the SDF. HMDB: download and unzip the XML, then split it into individual XML files, one per entry. NCBI: download the data.

outputDir: the directory in which the output file(s) should be saved.
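The HMDB preparation step (splitting the unzipped XML into one file per entry) could be sketched as follows. This is one possible approach, not necessarily the one used by the authors. It assumes that each entry is wrapped in `<metabolite>` ... `</metabolite>` tags that sit on their own lines; `split_hmdb_entries` and the `entry_NNNNNN.xml` file names are hypothetical, and the per-entry files contain only the entry element (no XML declaration or root element), which may or may not be what the Java class expects.

```shell
# Split an unzipped HMDB XML file into one file per <metabolite> entry.
# $1: path to the unzipped XML file
# $2: directory for the per-entry files (created if missing)
split_hmdb_entries() {
  in_file="$1"
  out_dir="$2"
  mkdir -p "$out_dir"
  awk -v dir="$out_dir" '
    # An opening tag starts a new numbered output file.
    /<metabolite>/   { n++; out = sprintf("%s/entry_%06d.xml", dir, n); writing = 1 }
    # Copy every line of the current entry, including both tags.
    writing          { print > out }
    # A closing tag ends the entry; close the file to free the handle.
    /<\/metabolite>/ { if (writing) close(out); writing = 0 }
  ' "$in_file"
}
```

Lines outside an entry, such as the XML declaration and the root element, are skipped.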

Releases

The mapping files are released and archived on Zenodo (link TBA).
