Gene Expression Data Retrieval from GEO (NCBI) – B.Sc. Bioinformatics Practical
Aim of the Experiment
To retrieve and analyze gene expression data from the Gene Expression Omnibus (GEO) database of NCBI.
Principle
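The Gene Expression Omnibus (GEO) is a public repository maintained by NCBI that stores high-throughput functional genomics data, mainly gene expression data from microarray and RNA-Seq experiments.
- Each record has an accession number whose prefix shows its type: GSE (Series), GSM (Sample), GPL (Platform) or GDS (curated DataSet)
- Each dataset records gene expression levels measured under defined biological conditions (e.g., diseased vs normal tissue, treated vs untreated cells)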
- GEO allows retrieval, visualization, and comparison of expression profiles
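Because the accession prefix alone tells you what kind of GEO record you are looking at, a short sketch can make the four types concrete. It is a minimal illustration, not part of any NCBI tool: the function name `geo_record_type` is hypothetical, and the accessions in the example are placeholders used only to show the format.

```python
import re

# Record types identified by GEO accession prefixes.
GEO_RECORD_TYPES = {
    "GSE": "Series: a complete study grouping related samples",
    "GSM": "Sample: one biological sample (one array or sequencing run)",
    "GPL": "Platform: the array or sequencer used",
    "GDS": "DataSet: a curated collection of comparable samples from a Series",
}

def geo_record_type(accession: str) -> str:
    """Return the GEO record type described by an accession's prefix."""
    match = re.fullmatch(r"(GSE|GSM|GPL|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return GEO_RECORD_TYPES[match.group(1)]

# Placeholder accessions, used only to illustrate the format.
for acc in ["GSE12345", "gsm123456", "GPL570"]:
    print(acc, "->", geo_record_type(acc))
```

Anything that is not a known prefix followed by digits (for example a gene symbol such as TP53) raises `ValueError`.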
Requirements
- Computer with internet connection
- Web browser
- Access to NCBI
- Basic knowledge of genes and expression analysis
Step-by-Step Procedure
Step 1: Open GEO Database
- Visit the NCBI website (https://www.ncbi.nlm.nih.gov/)
- From the database dropdown, select GEO DataSets
Step 2: Enter Search Query
- Type keywords such as:
  - Gene name (e.g., TP53)
  - Disease (e.g., cancer)
  - Organism (e.g., Homo sapiens)
Example:
TP53 breast cancer Homo sapiens
Step 3: Run Search
- Click Search
- A list of matching GEO records appears; pick the Series (GSE) entries for the next step
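The same search can also be run programmatically through NCBI's E-utilities, where GEO DataSets is the Entrez database named `gds`. The sketch below only builds the ESearch URL for the example query; it assumes the `[Organism]` field tag for the organism and searches the gene and disease terms in all fields. The function name `build_geo_search_url` and the `retmax` default of 20 are illustrative choices, not anything the practical prescribes.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; "gds" is the Entrez name of GEO DataSets.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_geo_search_url(gene: str, disease: str, organism: str, retmax: int = 20) -> str:
    """Build an ESearch URL roughly equivalent to the GEO DataSets web search."""
    # [Organism] restricts the organism name to the organism field;
    # the gene and disease terms are searched in all fields.
    term = f'{gene} AND ({disease}) AND "{organism}"[Organism]'
    query = urlencode({"db": "gds", "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"

url = build_geo_search_url("TP53", "breast cancer", "Homo sapiens")
print(url)
```

Opening this URL (for example with `urllib.request.urlopen`) returns XML whose `<Id>` elements are numeric Entrez UIDs rather than GSE accessions; ESummary (`esummary.fcgi` with `db=gds`) maps them back to accessions. NCBI asks users without an API key to stay under three requests per second.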
Step 4: Select Dataset (GSE)
- Click on a relevant GSE accession number
- Review:
  - Study title
  - Organism
  - Experimental design
Step 5: Explore Dataset Information
- Check:
  - Number of samples
  - Platform used (GPL)
  - Type of experiment (microarray/RNA-Seq)
Step 6: View Sample Data (GSM)
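- Scroll down to the Samples (GSM) list on the GSE page
- Click a GSM accession to open that sample's record (sample characteristics and, for many microarray samples, a table of expression values)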
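GSE, GSM and GPL records all open through the same GEO accession page, so Steps 4 to 6 differ only in the accession you pass. This sketch builds those page URLs. The plain-text option is an assumption: the `targ`, `form` and `view` parameters are believed to request a text (SOFT) view of the record and should be checked against GEO's documentation before use. The function name `geo_record_url` and the accessions are placeholders.

```python
from urllib.parse import urlencode

GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

def geo_record_url(accession: str, as_text: bool = False) -> str:
    """URL of the GEO page for a GSE, GSM or GPL accession."""
    params = {"acc": accession.strip().upper()}
    if as_text:
        # Assumption: these options request a plain-text (SOFT) view of the
        # record itself; verify them in GEO's documentation before relying on them.
        params.update({"targ": "self", "form": "text", "view": "quick"})
    return f"{GEO_ACC_URL}?{urlencode(params)}"

for acc in ["GSE12345", "GPL570", "GSM123456"]:   # placeholder accessions
    print(geo_record_url(acc))
```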
Step 7: Analyze Gene Expression
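- On the GSE page, click “Analyze with GEO2R”
- Click Define groups, create the groups (e.g., control and treated) and assign each sample to a group
- Click Analyze (labelled Top 250 in older versions of GEO2R) to list the top 250 genes ranked by significance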
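To see what the two key numbers mean for a single gene, the sketch below computes a log2 fold change and a p-value by hand. It is deliberately not what GEO2R does: for microarray data GEO2R relies on the R package limma (a moderated t-test), whereas this example swaps in a simpler exact permutation test. The expression values are hypothetical log2 intensities, and the function names are illustrative.

```python
import math
from itertools import combinations
from statistics import mean

def log2_fold_change(control, treated):
    """logFC for values already on a log2 scale: mean(treated) - mean(control)."""
    return mean(treated) - mean(control)

def permutation_p_value(control, treated):
    """Exact two-sided permutation p-value for a difference in group means.

    Every way of relabelling the pooled samples into groups of the original
    sizes is tried; the p-value is the fraction of relabellings whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(control) + list(treated)
    observed = abs(log2_fold_change(control, treated))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        fake_treated = [pooled[i] for i in chosen]
        fake_control = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point rounding does not drop ties.
        if abs(log2_fold_change(fake_control, fake_treated)) >= observed - 1e-9:
            hits += 1
    return hits / total

# Hypothetical log2 expression values for one gene, three replicates per group.
control = [8.0, 8.2, 7.9]
treated = [9.1, 9.3, 9.0]
print(f"logFC = {log2_fold_change(control, treated):.2f}")
print(f"p     = {permutation_p_value(control, treated):.2f}")
```

With three samples per group there are only 20 possible relabellings, so even perfectly separated groups cannot reach a two-sided p-value below 2/20 = 0.1. This is one reason GEO2R uses limma, which borrows information across genes to stabilise variance estimates in small experiments.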
Step 8: Interpret Results
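- Observe:
  - logFC (log2 fold change): size and direction of the expression change between the two groups
  - P.Value and adj.P.Val (p-value adjusted for multiple testing)
  - Upregulated genes (positive logFC) and downregulated genes (negative logFC) in one group relative to the other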
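GEO2R reports an adjusted p-value (adj.P.Val) alongside the raw one, and Benjamini & Hochberg (false discovery rate) adjustment is among its options. The sketch below implements Benjamini-Hochberg adjustment and then labels genes as up- or downregulated. The cut-offs |logFC| ≥ 1 (a two-fold change) and adj.P.Val < 0.05 are common conventions assumed here, not GEO2R settings, and the gene names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def regulation(log_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene using assumed |logFC| and adjusted p-value cut-offs."""
    if adj_p >= p_cutoff or abs(log_fc) < fc_cutoff:
        return "not significant"
    return "upregulated" if log_fc > 0 else "downregulated"

# Hypothetical results for five genes: (name, logFC, raw p-value).
results = [
    ("GENE_A", 2.1, 0.001),
    ("GENE_B", -1.5, 0.01),
    ("GENE_C", 1.2, 0.02),
    ("GENE_D", 0.8, 0.04),
    ("GENE_E", 0.2, 0.5),
]
adj = benjamini_hochberg([p for _, _, p in results])
for (name, log_fc, p), q in zip(results, adj):
    print(f"{name}: logFC={log_fc:+.1f} p={p:.3f} adj.p={q:.3f} -> {regulation(log_fc, q)}")
```

Note that GENE_D has a raw p-value below 0.05 but is not called significant: its adjusted p-value is 0.05 and its fold change is below two-fold.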
Step 9: Download Data
- On the GSE page, scroll down to the download section, click Series Matrix File(s) and download the *_series_matrix.txt.gz file
- Save the expression dataset for further analysis
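A Series Matrix file is tab-separated text: metadata lines start with `!`, and the expression table sits between `!series_matrix_table_begin` and `!series_matrix_table_end`, with an `ID_REF` column followed by one column per GSM. The sketch below builds the file's FTP-site URL and parses the text into metadata, sample IDs and values. It assumes GEO's usual folder layout (e.g. GSE2034 under `GSE2nnn`) and a single-platform series; multi-platform series have one file per platform with a `-GPL…` suffix, which this does not handle. The toy file contents are hypothetical.

```python
import math

def series_matrix_url(gse: str) -> str:
    """FTP-site URL of the Series Matrix file of a single-platform series."""
    gse = gse.strip().upper()
    digits = gse[3:]
    # Series are grouped in folders such as GSE2nnn (last three digits -> "nnn").
    folder = f"GSE{digits[:-3]}nnn"
    return (f"https://ftp.ncbi.nlm.nih.gov/geo/series/{folder}/{gse}/matrix/"
            f"{gse}_series_matrix.txt.gz")

def parse_series_matrix(lines):
    """Split Series Matrix text into metadata, sample IDs and expression values."""
    metadata, table, in_table = {}, [], False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            in_table = False
        else:
            fields = [f.strip('"') for f in line.split("\t")]
            if in_table:
                table.append(fields)
            elif line.startswith("!"):
                metadata.setdefault(fields[0], []).extend(fields[1:])
    header, rows = table[0], table[1:]
    samples = header[1:]              # first column is ID_REF (probe or gene ID)
    expression = {
        row[0]: [float(v) if v else math.nan for v in row[1:]] for row in rows
    }
    return metadata, samples, expression

# Toy file with hypothetical values, laid out like a Series Matrix file.
TOY = """!Series_title\t"Toy example"
!Series_geo_accession\t"GSE12345"
!Series_platform_id\t"GPL570"
!Sample_geo_accession\t"GSM0000001"\t"GSM0000002"
!series_matrix_table_begin
"ID_REF"\t"GSM0000001"\t"GSM0000002"
"PROBE_1"\t8.1\t9.4
"PROBE_2"\t6.2\t
!series_matrix_table_end
"""
meta, samples, expr = parse_series_matrix(TOY.splitlines())
print(meta["!Series_platform_id"], len(samples), "samples")
print(expr["PROBE_1"])
```

For a downloaded file, `with gzip.open("GSE12345_series_matrix.txt.gz", "rt") as fh: meta, samples, expr = parse_series_matrix(fh)` reads it without unpacking first (the filename is a placeholder). Empty cells become NaN.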
Step 10: Record Observations
- Note in the practical file:
  - Dataset accession number (GSE)
  - Organism
  - Number of samples
  - Key differentially expressed genes
Result
The gene expression dataset was successfully retrieved from GEO and analyzed with the GEO2R tool, and differentially expressed genes were identified.
Precautions
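- Choose datasets with a clear experimental design and biological replicates in each group
- Ensure samples are assigned to the correct groups in GEO2R
- Verify the organism and platform (GPL)
- Judge significance by adjusted p-values (adj.P.Val) rather than raw p-values, because thousands of genes are tested at once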
Applications
- Disease gene identification
- Biomarker discovery
- Drug target analysis
- Functional genomics
- Personalized medicine
Viva Voce Questions (with Answers)
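- What is GEO? A public gene expression database (repository) maintained by NCBI.
- What does GSE stand for? GEO Series: a complete study or dataset.
- What is GSM? A GEO Sample: an individual sample within a study.
- What is GPL? A GEO Platform: the array or sequencing platform used in the experiment.
- What is GEO2R? An online tool for comparing gene expression between groups of samples in a GEO Series.
- What is fold change? The ratio of a gene's expression between two conditions, usually reported as log2 fold change.
- What is p-value? The probability of observing a difference at least this large if there were no real difference; a small value indicates statistical significance.
- What is upregulation? An increase in gene expression in one condition relative to another.
- What is downregulation? A decrease in gene expression in one condition relative to another.
- Which data types are in GEO? Mainly microarray and RNA-Seq data, along with other high-throughput functional genomics data (e.g., ChIP-Seq).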