PhD projects in Bayesian statistics for high-dimensional data:
The project will be in the area of large-scale, high-dimensional Bayesian models for data integration and variable selection. The research will be strongly motivated by applications in chronic disease epidemiology, where multiple high-dimensional data sets must be integrated simultaneously so that researchers can model the mediation effects of intermediate exposures and biological markers.
Possible directions include:
- Bayesian models for high-dimensional longitudinal data, with variable selection and latent variables
- Modelling mediation effects in high-dimensional epidemiological big data sets
- Methods for exploring latent structure in Bayesian graphical models
This project will look at Bayesian methods for simultaneous clustering and variable selection. This is a challenging problem: the aim is to find latent structure in high-dimensional data whilst simultaneously discovering the variables that best predict this structure.
The project will aim to extend Bayesian clustering, mixture or factor analysis models to incorporate automatic feature selection.
Application areas may include epidemiology, health economics or bioinformatics.
For further details, please contact me.
My main research area is the development of Bayesian methods in statistical genomics and epidemiology, in particular Bayesian hierarchical models and variable selection models. I have worked on Bayesian models for analysing high-throughput molecular biology data, including gene expression microarrays, next-generation RNA sequencing (RNA-seq) data and metabolomics data. My current research is on methods for data integration and variable selection across multiple "omics" data sets.
I also work on methods in the classical statistical framework and apply them to problems in genetic epidemiology and medicine. I am particularly interested in variable selection and multiple testing.
I have a background in Mathematics and a PhD in Cosmology, where I worked on detecting non-Gaussianity in the cosmic microwave background and on analysis methods for Type Ia supernova light curves.
- Statistical methodology: highly structured stochastic systems; Bayesian hierarchical models; variable selection and prediction; Bayesian model criticism; methods for multiple testing.
- Statistical genomics and genetic epidemiology: Variable selection in high-dimensional modelling of genomics, epigenomics, transcriptomics, proteomics and metabolomics data.
- Molecular Biology: Statistical methods for modelling high-throughput molecular biology data, including microarray and sequencing data.
Some recent publications:
Chambers, J., et al. (2014) 'The South Asian Genome'. PLoS ONE, 9 (8). e102645.
Felix, JF., Bradfield, JP., Monnereau, C., van der Valk, RJP., et al. (2016) 'Genome-wide association analysis identifies three new susceptibility loci for childhood body mass index'. Human Molecular Genetics, 25 (2). pp. 389-403. doi: 10.1093/hmg/ddv472
Lewin, A., Saadi, H., Peters, JE., Moreno-Moral, A., et al. (2016) 'MT-HESS: an efficient Bayesian approach for simultaneous association detection in OMICS datasets, with application to eQTL mapping in multiple tissues'. Bioinformatics, 32 (4). pp. 523-532. doi: 10.1093/bioinformatics/btv568
Jänes, J., Hu, F., Lewin, A. and Turro, E. (2015) 'A comparative study of RNA-seq analysis strategies'. Briefings in Bioinformatics, 16 (6). pp. 932-940. doi: 10.1093/bib/bbv007
van der Valk, RJ., Kreiner-Møller, E., Kooijman, MN., Guxens, M., et al. (2015) 'A novel common variant in DCST2 is associated with length in early life and height in adulthood'. Human Molecular Genetics, 24 (4). pp. 1155-1168. doi: 10.1093/hmg/ddu510
Kirk, P., Witkover, A., Bangham, CRM., Richardson, S., et al. (2013) 'Balancing the robustness and predictive performance of biomarkers'. Journal of Computational Biology, 20 (12). pp. 979-989. doi: 10.1089/cmb.2013.0018
Taal, HR., St Pourcain, B., Thiering, E., Das, S., et al. (2012) 'Common variants at 12q15 and 12q24 are associated with infant head circumference'. Nature Genetics, 44 (5). pp. 532ff. doi: 10.1038/ng.2238
Ikram, MA., Fornage, M., Smith, AV., Seshadri, S., et al. (2012) 'Common variants at 6q22 and 17q21 are associated with intracranial volume'. Nature Genetics, 44 (5). pp. 539-544. doi: 10.1038/ng.2245
Thillai, M., Eberhardt, C., Lewin, AM., Potiphar, L., et al. (2012) 'Sarcoidosis and tuberculosis cytokine profiles: Indistinguishable in bronchoalveolar lavage but different in blood'. PLoS ONE, 7 (7). doi: 10.1371/journal.pone.0038083
Kirk, PDW., Witkover, A., Courtney, A., Lewin, AM., et al. (2011) 'Plasma proteome analysis in HTLV-1-associated myelopathy/tropical spastic paraparesis'. Retrovirology, 8 (1). 81. doi: 10.1186/1742-4690-8-81
Turro, E., Su, SY., Gonçalves, A., Coin, LJM., et al. (2011) 'Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads'. Genome Biology, 12 (2). R13. doi: 10.1186/gb-2011-12-2-r13
Kulinskaya, E. and Lewin, A. (2009) 'Testing for linkage and Hardy-Weinberg disequilibrium'. Annals of Human Genetics, 73 (2). pp. 253-262. doi: 10.1111/j.1469-1809.2008.00501.x
Kulinskaya, E. and Lewin, A. (2009) 'On fuzzy familywise error rate and false discovery rate procedures for discrete distributions'. Biometrika, 96 (1). pp. 201-211. doi: 10.1093/biomet/asn061
Turro, E., Lewin, A., Rose, A., Dallman, MJ. and Richardson, S. (2009) 'MMBGX: A method for estimating expression at the isoform level and detecting differential splicing using whole-transcript Affymetrix arrays'. Nucleic Acids Research, 38 (1). e4. doi: 10.1093/nar/gkp853
Lewin, A., Bochkina, N. and Richardson, S. (2007) 'Fully Bayesian Mixture Model for Differential Gene Expression: Simulations and Model Checks'. Statistical Applications in Genetics and Molecular Biology, 6 (1). doi: 10.2202/1544-6115.1314
Lewin, A. and Grieves, I. (2006) 'Grouping Gene Ontology terms to improve the assessment of gene set enrichment in microarray data'. BMC Bioinformatics, 7. doi: 10.1186/1471-2105-7-426
Lewin, A., Richardson, S., Marshall, C., Glazier, A. and Aitman, T. (2006) 'Bayesian Modeling of Differential Gene Expression'. Biometrics, 62 (1). pp. 10-18. doi: 10.1111/j.1541-0420.2005.00394.x
Broet, P., Lewin, A., Richardson, S., Dalmasso, C. and Magdelenat, H. (2004) 'A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments'. Bioinformatics, 20 (16). pp. 2562-2571. doi: 10.1093/bioinformatics/bth285
Jarup, L., Briggs, D., de Hoogh, C., Morris, S., et al. (2002) 'Cancer risks in populations living near landfill sites in Great Britain'. British Journal of Cancer. doi: 10.1038/sj.bjc.6600311
Lewin, A. and Albrecht, A. (2001) 'Can inflationary models of cosmic perturbations evade the secondary oscillation test?'. Physical Review D, 64 (2). doi: 10.1103/PhysRevD.64.023514
Lewin, A., Albrecht, A. and Magueijo, J. (1999) 'A new statistic for picking out non-Gaussianity in the CMB'. Monthly Notices of the Royal Astronomical Society, 302 (1). pp. 131-138. doi: 10.1046/j.1365-8711.1999.02104.x