## Kamel Jebreen

## Biostatistician

**Biography**

**My work focuses on machine learning for classification and interaction networks. As a Senior Research Engineer in Statistical Learning at the Clinical Research Unit, Paris University, AP-HP, Paris, France, I apply machine learning and Bayesian statistics to the analysis of medical protocols, alongside survival analysis, stochastic processes, data management, and experimental design.**

## Research Interests

- Statistical learning.
- Bayesian statistics.
- Experimental design.
- Survival analysis.
- Stochastic processes.
- Graphical models for classification and time series.
- Data management with SAS.
- Statistical genomics, linkage, and mapping.

## Education

- PhD in machine learning on graphical models and time series.

**Aix Marseille University**, Marseille, France.

- Master in Applied Mathematics.

**PPU**, Hebron, Palestine.

- Bachelor in Applied Mathematics.

**PPU**, Hebron, Palestine.

## Projects

## Clinical Research and Survival Analysis (2019 - today)

Preparing and analyzing clinical protocols, including:

1. Glucocorticoids with low-dose anti-IL1 anakinra rescue in severe non-ICU COVID-19 infection: A cohort study.

2. Role of candidate genes in the progression to heart failure: study in a prospective cohort of patients who presented with a myocardial infarction (PREGICA cohort: Genetic PREDisposition to Heart Failure).

3. Randomized study evaluating the efficacy and tolerance of a parietal membrane implanted during the closure of a temporary ileostomy after laparoscopic rectal surgery.

4. Naloxegol administration to prevent opioid-induced gastrointestinal motility disturbance in brain-injured patients.

## Detect and Map Copy Number Variants from Segregation Data (2018 - 2019)

Single nucleotide polymorphisms (SNPs) are widely used for detecting quantitative trait loci and for searching for causal variants of diseases. Nevertheless, structural variations such as copy-number variants (CNVs) represent a large part of natural genetic diversity and contribute significantly to trait variation. Numerous methods and software tools based on different technologies (amplicons, CGH, tiling, or SNP arrays, or sequencing) have already been developed to detect CNVs, but they bypass a wealth of information such as genotyping data from segregating populations, produced, e.g., for QTL mapping. Here, we propose an original method to both detect and genetically map CNVs using mapping panels. Specifically, we exploit the apparent heterozygous state of duplicated loci: peaks in appropriately defined genome-wide allelic profiles provide highly specific signatures that identify the nature and position of the CNVs. Our method and software can automatically detect and map up to 33 different predefined types of CNVs based on segregation data only. We validate this approach on simulated and experimental biparental mapping panels in two maize populations and one wheat population. Most of the events found correspond to having just one extra copy in one of the parental lines, but the corresponding allelic value can be that of either parent. We also find cases with two or more additional copies, especially in wheat, where these copies locate to homeologues. More generally, our computational tool can be used to give additional value, at no cost, to many datasets produced over the past decade from genetic mapping panels.

## Probabilities of Multilocus Genotypes in SIB Recombinant Inbred Lines (2018 - 2019)

Recombinant Inbred Lines (RILs) are obtained through successive generations of inbreeding. In 1931, Haldane and Waddington published a landmark paper in which they provided the probabilities of achieving any combination of alleles in 2-way RILs for 2 and 3 loci. In the case of sibling RILs, where sisters and brothers are crossed at each generation, there has been no progress in treating 4 or more loci, a limitation we overcome here without much increase in complexity. In the general situation of $L$ loci, the task is to determine $2^L$ probabilities, but we find that it is necessary to first calculate the $4^L$ “identical by descent” (IBD) probabilities that a RIL inherits at each locus its DNA from one of the four originating chromosomes. We show that these $4^L$ probabilities satisfy a system of linear equations that follow from self-consistency. In the absence of genetic interference (crossovers arising independently), the associated matrix can be written explicitly in terms of the recombination rates between the different loci. We provide the matrices for $L$ up to 4 and also include a computer program to automatically generate the matrices for higher values of $L$. Furthermore, our framework can be generalized to recombination rates that differ between female and male meiosis, which allows us to show that the Haldane and Waddington 2-locus formula remains valid in that more subtle case if the meiotic recombination rate is taken as the average of the female and male rates. Once the $4^L$ IBD probabilities are determined, the $2^L$ probabilities of RIL genotypes are obtained by summing these quantities. Finally, our computer program makes it possible to determine the probabilities of all the multilocus genotypes produced in such sibling-based RILs for $L \le 10$, a huge leap beyond the $L = 3$ restriction of Haldane and Waddington.
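As a small illustration of the 2-locus case mentioned above, the sketch below evaluates the classical Haldane and Waddington result for sib-mating RILs, $R = 4r / (1 + 6r)$, using the sex-averaged recombination rate. It is not the published program: the function names are hypothetical, and it assumes autosomal loci in a 2-way cross so that both parental alleles are equally likely at each locus.

```python
def sib_ril_recombinant_fraction(r_female, r_male=None):
    """Fraction of recombinant 2-locus genotypes in sib-mating RILs.

    Uses the Haldane-Waddington formula R = 4r / (1 + 6r), where r is the
    per-meiosis recombination rate. If the female and male rates differ,
    r is taken as their average, as described in the text.
    """
    if r_male is None:
        r_male = r_female  # same rate in both sexes
    for rate in (r_female, r_male):
        if not 0.0 <= rate <= 0.5:
            raise ValueError("recombination rates must lie in [0, 0.5]")
    r = (r_female + r_male) / 2.0
    return 4.0 * r / (1.0 + 6.0 * r)


def two_locus_genotype_probabilities(r_female, r_male=None):
    """Probabilities of the 2^2 = 4 fixed genotypes at two loci.

    Assumption (illustrative): autosomal loci in a 2-way cross, so the two
    parental genotypes (AA, BB) share 1 - R and the two recombinant
    genotypes (AB, BA) share R.
    """
    R = sib_ril_recombinant_fraction(r_female, r_male)
    return {
        "AA": (1.0 - R) / 2.0,
        "AB": R / 2.0,
        "BA": R / 2.0,
        "BB": (1.0 - R) / 2.0,
    }


if __name__ == "__main__":
    # Female rate 0.05 and male rate 0.15 average to r = 0.1.
    for genotype, p in two_locus_genotype_probabilities(0.05, 0.15).items():
        print(genotype, round(p, 4))
```

For unlinked loci ($r = 0.5$) the formula gives $R = 0.5$, so all four genotypes are equally likely, which is a quick sanity check on the sketch.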

## Graphical Models for Classification and Time Series (2018 - 2019)

I introduced two approaches. **Firstly**, I combined Bayesian networks with feature selection and discretization and showed that this combination gives rise to powerful classifiers, with an application to epilepsy type prediction based on PET scan data. **Secondly**, I modeled interaction networks between sets of variables in the context of high-dimensional time series, presenting results on fMRI and simulated data.

## Experience

## Senior Research And Development Engineer

## Dec, 2019 - Present

I design experiments, prepare clinical protocols, analyze the data, and publish the results.

## Assistant Professor

## Sep, 2014 - Present

Teaching and Research.

Machine Learning (Graduate course)

Mathematical Statistics (Graduate course)

Biostatistics (Undergraduate course)

## Research And Development Engineer

## Feb, 2018 - Oct, 2019

I worked in statistical genetics, exploiting genetic and genomic data, and provided novel solutions in each of the following projects:

1. I used data from genotyping arrays (mainly 50 K SNP) on segregating populations to infer which markers are involved in genomic structural variations. This work is novel and has been published, and the R package is available online.

2. I calculated the probabilities of all possible multi-locus genotypes arising in recombinant inbred lines of the "SIB" type. This had never been done for more than 3 loci. This work has been published, and the code is available online.

## Doctoral Mission in Statistics (Machine learning and Big Data)

## Sep, 2014 - Sep, 2017

I introduced two approaches. **Firstly**, I combined Bayesian networks with feature selection and discretization and showed that this combination gives rise to powerful classifiers, with an application to epilepsy type prediction based on PET scan data. **Secondly**, I modeled interaction networks between sets of variables in the context of high-dimensional time series, presenting results on fMRI and simulated data.