Alexandra Zaharia

Experience

Platform.sh (remote)

Cloud Software Engineer

Contributing to the orchestration layer which manages clusters, containers and services.

Environment: Linux, Python, Docker, LXC, git, GitLab

Qiet (Home Labs, Paris)

Back-end Engineer / Embedded Systems Engineer

Designed a flexible framework for testing hardware in factory assembly line settings. It caters to both types of end users: test authors, who have an expressive, well-documented and easy-to-use API at their disposal, and test operators, with an emphasis on minimizing the time operators spend at the test bench.
Implemented a proprietary TCP protocol for sending alert messages to the remote monitoring center. Integrated this microservice within our back-end ecosystem.

Environment: Linux, systemd, Python, sockets, asyncio, interfacing with serial devices, PostgreSQL, finite state machines, markdown, Bugzilla, MantisBT, mercurial

Freebox (Iliad/Free, Paris)

Embedded Systems Engineer

Prototyped a flat file system for embedded and external EEPROMs that minimizes device wear while ensuring data consistency in case of hardware or external failure.
Conceived and prototyped a fingerprint recognition algorithm using images acquired through a capacitive sensor.
Implemented a generic asynchronous I2C-over-GPIO driver for the in-house RTOS using finite state machines.
Ported the in-house RTOS to STM32 platforms.

Environment: Linux, Python, C, RTOS, EFM32, STM32, finite state machines, gdb, JLink, openocd, mercurial

MICALIS (INRA, Jouy-en-Josas)

Machine Learning Engineer (Post-Doc)

Designed machine learning and deep learning methods for the prediction and ranking of the most likely products of given chemical transformations. This approach complements and enriches classical design-build-test retrosynthetic workflows.
Assembled and explored large training datasets (~100 GB) that could not fit in memory.
Delivered an open-source tool that enables researchers to extract relevant data from a knowledge base that is notoriously difficult to parse.

Environment: Linux, Python, Keras, scikit-learn, numpy, snakemake, HDF5, XML, markdown, GitLab

Laboratoire de Recherche en Informatique (Orsay)

Research Associate (PhD)

Investigated an NP-hard graph theory problem. Devised methodology and algorithms for the identification of conserved metabolic and genomic neighborhoods across multiple species.
Delivered CoMetGeNe, an open-source Python pipeline that implements this approach.
Attended training courses in Fortran, MPI, distributed computing, and project management.

Environment: Linux, bash, Python, graph theory, concurrency, XML, LaTeX, git

Paris Sud University (Orsay)

Teaching Assistant

Taught algorithms and data structures, RDBMS design and object-oriented programming at M1 level in the Bioinformatics and Biostatistics Master's program.
Devised lab assignments from scratch on topics that I deemed vital for the students' knowledge (such as file I/O, multi-file projects and makefiles, or threading).
Conceived and graded 11 student projects in total, 2 of which were extensive, semester-long R&D projects focused on solving bioinformatics problems rooted in my research interests.

Environment: Linux, C, valgrind, PostgreSQL, Java, Swing, LaTeX, bash

Research Intern (4 internships / 10 months in total)


Laboratoire de Recherche en Informatique (Orsay)

M2 Intern (February 2015 – May 2015)

Designed a graph mining algorithm for biological pattern discovery in heterogeneous networks.
Implemented the algorithm and applied it to metabolic and genomic data integrated from the KEGG knowledge base.
Pointed out current limitations and future research directions.

Environment: Linux, Python, graph theory, XML, LaTeX, git

Laboratoire de Recherche en Informatique (Orsay)

M1 Intern (June 2014 – July 2014)

Investigated the possibility of adapting a heuristic graph mining algorithm to biological networks.
Adapted the algorithm to the biological data of interest.
Obtained preliminary results on biological data integrated from several public databases (KEGG, SGD, STRING).

Environment: Linux, Python, graph theory, XML

Institut de Biologie Intégrative de la Cellule (Orsay)

L3 Intern (June 2013 – July 2013)

Assessed the consistency of groups of orthologous fungal sequences established during the research project of a PhD student.
Revealed an important inconsistency, which allowed the PhD student to improve their methodology.
Designed and implemented a Perl pipeline for inserting new genomes into a PostgreSQL database.

Environment: Linux, Perl, R, PostgreSQL

Neuro-PSI (CNRS, Gif-sur-Yvette)

L2 Intern (June 2012 – July 2012)

Performed an exploratory analysis of fruit fly videotracking data.
Designed and implemented a suite of R scripts for data analysis and visualization.

Environment: Windows, R, video acquisition software, CSV, Statistica

BitDefender (Bucharest, Romania)

UNIX QA Engineer

Performed quality assurance for anti-malware software solutions (GNU/Linux, *BSD and OpenSolaris platforms).
Automated testing tasks in the QA workflow through bash script pipelines.
Developed a web-based monitoring system for the testing infrastructure using rrdtool.

Environment: Linux, FreeBSD, OpenBSD, NetBSD, OpenSolaris, bash, cron, rrd, Perl, JIRA

Projects

Gal4Xy (personal project)

A turn-based, 4X strategy game, for the moment console-based, written in pure C. The aim is to defeat the AI player(s) and conquer the galaxy. In the near future, Gal4Xy will be multiplayer, meaning you will be able to set up a Gal4Xy server and play it with (or against!) your friends, no LAN required. I just need to figure out how to properly handle data serialization/deserialization across Internet domain sockets first. :-)


libgcds (personal project)

A library for generic C data structures such as stacks, queues, linked lists, or vectors. Although libgcds may be compiled from source, it is also distributed as a static library that you can use by simply linking it to your projects using -lgcds with gcc.


Linux-IPC (personal project)

A collection of Linux inter-process communication (IPC) mechanisms implemented in C: UNIX domain sockets, message queues, shared memory, and signals (TODO). These IPC mechanisms are jointly applied for simulating routing and ARP table managers running on a server and being synchronized across every connected client process.


CoMetGeNe (PhD project)

A bioinformatics tool that I developed in Python during my PhD. It identifies the longest sequences of reactions in the metabolic pathways of a query organism such that the genes involved in these reactions are neighbors on the chromosome. No manual data download is necessary, as CoMetGeNe retrieves the required data automatically from the KEGG database for species of your choosing. It is also possible to study conserved neighborhood patterns in metabolic and genomic contexts for several species. In order to facilitate the analysis of multi-species data sets, CoMetGeNe takes full advantage of multiprocessing.



Genome coverage (Master's project)

A bioinformatics tool that takes as input genomes and genomic sequences called "reads", then computes and displays genome coverage curves. Genome coverage at a given position is the number of reads that overlap that particular position. The problem of (exact) string matching is handled using an optimized version of suffix arrays, which makes the matching very fast in practice. GenomeCoverage is written in Java, the GUI is built with Swing, and genome coverage graphics are generated with JFreeChart.
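
The definition of coverage can be illustrated with a minimal sketch. This is not code from the tool itself, which is written in Java and locates reads with suffix arrays; the Python below assumes the reads have already been matched, and the function name `coverage` and the `(start, length)` read representation are hypothetical choices made for the example.

```python
from itertools import accumulate


def coverage(genome_length, reads):
    """Return a list whose i-th entry is the number of reads overlapping position i.

    `reads` is an iterable of (start, length) pairs with 0-based start positions.
    This representation is an assumption made for the sketch.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, length in reads:
        # Ignore reads that are empty or start outside the genome.
        if length <= 0 or start < 0 or start >= genome_length:
            continue
        end = min(start + length, genome_length)  # clip reads at the genome end
        diff[start] += 1
        diff[end] -= 1
    # The running sum of the difference array gives the per-position coverage.
    return list(accumulate(diff[:-1]))


if __name__ == "__main__":
    # Three reads on a genome of length 10; position 7 is left uncovered.
    print(coverage(10, [(0, 4), (2, 5), (8, 5)]))
```

The difference array keeps the cost at one update per read plus a single pass over the genome, rather than incrementing every position of every read.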


BRENDA-Parser (personal project)

BRENDA is an extremely valuable resource for studying metabolism, as it is the knowledge base that contains the most detailed information on enzymatic activities. Unfortunately, it is not free, not even for academics. A free version does exist, though, which comes as a flat file with notoriously horrendous formatting. The formatting is so bad that I haven't been able to find a parser that actually parses the flat file correctly, so I wrote BRENDA-Parser in Python for this purpose.


Education

Paris Sud University (Orsay)

PhD in Computer Science

Paris Sud University (Orsay)

MSc in Bioinformatics and Biostatistics

Paris Sud University (Orsay)

BSc in Biology

Spiru Haret University (Bucharest, Romania)

BSc in Computer Science
