Alexandra Zaharia

Software Engineer

Download résumé (EN)      Download CV (FR)

About me

GNU/Linux enthusiast since 2006 and perpetual learner, I thrive in team settings where I can explore ideas and exchange knowledge. With an advanced scientific background, I naturally gravitate towards analyzing and solving both theoretical and technical problems. My guiding principle in software craftsmanship is the conviction that complex does not have to mean complicated.

Following a PhD focused on algorithm design for NP-hard graph theory problems applied to bioinformatics, and a post-doc in chemoinformatics aimed at developing deep learning approaches to improve retrosynthesis workflows, I sought a different professional context. After a couple of years working in embedded systems, I am now thrilled to be part of an international, distributed, fully remote team building an awesome polyglot, multicloud PaaS.

With over 7 years of professional experience in software engineering and broad domain knowledge spanning IoT, machine learning, bioinformatics and cybersecurity, I am always looking for new challenges.

Check out my dev blog or my GitHub or LinkedIn profiles.

Experience (remote)

Cloud Software Engineer

  • Contributing to the orchestration layer which manages clusters, containers and services.

Environment: Linux, Python, Docker, LXC, git, GitLab

Qiet (Home Labs, Paris)

Back-end Engineer / Embedded Systems Engineer

  • Designed a flexible framework for testing hardware on factory assembly lines. It serves both types of end users: test authors, who get an expressive, well-documented and easy-to-use API, and test operators, with an emphasis on minimizing test bench operator time.
  • Implemented a proprietary TCP protocol for sending alert messages to the remote monitoring center, and integrated this microservice into our back-end ecosystem.

Environment: Linux, systemd, Python, sockets, asyncio, serial device interfacing, PostgreSQL, finite state machines, Markdown, Bugzilla, MantisBT, Mercurial

Freebox (Iliad/Free, Paris)

Embedded Systems Engineer

  • Prototyped a flat file system for embedded and external EEPROMs that minimizes device wear while ensuring data consistency in case of hardware or external failure.
  • Conceived and prototyped a fingerprint recognition algorithm using images acquired through a capacitive sensor.
  • Implemented a generic asynchronous I2C-over-GPIO driver for the in-house RTOS using finite state machines.
  • Ported the in-house RTOS to STM32 platforms.

Environment: Linux, Python, C, RTOS, EFM32, STM32, finite state machines, GDB, J-Link, OpenOCD, Mercurial

MICALIS (INRA, Jouy-en-Josas)

Machine Learning Engineer (Post-Doc)

  • Designed machine learning and deep learning methods for the prediction and ranking of the most likely products of given chemical transformations. This approach complements and enriches classical design-build-test retrosynthetic workflows.
  • Assembled and explored large training datasets (~100 GB) that did not fit in memory.
  • Delivered an open-source tool that enables researchers to extract relevant data from a knowledge base that is notoriously difficult to parse.

Environment: Linux, Python, Keras, scikit-learn, NumPy, Snakemake, HDF5, XML, Markdown, GitLab

Laboratoire de Recherche en Informatique (Orsay)

Research Associate (PhD)

  • Investigated an NP-hard graph theory problem. Devised a methodology and algorithms for identifying conserved metabolic and genomic neighborhoods across multiple species.
  • Delivered CoMetGeNe, an open-source Python pipeline that implements this approach.
  • Attended training courses in Fortran, MPI, distributed computing and project management.

Environment: Linux, bash, Python, graph theory, concurrency, XML, LaTeX, git

Paris Sud University (Orsay)

Teaching Assistant

  • Taught algorithms and data structures, RDBMS design and object-oriented programming at M1 level in the Bioinformatics and Biostatistics Master's program.
  • Devised lab assignments from scratch on topics I considered essential for the students (such as file I/O, multi-file projects and makefiles, or threading).
  • Designed and graded 11 student projects in total, including 2 extensive, semester-long R&D projects focused on bioinformatics problems rooted in my research interests.

Environment: Linux, C, Valgrind, PostgreSQL, Java, Swing, LaTeX, bash

Research Intern (4 internships / 10 months in total)

Laboratoire de Recherche en Informatique (Orsay)

M2 Intern (February 2015 – May 2015)

  • Designed a graph mining algorithm for biological pattern discovery in heterogeneous networks.
  • Implemented the algorithm and applied it to metabolic and genomic data integrated from the KEGG knowledge base.
  • Identified current limitations and outlined future research directions.

Environment: Linux, Python, graph theory, XML, LaTeX, git

Laboratoire de Recherche en Informatique (Orsay)

M1 Intern (June 2014 – July 2014)

  • Investigated the possibility of adapting a heuristic graph mining algorithm to biological networks.
  • Adapted the algorithm to the biological data of interest.
  • Obtained preliminary results on biological data integrated from several public databases (KEGG, SGD, STRING).

Environment: Linux, Python, graph theory, XML

Institut de Biologie Intégrative de la Cellule (Orsay)

L3 Intern (June 2013 – July 2013)

  • Assessed the consistency of groups of orthologous fungal sequences established during the research project of a PhD student.
  • Revealed an important inconsistency, which allowed the PhD student to improve their methodology.
  • Designed and implemented a Perl pipeline for inserting new genomes into a PostgreSQL database.

Environment: Linux, Perl, R, PostgreSQL

Neuro-PSI (CNRS, Gif-sur-Yvette)

L2 Intern (June 2012 – July 2012)

  • Performed an exploratory analysis of fruit fly video-tracking data.
  • Designed and implemented a suite of R scripts for data analysis and visualization.

Environment: Windows, R, video acquisition software, CSV, Statistica

BitDefender (Bucharest, Romania)

UNIX QA Engineer

  • Performed quality assurance for anti-malware software solutions (GNU/Linux, *BSD and OpenSolaris platforms).
  • Automated testing tasks in the QA workflow through bash script pipelines.
  • Developed a web-based monitoring system for the testing infrastructure using rrdtool.

Environment: Linux, FreeBSD, OpenBSD, NetBSD, OpenSolaris, bash, cron, rrd, Perl, JIRA

Education

Paris Sud University (Orsay)


PhD in Computer Science

Dissertation title: Mining conserved neighborhood patterns in metabolic and genomic contexts.

Paris Sud University (Orsay)


MSc in Bioinformatics and Biostatistics

Relevant coursework: statistics, combinatorics, data mining, systems biology, sequence analysis, structural bioinformatics.

Paris Sud University (Orsay)


BSc in Biology

Studied a range of subjects that prepared me to tackle computational challenges in biology.

Spiru Haret University (Bucharest, Romania)


BSc in Computer Science

Relevant coursework: computer architecture, operating systems, graph theory, algorithms and data structures, cryptography, compiler theory.

Projects

Gal4Xy (personal project)

A turn-based 4X strategy game written in pure C, currently console-based. The goal is to defeat the AI player(s) and conquer the galaxy. In the near future, Gal4Xy will be multiplayer: you will be able to set up a Gal4Xy server and play with (or against!) your friends, no LAN required. I just need to figure out how to properly handle data serialization/deserialization over Internet domain sockets first. :-)

View project

libgcds (personal project)

A library of generic C data structures such as stacks, queues, linked lists and vectors. libgcds can be compiled from source, but it is also distributed as a static library that you can link into your projects by passing -lgcds to gcc.

View project

Linux-IPC (personal project)

A collection of Linux inter-process communication (IPC) mechanisms implemented in C: UNIX domain sockets, message queues, shared memory, and signals (TODO). Together, these mechanisms are used to simulate routing and ARP table managers that run on a server and are kept synchronized across all connected client processes.

View project

CoMetGeNe (PhD project)

A bioinformatics tool that I developed in Python during my PhD. It identifies the longest sequences of reactions in the metabolic pathways of a query organism such that the genes involved in these reactions are neighbors on the chromosome. No manual data download is necessary, as CoMetGeNe automatically retrieves the required data from the KEGG database for the species of your choice. It can also be used to study conserved neighborhood patterns in metabolic and genomic contexts across several species. To facilitate the analysis of multi-species data sets, CoMetGeNe takes full advantage of multiprocessing.

View project
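To make the core idea behind CoMetGeNe concrete, here is a deliberately simplified toy in Python. It is not CoMetGeNe's algorithm: it assumes a linear pathway (a plain list of reactions), exactly one gene per reaction on a single chromosome, and a hypothetical `max_gap` parameter under which two consecutive reactions count as neighbors if their genes lie at most `max_gap + 1` positions apart. CoMetGeNe itself works on real KEGG pathways and genomes and supports several species; none of that is modeled here.

```python
def longest_neighboring_run(pathway, gene_position, max_gap=0):
    """Longest stretch of consecutive reactions whose genes are chromosomal neighbors.

    Toy assumptions (hypothetical, not CoMetGeNe's model): `pathway` is a linear
    list of reaction IDs, and `gene_position` maps each reaction to the index of
    its single gene on one chromosome. Consecutive reactions are neighbors if
    their genes are at most `max_gap + 1` positions apart, i.e. up to `max_gap`
    intervening genes are tolerated.
    """
    best_start, best_len = 0, 0
    run_start = 0
    for i in range(len(pathway)):
        if i > 0:
            distance = abs(gene_position[pathway[i]] - gene_position[pathway[i - 1]])
            if distance > max_gap + 1:
                run_start = i  # neighborhood broken: start a new run at reaction i
        run_len = i - run_start + 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return pathway[best_start:best_start + best_len]


positions = {"R1": 10, "R2": 11, "R3": 40, "R4": 41, "R5": 42}
pathway = ["R1", "R2", "R3", "R4", "R5"]
print(longest_neighboring_run(pathway, positions))              # ['R3', 'R4', 'R5']
print(longest_neighboring_run(pathway, positions, max_gap=30))  # all five reactions
```

A single linear scan suffices only because of the toy's linear-pathway assumption; with branching pathways and gene clusters the search space grows considerably.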

Genome coverage (Master's project)

A bioinformatics tool that takes genomes and genomic sequences called "reads" as input, then computes and displays genome coverage curves. The genome coverage at a given position is the number of reads that overlap that position. Exact string matching is handled with an optimized version of suffix arrays, which makes it very fast in practice. GenomeCoverage is written in Java, its GUI is built with Swing, and the coverage plots are generated with JFreeChart.

View project
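The coverage definition above can be illustrated with a small Python sketch. It is not GenomeCoverage's code (which is written in Java and uses an optimized suffix array): it builds a naive suffix array by sorting suffixes, finds each read's exact occurrences by binary search, and accumulates per-position counts with a difference array. The function names are hypothetical, and counting a read once at every position where it matches exactly is an assumption about how multi-mapping reads are handled.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of exact matches of `pattern` in `text`."""
    m = len(pattern)
    # Suffixes are sorted, so their m-character prefixes are sorted too:
    # all matches form one contiguous block of `sa`.
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def genome_coverage(genome, reads):
    """Number of read occurrences overlapping each genome position."""
    sa = build_suffix_array(genome)
    diff = [0] * (len(genome) + 1)  # +1 at a match's start, -1 just past its end
    for read in reads:
        # Assumption: a read that matches several times is counted at every match.
        for pos in find_occurrences(genome, sa, read):
            diff[pos] += 1
            diff[pos + len(read)] -= 1
    coverage, running = [], 0
    for delta in diff[:-1]:
        running += delta  # prefix sum of the difference array
        coverage.append(running)
    return coverage


print(genome_coverage("ACGTACGT", ["ACG", "GTA", "CGTACG"]))  # [1, 2, 3, 2, 3, 2, 2, 0]
```

Here `ACG` matches at positions 0 and 4 while the other two reads match once each. Sorting full suffixes is quadratic in memory and time, which is fine for a toy but is exactly what an optimized suffix array construction avoids on real genomes.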

BRENDA-Parser (personal project)

BRENDA is an extremely valuable resource for studying metabolism, as it is the knowledge base containing the most detailed information on enzymatic activities. Unfortunately, it is not free, not even for academics. A free version is offered, though, as a flat file with notoriously horrendous formatting. The formatting is so bad that I haven't been able to find a parser that actually parses the flat file correctly, so I wrote BRENDA-Parser in Python for this purpose.

View project


Get in touch