# Research

**Randomized and Derandiomized Algorithms**

We here in Kiel are interested in the multicolor discrepancy of arithmetic structures, e.g. arithmetic progressions in the first N integer, theirs various generalisations, like cartesian products, sums of arithmetic progressions, central arithmetic progressions (Bohr sets) in Zp and linear hyperplanes in finite vector spaces. We gave a theory of multicolor discrepancy, like upper and lower bounds for general hypergraphs and the multicolor generalization of several classical 2-color theorems. It is shown that at several places phenomena not visible in the 2-color theory show up, among them the multicoloring of products of hypergraphs. One focus are proofs of lower bounds for the multicolor discrepancy for the hypergraphs mentioned above, where often the application of Fourier analysis or linear algebra techniques is not sufficient and has to be combined with combinatorial arguments. Currently, we are working on the derandomization of new randomized discrepanca minimization algorithms (A. Srivastav, Mayank and Jan Geest).

__Discrepancy Theory and Discrete Harmonic Analysis__

We work on discrepancy theory for hypergraphs with arithmetic structures, e.g. arithmetic progressions in the first N integers and their various generalizations, like Cartesian products, sums of arithmetic progressions, central arithmetic progressions in Zp and linear hyperplanes in finite vector spaces. We adopted the notion of multicolor discrepancy and show how the 2-color theory generalizes to multicolors exhibiting new phenomena at several places not visible in the 2-color theory, for example in the coloring of products of hypergraphs. A focus of our work is on proving lower bounds for the multicolor discrepancy for hypergraphs with arithmetic structures. Here, the application of Fourier analysis or linear algebra techniques is often not sufficient and has to be combined with combinatorial arguments, in the form of an interplay between the examination of suitable color classes and the Fourier analysis.

__Cooperations__

Benjamin Doerr, LIX, École Polytechnique, Palaiseau, France

Carola Doerr, LIP6, Sorbonne University, Paris Cedex, France

Michael Gnewuch, Department of Mathematics, Osnabrück University, Germany

Nils Hebbinghaus, d-fine GmbH, Frankfurt am Main, Germany

Petra Wehr, SAP AG Walldorf, Germany

__Publications__

William W. L. Chen, Anand Srivastav, and Giancarlo Travaglini, eds.: A Panorama of discrepancy theory. 2014, Lecture Notes in Mathematics 2107, Springer-Verlag.

Nils Hebbinghaus and Anand Srivastav: Discrepancy of centered arithmetic progressions in Zp. 2014, European Journal of Combinatorics 35, 324–334.

Nils Hebbinghaus and Anand Srivastav: Multicolor discrepancy of arithmetic structures. 2014, Panorama of Discrepancy Theory, ed. by William W. L. Chen, Anand Srivastav, and Giancarlo Travaglini, Springer-Verlag, 319–424.

Lasse Kliemann, Ole Kliemann, C. Patvardhan, Volkmar Sauerland, and Anand Srivastav: A new QEA computing near-optimal low-discrepancy colorings in the hypergraph of arithmetic progressions. 2013, Proceedings of the 12th International Symposium on Experimental and Efficient Algorithms, Rome, Italy, June 2013 (SEA 2013), ed. by Vincenzo Bonifaci,

Camil Demetrescu, and Alberto Marchetti-Spaccamela, Lecture Notes in Computer Science 7933, 67–78.

Nils Hebbinghaus, Tomasz Schoen, and Anand Srivastav: One-sided discrepancy of linear hyperplanes in finite vector spaces. 2009, Analytic Number Theory, Essays in Honour of Klaus Roth, ed. by William W. L. Chen, William Timothy Gowers, Heini Halberstam, Wolfgang M. Schmidt, and Robert Charles Vaughan, Cambridge University Press, 205–223.

Michael Gnewuch, Anand Srivastav, and Carola Winzen: Finding optimal volume subintervals with k points and calculating the star discrepancy are NP-hard problems”. 2009, Journal of Complexity 25, 115–127.

Benjamin Doerr, Michael Gnewuch, and Anand Srivastav: Bounds and constructions for the star-discrepancy via delta-covers. 2005, Journal of Complexity 21(5), 691–709.

Benjamin Doerr, Anand Srivastav, and Petra Wehr: Discrepancy of cartesian products of arithmetic progressions. 2004, Electronic Journal of Combinatorics 11(1), 16 pages.

Benjamin Doerr and Anand Srivastav: Multi-color discrepancies. 2003, Combinatorics, Probability and Computing 12, 365–399.

Benjamin Doerr and Anand Srivastav: Recursive randomized coloring beats fair dice coloring. 2001, Proceedings of the 18th Annual Symposium on Theoretical Aspects of Computer Science, Dresden, Germany, February 2001 (STACS 2001), ed. by Afonso Ferreira and Horst Reichel, Lecture Notes in Computer Science 2010, 183–194.

Benjamin Doerr and Anand Srivastav: Approximation of multicolor discrepancy: Proceedings of the 2nd International Workshop on Approximation Algorithms and Combinatorial Optimization, Berkeley, CA, USA, August 1999 (APPROX 1999), ed. by Dorit S. Hochbaum, Klaus Jansen, José D. P. Rolim, and Alistair Sinclair, Lecture Notes in Computer Science 1671, 39–50.

__Algorithms for Big Data Problems__

We are designing algorithms for different combinatorial optimization problems in the context of Big Data inputs in various models of space-efficient computations.

**Streaming Euler Tours**

Given an undirected graph G on n nodes and m edges in the form of a data stream we study the problem of finding an Euler tour in G. We give the first one-pass streaming algorithm computing an Euler tour of G in the form of an edge successor function with only O(n log(n)) RAM, which is optimal for this setting. Since the output size can be much larger, we use a write-only tape to gradually output the solution. Our approach is to partition the edges into edge-disjoint cycles and to merge the cycles until a single Euler tour is achieved. We bypass the RAM limitation with a new edge swapping technique, for which storing two certain edges per node is sufficient to merge tours without having all tour edges in RAM.

Christian Glazik, Jan Schiemann and Anand Srivastav: A One Pass Streaming Algorithm for Finding Euler Tours, (unpublished).

**Streaming Graph Matching**

We give an introduction to classical algorithms for the maximum matching problem in bipartite graphs and then present a 1+Ɛ approximation algorithm for the same problem in bipartite graph streams, which only requires O(ո‧poly log ո) bits of random-access memory and O(Ɛ-5) passes over the input. All the algorithms are combinatorial in nature and use the augmenting paths technique. The streaming part of these notes is based on joint work with Sebastian Eggert, Peter Munstermann, and Anand Srivastav from Kiel University.

**Bipartite Matching in the Semi-streaming Model**

We present the first deterministic 1+ε approximation algorithm for finding a large matching in a bipartite graph in the semi-streaming model which requires only O((1/ε)^5) passes over the input stream. In this model, the input graph G=(V,E) is given as a stream of its edges in some arbitrary order, and storage of the algorithm is bounded by O(n polylog n) bits, where n=|V|. The only previously known arbitrarily good approximation for general graphs is achieved by the randomized algorithm of McGregor (Proceedings of the International Workshop on Approximation Algorithms for Combinatorial Optimization Problems and Randomization and Computation, Berkeley, CA, USA, pp. 170–181, 2005), which uses Ω((1/ε)^1/ε ) passes. We show that even for bipartite graphs, McGregor’s algorithm needs Ω(1/ε)^Ω(1/ε) passes, thus it is necessarily exponential in the approximation parameter. The design as well as the analysis of our algorithm require the introduction of some new techniques. A novelty of our algorithm is a new deterministic assignment of matching edges to augmenting paths which is responsible for the complexity reduction, and gets rid of randomization.

**A Streaming Algorithm for the Undirected Longest Path Problem**

We present the first streaming algorithm for the longest path problem in undirected graphs. The input graph is given as a stream of edges and RAM is limited to only a linear number of edges at a time (linear in the number of vertices n). We prove a per-edge processing time of O(n), where a naive solution would have required Ω(n^2). Moreover, we give a concrete linear upper bound on the number of bits of RAM that are required.

On a set of graphs with various structure, we experimentally compare our algorithm with three leading RAM algorithms: Warnsdorf (1823), Pohl-Warnsdorf (1967), and Pongrácz (2012). Although conducting only a small constant number of passes over the input, our algorithm delivers competitive results: with the exception of preferential attachment graphs, we deliver at least 71% of the solution of the best RAM algorithm. The same minimum relative performance of 71% is observed over all graph classes after removing the 10% worst cases. This comparison has strong meaning, since for each instance class there is one algorithm that on average delivers at least 84% of a Hamilton path. In some cases we deliver even better results than any of the RAM algorithms.

**Exponential Sketching**

In the data stream model, we are given a vector x ∈ Z^n as a series of updates and have to compute a function of x in sub-linear space. Such problems have been in the focus for the last two decades, motivated by the demand to process rapidly increasing amounts of data. Research so far has concentrated on forms of sum type, while our interest is in forms of product type. In particular, we look at the geometric moment G(x) and some related and generalized generalized forms. We show that for all these forms, a relative approximation in one pass requires Ω(n) bits, hence sub-linear space computation in one pass is impossible. On the other hand, for a very similar form we present a sub-linear one-pass-algorithm that uses sketching based on truncated exponential random variables.

__Combinatorial Games__

**The Maker-Breaker Triangle Game**

The triangle game introduced by Chvátal and Erdös (1978) is one of the most famous combinatorial games. For n, q ∈ N, the (n, q)-triangle game is played by two players, called Maker and Breaker, on the complete graph Kn . Alternately Maker claims one edge and thereafter Breaker claims q edges of the graph. Maker wins the game if he can claim all three edges of a triangle, otherwise Breaker wins. Chvátal and Erdős (1978) proved that for q < (2n + 2)^{1/2} − 5/2 ≈ 1.414 n Maker has a winning strategy, and for q ≥ 2 n Breaker has a winning strategy. Balogh and Samotij (2011) slightly improved the Chvátal-Erdös constant for Breaker’s winning strategy from 2 to 1.935 with a randomized approach. We present a new deterministic strategy for Breaker’s win whenever n is sufficiently large and q ≥ ((8/3 + o(1))n)^{1/2} ≈ 1.633 n^{1/2}, significantly reducing the gap towards the lower bound.

**Mastermind**

Mastermind is a famous two-player game introduced by M. Meirowitz (1970). Its combinatorics has gained increased interest over the last years for different variants. We consider a version known as the Black-Peg AB Game, where one player creates a secret code consisting of c colors on p ≤ c pegs, where each color is used at most once. The second player tries to guess the secret code with as few questions as possible. For each question he receives the number of correctly placed colors. In the static variant the second player doesn’t receive the answers one at a time, but all at once after asking the last question. The most prominent case is n := p = c, where the secret code and all questions are permutations. Our main result is an upper bound of O(n^(1.525)) questions for the static Black-Peg AB Game. With a slight modification of the arguments of Doerr et al. (2016) we also give a lower bound of Ω(n log(n)).

In cooperation with Dr. Gerold Jäger, Umeå University.

Christian Glazik, Gerold Jäger, Jan Schiemann and Anand Srivastav: Bounds for Static Black-Peg AB Mastermind. In: Proceedings of 11th Annual International Conference on Combinatorial Optimization and Applications (COCOA 2017), Springer Berlin/Heidelberg, 2017.

__Life Science: Genome and Metagenome Assembly__

**Filtering Reads for Genome Assembly**

For single-cell or metagenomic sequencing projects, it is necessary to sequence with a very high mean coverage in order to make sure that all parts of the sample DNA get covered by the reads produced. This leads to huge datasets with lots of redundant data. A filtering of this data prior to assembly is advisable. We present Bignorm, a faster and quality-conscious read filtering algorithm. An important new feature is the use of phred quality scores together with a detailed analysis of the k-mer counts to decide which reads to keep.

__Cooperations__

Thorsten Reusch, Marine Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel

Philip Rosenstiel, Institute of Clinical Molecular Biology, Kiel University

__Publications__

Axel Wedemeyer, Lasse Kliemann, Anand Srivastav, Christian Schielke, Thorsten Reusch and Philip Rosenstiel.

An improved filtering algorithm for big read datasets and its application to single-cell assembly.

BMC Bioinformatics. 2017 Jul 3;18(1):324. doi:10.1186/s12859-017-1724-7.

**Filtering Reads for Metagenome Assembly**

We show that using our software Bignorm to filter metagenomic datasets prior to assembly outperforms subsampling in assembly quality (largest contig, total length, genomic fraction and number of predicted genes), using about the same time and memory.

__Cooperations__

Group of Tal Dagan, Institute of Microbiology, Kiel University: Devani Romero Picazo and Anne Kupczok

Group of Ute Hentschel Humeida, Marine Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel: Martin Jahn

__Mathematical Optimization and Simulation in Marine Science__

**Calibration and Optimization of Marine Biogeochemical Models**

Biogeochemical (BGC) ocean models are often embedded into earth system models and play an important role in projections about major environmental factors. Due to parametric uncertainties, BGC models must be calibrated, i.e., their parameters must be adjusted in order to fit model simulations to corresponding real-world observations. In order to obtain reliable future predictions it is necessary to couple BGC models to global ocean circulation models which act on long time scales and are, thus, computationally expensive. We work on parallel population-based optimization algorithms to calibrate global BGC ocean models. Additionally, we work on methods to compute lower bounds on the best attainable misfit between model simulations and corresponding real-world observations. Sufficiently tight bounds might be applied as a stopping criterion for expensive parameter optimizations. Due to the parametric uncertainty, a good fit to one observation might not necessarily result in a good match to another. We examine how Multiobjective optimization, providing an ensemble of compromise solutions, helps to substantiate dependencies between parameter values and model skills.

**Uncertanity Quantification**

Uncertainties in ocean models – stemming, for example, from incomplete knowledge of the physical or biological processes, intentional simplifications or measurement errors – have to be taken into account in order to assess model outcome and predictions reliably. To date, most ocean models are evaluated in a deterministic setting or with basic stochastic tools only. This project approaches the various sources of uncertainties in ocean modeling with stochastic differential equations. Stochastic partial differential equations involve a stochastic process which represents factors that might have been neglected in the model or inherent uncertainties of the underlying processes due to, for example, unresolved scales or unresolved diversity. In order to investigate the impact of parameter uncertainty, the model can be embedded in the setting of random differential equations. This motivates research on numerical methods for stochastic differential equations that can then be used to analyze such phenomena. By considering some selected applications, we aim to show that the different types of stochastic equations can display dynamics that differ from the deterministic setting.

__Cooperations__

Josefin Ahlkrona, Department of Mathematics, Kiel University

Julia Getzlaff, GEOMAR, Kiel

Iris Kriest, GEOMAR Kiel

Ulrike Löptien, GEOMAR Kiel

Andreas Oschlies, GEOMAR Kiel

Andreas Rößler, Institute of Mathematics, University of Lübeck

__Publications__

V. Sauerland and I. Kriest and A. Oschlies and A. Srivastav: Assessing a Global Biogeochemical Ocean Model by Multi-Objective Optimization. 2018, Submitted to AGU Journal of Advances in Modeling Earth Systems (JAMES).

C. Leonhard and A. Rößler: Iterated Stochastic Integrals in Infinite Dimensions - Approximation and Error Estimates. 2018, To appear in Stochastics and Partial Differential Equations: Analysis and Computations.

C. Leonhard and A. Rößler: Enhacing the order of the Milstein scheme for SPDEs with commutative noise. 2018, SIAM J. Numer. Anal., 56(4), 2585–2622.

V. Sauerland, U. Löptien, C. Leonhard, A. Oschlies, and A. Srivastav: Error assessment of biogeochemical models by lower bound methods. 2018, Geosci. Model Dev., 11, 1181-1198.

V. Sauerland: Non-parametric optimization methods for model assessment (NOMMA-1.0). 2017, doi.org/10.5281/zenodo.1162769.

I. Kriest, V. Sauerland, S. Khatiwala, A. Srivastav and A. Oschlies: Calibrating a global three-dimensional biogeochemical ocean model (MOPS-1.0). 2017, Geosci. Model Dev., 10(1): 127-154.