Conferences
LNAI Springer, 2022
Abstract
Password guessing describes the process of finding a password for a secured system. Use cases include password recovery, IT forensics, and measuring password strength. Commonly used tools for password guessing work with password leaks and use these lists for candidate generation based on handcrafted or inferred rules. These methods are often limited in their capability of producing entirely novel passwords based on vocabulary not included in the given password lists. However, there are often semantic similarities between words and phrases of the given lists that are highly relevant for guessing the actually used passwords. In this paper, we propose SePass, a novel method that utilizes word embeddings to discover and exploit these semantic similarities. We compare SePass to a number of competitors and illustrate that our method is not only on par with these competitors, but also generates a significantly higher number of entirely novel password candidates. Using SePass in combination with existing methods, such as PCFG, improves the number of correctly guessed passwords considerably.
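To illustrate the underlying idea (a rough sketch only; SePass's actual pipeline, models, and candidate ranking are described in the paper), base words from leaked passwords can be exchanged for their nearest neighbours in a pre-trained word-embedding space:

```scala
// Minimal sketch of embedding-based candidate generation (not the SePass implementation).
// `embeddings` is assumed to map words to pre-trained vectors, e.g. loaded from fastText.
object SemanticCandidates {

  type Vec = Array[Double]

  def cosine(a: Vec, b: Vec): Double = {
    val dot  = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  /** The k words whose vectors are most similar to `word`. */
  def neighbours(word: String, embeddings: Map[String, Vec], k: Int): Seq[String] =
    embeddings.get(word) match {
      case None      => Seq.empty
      case Some(vec) =>
        embeddings.toSeq
          .filter { case (w, _) => w != word }
          .sortBy { case (_, v) => -cosine(vec, v) }
          .take(k)
          .map(_._1)
    }

  /** Replace the alphabetic core of a leaked password by semantic neighbours,
    * keeping digits and symbols (the "mangling") untouched. */
  def candidates(leaked: String, embeddings: Map[String, Vec], k: Int): Seq[String] = {
    val core = leaked.filter(_.isLetter).toLowerCase
    neighbours(core, embeddings, k).map(n => leaked.toLowerCase.replace(core, n))
  }
}
```

For example, a leaked "Sommer2019!" could yield candidates such as "winter2019!" if the embedding places the two words close together.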
ACM, 2020
Abstract
Current approaches combining multiple static analyses that derive different, independent properties focus either on modularity or on performance. Whereas declarative approaches facilitate modularity and automated, analysis-independent optimizations, imperative approaches foster manual, analysis-specific optimizations. In this paper, we present a novel approach to static analyses that leverages the modularity of blackboard systems and combines declarative and imperative techniques. Our approach allows the exchange and pluggable extension of analyses in order to improve sound(i)ness, precision, and scalability, and it explicitly enables the combination of otherwise incompatible analyses. With our approach integrated in the OPAL framework, we were able to implement various dissimilar analyses, including a points-to analysis that outperforms an equivalent analysis from Doop, the state-of-the-art points-to analysis framework.
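The blackboard principle can be sketched as follows (a deliberately simplified model, not OPAL's actual property-store API): analyses communicate only via properties attached to entities on a shared store, so independently developed analyses can be plugged together.

```scala
// Sketch of a blackboard-style property store (simplified; values are untyped here for brevity).
import scala.collection.mutable

final class PropertyStore {
  private val properties = mutable.Map.empty[(AnyRef, String), Any]
  private val observers  = mutable.Map.empty[(AnyRef, String), List[Any => Unit]]

  /** An analysis publishes a property for an entity; dependent analyses are notified. */
  def put(entity: AnyRef, kind: String, value: Any): Unit = {
    properties((entity, kind)) = value
    observers.getOrElse((entity, kind), Nil).foreach(_(value))
  }

  /** An analysis asks for a property and registers a continuation if it is not yet available. */
  def require(entity: AnyRef, kind: String)(onValue: Any => Unit): Unit =
    properties.get((entity, kind)) match {
      case Some(v) => onValue(v)
      case None    =>
        observers((entity, kind)) = onValue :: observers.getOrElse((entity, kind), Nil)
    }
}
```

For instance, a purity analysis could require a field-mutability property that an independently developed field-mutability analysis publishes, without either analysis knowing the other's implementation.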
ACM, 2020
Abstract
Parallelization of static analyses is necessary to scale to real-world programs, but it is a complex and difficult task and, therefore, often only done manually for selected high-profile analyses. In this paper, we propose a programming model for semi-implicit parallelization of static analyses which is inspired by reactive programming. Because the domain-expert knowledge on how to parallelize analyses is encoded in the programming framework, developers do not need to think about parallelization and concurrency issues on their own. The programming model supports stateful computations, only requires monotonic computations over lattices, and is independent of specific analyses. Our evaluation shows the applicability of the programming model to different analyses and the importance of user-selected scheduling strategies. We implemented an IFDS solver that was able to outperform a state-of-the-art, specialized parallel IFDS solver both in absolute performance and scalability.
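A minimal sketch of the kind of solver such a programming model rests on (illustrative; the framework's real API, parallel execution, and scheduling strategies are more elaborate): computations perform monotone joins over a lattice and are rescheduled whenever their dependencies change.

```scala
// Sketch of a sequential solver for monotone computations over a lattice (illustrative only;
// the real framework additionally runs independent computations in parallel).
import scala.collection.mutable

trait Lattice[A] { def bottom: A; def join(x: A, y: A): A }

final class Solver[N, A](lattice: Lattice[A], schedule: Seq[N] => Seq[N]) {
  private val values     = mutable.Map.empty[N, A].withDefaultValue(lattice.bottom)
  private val dependents = mutable.Map.empty[N, Set[N]].withDefaultValue(Set.empty)

  def addDependency(from: N, on: N): Unit =
    dependents(on) = dependents(on) + from

  /** Runs `transfer` to a fixed point; monotonicity guarantees termination on finite-height lattices.
    * `schedule` is the user-selected scheduling strategy. */
  def solve(nodes: Seq[N])(transfer: (N, N => A) => A): Map[N, A] = {
    var worklist = schedule(nodes)
    while (worklist.nonEmpty) {
      val n        = worklist.head
      worklist     = worklist.tail
      val newValue = lattice.join(values(n), transfer(n, values))
      if (newValue != values(n)) {
        values(n) = newValue
        worklist  = schedule(worklist ++ dependents(n)) // re-schedule affected computations
      }
    }
    values.toMap
  }
}
```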
ACM, 2019
Abstract
Call graphs are widely used, in particular for advanced control- and data-flow analyses. Even though many call graph algorithms with different precision and scalability properties have been proposed, a comprehensive understanding of the sources of unsoundness, their relevance, and the capabilities of existing call graph algorithms in this respect is missing. To address this problem, we propose Judge, a toolchain that helps with understanding sources of unsoundness and improving the soundness of call graphs. In several experiments, we use Judge and an extensive test suite related to sources of unsoundness to (a) compute capability profiles for the call graph implementations of Soot, WALA, DOOP, and OPAL, (b) determine the prevalence of language features and APIs that affect soundness in modern Java Bytecode, (c) compare the call graphs of Soot, WALA, DOOP, and OPAL, highlighting important differences in their implementations, and (d) evaluate the effort necessary to achieve project-specific, reasonably sound call graphs. We show that soundness-relevant features/APIs are frequently used and that support for them differs vastly, up to the point where comparing call graphs computed by the same base algorithms (e.g., RTA) but different frameworks is bogus. We also show that Judge can support users in establishing the soundness of call graphs with reasonable effort.
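A typical source of unsoundness exercised by such test suites is reflection; a purely static call-graph algorithm easily misses the edge to Target.callee in a snippet like the following (an illustrative example, not one of Judge's actual test cases).

```scala
// Illustrative example of a call edge that naive call-graph algorithms miss.
class Target {
  def callee(): Unit = println("reached via reflection")
}

object ReflectiveCall {
  def main(args: Array[String]): Unit = {
    val className  = "Target"                     // could just as well come from a config file
    val methodName = "callee"
    val clazz      = Class.forName(className)
    val instance   = clazz.getDeclaredConstructor().newInstance()
    clazz.getMethod(methodName).invoke(instance)  // edge to Target.callee is invisible
                                                  // unless reflection is modelled
  }
}
```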
ACM, 2018
Abstract
Analyzing whether methods in object-oriented programs are side-effect free and also deterministic, i.e., mathematically pure, has been the target of extensive research. Identifying such methods helps to find code smells and security-related issues, and also helps analyses detecting concurrency bugs. Pure methods are also used by formal verification approaches as the foundations for specifications, and proving pureness is necessary to ensure correct specifications.
However, so far no common terminology exists which describes the purity of methods. Furthermore, some terms (e.g., pure or side-effect free) are also used inconsistently. Further, all current approaches only report selected purity information, making them suitable for only a smaller subset of the potential use cases.
In this paper, we present a fine-grained unified lattice model which puts the purity levels found in the literature into relation and which adds a new level that generalizes existing definitions. We have also implemented a scalable, modularized purity analysis which produces significantly more precise results for real-world programs than the best-performing related work. The analysis shows that all defined levels are found in real-world projects.
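As an illustration of the general idea (the level names and ordering below are simplified; the paper's lattice is finer-grained and not a simple chain), purity levels can be encoded so that a method's purity is the join of the purities of everything it calls:

```scala
// Simplified purity lattice (illustrative; the paper defines a richer model).
sealed abstract class Purity(private val rank: Int) {
  /** Join is the least upper bound; in this simplified chain it is simply the maximum. */
  def join(other: Purity): Purity = if (rank >= other.rank) this else other
}
case object Pure             extends Purity(0) // deterministic and side-effect free
case object SideEffectFree   extends Purity(1) // no side effects, but possibly non-deterministic
case object ContextuallyPure extends Purity(2) // pure except for effects on passed-in parameters
case object Impure           extends Purity(3)

// A method's purity is the join over its callees, e.g.:
//   Pure.join(SideEffectFree) == SideEffectFree
```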
ACM, 2017
Abstract
An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps whose obfuscated code bases are very similar do not have to be repackages of each other.
To reliably detect repackaged apps, we propose a two-step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app's library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used.
Using our approach, we found that ~15% of all apps in Android app stores are repackages.
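The first step's key idea, abstracting over the obfuscatable parts of the bytecode before comparing, can be sketched roughly as follows (a strong simplification: the real approach uses several abstraction levels and fuzzy hashing rather than a plain cryptographic hash).

```scala
// Rough sketch of obfuscation-resilient code abstraction (not the LibDetect algorithm).
import java.security.MessageDigest

object AbstractRepresentation {

  /** Drop identifier names and keep only structure that obfuscators rarely change,
    * e.g. the method descriptor and the opcode sequence. */
  def abstractMethod(opcodes: Seq[String], descriptor: String): String =
    descriptor + ":" + opcodes.mkString(",") // e.g. "(I)V:iload,iconst,iadd,ireturn"

  /** Order-independent fingerprint of a set of abstracted methods. */
  def fingerprint(methods: Seq[String]): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    methods.sorted.foreach(m => digest.update(m.getBytes("UTF-8")))
    digest.digest().map(b => f"${b & 0xff}%02x").mkString
  }
}
```

Two apps whose library-free fingerprints are (nearly) identical are candidates for being repackages; fuzzy hashing, unlike the plain hash above, still matches when small edits were made.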
ACM, 2016
Abstract
Concurrent programming is infamous for its difficulty. An important source of difficulty is non-determinism, stemming from unpredictable interleavings of concurrent activities. Futures and promises are widely-used abstractions that help design deterministic concurrent programs, although this property cannot be guaranteed statically in mainstream programming languages. Deterministic-by-construction concurrent programming models avoid this issue, but they typically restrict expressiveness in important ways.
This paper introduces a concurrent programming model, Reactive Async, which decouples concurrent computations using so-called cells, shared locations which generalize futures as well as recent deterministic abstractions such as LVars. Compared to previously proposed programming models, Reactive Async provides (a) a fallback mechanism for the case where no computation ever computes the value of a given cell, and (b) explicit and optimized handling of cyclic dependencies. We present a complete implementation of the Reactive Async programming model as a library in Scala. Finally, the paper reports on a case study applying Reactive Async to static analyses of JVM bytecode based on the Opal framework.
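A minimal, single-threaded sketch of the cell abstraction (illustrative; the published library is concurrent and its method names may differ): a cell's value only grows monotonically, dependent computations are registered as callbacks, and a fallback supplies a value if no computation ever writes to the cell.

```scala
// Minimal sketch of Reactive Async-style cells (not the library's actual API).
import scala.collection.mutable

final class Cell[V](fallback: V, join: (V, V) => V) {
  private var value: Option[V] = None
  private val dependents = mutable.ListBuffer.empty[V => Unit]

  /** Monotone update: the new value is joined with the current one and dependents are notified. */
  def putNext(v: V): Unit = {
    val updated = value.fold(v)(join(_, v))
    value = Some(updated)
    dependents.foreach(_(updated))
  }

  /** Register a dependent computation that reacts to updates of this cell. */
  def onUpdate(f: V => Unit): Unit = dependents += f

  /** If no computation ever wrote to this cell, fall back to a default
    * (e.g. "no subtype overrides this method" in a static analysis). */
  def currentOrFallback: V = value.getOrElse(fallback)
}
```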
ACM, 2016
Abstract
Today, every application uses software libraries. Yet, while a lot of research exists w.r.t. analyzing applications, research that targets the analysis of libraries independent of any application is scarce. This is unfortunate, because, for developers of libraries, such as the Java Development Kit (JDK), it is crucial to ensure that the library behaves as intended regardless of how it is used. To fill this gap, we discuss the construction of call graphs for libraries that abstract over all potential library usages. Call graphs are particularly relevant as they are a precursor of many advanced analyses, such as inter-procedural data-flow analyses.
We show that the current practice of using call graph algorithms designed for applications to analyze libraries leads to call graphs that, at the same time, lack relevant call edges and contain unnecessary edges. This motivates the need for call graph construction algorithms dedicated to libraries. Unlike algorithms for applications, call graph construction algorithms for libraries must take into consideration the goals of subsequent analyses. Specifically, we show that it is essential to distinguish the scenario of an analysis for potentially exploitable vulnerabilities from the scenario of an analysis for general software quality attributes, e.g., dead methods or unused fields. This distinction affects the decision about what constitutes the library-private implementation, which therefore needs special treatment. Thus, building one call graph that satisfies all needs is not sensible. Overall, we observed that the proposed call graph algorithms reduce the number of call edges by up to 30% when compared to existing approaches.
ACM, 2015
Abstract
Approaches and techniques for statically finding a multitude of issues in source code have been developed in the past. A core property of these approaches is that they are usually targeted towards finding only a very specific kind of issue and that the effort to develop such an analysis is significant. This strictly limits the number of kinds of issues that can be detected. In this paper, we discuss a generic approach based on the detection of infeasible paths in code that can discover a wide range of code smells ranging from useless code that hinders comprehension to real bugs. Code issues are identified by calculating the difference between the control-flow graph that contains all technically possible edges and the corresponding graph recorded while performing a more precise analysis using abstract interpretation. We have evaluated the approach using the Java Development Kit as well as the Qualitas Corpus (a curated collection of over 100 Java Applications) and were able to find thousands of issues across a wide range of categories.
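The detection principle, diffing the technically possible control-flow edges against those actually taken under abstract interpretation, can be expressed in a few lines (illustrative; the edge sets are assumed to be provided by an analysis framework).

```scala
// Sketch of infeasible-path detection by comparing two edge sets (illustrative).
final case class Edge(fromPC: Int, toPC: Int)

object InfeasiblePaths {

  /** `naiveEdges`: all technically possible CFG edges derived from the bytecode.
    * `reachedEdges`: edges actually followed during a more precise abstract interpretation.
    * Every edge in the difference is guarded by a condition that can never hold and
    * hints at dead code, a useless check, or a real bug. */
  def infeasible(naiveEdges: Set[Edge], reachedEdges: Set[Edge]): Set[Edge] =
    naiveEdges -- reachedEdges

  // Example: `if (x != null && x == null)` produces a branch edge that abstract
  // interpretation never follows, so it shows up in the returned set.
}
```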
ACM, 2015
Abstract
Developing software from reusable libraries confronts developers with a security dilemma: either be efficient and reuse libraries as they are, or inspect them and know about their resource usage, but possibly miss deadlines as reviews are a time-consuming process. In this paper, we propose a novel capability inference mechanism for libraries written in Java. It uses a coarse-grained capability model for system resources that can be presented to developers. We found that the inferred capabilities agree with expectations derived from project documentation in 86.81% of the cases. Moreover, our approach can find capabilities that cannot be discovered using project documentation. It is thus a helpful tool for developers mitigating the aforementioned dilemma.
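Conceptually, such an inference seeds capabilities at the native/system boundary and propagates them backwards along the call graph; the sketch below illustrates this (the capability names and the seed mapping are illustrative, not the paper's exact model).

```scala
// Sketch of capability propagation along a call graph (illustrative).
object CapabilityInference {

  sealed trait Capability
  case object Filesystem extends Capability
  case object Network    extends Capability
  case object Reflection extends Capability

  /** Fixed-point propagation: a method has every capability of every method it may call. */
  def infer(
      callees: Map[String, Set[String]],        // call graph: method -> possible callees
      seeds:   Map[String, Set[Capability]]     // e.g. "java.io.FileInputStream.<init>" -> Set(Filesystem)
  ): Map[String, Set[Capability]] = {
    var result  = seeds
    var changed = true
    while (changed) {
      changed = false
      for ((caller, targets) <- callees) {
        val old  = result.getOrElse(caller, Set.empty[Capability])
        val caps = old ++ targets.flatMap(t => result.getOrElse(t, Set.empty[Capability]))
        if (caps != old) { result = result.updated(caller, caps); changed = true }
      }
    }
    callees.keys.map(m => m -> result.getOrElse(m, Set.empty[Capability])).toMap
  }
}
```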
ACM, 2014
Abstract
As software systems are maintained, their architecture often degrades through the processes of architectural drift and erosion. These processes are often intertwined and the same modules in the code become the locus of both drift and erosion symptoms. Thus, architects should elaborate architecture rules for detecting occurrences of both degradation symptoms. While the specification of such rules is time-consuming, they are similar across software projects adhering to similar architecture decompositions. Unfortunately, existing anti-degradation techniques are limited as they focus only on detecting either drift or erosion symptoms. They also do not support the reuse of recurring anti-degradation rules. In this context, the contribution of this paper is twofold. First, it presents TamDera, a domain-specific language for: (i) specifying rule-based strategies to detect both erosion and drift symptoms, and (ii) promoting the hierarchical and compositional reuse of design rules across multiple projects. The language was designed with usual concepts from programming languages in mind, such as inheritance and modularization. Second, we evaluated to what extent developers would benefit from the definition and reuse of hybrid rules. Our study involved 21 versions pertaining to 5 software projects, and more than 600 rules. On average, 45% of the classes that had drift symptoms in early versions presented inter-related erosion problems in later versions, or vice-versa. Also, up to 72% of all the TamDera rules in a project came from a pre-defined library of reusable rules. They were responsible for detecting on average 73% of the inter-related degradation symptoms across the projects.
ACM, 2013
Abstract
Checking a software's structural dependencies is a line of research on methods and tools for analyzing, modeling and checking the conformance of source code w.r.t. specifications of its intended static structure. Existing approaches have focused on the correctness of the specification, the impact of the approaches on software quality and the expressiveness of the modeling languages. However, large specifications become unmaintainable in the event of evolution without the means to modularize such specifications. We present Vespucci, a novel approach and tool that partitions a specification of the expected and allowed dependencies into a set of cohesive slices. This facilitates modular reasoning and helps individual maintenance of each slice. Our approach is suited for modeling high-level as well as detailed low-level decisions related to the static structure and combines both in a single modeling formalism. To evaluate our approach we conducted an extensive study spanning nine years of the evolution of the architecture of the object-relational mapping framework Hibernate.
ACM, 2013
Abstract
Modularity and efficiency are often contradicting requirements, such that programmers have to trade one for the other. We analyze this dilemma in the context of programs operating on collections. Performance-critical code using collections often needs to be hand-optimized, leading to non-modular, brittle, and redundant code. In principle, this dilemma could be avoided by automatic collection-specific optimizations, such as fusion of collection traversals, usage of indexing, or reordering of filters. Unfortunately, it is not obvious how to encode such optimizations in terms of ordinary collection APIs, because the program operating on the collections is not reified and hence cannot be analyzed.
We propose SQuOpt (the Scala Query Optimizer), a deep embedding of the Scala collections API that allows such analyses and optimizations to be defined and executed within Scala, without relying on external tools or compiler extensions. SQuOpt provides the same "look and feel" (syntax and static typing guarantees) as the standard collections API. We evaluate SQuOpt by re-implementing several code analyses of the FindBugs tool; we observe average speedups of 12x with a maximum of 12800x and hence demonstrate that SQuOpt can reconcile modularity and efficiency in real-world applications.
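The deep-embedding idea, reifying collection queries as expression trees that can be inspected and rewritten before execution, can be sketched as follows (greatly simplified; SQuOpt's actual API mirrors the full Scala collections interface and supports many more optimizations).

```scala
// Tiny deep embedding of collection queries with one optimization (filter fusion).
sealed trait Query[A] {
  def filter(p: A => Boolean): Query[A] = Filter(this, p)
  def run: Seq[A]
}
final case class Source[A](data: Seq[A]) extends Query[A] { def run: Seq[A] = data }
final case class Filter[A](base: Query[A], p: A => Boolean) extends Query[A] {
  def run: Seq[A] = base.run.filter(p)
}

object Optimizer {
  /** Fuse adjacent filters so the underlying collection is traversed only once. */
  def optimize[A](q: Query[A]): Query[A] = q match {
    case Filter(base, p2) =>
      optimize(base) match {
        case Filter(inner, p1) => Filter(inner, (a: A) => p1(a) && p2(a))
        case optimizedBase     => Filter(optimizedBase, p2)
      }
    case other => other
  }
}

// The query is an ordinary data structure, so it can be analyzed before it is executed:
//   val q = Source(1 to 100).filter(_ % 2 == 0).filter(_ > 10)
//   Optimizer.optimize(q).run   // single traversal with a fused predicate
```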
Leibniz International Proceedings in Informatics (LIPIcs); 2011
Abstract
Today, Prolog is often used to solve well-defined, domain-specific problems that are part of larger applications. In such cases, a tight integration of the Prolog program and the rest of the application, which is commonly written in a different language, is necessary. One common approach is to compile the Prolog code to (native) code in the target language. In this case, the effort necessary to build, test and deploy the final application is reduced.
However, most of the approaches that achieve reasonable performance compile Prolog to object-oriented code that relies on some kind of virtual machine (VM). These VMs are libraries implemented in the target language that implement Prolog’s execution semantics. This adds a significant layer to the object-oriented program and results in code that does not look and feel native to developers of object-oriented programs. Further, if Prolog’s execution semantics is implemented as a library, the potential of modern runtime environments for object-oriented programs, such as the Java Virtual Machine, to effectively optimize the program is more limited. In this paper, we report on our approach to compile Prolog to high-level, idiomatic object-oriented Java code. The generated Java code closely resembles code written by Java developers and is effectively optimized by the Java Virtual Machine.
ACM; 2011
Abstract
Embedded domain-specific languages (EDSLs) are known to improve the productivity of developers. However, for many domains no DSL implementation is available. Two important reasons are: First, the effort to implement embedded DSLs that provide the domain’s established syntax (called concrete syntax) is very high. Second, the embedded DSL and its underlying general-purpose programming language (GPL) are typically tightly integrated, which hampers reusability across different GPLs. In this paper, we present an approach that significantly reduces the necessary effort to implement embedded DSLs with concrete syntax. The idea is to use island grammars to specify the EDSL’s concrete syntax. This enables the developer to implement the embedded DSL as a library and to incrementally specify the concrete syntax using meta-data. Only those parts of the EDSL’s grammar that deviate from the grammar of the GPL and that are required to enable the integration with the GPL need to be specified.
Springer; 2010
Abstract
Implementing static analyses of machine-level executable code is labor-intensive and complex. We show how to leverage model-driven engineering to facilitate the design and implementation of programs that perform static analyses. Further, we report on important lessons learned about the benefits and drawbacks of the following technologies: using the Scala programming language as the target of code generation, using XML-Schema to express a metamodel, and using XSLT to implement transformations and a lint-like tool. Finally, we report on the use of Prolog for writing model transformations.
Springer; 2010
Abstract
In general, components provide and require services, and two components are bound if the first component provides a service required by the second component. However, certain variability in services – w.r.t. how and which functionality is provided or required – cannot be described using standard interface description languages. If this variability is relevant when selecting a matching component, then human interaction is required to decide which components can be bound. We propose to use feature models for making this variability explicit and (re-)enabling automatic component binding. In our approach, feature models are one part of service specifications. This makes it possible to declaratively specify which service variant is provided by a component. By referring to a service’s variation points, a component that requires a specific service can list the requirements on the desired variant. Using these specifications, a component environment can then determine whether a binding of the components exists that satisfies all requirements. The prototypical environment Columbus demonstrates the feasibility of the approach.
ACM Press; 2010
Abstract
Embedded domain-specific languages (EDSLs) are said to be easier to compose than DSLs that are implemented by preprocessors. However, existing approaches focus on composition scenarios where the use of abstractions from one domain does not affect the interpretation of abstractions from another domain. This leads to programs that exhibit scattering and tangling symptoms if multiple EDSLs with cross-cutting domain semantics are used. To address this issue, we propose an architecture for embedding DSLs that makes use of meta-object protocols and aspect-oriented concepts to support crosscutting composition of EDSLs. This makes it possible to write modularized EDSL programs where each program addresses one concern.
ACM Press; 2008
Abstract
Dependencies between program elements need to be modeled from different perspectives reflecting architectural, design, and implementation level decisions. To avoid erosion of the intended structure of the code, it is necessary to explicitly codify these different perspectives on the permitted dependencies and to detect violations continuously and incrementally as software evolves. We propose an approach that uses declarative queries to group source elements — across programming language module boundaries — into overlapping ensembles. The dependencies between these ensembles are also specified as logic queries. The approach has been integrated into the incremental build process of Eclipse to ensure continuous checking, using an engine for tabled and incremental evaluation of logic queries. Our evaluation shows that our approach is fast enough for day-to-day use along the incremental build process of modern IDEs.
Springer [Lecture Notes in Computer Science (LNCS)](vol. 4354); 2007
Abstract
Modern development environments integrate various static analyses into the build process. Analyses that analyze the whole project whenever the project changes are impractical in this context. We present an approach to automatic incrementalization of analyses that are specified as tabled logic programs and evaluated using incremental tabled evaluation, a technique for efficiently updating memo tables in response to changes in facts and rules. The approach has been implemented and integrated into the Eclipse IDE. Our measurements show that this technique is effective for automatically incrementalizing a broad range of static analyses.
IEEE Computer Society; 2006
Abstract
To improve the productivity of the development process, more and more tools for static software analysis are tightly integrated into the incremental build process of an IDE. If multiple interdependent analyses are used simultaneously, the coordination between the analyses becomes a major obstacle to keep the set of analyses open. We propose an approach to integrating and scheduling an open set of static analyses which decouples the individual analyses and coordinates the analysis executions such that the overall time and space consumption is minimized. The approach has been implemented for the Eclipse IDE and has been used to integrate a wide range of analyses such as finding bug patterns, detecting violations of design guidelines, or type system extensions for Java.
IEEE Computer Society; 2006
Abstract
To measure the particularities of modern software development projects that use different types of documents for the implementation of a program, new metrics need to be defined. Further, well-established metrics, such as lack of cohesion or coupling between objects, need to be reconsidered in the presence of new language features. Not being able to thoroughly measure a project can lead to false conclusions with respect to the measured source files. Currently, a large number of metrics tools exist, but unfortunately most tools are not extensible, or they are limited with respect to the types of documents that can be taken into account. Further, support for testing newly developed metrics is also missing. In this paper we present QScope, an open, extensible metrics framework. QScope is open with respect to the supported artifacts and explicitly enables the user to implement new metrics by reasoning over all artifacts using a declarative query language. As we will show in this paper, using a declarative query language enables a concise definition of new metrics.
IEEE Computer Society; 2005
Abstract
Current tools for software understanding mostly concentrate on one comprehension technique, e.g., visualization, or bottom-up navigation through software elements via hyperlinks. In this paper, we argue that to effectively assist developers in understanding today's software systems, a combination of several comprehension techniques is needed, including seamless integration of top-down querying and bottom-up navigation strategies that work across different kinds of software artifacts; furthermore, application-domain and/or technology-specific relationships between software elements should be taken into consideration; last but not least, a tight integration of such tools into development environments is crucial. We present Sextant, a software exploration tool tightly integrated into the Eclipse IDE that satisfies these requirements. In two case studies, we demonstrate how Sextant's features help in tracking down the source of erroneous behavior and in discovering "bad smells" in the software structure that should lead to code refactorings.
Springer [Lecture Notes in Computer Science (LNCS)](vol. 3442); 2005
Abstract
The specification of meta-information, by using attributes in .NET or annotations in Java, along with the source code is gaining widespread use. Meta-information is used for different purposes such as code generation or configuration of the environment in which a class is deployed. However, in most cases using an annotation also implies that constraints, beyond those defined by the language's semantics, have to be followed. E.g., a class must define a no-arguments constructor or the parameters of a method must have specific types. Currently, these constraints are not checked at all or only to a very limited extent. Hence, a violation can remain undetected and result in deployment-time or even subtle run-time errors. In this paper, we present a user-extensible framework that enables the definition of constraints to check the properties of annotated elements. Further, we demonstrate the application of the framework by checking the constraints defined in the EJB 3.0 specification, and we evaluate the approach's performance by checking the xPetstore-EJB3.0 project from within Eclipse.
ACM 2005
Abstract
Language mechanisms deserve language implementation effort. While this maxim has led to sophisticated support for language features specific to object-oriented, functional and logic programming languages, aspect-oriented programming languages are still mostly implemented using postprocessors. The Steamloom virtual machine, based on IBM's Jikes RVM, provides support for aspect-oriented programming at virtual machine level. A bytecode framework called BAT was integrated with the Jikes RVM to replace its bytecode management logic. While preserving the functionality needed by the VM, BAT also allows for querying application code for join point shadows, avoiding redundancy in bytecode representation. Performance measurements show that an AOP-enabled virtual machine like Steamloom does not inflict unnecessary performance penalties on a running application; when it comes to executing AOP-related operations, there even are significant performance gains compared to other approaches.
IEEE Computer Society; 2004
Abstract
We describe XIRC, a tool and architecture that makes it possible to define queries over a uniform representation of all artifacts of a software project. These queries can be used for general cross-artifact information retrieval or for more specialized applications like checking implementation restrictions or conformance to style guides. XIRC is also a good basis for implementing a broad range of tools for refactoring, generators, aspect-oriented programming, and many other domains on top of it.
Springer [Lecture Notes in Computer Science (LNCS)](vol. 3302); 2004
Abstract
Most aspect-oriented languages provide only a fixed, built-in set of pointcut designators whose denotation is only described informally. As a consequence, these languages do not provide operations to manipulate or reason about pointcuts beyond weaving. In this paper, we investigate the usage of the functional query language XQuery for the specification of pointcuts. Due to its abstraction and module facilities XQuery enables powerful composition and reusability mechanisms for pointcuts.
IEEE Computer Society; 2004
Abstract
Policy enforcement is a mechanism for ensuring that system components follow certain programming practices, comply with specified rules, and meet certain assumptions. Unfortunately, the most common mechanisms used today for policy enforcement are documentation, training, and code reviews. The fundamental problem is that these mechanisms are expensive, time-consuming, and still error-prone. To cope with this problem, in this paper, we present IRC (Implementation Restriction Checker), an extensible framework for automatically enforcing system-wide policies or contracts. The framework is built on top of a platform for aspect-oriented programming at the level of Java bytecode instructions and is available as an Eclipse plug-in as well as a standalone application. It includes a set of directly usable checkers and can be easily extended to implement new ones.
Workshops
SOAP 2020
Abstract
Most Java static analysis frameworks provide an intermediate representation (IR) of Java Bytecode to facilitate the development of static analyses. While such IRs are often based on three-address code, the transformation itself is a great opportunity to apply optimizations to the transformed code, such as constant propagation.
In this paper, we propose TACAI, a refinable IR that is based on abstract interpretation results of a method's bytecode. Exchanging the underlying abstract interpretation domains enables the creation of various IRs of different precision levels. Our evaluation shows that TACAI can be efficiently computed and provides slightly more precise receiver-type information than Soot's Shimple representation. Furthermore, we show how exchanging the underlying abstract domains directly impacts the generated IR.
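The effect of exchanging the abstract domain can be illustrated with a toy model (not TACAI's actual data structures): for int x = 2 + 3, a purely syntactic lifting keeps the addition, whereas a constant-propagating domain already folds it during the transformation.

```scala
// Toy model of how the chosen abstract domain affects the generated three-address code.
object TacSketch {
  sealed trait Expr
  final case class Const(value: Int)                   extends Expr
  final case class Var(name: String)                   extends Expr
  final case class BinOp(op: String, l: Expr, r: Expr) extends Expr

  final case class Assignment(target: Var, rhs: Expr)

  /** Lifting `int x = 2 + 3;`: with constant propagation the addition is folded
    * while generating the IR; without it the addition is kept. */
  def lift(withConstantPropagation: Boolean): Assignment =
    if (withConstantPropagation) Assignment(Var("x"), Const(5))
    else Assignment(Var("x"), BinOp("+", Const(2), Const(3)))
}
```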
Abstract
Cryptographic APIs (Crypto APIs) provide the foundations for the development of secure applications. Unfortunately, most applications do not use Crypto APIs securely and end up being insecure, e.g., by the usage of an outdated algorithm, a constant initialization vector, or an inappropriate hashing algorithm. Two different studies [1], [2] have recently shown that 88% to 95% of those applications using Crypto APIs are insecure due to misuses. To facilitate further research on these kinds of misuses, we created a collection of 201 misuses found in real-world applications along with a classification of those misuses. In the provided dataset, each misuse consists of the corresponding open-source project, the project’s build information, a description of the misuse, and the misuse’s location. Further, we integrated our dataset into MUBench [3], a benchmark for API misuse detection. Our dataset provides a foundation for research on Crypto API misuses. For example, it can be used to evaluate the precision and recall of detection tools, as a foundation for studies related to Crypto API misuses, or as a training set.
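A representative example of the kind of misuse collected in the dataset (an illustrative snippet, not one of the dataset's actual entries): a hard-coded key and a constant initialization vector defeat the guarantees of AES in CBC mode.

```scala
// Illustrative Crypto API misuse: a constant IV and a hard-coded key make the ciphertext predictable.
import javax.crypto.Cipher
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}

object ConstantIvMisuse {
  def encrypt(plaintext: Array[Byte]): Array[Byte] = {
    val key    = new SecretKeySpec("0123456789abcdef".getBytes("UTF-8"), "AES") // misuse: hard-coded key
    val iv     = new IvParameterSpec(new Array[Byte](16))                       // misuse: constant (all-zero) IV
    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
    cipher.init(Cipher.ENCRYPT_MODE, key, iv)
    cipher.doFinal(plaintext)
    // A correct use would draw the IV from SecureRandom and obtain the key from a KeyStore or KDF.
  }
}
```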
Abstract
Static analyses which compute conceptually independent information, e.g., class immutability or method purity, are typically developed as standalone, closed analyses. Complementary information that could improve the analyses is either ignored by making a sound over-approximation or it is also computed by the analyses, but at a rudimentary level. For example, an immutability analysis requires field mutability information, alias/escape information, and information about the concurrent behavior of methods to correctly classify classes like java.lang.String or java.util.BigDecimal. As a result, without proper support for integrating independently developed, mutually benefiting analyses, many analyses will not correctly classify relevant entities. We propose to use explicitly reified lattices that encode the information about a source code element’s properties (e.g., a method’s purity or a class’ immutability) as the sole interface between mutually dependent analyses. This enables the composition of multiple analyses. Our case study shows that using such an approach enables highly scalable, lightweight implementations of modularized static analyses.
SOAP 2018
Abstract
Call graphs are at the core of many static analyses ranging from the detection of unused methods to advanced control- and data-flow analyses. Therefore, a comprehensive understanding of the precision and recall of the respective graphs is crucial to enable an assessment of which call-graph construction algorithms are suited to which analysis scenario. For example, malware is often obfuscated and tries to hide its intent by using Reflection. Call graphs that do not represent reflective method calls are, therefore, of limited use when analyzing such apps.
In general, the precision is well understood, but the recall is not, i.e., in which cases a call graph will miss call edges. In this paper, we discuss the design of a comprehensive test suite that enables us to compute a fingerprint of the unsoundness of the respective call-graph construction algorithms. This suite also enables us to make a comparative evaluation of static analysis frameworks. Comparing Soot and WALA shows that WALA currently has better support for new Java 8 features and also for Java Reflection. However, in some cases both fail to include expected edges.
Michael Reif, Michael Eichberg, Ben Hermann and Mira Mezini
SOAP 2017
Abstract
An integral part of developing a new analysis is to validate the correctness of its implementation and to demonstrate its usefulness when applied to real-world code. As a foundation for addressing both challenges, developers typically use custom or well-established collections of Java projects. The hope is that the collected projects are representative for the analysis' target domain and therefore ensure a sound evaluation. But, without proper means to understand how and to which degree the features relevant to an analysis are found in the projects, the evaluation necessarily remains inconclusive. Additionally, it is likely that the collection contains many projects which are – w.r.t. the developed analysis – basically identical and therefore do not help the overall evaluation/testing of the analysis, but still cost evaluation time.
To overcome these limitations, we propose a framework that enables the systematic assessment of given corpora and the creation of new corpora of Java projects. To show its usefulness, we used the framework to comprehend the nature of the projects belonging to the Qualitas Corpus (QC) and then used it to compute a minimal subset of all QC projects useful for generic data- and control-flow analyses. This subset enables effective and efficient integration test suites.
Ka I Pun, Martin Steffen, Volker Stolz, Anna-Katharina Wickert, Eric Bodden and Michael Eichberg
The 28th Nordic Workshop on Programming Theory
Abstract
Taint analysis is a form of data flow analysis aiming at secure information flow. For example, unchecked user input is typically considered as "tainted", i.e., as untrusted and potentially dangerous. Untrusted data may corrupt memory or otherwise undermine the correct functioning or privacy of the software if it reaches program points it is not supposed to reach. Many common attack vectors exploit vulnerabilities based on unchecked data and the programmer’s negligence in foreseeing all possible user inputs (including malicious ones) and the resulting information flows through the program.
We present a static taint analysis for Go, a modern, statically typed programming language. Go in particular features concurrent programming, supporting light-weight threads dubbed “goroutines” and message-based communication. Besides a classical context-sensitive taint analysis, the paper presents a solution for analyzing channel communication in Go.
ACM 2016
A collection of Java web applications assembled using ABM.
Abstract
The systematic evaluation of program analyses as well as software-engineering tools requires benchmark suites that are representative of real-world projects in the domains for which the tools or analyses are designed. Such benchmarks currently only exist for a few research areas and even where they exist, they are often not effectively maintained, due to the required manual effort. This makes evaluating new analyses and tools on software that relies on current technologies often impossible.
We describe ABM, a methodology to semi-automatically mine software repositories to extract up-to-date and representative sets of applications belonging to specific domains. The proposed methodology facilitates the creation of such collections and makes it easier to release updated versions of a benchmark suite. Resulting from an instantiation of the methodology, we present a collection of current real-world Java business web applications. The collection and methodology serve as a starting point for creating current, targeted benchmark suites, and thus helps to better evaluate current program-analysis and software-engineering tools.
ACM 2014
Abstract
Implementations of static analyses are usually tailored toward a single goal to be efficient, hampering reusability and adaptability of the components of an analysis. To solve these issues, we propose to implement static analyses as highly-configurable software product lines (SPLs). Furthermore, we also discuss an implementation of an SPL for static analyses – called OPAL – that uses advanced language features offered by the Scala programming language to get an easily adaptable and (type-)safe software product line.
OPAL is a general-purpose library for static analysis of Java Bytecode that is already successfully used. We present OPAL and show how a design based on software product line engineering benefits the implementation of static analyses with the framework.
ACM 2014
Abstract
Scala is a powerful language that supports a variety of features, but it lacks virtual traits. Virtual traits are class-valued object attributes and can be redefined within subtraits. They support higher-order hierarchies and family polymorphism. This work introduces virtual traits into Scala and explains how to encode virtual traits on top of existing Scala features. We have implemented this encoding using Scala annotation macros and have conducted two small case studies.
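The encoding challenge, class-valued members that subfamilies can refine, can be approximated by hand with abstract type members and factory methods; the paper's annotation macros automate essentially this pattern (a hand-written sketch, not the macro-generated code).

```scala
// Hand-written approximation of a virtual trait using abstract type members (illustrative).
trait Graph {
  type Node <: NodeApi               // "virtual class": subfamilies may refine it
  trait NodeApi { def id: Int }
  def newNode(id: Int): Node         // virtual constructor
}

trait ColoredGraph extends Graph {
  type Node <: ColoredNodeApi        // refinement in the sub-family
  trait ColoredNodeApi extends NodeApi { def color: String }
}

object StringColoredGraph extends ColoredGraph {
  final case class Node(id: Int, color: String) extends ColoredNodeApi
  def newNode(id: Int): Node = Node(id, "black")
}
```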
Elsevier [Electronic Notes in Theoretical Computer Science (ENTCS)](vol. 264); 2011
Abstract
New toolkits that read in, analyze, and transform Java Bytecode are regularly developed from scratch to get a representation suitable for a particular purpose. Nevertheless, the functionality implemented by the toolkits to read in class files and do basic control- and data-flow analyses is comparable and is implemented over and over again. Differences mainly manifest themselves in minor technical issues. To avoid the repetitive development of similar functionality, we have developed an XML-based language for specifying bytecode instructions. Using this language, we have encoded the instruction set of the Java Class File Format such that it can directly be used, e.g., to generate the skeleton of bytecode-based tools. The XML format specifies (a) the format of the instructions and (b) the effect of each instruction on the stack and the local registers when the instruction is executed. This makes it possible to generate generic control- and data-flow analyses, such as an analysis that converts Java Bytecode into static single assignment form. To assess the usefulness of our approach, we have used the encoding of the Java Class File Format to develop a framework to analyze and transform Java class files. The evaluation shows that using the specification significantly reduces the development effort when compared with manual development.
Elsevier [Electronic Notes in Theoretical Computer Science (ENTCS)](vol. 164); 2006
Abstract
Research related to alias protection and related concepts, such as confined types and ownership types, has a long tradition and is a promising concept for the design and implementation of more reliable and secure software. Unfortunately, the use of these concepts is not widespread as most implementations are proofs of concept and fall short with respect to the integration with standard software development tools and processes. In this paper, we discuss an implementation of confined types based on Java 5 annotations. The contribution of this paper is twofold: First, we discuss the incrementalization of the confined types analysis and second, we present the integration of the analysis into Eclipse using the static analysis platform Magellan.
Elsevier [Electronic Notes in Theoretical Computer Science (ENTCS)](vol. 141); 2005
Abstract
The creation, transformation and analysis of bytecode is widespread. Nevertheless, several problems related to the reusability and comprehensibility of the results and tools exist. In particular, the results of tools for bytecode analysis are usually represented in proprietary, tool-dependent ways, which makes it hard to build more sophisticated analyses on top of the results generated by tools for lower-level analyses. Furthermore, intermediate results, such as the results of basic control-flow and data-flow analyses, are usually not explicitly represented at all, even though they are required by many more sophisticated analyses. This lack of a common format for the well-structured representation of the (intermediate) results of code analyses makes the creation of new tools or the integration of the results generated by different tools costly and ineffective.
Springer [Lecture Notes in Computer Science (LNCS)](vol. 3437); 2004
Abstract
In this paper we identify three problems with current component middleware. First, the implementation of services is usually not modularized, making it hard to adapt the platform to application specific needs, to exchange services to cope with changing requirements or to use it on different devices. Second, mapping components to objects results in a complex programming model and is making the component code dependent on the used component framework. Third, application level crosscutting concerns are not modularized. To solve these problems we propose an aspect-oriented programming approach complemented by standard Java 1.5 annotations to provide meta information about the components and a sophisticated query language for pointcut designation based on annotations.
Journal of Object Technology; 2004
Abstract
Middleware for component-based software development already provides some separation of concerns between the components implementing the business functionality and the component environment implementing the infrastructural services. However, the implementation of the services is usually not modularized, making it hard to adapt the platform to application specific needs, to exchange services to cope with changing requirements or to use it on different devices. Also, mapping components to objects results in code where the crosscutting concerns encapsulated in the middleware show up at several places, complicating the programming model and making the component code dependent on the used component framework. In this paper an approach to solve these problems based on the ideas of aspect-oriented programming is proposed.
Journals
Lecture Notes in Computer Science, Springer; 2014
Abstract
Checking a software’s structural dependencies is a line of research on methods and tools for analyzing, modeling, and checking the conformance of source code w.r.t. specifications of its intended static structure. Existing approaches have focused on the correctness of the specification, the impact of the approaches on software quality and the expressiveness of the modeling languages. However, large specifications become unmaintainable in the event of evolution without the means to modularize such specifications. We present Vespucci, a novel approach and tool that partitions a specification of the expected and allowed dependencies into a set of cohesive slices. This facilitates modular reasoning and helps individual maintenance of each slice. Our approach is suited for modeling high-level as well as detailed low-level decisions related to the static structure and combines both in a single modeling formalism. To evaluate our approach, we conducted an extensive study spanning 9 years of the evolution of the architecture of the object-relational mapping framework Hibernate.
Elsevier; 2013
Abstract
Embedded domain-specific languages (EDSLs) are known to improve the productivity of developers. However, for many domains no DSL implementation is available and two important reasons for this are: First, the effort to implement EDSLs that provide the domain’s established syntax (called concrete syntax) is very high. Second, the EDSL and its underlying general-purpose programming language (GPL) are typically tightly integrated. This hampers reusability across different GPLs. Besides these implementation issues, the productivity gains of using EDSLs are also limited by the lack of explicit tool support for EDSL users—such as syntax highlighting or code analyses.
In this paper, we present an approach that significantly reduces the necessary effort to implement embedded DSLs with concrete syntax. The idea is to use island grammars to specify the EDSL’s concrete syntax. This enables the developer to implement the embedded DSL as a library and to incrementally specify the concrete syntax using meta-data. Only those parts of the EDSL’s grammar need to be specified that deviate from the grammar of the GPL. By analyzing an EDSL’s implementation using reflection, it is possible to provide tool support for EDSLs without having the developer implement it explicitly, such as syntax highlighting. An evaluation demonstrates the feasibility of our approach by embedding a real-world DSL into a GPL.
Springer; 2012
Abstract
Application Programming Interfaces (API) are exposed to developers in order to reuse software libraries. API directives are natural-language statements in API documentation that make developers aware of constraints and guidelines related to the usage of an API. This paper presents the design and the results of an empirical study on the directives of API documentation of object-oriented libraries. Its main contribution is to propose and extensively discuss a taxonomy of 23 kinds of API directives.
IEEE; 2006
Abstract
In this paper, we discuss a set of functional requirements for software exploration tools and provide initial evidence that various combinations of these features are needed to effectively assist developers in understanding software. We observe that current tools for software exploration only partly support these features. This has motivated the development of SEXTANT, a software exploration tool tightly integrated into the Eclipse IDE, to fill this gap. By means of case studies, we demonstrate how the requirements fulfilled by SEXTANT are conducive to an understanding needed to perform a maintenance task.
Others
Darmstadt University of Technology [D17]; 2007
Abstract
Comprehensive tool support is essential to enable developers to cope with the complexity of modern software development projects. Software projects are getting larger and larger, are being developed using different languages, and make use of many third-party libraries as well as frameworks. Hence, tools are required: for software comprehension, for checking that libraries and frameworks are correctly used, and to ensure that the design does not degrade over time. Though numerous successful tools have already been developed for these tasks, several issues remain: the tools are usually highly specialized, their extensibility is limited, and an integration between the tools is lacking. Furthermore, IDE integration and in particular an integration with the incremental build process offered by modern IDEs is also often missing. Unfortunately, the direct integration of several code analysis tools with the incremental build process is not possible. When each tool processes the project’s resources on its own and also maintains its own source model, the overall memory requirements and analysis time is prohibitive. To address these issues, this thesis proposes the concept of a Build Process Integrated Open Static Analysis Platform. The core functionality of such a platform is to coordinate the execution of static analyses that are encapsulated into modules with well-defined interfaces. These interfaces specify what the analyses require and provide in terms of the data they process. For a tool that is built upon such a platform it is sufficient to specify the data it requires. The platform can then determine the set of analyses and their execution order to satisfy the tool’s requirements. Modeling analyses as modular producer-consumer units facilitates the simultaneous integration of several tools into the incremental build process of modern IDEs. When compared to using several independent tools, the overall memory requirements are reduced, since the source model derived by the executed analyses is shared among all tools built upon the platform. Furthermore, the overall analysis time is also reduced since analyses are executed at most once, even if the derived information is required by more than one tool. The overall analysis time is further minimized by the parallel execution of those analyses that process different information. The feasibility of the proposed approach is demonstrated by Magellan. Magellan is an open static analysis platform tightly integrated with the incremental build process of the Eclipse IDE. This integration turns Eclipse into an Integrated Development and Analysis Environment. The set of modules implementing the static analyses is freely extensible and the data model of the database is open. An open data model is crucial to support new analyses that need to store derived information for the use by subsequent analyses. Besides featuring a fully flexible analysis stack, Magellan also supports the embedding of query engines. Supporting the execution of queries is indispensable for enabling end-users to define application specific analyses. The ability to execute queries is also required to facilitate software comprehension tools. As a proof of concept an XQuery processor and a Prolog system are embedded into Magellan. Both engines are evaluated w.r.t. using them for the execution of queries along with the incremental build process. 
The XQuery engine is additionally evaluated in the context of software comprehension tools as a means to enable the end-user to define new ways to navigate through code. The platform is validated by four tools built on top of it: a software exploration tool, a metrics tool, an optional type system, and a set of lightweight static analyses that check structural properties of source code.