Conferences
LNAI Springer, 2022
Abstract
Password guessing describes the process of finding a password for a secured system. Use cases include password recovery, IT forensics, and measuring password strength. Commonly used tools for password guessing work with password leaks and use these lists for candidate generation based on handcrafted or inferred rules. These methods are often limited in their capability of producing entirely novel passwords based on vocabulary not included in the given password lists. However, there are often semantic similarities between words and phrases of the given lists that are highly relevant for guessing the actually used passwords. In this paper, we propose SePass, a novel method that utilizes word embeddings to discover and exploit these semantic similarities. We compare SePass to a number of competitors and illustrate that our method is not only on par with these competitors, but also generates a significantly higher number of entirely novel password candidates. Using SePass in combination with existing methods, such as PCFG, improves the number of correctly guessed passwords considerably.
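To illustrate the underlying idea (a rough sketch only; SePass's actual pipeline, models, and candidate ranking are described in the paper), base words from leaked passwords can be exchanged for their nearest neighbours in a pre-trained word-embedding space:

```scala
// Minimal sketch of embedding-based candidate generation (not the SePass implementation).
// `embeddings` is assumed to map words to pre-trained vectors, e.g. loaded from fastText.
object SemanticCandidates {

  type Vec = Array[Double]

  def cosine(a: Vec, b: Vec): Double = {
    val dot  = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  /** The k words whose vectors are most similar to `word`. */
  def neighbours(word: String, embeddings: Map[String, Vec], k: Int): Seq[String] =
    embeddings.get(word) match {
      case None      => Seq.empty
      case Some(vec) =>
        embeddings.toSeq
          .filter { case (w, _) => w != word }
          .sortBy { case (_, v) => -cosine(vec, v) }
          .take(k)
          .map(_._1)
    }

  /** Replace the alphabetic core of a leaked password by semantic neighbours,
    * keeping digits and symbols (the "mangling") untouched. */
  def candidates(leaked: String, embeddings: Map[String, Vec], k: Int): Seq[String] = {
    val core = leaked.filter(_.isLetter).toLowerCase
    neighbours(core, embeddings, k).map(n => leaked.toLowerCase.replace(core, n))
  }
}
```

For example, a leaked "Sommer2019!" could yield candidates such as "winter2019!" if the embedding places the two words close together.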
ACM, 2020
Abstract
Current approaches combining multiple static analyses that derive different, independent properties focus either on modularity or on performance. Whereas declarative approaches facilitate modularity and automated, analysis-independent optimizations, imperative approaches foster manual, analysis-specific optimizations. In this paper, we present a novel approach to static analyses that leverages the modularity of blackboard systems and combines declarative and imperative techniques. Our approach allows the exchange and pluggable extension of analyses in order to improve sound(i)ness, precision, and scalability, and it explicitly enables the combination of otherwise incompatible analyses. With our approach integrated in the OPAL framework, we were able to implement various dissimilar analyses, including a points-to analysis that outperforms an equivalent analysis from Doop, the state-of-the-art points-to analysis framework.
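The blackboard principle can be sketched as follows (a deliberately simplified model, not OPAL's actual property-store API): analyses communicate only via properties attached to entities on a shared store, so independently developed analyses can be plugged together.

```scala
// Sketch of a blackboard-style property store (simplified; values are untyped here for brevity).
import scala.collection.mutable

final class PropertyStore {
  private val properties = mutable.Map.empty[(AnyRef, String), Any]
  private val observers  = mutable.Map.empty[(AnyRef, String), List[Any => Unit]]

  /** An analysis publishes a property for an entity; dependent analyses are notified. */
  def put(entity: AnyRef, kind: String, value: Any): Unit = {
    properties((entity, kind)) = value
    observers.getOrElse((entity, kind), Nil).foreach(_(value))
  }

  /** An analysis asks for a property and registers a continuation if it is not yet available. */
  def require(entity: AnyRef, kind: String)(onValue: Any => Unit): Unit =
    properties.get((entity, kind)) match {
      case Some(v) => onValue(v)
      case None    =>
        observers((entity, kind)) = onValue :: observers.getOrElse((entity, kind), Nil)
    }
}
```

For instance, a purity analysis could require a field-mutability property that an independently developed field-mutability analysis publishes, without either analysis knowing the other's implementation.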
ACM, 2020
Abstract
Parallelization of static analyses is necessary to scale to real-world programs, but it is a complex and difficult task and, therefore, often only done manually for selected high-profile analyses. In this paper, we propose a programming model for semi-implicit parallelization of static analyses which is inspired by reactive programming. Because the domain-expert knowledge on how to parallelize analyses is encoded in the programming framework, developers do not need to think about parallelization and concurrency issues on their own. The programming model supports stateful computations, only requires monotonic computations over lattices, and is independent of specific analyses. Our evaluation shows the applicability of the programming model to different analyses and the importance of user-selected scheduling strategies. We implemented an IFDS solver that was able to outperform a state-of-the-art, specialized parallel IFDS solver both in absolute performance and scalability.
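A minimal sketch of the kind of solver such a programming model rests on (illustrative; the framework's real API, parallel execution, and scheduling strategies are more elaborate): computations perform monotone joins over a lattice and are rescheduled whenever their dependencies change.

```scala
// Sketch of a sequential solver for monotone computations over a lattice (illustrative only;
// the real framework additionally runs independent computations in parallel).
import scala.collection.mutable

trait Lattice[A] { def bottom: A; def join(x: A, y: A): A }

final class Solver[N, A](lattice: Lattice[A], schedule: Seq[N] => Seq[N]) {
  private val values     = mutable.Map.empty[N, A].withDefaultValue(lattice.bottom)
  private val dependents = mutable.Map.empty[N, Set[N]].withDefaultValue(Set.empty)

  def addDependency(from: N, on: N): Unit =
    dependents(on) = dependents(on) + from

  /** Runs `transfer` to a fixed point; monotonicity guarantees termination on finite-height lattices.
    * `schedule` is the user-selected scheduling strategy. */
  def solve(nodes: Seq[N])(transfer: (N, N => A) => A): Map[N, A] = {
    var worklist = schedule(nodes)
    while (worklist.nonEmpty) {
      val n        = worklist.head
      worklist     = worklist.tail
      val newValue = lattice.join(values(n), transfer(n, values))
      if (newValue != values(n)) {
        values(n) = newValue
        worklist  = schedule(worklist ++ dependents(n)) // re-schedule affected computations
      }
    }
    values.toMap
  }
}
```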
ACM, 2019
Abstract
Call graphs are widely used, in particular for advanced control- and data-flow analyses. Even though many call graph algorithms with different precision and scalability properties have been proposed, a comprehensive understanding of the sources of unsoundness, their relevance, and the capabilities of existing call graph algorithms in this respect is missing. To address this problem, we propose Judge, a toolchain that helps with understanding sources of unsoundness and improving the soundness of call graphs. In several experiments, we use Judge and an extensive test suite related to sources of unsoundness to (a) compute capability profiles for the call graph implementations of Soot, WALA, DOOP, and OPAL, (b) determine the prevalence of language features and APIs that affect soundness in modern Java Bytecode, (c) compare the call graphs of Soot, WALA, DOOP, and OPAL, highlighting important differences in their implementations, and (d) evaluate the effort necessary to achieve project-specific, reasonably sound call graphs. We show that soundness-relevant features/APIs are frequently used and that support for them differs vastly, up to the point where comparing call graphs computed by the same base algorithms (e.g., RTA) but different frameworks is bogus. We also show that Judge can support users in establishing the soundness of call graphs with reasonable effort.
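A typical source of unsoundness exercised by such test suites is reflection; a purely static call-graph algorithm easily misses the edge to Target.callee in a snippet like the following (an illustrative example, not one of Judge's actual test cases).

```scala
// Illustrative example of a call edge that naive call-graph algorithms miss.
class Target {
  def callee(): Unit = println("reached via reflection")
}

object ReflectiveCall {
  def main(args: Array[String]): Unit = {
    val className  = "Target"                     // could just as well come from a config file
    val methodName = "callee"
    val clazz      = Class.forName(className)
    val instance   = clazz.getDeclaredConstructor().newInstance()
    clazz.getMethod(methodName).invoke(instance)  // edge to Target.callee is invisible
                                                  // unless reflection is modelled
  }
}
```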
ACM, 2018
Abstract
Analyzing whether methods in object-oriented programs are side-effect free and also deterministic, i.e., mathematically pure, has been the target of extensive research. Identifying such methods helps to find code smells and security-related issues, and also helps analyses detecting concurrency bugs. Pure methods are also used by formal verification approaches as the foundations for specifications, and proving pureness is necessary to ensure correct specifications.
However, so far no common terminology exists which describes the purity of methods. Furthermore, some terms (e.g., pure or side-effect free) are also used inconsistently. Further, all current approaches only report selected purity information, making them suitable for only a smaller subset of the potential use cases.
In this paper, we present a fine-grained unified lattice model which puts the purity levels found in the literature into relation and which adds a new level that generalizes existing definitions. We have also implemented a scalable, modularized purity analysis which produces significantly more precise results for real-world programs than the best-performing related work. The analysis shows that all defined levels are found in real-world projects.
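As an illustration of the general idea (the level names and ordering below are simplified; the paper's lattice is finer-grained and not a simple chain), purity levels can be encoded so that a method's purity is the join of the purities of everything it calls:

```scala
// Simplified purity lattice (illustrative; the paper defines a richer model).
sealed abstract class Purity(private val rank: Int) {
  /** Join is the least upper bound; in this simplified chain it is simply the maximum. */
  def join(other: Purity): Purity = if (rank >= other.rank) this else other
}
case object Pure             extends Purity(0) // deterministic and side-effect free
case object SideEffectFree   extends Purity(1) // no side effects, but possibly non-deterministic
case object ContextuallyPure extends Purity(2) // pure except for effects on passed-in parameters
case object Impure           extends Purity(3)

// A method's purity is the join over its callees, e.g.:
//   Pure.join(SideEffectFree) == SideEffectFree
```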
ACM, 2017
Abstract
An established way to steal the income of app developers, or to trick users into installing malware, is the creation of repackaged apps. These are clones of – typically – successful apps. To conceal their nature, they are often obfuscated by their creators. But, given that it is a common best practice to obfuscate apps, a trivial identification of repackaged apps is not possible. The problem is further intensified by the prevalent usage of libraries. In many apps, the size of the overall code base is basically determined by the used libraries. Therefore, two apps whose obfuscated code bases are very similar do not have to be repackages of each other.
To reliably detect repackaged apps, we propose a two-step approach which first focuses on the identification and removal of the library code in obfuscated apps. This approach – LibDetect – relies on code representations which abstract over several parts of the underlying bytecode to be resilient against certain obfuscation techniques. Using this approach, we are able to identify on average 70% more used libraries per app than previous approaches. After the removal of an app's library code, we then fuzzy hash the most abstract representation of the remaining app code to ensure that we can identify repackaged apps even if very advanced obfuscation techniques are used.
Using our approach, we found that ~15% of all apps in Android app stores are repackages.
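The first step's key idea, abstracting over the obfuscatable parts of the bytecode before comparing, can be sketched roughly as follows (a strong simplification: the real approach uses several abstraction levels and fuzzy hashing rather than a plain cryptographic hash).

```scala
// Rough sketch of obfuscation-resilient code abstraction (not the LibDetect algorithm).
import java.security.MessageDigest

object AbstractRepresentation {

  /** Drop identifier names and keep only structure that obfuscators rarely change,
    * e.g. the method descriptor and the opcode sequence. */
  def abstractMethod(opcodes: Seq[String], descriptor: String): String =
    descriptor + ":" + opcodes.mkString(",") // e.g. "(I)V:iload,iconst,iadd,ireturn"

  /** Order-independent fingerprint of a set of abstracted methods. */
  def fingerprint(methods: Seq[String]): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    methods.sorted.foreach(m => digest.update(m.getBytes("UTF-8")))
    digest.digest().map(b => f"${b & 0xff}%02x").mkString
  }
}
```

Two apps whose library-free fingerprints are (nearly) identical are candidates for being repackages; fuzzy hashing, unlike the plain hash above, still matches when small edits were made.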
ACM, 2016
Abstract
Concurrent programming is infamous for its difficulty. An important source of difficulty is non-determinism, stemming from unpredictable interleavings of concurrent activities. Futures and promises are widely-used abstractions that help design deterministic concurrent programs, although this property cannot be guaranteed statically in mainstream programming languages. Deterministic-by-construction concurrent programming models avoid this issue, but they typically restrict expressiveness in important ways.
This paper introduces a concurrent programming model, Reactive Async, which decouples concurrent computations using so-called cells, shared locations which generalize futures as well as recent deterministic abstractions such as LVars. Compared to previously proposed programming models, Reactive Async provides (a) a fallback mechanism for the case where no computation ever computes the value of a given cell, and (b) explicit and optimized handling of cyclic dependencies. We present a complete implementation of the Reactive Async programming model as a library in Scala. Finally, the paper reports on a case study applying Reactive Async to static analyses of JVM bytecode based on the Opal framework.
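A minimal, single-threaded sketch of the cell abstraction (illustrative; the published library is concurrent and its method names may differ): a cell's value only grows monotonically, dependent computations are registered as callbacks, and a fallback supplies a value if no computation ever writes to the cell.

```scala
// Minimal sketch of Reactive Async-style cells (not the library's actual API).
import scala.collection.mutable

final class Cell[V](fallback: V, join: (V, V) => V) {
  private var value: Option[V] = None
  private val dependents = mutable.ListBuffer.empty[V => Unit]

  /** Monotone update: the new value is joined with the current one and dependents are notified. */
  def putNext(v: V): Unit = {
    val updated = value.fold(v)(join(_, v))
    value = Some(updated)
    dependents.foreach(_(updated))
  }

  /** Register a dependent computation that reacts to updates of this cell. */
  def onUpdate(f: V => Unit): Unit = dependents += f

  /** If no computation ever wrote to this cell, fall back to a default
    * (e.g. "no subtype overrides this method" in a static analysis). */
  def currentOrFallback: V = value.getOrElse(fallback)
}
```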
ACM, 2016
Abstract
Today, every application uses software libraries. Yet, while a lot of research exists w.r.t. analyzing applications, research that targets the analysis of libraries independent of any application is scarce. This is unfortunate, because, for developers of libraries, such as the Java Development Kit (JDK), it is crucial to ensure that the library behaves as intended regardless of how it is used. To fill this gap, we discuss the construction of call graphs for libraries that abstract over all potential library usages. Call graphs are particularly relevant as they are a precursor of many advanced analyses, such as inter-procedural data-flow analyses.
We show that the current practice of using call graph algorithms designed for applications to analyze libraries leads to call graphs that, at the same time, lack relevant call edges and contain unnecessary edges. This motivates the need for call graph construction algorithms dedicated to libraries. Unlike algorithms for applications, call graph construction algorithms for libraries must take into consideration the goals of subsequent analyses. Specifically, we show that it is essential to distinguish the scenario of an analysis for potentially exploitable vulnerabilities from the scenario of an analysis for general software quality attributes, e.g., dead methods or unused fields. This distinction affects the decision about what constitutes the library-private implementation, which therefore needs special treatment. Thus, building one call graph that satisfies all needs is not sensible. Overall, we observed that the proposed call graph algorithms reduce the number of call edges by up to 30% when compared to existing approaches.
ACM, 2015
Abstract
Approaches and techniques for statically finding a multitude of issues in source code have been developed in the past. A core property of these approaches is that they are usually targeted towards finding only a very specific kind of issue and that the effort to develop such an analysis is significant. This strictly limits the number of kinds of issues that can be detected. In this paper, we discuss a generic approach based on the detection of infeasible paths in code that can discover a wide range of code smells ranging from useless code that hinders comprehension to real bugs. Code issues are identified by calculating the difference between the control-flow graph that contains all technically possible edges and the corresponding graph recorded while performing a more precise analysis using abstract interpretation. We have evaluated the approach using the Java Development Kit as well as the Qualitas Corpus (a curated collection of over 100 Java Applications) and were able to find thousands of issues across a wide range of categories.
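The detection principle, diffing the technically possible control-flow edges against those actually taken under abstract interpretation, can be expressed in a few lines (illustrative; the edge sets are assumed to be provided by an analysis framework).

```scala
// Sketch of infeasible-path detection by comparing two edge sets (illustrative).
final case class Edge(fromPC: Int, toPC: Int)

object InfeasiblePaths {

  /** `naiveEdges`: all technically possible CFG edges derived from the bytecode.
    * `reachedEdges`: edges actually followed during a more precise abstract interpretation.
    * Every edge in the difference is guarded by a condition that can never hold and
    * hints at dead code, a useless check, or a real bug. */
  def infeasible(naiveEdges: Set[Edge], reachedEdges: Set[Edge]): Set[Edge] =
    naiveEdges -- reachedEdges

  // Example: `if (x != null && x == null)` produces a branch edge that abstract
  // interpretation never follows, so it shows up in the returned set.
}
```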
ACM, 2015
Abstract
Developing software from reusable libraries confronts developers with a security dilemma: either be efficient and reuse libraries as they are, or inspect them and know about their resource usage, but possibly miss deadlines as reviews are a time-consuming process. In this paper, we propose a novel capability inference mechanism for libraries written in Java. It uses a coarse-grained capability model for system resources that can be presented to developers. We found that the inferred capabilities agree with expectations derived from project documentation in 86.81% of the cases. Moreover, our approach can find capabilities that cannot be discovered using project documentation. It is thus a helpful tool for developers mitigating the aforementioned dilemma.
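Conceptually, such an inference seeds capabilities at the native/system boundary and propagates them backwards along the call graph; the sketch below illustrates this (the capability names and the seed mapping are illustrative, not the paper's exact model).

```scala
// Sketch of capability propagation along a call graph (illustrative).
object CapabilityInference {

  sealed trait Capability
  case object Filesystem extends Capability
  case object Network    extends Capability
  case object Reflection extends Capability

  /** Fixed-point propagation: a method has every capability of every method it may call. */
  def infer(
      callees: Map[String, Set[String]],        // call graph: method -> possible callees
      seeds:   Map[String, Set[Capability]]     // e.g. "java.io.FileInputStream.<init>" -> Set(Filesystem)
  ): Map[String, Set[Capability]] = {
    var result  = seeds
    var changed = true
    while (changed) {
      changed = false
      for ((caller, targets) <- callees) {
        val old  = result.getOrElse(caller, Set.empty[Capability])
        val caps = old ++ targets.flatMap(t => result.getOrElse(t, Set.empty[Capability]))
        if (caps != old) { result = result.updated(caller, caps); changed = true }
      }
    }
    callees.keys.map(m => m -> result.getOrElse(m, Set.empty[Capability])).toMap
  }
}
```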
ACM, 2014
Abstract
As software systems are maintained, their architecture often degrades through the processes of architectural drift and erosion. These processes are often intertwined and the same modules in the code become the locus of both drift and erosion symptoms. Thus, architects should elaborate architecture rules for detecting occurrences of both degradation symptoms. While the specification of such rules is time-consuming, they are similar across software projects adhering to similar architecture decompositions. Unfortunately, existing anti-degradation techniques are limited as they focus only on detecting either drift or erosion symptoms. They also do not support the reuse of recurring anti-degradation rules. In this context, the contribution of this paper is twofold. First, it presents TamDera, a domain-specific language for: (i) specifying rule-based strategies to detect both erosion and drift symptoms, and (ii) promoting the hierarchical and compositional reuse of design rules across multiple projects. The language was designed with usual concepts from programming languages in mind, such as inheritance and modularization. Second, we evaluated to what extent developers would benefit from the definition and reuse of hybrid rules. Our study involved 21 versions pertaining to 5 software projects, and more than 600 rules. On average, 45% of the classes that had drift symptoms in early versions presented inter-related erosion problems in later versions, or vice-versa. Also, up to 72% of all the TamDera rules in a project came from a pre-defined library of reusable rules. They were responsible for detecting on average 73% of the inter-related degradation symptoms across the projects.
ACM, 2013
Abstract
Checking a software's structural dependencies is a line of research on methods and tools for analyzing, modeling and checking the conformance of source code w.r.t. specifications of its intended static structure. Existing approaches have focused on the correctness of the specification, the impact of the approaches on software quality and the expressiveness of the modeling languages. However, large specifications become unmaintainable in the event of evolution without the means to modularize such specifications. We present Vespucci, a novel approach and tool that partitions a specification of the expected and allowed dependencies into a set of cohesive slices. This facilitates modular reasoning and helps individual maintenance of each slice. Our approach is suited for modeling high-level as well as detailed low-level decisions related to the static structure and combines both in a single modeling formalism. To evaluate our approach we conducted an extensive study spanning nine years of the evolution of the architecture of the object-relational mapping framework Hibernate.
ACM, 2013
Abstract
Modularity and efficiency are often contradicting requirements, such that programmers have to trade one for the other. We analyze this dilemma in the context of programs operating on collections. Performance-critical code using collections often needs to be hand-optimized, leading to non-modular, brittle, and redundant code. In principle, this dilemma could be avoided by automatic collection-specific optimizations, such as fusion of collection traversals, usage of indexing, or reordering of filters. Unfortunately, it is not obvious how to encode such optimizations in terms of ordinary collection APIs, because the program operating on the collections is not reified and hence cannot be analyzed.
We propose SQuOpt (the Scala Query Optimizer), a deep embedding of the Scala collections API that allows such analyses and optimizations to be defined and executed within Scala, without relying on external tools or compiler extensions. SQuOpt provides the same "look and feel" (syntax and static typing guarantees) as the standard collections API. We evaluate SQuOpt by re-implementing several code analyses of the FindBugs tool; we observe average speedups of 12x with a maximum of 12800x and hence demonstrate that SQuOpt can reconcile modularity and efficiency in real-world applications.
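The deep-embedding idea, reifying collection queries as expression trees that can be inspected and rewritten before execution, can be sketched as follows (greatly simplified; SQuOpt's actual API mirrors the full Scala collections interface and supports many more optimizations).

```scala
// Tiny deep embedding of collection queries with one optimization (filter fusion).
sealed trait Query[A] {
  def filter(p: A => Boolean): Query[A] = Filter(this, p)
  def run: Seq[A]
}
final case class Source[A](data: Seq[A]) extends Query[A] { def run: Seq[A] = data }
final case class Filter[A](base: Query[A], p: A => Boolean) extends Query[A] {
  def run: Seq[A] = base.run.filter(p)
}

object Optimizer {
  /** Fuse adjacent filters so the underlying collection is traversed only once. */
  def optimize[A](q: Query[A]): Query[A] = q match {
    case Filter(base, p2) =>
      optimize(base) match {
        case Filter(inner, p1) => Filter(inner, (a: A) => p1(a) && p2(a))
        case optimizedBase     => Filter(optimizedBase, p2)
      }
    case other => other
  }
}

// The query is an ordinary data structure, so it can be analyzed before it is executed:
//   val q = Source(1 to 100).filter(_ % 2 == 0).filter(_ > 10)
//   Optimizer.optimize(q).run   // single traversal with a fused predicate
```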
Leibniz International Proceedings in Informatics (LIPIcs); 2011
Abstract
Today, Prolog is often used to solve well-defined, domain-specific problems that are part of larger applications. In such cases, a tight integration of the Prolog program and the rest of the application, which is commonly written in a different language, is necessary. One common approach is to compile the Prolog code to (native) code in the target language. In this case, the effort necessary to build, test and deploy the final application is reduced.
However, most of the approaches that achieve reasonable performance compile Prolog to object-oriented code that relies on some kind of virtual machine (VM). These VMs are libraries implemented in the target language that implement Prolog’s execution semantics. This adds a significant layer to the object-oriented program and results in code that does not look and feel native to developers of object-oriented programs. Further, if Prolog’s execution semantics is implemented as a library, the potential of modern runtime environments for object-oriented programs, such as the Java Virtual Machine, to effectively optimize the program is more limited. In this paper, we report on our approach to compile Prolog to high-level, idiomatic object-oriented Java code. The generated Java code closely resembles code written by Java developers and is effectively optimized by the Java Virtual Machine.
ACM; 2011
Abstract
Embedded domain-specific languages (EDSLs) are known to improve the productivity of developers. However, for many domains no DSL implementation is available. Two important reasons are: First, the effort to implement embedded DSLs that provide the domain’s established syntax (called concrete syntax) is very high. Second, the embedded DSL and its underlying general-purpose programming language (GPL) are typically tightly integrated, which hampers reusability across different GPLs. In this paper, we present an approach that significantly reduces the necessary effort to implement embedded DSLs with concrete syntax. The idea is to use island grammars to specify the EDSL’s concrete syntax. This enables the developer to implement the embedded DSL as a library and to incrementally specify the concrete syntax using meta-data. Only those parts of the EDSL’s grammar that deviate from the grammar of the GPL and that are required to enable the integration with the GPL need to be specified.
Springer; 2010
Abstract
Implementing static analyses of machine-level executable code is labor-intensive and complex. We show how to leverage model-driven engineering to facilitate the design and implementation of programs that perform static analyses. Further, we report on important lessons learned about the benefits and drawbacks of the following technologies: using the Scala programming language as the target of code generation, using XML-Schema to express a metamodel, and using XSLT to implement transformations and a lint-like tool. Finally, we report on the use of Prolog for writing model transformations.
Springer; 2010
Abstract
In general, components provide and require services, and two components are bound if the first component provides a service required by the second component. However, certain variability in services – w.r.t. how and which functionality is provided or required – cannot be described using standard interface description languages. If this variability is relevant when selecting a matching component, then human interaction is required to decide which components can be bound. We propose to use feature models for making this variability explicit and (re-)enabling automatic component binding. In our approach, feature models are one part of service specifications. This makes it possible to declaratively specify which service variant is provided by a component. By referring to a service’s variation points, a component that requires a specific service can list the requirements on the desired variant. Using these specifications, a component environment can then determine whether a binding of the components exists that satisfies all requirements. The prototypical environment Columbus demonstrates the feasibility of the approach.
ACM Press; 2010
Abstract
Embedded domain-specific languages (EDSLs) are said to be easier to compose than DSLs that are implemented by preprocessors. However, existing approaches focus on composition scenarios where the use of abstractions from one domain does not affect the interpretation of abstractions from another domain. This leads to programs that exhibit scattering and tangling symptoms if multiple EDSLs with cross-cutting domain semantics are used. To address this issue, we propose an architecture for embedding DSLs that makes use of meta-object protocols and aspect-oriented concepts to support crosscutting composition of EDSLs. This makes it possible to write modularized EDSL programs where each program addresses one concern.
ACM Press; 2008
Abstract
Dependencies between program elements need to be modeled from different perspectives reflecting architectural, design, and implementation level decisions. To avoid erosion of the intended structure of the code, it is necessary to explicitly codify these different perspectives on the permitted dependencies and to detect violations continuously and incrementally as software evolves. We propose an approach that uses declarative queries to group source elements — across programming language module boundaries — into overlapping ensembles. The dependencies between these ensembles are also specified as logic queries. The approach has been integrated into the incremental build process of Eclipse to ensure continuous checking, using an engine for tabled and incremental evaluation of logic queries. Our evaluation shows that our approach is fast enough for day-to-day use along the incremental build process of modern IDEs.
Springer [Lecture Notes in Computer Science (LNCS)](vol. 4354); 2007
Abstract
Modern development environments integrate various static analyses into the build process. Analyses that analyze the whole project whenever the project changes are impractical in this context. We present an approach to automatic incrementalization of analyses that are specified as tabled logic programs and evaluated using incremental tabled evaluation, a technique for efficiently updating memo tables in response to changes in facts and rules. The approach has been implemented and integrated into the Eclipse IDE. Our measurements show that this technique is effective for automatically incrementalizing a broad range of static analyses.
IEEE Computer Society; 2006
Abstract
To improve the productivity of the development process, more and more tools for static software analysis are tightly integrated into the incremental build process of an IDE. If multiple interdependent analyses are used simultaneously, the coordination between the analyses becomes a major obstacle to keep the set of analyses open. We propose an approach to integrating and scheduling an open set of static analyses which decouples the individual analyses and coordinates the analysis executions such that the overall time and space consumption is minimized. The approach has been implemented for the Eclipse IDE and has been used to integrate a wide range of analyses such as finding bug patterns, detecting violations of design guidelines, or type system extensions for Java.
IEEE Computer Society; 2006
Abstract
To measure the particularities of modern software development projects that use different types of documents for the implementation of a program, new metrics need to be defined. Further, well-established metrics, such as lack of cohesion or coupling between objects, need to be reconsidered in the presence of new language features. Not being able to thoroughly measure a project can lead to false conclusions with respect to the measured source files. Currently, a large number of metrics tools exist, but unfortunately most tools are not extensible, or they are limited with respect to the types of documents that can be taken into account. Further, support for testing newly developed metrics is also missing. In this paper we present QScope, an open, extensible metrics framework. QScope is open with respect to the supported artifacts and explicitly enables the user to implement new metrics by reasoning over all artifacts using a declarative query language. As we will show in this paper, using a declarative query language enables a concise definition of new metrics.
IEEE Computer Society; 2005
Abstract
Current tools for software understanding mostly concentrate on one comprehension technique, e.g., visualization, or bottom-up navigation through software elements via hyperlinks. In this paper, we argue that to effectively assist developers in understanding today's software systems, a combination of several comprehension techniques is needed, including seamless integration of top-down querying and bottom-up navigation strategies that work across different kinds of software artifacts; furthermore, application-domain and/or technology-specific relationships between software elements should be taken into consideration; last but not least, a tight integration of such tools into development environments is crucial. We present Sextant, a software exploration tool tightly integrated into the Eclipse IDE that satisfies these requirements. In two case studies, we demonstrate how Sextant's features help in tracking down the source of erroneous behavior and in discovering "bad smells" in the software structure that should lead to code refactorings.
Springer [Lecture Notes in Computer Science (LNCS)](vol. 3442); 2005
Abstract
The specification of meta-information, by using attributes in .NET or annotations in Java, along with the source code is gaining widespread use. Meta-information is used for different purposes such as code generation or configuration of the environment in which a class is deployed. However, in most cases using an annotation also implies that constraints, beyond those defined by the language's semantics, have to be followed. E.g., a class must define a no-arguments constructor or the parameters of a method must have specific types. Currently, these constraints are not checked at all or only to a very limited extent. Hence, a violation can remain undetected and result in deployment-time or even subtle run-time errors. In this paper, we present a user-extensible framework that enables the definition of constraints to check the properties of annotated elements. Further, we demonstrate the application of the framework by checking the constraints defined in the EJB 3.0 specification, and we evaluate the approach's performance by checking the xPetstore-EJB3.0 project from within Eclipse.
ACM 2005
Abstract
Language mechanisms deserve language implementation effort. While this maxim has led to sophisticated support for language features specific to object-oriented, functional and logic programming languages, aspect-oriented programming languages are still mostly implemented using postprocessors. The Steamloom virtual machine, based on IBM's Jikes RVM, provides support for aspect-oriented programming at virtual machine level. A bytecode framework called BAT was integrated with the Jikes RVM to replace its bytecode management logic. While preserving the functionality needed by the VM, BAT also allows for querying application code for join point shadows, avoiding redundancy in bytecode representation. Performance measurements show that an AOP-enabled virtual machine like Steamloom does not inflict unnecessary performance penalties on a running application; when it comes to executing AOP-related operations, there even are significant performance gains compared to other approaches.
IEEE Computer Society; 2004
Abstract
We describe XIRC, a tool and architecture that makes it possible to define queries over a uniform representation of all artifacts of a software project. These queries can be used for general cross-artifact information retrieval or for more specialized applications like checking implementation restrictions or conformance to style guides. XIRC is also a good basis for implementing a broad range of tools for refactoring, generators, aspect-oriented programming, and many other domains on top of it.
Springer [Lecture Notes in Computer Science (LNCS)](vol. 3302); 2004
Abstract
Most aspect-oriented languages provide only a fixed, built-in set of pointcut designators whose denotation is only described informally. As a consequence, these languages do not provide operations to manipulate or reason about pointcuts beyond weaving. In this paper, we investigate the usage of the functional query language XQuery for the specification of pointcuts. Due to its abstraction and module facilities XQuery enables powerful composition and reusability mechanisms for pointcuts.
IEEE Computer Society; 2004
Abstract
Policy enforcement is a mechanism for ensuring that system components follow certain programming practices, comply with specified rules, and meet certain assumptions. Unfortunately, the most common mechanisms used today for policy enforcement are documentation, training, and code reviews. The fundamental problem is that these mechanisms are expensive, time-consuming, and still error-prone. To cope with this problem, in this paper, we present IRC (Implementation Restriction Checker), an extensible framework for automatically enforcing system-wide policies or contracts. The framework is built on top of a platform for aspect-oriented programming at the level of Java bytecode instructions and is available as an Eclipse plug-in as well as a standalone application. It includes a set of directly usable checkers and can be easily extended to implement new ones.
Workshops
SOAP 2020
Abstract
Most Java static analysis frameworks provide an intermediate representation (IR) of Java Bytecode to facilitate the development of static analyses. While such IRs are often based on three-address code, the transformation itself is a great opportunity to apply optimizations to the transformed code, such as constant propagation.
In this paper, we propose TACAI, a refinable IR that is based on abstract interpretation results of a method's bytecode. Exchanging the underlying abstract interpretation domains enables the creation of various IRs of different precision levels. Our evaluation shows that TACAI can be efficiently computed and provides slightly more precise receiver-type information than Soot's Shimple representation. Furthermore, we show how exchanging the underlying abstract domains directly impacts the generated IR.
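The effect of exchanging the abstract domain can be illustrated with a toy model (not TACAI's actual data structures): for int x = 2 + 3, a purely syntactic lifting keeps the addition, whereas a constant-propagating domain already folds it during the transformation.

```scala
// Toy model of how the chosen abstract domain affects the generated three-address code.
object TacSketch {
  sealed trait Expr
  final case class Const(value: Int)                   extends Expr
  final case class Var(name: String)                   extends Expr
  final case class BinOp(op: String, l: Expr, r: Expr) extends Expr

  final case class Assignment(target: Var, rhs: Expr)

  /** Lifting `int x = 2 + 3;`: with constant propagation the addition is folded
    * while generating the IR; without it the addition is kept. */
  def lift(withConstantPropagation: Boolean): Assignment =
    if (withConstantPropagation) Assignment(Var("x"), Const(5))
    else Assignment(Var("x"), BinOp("+", Const(2), Const(3)))
}
```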
Abstract
Cryptographic APIs (Crypto APIs) provide the foundations for the development of secure applications. Unfortunately, most applications do not use Crypto APIs securely and end up being insecure, e.g., by the usage of an outdated algorithm, a constant initialization vector, or an inappropriate hashing algorithm. Two different studies [1], [2] have recently shown that 88% to 95% of those applications using Crypto APIs are insecure due to misuses. To facilitate further research on these kinds of misuses, we created a collection of 201 misuses found in real-world applications along with a classification of those misuses. In the provided dataset, each misuse consists of the corresponding open-source project, the project’s build information, a description of the misuse, and the misuse’s location. Further, we integrated our dataset into MUBench [3], a benchmark for API misuse detection. Our dataset provides a foundation for research on Crypto API misuses. For example, it can be used to evaluate the precision and recall of detection tools, as a foundation for studies related to Crypto API misuses, or as a training set.
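A representative example of the kind of misuse collected in the dataset (an illustrative snippet, not one of the dataset's actual entries): a hard-coded key and a constant initialization vector defeat the guarantees of AES in CBC mode.

```scala
// Illustrative Crypto API misuse: a constant IV and a hard-coded key make the ciphertext predictable.
import javax.crypto.Cipher
import javax.crypto.spec.{IvParameterSpec, SecretKeySpec}

object ConstantIvMisuse {
  def encrypt(plaintext: Array[Byte]): Array[Byte] = {
    val key    = new SecretKeySpec("0123456789abcdef".getBytes("UTF-8"), "AES") // misuse: hard-coded key
    val iv     = new IvParameterSpec(new Array[Byte](16))                       // misuse: constant (all-zero) IV
    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
    cipher.init(Cipher.ENCRYPT_MODE, key, iv)
    cipher.doFinal(plaintext)
    // A correct use would draw the IV from SecureRandom and obtain the key from a KeyStore or KDF.
  }
}
```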
Abstract
Static analyses which compute conceptually independent information, e.g., class immutability or method purity, are typically developed as standalone, closed analyses. Complementary information that could improve the analyses is either ignored by making a sound over-approximation or it is also computed by the analyses, but at a rudimentary level. For example, an immutability analysis requires field mutability information, alias/escape information, and information about the concurrent behavior of methods to correctly classify classes like java.lang.String or java.util.BigDecimal. As a result, without proper support for integrating independently developed, mutually benefiting analyses, many analyses will not correctly classify relevant entities. We propose to use explicitly reified lattices that encode the information about a source code element’s properties (e.g., a method’s purity or a class’ immutability) as the sole interface between mutually dependent analyses. This enables the composition of multiple analyses. Our case study shows that using such an approach enables highly scalable, lightweight implementations of modularized static analyses.
SOAP 2018
Abstract
Call graphs are at the core of many static analyses ranging from the detection of unused methods to advanced control- and data-flow analyses. Therefore, a comprehensive understanding of the precision and recall of the respective graphs is crucial to enable an assessment of which call-graph construction algorithms are suited to which analysis scenario. For example, malware is often obfuscated and tries to hide its intent by using Reflection. Call graphs that do not represent reflective method calls are, therefore, of limited use when analyzing such apps.
In general, the precision is well understood, but the recall is not, i.e., in which cases a call graph will miss call edges. In this paper, we discuss the design of a comprehensive test suite that enables us to compute a fingerprint of the unsoundness of the respective call-graph construction algorithms. This suite also enables us to make a comparative evaluation of static analysis frameworks. Comparing Soot and WALA shows that WALA currently has better support for new Java 8 features and also for Java Reflection. However, in some cases both fail to include expected edges.
Michael Reif, Michael Eichberg, Ben Hermann and Mira Mezini
SOAP 2017
Abstract
An integral part of developing a new analysis is to validate the correctness of its implementation and to demonstrate its usefulness when applied to real-world code. As a foundation for addressing both challenges, developers typically use custom or well-established collections of Java projects. The hope is that the collected projects are representative for the analysis' target domain and therefore ensure a sound evaluation. But, without proper means to understand how and to which degree the features relevant to an analysis are found in the projects, the evaluation necessarily remains inconclusive. Additionally, it is likely that the collection contains many projects which are – w.r.t. the developed analysis – basically identical and therefore do not help the overall evaluation/testing of the analysis, but still cost evaluation time.
To overcome these limitations, we propose a framework that enables the systematic assessment of given corpora and the creation of new corpora of Java projects. To show its usefulness, we used the framework to comprehend the nature of the projects belonging to the Qualitas Corpus (QC) and then used it to compute a minimal subset of all QC projects useful for generic data- and control-flow analyses. This subset enables effective and efficient integration test suites.
Ka I Pun, Martin Steffen, Volker Stolz, Anna-Katharina Wickert, Eric Bodden and Michael Eichberg
The 28th Nordic Workshop on Programming Theory
Abstract
Taint analysis is a form of data flow analysis aiming at secure information flow. For example, unchecked user input is typically considered as "tainted", i.e., as untrusted and potentially dangerous. Untrusted data may corrupt memory or otherwise undermine the correct functioning or privacy of the software if it reaches program points it is not supposed to reach. Many common attack vectors exploit vulnerabilities based on unchecked data and the programmer’s negligence in foreseeing all possible user inputs (including malicious ones) and the resulting information flows through the program.
We present a static taint analysis for Go, a modern, statically typed programming language. Go in particular features concurrent programming, supporting light-weight threads dubbed “goroutines” and message-based communication. Besides a classical context-sensitive taint analysis, the paper presents a solution for analyzing channel communication in Go.
ACM 2016
A collection of Java web applications assembled using ABM.
Abstract
The systematic evaluation of program analyses as well as software-engineering tools requires benchmark suites that are representative of real-world projects in the domains for which the tools or analyses are designed. Such benchmarks currently only exist for a few research areas and even where they exist, they are often not effectively maintained, due to the required manual effort. This makes evaluating new analyses and tools on software that relies on current technologies often impossible.
We describe ABM, a methodology to semi-automatically mine software repositories to extract up-to-date and representative sets of applications belonging to specific domains. The proposed methodology facilitates the creation of such collections and makes it easier to release updated versions of a benchmark suite. Resulting from an instantiation of the methodology, we present a collection of current real-world Java business web applications. The collection and methodology serve as a starting point for creating current, targeted benchmark suites, and thus helps to better evaluate current program-analysis and software-engineering tools.
ACM 2014
Abstract
Implementations of static analyses are usually tailored toward a single goal to be efficient, hampering reusability and adaptability of the components of an analysis. To solve these issues, we propose to implement static analyses as highly-configurable software product lines (SPLs). Furthermore, we also discuss an implementation of an SPL for static analyses – called OPAL – that uses advanced language features offered by the Scala programming language to get an easily adaptable and (type-)safe software product line.
OPAL is a general-purpose library for static analysis of Java Bytecode that is already successfully used. We present OPAL and show how a design based on software product line engineering benefits the implementation of static analyses with the framework.
ACM 2014
Abstract
Scala is a powerful language that supports a variety of features, but it lacks virtual traits. Virtual traits are class-valued object attributes and can be redefined within subtraits. They support higher-order hierarchies and family polymorphism. This work introduces virtual traits into Scala and explains how to encode virtual traits on top of existing Scala features. We have implemented this encoding using Scala annotation macros and have conducted two small case studies.
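The encoding challenge, class-valued members that subfamilies can refine, can be approximated by hand with abstract type members and factory methods; the paper's annotation macros automate essentially this pattern (a hand-written sketch, not the macro-generated code).

```scala
// Hand-written approximation of a virtual trait using abstract type members (illustrative).
trait Graph {
  type Node <: NodeApi               // "virtual class": subfamilies may refine it
  trait NodeApi { def id: Int }
  def newNode(id: Int): Node         // virtual constructor
}

trait ColoredGraph extends Graph {
  type Node <: ColoredNodeApi        // refinement in the sub-family
  trait ColoredNodeApi extends NodeApi { def color: String }
}

object StringColoredGraph extends ColoredGraph {
  final case class Node(id: Int, color: String) extends ColoredNodeApi
  def newNode(id: Int): Node = Node(id, "black")
}
```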
Elsevier [Electronic Notes in Theoretical Computer Science (ENTCS)](vol. 264); 2011
Abstract
New toolkits that read in, analyze, and transform Java Bytecode are regularly developed from scratch to get a representation suitable for a particular purpose. Nevertheless, the functionality implemented by the toolkits to read in class files and do basic control- and data-flow analyses is comparable and is implemented over and over again. Differences mainly manifest themselves in minor technical issues. To avoid the repetitive development of similar functionality, we have developed an XML-based language for specifying bytecode instructions. Using this language, we have encoded the instruction set of the Java Class File Format such that it can directly be used, e.g., to generate the skeleton of bytecode-based tools. The XML format specifies (a) the format of the instructions and (b) the effect of each instruction on the stack and the local registers when the instruction is executed. This makes it possible to generate generic control- and data-flow analyses, such as an analysis that converts Java Bytecode into static single assignment form. To assess the usefulness of our approach, we have used the encoding of the Java Class File Format to develop a framework to analyze and transform Java class files. The evaluation shows that using the specification significantly reduces the development effort when compared with manual development.
Elsevier [Electronic Notes in Theoretical Computer Science (ENTCS)](vol. 164); 2006
Abstract
Research related to alias protection and related concepts, such as confined types and ownership types, has a long tradition and is a promising concept for the design and implementation of more reliable and secure software. Unfortunately, the use of these concepts is not widespread as most implementations are proofs of concept and fall short with respect to the integration with standard software development tools and processes. In this paper, we discuss an implementation of confined types based on Java 5 annotations. The contribution of this paper is twofold: First, we discuss the incrementalization of the confined types analysis and second, we present the integration of the analysis into Eclipse using the static analysis platform Magellan.
Elsevier [Electronic Notes in Theoretical Computer Science (ENTCS)](vol. 141); 2005
Abstract
The creation, transformation and analysis of bytecode is widespread. Nevertheless, several problems related to the reusability and comprehensibility of the results and tools exist. In particular, the results of tools for bytecode analysis are usually represented in proprietary, tool-dependent ways, which makes it hard to build more sophisticated analyses on top of the results generated by tools for lower-level analyses. Furthermore, intermediate results, such as the results of basic control-flow and data-flow analyses, are usually not explicitly represented at all, even though they are required by many more sophisticated analyses. This lack of a common format for the well-structured representation of the (intermediate) results of code analyses makes the creation of new tools or the integration of the results generated by different tools costly and ineffective.
Springer [Lecture Notes in Computer Science (LNCS)](vol. 3437); 2004
Abstract
In this paper we identify three problems with current component middleware. First, the implementation of services is usually not modularized, making it hard to adapt the platform to application specific needs, to exchange services to cope with changing requirements or to use it on different devices. Second, mapping components to objects results in a complex programming model and is making the component code dependent on the used component framework. Third, application level crosscutting concerns are not modularized. To solve these problems we propose an aspect-oriented programming approach complemented by standard Java 1.5 annotations to provide meta information about the components and a sophisticated query language for pointcut designation based on annotations.
Journal of Object Technology; 2004
Abstract
Middleware for component-based software development already provides some separation of concerns between the components implementing the business functionality and the component environment implementing the infrastructural services. However, the implementation of the services is usually not modularized, making it hard to adapt the platform to application specific needs, to exchange services to cope with changing requirements or to use it on different devices. Also, mapping components to objects results in code where the crosscutting concerns encapsulated in the middleware show up at several places, complicating the programming model and making the component code dependent on the used component framework. In this paper an approach to solve these problems based on the ideas of aspect-oriented programming is proposed.
Journals
Lecture Notes in Computer Science, Springer; 2014
Abstract
Checking a software’s structural dependencies is a line of research on methods and tools for analyzing, modeling, and checking the conformance of source code w.r.t. specifications of its intended static structure. Existing approaches have focused on the correctness of the specification, the impact of the approaches on software quality and the expressiveness of the modeling languages. However, large specifications become unmaintainable in the event of evolution without the means to modularize such specifications. We present Vespucci, a novel approach and tool that partitions a specification of the expected and allowed dependencies into a set of cohesive slices. This facilitates modular reasoning and helps individual maintenance of each slice. Our approach is suited for modeling high-level as well as detailed low-level decisions related to the static structure and combines both in a single modeling formalism. To evaluate our approach, we conducted an extensive study spanning 9 years of the evolution of the architecture of the object-relational mapping framework Hibernate.
Elsevier; 2013
Abstract
Embedded domain-specific languages (EDSLs) are known to improve the productivity of developers. However, for many domains no DSL implementation is available and two important reasons for this are: First, the effort to implement EDSLs that provide the domain’s established syntax (called concrete syntax) is very high. Second, the EDSL and its underlying general-purpose programming language (GPL) are typically tightly integrated. This hampers reusability across different GPLs. Besides these implementation issues, the productivity gains of using EDSLs are also limited by the lack of explicit tool support for EDSL users—such as syntax highlighting or code analyses.
In this paper, we present an approach that significantly reduces the necessary effort to implement embedded DSLs with concrete syntax. The idea is to use island grammars to specify the EDSL’s concrete syntax. This enables the developer to implement the embedded DSL as a library and to incrementally specify the concrete syntax using meta-data. Only those parts of the EDSL’s grammar need to be specified that deviate from the grammar of the GPL. By analyzing an EDSL’s implementation using reflection, it is possible to provide tool support for EDSLs without having the developer implement it explicitly, such as syntax highlighting. An evaluation demonstrates the feasibility of our approach by embedding a real-world DSL into a GPL.
Springer; 2012
Abstract
Application Programming Interfaces (API) are exposed to developers in order to reuse software libraries. API directives are natural-language statements in API documentation that make developers aware of constraints and guidelines related to the usage of an API. This paper presents the design and the results of an empirical study on the directives of API documentation of object-oriented libraries. Its main contribution is to propose and extensively discuss a taxonomy of 23 kinds of API directives.
IEEE; 2006
Abstract
In this paper, we discuss a set of functional requirements for software exploration tools and provide initial evidence that various combinations of these features are needed to effectively assist developers in understanding software. We observe that current tools for software exploration only partly support these features. This has motivated the development of SEXTANT, a software exploration tool tightly integrated into the Eclipse IDE, to fill this gap. By means of case studies, we demonstrate how the requirements fulfilled by SEXTANT are conducive to an understanding needed to perform a maintenance task.
Others
Darmstadt University of Technology [D17]; 2007
Abstract
Comprehensive tool support is essential to enable developers to cope with the complexity of modern software development projects. Software projects are getting larger and larger, are being developed using different languages, and make use of many third-party libraries as well as frameworks. Hence, tools are required: for software comprehension, for checking that libraries and frameworks are correctly used, and to ensure that the design does not degrade over time. Though numerous successful tools have already been developed for these tasks, several issues remain: the tools are usually highly specialized, their extensibility is limited, and an integration between the tools is lacking. Furthermore, IDE integration and in particular an integration with the incremental build process offered by modern IDEs is also often missing. Unfortunately, the direct integration of several code analysis tools with the incremental build process is not possible. When each tool processes the project’s resources on its own and also maintains its own source model, the overall memory requirements and analysis time is prohibitive. To address these issues, this thesis proposes the concept of a Build Process Integrated Open Static Analysis Platform. The core functionality of such a platform is to coordinate the execution of static analyses that are encapsulated into modules with well-defined interfaces. These interfaces specify what the analyses require and provide in terms of the data they process. For a tool that is built upon such a platform it is sufficient to specify the data it requires. The platform can then determine the set of analyses and their execution order to satisfy the tool’s requirements. Modeling analyses as modular producer-consumer units facilitates the simultaneous integration of several tools into the incremental build process of modern IDEs. When compared to using several independent tools, the overall memory requirements are reduced, since the source model derived by the executed analyses is shared among all tools built upon the platform. Furthermore, the overall analysis time is also reduced since analyses are executed at most once, even if the derived information is required by more than one tool. The overall analysis time is further minimized by the parallel execution of those analyses that process different information. The feasibility of the proposed approach is demonstrated by Magellan. Magellan is an open static analysis platform tightly integrated with the incremental build process of the Eclipse IDE. This integration turns Eclipse into an Integrated Development and Analysis Environment. The set of modules implementing the static analyses is freely extensible and the data model of the database is open. An open data model is crucial to support new analyses that need to store derived information for the use by subsequent analyses. Besides featuring a fully flexible analysis stack, Magellan also supports the embedding of query engines. Supporting the execution of queries is indispensable for enabling end-users to define application specific analyses. The ability to execute queries is also required to facilitate software comprehension tools. As a proof of concept an XQuery processor and a Prolog system are embedded into Magellan. Both engines are evaluated w.r.t. using them for the execution of queries along with the incremental build process. 
The XQuery engine is additionally evaluated in the context of software comprehension tools as a means to enable the end-user to define new ways to navigate through code. The platform is validated by four tools built on top of it: a software exploration tool, a metrics tool, an optional type system, and a set of lightweight static analyses that check structural properties of source code.