This repository was archived by the owner on Oct 27, 2021. It is now read-only.

First version of joern-scan (#13)
* Boilerplate for `JoernScan`

* Push reflection voodoo for safe keeping.

* `QueryDatabase` done

* Collapse hierarchy

* Add `name` field to queries

* Simplify package layout

* More simplification

* Fix build

* Remove schema extension because it's slow and not needed here

* QueryDb works with EngineContext now

* Fix build, add test

* Add `joern-scan` script

* Update README.md

* Tuning of queries and shell script

* Factor out joern-specific code from `QueryDatabase`

* Cleanup

* Improve README.md
fabsx00 authored Jan 1, 2021
1 parent a71d1ca commit 91f2356
Showing 29 changed files with 376 additions and 414 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pr.yml
Original file line number Diff line number Diff line change
@@ -15,4 +15,4 @@ jobs:
./install.sh
mkdir /tmp/foo
echo "int foo(int a, int b, int c, int d, int e, int f) {}" > /tmp/foo/foo.c
./joern --src /tmp/foo --run codequalityscanner
./joern --src /tmp/foo --run scan
154 changes: 28 additions & 126 deletions README.md
@@ -1,4 +1,4 @@
# Joern Query Database
# Joern Query Database ("Joern-Scan")

This is the central query database for the open-source code analysis
platform [Joern](https://github.com/ShiftLeftSecurity/joern). It has
@@ -9,9 +9,7 @@ two purposes:

The query database is distributed as a standalone library that
includes Joern as a dependency. This means that it is not necessary to
install Joern to make use of the scanners in the database. Instead,
scanners can be invoked from any JVM-based program - as the automatic
tests included in the database demonstrate.
install Joern to make use of the queries in the database.

At the same time, the database is a Joern extension, that is, when
dynamically loaded at startup, its functionality becomes available on
@@ -23,47 +21,51 @@ for inclusion in the default distribution.

## Installing and running

The installation scripts downloads joern and installs it in a sub-directory.
The installation script downloads joern and installs it in a sub-directory.
The query database is installed as an extension.

```
./install.sh
```

The query database currently makes available the following scanners:

* codequalityscanner - a code quality scanner for C code
* cvulnscanner - a vulnerability scanner for C code

You can run scanners as follows:
You can run all queries as follows:

```
./joern --src path/to/code --run <scannername> --param k1=v1,...
./joern-scan path/to/code
```

For example,

```
mkdir foo
echo "int foo(int a, int b, int c, int d, int e, int f) {}" > foo/foo.c
./joern --src foo --run codequalityscanner
./joern-scan foo
```

runs the code quality scanner and determines that the function `foo` has too many parameters.
runs all queries on the sample code in the directory `foo`, determining that the function `foo`
has too many parameters.

## Adding your own queries

## Database overview
Please follow the rules below for a tear-free query writing experience:

Each scanner is hosted in a sub package of `io.joern.scanners`, that
is, it is located in a directory in
`src/main/scala/io/joern/scanners`. As an example, let us look into
the `CodeQualityScanner` at `src/main/scala/io/joern/scanners`. The
file `Metrics.scala` contains its queries:
* Queries in the package `io.joern.scanners` are picked up automatically at runtime,
so please put your queries there.
* Each query must begin with the annotation `@q` and must be placed in a query bundle.
A query bundle is simply an `object` that derives from `QueryBundle`.
* Queries can have parameters, but you must provide a default value for each parameter.
* Please add unit tests for queries. These also serve as a spec for what your query does.
* Please format the code before sending a PR using `sbt scalafmt` and `sbt test:scalafmt`.

Take a look at the query bundle `Metrics` at `src/main/scala/io/joern/scanners/c/Metrics.scala`
as an example:

```
object Metrics {
object Metrics extends QueryBundle {
@q
def tooManyParameters(n: Int = 4): Query = Query(
name = "too-many-parameters",
title = s"Number of parameters larger than $n",
description =
s"This query identifies functions with more than $n formal parameters",
@@ -72,6 +74,7 @@ object Metrics {
}
)
@q
def tooHighComplexity(n: Int = 4): Query = Query(
title = s"Cyclomatic complexity higher than $n",
description =
@@ -84,38 +87,10 @@ object Metrics {
}
```

As you can see, each query is implemented in a function that receives
a code property graph (type `Cpg`) and returns a list of findings
(type `List[nodes.NewFinding]`).

These queries are invoked in sequence in `CodeQualityPass` in the file
`CodeQualityScanner.scala`:

```
...
class CodeQualityPass(cpg: Cpg) extends CpgPass(cpg) {
import Metrics._
/**
* All we do here is call all queries and add a node to
* the graph for each result.
* */
override def run(): Iterator[DiffGraph] = {
val diffGraph = DiffGraph.newBuilder
(tooManyParameters()(cpg) ++ tooManyLoops()(cpg) ++ tooNested()(cpg) ++
tooLong()(cpg) ++ tooHighComplexity()(cpg) ++ multipleReturns()(cpg))
.foreach(diffGraph.addNode)
Iterator(diffGraph.build)
}
}
...
```
Apart from these query invocations, `CodeQualityScanner.scala` merely
contains boilerplate code that turns the scanner into a Joern extension.

Corresponding tests for queries are located in
`src/test/scala/io/joern/scanners`. For example, tests for the metrics
queries are located in
`src/test/scala/io/joern/scanners/c/codequality/MetricsTests.scala`:
`src/test/scala/io/joern/scanners/c/MetricsTests.scala`:

```
class MetricsTests extends Suite {
@@ -153,83 +128,10 @@ follows:
sbt test
```

Automatic code formatting can be performed as follows:
You can test newly developed queries

```
sbt scalafmt
sbt test:scalafmt
```

## Adding queries to existing scripts

You can add queries to an existing bundle by creating a new query set
in the script package. For example, query sets for the C scanner can
be placed here:

https://github.com/joernio/batteries/blob/main/src/main/scala/io/joern/batteries/c/vulnscan/

The file [`SampleQuerySet.scala`](https://github.com/joernio/batteries/blob/main/src/main/scala/io/joern/batteries/c/vulnscan/SampleQuerySet.scala) serves as a template.
To test newly created queries with `joern-scan`, run:

```
object SampleQuerySet {
def myQuery1(cpg: Cpg): List[nodes.NewFinding] = {
// Add your query here
}
def myQuery2(cpg: Cpg): List[nodes.NewFinding] = {
// Add another query here
}
// ...
}
class SampleQuertSet(cpg: Cpg) extends CpgPass(cpg) {
import SampleQuerySet._
override def run(): Iterator[DiffGraph] = {
val diffGraph = DiffGraph.newBuilder
// Execute queries
myQuery1(cpg).foreach(diffGraph.addNode)
myQuery2(cpg).foreach(diffGraph.addNode)
Iterator(diffGraph.build)
}
```

Finally, add
a `runPass` line to the script [here](https://github.com/joernio/batteries/blob/main/src/main/scala/io/joern/batteries/c/vulnscan/CScanner.scala#L23):

```
class CScanner(options: CScannerOptions) extends LayerCreator {
override val overlayName: String = CScanner.overlayName
override val description: String = CScanner.description
override def create(context: LayerCreatorContext,
storeUndoInfo: Boolean): Unit = {
runPass(new IntegerTruncations(context.cpg), context, storeUndoInfo)
// add more `runPass` calls to execute query sets by default
}
```

## Adding Tests

Please add tests for your queries to ensure that they continue functioning.
Tests also serve as a specification for what your queries should and should not do.

A template for an automated query set test can be found [here](https://github.com/joernio/batteries/blob/main/src/test/scala/io/joern/batteries/c/vulnscan/SampleQuerySetTests.scala)

```
package io.joern.batteries.c.vulnscan
class SampleQuerySetTests extends Suite {
override val code: String =
"""
void place_your_code_here() {}
"""
"find ..." in {
// test code goes here
}
}
./install.sh && ./joern-scan <src>
```
6 changes: 3 additions & 3 deletions build.sbt
@@ -5,8 +5,6 @@ ThisBuild/scalaVersion := "2.13.0"
enablePlugins(JavaAppPackaging)
enablePlugins(GitVersioning)

lazy val schema = project.in(file("schema"))
dependsOn(schema)
libraryDependencies ++= Seq(
"com.lihaoyi" %% "upickle" % "1.2.2",
"com.github.pathikrit" %% "better-files" % "3.8.0",
@@ -19,7 +17,6 @@ libraryDependencies ++= Seq(
"io.shiftleft" %% "fuzzyc2cpg" % Versions.cpg % Test,
"org.scalatest" %% "scalatest" % "3.1.1" % Test
)
excludeDependencies += ExclusionRule("io.shiftleft", "codepropertygraph-domain-classes_2.13")

// We exclude a few jars that the main joern distribution already includes
Universal / mappings := (Universal / mappings).value.filterNot {
@@ -31,6 +28,9 @@ Universal / mappings := (Universal / mappings).value.filterNot {
path.contains("com.lihaoyi.u")
}

sources in (Compile,doc) := Seq.empty
publishArtifact in (Compile, packageDoc) := false

lazy val createDistribution = taskKey[Unit]("Create binary distribution of extension")
createDistribution := {
(Universal/packageZipTarball).value
9 changes: 1 addition & 8 deletions install.sh
@@ -4,7 +4,7 @@ set -o pipefail
set -o nounset
set -eu

readonly JOERN_VERSION="v1.1.63"
readonly JOERN_VERSION="v1.1.64"

if [ "$(uname)" = 'Darwin' ]; then
# get script location
@@ -71,10 +71,3 @@ pushd $SCRIPT_ABS_DIR
./joern --add-plugin ./querydb.zip
rm lib
popd

echo "Adapting CPG schema"
cp ${SCHEMA_SRC_DIR}/*.json ${JOERN_INSTALL}/schema-extender/schemas/
pushd $JOERN_INSTALL
./schema-extender.sh
popd

15 changes: 15 additions & 0 deletions joern-scan
@@ -0,0 +1,15 @@
#!/usr/bin/env sh

if [ "$(uname -s)" = "Darwin" ]; then
SCRIPT_ABS_PATH=$(greadlink -f "$0")
else
SCRIPT_ABS_PATH=$(readlink -f "$0")
fi
SCRIPT_ABS_DIR=$(dirname "$SCRIPT_ABS_PATH")

if [ "$#" -lt 1 ]; then
echo "Pass in the source directory to scan"
exit 1
fi

"$SCRIPT_ABS_DIR"/joern --run scan --src "$@"
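
The wrapper's argument handling can be checked without a Joern installation. The following is a minimal sketch, not part of the commit: `joern_stub` and `joern_scan` are hypothetical names, and the stub only prints the arguments it receives instead of launching the real `joern` binary.

```shell
#!/usr/bin/env sh

# Hypothetical stand-in for the real `joern` binary: it prints each
# argument on its own line so the forwarding can be inspected.
joern_stub() {
  for arg in "$@"; do
    printf '%s\n' "$arg"
  done
}

# Mirrors the logic of the `joern-scan` script: require at least one
# argument, then append everything after `--run scan --src`.
joern_scan() {
  if [ "$#" -lt 1 ]; then
    echo "Pass in the source directory to scan"
    return 1
  fi
  joern_stub --run scan --src "$@"
}

# A path containing a space stays a single argument because "$@" is quoted.
joern_scan "/tmp/my code"
```

Running the sketch prints `--run`, `scan`, `--src` and `/tmp/my code` on separate lines. Because the wrapper already supplies `--src`, calling it as `joern_scan --src foo` would pass the flag twice, so the source directory is best given as a bare path.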
2 changes: 1 addition & 1 deletion project/Versions.scala
@@ -1,5 +1,5 @@
/* Declare dependency versions in one place */
object Versions {
val cpg = "1.3.16"
val cpg = "1.3.25"
val overflowdb = "1.24"
}
55 changes: 0 additions & 55 deletions schema/build.sbt

This file was deleted.

1 change: 0 additions & 1 deletion schema/src/main/resources/schema/ext.json

This file was deleted.
