Octosuite: A New Tool to Conduct Open Source Investigations on GitHub
GitHub is one of the most popular code-hosting platforms on the internet, with more than 90 million people reported to use its services.
This makes GitHub a key platform for coders and developers. Many useful tools that aid open source investigations are hosted on the site (including on Bellingcat’s own GitHub account).
Given the amount of information users share on the platform, GitHub itself can also be a useful source for online investigators.
For example, information available on GitHub can be cross-referenced with other social media or online content that has been publicly shared.
While GitHub has an intuitive user interface, it requires many click-throughs and shows only one entity (an individual page, or a user or organisation profile) at a time.
On top of this, there is no easy way to save the information one comes across on GitHub.
This is where Octosuite comes in. Octosuite is an advanced GitHub framework written in Python that uses GitHub’s Public API to make the process of investigating accounts and repositories on the platform more efficient, while also creating a set of automated and easily reproducible queries.
What can Octosuite do?
Octosuite comes with a wide range of commands that can be used to obtain information on accounts and repositories that are publicly visible on GitHub.
With Octosuite, one can easily find public information on:
- Users: profile information, gists (small code snippets), account activity (via events like subscribe, create, follow), repositories, organisations, subscriptions, followers and followed accounts
- Organisations: profile information, account activity, repositories, and public members
- Repositories: contributors, coding languages, stargazers (users who have starred the repository, the platform’s equivalent of likes), forks (public copies of the repository, showing who created them) and releases
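Each of these categories corresponds to a route in GitHub’s public REST API. The sketch below is our own illustration, not Octosuite’s code: it maps the data listed above to documented REST paths and fills in the placeholders. The labels such as `user.profile` are hypothetical names chosen for this example.

```python
"""Map the user, organisation and repository data listed above to GitHub REST routes.

Illustrative sketch: the paths are documented GitHub REST API routes, but the
labels and this mapping are our own, not taken from Octosuite's source.
"""
from urllib.parse import quote

API_ROOT = "https://api.github.com"

# Hypothetical labels -> REST path templates.
ENDPOINTS = {
    "user.profile": "/users/{user}",
    "user.gists": "/users/{user}/gists",
    "user.events": "/users/{user}/events/public",
    "user.repos": "/users/{user}/repos",
    "user.orgs": "/users/{user}/orgs",
    "user.subscriptions": "/users/{user}/subscriptions",
    "user.followers": "/users/{user}/followers",
    "user.following": "/users/{user}/following",
    "org.profile": "/orgs/{org}",
    "org.events": "/orgs/{org}/events",
    "org.repos": "/orgs/{org}/repos",
    "org.public_members": "/orgs/{org}/public_members",
    "repo.contributors": "/repos/{owner}/{repo}/contributors",
    "repo.languages": "/repos/{owner}/{repo}/languages",
    "repo.stargazers": "/repos/{owner}/{repo}/stargazers",
    "repo.forks": "/repos/{owner}/{repo}/forks",
    "repo.releases": "/repos/{owner}/{repo}/releases",
}


def endpoint_url(label: str, **params: str) -> str:
    """Fill in a path template, percent-encoding every path parameter."""
    encoded = {key: quote(value, safe="") for key, value in params.items()}
    return API_ROOT + ENDPOINTS[label].format(**encoded)


if __name__ == "__main__":
    # Prints https://api.github.com/repos/octocat/Hello-World/stargazers
    print(endpoint_url("repo.stargazers", owner="octocat", repo="Hello-World"))
```

Keeping the routes in one table makes it easy to see which GitHub resource each kind of lookup relies on.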
Octosuite also includes a search feature that looks for users, repositories, topics (tags that describe a repository’s purpose or subject area), commits (recorded sets of changes to one or more files, made by a user) and issues (conversation threads the community can use to flag problems or ask for features or help).
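GitHub exposes one REST search route for each of these five search types. The following is a minimal sketch, assuming direct use of those documented `/search/*` routes; the helper `search_url` is our own and is not necessarily how Octosuite builds its requests.

```python
"""Build GitHub REST search URLs for the five search types described above.

Sketch only: the /search/* routes are GitHub's documented endpoints, but this
helper is hypothetical and not taken from Octosuite.
"""
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"

# One REST search route per search type.
SEARCH_TYPES = ("users", "repositories", "topics", "commits", "issues")


def search_url(kind: str, query: str, per_page: int = 30, page: int = 1) -> str:
    """Return the URL for one page of results; GitHub caps per_page at 100."""
    if kind not in SEARCH_TYPES:
        raise ValueError(f"unknown search type: {kind!r}")
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    # urlencode percent-encodes qualifiers such as "language:python"
    params = urlencode({"q": query, "per_page": per_page, "page": page})
    return f"{API_ROOT}/search/{kind}?{params}"


if __name__ == "__main__":
    # Prints https://api.github.com/search/repositories?q=osint+language%3Apython&per_page=30&page=1
    print(search_url("repositories", "osint language:python"))
```

Validating the search type up front turns a typo into an immediate error rather than a confusing 404 from the API.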
All outputs from these searches are available in a readable format and can be exported in comma-separated value (CSV) format.
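To show what a CSV export of API results involves, here is a minimal sketch using Python’s standard `csv` module. It is an assumption-based illustration, not Octosuite’s exporter, and its column layout may differ from the files Octosuite writes. Nested objects are stored as JSON strings so that no data is silently dropped.

```python
"""Export a list of JSON records (e.g. API search results) to CSV.

Minimal sketch with the standard csv module; the function name and layout are
our own, not Octosuite's.
"""
import csv
import io
import json
from typing import Any, Iterable, Mapping, TextIO


def _cell(value: Any) -> Any:
    """Serialise nested dicts/lists; leave scalars for the csv module."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)
    return value


def records_to_csv(records: Iterable[Mapping[str, Any]], out: TextIO) -> list:
    rows = [{key: _cell(value) for key, value in record.items()} for record in records]
    # Use every key seen in any record, in first-seen order, as a column.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills columns missing from a record; None values are written as "".
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


if __name__ == "__main__":
    buffer = io.StringIO()
    records_to_csv([{"login": "octocat", "public_repos": 8}, {"login": "hubot", "bio": None}], buffer)
    print(buffer.getvalue())
```

To write to disk, open the file with `open("results.csv", "w", newline="", encoding="utf-8")` and pass the handle as `out`; the `newline=""` argument stops the csv module from producing blank lines on Windows.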
Getting started with Octosuite
Setting up Octosuite is a straightforward process.
It can be installed and used in two ways: as a command-line interface (CLI) or as a graphical user interface (GUI).
If you are not comfortable with the command line, the GUI option (with installation instructions for Windows and macOS) is likely preferable. The GUI version of the tool lets users select search commands from a dropdown menu.
However, the CLI is more flexible for processing the collected data, including in batches. Note that you will still need to know command-line basics to install the GUI version of the tool. For full instructions on how to install the GUI version of Octosuite, see this GitHub guide.
The remainder of this article will detail how to use the CLI version of the tool.
If you’re familiar with the command line (on Windows, Linux or Mac), you can simply open a terminal window and enter the following command to install Octosuite: `pip3 install octosuite`
But make sure you have Python 3 installed before running the command.
A beginner’s guide to using the command line can be found here.
Once the installation process is complete, you can start Octosuite by running the command: `octosuite`
Alternatively, you can use the following command to see the available options for running Octosuite with command-line arguments: `octosuite --help`
You will first be asked whether you would like to enable colours in the program (this makes the experience more engaging). Enter ‘y’ for yes or ‘n’ for no. You will then see the main screen.
From there, you can start with the `help` command to see a list of available commands.
Octosuite investigation commands have subcommands with their own unique functionality. To list them, type `help:investigation_command`. For example, to see all subcommands for the user command, type `help:user`.
A table showing all subcommands for the user command will then appear.
Let’s try to get the profile information of a user.
You can do this by entering the command `user:profile`. You will be prompted to enter a username. After doing so, hit enter.
The screenshot below shows the output containing the profile information of a user (with some details redacted). Octosuite will ask if you want to save the output to a CSV file. You can read saved CSV files with the command `csv:read`, delete a single CSV file with `csv:delete` or delete all CSV files by typing `csv:clear`.
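Behind a profile lookup like this sits GitHub’s documented `GET /users/{username}` route. The sketch below calls that route directly and keeps a handful of fields often useful for cross-referencing; it is not Octosuite’s implementation, and the field selection in `PROFILE_FIELDS` is our own assumption. Unauthenticated requests to the GitHub API are limited to 60 per hour.

```python
"""Fetch a public GitHub profile and keep the fields most useful for research.

Sketch under assumptions: it calls GitHub's GET /users/{username} route
directly and is not Octosuite's code; the field selection is hypothetical.
"""
import json
import sys
import urllib.request
from urllib.parse import quote

# Fields from GitHub's user object that are often relevant for cross-referencing.
PROFILE_FIELDS = (
    "login", "name", "company", "blog", "location", "email",
    "bio", "public_repos", "followers", "following", "created_at",
)


def fetch_user(username: str, timeout: float = 10.0) -> dict:
    """Return the raw JSON user object (unauthenticated, so rate-limited)."""
    url = "https://api.github.com/users/" + quote(username, safe="")
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def profile_row(user: dict) -> dict:
    """Keep only PROFILE_FIELDS, in a fixed order; missing fields become None."""
    return {field: user.get(field) for field in PROFILE_FIELDS}


if __name__ == "__main__" and len(sys.argv) > 1:
    print(profile_row(fetch_user(sys.argv[1])))
```

Separating the network call from the field selection means `profile_row` can be checked against saved JSON without touching the API.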
Having the entire GitHub API at your fingertips opens up new possibilities for cross-referencing data points or crafting specific queries. Octosuite can be extended to turn some of these into reusable commands. Current examples include:
- Check if user A follows user B: `user:follows`
- Check if user A belongs to an organisation: `org:member`
- Get a list of files in a specified directory D of a repository R: `repo:path_contents`
- Iterate over a list of usernames and call `user_profile` for each one
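The first three checks above map onto GitHub REST routes that answer with a status code rather than data: 204 No Content means yes and 404 Not Found means no. The sketch below is our own illustration, not Octosuite’s code. As an assumption, it uses the public-membership route, because unauthenticated requests cannot see private organisation members; Octosuite’s `org:member` may use a different route.

```python
"""Check relationships between accounts via GitHub's REST API.

Illustrative sketch, not Octosuite's code. GitHub answers these "check" routes
with 204 No Content for yes and 404 Not Found for no.
"""
import json
import urllib.error
import urllib.request
from urllib.parse import quote

API_ROOT = "https://api.github.com"


def _seg(value: str) -> str:
    """Percent-encode a single URL path segment."""
    return quote(value, safe="")


def follows_url(user_a: str, user_b: str) -> str:
    return f"{API_ROOT}/users/{_seg(user_a)}/following/{_seg(user_b)}"


def public_member_url(org: str, user: str) -> str:
    # Public membership only: unauthenticated requests cannot see private members.
    return f"{API_ROOT}/orgs/{_seg(org)}/public_members/{_seg(user)}"


def contents_url(owner: str, repo: str, path: str) -> str:
    # Keep "/" so that nested directories stay nested in the URL.
    return f"{API_ROOT}/repos/{_seg(owner)}/{_seg(repo)}/contents/{quote(path.strip('/'), safe='/')}"


def status_to_bool(status: int) -> bool:
    if status == 204:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected HTTP status {status} (rate limit or auth problem?)")


def check(url: str, timeout: float = 10.0) -> bool:
    """Request a 'check' route and interpret its status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_to_bool(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises for 404, so the "no" answer arrives as an exception.
        return status_to_bool(error.code)


def list_directory(owner: str, repo: str, path: str, timeout: float = 10.0) -> list:
    """Return the names of the entries in a repository directory."""
    with urllib.request.urlopen(contents_url(owner, repo, path), timeout=timeout) as response:
        entries = json.load(response)
    if not isinstance(entries, list):
        raise ValueError(f"{path!r} is a file, not a directory")
    return [entry["name"] for entry in entries]
```

For example, `check(follows_url("octocat", "hubot"))` returns `True` or `False`, while any other status (such as 403 when the rate limit is hit) raises an error instead of being misread as "no".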
In the following example, we use the `--method` flag to specify which method Octosuite should run, and the `--username` flag to give the value to search for. Each line of usernames.txt should contain a single GitHub username.
We also use the `--colours` option, which runs Octosuite with colours enabled, and the `--log-to-csv` option, which logs the output to a CSV file for later analysis of the results.
`while read -r username; do octosuite --method user_profile --username "$username" --colours --log-to-csv; done < usernames.txt`
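The same loop can also be written in Python, the language Octosuite itself is written in. This sketch copies the flags from the shell example above; run `octosuite --help` to confirm them for your installed version. Unlike the one-liner, it skips blank lines and duplicate usernames and reports failed runs. The script name `batch_profiles.py` in the usage note is hypothetical.

```python
"""Python version of the usernames.txt loop above.

Sketch only: the octosuite flags are copied from the article's shell example
and may differ between Octosuite versions.
"""
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def parse_usernames(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, skip blank lines and drop repeated usernames."""
    seen = set()
    names = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def build_command(username: str) -> List[str]:
    # Each argument is a separate list item, so no shell quoting is needed.
    return ["octosuite", "--method", "user_profile", "--username", username,
            "--colours", "--log-to-csv"]


def run_all(path: str) -> None:
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for username in parse_usernames(lines):
        result = subprocess.run(build_command(username), check=False)
        if result.returncode != 0:
            print(f"octosuite exited with code {result.returncode} for {username}",
                  file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    run_all(sys.argv[1])
```

Usage: `python3 batch_profiles.py usernames.txt`.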
Other Octosuite Uses
Octosuite can also be used to investigate episodes like the 2022 GitHub malware attack that originated from a single user account and affected more than 35,000 repositories.
The string ‘.ovz1’ appeared in a URL present in a number of repositories compromised by this attack. Searching for all instances of .ovz1 would therefore let a researcher identify other potentially compromised repositories. Octosuite enables this type of search in the following ways:
- With the `search:commits` command (or, if running with command-line arguments, the `search_commits` method), entering ‘.ovz1’ as the query string.
- By running `octosuite --method commits_search --query .ovz1 --colors --log-to-csv`.
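The same kind of search can be reproduced against GitHub’s documented `GET /search/commits` route, which helps when you want the raw results for your own analysis. This is a sketch, not Octosuite’s code, and the helper names are hypothetical. The Search API returns at most 1,000 results per query (10 pages of 100), and unauthenticated search requests are rate-limited more tightly than other API calls.

```python
"""Search GitHub commits for a string such as '.ovz1' and list the repositories hit.

Sketch using GitHub's GET /search/commits route; not Octosuite's code.
"""
import json
import urllib.request
from urllib.parse import urlencode

PER_PAGE = 100  # maximum page size allowed by the Search API


def commit_search_url(query: str, page: int = 1) -> str:
    params = urlencode({"q": query, "per_page": PER_PAGE, "page": page})
    return f"https://api.github.com/search/commits?{params}"


def summarise_items(payload: dict) -> list:
    """Reduce a search response to (repository, commit SHA, commit URL) tuples."""
    return [
        (item["repository"]["full_name"], item["sha"], item["html_url"])
        for item in payload.get("items", [])
    ]


def fetch_commit_hits(query: str, max_pages: int = 10, timeout: float = 10.0) -> list:
    """Collect hits page by page, stopping at the first short page."""
    hits = []
    for page in range(1, max_pages + 1):
        request = urllib.request.Request(
            commit_search_url(query, page),
            headers={"Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            items = summarise_items(json.load(response))
        hits.extend(items)
        if len(items) < PER_PAGE:  # a short page means there are no more results
            break
    return hits
```

A deduplicated list of affected repositories is then `sorted({repo for repo, _, _ in fetch_commit_hits(".ovz1")})`.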
Another 2022 incident, which saw the Python package ctx and a fork of the PHP library phpass compromised, can also be investigated using Octosuite.
By entering the following commands, users can quickly discover impacted GitHub repositories:
- The `search:commits` command, entering “phpass” “ctx” as the query string.
- `octosuite --method commits_search --query "phpass" "ctx" --colors --log-to-csv`
These commands work in the same way as the .ovz1 searches. They locate commits that mention ctx or phpass, pointing to repositories that may have used the compromised packages.
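When searching for several terms at once, it can help to run one commit search per term and merge the results, so you can see which repositories matched which term. The sketch below assumes you already have the parsed JSON from one GitHub `/search/commits` request per term; `repos_by_term` is a hypothetical helper, not part of Octosuite. A hit only shows that a commit mentions the term, so each repository still needs manual checking.

```python
"""Merge commit-search results for several terms (here "phpass" and "ctx").

Sketch only, not Octosuite's code: it takes the parsed JSON from one GitHub
/search/commits request per term and records which terms hit each repository.
"""
from collections import defaultdict
from typing import Dict, List, Mapping


def repos_by_term(results_by_term: Mapping[str, dict]) -> Dict[str, List[str]]:
    matches = defaultdict(set)
    for term, payload in results_by_term.items():
        for item in payload.get("items", []):
            matches[item["repository"]["full_name"]].add(term)
    # Sort repositories and terms so repeated runs produce identical output.
    return {repo: sorted(terms) for repo, terms in sorted(matches.items())}
```

Repositories matched by more than one term, for example `[repo for repo, terms in merged.items() if len(terms) > 1]`, may be worth examining first.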
There will likely be many more uses for Octosuite in open source investigations than detailed here. Users are encouraged to explore the tool and discover its full possibilities.
Bellingcat is also keen to hear how Octosuite is put to use. If you utilise the tool in your research or investigation, please feel free to let us know by filling in this form.
This article was produced as part of Bellingcat’s Tech Fellowship program, which seeks to create investigative tools and online resources for open source researchers. Applications for our next intake of fellows will begin soon. Please keep an eye on our website and social media channels for further details.