Q1 - Progress Report
What did you do to reach this Milestone?*
We employed two different tech approaches for the liquid investigations toolkit: (1) the classical cloud approach, where we offer a Software Bundle (SB) as a ready-to-use hosted web platform, and (2) a federated approach (FA) based on hardware (ARM class devices with preset configurations). Therefore, our milestones and user scenarios refer to both approaches.
1. Software bundle (SB) - Develop basic SB + web platform
Most important activities:
- Conducted extensive research on software candidates to be included in the software bundle. We evaluated software constraints for ARM boards from multiple perspectives (hardware resources, connectivity, security, usability) to identify the best-suited technologies.
As far as the software stack is concerned we decided to include the following technologies:
- The Hoover search tool, allowing the end user to search through large data collections (PDF/email). It glues together proven open-source technologies like ElasticSearch and Apache Tika to aid the work of investigative journalists.
- Hypothes.is digital annotations - allowing the end user to mark his/her findings directly on the data source, and share this context with his/her team.
- RocketChat is a highly configurable chat platform.
- DokuWiki is a simple to use and highly versatile Open Source wiki software that doesn’t require a database. Instead, all the wiki pages are stored as files on disk. Also, it has several functionalities such as full text search and namespaces.
- Davros for file-sharing. Davros does not require a database, as it only uses the filesystem for storage.
- Etherpad for real-time collaborative editing, allowing authors to simultaneously edit a text document and see all of the participants’ edits in real time, with the ability to display each author’s text in their own color.
More details can be found in the prototype architecture documentation https://github.com/liquidinvestigations/liquidinvestigations/wiki/Architecture-Documentation
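To illustrate the extract-then-index idea behind the search tool listed above, here is a minimal sketch in plain Python. It is not Hoover code and does not use ElasticSearch or Apache Tika: it assumes the text has already been extracted from the source files, and the document names and contents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical, already-extracted text standing in for PDF/email content.
documents = {
    "report.pdf": "Quarterly payments to the offshore company",
    "mail-001.eml": "Please forward the offshore contract",
}
index = build_index(documents)
```

A real deployment delegates both steps to the dedicated tools (text extraction and a full search engine); the sketch only shows why a query can be answered without rescanning every file.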
The initial setup process was conducted manually. We installed and tested various software apps (individually and combined) on ARM boards. Test candidates were the Hoover search stack, DokuWiki, RocketChat, Davros, NextCloud, etc.; the ARM boards were the Pine A64 and ODROID C2.
Further on, to improve usability, we focused on automating the set-up process: we’ve built an automated installer script, using Ansible, that builds the SD system image from scratch and automatically installs the software stack on an ARM board. This automation stage happened sooner than we promised, so we consider this a big step forward.
- Tech improvements to the Hypothes.is technology to make it easier to incorporate it into the tech bundle;
- Tech improvements to optimize the Hoover search stack for Liquid investigations; we re-packaged the technology (installer scripts) to make it easier to install Hoover on a server, local instance or board;
- Integration of several functionalities (i.e. user management, search, annotations) into the SB; this entailed setting up an admin application and unified authentication system to deploy on Linux;
- Set up a basic web platform to host the software bundle and project documentation. In this respect please check out our project website and GitHub repository.
2. Federated Approach (FA) - Documenting architecture + testing ARM class devices with basic configurations
Most important activities:
- Tested and selected the reference ARM boards for compatibility with the above-mentioned software stack. As far as ARM boards are concerned, we initially started with an ODROID C2 but along the way decided to use the Khadas VIM arm64 (ARM Cortex-A53) board with 2GB of memory, 8GB internal storage (eMMC), and WiFi. By default, it boots an Android-based image from internal memory, which doesn’t listen for SSH connections.
- Identified and documented user scenarios for the project. More details can be found in the prototype architecture documentation (living document growing each quarter) https://github.com/liquidinvestigations/liquidinvestigations/wiki/Architecture-Documentation
- Focused on an automated installer script, using Ansible, that builds the SD system image from scratch and automatically installs the software stack on an ARM board.
- Tested two boards working within a basic network setup, accessing services from one box using another box as a secure proxy or VPN: https://github.com/liquidinvestigations/setup/tree/vpn
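As a rough illustration of the "one box as proxy for another" scenario, the sketch below shows how a client could route HTTP requests through a proxy using only the Python standard library. It is an assumption-laden example, not the configuration from the setup repository: the proxy address, port and service URL are hypothetical, and a VPN-based setup would work at the network level instead.

```python
import urllib.request


def opener_via_proxy(proxy_url):
    """Build a URL opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical address of the box acting as the proxy.
PROXY_BOX = "http://192.168.1.10:3128"
opener = opener_via_proxy(PROXY_BOX)

# Hypothetical usage; requires both boxes to be reachable on the network:
# response = opener.open("http://second-box.local/", timeout=10)
# print(response.status)
```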
Can you please describe the stage of your project and the achievements realised so far?*
This is mainly an integration project, so throughout Q1, we’ve tested several existing software and hardware candidates. We can now successfully assess how individual software apps run on different ARM boards and we have a clearer picture of which software candidates we want to use for our software bundle and on which ARM boards.
The main conclusion so far is that all the tested software candidates run successfully on ARM boards (Pine A64, ODROID C2 and Khadas VIM). Tech details and test results are thoroughly documented in our internal DokuWiki. We can supply more details upon request.
We are now at the stage where we have written the bare-bones glue code to tie all these applications into an admin app/unified authentication system and integrate them seamlessly.
The components needed for the liquid investigations toolkit are an ARM board, an SD card, a USB key and a data hard drive. The SD card contains an operating system based on Ubuntu 16.04 LTS for ARM processors and comes configured with a software stack (the actual liquid investigations toolkit).
We took successful steps toward automating the installation of software apps on the actual boards. We created an automated installer script, using Ansible (https://www.ansible.com), that builds the SD system image from scratch and automatically installs the software stack on an ARM board.
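For readers unfamiliar with Ansible, the following minimal sketch shows how an installer run could be driven from Python by calling the `ansible-playbook` command line tool. The inventory and playbook file names (`hosts.ini`, `liquid.yml`) and the helper functions are hypothetical and are not taken from our setup repository; only the `ansible-playbook` command with its `-i` and `--check` options is assumed to be available.

```python
import shutil
import subprocess


def playbook_command(inventory, playbook, dry_run=False):
    """Build the ansible-playbook command line as a list of arguments."""
    cmd = ["ansible-playbook", "-i", inventory, playbook]
    if dry_run:
        # --check asks Ansible to report changes without applying them.
        cmd.append("--check")
    return cmd


def run_playbook(inventory, playbook, dry_run=False):
    """Run the playbook and return the exit code of ansible-playbook."""
    if shutil.which("ansible-playbook") is None:
        raise RuntimeError("ansible-playbook is not installed")
    cmd = playbook_command(inventory, playbook, dry_run)
    return subprocess.run(cmd, check=False).returncode


# Hypothetical usage, with file names chosen for illustration only:
# exit_code = run_playbook("hosts.ini", "liquid.yml", dry_run=True)
```

Running first with `dry_run=True` is one way to preview what an installer would change on a board before applying it.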
Other successes are adapting existing technologies (search stack and digital annotations) to fit the constraints we defined for our software bundle. All these details are mentioned above.
We’ve managed to thoroughly document all these steps both internally and externally. Here are some good external-facing resources for the liquid investigations project: GitHub (https://github.com/liquidinvestigations) and a dedicated website (https://liquidinvestigations.org/wordpress/).
What evidence can you provide to prove that you reached the Milestone? *
1. Software bundle (SB) - Develop basic SB + web platform
Evidence for completion: As an engineer, I can test various functionalities such as user management, search and digital annotations under a unified authentication system, making it easier to share information among these tools. These bundled functionalities will be hosted in a cloud service. There is no admin UI at this point. Any engineer can now download the bundle source code from our GitHub instance and install it on a standard Linux distribution.
2. Federated Approach (FA) - Documenting architecture + testing ARM class devices with basic configurations
Evidence for completion: As an engineer, I can use my ARM class device (ARM64) with preset configurations (installed automatically). This entails using a micro SD card containing the following software (operating system: Linux + functionalities + software). As an engineer, I can use the ARM board within a basic network setup and access services from one box using another box as a secure proxy or VPN.
What were your biggest impediments so far?*
This project has a very heavy tech component, based on this new hardware approach involving ARM boards. We’ve compiled a list of all the hardware needs for the entire project and broken it down by quarter. We ordered everything for Q1, but we weren’t aware that ARM boards (especially ODROIDs and Khadas VIMs) are not as widespread and readily available as Raspberry Pis. Luckily we had an ARM board available right at the beginning of the project. However, to be able to test ARM boards in a network setup, we had to order multiple boards from warehouses in China. There were a lot of delays and shipping difficulties. Orders arrived rather late in the process, making testing a bit cumbersome and adding unnecessary time pressure. Lesson learned: we are already assessing the hardware needs for Q2 and are planning to start the ordering process very soon.
What caused problems?*
See previous question (shipping delays from Chinese warehouses)
Any other things you want to mention here?
We’ve conducted a series of successful outreach activities:
We held a presentation about the liquid investigations project at the Madeira Interactive Technologies Institute. Our presentation generated an article in the local press - http://www.m-iti.org/node/4187
We talked about the availability and potential of the liquid toolkit at the journalism festival in Perugia and received positive feedback. http://www.journalismfestival.com/programme/2017/how-technology-can-empower-investigative-journalism
So far our outreach efforts have been successful and early feedback indicates that the journalistic community finds this project really useful and interesting. In this respect, we’ve already been invited to hold presentations about the progress on this project at various industry conferences (e.g. http://www.journalismfund.eu/dataharvest-conferences).