VFRAME.io is a computer vision toolkit designed by Adam Harvey and his team for human rights and accountability use cases. It aims to bridge the gap between AI used in the commercial sector and the needs of researchers and investigative journalists working with video and image datasets from conflict zones. A recent development in the project is a 3D-rendering system for creating high-fidelity training data. Adam Harvey is a critical artist, researcher, and software developer based in Berlin focusing on computer vision, privacy, and surveillance. His research has been featured in various exhibitions and publications in Europe and the United States.
Adam Harvey on Wartime Tech
What did you learn from your work on the war in Syria and how does that apply to understanding the war in Ukraine? Do emergent technologies work off the shelf everywhere?
I learned a general lesson that AI needs to be hyperfocused in order to be effective. This means studying every detail of how munitions are being used and documented: the sensor resolution and lenses of devices, the photographer's pose when recording the video, the urgency of the situations people are in when documenting, which can induce motion blur, the compression algorithms used by sharing platforms, and the lighting conditions. To build effective object detection algorithms, which is my current focus, these biases are deliberately encoded into the training datasets. Bias becomes problematic when it is ignored; when it is acknowledged, it becomes a means to tailor technology towards a highly specific application. The models I trained for analyzing videos from Syria contain a low-resolution bias because most footage was either low-resolution or heavily compressed. That's different in Ukraine, where images have higher resolution, so the training datasets are being designed to reflect that.
Another important bias to consider is how and why munitions can be detected. For example, the RBK-500 cargo munition found in Syria, and now also Ukraine, can be detected because it lodges into the ground after dispersing its payload (many other munitions explode completely and leave no signature trace). When the RBK-500's metal body hits the ground, it deforms and the tail fins bend, which means there are rarely undamaged RBK-500s. Therefore, the detection model needs to understand the latent space of these deformities and how far it can be pushed. For example, if half the fins are missing, is it still the same object? It's important to consider every detail as well as listen to researchers and ordnance removal experts who can provide further insight about the various appearances. Ideally, the AI model can encode the collective knowledge and advice of researchers and munition experts to amplify their perceptual capabilities.
Since emergent technologies are still developing, they may be more malleable than turnkey technologies that were developed for a specific use case. However, both are important. For example, with computer vision, it would be ideal to have a turnkey computer vision processing system and then be able to adapt the detection models to find relevant objects in Ukraine. VFRAME is designed with this in mind. The "ModelZoo", a collection of object detection models, can be tailored to specific objects for specific regions. Sometimes this even overlaps between projects. Because Russia is using similar tactics and munitions as in Syria, previous technologies developed for monitoring the Syrian conflict can be immediately applied to monitoring Ukraine.
What role does technology play in the way wars are conducted?
War is the worst theatre of technology – and of course the links between computer vision and war are well established. However, this shouldn’t discourage its use in the humanitarian, civic, or peace sector where it is becoming an increasingly important tool to understand the staggering amount of documentation. I think technology will continue to play an important role in facilitating the type of investigations being driven by groups like Bellingcat and Mnemonic that utilize distributed, crowd-sourced fact gathering. Computer vision is going to play an important role in facilitating that work, in automating their perceptual labor, and enabling small research groups to analyze massive amounts of data without being tethered to any cloud service or third-party technology provider. On the flip side, we can already see how technology is being used to try and dilute the value of crowd-sourced information through disinformation campaigns.
Why does it seem like tech-enterprises have a tendency to use conflict zones as a testbed for their applications and hardware?
It can be disheartening to see large technology corporations attempt to help and then leave before making any lasting contribution. However, I would argue that there are also significant indirect contributions from technology companies in the form of surplus open-source technology just waiting to be applied. One of the best contributions they can make is to continue publishing open-source code. This will enable other developers to pursue more political objectives, such as conflict-zone analysis.
Is there software that is only relevant to war, and what does research and development funding have to do with it?
There is a lot of overlap between consumer and military technology. Machine learning frameworks like PyTorch and TensorFlow could be used for developing shopping or missile-targeting technologies alike with no significant modifications. There is no technical difference between using OpenCV, the most widely used computer vision library, for interactive games or for drone attacks. These are just code libraries that work with data. When non-military software technologies become useful enough to be integrated into operational systems for warfare, they might receive additional government funding to ensure their long-term stability, security, and support. A good example is In-Q-Tel's investment in software companies like MongoDB, which is clearly not intended as a "software of war", yet receives funding so that it can be used as one. This points to the reality that the "software of war" largely comprises administrative and logistics software rather than actually violent technologies.
What should be considered when facial recognition technologies or other computer vision programs are used in war zones? (e.g., Bachu Info, Clearview)
Since facial recognition technologies were originally designed for military and law enforcement purposes, it's no surprise to see them used in war zones. What should be considered is that this particular technology can only be developed with millions of people's faces. While Clearview is the most well-known, it should also be taken into account that Russia's StilSoft.ru facial recognition technology has been partially honed on the same datasets as Clearview's, namely the MegaFace dataset, which includes 3.7 million photos from Flickr, where the most popular tag was #wedding. Both companies' systems are now used for defense purposes. As I've pointed out in my Exposing.ai research, this is only possible because of the lack of any meaningful restriction on how our faces are used. In the context of AI, we need to reconsider how to protect public data from being exploited without consent by any adversary for defense-related technologies.
Is it legitimate to adopt and, if necessary, reverse the functioning of technologies (e.g., Protestware)? What does “winning” look like in this area?
Protestware is a short-term and rather ineffective cost-imposing attack. In the long term, it may cost the open-source community more than its intended targets. If all open-source code now needs to be audited for malicious code, it could become too burdensome or costly to maintain or update. I don't think significant gains can be made through "protestware", only attention-grabbing headlines.
Which technologies and knowledge do civil society and the judiciary still lack access to, and why? What more is needed?
I think it's only fair for me to talk about the technologies I work with, particularly computer vision. In this area, what's needed is more accurate, efficient, and accessible tools for monitoring content for open-source intelligence. Bellingcat, Mnemonic, and similar human rights research groups are setting an important precedent for how this strategy works. Additionally, groups like Forensic Architecture show there is still a lot of room for design and creativity to be effective tools for truth and communication. I've been inspired by the work from all these groups and, with VFRAME, try to channel and encode communal knowledge into algorithms to help automate the perceptual labor of monitoring conflict zones.
Adam Harvey is an artist and researcher whose work focuses on computer vision, digital imaging technologies, and counter-surveillance.