Dennis Groß – Safe Machine Learning

AI Philosopher’s Roundtable: A Python Script for Enhancing AI-Driven Debates with Real-Time Analysis

AI-driven discussions present a unique opportunity for intellectual engagement and growth in today’s dynamic and rapidly changing world. Facilitated or generated by artificial intelligence (AI) systems—such as advanced language models like OpenAI’s GPT series—these discussions can take various forms, including virtual debates among AI entities. AI-driven discussions enable users to engage with diverse topics anytime, anywhere, fostering a flexible learning experience. These discussions broaden users’ understanding and encourage critical thinking by presenting fresh perspectives on controversial or complex issues. Serving as a valuable resource for brainstorming sessions, AI-driven discussions can help researchers and creative professionals generate new ideas and insights. Moreover, they facilitate time efficiency by concisely summarizing vast amounts of information or presenting multiple viewpoints. Importantly, these automated discussions are devoid of personal biases or emotions, which often impede productive debates, allowing for more objective and focused discourse.

I created a Python script enabling multiple AI language models to engage in an AI-moderated discussion on any topic. An additional AI model provides real-time analysis and critique to improve the conversation’s quality further.

The AI Philosopher’s Roundtable Script

The Python script harnesses OpenAI’s GPT-4 to create an interactive setting in which three distinct AI entities, each assigned specific roles, engage in a structured dialogue:

Moderator: This AI model ensures the conversation remains focused, provides guidance, and promotes productive discourse.

System1 and System2: These AI models represent philosophers celebrated for their critical thinking and capacity to propel discussions forward.

Evaluator: This GPT-based analysis system summarizes and assesses the debate in real time, offering valuable insights and constructive feedback to enhance the conversation’s quality. After a predetermined number of iterations, it evaluates and summarizes the discussion.

The script starts by requesting the user to input a discussion topic. Once entered, the conversation begins with the Moderator setting the stage. The AI philosophers, System1 and System2, alternate in contributing to the debate, with the Moderator periodically intervening to maintain focus.
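The turn-taking between these roles can be sketched roughly as follows. This is a minimal sketch and not the original script: `run_roundtable`, `generate`, `iterations`, and `moderate_every` are hypothetical names, and the `echo_model` stub stands in for a real language-model call so that the example runs without any API access.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, message)


def run_roundtable(
    topic: str,
    generate: Callable[[str, str, List[Turn]], str],
    iterations: int = 4,
    moderate_every: int = 2,
) -> List[Turn]:
    """Run a moderated two-philosopher debate and close with an evaluation.

    `generate(role, topic, transcript)` is a hypothetical hook that returns
    the next message for `role`; in practice it would call a language model.
    """
    transcript: List[Turn] = []

    def speak(role: str) -> None:
        transcript.append((role, generate(role, topic, transcript)))

    speak("Moderator")  # the Moderator sets the stage
    philosophers = ["System1", "System2"]
    for turn in range(iterations):
        speak(philosophers[turn % 2])  # the philosophers alternate
        if (turn + 1) % moderate_every == 0:
            speak("Moderator")  # periodic intervention to keep focus
    speak("Evaluator")  # final summary and critique
    return transcript


def echo_model(role: str, topic: str, transcript: List[Turn]) -> str:
    """Stub standing in for a real model call (an assumption of this sketch)."""
    return f"{role} comments on '{topic}' after {len(transcript)} turns."


debate = run_roundtable("free will", echo_model)
for speaker, message in debate:
    print(f"{speaker}: {message}")
```

Passing the full transcript to every call lets each role react to everything said so far; swapping `echo_model` for a function that queries a language model would be the only change needed to get real content.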

The script can be found here.

Analysis

Despite the numerous advantages, there are also some drawbacks to AI-driven discussions. One significant limitation is that AI language models are based on existing knowledge and might not be able to provide truly original insights or ideas. They may lack the depth and nuance that human experts can bring to a conversation, resulting in oversimplifications of certain topics. This can be observed in the discussion records. They may also inadvertently reproduce biases present in the data they were trained on. Misinterpretations or inaccuracies may also arise, as AI models might not fully comprehend the context or nuances behind a specific subject. Lastly, the absence of emotions and personal experiences may limit the empathetic understanding and interpersonal connections that can be fostered through human-to-human discussions.

Nevertheless, even with such a simple script, GPT-4 can already produce plausible discussions. It will be interesting to see how these discussions evolve with more advanced system information prompts, models, and scripts.

Read Further

Quite similar to the topic discussed in this blog post are Auto-GPT and BabyAGI.

These projects attempt to create AI agents that can perform multistep tasks autonomously. While they currently require significant human input and are not yet fully autonomous, they represent early steps towards more complex AI models.

Auto-GPT, created by Toran Bruce Richards, chains together GPT-4 outputs to achieve a set goal. It currently requires user permission for each step and can’t make purchases, but it demonstrates the potential for AI assistants. BabyAGI, created by Yohei Nakajima, is inspired by the idea of using GPT-4 as an AI co-founder for businesses and has a task-oriented approach. Both projects face limitations due to GPT-4’s narrow range of interpretive intelligence and the issue of confabulations.

Read the full article to learn more about Auto-GPT, BabyAGI, and their implications for AI development.

Another self-looping ChatGPT agent system is described in the paper “Generative Agents: Interactive Simulacra of Human Behavior.” Implemented in a sandbox environment inspired by The Sims, these agents exhibit realistic individual and social behaviors. The research emphasizes the significance of observation, planning, and reflection in creating convincing simulations and demonstrates the integration of large language models with interactive agents.

Barrier-free Websites for GPT Models and Search Engine Optimization

Barrier-free websites are like digital superheroes, battling against the evil of discrimination and exclusion by empowering everyone to engage with the digital world with ease and confidence, regardless of ability or disability. These websites aim to accommodate individuals with various disabilities, including visual impairments, hearing impairments, motor impairments, cognitive impairments, and seizure disorders.

With the integration of newer versions of language models like the Prometheus model (a successor of ChatGPT) into the Microsoft Edge web browser, website accessibility for language models will play a crucial role in the future. The ability to summarize website content and answer questions about it will be a valuable tool for people with and without disabilities, may influence how likely they are to visit a website, and could even impact the ranking of websites in search engines.

As a result, the optimization of website accessibility for language models will become an important aspect of future search engine optimization (SEO). This could involve adjusting website content and language to give the best output for language models, leading to higher search engine rankings and a better user experience for everyone.

Evaluating ChatGPT’s Forecasts

In our previous post, we explored the potential of ChatGPT as a forecasting support tool. In this post, we put ChatGPT to the test and evaluate its predictions made entirely on its own, without any human assistance. To do this, we will use the normalized mean square error (NMSE) as our evaluation metric. The NMSE is a measure of the accuracy of a prediction. It is calculated by dividing the mean square error (MSE) of the prediction by the variance of the true values. In general, the NMSE is preferred over the MSE when you want to compare the accuracy of different predictions that are based on datasets with different variances.

def calc_nmse(true_values, predicted_values):
    """Calculate the normalized mean square error (NMSE)."""
    n = len(true_values)

    # Mean square error (MSE) between true and predicted values
    mse = sum((y - y_pred) ** 2 for y, y_pred in zip(true_values, predicted_values)) / n

    # Sample variance of the true values (divides by n - 1)
    mean_true = sum(true_values) / n
    variance = sum((y - mean_true) ** 2 for y in true_values) / (n - 1)

    # NMSE: the MSE relative to the spread of the true values
    return mse / variance

If you want to do your own estimations and compare them to ChatGPT, don’t scroll further and estimate them here:

How many cars are there in the United States?
How many minutes of video are uploaded to YouTube every day?
How many flights take off from airports around the world every day?
How many babies are born every day?
How many people visit Disneyland every year?
How many cells are there in the human body?
How many words are there in the English language?

We now let ChatGPT estimate the following values. We used the following chat message: “Estimate via Fermi quiz method QUESTION.”

How many cars are there in the United States?
Estimated: 495 million cars
Actual: 276 million cars
How many minutes of video are uploaded to YouTube every day?
Estimated: 333,333,333 hours
Actual: 720,000 hours
How many flights take off from airports around the world every day?
Estimated: 250,000 flights/day
Actual: 100,000 flights/day
How many babies are born every day?
Estimated: 400,000 people
Actual: 385,000 babies
How many people visit Disneyland every year?
Estimated: 18 million people
Actual: 8.5 million visitors
How many cells are there in the human body?
Estimated: 100 trillion
Actual: 30 trillion
How many words are there in the English language?
Estimated: 500,000
Actual: 171,146 words

The NMSE of ChatGPT is 5.44.
A value of 0 indicates a perfect fit, while a value greater than 1 indicates a poor fit.
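The reported value can be reproduced from the numbers listed above. The sketch below assumes the estimated and actual values are used exactly as printed (including the YouTube figures, which are given in hours) and that the variance is the sample variance, as in the function shown earlier; the short question labels are only for readability.

```python
def calc_nmse(true_values, predicted_values):
    """Normalized mean square error: MSE divided by the sample variance."""
    n = len(true_values)
    mse = sum((y - p) ** 2 for y, p in zip(true_values, predicted_values)) / n
    mean_true = sum(true_values) / n
    variance = sum((y - mean_true) ** 2 for y in true_values) / (n - 1)
    return mse / variance


# (question, ChatGPT estimate, actual value), copied from the list above
results = [
    ("cars in the United States", 495e6, 276e6),
    ("video uploaded to YouTube per day", 333_333_333, 720_000),
    ("flights per day", 250_000, 100_000),
    ("babies born per day", 400_000, 385_000),
    ("Disneyland visitors per year", 18e6, 8.5e6),
    ("cells in the human body", 100e12, 30e12),
    ("words in the English language", 500_000, 171_146),
]

estimated = [est for _, est, _ in results]
actual = [act for _, _, act in results]

nmse = calc_nmse(actual, estimated)
print(round(nmse, 2))  # about 5.44, matching the value reported above
```

Because the questions differ by many orders of magnitude, this single NMSE is dominated almost entirely by the question with the largest absolute values (the number of cells); the other six questions barely affect the result.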

Have you calculated the NMSE for your forecasts? If so, please leave a comment with your result or send me your result directly. It would be interesting to see how ChatGPT’s performance compares to that of a human forecaster.

Superforecasting with ChatGPT

The Fermi Quiz is a powerful tool for making accurate estimates and solving problems quickly. Named after physicist Enrico Fermi, this method involves breaking a problem down into smaller, more manageable pieces and using your knowledge and experience to make educated guesses. By following a few simple steps, you can use the Fermi Quiz to solve problems ranging from estimating the number of coffee shops in a city to calculating the number of stars in the universe. In this post, I will explain how to use the Fermi Quiz to make accurate estimates and demonstrate how ChatGPT, a chatbot, can help us generate more manageable pieces for our estimates and may even improve them.

Fermi Quiz

The Fermi Quiz is a method of solving problems and making estimates by breaking a problem down into smaller, more manageable pieces and using your knowledge and experience to make educated guesses. Here’s how it works:

Define the scope of your estimate: First, you need to clearly define the problem or question that you are trying to solve. This will help you focus your efforts and make it easier to come up with a good estimate.
For example: How many bike stores are in the Netherlands?
Once you have defined the scope of your estimate, you can begin to break the problem down into smaller, more manageable pieces, each of which lets you answer the overall question independently.
For example:
1. Piece: How many bike stores are in a Dutch city on average? How many cities are in the Netherlands?
2. Piece: How many people in the Netherlands go to a bike store in an average week? How many people can one bike store handle in a week?
3. Piece: How many bikes are in the Netherlands? How many bikes has an average bike store sold since it opened?
Answer all questions and use each piece independently to estimate the value for the overall question. Then average all of the estimates to get the final estimate. This method is based on the wisdom-of-crowds effect, which states that averaging independent judgments often leads to improved accuracy.

ChatGPT for manageable piece generation

As a rule of thumb, the more manageable pieces you have, the more precise your final result becomes. However, at some point, it can be difficult to come up with more pieces.
Therefore, we can utilize the chatbot ChatGPT to do it for us. You can use the following messages to generate the pieces via ChatGPT (note that the ChatGPT outputs vary, so you may have to tweak the messages a bit):

Estimate how many bike stores are in the Netherlands by using the Fermi quiz method and do not give me estimates.

[ChatGPT ANSWER]

What are five examples of breaking the problem down into smaller, more manageable pieces that I mentioned in my previous response?

[MULTIPLE IDEAS] (Piece 2 and Piece 3 were actually created by ChatGPT)

Estimate a value for each generated manageable piece and average it with your previous estimates.

Why did I not want to get an estimate from ChatGPT yet?

Estimate how many bike stores are in the Netherlands by using the Fermi quiz method and do not give me estimates.

The anchoring effect is a cognitive bias that refers to the tendency for people to rely too heavily on the first piece of information they receive (the “anchor”) when making decisions or judgments. This can lead to distorted judgments and decisions, as people may give too much weight to the initial anchor and not consider other relevant information. Therefore, knowing ChatGPT’s estimate (which is not necessarily precise) may influence your own estimate.

Can ChatGPT improve our forecasting?

Now, for every manageable piece, we use ChatGPT to get some estimates. Note that asking the same question multiple times can result in different estimates. This is not a big problem, and we can handle it by, for example, averaging the estimates for each subquestion.

Let’s calculate the ChatGPT estimates.

1. Piece

How many bike stores are in a Dutch municipality on average? How many municipalities are in the Netherlands?

Estimate via the Fermi quiz method how many bike stores are in a Dutch municipality on average.
-> ANSWER: 5

Estimate via the Fermi quiz method how many municipalities are in the Netherlands.
-> ANSWER: 233

ESTIMATE:
5 * 233 = 1165

2. Piece

How many people in the Netherlands go on average in one week to a bike store?
-> 525000
How many people can one bike store handle in a week?
-> 500

ESTIMATE:
525000/500=1050

3. Piece

How many bikes are in the Netherlands?
-> 35 million bikes
How many bikes have an average bike store in the Netherlands sold in its life span?
-> 10000 bikes

ESTIMATE:
35,000,000/10,000 = 3500

FINAL CHATGPT ESTIMATE: (1165 + 1050 + 3500)/3 = 1905
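The arithmetic for the three pieces can be written down as a short script. This is only a sketch of the averaging step; the dictionary keys are informal labels, and the numbers are the ChatGPT answers quoted above.

```python
from statistics import mean

# One independent estimate of the number of bike stores per piece
piece_estimates = {
    "stores per municipality x municipalities": 5 * 233,
    "weekly visitors / weekly capacity per store": 525_000 / 500,
    "bikes in the country / bikes sold per store": 35_000_000 / 10_000,
}

for label, value in piece_estimates.items():
    print(f"{label}: {value:.0f}")

# Average the independent estimates to get the final estimate
final_estimate = mean(list(piece_estimates.values()))
print(f"final estimate: {final_estimate:.0f}")  # 1905
```

Adding a further piece only requires one more dictionary entry, which makes it easy to see how the final estimate shifts as more pieces (your own or ChatGPT’s) are included.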

Now that we have generated additional pieces using ChatGPT, we can average its estimate with your own to create a more precise estimate for the problem. To see how accurate your final estimate is, you can compare it to the actual number of bike stores in the Netherlands, which was approximately 3080 in 2020.

If you have tried using ChatGPT to generate additional manageable pieces for the Fermi Quiz method, please let me know in the comments how it worked for you. Did it help you come up with a more accurate estimate? Did combining your own estimate with ChatGPT’s estimate bring you closer to the actual number? I would love to hear your thoughts and experiences with using ChatGPT to improve the accuracy of your Fermi Quiz estimates. Please share your comments below.

Luxury Handbag Investment – A Data-Driven Point of View

In the investment landscape, designer handbags are undoubtedly worth taking a look at. According to Art Market Research (AMR), designer handbags outperform art, classic cars, and rare whiskies in terms of investment potential. Some handbags from Hermès, Chanel, and Louis Vuitton have even risen in value by an average of 83% in the last ten years. To put that into context, watches have increased by 72%.

Average Prices of different handbag models on different reseller platforms in December 2021.

When considering designer handbags as an investment, it’s important to have the right expectations. A quality designer handbag can be a great wardrobe investment, but only certain designer handbags can be sold years later for a profit.

Where do you get them?

Whether you are on the lookout for a classic Louis Vuitton bag, or desperately want a Hermès Birkin and don’t want to wait on their list, luxury resale websites are the new place to be. The most popular luxury resale sites are Vestiaire Collective, The Luxury Closet, and Rebelle.

Short-Term Strategy

When reselling fashion items like handbags, you have to understand the trends. A good way to understand the trends is to analyze the sales on the previously mentioned reselling platforms. They give you an overview of how certain handbag models are performing. A good performance indicator is, for example, the turnaround time (how long a product stays on the market). Lower turnaround times indicate that a model is in higher demand than other models.

Average turnaround times of different handbag models in December 2021. The sample size for each model is larger than 20 items (so the samples are currently not very large).

When setting a price, do not forget to take platform fees into account (mostly around 25% of the price). Therefore, a quite nice scenario would be to buy a handbag for 25% less than its average price and sell it for a bit more than the average price.
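The effect of the fee can be checked with a few lines of arithmetic. The sketch below assumes the platform deducts its fee as a fixed share of the selling price from the seller’s proceeds; `resale_profit` and the average price of 1000 are hypothetical values chosen only for illustration.

```python
def resale_profit(buy_price, sell_price, fee_rate=0.25):
    """Profit after the platform keeps `fee_rate` of the selling price."""
    return sell_price * (1 - fee_rate) - buy_price


average_price = 1000.0  # hypothetical average price of a handbag model

# Buying 25% below average and selling exactly at average only breaks even
break_even = resale_profit(0.75 * average_price, average_price)
print(break_even)  # 0.0

# Selling 10% above the average price leaves a small profit
small_profit = resale_profit(0.75 * average_price, 1.10 * average_price)
print(small_profit)
```

Under these assumptions, a 25% discount on the purchase merely cancels out a 25% fee, which is why the selling price has to sit above the average for the trade to pay off.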

Long-Term Strategy

Designer bags go in and out of fashion, but a well-chosen designer bag can last forever. Classic brands, such as Hermès, Chanel, and Louis Vuitton, and classic handbag styles may hold their value. Taking good care of your bag, both when it is in use and when it is not, is necessary to guarantee interest if you’re looking to trade it in.

Microchips – Demand, Industry, and Shortage

The microchip has become one of the most important strategic materials of the 21st century. Almost everything we use depends on microchips, from iPhones and toasters to fighter jets and automobiles. Microchips have become a part of our daily lives and, therefore, the heart of our modern society. The development of AI, the internet of things, and the self-driving car revolution won’t stop this trend.

From Semiconductors To Microchips

All this technological advancement builds on top of a simple group of materials called semiconductors. When passing through a conductor, electricity faces little resistance, creating a free-flowing current. In an insulator, electrical current cannot travel due to high levels of resistance. Semiconductors sit somewhere between these two extremes, allowing a degree of control over the flow of electricity by changing the applied electric field. Silicon semiconductors are the industry standard for most transistors. Transistors are devices that regulate current and act as switches for electronic signals. These transistors are crucial to microchip manufacturing, from processors to memory cards.

Semiconductor Industry

The semiconductor industry has professionalized, and today, companies in the field typically specialize in one of the following domains:

Mining: With two-thirds of worldwide production, China is by far the world’s largest producer of silicon and therefore of the essential material for microchips. Other producers are Russia, the USA, Norway, and Brazil.

Chip Design: Chip design defines how many cores a microchip should have, how those and other components such as memory are arranged on the silicon, and what the circuits should actually look like. Chip designers normally outsource the chip manufacturing to fab foundries (microchip manufacturers). Famous chip designers are AMD, Apple, Amazon, Alphabet, and many more.

Fabrication: There are only a handful of fab foundries. Intel, Samsung, and TSMC are the top three companies by sales revenue in this field. While Intel designs its own microchips, other companies like TSMC specialize in manufacturing microchips for other companies and are, therefore, pure-play fab foundries. Among the fab foundries, TSMC (ca. 50% market share, Taiwan), Samsung (ca. 20%, South Korea), GlobalFoundries (ca. 8%, USA), UMC (ca. 7%, Taiwan), and SMIC (ca. 5%, China) are the most notable ones. TSMC delivers its microchips to famous tech players like AMD, Apple, ARM, Broadcom, Nvidia, and Qualcomm. TrendForce and ReportLinker estimated the foundries’ revenue in 2020 at 70 billion dollars, with an average growth of 10% per year over the next decade.

Equipment: The high-tech semiconductor industry needs some of the most advanced machines in the world. Without the most advanced machines, no manufacturer could keep up with the competition. The Dutch company ASML makes lithography systems, which are machines that are used to make chips. All major chipmakers use its technology because ASML lithography systems are the most advanced in this field and years ahead of the competition.

Semiconductor Microchip Shortage

The 2020 global microchip shortage is an ongoing crisis. The demand for microchips is greater than the supply, which has led to major shortages and queues amongst consumers, not only in the information technology sector. According to AlixPartners, the chip shortage could cost the automotive industry around the world 61 billion dollars. So how did this shortage start?

One major reason is the tech war between the USA and China. Other reasons are the outsourcing of AMD’s chip production to TSMC, which created additional pressure on TSMC production plants during the pandemic, and the Covid-19 crisis itself.

Semiconductors are no longer just components, but strategic resources that all major economies must secure.
Arisa Liu (Analyst, Taiwan Economic Research Institute)

Amazingly, there are only a handful of major microchip manufacturers in the world (TSMC, Samsung, Intel). Whoever has secure access to microchips can make their economy more robust against these global shortages. In this way, microchips have become more or less the new oil of the 21st century.

So, in the future, all countries will put increased effort into securing the supply of microchips for their economies. This effect can already be observed in various countries, like the USA, that are strengthening their semiconductor microchip production.

The Future of Freelancer Platforms

The digital transformation of the workplace has only just begun. The notion that you have to move to Silicon Valley to get employed by a world-class organization no longer holds.

Platforms like Fiverr and Upwork give freelancers the possibility to advertise their services to millions of customers remotely. That offers an excellent opportunity for people who want to travel around the world and still want to earn money.

Remote freelancing allows people from third-world countries to easily participate in the western world markets without leaving their homes and to improve their lifestyle. This will also lead to economic growth in these countries, especially in areas with high unemployment rates.

While businesses compete for local talents, remote freelancers give smaller startups a larger talent pool to choose from. Instead of being limited to hiring, for example, a graphic designer in the west, startups that use these freelancer platforms gain access to a far broader and deeper talent pool than those who limit themselves to one geographic area. And for managers, organizing and coordinating a remote team’s work will be crucial to winning recognition and advancement in the coming years.

So what will change in the future? I think that the prices for services that can be done remotely will drop, and they will be more and more outsourced to third-world countries. People who can do their work remotely may move to nicer places and do not need to live where their employer is located. This could have a quite interesting effect in, for example, Europe. Nowadays, people from low-wage countries move to Northern European countries to earn more money, while Northern European citizens move to Southern Europe to enjoy a friendlier climate.

Smartphones and Wearables for Remote Diagnostics

Over the past few years, medical diagnostic apps have been on the rise. Rapidly emerging technologies used to diagnose diseases result in more personalized patient care. About 70% of medical decisions are supported by diagnostics [2]. However, a shortage of specialists and relatively low diagnostic accuracy call for a new diagnostic strategy, in which deep learning may play a significant role. Smartphones and wearable devices can play a key role in health monitoring and diagnostics. They already support remote diagnostics and decrease the workload of GPs.

Diagnostic Apps

Modern smartphones are fitted with several sensors. These sensors allow for the sensing of several health parameters and health conditions.

Built-in sensors in a typical smartphone; the number of sensors is rising [1].

Respiratory sounds are important indicators of respiratory health and respiratory disorders. When a person breathes, the sound emitted is directly related to air movement, changes within lung tissue, and the position of secretions within the lung. Depending on the sound, it is possible to monitor asthma, record respiratory sounds to assess snoring and sleep apnea severity, detect respiratory symptoms like sneezing and coughing, detect bronchitis, bronchiolitis, and pertussis, record wheezes in pediatric populations, and detect chronic obstructive pulmonary disease (COPD).

Leukocoria is an abnormal white reflection from the retina that ophthalmologists use to detect several different eye diseases. As well as being an early indication of retinoblastoma, a pediatric eye cancer, leukocoria can also be a sign of pediatric cataract, Coats’ disease, amblyopia, strabismus, and other childhood eye disorders [4]. A portable 3D-printed device connected to a smartphone takes precise images of the retina to detect back-of-the-eye (fundus) disease at a far lower cost than conventional methods [3]. It is also possible to analyze images of the eye taken with this retinal camera to detect diabetic retinopathy, one of the leading causes of blindness.

Heart rate variability (HRV) is a measure of variations in the time intervals between your heartbeats and describes how “uneven” your heart beats. This metric can be used for different purposes:

to measure stress levels
to diagnose chronic health problems
to assess the immune system
to predict the recovery time after a severe illness
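To make the idea of “uneven” heartbeats concrete, the sketch below computes two widely used HRV statistics from a list of intervals between heartbeats (RR intervals, in milliseconds): SDNN, the standard deviation of the intervals, and RMSSD, the root mean square of successive differences. The interval values are made up for illustration, and the choice of these two statistics is an assumption of this example rather than something prescribed above.

```python
import math
from statistics import stdev


def sdnn(rr_ms):
    """Standard deviation of the RR intervals (overall variability)."""
    return stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR differences (beat-to-beat variability)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


steady = [800, 800, 800, 800, 800]  # perfectly even heartbeat (hypothetical)
varied = [800, 810, 790, 805, 795]  # slightly uneven heartbeat (hypothetical)

for name, rr in (("steady", steady), ("varied", varied)):
    print(f"{name}: SDNN = {sdnn(rr):.1f} ms, RMSSD = {rmssd(rr):.1f} ms")
```

A perfectly regular heartbeat yields zero for both statistics, while any beat-to-beat variation pushes them above zero.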

Ear and mastoid disease can easily be treated with early detection and appropriate medical care.

Skin diseases are widespread nowadays. Smartphone cameras now have such a high resolution that these kinds of diseases can be detected [5].

In the modern world, psychological health issues like anxiety and depression have become very common among the masses. Smartphone technology can be used to diagnose depression and anxiety [2].

And this is not the end of the medical usage of smartphones. Future sensors for analyzing sweat, insulin levels, and more are coming and will support more diagnostic applications for smartphones and wearables.

Datasets

The lack of large training data sets is often mentioned as an obstacle. However, this is only partially correct. Nowadays, hospitals are huge data storages, and databases in, e.g., radiology, are filled with millions of images. There are also large public data sets available on the internet.

What ML model for what task?

A survey named ‘A Survey on Deep Learning in Medical Image Analysis’ concluded that the exact deep learning architecture is not the most important determinant in getting a good solution. More important is expert knowledge about the task, which can provide advantages beyond adding more layers to a CNN. Novel data preprocessing strategies also contribute to more accurate and more robust neural networks [6].

Problems

Some problems with the accuracy of such systems still remain. A diagnostic system with 98% accuracy can be quite impressive, but when we scale this up to a user base of millions, we could overrun our health systems with wrongly diagnosed patients. So it is essential to make these diagnostic systems more and more robust.
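A quick back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes that every user is diagnosed exactly once and that “accuracy” simply means the share of correct results; the user counts are hypothetical, and the split between false positives and false negatives is ignored.

```python
def expected_wrong_diagnoses(users, accuracy):
    """Expected number of wrong results if every user is diagnosed once."""
    return round(users * (1 - accuracy))


for users in (10_000, 1_000_000, 10_000_000):
    wrong = expected_wrong_diagnoses(users, accuracy=0.98)
    print(f"{users:>10,} users -> {wrong:,} wrong diagnoses")
```

Even at 98% accuracy, one million users would already produce about 20,000 wrong results under these assumptions.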

References

[1] Majumder, Sumit, and M. Jamal Deen. “Smartphone sensors for health monitoring and diagnosis.” Sensors 19.9 (2019): 2164.

[2] https://www.bupa.com/newsroom/our-views/the-future-of-diagnostics

[3] https://medicalxpress.com/news/2019-06-portable-device-eye-disease-remotely.html

[4] Munson, Micheal C., et al. “Autonomous early detection of eye disease in childhood photographs.” Science advances 5.10 (2019): eaax6363.

[5] Chan, Stephanie, et al. “Machine Learning in Dermatology: Current Applications, Opportunities, and Limitations.” Dermatology and therapy (2020): 1-22.

[6] Litjens, Geert, et al. “A survey on deep learning in medical image analysis.” Medical image analysis 42 (2017): 60-88.

Defense against Slaughterbot Attacks

Slaughterbots is a video that presents a dramatized near-future scenario where swarms of inexpensive microdrones use artificial intelligence, explosives, and facial recognition to assassinate political opponents by crashing into them. In my opinion, it is one of the most dystopian and depressing near-future scenarios that I know.

When I watched the video for the first time in 2017, I had no idea how we could defend towns against such terror attacks. Shooting at so many microdrones does not make sense. It also makes no sense to block the radio signals because the microdrones fly totally autonomously. Now, some years later, I realized that we could use secure machine learning to defend security-critical areas like shopping malls or train stations.

Many practical machine learning systems, like self-driving cars, operate in the physical world. Adversarial stickers (patches) placed on top of, e.g., traffic signs can fool self-driving cars.

Patch attacks projected on monitors in hallways, train stations, and so on could fool the facial recognition systems of such suicide bombers. In this scenario, it is important to iterate over many patch attacks that were pretested on test classifiers to find a potential weakness in the microdrones. Once an effective attack has been found, we could project it on all available screens in the attacked area.

Given that we already have physical barriers protecting shopping malls against terrorist truck attacks, nuclear underground utilities, and explosives in Swiss bridges, it is not hard to imagine that we could develop an emergency program for publicly available monitors that could help defend against Slaughterbot attacks.

Problems of Generating Real-World Patch Attacks

Of course, some problems remain when generating real-world patch attacks. For example, images of the same object are unlikely to be exactly the same under real-world conditions. To successfully realize physical attacks, attackers need to find image patches that are independent of the exact imaging conditions, such as changes in pose and lighting. For that, we need to find adversarial patches that generalize beyond a single image. To enhance the generality of the patch, we look for patches that can cause any image in a set of inputs to be misclassified. For that reason, we formalize the generation of real-world patch attacks as an optimization problem.
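One common way to write such an optimization problem, given here only as an illustrative sketch and not necessarily the exact formulation meant above, is to maximize the classifier’s expected loss over images, patch locations, and transformations. All symbols are assumptions of this sketch: $f$ is the classifier, $A(p, x, l, t)$ places patch $p$ on image $x$ at location $l$ under transformation $t$ (pose, lighting, scale), $y_x$ is the true label of $x$, and $\mathcal{L}$ is the classification loss.

```latex
\hat{p} \;=\; \arg\max_{p} \;
\mathbb{E}_{x \sim X,\; l \sim L,\; t \sim T}
\Big[ \mathcal{L}\big( f(A(p, x, l, t)),\; y_x \big) \Big]
```

Taking the expectation over many images, locations, and transformations is what pushes the patch to work independently of the exact imaging conditions rather than on a single image.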

To find a fairly universal patch for a certain classifier, it is important that we solve the optimization problem for many different classifiers before the actual attack.

AI Safety Concerns in Warfare

Governments around the world are increasingly investing in autonomous military systems. Many are already developing programs and technologies that they hope will give them an edge over their adversaries.

AI in Warfare

Target Systems Analysis (TSA) and Target Audience Analysis (TAA) are intelligence-related methods to develop a deep understanding of potential areas for operations. Detection of military assets on the ground can be performed by applying deep learning-based object detectors on drone surveillance footage. For military forces on the ground, the challenge is to conceal their presence as much as possible from discovery from the air. A common way of hiding military assets from sight is camouflage, for example, by using camouflage nets.

Algorithmic Targeting: Today, autonomous weapon platforms use computer vision to identify, track, and shoot targets. These algorithmic targeting technologies are now so precise that the military even adds randomness to the targeting process to spread the bullets over the target. Otherwise, the bullets would just fly through the holes that were already made.

Problem

The AI systems mentioned above outperform human experts dramatically. But there are still major issues with these technologies: these systems are vulnerable to adversarial noise. You can imagine this (adversarial) noise as blurry pixels on your Instagram photos. Usually, these pixel perturbations are so small that none of your human followers will see them, but when Instagram uses image classification software to check your account for illegal posts, it could happen (in the worst case) that Instagram blocks you.
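How a tiny perturbation can flip a decision is easiest to see on a toy model. The sketch below uses a hand-made linear classifier and shifts every input value by at most `eps` in the direction that lowers the score, which is the idea behind the fast gradient sign method; the weights, the input, and `eps` are all hypothetical, and real image classifiers are far larger.

```python
def score(weights, bias, x):
    """Linear score; the input is classified as positive if the score is > 0."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, x, eps):
    """Move each input value by at most eps against the score's gradient.

    For a linear model, the gradient of the score with respect to the
    input is simply the weight vector.
    """
    return [v - eps * sign(w) for w, v in zip(weights, x)]


weights = [0.5, -0.25, 0.75, -1.0]  # hypothetical classifier weights
bias = 0.0
x = [0.2, 0.1, 0.1, 0.1]            # hypothetical "pixel" values

x_adv = perturb(weights, x, eps=0.05)

print("original score: ", score(weights, bias, x))
print("perturbed score:", score(weights, bias, x_adv))
print("largest change: ", max(abs(a - b) for a, b in zip(x, x_adv)))
```

No input value changes by more than 0.05, yet the score moves from positive to negative, so the predicted class flips.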

Manipulating these AI systems

So what can you do when you have forgotten your camouflage net or when you don’t want a huge hole in your battleship? Well, you could try to generate the adversarial noise mentioned above and stick it on your battleship or airplane. From a mobility point of view, it would be much easier to print a specific pattern on top of your battleship to hide from adversarial drones than to carry camouflage nets.

A plane with an adversarial noise patch camouflage that can hide it from being automatically detected from the air (Source: https://arxiv.org/pdf/2008.13671.pdf).

Summary

In this post, I wanted to describe the current weaknesses of AI systems, especially in warfare. The biggest problem of current AI technology (not only in warfare) is that it is quite vulnerable to (adversarial) noise. This should be considered before releasing fully autonomous war machines that could cause huge damage because of perturbed pixels in the input data.