What Are Animals Saying? AI May Finally Tell Us

Alones/Shutterstock.com

The world sounds very different when you listen like a scientist instead of a tourist. Hidden in ocean waves, forests, and the shrubbery of city parks are constant broadcasts from whales, frogs, birds, and insects, most of which we can’t understand. Earth Species Project (ESP) believes artificial intelligence can help us finally tune in to animals. By pairing powerful language models with massive collections of animal vocalizations, ESP aims to translate mysterious chirps and clicks into meaningful information. If they succeed, our relationship with the rest of nature could be transformed as profoundly as human society was by the invention of radio.

What Is the Earth Species Project?

Earth Species Project is a nonprofit research organization dedicated to decoding nonhuman communication. Holly Brewer, Communications Manager at ESP, emphasizes: “Our long-term vision is a relationship with the rest of nature that allows the diversity of life to thrive.”

To achieve this goal, the organization focuses on developing AI models to study animal vocalizations and behavior. The group works with biologists, field recordists, and conservation organizations who collect huge audio datasets from species such as elephants, crows, whales, and frogs. Their team brings together machine learning engineers, ethologists, and nonprofit leaders who share a single goal: to amplify the “voices” of nature and rebalance how humans relate to other species.

A Murder of Crows

We know that highly intelligent species like crows communicate with each other. Can we tap into that communication with AI?

Why Earth Species Project Says AI Is the Right Tool for Animal Communication

Human hearing and attention have limits, but AI models can scan months of audio in hours. Traditional bioacoustics tools often need custom training for each species and task, which slows down research. Modern audio and language models can learn patterns across species, detect faint calls in noisy environments, and connect sound types with behavior or context, all without manual labels for every clip. ESP says its approach treats animal communication as a kind of language problem, so the same kinds of models that power chatbots can now help reveal structure in birdsong, whale calls, and other vocalizations.

What Makes NatureLM‑audio New

NatureLM-audio is ESP’s flagship foundation model, designed specifically to analyze animal sounds rather than general human audio. It links a powerful audio encoder to a modern language model so it can “listen” to recordings and answer questions about them in plain English. The model was trained on a diverse mix of data—including large archives of wildlife recordings, human speech, and music—which helps it recognize communication patterns in both familiar and unfamiliar species. Most excitingly, NatureLM-audio can generalize across species and tasks, allowing insights from one animal species to transfer to others. This means researchers can try new analyses by changing prompts instead of retraining separate models for each study.

Aerial view of a large dark North Atlantic right whale mother swimming beside her small calf in deep blue ocean water.

Researchers are training AI on a variety of animal vocalizations, including whalesong.

How NatureLM‑audio Actually Works

When a user uploads an audio clip to a NatureLM-audio interface, the sound first passes through a specialized encoder that turns the waveform into a detailed numerical representation. That representation flows into a language model based on Llama 3.1, which has been fine-tuned so it can connect acoustic patterns with text labels and natural language answers. Because the model was trained on paired audio and text, it can respond to prompts like “Which species are calling here?” or “Describe what is happening in this recording” with short captions or lists, much like an expert familiar with the animals.
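For readers who like to see the idea in code, here is a toy sketch of that data flow in Python. It is not ESP's implementation: the function names (`synth_tone`, `encode`, `build_model_input`) are hypothetical, and the two hand-made features (loudness and zero-crossing rate) only stand in for the learned representation a real audio encoder would produce. The point is simply the shape of the pipeline: a waveform becomes a sequence of numbers, and those numbers travel to the language model alongside a plain-English question.

```python
import math

def synth_tone(freq_hz, seconds=0.5, sample_rate=8000):
    """Generate a pure sine tone as a stand-in for a recorded clip."""
    n = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def encode(waveform, frame_size=400):
    """Toy 'encoder': one (energy, zero-crossing rate) pair per frame.

    A real audio encoder learns far richer features; this only
    illustrates turning a waveform into a numerical representation.
    """
    features = []
    for start in range(0, len(waveform) - frame_size + 1, frame_size):
        frame = waveform[start:start + frame_size]
        energy = sum(x * x for x in frame) / frame_size
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((energy, crossings / (frame_size - 1)))
    return features

def build_model_input(features, prompt):
    """Bundle the audio representation with the text question (hypothetical format)."""
    return {"audio_embeddings": features, "prompt": prompt}

features = encode(synth_tone(440))
model_input = build_model_input(features, "Which species are calling here?")
print(len(features), "frames ready to send with prompt:", model_input["prompt"])
```

In a real system, the hand-written `encode` step would be replaced by a trained neural network, and the language model would generate its answer from the combined input.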

Tasks NatureLM‑audio Can Handle

NatureLM-audio can classify which species appear in a recording, detect when calls happen, and even estimate how many individuals might be present. It can identify thousands of species from multiple groups, like birds, whales, and frogs, without needing custom training for each one. The model is also designed to predict call types, such as whether a vocalization is a contact call, alarm call, or song, and it can generate short text captions that summarize what it “hears.” In some bird datasets, it goes further by predicting life stage—such as chick or adult—which could help researchers monitor the age structure of wild populations.
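To give a feel for what "detecting when calls happen" means in practice, the sketch below shows a classic loudness-threshold detector in Python. This is a deliberately simple baseline, not the method NatureLM-audio uses. The function name `detect_calls`, the threshold, and the sample energy values are all made up for illustration.

```python
def detect_calls(energies, threshold, frame_seconds):
    """Return (start, end) times in seconds for runs of frames louder than threshold."""
    events = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i  # a call begins
        elif energy <= threshold and start is not None:
            events.append((start * frame_seconds, i * frame_seconds))
            start = None  # the call has ended
    if start is not None:
        # The recording ended while a call was still in progress.
        events.append((start * frame_seconds, len(energies) * frame_seconds))
    return events

# Made-up loudness values, one per half-second frame.
energies = [0.01, 0.02, 0.60, 0.70, 0.03, 0.01, 0.55, 0.02]
events = detect_calls(energies, threshold=0.1, frame_seconds=0.5)
print(events)  # [(1.0, 2.0), (3.0, 3.5)]
```

A detector like this breaks down when wind, rain, or traffic is as loud as the animal, which is one reason researchers turn to learned models for noisy field recordings.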

Listening, Not Conversing

Right now, NatureLM-audio mainly listens; however, ESP’s long-term vision reaches beyond passive monitoring. By decoding patterns in calls and linking them with behavior, context, and social structure, researchers can begin to understand what different signals mean for different species. As these patterns become more reliable, future systems might generate synthetic calls or playback sequences designed to test whether animals respond as if “understanding” a message, such as a warning or invitation. ESP emphasizes that the goal is to listen more deeply so we can restore our relationship with the natural world. In the words of Brewer: “We decode animal communication with advanced AI to illuminate the diverse intelligences on earth. Not to have two-way communication with animals.”

A No‑Code Tool for Everyday Researchers

ESP and its partners released a no-code demo so people can experiment with NatureLM-audio without writing a single line of code. Users upload recordings, choose tasks such as species identification or call detection, and then ask questions in simple language like “Is there a frog calling here?” or “Which bird species show up in this clip?” This kind of interface makes bioacoustics accessible to park rangers, citizen scientists, and small conservation groups that may not have full-time data scientists but still collect valuable audio in the field.

Real‑World Tests, From Frogs to Whales

green-eyed tree frog

A pilot project in Australia focuses on the vocalizations of frogs.

The model and its earlier versions have already been tested on real-world projects. In one collaboration, the FrogID citizen science project used NatureLM-audio to help analyze large numbers of frog recordings gathered by volunteers across Australia. The model’s ability to generalize across species and tasks reduces manual labeling time and speeds up biodiversity assessments, which supports more timely conservation decisions. ESP also partners with biologists who study whales, elephants, and carrion crows, where paired sensor data and audio are starting to reveal how vocalizations link to movement, social roles, or environmental changes.

Who Is Behind ESP’s Work?

Earth Species Project was co-founded by technologist Aza Raskin, entrepreneur Britt Selvitelle, and Katie Zacarian, along with other collaborators who saw how rapidly language models were improving. The organization now includes AI researchers, engineers, product builders, ethicists, and impact specialists, all focused on decoding and amplifying the voices of nature. ESP also collaborates with academic institutions, conservation organizations, and citizen science projects worldwide, which contribute data and domain expertise that keep the AI grounded in real ecology.

Ethics, Openness, and Safeguards

Working at the edge of AI and animal communication raises tough ethical questions, and ESP tries to address them directly. The group highlights risks such as disturbing animals with excessive playbacks, misusing decoded signals to exploit animals, or allowing surveillance tools to harm ecosystems instead of helping them. Brewer states, “Many of these risks stem from a deeper issue of human exceptionalism and disconnection from the rest of nature, which has historically shaped how technologies are developed and deployed. Without careful stewardship, advances in understanding animal communication could reinforce that divide.”

To reduce those risks, ESP publishes technical details, open-sources many of its models, and encourages broad public discussion about how interspecies understanding should guide policy, conservation, and governance. In doing so, the organization hopes not only to accelerate science but to create a shift in how humans relate to the natural world. The goal is not to control nature but to listen to its communication in ways that support thriving habitats and more thoughtful human decisions.

Why NatureLM‑audio Matters for the Future

NatureLM-audio and related tools could help make the living world a more understandable and responsive partner, rather than just a silent backdrop. If we can rapidly detect stressed populations, changing migration routes, or new behaviors through their soundscapes, conservation can move from late emergency reactions to earlier, informed action. For young people growing up with AI, these systems offer a chance to see technology not only as a human convenience tool but also as a bridge to the many other minds that share our planet. Listening better may not make animals “talk” like characters in a movie, but it can make us better neighbors—and that might be the most important translation of all.

Drew Wood

About the Author

Drew is a college professor and freelance writer who graduated from the University of Virginia. His travels have taken him to 25 countries and 44 states, where he has enjoyed learning about wildlife in a wide range of environments. In addition to his love of animals, he enjoys scary movies, landscaping, strategy games, and philosophical discussions over a cup of coffee. He is also an emotional support human to a neurotic Spanish Water Dog and a hyperactive Chihuahua mix.
