Amazon Is Investigating Perplexity Over Claims of Scraping Abuse

Amazon's cloud division has launched an investigation into Perplexity AI. At issue is whether the AI search startup is violating Amazon Web Services rules by scraping websites that attempted to prevent it from doing so, WIRED has learned.
An AWS spokesperson, who talked to WIRED on the condition that they not be named, confirmed the company's investigation of Perplexity.
WIRED had previously found that the startup, which has backing from the Jeff Bezos family fund and Nvidia and was recently valued at $3 billion, appears to rely on content from scraped websites that had forbidden access through the Robots Exclusion Protocol, a common web standard.
While the Robots Exclusion Protocol is not legally binding, terms of service generally are.
The Robots Exclusion Protocol is a decades-old web standard that involves placing a plaintext file on a domain to indicate which pages should not be accessed by automated bots and crawlers.
While companies that use scrapers can choose to ignore this protocol, most have traditionally respected it.
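As a rough illustration of how a crawler that respects the protocol might consult such a file before fetching a page, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the bot names `ExampleBot` and `OtherBot`, and the `example.com` URLs are hypothetical and are not taken from any site named in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
# "ExampleBot" is blocked from the whole site; every other bot
# is blocked only from paths under /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# Parse the rules from the in-memory text instead of fetching them,
# so the sketch runs without network access.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/private/report"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Nothing in the protocol enforces the answer: the check is a courtesy the crawler performs on itself, which is why a scraper can simply skip it.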
The Amazon spokesperson told WIRED that AWS customers must adhere to the robots.txt standard.
Scrutiny of Perplexity's practices follows a June 11 report from Forbes that accused the startup of stealing at least one of its articles.
WIRED investigations confirmed the practice and found further evidence of scraping abuse and plagiarism by systems linked to Perplexity's AI-powered search chatbot.
Engineers for Condé Nast, WIRED's parent company, block Perplexity's crawler across all its websites using a robots.txt file.
WIRED found the company had access to a server using an unpublished IP address, 44.221.181.252, which visited Condé Nast properties at least hundreds of times in the past three months, apparently to scrape Condé Nast websites.
The machine associated with Perplexity appears to be engaged in widespread crawling of news websites that forbid bots from accessing their content.
Spokespeople for The Guardian, Forbes, and The New York Times also say they detected the IP address on their servers multiple times.
WIRED traced the IP address to a virtual machine known as an Elastic Compute Cloud instance hosted on AWS, which launched its investigation after we asked whether using AWS infrastructure to scrape websites that forbade it violated the company's terms of service.
He refused to name the company, citing a nondisclosure agreement.


This Cyber News was published on www.wired.com. Publication date: Thu, 27 Jun 2024 22:43:05 +0000


