Alex Schapiro

How I Reverse Engineered a Billion-Dollar Legal AI Tool and Found 100k+ Confidential Files

2025-12-02T04:00:00-05:00

Update: This post received a large amount of attention on Hacker News — see the discussion thread.

Update #2: These things happen routinely at every big company, but often the person who finds the vulnerability is paid and signs an NDA. Filevine allowed me to disclose this vulnerability, and it should not be weaponized against them – that just drives companies to hide vulnerabilities instead of being transparent about them.

Timeline & Responsible Disclosure

Initial Contact: Upon discovering this vulnerability on October 27, 2025, I immediately reached out to Filevine’s security team via email.

November 4, 2025: Filevine’s security team thanked me for the writeup and confirmed they would review the vulnerability and fix it quickly.

November 20, 2025: I followed up to confirm the patch was in place from my end, and informed them of my intention to write a technical blog post.

November 21, 2025: Filevine confirmed the issue was resolved and thanked me for responsibly reporting it.

Publication: December 3, 2025.

The Filevine team was responsive, professional, and took the findings seriously throughout the disclosure process. They acknowledged the severity, worked to remediate the issues, allowed responsible disclosure, and maintained clear communication. Following my conversations with the Filevine team, it is clear that this incident affected only a single law firm: no other Filevine clients were impacted, this was a non-production instance, and this was not a system-wide Filevine issue. Filevine was appreciative of my efforts to find and alert them to this issue. This is another great example of how organizations should handle security disclosures.

AI legal-tech companies are exploding in value, and Filevine, now valued at over a billion dollars, is one of the fastest-growing platforms in the space. Law firms feed tools like this enormous amounts of highly confidential information.

Because I’d recently been working with Yale Law School on a related project, I decided to take a closer look at how Filevine handles data security. What I discovered should concern every legal professional using AI systems today.

When I first navigated to the site to see how it worked, it seemed that I needed to be part of a law firm to actually play around with the tooling, or else request an official demo. However, I knew that companies often leave a demo environment open, so I used a technique called subdomain enumeration (which I had first heard about in Gal Nagli’s article last year) to see if there was one. I found something much more interesting instead.
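The post does not say which enumeration method or tool was used. Subdomain enumeration is often done passively (for example, from certificate transparency logs), but the simplest active variant just tries names from a wordlist against DNS. The sketch below shows only that variant; the wordlist is hypothetical, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional


def candidate_hosts(domain: str, words: Iterable[str]) -> List[str]:
    """Turn wordlist entries into candidate names such as 'demo.example.com'."""
    return [f"{w.strip().lower()}.{domain}" for w in words if w.strip()]


def resolve_a(host: str) -> Optional[str]:
    """Return an IPv4 address for host, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def enumerate_subdomains(domain: str, words: Iterable[str],
                         resolve: Callable[[str], Optional[str]] = resolve_a) -> Dict[str, str]:
    """Map every candidate subdomain that resolves to its address."""
    found: Dict[str, str] = {}
    for host in candidate_hosts(domain, words):
        address = resolve(host)
        if address is not None:
            found[host] = address
    return found
```

For example, enumerate_subdomains("example.com", ["www", "demo", "staging"]) returns only the names that resolve. Wildcard DNS records can make every candidate resolve, so real tools also check for that case.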

I saw a subdomain called margolis.filevine.com. When I navigated to that site, I was greeted with a loading page that never resolved.

I wanted to see what was actually loading, so I opened Chrome’s developer tools, but saw no Fetch/XHR requests (the requests you would usually expect to see if a page is loading data). Then I decided to dig through some of the JavaScript files to see if I could figure out what was supposed to be happening. One JS file contained a POST request along the lines of await fetch(`${BOX_SERVICE}/recommend`). This piqued my interest – recommend what? And what was BOX_SERVICE? That variable was not defined in the JS file the fetch would be called from, but (after looking through minified code, which SUCKS to do) I found it in another one: “dxxxxxx9.execute-api.us-west-2.amazonaws.com/prod”. Now I had a new endpoint to test; I just had to figure out the correct payload structure for it. After reading through more minified JS to work out what this endpoint expected, I was able to construct a working payload to /prod/recommend:
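Hunting for endpoints in bundled JavaScript largely comes down to pattern matching. The following is a rough sketch, not the author’s process: it finds template-literal fetch calls of the shape quoted above and looks for a string literal assigned to the same variable name in any bundle. The regexes are assumptions and will miss many real cases, since minifiers routinely rename variables and inline constants.

```python
import re
from typing import Dict, List, Optional, Tuple

# fetch(`${SOME_VAR}/path`) -> captures ("SOME_VAR", "/path")
FETCH_RE = re.compile(r"fetch\(\s*`\$\{(\w+)\}(/[^`]*)`")
# SOME_VAR:"value" or SOME_VAR="value" (object property or assignment)
ASSIGN_RE = re.compile(r"\b(\w+)\s*[:=]\s*[\"']([^\"']+)[\"']")


def find_endpoints(bundles: List[str]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (variable, path, base URL if a matching string literal was found)."""
    definitions: Dict[str, str] = {}
    for source in bundles:
        for name, value in ASSIGN_RE.findall(source):
            definitions.setdefault(name, value)  # keep the first definition seen
    hits = []
    for source in bundles:
        for var, path in FETCH_RE.findall(source):
            hits.append((var, path, definitions.get(var)))
    return hits
```

Searching across all bundles at once matters here, because, as in the post, the call site and the variable’s value can live in different files.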

{"projectName":"Very sensitive Project"}

(the name could be anything, of course). No authorization token was needed, and the server responded with a JSON object that included boxFolders and a boxToken.
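To make the “no authorization” point concrete, here is a sketch of the request described above using only Python’s standard library. The base URL is a placeholder because the real API ID is redacted in the post, and the request is only built, not sent.

```python
import json
import urllib.request


def build_recommend_request(base_url: str, project_name: str) -> urllib.request.Request:
    """Build (but do not send) an unauthenticated POST to {base_url}/recommend."""
    body = json.dumps({"projectName": project_name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/recommend",
        data=body,
        headers={"Content-Type": "application/json"},  # note: no Authorization header
        method="POST",
    )
```

Sending it would be a single urllib.request.urlopen(req) call; the absence of any credential in the request is exactly what made the endpoint exploitable.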

At first I didn’t entirely understand the impact of what I saw. No matter which project name I passed in, I was recommended the same boxFolders and couldn’t seem to access any files. Then, not realizing I had stumbled upon something massive, I turned my attention to the boxToken in the response.

After reading some documentation on the Box API, I realized this was a live, fully scoped admin token with maximum access to this law firm’s entire Box filesystem (similar to an internal shared Google Drive). That includes all confidential files, logs, user information, etc.

Once I had proven the impact (searching for “confidential” returned nearly 100k results), I immediately stopped testing and responsibly disclosed this to Filevine. They responded quickly and professionally and remediated this issue.
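The post does not show how the search was run. As an illustration of how a single bearer token translates into search access, this sketch builds a request against Box’s public content-search endpoint (GET https://api.box.com/2.0/search with a query parameter) and reads the total_count field from the JSON reply. It sends nothing, and the token value is a placeholder.

```python
import json
import urllib.parse
import urllib.request

BOX_SEARCH_URL = "https://api.box.com/2.0/search"


def build_search_request(access_token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a Box content-search request authorized by a bearer token."""
    url = f"{BOX_SEARCH_URL}?{urllib.parse.urlencode({'query': query})}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def total_hits(response_body: bytes) -> int:
    """Read total_count from a Box search response body."""
    return int(json.loads(response_body).get("total_count", 0))
```

Nothing in this request ties it to a particular user or folder; whatever scope the token carries is what the search can see, which is why an admin-scoped token in a public response is so severe.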

If someone had malicious intent, they would have been able to extract every single file used by Margolis lawyers – countless documents protected by HIPAA and other legal standards, internal memos/payrolls, literally millions of the most sensitive documents this law firm has in their possession. Documents protected by court orders! This could have been a real nightmare for both the law firm and the clients whose data would have been exposed.

To companies who feel pressure to rush into the AI craze in their industry – be careful! Always make sure that the companies you entrust with your most sensitive information actually secure that data.

Note: After publishing this article, I was contacted by someone from the law firm Margolis PLLC asking me to confirm that the affected law firm was not theirs. I can confirm it was not.

Brute-Forceable Airline Reservation API Left Millions of Passenger Records Vulnerable

2025-11-20T00:00:00-05:00

Timeline & Responsible Disclosure

Initial Contact: Upon discovering this vulnerability on October 15, 2025, I immediately reached out to security contacts at Avelo Airlines via email.

October 16, 2025: The Avelo cybersecurity team responded quickly and professionally. We had productive email exchanges where I detailed the vulnerability, including the lack of last name verification and rate limiting on reservation endpoints.

November 13, 2025: Avelo pushed a fix to production and notified me that the vulnerabilities were patched. I independently verified the fixes were in place before publication, and informed the Avelo team of my intention to write a technical blog post about this vulnerability, highlighting their cooperative and responsive approach to security disclosure.

Publication: November 20, 2025.

The Avelo team was responsive, professional, and took the findings seriously throughout the disclosure process. They acknowledged the severity, worked quickly to remediate the issues, and maintained clear communication. This is a model example of how organizations should handle security disclosures.

After my 9 AM Akkadian class, I sat down to change my flight out of New Haven with Avelo Airlines, and noticed that my computer was making some unusual requests. After digging a little further, I stepped into a landmine of customer information exposure. In the wrong hands, this critical vulnerability could allow an attacker to access full reservation details, including PII, government ID numbers, and partial payment info, for every Avelo passenger, past and present.

Before I walk you through my work on that Tuesday morning, let’s establish how airlines generally manage their reservations.

How Airline Logins Should Work

Normally, to access a flight reservation (which often contains sensitive information like passport numbers, Known Traveler Numbers, and partial credit card data), you need at least two pieces of information: a confirmation code and the passenger’s last name.

This two-factor system is generally secure. The space of all 6-character alphanumeric confirmation codes combined with all possible last names is astronomically large, making it practically impossible to “guess” a valid pair.

But what if the last name check was missing?

Suddenly, the problem becomes much simpler. The entire keyspace an attacker needs to guess is just the confirmation code. In Avelo’s case, their codes are 6-character alphanumeric strings ([A-Z0-9]).

Let’s do the math:

Alphabet: 36 characters (26 letters + 10 digits)
Length: 6
Total Combinations: 36^6 = 2,176,782,336 (~2.18 billion)
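The keyspace arithmetic above can be checked in a few lines of Python, assuming (as the post does) that every character is drawn from [A-Z0-9]:

```python
import string

ALPHABET = string.ascii_uppercase + string.digits  # [A-Z0-9]
CODE_LENGTH = 6


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct fixed-length strings over an alphabet."""
    return alphabet_size ** length


TOTAL = keyspace(len(ALPHABET), CODE_LENGTH)
print(f"{len(ALPHABET)}^{CODE_LENGTH} = {TOTAL:,}")  # 36^6 = 2,176,782,336
```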

That’s a big number, but it’s not “astronomically large.” It’s well within the reach of a modern brute-force attack.

The Attack Timeline

How long would it take to try all 2.18 billion combinations? The time is just 2.18 billion / (requests per second).

At 1,000 req/s (a modest script): 2.18 million seconds, or ~25 days.
At 10,000 req/s (a decent server): 218,000 seconds, or ~2.5 days.
At 100,000 req/s (a small cluster of servers, costing $400-$700)¹: 21,800 seconds, or ~6 hours.
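The same division gives the exhaustive-search times in the list above. This assumes a constant request rate and no repeated guesses, so it is a back-of-the-envelope estimate rather than a model of a real attack:

```python
TOTAL = 36 ** 6


def exhaustive_seconds(total: int, requests_per_second: float) -> float:
    """Time to try every code exactly once at a constant request rate."""
    return total / requests_per_second


for rate in (1_000, 10_000, 100_000):
    secs = exhaustive_seconds(TOTAL, rate)
    print(f"{rate:>7,} req/s: {secs:>12,.0f} s = {secs / 86_400:5.1f} days = {secs / 3_600:6.1f} h")
```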

Bottom line: If Avelo’s flight system has no rate limiting and doesn’t require a last name, an adversary running at 100,000 req/s could extract all passenger data in about 6 hours for less than a thousand dollars.

Even Faster Than 6 Hours

Even worse, an attacker doesn’t need to run for 6 hours. With an estimated 8 million tickets sold, roughly 1 in every 270 guesses is a hit (2.18B / 8M ≈ 272). An attacker would start getting valid PII back within seconds.
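The hit-rate claim can be made slightly more precise. Assuming about 8 million valid codes (the post’s estimate, treated here as one distinct code per ticket) and guesses drawn uniformly at random, the number of guesses until the first hit follows a geometric distribution with mean 1/p:

```python
TOTAL_CODES = 36 ** 6
VALID_CODES = 8_000_000  # assumption from the post: one distinct code per ticket sold

p_hit = VALID_CODES / TOTAL_CODES  # chance that one random guess is a real code
mean_guesses = 1 / p_hit           # mean of the geometric distribution


def p_at_least_one_hit(n_guesses: int) -> float:
    """Probability that n independent random guesses include at least one valid code."""
    return 1 - (1 - p_hit) ** n_guesses


print(f"about 1 in {mean_guesses:.0f} guesses is valid")
print(f"P(hit within 1,000 guesses) = {p_at_least_one_hit(1_000):.3f}")
```

At even the “modest” 1,000 requests per second, an average of about 272 guesses to the first hit is well under a second.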

Back to the Story: Finding the Flaw

This was all just theory until I looked at my network traffic. As I was changing my reservation, I saw a GET request to an API endpoint:

https://www.aveloair.com/payment/services/reservation/{code}

The parameter at the end didn’t look like a reservation code, but the response contained all the relevant reservation data, so I decided to probe further. On a hunch, I swapped that value for my actual 6-character code and re-sent the request.

Voila. The server responded with a massive JSON object containing my entire reservation.

This endpoint wasn’t asking for my last name. The only other security was a standard authentication cookie… but was that cookie tied to my reservation?

I quickly texted a friend for their old Avelo confirmation code. I plugged it into the URL, kept my own cookie, and hit send. But there was no way it could poss-

It worked.

I was looking at their full reservation. Any valid authentication cookie could be used to query any reservation, using only the 6-character code. The theoretical flaw was real.
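The root cause was authorization, not authentication: any logged-in cookie was accepted for any code. A minimal sketch of the missing check follows, assuming a hypothetical data model in which a reservation records the owning account (if any) and the passenger’s last name. The lookup succeeds only if the session owns the booking or the caller supplies the matching last name; none of these names come from Avelo’s actual API.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reservation:  # hypothetical model, not Avelo's schema
    code: str
    last_name: str
    owner_account_id: Optional[str]  # None for bookings not tied to an account


class Forbidden(Exception):
    """Raised when the caller may not view the reservation."""


def _same_name(a: str, b: str) -> bool:
    # Case-insensitive, constant-time comparison of last names (bytes handle non-ASCII).
    return hmac.compare_digest(a.strip().casefold().encode(), b.strip().casefold().encode())


def authorize_lookup(res: Reservation, session_account_id: Optional[str],
                     supplied_last_name: str) -> Reservation:
    """Return the reservation only if the session owns it or the last name matches."""
    owns = session_account_id is not None and session_account_id == res.owner_account_id
    name_ok = bool(supplied_last_name.strip()) and _same_name(supplied_last_name, res.last_name)
    if not (owns or name_ok):
        raise Forbidden(f"not authorized for reservation {res.code}")
    return res
```

With this check in place, a valid cookie plus a guessed code is no longer enough; the attacker would also need the matching last name, which restores the astronomically large search space.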

Executing the Attack: No Rate Limiting

The only remaining (partial) defense was rate-limiting. I wrote a quick multi-threaded Python script to generate random 6-character codes and hit the endpoint.
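The post does not include the script. The sketch below only illustrates the general shape (random [A-Z0-9] codes checked concurrently with a thread pool); the check function is a hypothetical stand-in supplied by the caller, and no endpoint is hard-coded.

```python
import random
import string
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional

ALPHABET = string.ascii_uppercase + string.digits  # [A-Z0-9]


def random_codes(n: int, length: int = 6, seed: Optional[int] = None) -> List[str]:
    """Generate n random codes; a seed makes runs reproducible."""
    rng = random.Random(seed)
    return ["".join(rng.choices(ALPHABET, k=length)) for _ in range(n)]


def find_valid(codes: Iterable[str], check: Callable[[str], bool], workers: int = 32) -> List[str]:
    """Call check(code) concurrently and keep the codes it accepts, in input order."""
    codes = list(codes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        accepted = list(pool.map(check, codes))
    return [code for code, ok in zip(codes, accepted) if ok]
```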

The requests flew. There was no WAF, no IP blocking, no CAPTCHA.

The script quickly finding valid reservation codes

Within minutes, my script was logging hundreds of valid reservations. Troves of data were being returned, including from passengers flying on government business with @dot.gov and @faa.gov email addresses.

A successful hit returned the entire reservation object. This was a complete data breach for each passenger – including myself!

(Note: During further testing, I discovered a similar vulnerability on a different reservation endpoint. I promptly notified the Avelo team, and they patched that endpoint as well before publication.)

What Data Was Leaked?

For every valid code, the API returned:

Full Passenger PII: FullName, DateOfBirth, Gender
Government IDs: IDDocuments.IDNumber (this field contained Known Traveler Numbers (KTNs) and, in other cases, passport numbers)
Contact Info: phone numbers, email addresses
Full Itinerary: Flight numbers, dates, times, and SeatLocation
Payment Details: CardNumber (masked: ************8), DateTimeExpiration, and billing Address.PostalCode
Vouchers: PaymentInternals.AccountNumber and Amount.Value
PCI Data: PaymentCards.TrackData (this field appeared to contain partial magnetic-stripe data)
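One response-side mitigation is to never serialize fields the client does not need. A minimal sketch, assuming a hypothetical JSON-like reservation object that uses the field names listed above, removes a denylist of keys at any depth. An allowlist of the fields the page actually renders would be the stricter choice.

```python
from typing import Any, FrozenSet

# Keys from the list above; treating exactly these as sensitive is an assumption.
SENSITIVE_KEYS: FrozenSet[str] = frozenset({"IDNumber", "CardNumber", "TrackData"})


def redact(obj: Any, sensitive: FrozenSet[str] = SENSITIVE_KEYS) -> Any:
    """Return a copy of a JSON-like value with sensitive keys dropped at every depth."""
    if isinstance(obj, dict):
        return {k: redact(v, sensitive) for k, v in obj.items() if k not in sensitive}
    if isinstance(obj, list):
        return [redact(item, sensitive) for item in obj]
    return obj
```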

Example of exposed payment card data returned by the API

Example of exposed Known Traveler Number (KTN) and other PII in API response

The Fallout

This flaw was critical. An attacker could:

Run the 6-hour brute-force attack to enumerate millions of valid passenger reservation codes (PNRs) — or simply run the script for a few minutes and start harvesting valid passenger data immediately
Extract comprehensive PII including full names, dates of birth, contact information, flight itineraries, and government ID numbers (Known Traveler Numbers and passport numbers) for identity theft and fraud
Access partial payment card data including last 4 digits, expiration dates, and billing zip codes
View complete travel history and passenger boarding status
Modify or cancel any Avelo passenger’s reservation, causing widespread travel disruption

I immediately disclosed this to the Avelo team. They were responsive, professional, and took the findings seriously, patching the issues promptly.

Key Takeaways

This incident is a stark reminder of how critical simple security checks are. A single missing lastName check and an absent rate-limit configuration exposed millions of sensitive passenger records to trivial enumeration.

For developers:

Always require multiple factors for accessing sensitive data (e.g., confirmation code + last name)
Implement rate limiting on all enumerable endpoints
Ensure authentication cookies are properly scoped to user sessions
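For the rate-limiting point, a per-client token bucket is one common approach. This sketch is an in-memory, single-process illustration with hypothetical parameters (refill rate and burst capacity), keyed by whatever client identifier the service trusts. Rate limiting by itself only slows enumeration; the authorization check is what actually closes the hole.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)  # refill since last call
        if tokens >= 1:
            self._buckets[client_id] = (tokens - 1, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False
```

A production deployment would keep the buckets in shared storage so that every server enforces the same limit.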

I’m glad we could get this fixed, and I hope this write-up helps other developers avoid similar pitfalls.

¹ AWS Lambda: requests are billed at $0.20 per million, plus compute billed per GB-second; at 2.18B requests, the request charges alone come to about 2,176.8 million × $0.20 per million ≈ $435.
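The footnote’s request-charge arithmetic as a small function. The compute term defaults to zero because the post does not state memory size, duration, or the per-GB-second price; any values filled in there are assumptions that would need to come from AWS’s pricing page.

```python
def lambda_cost_usd(requests: int, gb_seconds_per_request: float = 0.0,
                    usd_per_gb_second: float = 0.0,
                    usd_per_million_requests: float = 0.20) -> float:
    """Request charges plus optional compute charges (compute inputs are assumptions)."""
    request_charge = requests / 1_000_000 * usd_per_million_requests
    compute_charge = requests * gb_seconds_per_request * usd_per_gb_second
    return request_charge + compute_charge


print(f"${lambda_cost_usd(36 ** 6):,.2f}")  # request charges only
```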

How Broken OTPs and Open Endpoints Turned a Dating App Into a Stalker’s Playground

2025-04-21T08:30:00-04:00

Update: This post received a large amount of attention on Hacker News — see the discussion thread.

Startups Need to Take Security Seriously

Timeline & Responsible Disclosure: Upon identifying these vulnerabilities, I reached out to the Cerca team via email on February 23, 2025. The next day (Feb 24), we held a productive video call to discuss the vulnerabilities, potential mitigations, and next steps. During our conversation, the Cerca team acknowledged the seriousness of these issues, expressed gratitude for the responsible disclosure, and assured me they would promptly address the vulnerabilities and inform affected users.

Since then, I have reached out multiple times (on March 5 and March 13) seeking updates on remediation and user notification plans. Unfortunately, as of today’s publication date (April 21, 2025), I have been met with radio silence. To my knowledge, Cerca has not publicly acknowledged this incident or informed users about this vulnerability, despite their earlier assurances to me. They never followed up after our call and ignored all of my follow-up emails.

However, I was able to independently confirm that the vulnerabilities detailed in this blog post have since been patched, enabling me to responsibly publish these findings.

Too few people know how to make secure apps – and the rush to market puts consumers at risk. Some of my friends were saying that they’d gotten texts from this new dating app called Cerca. Obviously, dating apps require a lot of personal information, so I wanted to make sure that my friends’ data was safe before they started using this app.

I downloaded the app and booted up Charles Proxy (using the iPhone app) to intercept the network requests and see what this app was doing under the hood.

First things first, let’s log in. Cerca only uses OTP-based sign-in (a code is texted to your phone number), so I went to check the response from triggering the one-time password. BOOM – the OTP is directly in the response, meaning anyone’s account can be accessed with just their phone number.
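For contrast, a sketch of server-side OTP handling in which the code leaves the server only through the SMS channel. The send_sms callable, the 5-minute lifetime, and the single-attempt policy are all assumptions for illustration; a production system would also rate-limit issuing and verifying per phone number.

```python
import hmac
import secrets
import time
from typing import Callable, Dict, Tuple

OTP_TTL_SECONDS = 300  # assumption: codes expire after 5 minutes


class OtpStore:
    """Keeps pending OTPs on the server; the HTTP response never includes the code."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, float]] = {}  # phone -> (code, expiry)

    def issue(self, phone: str, send_sms: Callable[[str, str], None]) -> None:
        code = f"{secrets.randbelow(10**6):06d}"  # 6 random digits from a CSPRNG
        self._pending[phone] = (code, time.monotonic() + OTP_TTL_SECONDS)
        send_sms(phone, code)  # the only place the code goes
        # The API handler would return only a status such as {"status": "sent"}.

    def verify(self, phone: str, attempt: str) -> bool:
        entry = self._pending.pop(phone, None)  # single use: any attempt consumes it
        if entry is None:
            return False
        code, expiry = entry
        if time.monotonic() > expiry:
            return False
        # Constant-time comparison on bytes avoids timing leaks and non-ASCII errors.
        return hmac.compare_digest(code.encode(), attempt.encode())
```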

However, I now needed a way to determine who had an account; I didn’t want to just guess phone numbers. So I went to the api.cercadating.com endpoint and used a directory fuzzer to enumerate paths, hoping to find relevant endpoints. I couldn’t access any part of the site without the app-version header that the app sends.

So I passed that header to Gobuster and, to my (semi-)surprise, found a /docs endpoint serving openapi.json, which exposed every endpoint!
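Once an openapi.json document is reachable, listing every route is a matter of walking its paths object. This minimal sketch of that step uses only the standard library and assumes the spec is JSON (OpenAPI also allows YAML).

```python
import json
from typing import List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec_text: str) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared in an OpenAPI JSON document."""
    spec = json.loads(spec_text)
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold "parameters", "summary", etc.; skip those.
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)
```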

I powered up Burp Suite and used its match-and-replace rules to always pass that app-version header, along with the bearer token I extracted from Charles Proxy. Here is where it gets even more interesting.

Some unprotected endpoints seemed to affect only business logic, such as one I could use to force two people to match with each other.

But others, like the get user profile endpoint (user/{user_id}), seemed more interesting. This endpoint takes a valid user ID and returns all sorts of personal information (including the phone numbers necessary for total account takeover, thanks to the OTP vulnerability). I wrote a quick Python script to figure out valid user IDs, and then BANG – I’m in. I could enumerate all users; the response format looked something like this:

{
  "status": "success",
  "message": "string",
  "results": 0,
  "data": {
    "first_name": "string",
    "last_name": "string",
    "gender": "MALE",
    "interested_genders": [
      "MALE"
    ],
    "city": "string",
    "latitude": 0,
    "longitude": 0,
    "university_email": "user@example.com",
    "university_email_verified": false,
    "industry": "string",
    "profession": "string",
    "date_of_birth": "2025-02-21",
    "height": 0,
    "university_id": 0,
    "university_name": "string",
    "profile_completed": false,
    "national_id_verified": false,
    "mobile_verified": false,
    "email_verified": false,
    "premium": false,
    "premium_expiry": "2025-02-21T21:31:06.213Z",
    "active": true,
    "paused": false,
    "onboarded": false,
    "profile_type": "PROFESHIONAL",
    "mobile_number": "string",
    "email": "user@example.com",
    "user_type": [
      "user"
    ],
    "user_id": 0,
    "remaining_searches": 0,
    "profile_images": [],
    "university": {
      "id": 0,
      "name": "string"
    },
    "score": [],
    "match_preferences": [],
    "user_prompts": [],
    "mutual_contact_previews": [],
    "mutual_contact_preview_data": [],
    "mutual_contact_count": 0,
    "created_at": "2025-02-21T21:31:06.213Z",
    "updated_at": "2025-02-21T21:31:06.213Z",
    "zodiac_info": {},
    "distance_km": 0,
    "final_score": 0,
    "age": 0
  },
  "meta": {}
}

Now not only could I find every valid phone number linked to an account (each of which could then be taken over using the OTP misconfiguration), but all of this PII was exposed without any OTP sign-in at all! But it gets worse – the national_id_verified field seemed especially concerning. Sure enough, they store your passport or ID information in the system too, like this:

{
  "status": "success",
  "message": "string",
  "results": 0,
  "data": {
    "verification_type": "PASSPORT",
    "document_number": "string",
    "front_side_url": "string",
    "back_side_url": "string",
    "selfie_url": "string",
    "status": "pending",
    "id": 0,
    "user_id": 0
  },
  "meta": {}
}

This is only available to the signed-in user, but since I could sign in as any user, I could see anyone’s ID information if they had submitted it (again, I did not do this). Not only could I see anyone’s personal messages with potential dates, I might also be able to see their passport information! I ran a quick script to see how many users I could get information about, how many were registered as Yale students (I assume more were Yale students and just didn’t fill in their university), and how many users had submitted their ID information. The script basically just counted how many valid users it saw; if it found none after 1,000 consecutive IDs, it stopped. So there could be more out there (Cerca themselves claimed 10k users in the first week), but I was able to find 6,117 users, 207 who had submitted ID information, and 19 who claimed to be Yale students.
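The stopping rule described above (give up after 1,000 consecutive missing IDs) can be sketched as a generator. The fetch callable is a hypothetical stand-in that returns a profile dict or None; tallying specific fields would just be a pass over the yielded records. As the post notes, a rule like this can undercount if the ID space has large gaps.

```python
from typing import Callable, Iterator, Optional


def enumerate_ids(fetch: Callable[[int], Optional[dict]], start: int = 1,
                  max_misses: int = 1000) -> Iterator[dict]:
    """Yield records for sequential IDs, stopping after max_misses consecutive misses."""
    misses = 0
    user_id = start
    while misses < max_misses:
        record = fetch(user_id)
        if record is None:
            misses += 1
        else:
            misses = 0  # any hit resets the streak
            yield record
        user_id += 1
```

Counting is then a one-liner such as sum(1 for _ in enumerate_ids(fetch)). The ease of this loop is itself the lesson: sequential IDs plus an unauthenticated profile endpoint make the whole user table enumerable.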

This is an insane leak!! I have access to sexual preferences, intimate messages, and all sorts of PII from (according to Cerca themselves) tens of thousands of unsuspecting users. Cerca, in their privacy policy, says that “We use encryption and other industry-standard measures to protect your data,” but that is clearly misleading. This poses significant risks to user safety and privacy. Considering that I’m just a college student looking at this casually, other critical vulnerabilities may well exist (though complete account takeover sets a pretty high bar).

The fallout from this vulnerability is a complete invasion of privacy with potentially very harmful real-world consequences. People need to learn how to make secure apps, and not claim their apps are safe when they aren’t. Especially for a dating app! You can’t expect all users to do the checking that I did in this article. Who knows how many people already had access to all this data before I found it? Someone out there could’ve already downloaded a full database of 6,000+ users’ personal info and intimate chats, ready to exploit it. If someone with malicious intent got their hands on this info, it could lead to identity theft, stalking, blackmail – you name it. These types of vulnerabilities are really scary; they can ruin lives overnight. People need to prioritize securing user data, not just shipping an app they think can go viral.

I did not set out to find this vulnerability in order to write this blog post, but since Cerca has not responded to any of my emails since our call or alerted any of their users, I thought this was a fair post to publish. Not looking to pwn anyone, just want a safer internet!