/FIELD NOTE

What a Real Web Application Penetration Test Looks Like

4 March 2026 // 13 min read // Basalt Cyber Defense Division

Plenty of organisations have bought a penetration test, received a forty-page PDF, and quietly filed it away, unsure what was actually tested. A real web application penetration test is not an automated scan with a logo on the cover. It is a structured, manual investigation of how your application can be misused, run by people who think like attackers but report like engineers. This post walks through what a genuine engagement looks like end to end, the way we run it at Basalt Cyber, so you know what you are paying for and how to judge whether you got it.

Scoping and rules of engagement come first

Before any traffic touches your application, the scope has to be nailed down. That means agreeing which hostnames, subdomains, APIs, and user roles are in play, what time windows testing can run in, and which actions are off limits (for example, no destructive data deletion against production, or no denial of service testing). We confirm whether the target is production, staging, or a dedicated test environment, because that changes how aggressive we can be. We also agree on test accounts for each privilege level, because much of the interesting work in modern apps is about what a low-privileged user can reach that they should not.
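
A scope agreement like this is easier to honour when it is also machine checkable. The sketch below is one way a tester might gate outgoing requests against the agreed hosts, time window, and forbidden actions. The `Scope` class, the hostnames, the testing window, and the blocked method are all hypothetical placeholders, not our tooling or a real engagement.

```python
from dataclasses import dataclass
from datetime import datetime, time
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Scope:
    """Hypothetical in-code copy of a rules of engagement document."""
    hosts: frozenset[str]
    window_start: time
    window_end: time
    forbidden_methods: frozenset[str] = frozenset({"DELETE"})

    def allows(self, method: str, url: str, when: datetime) -> bool:
        # A request is allowed only if host, time window, and method all pass.
        host = (urlsplit(url).hostname or "").lower()
        in_hosts = host in self.hosts
        in_window = self.window_start <= when.time() <= self.window_end
        method_ok = method.upper() not in self.forbidden_methods
        return in_hosts and in_window and method_ok


# Placeholder values for illustration only.
scope = Scope(
    hosts=frozenset({"app.example.test", "api.example.test"}),
    window_start=time(9, 0),
    window_end=time(17, 0),
)

print(scope.allows("GET", "https://app.example.test/login", datetime(2026, 3, 4, 10, 0)))
print(scope.allows("DELETE", "https://app.example.test/users/7", datetime(2026, 3, 4, 10, 0)))
```

A check like this does not replace the written agreement. It only makes it harder to drift outside it by accident.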

A good rules of engagement document protects both sides. It gives the tester written authorisation (which matters legally), and it gives you a clear boundary so nothing unexpected happens. If a vendor skips this step, that is a warning sign.

Phase one: reconnaissance and mapping

The first technical phase is understanding the application as a system rather than a set of pages. We enumerate endpoints, parameters, hidden directories, JavaScript bundles, and API routes. Front end JavaScript is often a goldmine, because it reveals API paths, feature flags, and sometimes hardcoded keys that the developers forgot were shipped to the browser.
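
As a rough illustration of mining a JavaScript bundle for API routes, the sketch below pulls quoted strings that look like API paths out of source text. The regular expression, the `/api/` and `/v<digit>/` prefixes, and the sample bundle are assumptions made for the example. Real bundles are minified and vary widely, so treat this as a starting heuristic rather than a complete extractor.

```python
import re

# Heuristic: single- or double-quoted strings that start with /api/ or /v<digit>/.
# These prefixes are assumptions; adjust them to the target's routing scheme.
API_PATH = re.compile(r"""["'](/(?:api|v\d+)/[A-Za-z0-9_\-./{}:]*)["']""")


def extract_api_paths(js_source: str) -> list[str]:
    """Return the unique API-like paths found in JavaScript source, sorted."""
    return sorted(set(API_PATH.findall(js_source)))


# Hypothetical bundle fragment for illustration.
bundle = '''
fetch("/api/users/" + id);
axios.get('/api/admin/export');
const flags = "/v2/feature-flags";
const img = "/static/logo.png";
'''

print(extract_api_paths(bundle))
# ['/api/admin/export', '/api/users/', '/v2/feature-flags']
```

Paths found this way are leads to verify by hand, since a route that appears in the bundle may be dead code or may need a role the tester does not yet hold.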

We map the technology stack: the framework, the web server, any reverse proxies or WAFs in front, and the authentication mechanism. We build a picture of the trust boundaries, where user input enters, and where the application talks to internal services. This mapping aligns with the reconnaissance and information gathering stages of the OWASP Web Security Testing Guide, which is the methodology we anchor to so that coverage is consistent rather than dependent on one tester's habits.

Phase two: authentication, session, and access control

This is where many real breaches actually live, so we spend real time here. On authentication we test password policy, account lockout, multi-factor enforcement, password reset flows, and whether session tokens are predictable or survive logout when they should be invalidated. Reset flows are a frequent offender, leaking tokens in URLs or failing to bind a reset to the requesting account.

Session management testing checks cookie flags (HttpOnly, Secure, SameSite), token entropy, fixation, and concurrent session handling. Then comes access control, which deserves its own attention.
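
The cookie flag part of that check is simple enough to sketch. The function below inspects one Set-Cookie header value and reports missing HttpOnly, Secure, or SameSite protections. The function name and the sample header values are hypothetical, and the parsing is deliberately minimal.

```python
def audit_set_cookie(header_value: str) -> list[str]:
    """Return the issues found in a single Set-Cookie header value."""
    name_value, *attrs = [part.strip() for part in header_value.split(";")]
    name = name_value.split("=", 1)[0]

    # Attribute names are case-insensitive; flags such as HttpOnly carry no value.
    parsed = {}
    for attr in attrs:
        if attr:
            key, _, value = attr.partition("=")
            parsed[key.strip().lower()] = value.strip()

    issues = []
    if "httponly" not in parsed:
        issues.append(f"{name}: missing HttpOnly")
    if "secure" not in parsed:
        issues.append(f"{name}: missing Secure")
    samesite = parsed.get("samesite", "").lower()
    if samesite not in {"lax", "strict"}:
        # SameSite=None can be legitimate, but it deserves a manual look.
        issues.append(f"{name}: SameSite is {samesite or 'unset'}")
    return issues


print(audit_set_cookie("session=abc123; Path=/; HttpOnly; Secure; SameSite=Strict"))
# []
print(audit_set_cookie("session=abc123; Path=/"))
# ['session: missing HttpOnly', 'session: missing Secure', 'session: SameSite is unset']
```

Flags are the easy part. Entropy, fixation, and concurrent session handling still need hands-on testing against the live session lifecycle.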

Broken access control: the most common serious finding

We methodically test horizontal access control (can user A read user B's data by changing an ID?) and vertical access control (can a standard user reach admin functions by guessing the route or replaying a privileged request?). Insecure direct object references, missing function level checks, and mass assignment all surface here. These bugs rarely show up in scanners because they require understanding the business meaning of each object. Broken access control was ranked first (A01) in the OWASP Top 10 2021, and in our experience these are the findings that most often turn into a full account or tenant compromise.
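
A horizontal access control check reduces to a simple loop: request every object with every session that does not own it, and note what comes back. The sketch below assumes a `fetch(session, path)` callable that returns an HTTP status code. It is injected so the example runs without a network, and `fake_fetch`, the user names, and the invoice paths are invented for illustration.

```python
from typing import Callable

# fetch(session, path) -> HTTP status code. In practice this would wrap an
# HTTP client authenticated as the named test account.
Fetch = Callable[[str, str], int]


def horizontal_access_gaps(
    fetch: Fetch, owners: dict[str, str], path_template: str
) -> list[tuple[str, str]]:
    """Return (session, object_id) pairs where a user reached another user's object."""
    gaps = []
    sessions = set(owners.values())
    for object_id, owner in owners.items():
        for session in sessions:
            if session == owner:
                continue
            if fetch(session, path_template.format(id=object_id)) == 200:
                gaps.append((session, object_id))
    return sorted(gaps)


def fake_fetch(session: str, path: str) -> int:
    """Toy API: invoice 1002 is served to anyone, simulating a missing ownership check."""
    owners = {"/invoices/1001": "alice", "/invoices/1002": "bob"}
    if path == "/invoices/1002":
        return 200
    return 200 if owners.get(path) == session else 403


print(horizontal_access_gaps(fake_fetch, {"1001": "alice", "1002": "bob"}, "/invoices/{id}"))
# [('alice', '1002')]
```

A 200 status is only a signal. The tester still has to confirm the response body contains the other user's data, which is where the business understanding comes in.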

Phase three: injection, input handling, and server side flaws

With the structure understood, we test how the application handles hostile input. This covers SQL and NoSQL injection, command injection, server-side template injection, server-side request forgery (SSRF), cross-site scripting (reflected, stored, and DOM-based), and insecure deserialization. We test file upload handling, XML parsing for XXE, and any place the server fetches a URL the user controls.

The point is not to fire a payload list and move on. It is to understand the sink, confirm the vulnerability is real, and assess what an attacker could actually do with it. A reflected XSS in a low value page is a different risk to a stored XSS in an admin dashboard, and our reporting reflects that difference.
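
One common way to confirm a reflection before reaching for payloads is a unique canary: send a harmless marker followed by the characters that matter in an HTML context, then see how it comes back. The sketch below shows the idea. The function names and result labels are hypothetical, and a result of "raw" only means the context deserves manual analysis, not that the page is exploitable.

```python
import html
import secrets


def make_canary() -> tuple[str, str]:
    """Return (token, probe): a unique token, and the token plus HTML-significant characters."""
    token = "zx" + secrets.token_hex(4)
    return token, token + "<\"'>"


def classify_reflection(token: str, probe: str, response_body: str) -> str:
    """Classify how a probe came back in a response body."""
    if probe in response_body:
        return "raw"       # special characters survived untouched: inspect the sink
    if token in response_body:
        return "altered"   # reflected, but encoded or filtered on the way out
    return "absent"        # not reflected in this response


token, probe = make_canary()
print(classify_reflection(token, probe, f"<p>You searched for {probe}</p>"))               # raw
print(classify_reflection(token, probe, f"<p>You searched for {html.escape(probe)}</p>"))  # altered
print(classify_reflection(token, probe, "<p>No results</p>"))                              # absent
```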

Phase four: business logic and exploit chaining

Automated tools cannot understand your business rules. Humans can. We test things like whether you can apply a discount twice, manipulate a multi step checkout to skip payment, replay a one time action, or abuse a race condition to redeem the same credit concurrently. These logic flaws are unique to your application and often carry direct financial impact.
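
The race condition case is worth seeing in miniature. The sketch below models a single redeemable credit in memory and fires two redemption attempts at the same moment. `CreditStore` and its methods are hypothetical, and the `sleep` stands in for the database or network delay that opens the window between check and update in a real application.

```python
import threading
import time


class CreditStore:
    """Toy in-memory store holding one redeemable credit."""

    def __init__(self) -> None:
        self.available = 1
        self.redemptions = 0
        self._lock = threading.Lock()

    def redeem_unsafe(self) -> None:
        if self.available > 0:      # check
            time.sleep(0.05)        # delay between check and act widens the race window
            self.available -= 1     # act
            self.redemptions += 1

    def redeem_safe(self) -> None:
        with self._lock:            # check and act happen as one atomic step
            if self.available > 0:
                time.sleep(0.05)
                self.available -= 1
                self.redemptions += 1


def race(method_name: str, workers: int = 2) -> int:
    """Run concurrent redemption attempts and return how many succeeded."""
    store = CreditStore()
    barrier = threading.Barrier(workers)

    def worker() -> None:
        barrier.wait()              # release all attempts together
        getattr(store, method_name)()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.redemptions


print("unsafe redemptions:", race("redeem_unsafe"))  # typically 2 for a single credit
print("safe redemptions:", race("redeem_safe"))      # 1
```

Against a real target the same idea applies over HTTP: send the redemption requests in parallel and compare the resulting balance with what the business rules say it should be.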

The most valuable part of a manual test is exploit chaining: combining several low or medium findings into one high impact attack. A self XSS that seems harmless can become critical when chained with a CSRF weakness and a privileged action. An information disclosure that leaks an internal hostname becomes serious when paired with an SSRF that can reach it. We deliberately try to build these chains, because real attackers do, and a single chained proof of concept is far more persuasive to a board than ten isolated medium findings.

Reporting by exploitability, and the retest

Findings are ranked by real world exploitability and business impact, not just by a raw CVSS number. Each one includes a clear description, reproduction steps, evidence, the affected component, and concrete remediation guidance written for the developers who have to fix it. We separate an executive summary (what matters and why) from the technical detail (how to reproduce and fix), so both audiences are served.
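
To make the ranking idea concrete, the sketch below orders findings by exploitability multiplied by business impact and uses CVSS only as a tie-breaker. The 1 to 5 scales, the multiplication, and the two sample findings are invented for illustration. They are assumptions for the example, not our actual scoring model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    cvss: float
    exploitability: int  # hypothetical scale: 1 (theoretical) to 5 (trivial to exploit)
    impact: int          # hypothetical scale: 1 (negligible) to 5 (severe business impact)


def prioritise(findings: list[Finding]) -> list[Finding]:
    """Order findings by exploitability x impact, using CVSS only to break ties."""
    return sorted(
        findings,
        key=lambda f: (f.exploitability * f.impact, f.cvss),
        reverse=True,
    )


# Invented sample findings for illustration.
findings = [
    Finding("Weak TLS cipher on marketing host", cvss=7.5, exploitability=1, impact=2),
    Finding("IDOR on invoice export", cvss=6.5, exploitability=5, impact=5),
]

for f in prioritise(findings):
    print(f.exploitability * f.impact, f.title)
# 25 IDOR on invoice export
# 2 Weak TLS cipher on marketing host
```

Here the finding with the lower CVSS number lands first because it is both easy to exploit and damaging, which is the kind of reordering a raw score would miss.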

Crucially, the engagement is not finished when the report lands. After your team remediates, we run a free retest of the reported issues to confirm the fixes actually closed the gap and did not introduce regressions. A test that does not verify its own fixes leaves you guessing. You can see how this fits into a broader programme on our services page, and you can start a scoped engagement through our contact page.

Frequently asked questions

How long does a web app pentest take?

A typical application takes one to two weeks of testing depending on size, number of roles, and API surface. We scope this up front so there are no surprises.

Will testing break production?

We avoid destructive and denial of service style tests against production unless explicitly authorised. Most testing is read oriented and carefully controlled, and we prefer a staging environment that mirrors production where one exists.

Do you just run a scanner?

No. We use tooling to accelerate discovery, but the findings that matter (access control, business logic, exploit chains) come from manual testing by an experienced tester.

What do we get at the end?

A prioritised report with an executive summary, technical detail, reproduction steps, remediation guidance, and a free retest after you fix the issues.