Agent Engineer

I build AI agents.
Then I find out if they actually work.

Four projects, one shared engine. I grade them by exact matching and span overlap, not by asking another model if the answer looks right. Every number here is real, including the ones that did not go my way.
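A minimal sketch of what grading by exact match and span overlap can look like. It assumes SQuAD-style token overlap and a simple lowercase, whitespace-split normalization. The function names (`normalize`, `exact_match`, `span_f1`) are illustrative assumptions, not the code in the repos.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase, then split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> bool:
    # True only when the normalized token sequences are identical.
    return normalize(prediction) == normalize(reference)


def span_f1(prediction: str, reference: str) -> float:
    # Token-overlap F1 between a predicted span and a reference span.
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Two empty spans agree; one empty span against a non-empty one does not.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Both scores are deterministic: the same prediction and reference always give the same grade, which is the point of not asking a second model to judge.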

See the work: github.com/7P3ng
27.8% to 0.0%: false positives
F1 0.548: FieldAgent
K=3: Quorum verif.
3 repos: CI green

The thesis

One engine, three hard problems.

Most portfolios show demos. These show the evaluation. If an idea is solid, it should hold up across different problems, so I built one core (Quorum's core/) and ran it on three. Where it holds, the numbers are here. Where it does not, that is in the write-up too.