Storyful is an equal opportunity employer
Job Description
*Reporting to:* Chief Product & Technology Officer (CPTO)
*Location:* Dublin (Hybrid, 3 days/week in office)
*Team:* Product & Engineering (foundational hire for the Data Science / AI function)
*Mission:* Build a next-generation Risk & Insights Intelligence Platform that disrupts media monitoring, social listening, and LLM monitoring, taking it from early prototypes to commercially successful, market-leading products.
This role is for someone who can architect and build (hands-on) agentic LLM systems in production, partner deeply with Data Scientists, and obsess over evaluation, quality, and cost, all while thriving in the ambiguity of zero-to-one product creation.
Why This Role Exists
We're building an AI-native platform that detects, explains, and helps teams respond to reputational and narrative risk. You'll shape the technical direction early, spanning network science and explainability as well as agent ecosystems, information retrieval (e.g. RAG and Graph RAG), multi-document reasoning, classification, scoring, evaluation, and LLMOps, and you'll turn these into reliable product experiences.
*What You'll Do (Responsibilities)*
* Architect and ship agentic GenAI systems
* Design and implement agent ecosystems (multi-agent architectures) that deliver real product outcomes (not demos).
* Build specialized agents for workflows like adverse media / risk detection, entity investigation, source authenticity, classification, and summarization, and orchestrate them reliably (a toy orchestration sketch follows this list).
* Own the translation from research/prototypes into production-grade features (latency, reliability, observability, cost).
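To make the intent concrete, here is a minimal, illustrative sketch of the orchestration pattern described above; it is not a prescription of our stack. `call_llm`, the agent names, and the pipeline stages are all hypothetical stand-ins.

```python
# Minimal multi-agent orchestration sketch. `call_llm` is a hypothetical
# helper wrapping whichever model API is in use; all names are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    system_prompt: str
    call_llm: Callable[[str, str], str]  # (system, user) -> completion

    def run(self, task: str) -> str:
        return self.call_llm(self.system_prompt, task)

def investigate_pipeline(doc: str, call_llm) -> dict:
    """Orchestrate specialised agents: detect risk, then summarise, then score."""
    detector = Agent("risk_detector", "Flag reputational-risk signals in the text.", call_llm)
    summariser = Agent("summariser", "Summarise the flagged risks with citations.", call_llm)
    scorer = Agent("scorer", "Return a 0-100 severity score with a one-line rationale.", call_llm)

    risks = detector.run(doc)
    summary = summariser.run(risks)
    score = scorer.run(summary)
    return {"risks": risks, "summary": summary, "score": score}
```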
* Build RAG + Graph RAG for multi-doc intelligence
* Deliver RAG chatbots for investigation and exploration across large document sets.
* Implement multi-document summarization, including Graph RAG patterns (graph extraction, linking entities/claims, narrative threads).
* Implement semantic chunking / paragraph splitting, retrieval strategies, and citation/grounding patterns suitable for risk/comms teams (a toy retrieval sketch follows this list).
* Explore deep agents / deep research, graph traversal strategies (network science), and agentic RAG.
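For illustration only, a toy grounded-retrieval sketch in the spirit of the bullets above: paragraph-level chunking, cosine-similarity retrieval, and a citation handle per chunk so answers stay attributable. The `embed` function is a hypothetical stand-in for whichever embedding model is in use.

```python
# Toy grounded-retrieval sketch: paragraph-level chunking, cosine retrieval,
# and a citation per chunk. `embed` is a hypothetical embedding function.
import numpy as np

def chunk_paragraphs(doc: str, doc_id: str) -> list[dict]:
    return [{"doc_id": doc_id, "chunk_id": i, "text": p.strip()}
            for i, p in enumerate(doc.split("\n\n")) if p.strip()]

def retrieve(query: str, chunks: list[dict], embed, k: int = 3) -> list[dict]:
    q = embed(query)
    scored = []
    for c in chunks:
        v = embed(c["text"])
        sim = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((sim, c))
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    # Return text plus a citation handle so downstream answers stay grounded.
    return [{"citation": f'{c["doc_id"]}#{c["chunk_id"]}', "text": c["text"], "score": s}
            for s, c in top]
```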
* Multi-document classification + scoring (risk-focused)
* Build instruction-based and ML-assisted classification pipelines for multi-document inputs (themes, narratives, risk taxonomy). Explore generating data to fine-tune small models.
* Create scoring methodologies (e.g., risk score, severity, momentum/growth, confidence, exposure) with a clear rationale and calibration approach (a toy scoring sketch follows this list).
* Bonus: experience building "risk detection" classifiers and adverse-media-style pipelines.
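A minimal sketch of what a transparent scoring methodology can look like; the signal names, weights, and logistic calibration here are assumptions for the example, not our actual taxonomy.

```python
# Illustrative risk-scoring sketch: combine component signals into one score,
# keeping the rationale explicit so the number stays defensible.
import math

def risk_score(severity: float, momentum: float, exposure: float,
               confidence: float) -> dict:
    """Each input is assumed pre-normalised to [0, 1]."""
    weights = {"severity": 0.5, "momentum": 0.3, "exposure": 0.2}  # example weights
    raw = (weights["severity"] * severity
           + weights["momentum"] * momentum
           + weights["exposure"] * exposure)
    # Squash through a logistic curve so mid-range differences are emphasised;
    # in practice the slope and midpoint would be calibrated on labelled cases.
    calibrated = 1 / (1 + math.exp(-8 * (raw - 0.5)))
    return {
        "score": round(100 * calibrated, 1),
        "confidence": confidence,
        "rationale": {k: round(v, 2) for k, v in weights.items()},
    }
```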
* Context engineering + automatic prompt improvement
* Lead prompt engineering practices across the product: reusable prompt assets, versioning, guardrails, and domain adaptation.
* Implement prompt evolution techniques (e.g., automated prompt iteration / prompt improvement loops) where it makes commercial sense (a minimal loop is sketched after this list).
* Understand how the wording of a prompt shifts the probability distribution the LLM outputs, and manage context deliberately through graphs and information retrieval.
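A minimal sketch of the automated prompt-improvement loop mentioned above, assuming hypothetical `generate`, `propose_variant`, and `score_fn` helpers: score each candidate against an eval set and only promote measurable wins.

```python
# Sketch of an automated prompt-improvement loop. `generate`, `propose_variant`,
# and `score_fn` are hypothetical stand-ins; eval_set is a list of
# {"input": ..., "expected": ...} examples.
def improve_prompt(base_prompt: str, eval_set, generate, propose_variant,
                   score_fn, rounds: int = 5) -> tuple[str, float]:
    def score(prompt: str) -> float:
        outputs = [generate(prompt, ex["input"]) for ex in eval_set]
        return sum(score_fn(o, ex["expected"])
                   for o, ex in zip(outputs, eval_set)) / len(eval_set)

    best, best_score = base_prompt, score(base_prompt)
    for _ in range(rounds):
        candidate = propose_variant(best)  # e.g. ask an LLM to critique and rewrite
        s = score(candidate)
        if s > best_score:                 # only promote measurable wins
            best, best_score = candidate, s
    return best, best_score
```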
* Evaluation: make quality measurable and repeatable
* Build robust evaluation methodologies for prompts, RAG, summarization, and classification.
* Apply multiple evaluation techniques, including:
* offline metrics (precision/recall/F1 where appropriate)
* retrieval metrics and ablations
* LLM-as-a-judge style evaluations with rubrics, controls, and drift detection
* Define quality gates that allow the team to move fast without breaking trust.
* Understand an LLM as a neural network, not only as something that can be prompted and observed from the outside; for example, how token-level entropy can act as a signal to detect hallucinations while they unfold through the layers of the model (see the sketch after this list).
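To illustrate the entropy point above: a short sketch that computes per-token Shannon entropy from next-token logits and flags high-uncertainty generation steps. The threshold is illustrative; in practice it would be calibrated against labelled hallucination cases.

```python
# High next-token entropy can flag spans where the model is uncertain,
# a weak but useful hallucination signal. Assumes access to per-step logits.
import numpy as np

def token_entropies(logits: np.ndarray) -> np.ndarray:
    """logits: (seq_len, vocab) array of next-token logits."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)      # Shannon entropy per token

def flag_uncertain_spans(logits: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Indices of generation steps whose entropy exceeds the (illustrative) threshold."""
    return np.where(token_entropies(logits) > threshold)[0]
```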
* LLMOps + cost control
* Implement LLMOps: experiment tracking, model/prompt versioning, dataset management, observability, and release practices (a minimal tracking sketch follows this list).
* Build monitoring for quality + safety + cost, and actively optimize infrastructure spend in cloud environments.
* Deploy and maintain open-source models.
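As one example of the tracking practices above, a minimal sketch using MLflow (listed under tools below); the experiment name, metric names, and values are illustrative.

```python
# Minimal LLMOps tracking sketch with MLflow: log the prompt version, model
# settings, eval metrics, and cost per run so releases stay comparable.
import mlflow

def log_eval_run(prompt_version: str, model: str, metrics: dict,
                 prompt_text: str, cost_usd: float) -> None:
    mlflow.set_experiment("risk-summariser-evals")  # illustrative experiment name
    with mlflow.start_run(run_name=f"{model}-{prompt_version}"):
        mlflow.log_params({"prompt_version": prompt_version, "model": model})
        mlflow.log_metrics({**metrics, "cost_usd": cost_usd})
        mlflow.log_text(prompt_text, "prompt.txt")  # keep the exact prompt as an artifact

# Example usage (all values illustrative):
log_eval_run("v3", "gpt-4o", {"judge_score": 0.82, "f1": 0.74},
             "You are a risk analyst...", cost_usd=0.43)
```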
* Lead by influence (and occasionally by direct leadership)
* Bring "Senior/Lead Engineer" judgement: clean architecture, pragmatic decisions, mentoring, unblock teams.
* Partner tightly with Product, Design, Data Science, and Engineering—while also being able to execute independently.
*What success looks like (first 6–12 months)*
* A production-grade agentic architecture powering key workflows (investigate → summarize → classify → score → recommend action).
* A measurable evaluation framework where quality improves release over release.
* A Graph RAG (or equivalent) capability that materially improves multi-doc summarization accuracy and defensibility.
* Clear cost/performance tradeoffs and observability that make the system operable at scale.
* A team around you that's leveled up in GenAI engineering practices.
Required Experience (Must-have)
* Proven background as a Senior / Lead Engineer (or equivalent staff-level scope), owning architecture and delivery.
* Demonstrated experience building agentic GenAI architecture for commercially successful product features (not only internal prototypes).
* Strong experience working with Data Scientists on ML algorithms, NLP, evaluation design, and productionization.
* Hands-on experience with AWS and GCP (Azure acceptable as an additional cloud).
* Production experience with:
* RAG chatbots
* multi-document summarization (ideally Graph RAG)
* multi-document classification
* scoring methodologies (risk scoring is a strong bonus)
* Deep expertise in prompt engineering and evaluation, including both classical metrics (e.g., precision/recall) and LLM-as-a-judge approaches.
* Strong LLMOps and GenAI product design experience: experimentation → deployment → monitoring → iteration.
*Nice-to-have (Strong bonuses)*
* Experience in risk/compliance domains (e.g., adverse media, AML, entity investigation workflows).
* Knowledge graphs in production (e.g., Neo4j) and graph extraction pipelines.
* Experience running annotation programs / building labeled datasets for NLP tasks.
*Skills & tools (examples)*
We don't require exact matches, but we do expect you to be fluent in this class of tooling and able to choose pragmatically.
*GenAI frameworks & LLMs*
* LangChain, LlamaIndex
* OpenAI / Gemini / Claude
* Vector RAG + Graph RAG patterns
*LLMOps / experimentation / observability*
* MLflow (experiments, tracking)
* Langfuse (prompt & trace observability)
*Data & retrieval*
* Neo4j (graph), ElasticSearch
* Vector stores (Pinecone-style capability), embeddings, semantic chunking
*Cloud / infrastructure (examples)*
* AWS: Lambda, SQS/SNS, Kinesis, Glue, Athena, Redshift, DynamoDB, RDS, API Gateway, CloudFront, SageMaker, Comprehend, Kendra, Lex
* GCP (plus Azure exposure helpful)
*Languages*
* Python (primary), TypeScript, Java (Ruby on Rails experience welcome)
Job Category
Storyful - Product & Technology