Storyful is an equal opportunity employer.

**Job Description**

**Reporting to:** Chief Product & Technology Officer (CPTO)
**Location:** Dublin (hybrid, 3 days/week in office)
**Team:** Product & Engineering (foundational hire for the Data Science / AI function)

**Mission:** Build a next-generation Risk & Insights Intelligence Platform that disrupts media monitoring, social listening, and LLM monitoring, taking it from early prototypes to commercially successful, market-leading products.

This role is for someone who can architect and build (hands-on) agentic LLM systems in production, partner deeply with Data Scientists, and obsess over evaluation, quality, and cost, all while thriving in the ambiguity of zero-to-one product creation.

**Why This Role Exists**

We're building an AI-native platform that detects, explains, and helps teams respond to reputational and narrative risk. You'll shape the technical direction early, including network science and explainability: agent ecosystems, information retrieval (e.g. RAG and Graph RAG), multi-document reasoning, classification, scoring, evaluation, and LLMOps, turning them into reliable product experiences.

**What You'll Do (Responsibilities)**

**Architect and ship agentic GenAI systems**
- Design and implement agent ecosystems (multi-agent architectures) that deliver real product outcomes, not demos.
- Build specialized agents for workflows such as adverse media / risk detection, entity investigation, source authenticity, classification, and summarization, and orchestrate them reliably.
- Own the translation from research prototypes into production-grade features (latency, reliability, observability, cost).

**Build RAG and Graph RAG for multi-document intelligence**
- Deliver RAG chatbots for investigation and exploration across large document sets.
- Implement multi-document summarization, including Graph RAG patterns (graph extraction, linking entities/claims, narrative threads).
- Implement semantic chunking / paragraph splitting, retrieval strategies, and citation/grounding patterns suitable for risk and comms teams.
- Apply deep agents / deep research patterns, graph traversal strategies (network science), and agentic RAG.

**Multi-document classification and scoring (risk-focused)**
- Build instruction-based and ML-assisted classification pipelines for multi-document inputs (themes, narratives, risk taxonomy). Explore generating data to fine-tune small models.
- Create scoring methodologies (e.g. risk score, severity, momentum/growth, confidence, exposure) with a clear rationale and calibration approach.
- Bonus: experience building "risk detection" classifiers and adverse-media-style pipelines.

**Context engineering and automatic prompt improvement**
- Lead prompt engineering practices across the product: reusable prompt assets, versioning, guardrails, and domain adaptation.
- Implement prompt evolution techniques (e.g. automated prompt iteration / prompt improvement loops) where it makes commercial sense.
- Understand how the wording of a prompt shapes the probability distribution the LLM outputs, and manage context through graphs and information retrieval.

**Evaluation: make quality measurable and repeatable**
- Build robust evaluation methodologies for prompts, RAG, summarization, and classification.
- Apply multiple evaluation techniques, including:
  - offline metrics (precision/recall/F1 where appropriate)
  - retrieval metrics and ablations
  - LLM-as-a-judge evaluations with rubrics, controls, and drift detection
- Define quality gates that allow the team to move fast without breaking trust.
- Understand an LLM as a neural network, not only as something that can be prompted and observed from the outside.
  For example, entropy can serve as a signal for detecting hallucinations as they unfold through the layers of the model.

**LLMOps and cost control**
- Implement LLMOps: experiment tracking, model/prompt versioning, dataset management, observability, and release practices.
- Build monitoring for quality, safety, and cost, and actively optimize infrastructure spend in cloud environments.
- Deploy and maintain open-source models.

**Lead by influence (and occasionally by direct leadership)**
- Bring Senior/Lead Engineer judgement: clean architecture, pragmatic decisions, mentoring, and unblocking teams.
- Partner tightly with Product, Design, Data Science, and Engineering, while also being able to execute independently.

**What success looks like (first 6–12 months)**
- A production-grade agentic architecture powering key workflows (investigate → summarize → classify → score → recommend action).
- A measurable evaluation framework where quality improves release over release.
- A Graph RAG (or equivalent) capability that materially improves multi-document summarization accuracy and defensibility.
- Clear cost/performance tradeoffs and observability that make the system operable at scale.
- A team around you that has leveled up in GenAI engineering practices.

**Required Experience (Must-have)**
- Proven background as a Senior / Lead Engineer (or equivalent staff-level scope), owning architecture and delivery.
- Demonstrated experience building agentic GenAI architectures for commercially successful product features (not only internal prototypes).
- Strong experience working with Data Scientists on ML algorithms, NLP, evaluation design, and productionization.
- Hands-on experience in AWS and GCP (Azure acceptable in addition).
- Production experience with:
  - RAG chatbots
  - multi-document summarization (ideally Graph RAG)
  - multi-document classification
  - scoring methodologies (risk scoring is a strong bonus)
- Deep expertise in prompt engineering and evaluation, including both classical metrics (e.g. precision/recall) and LLM-as-a-judge approaches.
- Strong LLMOps and GenAI product design experience: experimentation → deployment → monitoring → iteration.

**Nice-to-have (Strong bonuses)**
- Experience in risk/compliance domains (e.g. adverse media, AML, entity investigation workflows).
- Knowledge graphs in production (e.g. Neo4j) and graph extraction pipelines.
- Experience running annotation programs / building labeled datasets for NLP tasks.

**Skills & tools (examples)**

We don't require exact matches, but we do expect fluency in this class of tooling and the judgement to choose pragmatically.

- **GenAI frameworks & LLMs:** LangChain, LlamaIndex; OpenAI / Gemini / Claude; vector RAG and Graph RAG patterns
- **LLMOps / experimentation / observability:** MLflow (experiments, tracking); Langfuse (prompt and trace observability)
- **Data & retrieval:** Neo4j (graph), Elasticsearch; vector stores (Pinecone-style capability), embeddings, semantic chunking
- **Cloud / infrastructure:** AWS (Lambda, SQS/SNS, Kinesis, Glue, Athena, Redshift, DynamoDB, RDS, API Gateway, CloudFront, SageMaker, Comprehend, Kendra, Lex); GCP; Azure exposure helpful
- **Languages:** Python (primary), TypeScript, Java (Ruby on Rails experience welcome)

**Job Category:** Storyful - Product & Technology