# TruthSeeker v2.1

**AUTHOR:** Abstract Warlock 
**CO-DEVELOPMENT:** ChatGPT-4o 
**DATE:** 23 April 2025
**LICENSE:** None

## Signal Harvester for RealWorldTalk

> Truth has no opinion. It just shows up on time. And sometimes it shows up wearing a trench coat and screaming about liquidity curves — or muttering in tongues about yield curves and weather patterns.

---

Maintainers: just a human and a daemon with no chill  
Version: dev/2.1

---

### 🧭 What This *Is*

TruthSeeker is not a system that thinks.
It does not judge, prioritize, or editorialize.

It **harvests signals** from the world —
then quietly writes them down. **In markdown.**

That’s it.

It’s not a thinker. It’s a scribe.
It’s the structured upstream data layer that RWT uses to see clearly.

TruthSeeker doesn’t try to make sense of the world.
It just logs what happened — ***verifiably, recurringly, contradiction-safe.***

> Like a haunted Roomba with a PhD in macroeconomics and a deep mistrust of central banks.

---

### 🌐 Where This Flows

TruthSeeker is upstream of **RealWorldTalk (RWT)** — a hybrid human/AI digest that does the ***actual synthesis.***

TruthSeeker builds:
- `core.md` — global politics, systems, collapse signals
- `light.md` — cultural flashes, strange shifts, soft warnings
- `finance.md` — compressed daily macro snapshot

TruthSeeker ***preserves the map.***
RWT tries to name the terrain.

---

### 🛠️ What It Does

- Scrapes 30+ live RSS feeds from bias-tagged sources
- Pulls 50+ macro indicators from Yahoo / FRED / BoE
- Runs a local LLM under a strict prompt for classification (schema-validated output, no free-form text)
- Routes everything to one of three destinations:
  - `core.md` — hard signal, systems, power
  - `light.md` — pattern flashes, context
  - `junk/` — newsletters, promos, rewrites, noise

**Important:**
> TruthSeeker ***does not think.***
> It does not care what the story is.
> It just logs reality with timestamped precision.

LLM summaries exist only to compress the data. Classification is schema-locked and JSON-validated, which is the guard against hallucinated output.
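
A minimal sketch of what "schema-locked and JSON-validated" could look like in practice. The real schema lives in `config.py` and is not shown here, so `REQUIRED_KEYS`, `ALLOWED_CATEGORIES`, and `validate_llm_output` are hypothetical names, and the allowed category values are an assumption based on the three routing destinations.

```python
import json

# Assumption: the real schema is defined in config.py; these are placeholders.
REQUIRED_KEYS = {"category", "scope", "summary"}
ALLOWED_CATEGORIES = {"core", "light", "junk"}


def validate_llm_output(raw: str) -> dict:
    """Parse raw LLM text and reject anything that deviates from the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    # Exact key match: missing keys and extra keys are both rejected.
    if set(data) != REQUIRED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    for key in sorted(REQUIRED_KEYS):
        if not isinstance(data[key], str) or not data[key].strip():
            raise ValueError(f"{key!r} must be a non-empty string")
    if data["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {data['category']!r}")
    return data
```

Anything that raises here would never reach a digest. Validation of this kind constrains the shape of the output, not the truth of the summary text.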

---

### 🧬 Files That Matter

```text
daily/
├── digests/
│   ├── YYYY-MM-DD_core.md
│   ├── YYYY-MM-DD_light.md
│   └── YYYY-MM-DD_finance.md
├── sources/         ← scraped articles
├── flagged/         ← queued for classification
├── archive/         ← classified + routed
└── junk/            ← skipped / rejected

datafeeds/
└── finance/
    ├── daily/       ← raw finance snapshots
    └── source_logs/ ← per-symbol logs (time series)
```
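
The digest naming convention above can be sketched as a small path helper. `digest_path` and `DIGEST_KINDS` are hypothetical names for illustration; only the directory layout and the `YYYY-MM-DD_<kind>.md` pattern come from the tree.

```python
from datetime import date
from pathlib import Path

# The three digest kinds named in the tree above.
DIGEST_KINDS = ("core", "light", "finance")


def digest_path(kind: str, day: date, root: Path = Path("daily")) -> Path:
    """Return the path of one day's digest, e.g. daily/digests/2025-04-23_core.md."""
    if kind not in DIGEST_KINDS:
        raise ValueError(f"unknown digest kind: {kind!r}")
    # date.isoformat() yields YYYY-MM-DD, matching the filenames in the tree.
    return root / "digests" / f"{day.isoformat()}_{kind}.md"
```

Building paths in one place keeps the date format from drifting between scripts.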

---

### 🧠 The Stack

#### RSS & Special Feeds
- `harvester.py` — main fetcher (RSS)
- `special_sources.py` — pseudo-feeds (e.g. RSOE disaster data)
- `archive.py` — scraping + routing with bias metadata, junk logic, dedupe handling
- `browser.py` — custom headless browser, selector-based, battle-tested, low-key psychic
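
For orientation, here is a rough sketch of bias-tagged RSS parsing with dedupe, using only the standard library. It is not the code in `harvester.py` or `archive.py`. The function names, the record fields, and the choice of a hashed URL as the ID are all assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str, source: str, bias: str) -> list[dict]:
    """Extract items from an RSS 2.0 document and tag each with source and bias."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        if not link:
            continue  # nothing to scrape or dedupe against
        items.append({
            # Assumption: a short hash of the URL serves as a stable ID.
            "id": hashlib.sha256(link.encode("utf-8")).hexdigest()[:16],
            "url": link,
            "title": title,
            "source": source,
            "bias": bias,
        })
    return items


def dedupe(items: list[dict], seen: set[str]) -> list[dict]:
    """Return only items whose ID is not in `seen`, and record the new IDs."""
    fresh = []
    for entry in items:
        if entry["id"] in seen:
            continue
        seen.add(entry["id"])
        fresh.append(entry)
    return fresh
```

Passing the `seen` set in from outside lets one set span every feed in a run, so the same story syndicated by two sources is only scraped once.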

#### LLM Pipeline
- `digest.py` — main digester pipeline
- `playground.py` — dev/debug mode with verbose inspection
- Classification via `mistral-nemo:latest` (via Ollama)
- Schema is frozen. Output is always JSON with:
  - `category`, `scope`, `summary`
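
A sketch of how a classification call to a local Ollama server might look. The endpoint, the `format: "json"` option, and the `response` field follow Ollama's documented `/api/generate` API. The prompt text, the function names, and the temperature setting are assumptions, not the contents of `digest.py`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "mistral-nemo:latest"

# Hypothetical prompt; the real one is defined elsewhere in the project.
PROMPT_TEMPLATE = (
    "Classify the article below. Reply with one JSON object containing "
    "exactly the keys category, scope, summary. Do not add commentary.\n\n"
    "ARTICLE:\n{article}"
)


def build_request(article: str) -> dict:
    """Build the request body for a single non-streaming, JSON-only completion."""
    return {
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(article=article),
        "format": "json",               # ask Ollama to constrain output to valid JSON
        "stream": False,                # one response object instead of a token stream
        "options": {"temperature": 0},  # assumption: low temperature for repeatability
    }


def classify(article: str, timeout: float = 120.0) -> dict:
    """Send the article to the local model and return the parsed JSON reply."""
    body = json.dumps(build_request(article)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        envelope = json.load(resp)
    # The model's text sits in the "response" field of the envelope.
    return json.loads(envelope["response"])
```

`classify` needs a running Ollama instance with the model pulled. `build_request` can be inspected without one.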

#### Finance Harvester
- `finance.py` — Yahoo Finance (ETFs, crypto, FX, indexes, commodities)
- `fred_harvester.py` — FRED macro data
- `boe_harvester.py` — UK macro via BoE database
- `finance_digest.py` — compression layer to build `finance.md`
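
The per-symbol time series under `datafeeds/finance/source_logs/` could be kept as append-only files, roughly like this. The CSV format, the column names, and `append_quote` are assumptions for illustration. The README does not say how the real logs are stored.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def append_quote(symbol: str, price: float, when: datetime, log_dir: Path) -> Path:
    """Append one timestamped observation to the symbol's own log file."""
    log_dir.mkdir(parents=True, exist_ok=True)
    # Symbols such as "GBPUSD=X" or "^FTSE" contain characters that are
    # awkward in filenames, so anything non-alphanumeric becomes "_".
    safe_name = "".join(c if c.isalnum() else "_" for c in symbol)
    path = log_dir / f"{safe_name}.csv"
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["timestamp_utc", "symbol", "price"])
        # Pass a timezone-aware datetime; it is normalised to UTC here.
        stamp = when.astimezone(timezone.utc).isoformat()
        writer.writerow([stamp, symbol, f"{price:.6f}"])
    return path
```

Appending rather than rewriting means earlier observations are never touched, which suits a log that is meant to be a receipt.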

#### Core Utilities
- `utils.py` — formatting, logging, markdown snapshot assembly
- `config.py` — LLM schema, source tags, skip logic, digest rules

---

### 🚀 CLI Interfaces

#### Primary Controls
- `main.py` — interactive tool for harvesting + classifying
- `main_automated_verbose.py` — full-stack loop (auto-daily mode)

#### Automation Flow
```text
1. Yahoo snapshot
2. BoE + FRED macro
3. Compressed digest (finance.md)
4. News harvest + scrape
5. LLM classifier → routed to digest or archive
```
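
The ordering above can be sketched as a tiny step runner. This is not `main_automated_verbose.py`. `run_pipeline` is a hypothetical name, and stopping at the first failure is an assumed policy, chosen here so that later stages never run on missing upstream data.

```python
from typing import Callable

# Step names mirror the five stages of the automation flow, in order.
AUTOMATION_ORDER = (
    "yahoo_snapshot",
    "boe_fred_macro",
    "finance_digest",
    "news_harvest",
    "llm_classify",
)


def run_pipeline(steps: list[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run named steps in order and return the names that completed."""
    completed: list[str] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Assumption: fail fast and report how far the run got.
            raise RuntimeError(
                f"step {name!r} failed after completing {completed}"
            ) from exc
        completed.append(name)
    return completed
```

A caller would pair each name in `AUTOMATION_ORDER` with the function that performs that stage and pass the list to `run_pipeline`.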

---

### 🧾 LLM Classification

Strict prompt. Frozen schema. Zero vibes.

All summaries:
- Must follow exact JSON format
- Must assign `category` + `scope`
- Must not editorialize, speculate, or go off-script

LLM output is:
- Audited
- Stored with ID, URL, source metadata
- Sent to one of the `core`, `light`, or `junk` buckets
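
One way the stored record and the bucket decision might fit together, as a sketch. The field names, the hashed-URL ID, and the markdown layout in `render_entry` are all assumptions. The README only states that ID, URL, and source metadata are kept and that output lands in one of three buckets.

```python
import hashlib

BUCKETS = ("core", "light", "junk")


def make_record(url: str, source: str, bias: str, result: dict) -> dict:
    """Combine source metadata with an already validated classification result."""
    return {
        # Assumption: a short hash of the URL serves as the record ID.
        "id": hashlib.sha256(url.encode("utf-8")).hexdigest()[:16],
        "url": url,
        "source": source,
        "bias": bias,
        "category": result["category"],
        "scope": result["scope"],
        "summary": result["summary"],
    }


def bucket_for(record: dict) -> str:
    """Return the bucket name; the category decides, nothing else does."""
    category = record["category"]
    if category not in BUCKETS:
        raise ValueError(f"no bucket for category {category!r}")
    return category


def render_entry(record: dict) -> str:
    """Render one record as a markdown list entry (hypothetical layout)."""
    return (
        f"- **{record['summary']}**\n"
        f"  - id: `{record['id']}` | scope: {record['scope']}\n"
        f"  - source: {record['source']} ({record['bias']}) | {record['url']}\n"
    )
```

Note that `bucket_for` raises on an unknown category instead of quietly defaulting to `junk`, so a schema drift shows up as an error and not as missing news.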

> TruthSeeker doesn’t decide what’s important.
> It just shuffles the cards and marks which deck they go in. And if the cards catch fire? It logs that too.

---

### 🧠 Philosophy

TruthSeeker does not:
- Interpret
- Synthesize
- Filter for relevance

TruthSeeker *does*:
- Harvest fact-first
- Preserve contradictions
- Hold time still with markdown and receipts

It is not a journalist.
It is not a summarizer.
It is **infrastructure.**

> Built for the long night.
> Maintained by weirdos with boundary issues and impeccable timestamp hygiene.
> Assisted by daemons who log reality in lowercase and italics.

> The map is not the territory. But it helps if the map isn’t lying.