HarvardLawReviewScraper

Harvard Law Review
maps to Harvard Law Review (id 1)

Latest Status: partial #4240
Latest Metrics: d=9  |  skip=105  |  err=0  |  t=557.2s
Implementation: LightBaseScraper, playwright, Law Review
Uploads Pending: 0
Last Upload: 2026-05-31 21:25:20
uabox:Law_Review_Project/harvard_law_review_20260531_212500.zip

Definition

scraper_id: HarvardLawReviewScraper
canonical_name: Harvard Law Review
institution_code: -
platform: playwright
base_class: LightBaseScraper
class_name: HarvardLawReviewScraper
module_path: scrapers.harvard_law_review_scraper
file_path: scrapers/harvard_law_review_scraper.py
has_cli_entrypoint: true
is_abstract: false
discovered_at: 2026-03-30 20:11:12
updated_at: 2026-06-17 03:35:43

Run History

Showing 7 runs (law_review_id=1) — use ?limit=200 for more.
Run Status Start End Runtime Metrics Error / Details Logs
#4240 partial 2026-05-31T19:06:29+00:00 2026-05-31T19:15:47+00:00 557.2s d=9  |  skip=105  |  err=0
discovered=114  |  processed=114
-
extra_json
{"automation_cycle_id": 1499, "canonical_name": "Harvard Law Review", "child_pid": 1238525, "discovery_cutoff": true, "discovery_cutoff_details": {"consecutive_duplicates": 40, "processed_articles": 114, "queued_items": 9}, "discovery_cutoff_elapsed_seconds": 487, "discovery_cutoff_max_runtime_seconds": null, "discovery_cutoff_phase": "discovery", "discovery_cutoff_reason": "duplicate_streak", "file_path": "scrapers/harvard_law_review_scraper.py", "heartbeat_at": "2026-05-31T19:15:29+00:00", "heartbeat_source": "orchestrator", "law_review_id": 1, "orchestrator": "lrscraper", "orchestrator_started_at": "2026-05-31T19:06:29+00:00", "run_kind": "scheduled_active", "scraper_id": "HarvardLawReviewScraper", "script_path": "scrapers/harvard_law_review_scraper.py", "stderr_path": "logs/orchestrator_runs/1780254389_HarvardLawReviewScraper.err.log", "stdout_path": "logs/orchestrator_runs/1780254389_HarvardLawReviewScraper.out.log", "timeout_minutes": 45}
stdout | stderr
#3566 partial 2026-05-01T17:29:19+00:00 2026-05-01T17:30:27+00:00 67.2s d=20  |  skip=87  |  err=0
discovered=107  |  processed=107
-
extra_json
{"automation_cycle_id": 588, "canonical_name": "Harvard Law Review", "child_pid": 856921, "discovery_cutoff": true, "discovery_cutoff_details": {"consecutive_duplicates": 40, "processed_articles": 107, "queued_items": 20}, "discovery_cutoff_elapsed_seconds": 52, "discovery_cutoff_max_runtime_seconds": null, "discovery_cutoff_phase": "discovery", "discovery_cutoff_reason": "duplicate_streak", "file_path": "scrapers/harvard_law_review_scraper.py", "heartbeat_at": "2026-05-01T17:30:19+00:00", "heartbeat_source": "orchestrator", "law_review_id": 1, "orchestrator": "lrscraper", "orchestrator_started_at": "2026-05-01T17:29:19+00:00", "run_kind": "scheduled_active", "scraper_id": "HarvardLawReviewScraper", "script_path": "scrapers/harvard_law_review_scraper.py", "stderr_path": "logs/orchestrator_runs/1777656559_HarvardLawReviewScraper.err.log", "stdout_path": "logs/orchestrator_runs/1777656559_HarvardLawReviewScraper.out.log", "timeout_minutes": 45}
stdout | stderr
#2696 partial 2026-03-08T05:09:49+00:00 2026-03-08T05:10:31+00:00 42.0s d=7  |  skip=75  |  err=0
discovered=82  |  processed=82
-
extra_json
{"canonical_name": "Harvard Law Review", "child_pid": 2439295, "discovery_cutoff": true, "discovery_cutoff_details": {"consecutive_duplicates": 40, "processed_articles": 82, "queued_items": 7}, "discovery_cutoff_elapsed_seconds": 36, "discovery_cutoff_max_runtime_seconds": null, "discovery_cutoff_phase": "discovery", "discovery_cutoff_reason": "duplicate_streak", "file_path": "scrapers/harvard_law_review_scraper.py", "heartbeat_at": "2026-03-08T05:10:19+00:00", "heartbeat_source": "orchestrator", "law_review_id": 1, "orchestrator": "lrscraper", "orchestrator_started_at": "2026-03-08T05:09:49+00:00", "scraper_id": "HarvardLawReviewScraper", "script_path": "scrapers/harvard_law_review_scraper.py", "stderr_path": "logs/orchestrator_runs/1772946589_HarvardLawReviewScraper.err.log", "stdout_path": "logs/orchestrator_runs/1772946589_HarvardLawReviewScraper.out.log", "timeout_minutes": 45}
stdout | stderr
#590 success 2026-02-06T02:43:34+00:00 2026-02-06T02:52:11+00:00 516.5s d=9  |  skip=0  |  err=0
discovered=9  |  processed=9
-
extra_json
{"canonical_name": "Harvard Law Review", "child_pid": 2256464, "file_path": "scrapers/harvard_law_review_scraper.py", "heartbeat_at": "2026-02-06T02:52:04+00:00", "heartbeat_source": "orchestrator", "law_review_id": 1, "orchestrator": "lrscraper", "orchestrator_started_at": "2026-02-06T02:43:34+00:00", "scraper_id": "HarvardLawReviewScraper", "script_path": "scrapers/harvard_law_review_scraper.py", "stderr_path": "logs/orchestrator_runs/1770345814_HarvardLawReviewScraper.err.log", "stdout_path": "logs/orchestrator_runs/1770345814_HarvardLawReviewScraper.out.log", "timeout_minutes": 25}
stdout | stderr
#521 failed 2026-01-28T05:32:57+00:00 2026-01-28T05:33:21+00:00 24.0s d=0  |  skip=0  |  err=0
discovered=0  |  processed=0
BrokenPipeError: [Errno 32] Broken pipe
traceback
Traceback (most recent call last):
  File "/home/arbel/sites/lrscraper/scrapers/harvard_law_review_scraper.py", line 103, in discover_urls
    self.print_status(f"Found: {metadata['title']} ({filename})")
  File "/home/arbel/sites/lrscraper/light_base_scraper.py", line 107, in print_status
    print(msg)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/arbel/sites/lrscraper/light_base_scraper.py", line 309, in run
    items = await self.discover_urls()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arbel/sites/lrscraper/smart_scraper_simple.py", line 385, in wrapped_discover
    items = await original_discover()
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arbel/sites/lrscraper/scrapers/harvard_law_review_scraper.py", line 109, in discover_urls
    self.print_status(f"Error processing {article_url}: {e}", "error")
  File "/home/arbel/sites/lrscraper/light_base_scraper.py", line 107, in print_status
    print(msg)
BrokenPipeError: [Errno 32] Broken pipe
extra_json
{"canonical_name": "Harvard Law Review"}
-
#452 timeout 2026-01-22T12:30:34+00:00 2026-01-22T13:15:34+00:00 2700.0s d=0  |  skip=0  |  err=1
discovered=-  |  processed=-
timeout: Timeout after 45 minutes
extra_json
{"returncode": null}
-
#1 success 2025-12-22T22:46:11.634716 2025-12-23T00:14:34.578532 5302.9s d=11  |  skip=0  |  err=0
discovered=-  |  processed=-
- -

Runs (scraper_name = HarvardLawReviewScraper)

These are runs recorded explicitly under this scraper_id.
Same six runs as listed under Run History above, with identical details: #4240, #3566, #2696, #590, #521, #452. Run #1 does not appear in this list.
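
The partial runs report `"discovery_cutoff_reason": "duplicate_streak"` with `"consecutive_duplicates": 40`, and in each of them `d + skip` equals `discovered` (for example, 9 + 105 = 114 in run #4240). One plausible reading is that discovery stops once 40 already-seen articles appear in a row. The sketch below illustrates that reading only. `discover_with_cutoff`, `is_duplicate`, and `max_streak` are hypothetical names, and the orchestrator's real logic may differ.

```python
from typing import Callable, Iterable


def discover_with_cutoff(
    urls: Iterable[str],
    is_duplicate: Callable[[str], bool],
    max_streak: int = 40,
) -> dict:
    """Walk candidate URLs and stop after `max_streak` duplicates in a row."""
    queued: list[str] = []
    skipped = 0
    streak = 0
    cutoff = False
    for url in urls:
        if is_duplicate(url):
            skipped += 1
            streak += 1
            if streak >= max_streak:
                cutoff = True
                break
        else:
            queued.append(url)
            streak = 0  # a new article resets the streak
    return {
        "queued_items": len(queued),
        "skipped": skipped,
        "processed_articles": len(queued) + skipped,
        "discovery_cutoff": cutoff,
        "consecutive_duplicates": streak,
    }
```

Under this reading, `queued_items` corresponds to `d`, `skipped` to `skip`, and `processed_articles` to `discovered` and `processed` in the run metrics.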