Investigative journalism tools: Free software and open-source tools for journalists, journalistic research, discovery, investigative reporting, privacy, data visualization and data-driven journalism
Free software and open source tools for investigative journalism and journalistic research
Free software for journalists: tutorials, bookmarks and open source tools for journalistic research, investigations, privacy, investigative reporting and data-driven journalism.
With free open source software you can run research tools for sensitive documents or data on your own computer or server instead of handing them to cloud services that may spy on you.
Tutorials and tips: How to use open source research tools for investigative journalism
How to search, sort, explore and filter large document collections or many search results
How to use boolean search operators
Tagging and annotation for collaborative investigative journalism
How to (fuzzy) search document sets for the entries or names on a list
How to structure data for exploratory search, aggregated overviews and interactive filters with facets and named entities
How to integrate Open Data, e.g. from Wikidata, to enhance search and structure your document collections
How to set up your own desktop search engine for many documents in a virtual machine
How to set up a search engine on an encrypted USB key or external hard drive
Tips, tools and How-tos for safer online communications: Surveillance self-defense
Security in a box
Encryption works
Information Security for Journalists
How to build interactive maps with CartoDB
Understanding language data: Open-source NLP software can help
How to scrape structured data from websites with Python and Scrapy
Toolbox: Free software, open source tools and resources
Free software and open source discovery and research tools for journalists:
Search engines for fulltext search and discovery
Research methods, techniques and technology: Fulltext search, Information retrieval, Desktop Search, Enterprise Search and faceted search
Tutorials:
How to search, sort, explore and filter large document collections or many search results
How to use boolean search operators
Open source search tools:
Open Semantic Search: Own semantic search engine
Open Semantic Desktop Search: Own search engine for single desktop users and laptops
InvestigateIX: Secure search engine on encrypted external devices
Recoll: Desktop search for Linux
Fuzzy search with lists
FESS: Enterprise Search engine with user interfaces for search and crawling of files and websites
Kibana for Elastic Search: Search and data visualization of an Elastic Search index
Banana for Solr: Search and data visualization of a Solr index
HUE Solr search: Search and data visualization of a Solr index
Open Semantic ETL for Solr or Open Semantic ETL for Elastic Search: Tools to import files and documents of different file formats to a search index
Apache Manifold CF: Tools to import files to an Elastic Search or Solr search index
Search libraries and APIs
If you want to write code yourself, you can build on these powerful engines:
Solr: Index and search API
Elastic Search: Index and search API
Databases, digital archives, data management systems, document management systems and content management systems
Methods: Archive, database, forms, categories (tagging), classification, meta data, repository, document management (DMS), content management (CMS) or enterprise content management (ECM), knowledge management, knowledge base, bookmarks
Zotero: Bookmark database and citations manager with tagging and annotation features
Docear: Bookmark database and citations manager with mindmap, tagging and annotation features
LibreOffice Calc: Open source spreadsheet program
DocumentCloud: Document management system for paper-based documents such as scans or PDFs
Semantic Mediawiki: Extends Mediawiki to a semantic data base
Drupal CMS: The Fields module provides an easy-to-use UI for creating your own content types, data fields and forms
Agorum: Automated extraction of structured amounts of money from bills
Tagging and annotation
Methods: Annotation, Tagging, Social Tagging, Folksonomies
Tutorial: Tagging and annotation for collaborative investigative journalism
Zotero: Bookmark database and citations manager with tagging and annotation features
Docear: Bookmark database and citations manager with mindmap, tagging and annotation features
DocumentCloud: Tagging and annotation for paper-based documents such as scans or PDF documents
Neonion: Collaborative annotations within text
Pundit: Annotations within text and within images
Hypothesis
Annotator.js
Text mining, text analysis and document mining
Method: Text mining, Natural Language Processing (NLP), Named entities extraction
Text mining tutorial: How to analyze large document collections: Text mining with the search engine Open Semantic Search
Understanding language data: Open-source NLP software can help
Overview project: Showing most used words and trees of most used words
Jigsaw: Text mining tool (not open source, but free download)
More:
Wikipedia list of open source text mining software
Tapor: Text Analysis Portal for Research
Reconciliation and merging
Methods: Compare, merge, reconcile, link, clustering
Fuzzy search with lists: Checks whether there are search results for each list entry
OpenRefine
DocDiff: Shows and visualizes the differences between two versions of a text
Fslint: Compares two directories and finds identical files that exist in both
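The idea behind fuzzy search with lists (checking, for every entry of a list of names, whether it occurs in a document set even when spelled slightly differently) can be illustrated with a minimal sketch. It uses only Python's standard `difflib` module and is not how Open Semantic Search implements the feature; the function name `fuzzy_hits`, the sample names and documents, and the threshold of 0.85 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_hits(names, documents, threshold=0.85):
    """Map each name to the ids of documents that contain a similar phrase.

    `documents` is a dict of {document id: plain text}. The threshold is an
    illustrative assumption; tune it for your own data.
    """
    hits = {name: [] for name in names}
    for doc_id, text in documents.items():
        words = text.lower().split()
        for name in names:
            target = name.lower()
            size = len(target.split())
            # Slide a window with as many words as the name over the text.
            for i in range(len(words) - size + 1):
                candidate = " ".join(words[i:i + size])
                if SequenceMatcher(None, target, candidate).ratio() >= threshold:
                    hits[name].append(doc_id)
                    break  # one hit per document is enough
    return hits


# Hypothetical sample data with misspelled names
names = ["John Smith", "Acme Holdings"]
documents = {
    "memo.txt": "Payment approved by Jon Smith last week",
    "invoice.txt": "Invoice issued to Acme Holdngs for consulting",
    "notes.txt": "Nothing relevant here",
}
hits = fuzzy_hits(names, documents)
for name, doc_ids in hits.items():
    print(name, "->", doc_ids)
```

A sliding word window keeps the sketch simple but is slow on large collections; for those, a search index with built-in fuzzy matching, such as the Solr-based tools listed above, is the more practical choice.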
Graphs and social network analysis (SNA)
Tools to analyze and visualize connections and relations:
Network analysis tutorial: How to visualize connections & relations in documents with Open Semantic Search
Gephi: Desktop tool for analysis and data visualization of networks, connections and graphs
Cytoscape.js: Javascript library for data visualization of networks, connections and graphs
Semantic Mediawiki: Very flexible CMS for linked data
Detective: Python/Django and neo4j graph database based CMS for connections
Privacy, security, safety and encryption
Digital security: Protect your research, sources and whistleblowers with privacy tools and encryption tools:
Methods: Encryption (PGP, OTR) and anonymization
Tutorials:
Surveillance self-defense: Tips, Tools and How-tos for Safer Online Communications
Security in a box
Encryption works
How to set up a search engine on an encrypted USB key or external hard drive
Information Security for Journalists
Open source tools:
Tails (The Amnesic Incognito Live System): Linux-based operating system for encryption and anonymous internet access
TrueCrypt: Hard disk encryption for Windows
GnuPG: OpenPGP-based email encryption
Enigmail: Encryption plugin for the Thunderbird E-Mail client
Tor project: Anonymity online
OTR: Encryption for chats and instant messaging
TextSecure: Encrypted messenger (like WhatsApp, but built for privacy)
Jitsi: Encrypted communicator (like Skype, but open source and with safer end-to-end encryption)
RedPhone: Encrypted voice over IP communicator for smartphones
SecureDrop: Upload platform for whistleblowers
GlobaLeaks: Another upload platform for whistleblowing
Media monitoring, news filtering, news pipes and alerts
Open source software for media monitoring, news processing, news filtering and alerting:
Open Semantic Search rules for news pipes and alerts: Filters and alerts for news from different news sources and data sources. Has a very powerful filter and search query language (Apache Lucene based), e.g. supporting fuzzy search. Supports many file formats and data sources because you can use all standard connectors for Solr.
Mozilla Thunderbird: Desktop software for reading, filtering and autotagging RSS-Feeds
Streamtools: Visual news pipes for stream processing from the New York Times Lab
Huginn: Ruby on Rails and SQL-based agents
Extract data or convert data
Methods: Data integration, extraction, data conversion, data migration, ETL (Extract, Transform, Load), scraping
Extract text or structured data from documents
Documents: Tika content analysis toolkit: Extract text and meta data from documents of many different file formats
CSV tables: CSV Manager: Import big csv spreadsheets to Solr based search engines
PDF tables: Tabula: Extracts spreadsheets from PDF documents
Scans and images: Optical character recognition (OCR)
Extract text from images (OCR)
Tesseract: OCR Software to recognize text from images
Scantailor: Deskewing low quality scans
Extract text from sound files (speech recognition)
CMU Sphinx: Open source speech recognition toolkit
Extract structured data from websites (Scraping)
Portia: Extract structured data from websites by a visual user interface
Scrapy: Extract structured data from websites by Python scrapers
Extract, transform, load (ETL): Frameworks for importing and transforming or converting data
Transform to plain text: Tika content analysis toolkit
Apache NiFi: Extract, transform, load and distribute data
Talend Open Studio: Import and transform data to other formats
Kettle: Import and transform data to other formats
LogStash: Import and transform data from data sources like log files into a structured search index
Data visualization
Method: data visualization
Tools for data visualization:
Kibana for Elastic Search: User interface for search, interactive filtering and data visualization
D3js data driven documents: Data visualization library for JavaScript programmers
CartoDB: Open source web application and mapping tool for data visualization of spatial data
Apache Zeppelin: Interactive data analysis and data visualization platform
TimelineJS: Creating timelines
Cytoscape.js: Javascript library for data visualization of networks, connections and graphs
Semantic result formats: Data visualizations for data from a Semantic Mediawiki
Charts and diagrams
Datawrapper: Web app and user interface for easily generating charts
HUE Solr search
Kibana for Elastic Search
Apache Zeppelin
Superset
Banana for Solr
NVD3: Javascript library for easy programming of charts with D3
Maps and mapping (spatial data)
Create interactive maps and visualize spatial data (geodata) with open source software for mapping:
CartoDB: Open source web application and mapping tool for interactive maps
QGIS: Open source desktop tool for maps
Leaflet: Javascript library for interactive maps
OpenLayers: Powerful JavaScript library for maps
OpenStreetMap: Open source and open data for maps
GeoParsePy: Open source geoparsing tool that extracts geodata such as places and locations from text for mapping
Serving tiles: How to run your own map server with open source software
Visualize events on a timeline
Create timelines with open source timeline tools and visualize events on interactive multimedia timelines:
Tutorial on timelines
TimelineJS
Simile Timeline
Odyssey.js: Combines a timeline with a map for timelines for spatial data
Graphs, networks, connections and relations
Network analysis tutorial: How to visualize connections & relations in documents with open semantic search
Gephi: Desktop tool for analysis and data visualization of networks, connections and graphs
Cytoscape.js: Javascript library for data visualization of networks, connections and graphs
Sigma js: Javascript library for data visualization of networks, connections and graphs
Redact documents and delete meta data
Clean sensitive documents and delete metadata that is stored invisibly inside document files or photos, e.g. serial numbers of hardware (such as your camera), software identifiers or user names:
PDF Redact Tools: Most secure way to delete metadata from PDFs
MAT (Metadata Anonymisation Toolkit): User interface for deleting metadata from different document and image formats
Statistics and analytics
Method: Data analysis, statistics, chart, diagram, data visualization
LibreOffice Calc: Open source spreadsheet program
HUE Solr search
Kibana for Elastic Search
Statistical software: Specialized computer programs for statistical analysis and econometric analysis
Business Intelligence: Tools for statistics and analytics
Programming with R or Python or another programming language
Mining of massive datasets: Book (free PDF download) explaining data mining methods
Universal open source toolset
The ultimate universal open source toolset is a Linux distribution like Debian GNU/Linux or Ubuntu Linux, which comes with thousands of packages of free software and open source tools, software libraries and programming languages.
You don't have to remove your existing operating system: With open-source virtualization software like VirtualBox for Windows or Mac you can run a Linux distribution within a window in your existing operating system environment.
You may want to start with Linux inside your existing system environment by using the preconfigured Debian-based virtual machine (VM) Open Semantic Desktop Search, which provides a preselected and preconfigured collection of tools for investigative journalists.