demuxfb

demuxfb - parse Facebook conversation archives

demuxfb is a Python package that rebuilds conversations from Facebook ‘Download Your Information’ json dumps into a more exact form, classifying messages into categories that the json metadata itself does not distinguish.

GitHub: https://github.com/nick-killeen/demuxfb

Warning on Misclassification

The export that Facebook provides is not one-to-one, so reverse-engineering from the compressed form will inevitably produce some misclassification errors. This package opts for parsimony rather than tuning the margin between ‘overclassification’ and ‘underclassification’ with a particular context in mind. Expect misclassification.

Functions

build_chat

Builds a Chat from a Facebook archive – performs the package’s task.

Modules

media

Defines media types used by demuxfb.message.MediaMessage.

message

Defines the classification structure of messages.

Example

This example shows a call to build_chat and a simple use of the resulting Chat object.

>>> from pathlib import Path
>>> import demuxfb
>>> path = Path('C:/users/nicho/downloads/facebook-nicholaskilleen/'
...             'messages/inbox/ourchat_95kldfjg4')
>>> feed = demuxfb.ChatFolderFeed(path)
>>> chat = demuxfb.build_chat(feed, 'Nicholas Killeen') # May take a while.
>>> print('Number of text messages in the conversation:',
...       len([message for message in chat.messages
...            if isinstance(message, demuxfb.message.TextMessage)]))

class demuxfb.Chat

Bases: object

A detailed object representing a Facebook conversation.

messages

Messages in the conversation, ordered by the time they were sent (earliest first).

Type:

List[demuxfb.message.Message]

participants

Participants in the conversation.

Type:

Set[demuxfb.Participant]

get_participant(name: str) Participant | None

Get the demuxfb.Participant object uniquely identifying the given chat-member.

Parameters:

name (str) – The exact (case-sensitive) Facebook account name of the chat-member to get, as captured at the time of the archive snapshot.

Returns:

The participant corresponding to name, or None if no such participant was active in the chat.

Return type:

demuxfb.Participant or None

get_unknown_participant() Participant

Get the unique (within the chat) demuxfb.Participant object that identifies all ‘anonymous’ chat members – those who have blocked you, deleted their accounts, or whose name is otherwise missing under certain contexts.

Even if a participant has a valid named identity, some of their involvements may be attributed to this unknown persona where Facebook fails to be explicit about their identity.

Note: though there may be multiple distinct unidentifiable people in a conversation, they are all characterized by the one object this function returns.

Returns:

The unique (within the chat) object characterizing cases where a named participant identity is not present.

Return type:

demuxfb.Participant

class demuxfb.ChatFeed

Bases: ABC

Interface for an adapter to extract a chat’s json data from some type of source. Expected by demuxfb.build_chat.

abstract message_json_iter() Iterator[dict]

Return an iterator through all of the json messages in the chat, oldest first.

Returns:

An iterator over the json messages in the chat, oldest first.

Return type:

Iterator[dict]
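
A custom feed only has to implement message_json_iter. The following is a minimal sketch of a hypothetical in-memory feed, not part of demuxfb. The class name InMemoryChatFeed and the sample dicts are illustrative, and the timestamp_ms key is an assumption about the shape of exported messages. If demuxfb is not installed, a local stand-in that mirrors only the documented interface is used so the sketch still runs.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List

try:
    from demuxfb import ChatFeed  # the real interface, if demuxfb is installed
except ImportError:
    # Stand-in mirroring only the documented interface (an assumption).
    class ChatFeed(ABC):
        @abstractmethod
        def message_json_iter(self) -> Iterator[dict]:
            ...


class InMemoryChatFeed(ChatFeed):
    """Hypothetical feed serving message dicts already held in memory."""

    def __init__(self, messages: List[dict]) -> None:
        # Sort defensively so iteration is oldest first, as the interface asks.
        # 'timestamp_ms' is assumed to be present on every message dict.
        self._messages = sorted(messages, key=lambda m: m["timestamp_ms"])

    def message_json_iter(self) -> Iterator[dict]:
        return iter(self._messages)


feed = InMemoryChatFeed([
    {"sender_name": "B", "timestamp_ms": 2000, "content": "second"},
    {"sender_name": "A", "timestamp_ms": 1000, "content": "first"},
])
ordered = [m["content"] for m in feed.message_json_iter()]
```

A feed like this could be handy for tests, since it avoids touching the file system. Whether build_chat accepts these minimal dicts depends on the package's matching rules, which this sketch does not exercise.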

class demuxfb.ChatFileFeed(file: Path)

Bases: ChatFeed

Adapter to extract a chat’s json data from a single json file.

__init__(file: Path) None

Build feed from a json file.

Parameters:

file (pathlib.Path) – Path to a json file representing the chat, as exported by the ‘Download Your Information’ Facebook feature. The file must be unzipped.

Raises:

InvalidChatFeedException – If the file cannot be opened for reading, or cannot be parsed as json.

message_json_iter() Iterator[dict]

Return an iterator through all of the json messages in the chat, oldest first.

Returns:

An iterator over the json messages in the chat, oldest first.

Return type:

Iterator[dict]

class demuxfb.ChatFolderFeed(folder: Path)

Bases: ChatFeed

Adapter to extract a chat’s json data from a folder of message_1.json, message_2.json, … files.

__init__(folder: Path) None

Build feed from a folder of json files.

Parameters:

folder (pathlib.Path) – Path to a directory of json files representing the chat, as exported by the ‘Download Your Information’ Facebook feature. The folder must be unzipped, and contain some number of files exactly of the names message_1.json, message_2.json, …

Raises:

InvalidChatFeedException

  • If folder is not a directory, is empty, or contains anything other than ‘message_<NUM>.json’ files; or if any file in it cannot be opened for reading or cannot be parsed as json.
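
Because the folder must hold nothing but message files, it can help to inspect it before constructing the feed. The sketch below uses only the standard library. The helper names unexpected_entries and looks_like_chat_folder are hypothetical, and the check approximates the documented naming rule rather than reproducing the package's own validation.

```python
import re
from pathlib import Path
from typing import List

# Matches the documented file names: message_1.json, message_2.json, ...
MESSAGE_FILE_PATTERN = re.compile(r"message_(\d+)\.json")


def unexpected_entries(folder: Path) -> List[str]:
    """Return names in `folder` that are not 'message_<NUM>.json' files."""
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if not (entry.is_file() and MESSAGE_FILE_PATTERN.fullmatch(entry.name))
    )


def looks_like_chat_folder(folder: Path) -> bool:
    """True if `folder` is a non-empty directory holding only message files."""
    if not folder.is_dir():
        return False
    names = [entry.name for entry in folder.iterdir()]
    return bool(names) and not unexpected_entries(folder)
```

If unexpected_entries reports anything, such as a media subfolder, one option is to copy the message files into a clean directory and point ChatFolderFeed at that instead.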

message_json_iter() Iterator[dict]

Return an iterator through all of the json messages in the chat, oldest first.

Returns:

An iterator over the json messages in the chat, oldest first.

Return type:

Iterator[dict]

class demuxfb.IntervalProgressReporter(report_interval_seconds: float = 1.0, report_function: Callable[[str], Any] = print)

Bases: ProgressReporter

ProgressReporter that logs time and number of messages processed at a regular interval.

__init__(report_interval_seconds: float = 1.0, report_function: Callable[[str], Any] = print) None

Create reporter.

Parameters:
  • report_interval_seconds (float, defaults to 1.0) – Interval (in seconds) to report at.

  • report_function (function, defaults to print) – Function that takes in a str and logs its value via some side-effect. This function will be used to make the reports.
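
Any callable that accepts a str can serve as report_function. As a sketch, the reports could be routed through the standard logging module instead of print. The logger name and the log_report helper are illustrative choices, and the reporter is only constructed when demuxfb is importable.

```python
import logging

logger = logging.getLogger("demuxfb.progress")  # logger name is arbitrary


def log_report(text: str) -> None:
    """Forward one progress report to the logging system instead of stdout."""
    logger.info(text)


try:
    import demuxfb
except ImportError:
    reporter = None  # demuxfb not installed; log_report still works on its own
else:
    # Report every five seconds through the logger rather than print.
    reporter = demuxfb.IntervalProgressReporter(
        report_interval_seconds=5.0, report_function=log_report)
```

The resulting reporter would then be passed as the progress_reporter argument of demuxfb.build_chat.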

finish() None

Called when Chat construction finishes.

finish_message(message: Message) None

Called when a message has finished being constructed.

Parameters:

message (demuxfb.message.Message) – The message that was just constructed.

start() None

Called when Chat construction begins.

exception demuxfb.InvalidChatFeedException

Bases: Exception

Error for when ChatFeed construction fails.

class demuxfb.Participant(name: str, is_me: bool = False)

Bases: object

Identifies a chat participant.

Two Participant objects represent the same person if and only if they are the same object (they reference the same location in memory). All unattributable actions are attributed to a single ‘unknown’ persona.

Note: this object identity does not hold across multiple chats; the same person is represented by a different object in each chat.
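
The identity rule can be illustrated with a small sketch. The Participant class below is a stand-in written for this example, not the package's implementation; it assumes only the documented constructor and get_name, and that no custom equality is defined.

```python
class Participant:
    """Stand-in for illustration: no __eq__ is defined, so equality
    falls back to object identity."""

    def __init__(self, name: str, is_me: bool = False) -> None:
        self._name = name
        self._is_me = is_me

    def get_name(self) -> str:
        return self._name


# The same account name appearing in two different chats (hypothetical data).
alex_in_chat_a = Participant("Alex")
alex_in_chat_b = Participant("Alex")
same_object = alex_in_chat_a

names_match = alex_in_chat_a.get_name() == alex_in_chat_b.get_name()
objects_match = alex_in_chat_a is alex_in_chat_b
```

Under these assumptions the names compare equal while the objects do not, so matching people across chats would have to go through get_name rather than object comparison.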

get_name() str

Get this participant’s Facebook account name.

Returns:

This participant’s Facebook account name. The value will be 'Facebook User' if the participant is anonymous.

Return type:

str

is_me() bool

Return true if this participant is the one who downloaded the Facebook archive.

Returns:

True if this participant is the one who downloaded the Facebook archive.

Return type:

bool

class demuxfb.ProgressReporter

Bases: ABC

Interface for reporting on progress during the construction of a chat, which can take a while. This is an optional argument to demuxfb.build_chat.

abstract finish() None

Called when Chat construction finishes.

abstract finish_message(message: Message) None

Called when a message has finished being constructed.

Parameters:

message (demuxfb.message.Message) – The message that was just constructed.

abstract start() None

Called when Chat construction begins.
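
A custom reporter implements the three hooks above. The following sketch shows a hypothetical CountingReporter that tallies messages and prints one summary at the end. If demuxfb is not installed, a local stand-in mirroring the documented interface is used, and the loop at the bottom simulates the calls that build_chat is described as making.

```python
from abc import ABC, abstractmethod
from typing import Any

try:
    from demuxfb import ProgressReporter  # real interface, if available
except ImportError:
    # Stand-in mirroring only the documented interface (an assumption).
    class ProgressReporter(ABC):
        @abstractmethod
        def start(self) -> None: ...

        @abstractmethod
        def finish_message(self, message: Any) -> None: ...

        @abstractmethod
        def finish(self) -> None: ...


class CountingReporter(ProgressReporter):
    """Hypothetical reporter that counts messages and prints one summary."""

    def __init__(self) -> None:
        self.count = 0
        self.done = False

    def start(self) -> None:
        # Reset state so the reporter can be reused for another build.
        self.count = 0
        self.done = False

    def finish_message(self, message: Any) -> None:
        self.count += 1

    def finish(self) -> None:
        self.done = True
        print(f"Built chat from {self.count} messages.")


# Simulate the documented call sequence with placeholder messages.
reporter = CountingReporter()
reporter.start()
for placeholder in ("m1", "m2", "m3"):
    reporter.finish_message(placeholder)
reporter.finish()
```

In real use the instance would be passed as progress_reporter to demuxfb.build_chat, which is expected to drive these hooks itself.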

class demuxfb.Reaction(emoji: str, sender: Participant)

Bases: object

emoji: str

sender: Participant

demuxfb.build_chat(feed: ChatFeed, owner_name: str, progress_reporter: ProgressReporter | None = None) Chat

Build a detailed chat object from an archive.

Parameters:
  • feed (demuxfb.ChatFeed) – The feed defining the source that the json conversation data is to be read from.

  • owner_name (str) – The Facebook account name of the person who downloaded the Facebook archive. This is needed so the builder knows which participant ‘you’ refers to.

  • progress_reporter (demuxfb.ProgressReporter, optional) – Used to report progress in the process of building the chat. If unspecified, no reporting will take place.

Returns:

A detailed object representing the chat read from the specified feed.

Return type:

demuxfb.Chat

Raises:

NoMatchingRuleException – When no enabled message-matching rule in the ruleset matches a json element of the feed.