demuxfb¶
demuxfb - parse Facebook conversation archives¶
demuxfb is a Python package to reframe conversations from Facebook ‘Download Your Information’ json dumps into a more exact form, accounting for the different categorizations of messages that the json metadata itself does not distinguish.
Github: https://github.com/nick-killeen/demuxfb
Warning on Misclassification¶
The export functionality Facebook provides is not one-to-one, so reverse-engineering from the compressed form will inevitably result in some misclassification errors. This package takes the route of parsimony rather than trying to finesse the ‘overclassification’ and ‘underclassification’ margin with a particular context in mind. Expect misclassification.
Functions¶
build_chat
Builds a Chat from a Facebook archive – performs the package’s task.
Modules¶
media
Defines media types used by demuxfb.message.MediaMessage.
message
Defines the classification structure of messages.
Example
This example demonstrates the orchestration of a call to build_chat and a simple usage of the resultant Chat object.
>>> from pathlib import Path
>>> import demuxfb
>>> path = Path('C:/users/nicho/downloads/facebook-nicholaskilleen/'
...             'messages/inbox/ourchat_95kldfjg4')
>>> feed = demuxfb.ChatFolderFeed(path)
>>> chat = demuxfb.build_chat(feed, 'Nicholas Killeen')  # May take a while.
>>> print('Number of text messages in the conversation:',
...       len([message for message in chat.messages
...            if isinstance(message, demuxfb.message.TextMessage)]))
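A similar tally can be taken across every message class at once. The sketch below is an illustration rather than part of the package: `count_by_type` is a hypothetical helper that groups any iterable of objects by class name, so it should work on `chat.messages` without knowing the message classes in advance. The two stand-in classes exist only so the block runs without an archive.

```python
from collections import Counter
from typing import Iterable


def count_by_type(messages: Iterable[object]) -> Counter:
    """Count objects by the name of their concrete class."""
    return Counter(type(message).__name__ for message in messages)


# Stand-in classes so the sketch runs without a Facebook archive.
# With the real package, pass chat.messages instead of `sample`.
class TextMessage:
    pass


class MediaMessage:
    pass


sample = [TextMessage(), MediaMessage(), TextMessage()]
counts = count_by_type(sample)
for class_name, count in counts.most_common():
    print(class_name, count)
```

Grouping by concrete class name keeps subclasses separate, whereas the `isinstance` check in the example above also counts subclasses of the class being tested.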
- class demuxfb.Chat¶
Bases:
object
A detailed object representing a Facebook conversation.
- messages¶
Messages in the conversation, ordered by the time they were sent (earliest first).
- Type:
List[demuxfb.message.Message]
- participants¶
Participants in the conversation.
- Type:
Set[demuxfb.Participant]
- get_participant(name: str) Participant | None ¶
Get the demuxfb.Participant object uniquely identifying the given chat-member.
- Parameters:
name (str) – The exact (case-sensitive) Facebook account name of the chat-member to get, as captured at the time of the archive snapshot.
- Returns:
The participant corresponding to name, or None if no such participant was active in the chat.
- Return type:
demuxfb.Participant or None
- get_unknown_participant() Participant ¶
Get the unique (within the chat) demuxfb.Participant object that identifies all ‘anonymous’ chat members – those who have blocked you, deleted their accounts, or whose name is otherwise missing in certain contexts.
Even if a participant has a valid named identity, some of their involvements may be attributed to this unknown persona where Facebook fails to be explicit about their identity.
Note: though there may be multiple distinct unidentifiable people in a conversation, they are all characterized by the one object this function returns.
- Returns:
The unique (within the chat) object characterizing cases where a named participant identity is not present.
- Return type:
demuxfb.Participant
- class demuxfb.ChatFeed¶
Bases:
ABC
Interface for an adapter to extract a chat’s json data from some type of source. Expected by demuxfb.build_chat.
- abstract message_json_iter() Iterator[dict] ¶
Return an iterator through all of the json messages in the chat, oldest first.
- Returns:
An iterator over the json messages in the chat, oldest first.
- Return type:
Iterator[dict]
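Because the interface has a single abstract method, an adapter for another source only needs to supply `message_json_iter`. The following is a minimal sketch under stated assumptions: `ChatFeed` here is a local stand-in that mirrors the documented interface so the block runs without the package installed (with demuxfb available, one would subclass `demuxfb.ChatFeed` instead), `InMemoryChatFeed` is a hypothetical name, and the keys in the sample dicts are illustrative only.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class ChatFeed(ABC):
    """Local stand-in mirroring the documented demuxfb.ChatFeed interface."""

    @abstractmethod
    def message_json_iter(self) -> Iterator[dict]:
        """Return an iterator over the json messages, oldest first."""


class InMemoryChatFeed(ChatFeed):
    """Hypothetical feed over json messages that are already loaded."""

    def __init__(self, messages_oldest_first: List[dict]) -> None:
        # Copy the list so later changes by the caller do not affect the feed.
        self._messages = list(messages_oldest_first)

    def message_json_iter(self) -> Iterator[dict]:
        # The caller is responsible for supplying messages oldest first.
        return iter(self._messages)


# Illustrative data; the keys are assumptions, not a schema reference.
feed = InMemoryChatFeed([
    {'sender_name': 'A', 'timestamp_ms': 1, 'content': 'first'},
    {'sender_name': 'B', 'timestamp_ms': 2, 'content': 'second'},
])
print([message['content'] for message in feed.message_json_iter()])
```

Returning a fresh iterator on each call lets the feed be read more than once, which is a design choice of this sketch rather than a documented requirement.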
- class demuxfb.ChatFileFeed(file: Path)¶
Bases:
ChatFeed
Adapter to extract a chat’s json data from a single json file.
- __init__(file: Path) None ¶
Build feed from a json file.
- Parameters:
file (pathlib.Path) – Path to a json file representing the chat, as exported by the ‘Download Your Information’ Facebook feature. The file must be unzipped.
- Raises:
InvalidChatFeedException – If the file cannot be opened for reading, or cannot be parsed as json.
- message_json_iter() Iterator[dict] ¶
Return an iterator through all of the json messages in the chat, oldest first.
- Returns:
An iterator over the json messages in the chat, oldest first.
- Return type:
Iterator[dict]
- class demuxfb.ChatFolderFeed(folder: Path)¶
Bases:
ChatFeed
Adapter to extract a chat’s json data from a folder of message_1.json, message_2.json, … files.
- __init__(folder: Path) None ¶
Build feed from a folder of json files.
- Parameters:
folder (pathlib.Path) – Path to a directory of json files representing the chat, as exported by the ‘Download Your Information’ Facebook feature. The folder must be unzipped, and must contain one or more files named exactly message_1.json, message_2.json, …
- Raises:
InvalidChatFeedException – If folder is not a directory, is empty, or does not contain solely ‘message_<NUM>.json’ files; or if any of those files cannot be opened for reading or cannot be parsed as json.
- message_json_iter() Iterator[dict] ¶
Return an iterator through all of the json messages in the chat, oldest first.
- Returns:
An iterator over the json messages in the chat, oldest first.
- Return type:
Iterator[dict]
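The folder requirements for this feed can be checked before constructing it. This is a sketch of one way to do that, not the package's own validation: `list_message_files` is a hypothetical helper, it raises `ValueError` rather than `InvalidChatFeedException`, and it rejects subdirectories because the source does not say whether they are tolerated. It sorts by the numeric suffix only to give a stable order and makes no claim about which file holds the oldest messages.

```python
import re
import tempfile
from pathlib import Path
from typing import List

MESSAGE_FILE_PATTERN = re.compile(r'message_(\d+)\.json')


def list_message_files(folder: Path) -> List[Path]:
    """Return the message_<NUM>.json files in folder, sorted by NUM.

    Raises ValueError if folder is not a directory, is empty, or contains
    anything other than message_<NUM>.json files.
    """
    if not folder.is_dir():
        raise ValueError(f'{folder} is not a directory')
    numbered = []
    for entry in folder.iterdir():
        match = MESSAGE_FILE_PATTERN.fullmatch(entry.name)
        if match is None or not entry.is_file():
            raise ValueError(f'unexpected entry: {entry.name}')
        numbered.append((int(match.group(1)), entry))
    if not numbered:
        raise ValueError(f'{folder} is empty')
    # Sort numerically so that message_10.json follows message_9.json.
    return [path for _, path in sorted(numbered)]


# Demonstration on a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for number in (10, 2, 1):
        (folder / f'message_{number}.json').write_text('{}')
    names = [path.name for path in list_message_files(folder)]
    print(names)
```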
- class demuxfb.IntervalProgressReporter(report_interval_seconds: float = 1.0, report_function: typing.Callable[[str], typing.Any] = print)¶
Bases:
ProgressReporter
ProgressReporter that logs time and number of messages processed at a regular interval.
- __init__(report_interval_seconds: float = 1.0, report_function: typing.Callable[[str], typing.Any] = print) None ¶
Create reporter.
- Parameters:
report_interval_seconds (float, defaults to 1.0) – Interval (in seconds) to report at.
report_function (function, defaults to print) – Function that takes in a str and logs its value via some side-effect. This function will be used to make the reports.
- finish() None ¶
Called when Chat construction finishes.
- finish_message(message: Message) None ¶
Called when a message has finished being constructed.
- Parameters:
message (demuxfb.message.Message) – The message that was just constructed.
- start() None ¶
Called when Chat construction begins.
- exception demuxfb.InvalidChatFeedException¶
Bases:
Exception
Error for when ChatFeed construction fails.
- class demuxfb.Participant(name: str, is_me: bool = False)¶
Bases:
object
Identifies a chat participant.
Two Participant objects represent the same person if and only if they are the same object (they reference the same location in memory). All unattributable actions are said to be done by one ‘unknown’ persona.
Note: this object identity does not hold across multiple chats.
- get_name() str ¶
Get this participant’s Facebook account name.
- Returns:
This participant’s Facebook account name. The value will be 'Facebook User' if the participant is anonymous.
- Return type:
str
- is_me() bool ¶
Return true if this participant is the one who downloaded the Facebook archive.
- Returns:
True if this participant is the one who downloaded the Facebook archive.
- Return type:
bool
- class demuxfb.ProgressReporter¶
Bases:
ABC
Interface for reporting on progress during the construction of a chat, which can take a while. An implementation can be passed as the optional progress_reporter argument to demuxfb.build_chat.
- abstract finish() None ¶
Called when Chat construction finishes.
- abstract finish_message(message: Message) None ¶
Called when a message has finished being constructed.
- Parameters:
message (demuxfb.message.Message) – The message that was just constructed.
- abstract start() None ¶
Called when Chat construction begins.
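A custom reporter implements the three hooks of this interface. The sketch below is an illustration under stated assumptions: `ProgressReporter` is a local stand-in mirroring the documented interface so the block runs without the package (with demuxfb installed, one would subclass `demuxfb.ProgressReporter`), and `CountingReporter` with its `every` parameter is a hypothetical name that reports by message count rather than by elapsed time.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class ProgressReporter(ABC):
    """Local stand-in mirroring the documented demuxfb.ProgressReporter."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def finish_message(self, message: Any) -> None: ...

    @abstractmethod
    def finish(self) -> None: ...


class CountingReporter(ProgressReporter):
    """Hypothetical reporter that reports once every `every` messages."""

    def __init__(self, every: int = 1000,
                 report_function: Callable[[str], Any] = print) -> None:
        self._every = every
        self._report = report_function
        self.count = 0

    def start(self) -> None:
        self.count = 0
        self._report('Building chat...')

    def finish_message(self, message: Any) -> None:
        self.count += 1
        if self.count % self._every == 0:
            self._report(f'{self.count} messages processed')

    def finish(self) -> None:
        self._report(f'Done: {self.count} messages in total')


# Drive the hooks by hand: start, one call per message, then finish.
lines = []
reporter = CountingReporter(every=2, report_function=lines.append)
reporter.start()
for message in ('a', 'b', 'c'):
    reporter.finish_message(message)
reporter.finish()
print(lines)
```

With the real package, an instance of such a subclass would be passed as the `progress_reporter` argument of `demuxfb.build_chat`.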
- class demuxfb.Reaction(emoji: str, sender: Participant)¶
Bases:
object
- emoji: str¶
- sender: Participant¶
- demuxfb.build_chat(feed: ChatFeed, owner_name: str, progress_reporter: ProgressReporter | None = None) Chat ¶
Build a detailed chat object from an archive.
- Parameters:
feed (demuxfb.ChatFeed) – The feed defining the source that the json conversation data is to be read from.
owner_name (str) – The Facebook account name of the person who downloaded the Facebook archive. This is needed so the builder knows which participant ‘you’ refers to.
progress_reporter (demuxfb.ProgressReporter, optional) – Used to report progress in the process of building the chat. If unspecified, no reporting will take place.
- Returns:
A detailed object representing the chat read from the specified feed.
- Return type:
demuxfb.Chat
- Raises:
NoMatchingRuleException – When no enabled message-matching rule in the ruleset matches a json element of the feed.