Covariates available after instantiating a Cleaner object

The following list provides an overview of the covariates available in an instantiated Cleaner object. They can be accessed via the raw_data attribute. It is advised to keep the metadata parameter at its default value for data sets that are to be used with the remaining features of this package.

If metadata=False (default):

  • created_at - timestamp of the creation of the corresponding tweet.
  • text - the complete text of a tweet, regardless of whether it is longer than 140 characters.
  • text_tokens - contains the lemmatized tokens created from "text".
  • hashtags - contains the hashtag(s) of a tweet (without “#”).
  • center_coord_X - the X-coordinate of the center of the bounding box.
  • center_coord_Y - the Y-coordinate of the center of the bounding box.
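The default covariates can be illustrated with a minimal sketch that derives the text, the hashtags and the bounding-box center from a single raw tweet dictionary. It assumes the Twitter API v1.1 tweet layout (extended_tweet.full_text, entities.hashtags, and place.bounding_box.coordinates as [longitude, latitude] corner points), and it assumes that X is the longitude, Y the latitude, and that the center is the midpoint of the box's extent. The helper names full_text, hashtags and bbox_center are hypothetical and not part of the Cleaner API; the package may compute these columns differently.

```python
from __future__ import annotations

from typing import Optional, Tuple

# Hypothetical helpers, not part of the Cleaner API. They assume the
# Twitter API v1.1 tweet layout.


def full_text(tweet: dict) -> str:
    """Return the untruncated text: extended_tweet.full_text if present, else text."""
    extended = tweet.get("extended_tweet") or {}
    return extended.get("full_text", tweet.get("text", ""))


def hashtags(tweet: dict) -> list:
    """Collect hashtag texts; Twitter stores them without the leading '#'."""
    # Long tweets keep their complete entities inside extended_tweet.
    source = tweet.get("extended_tweet") or tweet
    return [tag["text"] for tag in source.get("entities", {}).get("hashtags", [])]


def bbox_center(tweet: dict) -> Optional[Tuple[float, float]]:
    """Midpoint of the place bounding box as (X, Y), assumed (longitude, latitude)."""
    place = tweet.get("place") or {}
    box = place.get("bounding_box") or {}
    if not box.get("coordinates"):
        return None
    ring = box["coordinates"][0]  # list of [lon, lat] corner points
    xs = [point[0] for point in ring]
    ys = [point[1] for point in ring]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


sample = {
    "text": "Short version #rain",
    "extended_tweet": {
        "full_text": "Long version of the tweet #rain #Berlin",
        "entities": {"hashtags": [{"text": "rain"}, {"text": "Berlin"}]},
    },
    "place": {"bounding_box": {"type": "Polygon", "coordinates": [[
        [13.0, 52.0], [14.0, 52.0], [14.0, 53.0], [13.0, 53.0],
    ]]}},
}

print(full_text(sample))    # Long version of the tweet #rain #Berlin
print(hashtags(sample))     # ['rain', 'Berlin']
print(bbox_center(sample))  # (13.5, 52.5)
```

Taking the midpoint of the minimum and maximum coordinates, rather than averaging all corner points, keeps the result independent of how many corners the polygon lists.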

If metadata=True, the following covariates are available in addition to those listed above:

  • extended_tweet - contains the complete text of a tweet if it is longer than 140 characters; otherwise None.
  • id - the tweet's ID as an integer.
  • id_str - the tweet's ID as a string.
  • place - sub-dictionary containing information about the tweet's associated location.
  • source - HTML-formatted hyperlink naming the application (client) used to post the tweet.
  • user - sub-dictionary containing information about the tweet's associated user.
  • emojis - contains the emoji(s) of a tweet.
  • bounding_box.coordinates_str - contains all bounding box coordinates as a string. Originates from place.
  • retweet_count - number of retweets of the corresponding tweet.
  • favorite_count - number of favorites of the corresponding tweet.
  • user_created_at - timestamp of the creation of the user's profile. Originates from user.
  • user_description - textual description of the user's profile. Originates from user.
  • user_favourites_count - the total number of tweets the user has favorited (liked). Originates from user.
  • user_followers_count - the total number of followers of the user. Originates from user.
  • user_friends_count - the total number of users followed by the user. Originates from user.
  • user_id - the user's profile ID as an integer. Originates from user.
  • user_listed_count - the number of public lists of which the user is a member. Originates from user.
  • user_location - location defined by the user for their profile. Originates from user.
  • user_name - display name chosen by the user. Originates from user.
  • user_screen_name - the user's screen name (handle or alias). Originates from user.
  • user_statuses_count - the number of tweets published by the user (incl. retweets). Originates from user.
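The "Originates from user" and "Originates from place" notes describe columns flattened out of the user and place sub-dictionaries. A minimal sketch of such flattening follows. It assumes that each user_<field> covariate copies the Twitter API v1.1 user object field of the same name, and that bounding_box.coordinates_str is simply the Python string form of the coordinates list. The function flatten_metadata and the tuple USER_FIELDS are hypothetical; the package's actual conversion may differ.

```python
# Hypothetical sketch, not the Cleaner implementation: copy fields from the
# "user" and "place" sub-dictionaries into flat, prefixed covariates.
USER_FIELDS = (
    "created_at", "description", "favourites_count", "followers_count",
    "friends_count", "id", "listed_count", "location", "name",
    "screen_name", "statuses_count",
)


def flatten_metadata(tweet: dict) -> dict:
    """Return a flat row with user_<field> keys and the bounding-box string."""
    user = tweet.get("user") or {}
    # Missing user fields become None instead of raising KeyError.
    row = {f"user_{field}": user.get(field) for field in USER_FIELDS}
    # Assumed: the string form is just str() of the nested coordinate list.
    place = tweet.get("place") or {}
    box = place.get("bounding_box") or {}
    coords = box.get("coordinates")
    row["bounding_box.coordinates_str"] = None if coords is None else str(coords)
    return row


tweet = {
    "user": {"id": 42, "screen_name": "example_user", "followers_count": 10},
    "place": {"bounding_box": {"coordinates": [[[13.0, 52.0], [14.0, 53.0]]]}},
}
row = flatten_metadata(tweet)
print(row["user_screen_name"])               # example_user
print(row["user_location"])                  # None
print(row["bounding_box.coordinates_str"])   # [[[13.0, 52.0], [14.0, 53.0]]]
```

Prefixing with "user_" keeps the flattened columns from colliding with tweet-level fields of the same name, such as id and created_at.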