.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/02_text_with_string_encoders.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_02_text_with_string_encoders.py>`
        to download the full example code or to run this example in your browser
        via JupyterLite or Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_02_text_with_string_encoders.py:


.. _example_string_encoders:

=====================================================
Various string encoders: a sentiment analysis example
=====================================================

In this example, we explore the performance of string and categorical encoders
available in skrub.

.. |GapEncoder| replace:: :class:`~skrub.GapEncoder`

.. |MinHashEncoder| replace:: :class:`~skrub.MinHashEncoder`

.. |TextEncoder| replace:: :class:`~skrub.TextEncoder`

.. |StringEncoder| replace:: :class:`~skrub.StringEncoder`

.. |TableReport| replace:: :class:`~skrub.TableReport`

.. |TableVectorizer| replace:: :class:`~skrub.TableVectorizer`

.. |pipeline| replace:: :class:`~sklearn.pipeline.Pipeline`

.. |HistGradientBoostingClassifier| replace:: :class:`~sklearn.ensemble.HistGradientBoostingClassifier`

.. |RandomizedSearchCV| replace:: :class:`~sklearn.model_selection.RandomizedSearchCV`

.. |GridSearchCV| replace:: :class:`~sklearn.model_selection.GridSearchCV`

.. GENERATED FROM PYTHON SOURCE LINES 43-49

The Toxicity dataset
--------------------

We focus on the toxicity dataset, a corpus of 1,000 tweets, evenly balanced
between the binary labels "Toxic" and "Not Toxic". Our goal is to classify each
entry between these two labels, using only the text of the tweets as features.

.. GENERATED FROM PYTHON SOURCE LINES 49-55

.. code-block:: Python

    from skrub.datasets import fetch_toxicity

    dataset = fetch_toxicity()
    X, y = dataset.X, dataset.y
    X["is_toxic"] = y


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Downloading 'toxicity' from https://github.com/skrub-data/skrub-data-files/raw/refs/heads/main/toxicity.zip (attempt 1/3)


.. GENERATED FROM PYTHON SOURCE LINES 56-58

When it comes to displaying large chunks of text, the |TableReport| is
especially useful! Click on any cell below to expand and read the tweet in
full.

.. GENERATED FROM PYTHON SOURCE LINES 58-62

.. code-block:: Python

    from skrub import TableReport

    TableReport(X)

.. Interactive TableReport output omitted (it requires JavaScript to render).
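
As a quick sanity check, we can confirm that the two classes are indeed evenly
balanced. This short snippet is not part of the original example; it only uses
the ``is_toxic`` column created above.

.. code-block:: Python

    # Count the number of tweets per label; each class should contain ~500 rows.
    X["is_toxic"].value_counts()
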

.. GENERATED FROM PYTHON SOURCE LINES 63-76

GapEncoder
^^^^^^^^^^

First, let's vectorize our text column using the |GapEncoder|, one of the
high-cardinality categorical encoders provided by skrub.

As introduced in the :ref:`previous example`, the |GapEncoder| performs matrix
factorization for topic modeling. It builds latent topics by capturing
combinations of substrings that frequently co-occur, and the encoded vectors
correspond to topic activations.

To interpret these latent topics, we select, for each of them, a few labels
from the input data with the highest activations. In the example below we
select 3 labels to summarize each topic.

.. GENERATED FROM PYTHON SOURCE LINES 76-84

.. code-block:: Python

    from skrub import GapEncoder

    gap = GapEncoder(n_components=30)
    X_trans = gap.fit_transform(X["text"])

    # Add the original text as a first column
    X_trans.insert(0, "text", X["text"])
    TableReport(X_trans)

.. Interactive TableReport output omitted (it requires JavaScript to render).
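
Before visualizing the activations, we can already get a feel for the topics by
printing the generated column names. This snippet is not part of the original
example; the first column of ``X_trans`` is the raw text we just inserted, so we
skip it.

.. code-block:: Python

    # Each encoded column is named after the most activated labels of its topic.
    print(list(X_trans.columns[1:]))
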

.. GENERATED FROM PYTHON SOURCE LINES 85-87

We can use a heatmap to highlight the highest activations, making them more
visible for comparison against the original text and vectors above.

.. GENERATED FROM PYTHON SOURCE LINES 87-122

.. code-block:: Python

    import numpy as np
    from matplotlib import pyplot as plt


    def plot_gap_feature_importance(X_trans):
        x_samples = X_trans.pop("text")

        # We slightly format the topics and labels for them to fit on the plot.
        topic_labels = [x.replace("text: ", "") for x in X_trans.columns]
        labels = x_samples.str[:50].values + "..."

        # We clip large outliers to make the activations more visible.
        X_trans = np.clip(X_trans, a_min=None, a_max=200)

        plt.figure(figsize=(10, 10), dpi=200)
        plt.imshow(X_trans.T)
        plt.yticks(
            range(len(topic_labels)),
            labels=topic_labels,
            ha="right",
            size=12,
        )
        plt.xticks(range(len(labels)), labels=labels, size=12, rotation=50, ha="right")
        plt.colorbar().set_label(label="Topic activations", size=13)
        plt.ylabel("Latent topics", size=14)
        plt.xlabel("Data entries", size=14)
        plt.tight_layout()
        plt.show()


    plot_gap_feature_importance(X_trans.head())


.. image-sg:: /auto_examples/images/sphx_glr_02_text_with_string_encoders_001.png
   :alt: 02 text with string encoders
   :srcset: /auto_examples/images/sphx_glr_02_text_with_string_encoders_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    /home/circleci/project/examples/02_text_with_string_encoders.py:116: UserWarning: Glyph 4108 (\N{MYANMAR LETTER TTHA}) missing from font(s) DejaVu Sans.
      plt.tight_layout()


.. GENERATED FROM PYTHON SOURCE LINES 123-143

Now that we have an understanding of the vectors produced by the |GapEncoder|,
let's evaluate its performance in toxicity classification. The |GapEncoder|
excels at handling categorical columns with high cardinality, but here the
column consists of free-form text. Sentences are generally longer and contain
more unique n-grams than high-cardinality categories.

To benchmark the performance of the |GapEncoder| on the toxicity dataset, we
integrate it into a |TableVectorizer|, as introduced in the
:ref:`previous example`, and create a |pipeline| by appending a
|HistGradientBoostingClassifier|, which consumes the vectors produced by the
|GapEncoder|.

We set ``n_components`` to 30; however, to achieve the best performance, we
would need to find the optimal value for this hyperparameter using either
|GridSearchCV| or |RandomizedSearchCV|. We skip this step here to keep the
computation time of this small example reasonable; a sketch of what such a
search could look like is shown below.
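The sketch below is hypothetical and is not executed in this example. In
particular, the nested parameter name is an assumption: the exact key should be
checked with ``pipe.get_params()``.

.. code-block:: Python

    # Hypothetical sketch of a hyperparameter search over ``n_components``.
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.pipeline import make_pipeline

    from skrub import GapEncoder, TableVectorizer

    pipe = make_pipeline(
        TableVectorizer(high_cardinality=GapEncoder()),
        HistGradientBoostingClassifier(),
    )
    search = RandomizedSearchCV(
        pipe,
        # Assumed parameter key; verify it with ``pipe.get_params()``.
        param_distributions={
            "tablevectorizer__high_cardinality__n_components": [10, 30, 50, 100],
        },
        n_iter=4,
        scoring="roc_auc",
    )
    # search.fit(X, y) would run the search; search.best_params_ would then
    # give the selected number of components.
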
Recall that the ROC AUC is a metric that quantifies the ranking power of
estimators: a random estimator scores 0.5, and an oracle, which provides
perfect predictions, scores 1.

.. GENERATED FROM PYTHON SOURCE LINES 143-178

.. code-block:: Python

    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import cross_validate
    from sklearn.pipeline import make_pipeline

    from skrub import TableVectorizer


    def plot_box_results(named_results):
        fig, ax = plt.subplots()
        names, scores = zip(
            *[(name, result["test_score"]) for name, result in named_results]
        )
        ax.boxplot(scores)
        ax.set_xticks(range(1, len(names) + 1), labels=list(names), size=12)
        ax.set_ylabel("ROC AUC", size=14)
        plt.title(
            "AUC distribution across folds (higher is better)",
            size=14,
        )
        plt.show()


    results = []

    y = X.pop("is_toxic").map({"Toxic": 1, "Not Toxic": 0})

    gap_pipe = make_pipeline(
        TableVectorizer(high_cardinality=GapEncoder(n_components=30)),
        HistGradientBoostingClassifier(),
    )
    gap_results = cross_validate(gap_pipe, X, y, scoring="roc_auc")
    results.append(("GapEncoder", gap_results))

    plot_box_results(results)


.. image-sg:: /auto_examples/images/sphx_glr_02_text_with_string_encoders_002.png
   :alt: AUC distribution across folds (higher is better)
   :srcset: /auto_examples/images/sphx_glr_02_text_with_string_encoders_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 179-186

MinHashEncoder
^^^^^^^^^^^^^^

We now compare these results with the |MinHashEncoder|, which is faster and
produces vectors better suited for tree-based estimators like the
|HistGradientBoostingClassifier|. To do this, we can simply replace the
|GapEncoder| with the |MinHashEncoder| in the previous pipeline using
``set_params()``.
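A minimal sketch of that ``set_params()`` route is shown here for illustration
only; the nested parameter name is an assumption and can be verified with
``gap_pipe.get_params()``.

.. code-block:: Python

    from sklearn.base import clone

    from skrub import MinHashEncoder

    # Swap the high-cardinality encoder on a fresh copy of the previous pipeline.
    swapped_pipe = clone(gap_pipe).set_params(
        tablevectorizer__high_cardinality=MinHashEncoder(n_components=30)
    )

In the rest of this example, however, we simply build a new pipeline for each
encoder, which keeps every step easy to read.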
.. GENERATED FROM PYTHON SOURCE LINES 186-198

.. code-block:: Python

    from skrub import MinHashEncoder

    minhash_pipe = make_pipeline(
        TableVectorizer(high_cardinality=MinHashEncoder(n_components=30)),
        HistGradientBoostingClassifier(),
    )

    minhash_results = cross_validate(minhash_pipe, X, y, scoring="roc_auc")
    results.append(("MinHashEncoder", minhash_results))

    plot_box_results(results)


.. image-sg:: /auto_examples/images/sphx_glr_02_text_with_string_encoders_003.png
   :alt: AUC distribution across folds (higher is better)
   :srcset: /auto_examples/images/sphx_glr_02_text_with_string_encoders_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 199-213

Remarkably, the vectors produced by the |MinHashEncoder| offer less predictive
power than those from the |GapEncoder| on this dataset.

TextEncoder
^^^^^^^^^^^

Let's now shift our focus to pre-trained deep learning encoders. Our previous
encoders are syntactic models that we trained directly on the toxicity dataset.
To generate more powerful vector representations for free-form text and diverse
entries, we can instead use semantic models, such as BERT, which have been
trained on very large datasets.

The |TextEncoder| enables you to integrate any Sentence Transformer model from
the Hugging Face Hub (or from your local disk) into your |pipeline| to
transform a text column in a dataframe. By default, the |TextEncoder| uses the
e5-small-v2 model.

.. GENERATED FROM PYTHON SOURCE LINES 213-229

.. code-block:: Python

    from skrub import TextEncoder

    text_encoder = TextEncoder(
        "sentence-transformers/paraphrase-albert-small-v2",
        device="cpu",
    )

    text_encoder_pipe = make_pipeline(
        TableVectorizer(high_cardinality=text_encoder),
        HistGradientBoostingClassifier(),
    )
    text_encoder_results = cross_validate(text_encoder_pipe, X, y, scoring="roc_auc")
    results.append(("TextEncoder", text_encoder_results))

    plot_box_results(results)


.. image-sg:: /auto_examples/images/sphx_glr_02_text_with_string_encoders_004.png
   :alt: AUC distribution across folds (higher is better)
   :srcset: /auto_examples/images/sphx_glr_02_text_with_string_encoders_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 230-238

StringEncoder
^^^^^^^^^^^^^

|TextEncoder| embeddings are very strong, but they are also quite expensive to
use. A simpler, faster alternative for encoding strings is the |StringEncoder|,
which works by first applying a tf-idf vectorization (building vectors of
rescaled word counts of the text) and then reducing their dimensionality with a
truncated SVD (here, to 30 dimensions).
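For intuition, the configuration used below (character 3- and 4-grams with a
``char_wb`` analyzer, reduced to 30 dimensions) is roughly equivalent to the
following scikit-learn sketch. This is an illustration only, not the exact
internal implementation of the |StringEncoder|.

.. code-block:: Python

    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline

    # Character n-grams (3 to 4 characters, respecting word boundaries),
    # followed by a rank-30 truncated SVD of the tf-idf matrix.
    tfidf_svd = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4)),
        TruncatedSVD(n_components=30),
    )
    # tfidf_svd.fit_transform(X["text"]) would produce a 30-dimensional embedding.
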
.. GENERATED FROM PYTHON SOURCE LINES 238-253

.. code-block:: Python

    from skrub import StringEncoder

    string_encoder = StringEncoder(ngram_range=(3, 4), analyzer="char_wb")

    string_encoder_pipe = make_pipeline(
        TableVectorizer(high_cardinality=string_encoder),
        HistGradientBoostingClassifier(),
    )
    string_encoder_results = cross_validate(string_encoder_pipe, X, y, scoring="roc_auc")
    results.append(("StringEncoder", string_encoder_results))

    plot_box_results(results)


.. image-sg:: /auto_examples/images/sphx_glr_02_text_with_string_encoders_005.png
   :alt: AUC distribution across folds (higher is better)
   :srcset: /auto_examples/images/sphx_glr_02_text_with_string_encoders_005.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 254-260

The performance of the |TextEncoder| is significantly stronger than that of the
syntactic encoders, which is expected. But how long does it take to load and
vectorize text on a CPU using a Sentence Transformer model? Below, we display
the trade-off between predictive accuracy and training time. Note that since we
are not training the Sentence Transformer model, the "fitting time" refers to
the time taken for vectorization.

.. GENERATED FROM PYTHON SOURCE LINES 260-317

.. code-block:: Python


    def plot_performance_tradeoff(results):
        fig, ax = plt.subplots(figsize=(5, 4), dpi=200)
        markers = ["s", "o", "^", "x"]
        for idx, (name, result) in enumerate(results):
            ax.scatter(
                result["fit_time"],
                result["test_score"],
                label=name,
                marker=markers[idx],
            )
            mean_fit_time = np.mean(result["fit_time"])
            mean_score = np.mean(result["test_score"])
            ax.scatter(
                mean_fit_time,
                mean_score,
                color="k",
                marker=markers[idx],
            )
            std_fit_time = np.std(result["fit_time"])
            std_score = np.std(result["test_score"])
            ax.errorbar(
                x=mean_fit_time,
                y=mean_score,
                yerr=std_score,
                fmt="none",
                c="k",
                capsize=2,
            )
            ax.errorbar(
                x=mean_fit_time,
                y=mean_score,
                xerr=std_fit_time,
                fmt="none",
                c="k",
                capsize=2,
            )
        ax.set_xscale("log")
        ax.set_xlabel("Time to fit (seconds)")
        ax.set_ylabel("ROC AUC")
        ax.set_title("Prediction performance / training time trade-off")

        ax.annotate(
            "",
            xy=(1.5, 0.98),
            xytext=(8.5, 0.90),
            arrowprops=dict(arrowstyle="->", mutation_scale=15),
        )
        ax.text(5.8, 0.86, "Best time / \nperformance trade-off")
        ax.legend(bbox_to_anchor=(1.02, 0.3))
        plt.show()


    plot_performance_tradeoff(results)


.. image-sg:: /auto_examples/images/sphx_glr_02_text_with_string_encoders_006.png
   :alt: Prediction performance / training time trade-off
   :srcset: /auto_examples/images/sphx_glr_02_text_with_string_encoders_006.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 318-335

The black points represent the average fit time and AUC for each vectorizer,
and the error bars represent one standard deviation. The green outlier dot on
the right side of the plot corresponds to the first time the Sentence
Transformers model was downloaded and loaded into memory. During the subsequent
cross-validation iterations, the model is simply copied, which reduces the
computation time for the remaining folds.

Interestingly, the |StringEncoder| achieves performance remarkably similar to
that of the |GapEncoder|, while being significantly faster.

Conclusion
----------

In conclusion, the |TextEncoder| provides powerful vectorization for text, but
at the cost of longer computation times and the need for additional
dependencies, such as torch. The |StringEncoder| is a simpler alternative that
can provide good performance at a fraction of the cost of more complex methods.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (2 minutes 56.454 seconds)


.. _sphx_glr_download_auto_examples_02_text_with_string_encoders.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/skrub-data/skrub/0.5.4?urlpath=lab/tree/notebooks/auto_examples/02_text_with_string_encoders.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: lite-badge

      .. image:: images/jupyterlite_badge_logo.svg
        :target: ../lite/lab/index.html?path=auto_examples/02_text_with_string_encoders.ipynb
        :alt: Launch JupyterLite
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 02_text_with_string_encoders.ipynb <02_text_with_string_encoders.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 02_text_with_string_encoders.py <02_text_with_string_encoders.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 02_text_with_string_encoders.zip <02_text_with_string_encoders.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_