Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others

0 votes
278 views
in Technique[技术] by (71.8m points)

What is the fastest and simplest way to check whether a DataFrame contains duplicates?

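I have a fairly large DataFrame with several million records. How can I:
(1) Check for duplicates as fast as possible? I only need to know whether any duplicate exists; there is no need to flag which rows are duplicated.
(2) Check for duplicates as conveniently as possible, that is, with the least code?
Thanks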



1 Answer

0 votes
by (71.8m points)

https://pandas.pydata.org/pan...

import pandas as pd

df = pd.DataFrame({'a': [1, 1], 'b': [1, 1]})

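# duplicated() flags every row that repeats an earlier row; any() is True if at least one row is flagged
has_duplicates = df.duplicated().any()

print(has_duplicates)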
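
For reuse, the same check can be wrapped in a small helper. This is only a sketch: the names has_duplicates, df_dup and df_unique are made up for illustration and do not come from the question. It relies on DataFrame.duplicated (with its optional subset argument) and Series.is_unique, both of which are part of pandas. Whether this is the fastest option on a frame with millions of rows depends on the data, so it is worth timing on the real DataFrame.

```python
import pandas as pd


def has_duplicates(df, subset=None):
    """Return True if any row repeats an earlier row.

    `subset` (optional) restricts the comparison to the given columns.
    """
    return bool(df.duplicated(subset=subset).any())


# Hypothetical sample data: the first frame has a repeated row, the second does not.
df_dup = pd.DataFrame({'a': [1, 1, 2], 'b': [1, 1, 3]})
df_unique = pd.DataFrame({'a': [1, 1, 2], 'b': [1, 2, 3]})

print(has_duplicates(df_dup))                    # whole-row check
print(has_duplicates(df_unique))                 # whole-row check
print(has_duplicates(df_unique, subset=['a']))   # check on column 'a' only

# For a single column, Series.is_unique gives the same answer with less code.
print(not df_unique['b'].is_unique)
```

If only one key column matters, the is_unique form is the shortest to write; for whole rows, df.duplicated().any() stays a one-liner.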
