Reading Multiple CSV Files into Python Pandas Dataframe
The general use case behind the question is to read multiple CSV log files from a target directory into a single Python pandas DataFrame for quick-turnaround statistical analysis and charting. The idea is to use pandas rather than MySQL to conduct the data import or append, plus the statistical analysis, periodically throughout the day.
The script below attempts to read all of the CSV files (same file layout) into a single pandas DataFrame and adds a year column associated with each file read.
The problem is that the script reads only the last file in the directory, instead of the desired outcome of reading all files within the targeted directory.
    # Assemble all of the data files into a single DataFrame & add a year field
    # 2010 is the last available year
    years = range(1880, 2011)

    for year in years:
        path = 'C:\\Documents and Settings\\foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
        frame = pd.read_csv(path, names=columns)

        frame['year'] = year
        pieces.append(frame)

    # Concatenates everything into a single DataFrame
    names = pd.concat(pieces, ignore_index=True)

    # Expected row total should be 1690784
    names
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 33838 entries, 0 to 33837
    Data columns:
    name      33838  non-null values
    sex       33838  non-null values
    births    33838  non-null values
    year      33838  non-null values
    dtypes: int64(2), object(2)

    # Start aggregating the data at the year & gender level using groupby or pivot
    total_births = names.pivot_table('births', rows='year', cols='sex', aggfunc=sum)

    # Prints the pivot table
    total_births.tail()
    Out[35]:
    sex         F        M
    year
    2010  1759010  1898382
The append method on an instance of a DataFrame does not behave the same as the append method on an instance of a list. DataFrame.append() does not operate in place; it instead returns a new object, so the result has to be assigned back.
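    import pandas as pd

    # Column names, matching the layout shown in the question's output
    columns = ['name', 'sex', 'births']
    years = range(1880, 2011)

    names = pd.DataFrame()
    for year in years:
        path = 'C:\\Documents and Settings\\foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
        frame = pd.read_csv(path, names=columns)

        frame['year'] = year
        # append() returns a new DataFrame, so rebind names on every pass
        names = names.append(frame, ignore_index=True)

Or you can use concat: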
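    import pandas as pd

    # Column names, matching the layout shown in the question's output
    columns = ['name', 'sex', 'births']
    years = range(1880, 2011)

    names = pd.DataFrame()
    for year in years:
        path = 'C:\\Documents and Settings\\foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
        frame = pd.read_csv(path, names=columns)

        frame['year'] = year
        # concat() takes a list of objects as its first argument
        names = pd.concat([names, frame], ignore_index=True)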
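Since the stated goal is to pick up every file in a target directory, here is a minimal sketch of one way to do that: list the files with glob, collect one frame per file in a plain Python list, and call pd.concat once at the end. This also avoids DataFrame.append, which was removed in pandas 2.0. The helper name read_names_directory, the yob*.txt pattern, the temporary demo directory, and the sample rows are all assumptions made up for illustration; only the name/sex/births layout and the yobYYYY.txt naming come from the question. The pivot step uses the index= and columns= keywords, which replaced rows= and cols= in later pandas releases.

```python
import glob
import os
import re
import tempfile

import pandas as pd

# Column layout taken from the question's output (name, sex, births).
COLUMNS = ["name", "sex", "births"]


def read_names_directory(directory, pattern="yob*.txt"):
    """Read every matching file in `directory` into one DataFrame.

    Hypothetical helper: the year is parsed from file names of the
    form yobYYYY.txt, as used in the question.
    """
    pieces = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        match = re.search(r"yob(\d{4})\.txt$", os.path.basename(path))
        if match is None:
            continue  # skip files that do not follow the naming scheme
        frame = pd.read_csv(path, names=COLUMNS)
        frame["year"] = int(match.group(1))
        pieces.append(frame)  # list.append mutates the list in place
    if not pieces:
        raise FileNotFoundError("no files matching %r in %r" % (pattern, directory))
    # A single concat at the end, rather than growing a DataFrame per file.
    return pd.concat(pieces, ignore_index=True)


# Demo with made-up sample rows written to a temporary directory.
demo_dir = tempfile.mkdtemp()
samples = {
    2009: "Emma,F,10\nLiam,M,20\n",
    2010: "Ava,F,30\nNoah,M,40\nMia,F,5\n",
}
for year, text in samples.items():
    with open(os.path.join(demo_dir, "yob%d.txt" % year), "w") as handle:
        handle.write(text)

names = read_names_directory(demo_dir)

# Aggregate births by year and sex.
total_births = names.pivot_table("births", index="year", columns="sex", aggfunc="sum")
print(len(names))
print(total_births)
```

Collecting the frames in a list and concatenating once tends to be cheaper than rebinding a growing DataFrame on every pass, because each intermediate concat copies all rows read so far.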