data mining - Tidying Time Intervals for Plotting a Histogram in R


I'm doing cluster analysis on the MLTobs life tables package and have come across a tricky problem with the year variable in the mlt.mx.info data frame. year contains the period the life table was taken over, as an interval. Here's a table of the data:

    1751-1754 1755-1759 1760-1764 1765-1769 1770-1774 1775-1779 1780-1784 1785-1789 1790-1794
            1         1         1         1         1         1         1         1         1
    1795-1799 1800-1804 1805-1809 1810-1814 1815-1819 1816-1819 1820-1824 1825-1829 1830-1834
            1         1         1         1         1         2         3         3         3
    1835-1839 1838-1839 1840-1844 1841-1844 1845-1849 1846-1849 1850-1854 1855-1859 1860-1864
            4         1         5         3         8         1        10        11        11
    1865-1869 1870-1874 1872-1874 1875-1879 1876-1879 1878-1879 1880-1884 1885-1889 1890-1894
           11        11         1        12         2         1        15        15        15
    1895-1899 1900-1904 1905-1909 1908-1909 1910-1914 1915-1919 1920-1924 1921-1924 1922-1924
           15        15        15         1        16        16        16         2         1
    1925-1929 1930-1934 1933-1934 1935-1939 1937-1939 1940-1944 1945-1949 1947-1949 1948-1949
           19        19         1        20         1        22        22         3         1
    1950-1954 1955-1959 1956-1959 1958-1959 1960-1964 1965-1969 1970-1974 1975-1979 1980-1984
           30        30         2         1        40        40        41        41        41
    1983-1984 1985-1989 1990-1994 1991-1994 1992-1994 1995-1999 2000-2003 2000-2004 2005-2006
            1        42        42         1         1        44         3        41        22
    2005-2007
           14

As you can see, some of the intervals sit within other intervals. Thankfully, none of them partially overlap. I want to simplify the intervals so that intervals such as 1992-1994 and 1991-1994 both go into 1990-1994.

One idea might be to take the modulo of each interval and sort them into new intervals that way, but I'm unsure how to do that with an interval data type. If anyone has ideas, I'd appreciate the help. Ultimately I want to create a histogram or barplot to illustrate this nicely.
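The modulo idea can work directly on the label strings, because (judging from the table) no interval crosses a 5-year boundary, so rounding the first year down to a multiple of 5 finds the enclosing bin. A minimal base-R sketch, assuming every label has the form "YYYY-YYYY"; the `years` vector is a hypothetical sample copied from the table, not the real mlt.mx.info$year column:

```r
# Hypothetical sample of interval labels copied from the table
# (the real data would be mlt.mx.info$year from MLTobs).
years <- c("1751-1754", "1816-1819", "1815-1819", "1991-1994",
           "1992-1994", "1990-1994", "2000-2003", "2005-2007")

# First year of each interval, e.g. "1991-1994" -> 1991
start <- as.integer(substr(years, 1, 4))

# Round down to the nearest multiple of 5 using the modulo operator
bin_start <- start - start %% 5

# Build "1990-1994"-style labels for the 5-year bins
bin_label <- paste0(bin_start, "-", bin_start + 4)

# Count life tables per bin and plot them
counts <- table(bin_label)
print(counts)
barplot(counts, las = 2, ylab = "Number of life tables")
```

Because all labels have the same width, `table()` sorts them alphabetically, which here is also chronological order.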

If I understand the problem, you'll want this:

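    library(dplyr)

    bottom <- seq(1750, 2010, 5)  # lower bounds of the 5-year bins

    new_df <- mlt.mx.info %>%
      arrange(year) %>%
      # last year of each interval, e.g. "1991-1994" -> 1994
      mutate(year2 = as.numeric(substr(year, 6, 9))) %>%
      # label each row with the bin that contains year2
      mutate(new_year = paste0(bottom[findInterval(year2, bottom)], "-",
                               bottom[findInterval(year2, bottom) + 1] - 1))

    View(new_df)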

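So what this does is create the bins and then output a new column (new_year) holding the bin each interval falls into, running from the bottom of that bin to one year before the next bin starts. For example, anything falling within 1750-1754 gets the new value 1750-1754 (as a string; the original was an integer type, and I'm not sure how to fix that). Is that what you want? Double-check the results, but it looks right to me.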
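To double-check the binning, the same `findInterval()` logic can be run in base R on a few labels. This is a sketch: `years` is a hypothetical sample copied from the table, and converting `new_year` into a factor with chronologically ordered levels is an added assumption (one way to keep the barplot in time order and show empty bins), not part of the answer's code.

```r
# Hypothetical sample of interval labels (stand-in for mlt.mx.info$year)
years <- c("1751-1754", "1838-1839", "1921-1924", "2000-2003", "2005-2007")

bottom <- seq(1750, 2010, 5)               # bin lower bounds, as in the answer
year2  <- as.numeric(substr(years, 6, 9))  # last year of each interval

i <- findInterval(year2, bottom)           # index of the bin containing year2
new_year <- paste0(bottom[i], "-", bottom[i + 1] - 1)

# Every possible bin label, 1750-1754 through 2005-2009, in time order
all_bins <- paste0(head(bottom, -1), "-", tail(bottom, -1) - 1)

# Factor with chronological levels; table() then keeps empty bins as 0
new_year_f <- factor(new_year, levels = all_bins)
counts_f <- table(new_year_f)
barplot(counts_f, las = 2, ylab = "Number of life tables")
```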
