Chloe Cornuejols: final project (write-up)

Women’s status in Cambodia

Promoting gender equality is one of the United Nations Millennium Development Goals. Gender issues are still prevalent in developing countries, where women suffer from deep inequalities, have less access to education and live in patriarchal societies. This situation is changing quickly, especially in Asia, where economic growth is rapid and society has to adapt. Cambodia, in Southeast Asia, is a good case for studying gender status: it is small enough that issues can be studied across the whole country, and it is divided into smaller administrative units (23 provinces, 159 districts and 1609 communes). It also suffers from deep gender inequalities; a Khmer proverb even says “men are gold, women are cloth”.

On this map, the distribution of built-up areas shows how the population is spread across the country: mostly in the South East and around the lake in the North West. The South West and North East are more rural.

To study women’s status in Cambodia, I found data on the Pacific Disaster Center website, which gathers data from the Cambodian census of 2006, by commune and village. It also has a point shapefile of schools and health facilities. Later, I also used data from the Demographic and Health Survey (DHS), by USAID, which sampled a subset of the population in 2010 to study domestic violence, among other health issues. The survey includes sampling weights so that the results can be generalized to the whole country.

I first created an index of gender inequality to visualize the issue better. For that I used data on the sex ratio (proportion of men to women in the population, normally a little under 100% because women have a longer life expectancy), the percentage of households headed by a woman (equality would be at 50%), and the ratio of literate men to literate women, which I computed by dividing male literacy by female literacy (equality would be at 100%).

By adding these three variables with a weight of one third for each, I obtained the gender index.
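As a minimal sketch of this computation, assuming the three variables are expressed in percent and combined exactly as described (the write-up does not say whether they were rescaled first), the index for one commune could be computed as follows. The function name and the sample values are hypothetical.

```python
def gender_index(sex_ratio, female_head_pct, literacy_ratio):
    """Equal-weight sum of the three indicators, all given in percent."""
    weights = (1 / 3, 1 / 3, 1 / 3)
    values = (sex_ratio, female_head_pct, literacy_ratio)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical commune: 96 men per 100 women, 24% of households headed
# by a woman, male literacy 20% higher than female literacy.
example = gender_index(sex_ratio=96.0, female_head_pct=24.0, literacy_ratio=120.0)
print(round(example, 1))
```

In a GIS attribute table the same formula would simply be applied to every commune or village row.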

I then tried to identify problematic areas. The gender index seems to be higher in the east of the country, the north and the south west. To check that these results were not simply due to very low population density in these areas, I recreated the index in the village dataset I had. The south west indeed has very few villages, but these villages themselves have a high gender index. The east and north have more villages, most of them affected by gender inequality.

One cause of the high gender index could be low access to education due to a lack of schools in these areas. To check this, I plotted in Excel the total literacy rate of villages ranked by distance to the nearest school (the distance being computed with the Near tool), and similarly the literacy ratio (male literacy divided by female literacy) by distance to the nearest school. Even though the correlation is weak, the trends seem to indicate that literacy decreases and the literacy ratio increases as villages get farther away from schools, which is consistent with our hypothesis. Improving literacy in general may thus help reduce the gap between male and female literacy.
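The distance and trend check above can be sketched in plain Python, assuming projected coordinates in kilometres. The brute-force search stands in for the Near tool, the Pearson coefficient stands in for the Excel trend line, and all coordinates and literacy rates below are made up for illustration.

```python
import math


def nearest_distance(point, targets):
    """Straight-line distance from a point to the closest target."""
    return min(math.dist(point, t) for t in targets)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical schools and villages (x, y in km) with literacy rates in %.
schools = [(0, 0), (10, 0)]
villages = [((1, 0), 80), ((3, 4), 70), ((10, 6), 60), ((5, 12), 40)]

distances = [nearest_distance(xy, schools) for xy, _ in villages]
literacy = [rate for _, rate in villages]
print(distances)
print(pearson(distances, literacy))  # negative if literacy falls with distance
```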

I then selected villages within 3 km of the nearest school and inverted the selection to get a dataset of villages more than 3 km away from the nearest school. I mapped literacy in these villages (red is a high literacy rate, white is a low one) and used bigger circles to represent the villages far away from the nearest school.

We can observe that villages farther away from schools seem to be in the south west and the north, where we identified strong gender issues. They also seem to be paler, which means literacy is low there.

To solve this issue, the state has to create new schools. To find suitable locations for these new schools, I did a hotspot analysis. We primarily want the new schools to be in low literacy areas and close to population centers, so I used a kernel density tool to rasterize villages according to these criteria and gave them a weight of 30% each. We also want the new schools to be far away from existing schools, so I used Euclidean distance on schools and gave it a weight of 20%. Finally, we want to prioritize areas which suffer from high gender inequality, so I used the map of gender index by commune previously created and gave it a weight of 20%.

I then reclassified each criterion and did the map algebra to compute the map of where to put new schools.
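The reclassify-then-combine step can be illustrated with a small sketch. It assumes each criterion is a grid of the same shape, that reclassification uses five equal-interval classes (the write-up does not state the scheme), and that low literacy is inverted so it scores high. The grids are hypothetical 2 x 2 examples, not the project data.

```python
def reclassify(grid, classes=5, invert=False):
    """Rescale a grid to integer scores 1..classes using equal intervals."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant grid
    out = []
    for row in grid:
        new_row = []
        for v in row:
            score = min(int((v - lo) / span * classes) + 1, classes)
            new_row.append(classes + 1 - score if invert else score)
        out.append(new_row)
    return out


def weighted_overlay(layers):
    """Weighted sum of (grid, weight) pairs that share the same shape."""
    rows, cols = len(layers[0][0]), len(layers[0][0][0])
    return [
        [sum(grid[r][c] * weight for grid, weight in layers) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical criteria grids.
low_literacy = reclassify([[40, 80], [60, 50]], invert=True)  # low literacy scores high
population = reclassify([[200, 1500], [800, 400]])
school_distance = reclassify([[1.0, 9.0], [4.0, 6.5]])
gender = reclassify([[85, 110], [95, 120]])

suitability = weighted_overlay(
    [(low_literacy, 0.3), (population, 0.3), (school_distance, 0.2), (gender, 0.2)]
)
print(suitability)
```

Cells with the highest combined score would be the preferred locations for new schools.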

On this map, preferable locations are deep blue and are situated in the east, the south west and two in the north. Thus we could give recommendations to the state regarding the ideal location for new schools to improve literacy by targeting areas with gender issues.

But if gender inequality were only due to the lack of schools, villages closer to schools (within a 5 km radius, for example) should have a low gender index. To check that, I selected villages within 5 km of the nearest school (by the same method as previously described) and, among them, selected the villages that fall in the worst 20% of gender index values among all villages, which corresponds to a gender index above 100.
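A rough sketch of this two-step selection is shown below. It assumes each village record already carries its distance to the nearest school and its gender index; the field names, the function name and the sample records are hypothetical. The cutoff is derived from all villages, as described, before the distance filter is applied.

```python
def select_close_but_unequal(villages, max_dist_km=5.0, worst_share=0.2):
    """Villages near a school whose index is in the worst share of all villages."""
    ranked = sorted(v["gender_index"] for v in villages)
    k = max(1, round(len(ranked) * worst_share))  # size of the worst group
    cutoff = ranked[-k]
    return [
        v
        for v in villages
        if v["dist_km"] <= max_dist_km and v["gender_index"] >= cutoff
    ]


# Hypothetical village records.
villages = [
    {"name": "a", "dist_km": 1.0, "gender_index": 60},
    {"name": "b", "dist_km": 2.0, "gender_index": 150},
    {"name": "c", "dist_km": 8.0, "gender_index": 140},
    {"name": "d", "dist_km": 4.0, "gender_index": 90},
    {"name": "e", "dist_km": 3.0, "gender_index": 100},
]
print([v["name"] for v in select_close_but_unequal(villages, worst_share=0.4)])
```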

The dataset I obtained was thus a selection of villages close to schools but with a bad gender index. I then mapped these villages by gender index, and observed that some of them, especially in the north east and north west, have a very bad gender index even though their population has access to school. Another phenomenon must thus be playing a role in driving gender inequalities.

My first hypothesis was that in poor areas, households do not send their children to school even though it is accessible, and children work to help sustain the family. To check this hypothesis I mapped poverty by commune below the layer of gender index by village. Having too little data on income, I computed the poverty index as the average of the proportions of households that have no access to water within the house, have no toilets, get light from fire or candles only, and use coal or wood as cooking fuel.

We can see on the map that areas with a bad gender index (the north west and north east) are indeed in poor communes. The only other poor area is in the south west, but it corresponds to the rural area with very low access to school described earlier.

However, there is still a discrepancy between the level of poverty in the north east (similar to other poor regions) and the very high level of gender inequalities there.

To explain this discrepancy, I looked at cultural issues: if the problematic area has a different culture, maybe the education is not adapted to this culture, or some aspects of this culture cause gender inequality in another way. As I had data on religion, I mapped % Muslims, % Christians and % of other religions (other than those two and Buddhism) by commune.

The white base is thus Buddhism, the yellow is Christians, the blue Muslims and the red other religions. We then notice that our problematic area in the north east is indeed in the only culturally different area of the country (at least on matters of religion). Our hypothesis on cultural influence is thus reinforced.

To reduce gender inequalities, as measured by the gender index previously computed, my recommendations to the state are to:

–       create new schools near villages with a low literacy rate and low access to existing schools, as parents may be reluctant to send their daughters to school if it is far away

–       give incentives to poor parents to send their children to school, as total literacy seems to be correlated with the gap between male and female literacy

–       adapt education and intervention in the east, where the cultural background is different and thus gender issues may be linked to culture


As I had the DHS data mentioned earlier, I wanted to see whether domestic violence had the same geographic distribution as my gender index. Even after extrapolation in Stata using the sampling weights, the data still covered too few communes, so I had to aggregate it at the district level to get a country-wide map. Domestic violence was measured through answers to the question: A husband is justified in beating his wife if

  • she goes out without telling him
  • she neglects the children
  • she argues with him
  • she refuses to have sex
  • she burns the food while cooking

I measured the proportion of respondents who answered yes to at least one of these questions. It is interesting to note that, according to the DHS report, this proportion is 46% for women and 22% for men, which would indicate that men are less accepting of domestic violence than women.
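The weighted proportion can be sketched as follows, assuming each respondent record holds a sampling weight and five yes/no answers. The record layout and the sample values are hypothetical; the actual DHS variable names are not reproduced here.

```python
def weighted_share_any_yes(respondents):
    """Weighted share of respondents who agreed with at least one statement."""
    total = sum(r["weight"] for r in respondents)
    agree = sum(r["weight"] for r in respondents if any(r["answers"]))
    return agree / total


# Hypothetical respondents: one weight and five True/False answers each.
sample = [
    {"weight": 1.0, "answers": [False, False, False, False, False]},
    {"weight": 2.0, "answers": [True, False, False, False, False]},
    {"weight": 1.0, "answers": [False, False, False, False, False]},
]
print(weighted_share_any_yes(sample))
```

Grouping the records by district before calling the function would give one value per district, which is the level used for the map.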

As the data was gathered in 2010, not in 2006 like the census, and using different methods, I also mapped wealth (divided into categories from 1 = poorest to 5 = richest in this study).

Surprisingly, high domestic violence is concentrated in the west of the country and in two areas in the south, which are richer areas. This distribution contrasts sharply with that of the gender index. I then searched the internet for maps of domestic violence, and these maps were similar, so the problem was not coming from the data.

One explanation would then be that domestic violence is a different issue, which has a different origin and must be addressed separately. My recommendation would be for the state to focus awareness campaigns on domestic violence in these regions.

To conclude, I think GIS is a good tool to study this issue: the causes of gender inequality vary across regions, and maps are a good way to see this variation. They allowed me to look for its causes and to give further recommendations to fight gender inequalities.

My main difficulty was finding data. For example, gender inequality in education manifests itself mainly in secondary education, when girls drop out of school. However, I could not find a dataset broken down by type of school, or even a list of secondary schools in Cambodia that I could have geolocated. In other instances, such as income data, the lack of data gave me the opportunity to come up with an index of my own, which was really interesting.

Appendix 1: skills used

  • Modeling, for the rasterization in the hotspot analysis
  • Measurement/analysis, hotspot analysis
  • Original data, DHS data in Stata format, then converted to CSV and joined
  • Charts, literacy by distance to schools
  • Hotspot analysis, location of new schools
  • Inset map, geography of Cambodia
  • Point graduated symbols in the map on literacy by distance to schools (multivariate attributes for literacy), and in the maps of gender index in villages within 5km of schools (poverty and religion)
  • Creating indices (to create literacy ratio, poverty rate, gender index and religion ratios)
  • Attribute sub-set selection in the maps of gender index in villages within 5km of schools (select villages with gender index > 100)
  • Boundary sub-set selection (select villages within 3 or 5km of schools)
  • Distance, use of the Near tool (to measure distance of villages to nearest school)
  • Geoprocessing to clip map

Appendix 2: model

Appendix 3: metadata

Yunlu – Final Presentation and Blog

PPT: Yunlu-Museums-NY

My final project mainly focuses on how to obtain a dataset suitable for network analysis. I took advantage of maps offered by OpenStreetMap, a New York City extract from BBBike, and the map importing tool OSM2NetworkDataset. After successfully building a network dataset for New York County (clipped from the New York City geodatabase), I geocoded the top 8 museums in Manhattan and used the following network analysis tools:

Routing (the first stop is Grand Central Station and the last stop is the American Museum of Natural History; the route is optimized by reordering the museum stops)

Closest facility (the 3 nearest subway stations to the American Museum of Natural History)

OD Matrix (from stations to museums)

Service Area (within a 3-minute drive of museums)
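To illustrate what reordering stops between a fixed first and last stop means, here is a small brute-force sketch. It is not the solver ArcGIS uses, and trying every ordering is only practical for a handful of stops. The stop labels and travel costs are hypothetical.

```python
from itertools import permutations


def best_order(start, end, stops, cost):
    """Try every ordering of the middle stops and keep the cheapest route."""
    best_route, best_cost = None, float("inf")
    for middle in permutations(stops):
        route = (start, *middle, end)
        total = sum(cost[a][b] for a, b in zip(route, route[1:]))
        if total < best_cost:
            best_route, best_cost = route, total
    return list(best_route), best_cost


# Hypothetical stops placed along a line; cost is the gap between positions.
positions = {"A": 0, "B": 5, "C": 2, "D": 10}
cost = {a: {b: abs(pa - pb) for b, pb in positions.items()} for a, pa in positions.items()}

route, minutes = best_order("A", "D", ["B", "C"], cost)
print(route, minutes)
```

In a real network the cost table would come from an OD matrix of travel times rather than from straight-line gaps.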

Another analysis I made was to find the most appropriate location for the future construction of museums, based on the following parameters:

Median age: I assumed that the lower the median age, the higher the demand for museums nearby

Number of people employed in the art industry in each census tract: I assumed that the more people employed in the art industry, the higher the demand for museums nearby

Distance from existing museums: I assumed that the farther from existing museums, the higher the demand for museums nearby.

I projected the original shapefile from WGS to UTM Zone 18N

Converted the census tract data to raster

Calculated the Euclidean distance of each tract to the existing museums

Reclassified these three parameters and used the raster calculator to sum them into a priority index for building future museums
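The Euclidean distance step above can be pictured with a small sketch that fills a grid with the distance from each cell to the nearest source cell. The grid size, cell size and museum cell below are hypothetical, and distances are measured between cell centres in the projected map unit (metres for UTM).

```python
import math


def distance_grid(rows, cols, cell_size, sources):
    """Distance from every cell centre to the nearest source cell (row, col)."""
    return [
        [
            min(math.hypot(r - sr, c - sc) for sr, sc in sources) * cell_size
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 2 x 3 grid of 100 m cells with one museum in the top-left cell.
grid = distance_grid(rows=2, cols=3, cell_size=100, sources=[(0, 0)])
for row in grid:
    print([round(v, 1) for v in row])
```

This is why the shapefile is projected first: distances computed on unprojected WGS coordinates would be in degrees rather than metres.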

Originally, I intended to calculate the optimal route to every national park in the United States. However, I found that it is hard to obtain the speed data and route restriction data that are necessary for network analysis.

Then I built a network dataset for my favorite city, Beijing, and wished to do a hotspot analysis as well. Unfortunately, it is almost impossible to find or make an address locator for the city, and census data is not available either. I only located my home and my university on the map and found one optimal route between them.

All information about the roads is from the OpenStreetMap database. I googled the addresses of the museums and geocoded them in ArcGIS. The shapefile is NY County 2000. The attribute data for median age is from 2010 Census Summary File 1. The number of people employed in the art industry is from ACS_11_5YR_DP03.

Most of the difficulties I faced were due to my limited knowledge of the Java command line, which the map conversion tools rely on. Since the Java programs also take a considerable time to produce a result, I often waited for hours and hours only to get an error that made the program fail. Thanks to Google and the related forums on the ArcGIS website, I finally managed to complete the final project presented here.

Thank you Yoh! Thank you Zhongbo! I am pretty sure I would like to take the advanced course in the spring quarter (sadly it is also my last quarter here).


I chose to study how geographic and neighborhood characteristics can be used to plan HIV prevention interventions for high risk young adults in Los Angeles County.

Despite advances in HIV prevention, infection rates and numbers of new cases of HIV among adolescents continue to rise (Geanuracos et al., 2007). There were over 2,400 cases of HIV reported in Los Angeles County in 2008 and 58% of them were among persons aged 20-39 years. Because HIV incidence data from the LA County Department of Public Health is not available to the public, I used 2009 data from AIDSVu. The map above shows 2009 new cases of HIV or AIDS among persons aged 13-24 (AIDSVu, 2012).

Previous research has shown that areas of high prevalence and incidence of HIV are often characterized by high levels of racial/ethnic segregation and low socioeconomic status (Geanuracos et al., 2007). I used data from the 2009 U.S. Census to perform a hot spot analysis of areas of high HIV risk for young adults in LA County. I combined percent African American households, percent Latino or Hispanic households, percent Asian households, percent of persons aged 18-24, and household median income. Each map was converted to a raster and reclassified in order to create an HIV risk index. Each variable was weighted equally (20%).

I found that the area with the highest HIV risk for young adults is also the area that had the highest number of new cases of HIV or AIDS in 2009 among persons aged 13-24. These results indicate that spatial analysis is an effective way to determine high risk areas for contracting HIV. Including other variables such as sexual orientation or drug use may yield even more accurate results.

Next, I wanted to determine whether young adults in the high risk area had access to HIV prevention services. I geocoded HIV testing locations and used network analysis to determine how far young adults needed to travel to access these services. The analysis on the left shows service areas within a 5 and 10 minute drive from HIV testing locations. The analysis on the right shows service areas within a 5, 10, and 20 minute walk from HIV testing locations. Because Los Angeles is a metropolitan area with an extensive public transportation system, it is likely that nearly all young adults in the high risk area have access to HIV prevention services.

However, some young adults may not be willing to spend time traveling to an HIV testing location. Holloway, Cederbaum, Ajayi, and Shoptaw (2012) found that young men who have sex with men (YMSM) desire to receive HIV prevention information quickly and easily in social contexts that they are already attending with friends. Reisner et al. (2009) recommended that health and community service providers engage YMSM for HIV education and prevention at popular venues.

I wanted to know how information about popular venues that young adults frequent could be used to plan HIV prevention interventions in LA County. For this part of the analysis, I used data from the Healthy Young Men (HYM) study (Kipke et al., 2007). The purpose of this study was to explore the individual, familial, interpersonal, and community factors that may influence drug use, HIV risk, and mental health. Participants included 526 young men who have sex with men (YMSM) who completed one survey every 6 months between February 2005 and January 2006. Data collection included demographic variables, sexual orientation, perceived health status, personal satisfaction, health behaviors, access to care, history of STIs and HIV, depression, suicidality, and favorite “gay places”. I used the favorite “gay places” variable to determine the ideal location for an HIV mobile testing unit for YMSM in LA County.

The above map shows the most popular venues among HYM study participants. I used spatial statistics to identify the central feature of these venues which is represented by the white star.
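The idea behind a central feature can be sketched in a few lines: it is the existing feature with the smallest total distance to all the others. The sketch below is unweighted and uses hypothetical projected coordinates, whereas the analysis above may have weighted venues by their votes.

```python
import math


def central_feature(points):
    """Return the point with the smallest summed distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical venue coordinates in a projected system.
venues = [(0, 0), (4, 0), (0, 4), (1, 1), (10, 10)]
print(central_feature(venues))
```

Unlike a mean centre, the result is always one of the input venues, so it can be marked directly on the map.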

The above map shows the area with the greatest number of popular venues. Most of them are located in Hollywood and West Hollywood. West Hollywood is represented by the light gray color in the center of the map.

I again used spatial analysis to determine the ideal location for a mobile HIV testing unit. First, I used directional distribution to determine the area that included 68% (1 standard deviation) of the most popular venues in Hollywood and West Hollywood. Next, I used Euclidean distance to measure distances from each venue and kernel density to measure densities based on total number of votes for each venue.

Finally, I used map algebra to combine the results of these analyses. I calculated the areas within 126 meters of a popular venue and with 0.0000358 votes per square meter.
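This kind of map algebra amounts to a cell-by-cell logical test on two rasters. The sketch below assumes the density condition means "at least" the stated value, which the text does not specify, and uses hypothetical 2 x 2 grids for distance (meters) and density (votes per square meter).

```python
def suitable_cells(distance, density, max_distance=126.0, min_density=0.0000358):
    """Mark cells that are close to a venue and dense enough in votes (1) or not (0)."""
    return [
        [1 if d <= max_distance and k >= min_density else 0 for d, k in zip(d_row, k_row)]
        for d_row, k_row in zip(distance, density)
    ]


# Hypothetical rasters with matching shapes.
distance = [[100.0, 200.0], [50.0, 126.0]]
density = [[0.00004, 0.00005], [0.00001, 0.0000358]]
print(suitable_cells(distance, density))
```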

The results of the analysis indicate that the ideal location for a mobile HIV testing unit is on Santa Monica Boulevard between Palm Avenue and North La Peer Drive or near Westmount Drive.

Interestingly, there is an actual HIV mobile testing unit every weekend on Santa Monica Boulevard near North San Vicente Boulevard (white star). Whether or not this location was selected using spatial analysis, it’s the perfect location to reach the greatest number of YMSM.

Thus, the results of these analyses indicate that geographic and neighborhood characteristics can be used to identify areas where young adults are at high risk for contracting HIV. Additionally, information about popular venues that young adults frequent can be used to identify ideal locations for HIV prevention interventions.

The above layout includes the models that I used to perform the hot spot analysis. I converted features to rasters and then reclassified them so that I could use the raster calculator.